Characterization of cDNA clones selected by the GeneMark analysis from size-fractionated cDNA libraries from human brain.
We have been conducting a sequencing project of human cDNAs that encode large proteins in the brain. In this project, candidate cDNA clones have been screened experimentally by in vitro transcription/translation prior to sequencing. In this study, we tested an alternative approach for selecting cDNA clones with a high probability of carrying a protein-coding region. This approach used 5'-end single-pass sequence data and the GeneMark program to assess protein-coding potential, and allowed us to select 74 clones out of 14,804 redundant cDNA clones. The complete sequences of these 74 clones revealed that all of them carried possible protein-coding sequences, as expected, and that 45% of them encoded proteins of more than 500 amino acid residues. These results indicate that GeneMark analysis of the 5'-end sequences of cDNAs is a simple and effective means of selecting cDNA clones with protein-coding potential, although the sizes of the encoded proteins cannot be predicted.
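The selection step described above (score each 5'-end single-pass read for protein-coding potential, then keep the clones that pass) can be illustrated with a minimal sketch. This is not GeneMark and not the procedure used in the study: GeneMark scores coding potential with Markov models of coding and non-coding sequence, whereas the stand-in below simply measures the longest stop-codon-free stretch in the three forward reading frames. The function names, the clone identifiers, and the `min_codons` threshold are all hypothetical.

```python
# Stand-in sketch only: this is NOT GeneMark. It approximates "coding
# potential" of a 5'-end read by the longest run of codons without a stop
# codon in any of the three forward reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_stretch(read: str) -> int:
    """Return the longest stop-free run, in codons, over the forward frames."""
    seq = read.upper()
    best = 0
    for frame in range(3):
        run = 0
        # Walk the read codon by codon in this frame, ignoring a trailing
        # partial codon.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 1
                best = max(best, run)
    return best


def select_clones(reads: dict[str, str], min_codons: int) -> list[str]:
    """Return sorted IDs of clones whose read has an open stretch >= min_codons.

    `min_codons` is a hypothetical cutoff chosen for illustration.
    """
    return sorted(
        clone_id
        for clone_id, read in reads.items()
        if longest_open_stretch(read) >= min_codons
    )


if __name__ == "__main__":
    # Toy reads with made-up clone IDs; real 5'-end reads are far longer.
    toy_reads = {
        "clone_a": "ATGGCCGCCGCCGCC",
        "clone_b": "TAATAGTGATAA",
    }
    print(select_clones(toy_reads, min_codons=5))
```

A filter of this kind, like the GeneMark-based selection, says only whether a read looks coding; it gives no information about the length of the full protein, which is consistent with the observation that the sizes of the encoded proteins could not be predicted from the 5'-end data.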