Complete nucleotide sequence of SV40 DNA.
Determination of the complete 5,224 base-pair DNA sequence of the virus SV40 has enabled us to locate the known genes precisely on the genome. At least 15.2% of the genome is presumably not translated into polypeptides. Particular points of interest revealed by the complete sequence are that the early t and T antigens are initiated at the same position and that the T antigen is coded by two non-contiguous regions of the genome; the T antigen mRNA is spliced in the coding region. In the late region the gene for the major protein VP1 overlaps those for proteins VP2 and VP3 over 122 nucleotides but is read in a different frame. The mRNAs for these three late proteins are presumably spliced out of a common primary RNA transcript. Nearly complete amino acid sequences of the two early proteins, as well as those of the late proteins, have been deduced from the nucleotide sequence. The use of degenerate codons is decidedly non-random, but is similar for the early and late regions. Codons of the type NUC, NCG and CGN are absent or very rare.
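The codon-usage observation can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes N stands for any nucleotide, the function names (split_codons, matches, pattern_counts) are hypothetical, and the input is a made-up toy sequence rather than SV40 data.

```python
def split_codons(seq):
    """Split a coding sequence into in-frame triplets, written as RNA (T -> U)."""
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # drop any trailing partial codon
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def matches(codon, pattern):
    """True if codon fits pattern, where N (assumed) means any nucleotide."""
    return all(p == "N" or p == c for p, c in zip(pattern, codon))


def pattern_counts(seq, patterns=("NUC", "NCG", "CGN")):
    """Count how many codons of seq fall into each codon class."""
    codons = split_codons(seq)
    return {p: sum(1 for c in codons if matches(c, p)) for p in patterns}


# Toy reading frame (hypothetical, not SV40 sequence):
# AUG GCU UCG CGA GUC UAA
toy = "ATGGCTTCGCGAGTCTAA"
print(pattern_counts(toy))  # {'NUC': 1, 'NCG': 1, 'CGN': 1}
```

Applied separately to the early and late coding regions, counts of this kind would allow the two regions to be compared, provided each region is supplied in its correct reading frame.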