UniProt release 8.0
Published May 30, 2006
UniProtKB/Swiss-Prot major release (50.0)
Release 50.0 of 30-May-2006 of UniProtKB/Swiss-Prot contains 222'289 sequence entries, comprising 81'585'146 amino acids abstracted from 142'438 references.
15'220 sequences have been added since release 49.0, the sequence data of 953 existing entries has been updated and the annotations of 190'604 entries have been revised. This represents an increase of 8%.
Many improvements were carried out in the last 3 months:
- In order to improve the consistency of annotation of pre- and co-translational events, we have modified the syntax of the comment line topic ALTERNATIVE PRODUCTS, and the feature key VARSPLIC was replaced by VAR_SEQ.
- We have also replaced the CC line topic DATABASE by WEB RESOURCE to clarify the conceptual difference between the content of these lines and the DR (Database cross-Reference) lines.
- For viral UniProtKB entries, a new line type, the OH line, was introduced to indicate the host(s) either as a specific organism or taxonomic group of organisms.
- Cross-references to several databases have been added.
Replacement of the feature key VARSPLIC by VAR_SEQ
Pre-translational events have so far been represented by several feature keys, e.g. alternative splicing and promoter usage were annotated with the VARSPLIC feature key, alternative initiation with the INIT_MET feature key and RNA editing with the VARIANT feature key. In order to improve the consistency of annotation of pre- and co-translational events, we have removed the feature key VARSPLIC and introduced the new feature key VAR_SEQ for the description of alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. The INIT_MET feature key remains, but its usage is now restricted to the annotation of initiator methionine cleavage. We will continue to use the VARIANT feature key and the comment line topic RNA EDITING to describe RNA editing.
Syntax modification of the comment line (CC) topic ALTERNATIVE PRODUCTS
In order to improve the consistency of annotation of pre- and co-translational events, we have modified the syntax of the comment line topic ALTERNATIVE PRODUCTS. This modification allows programs to reconstruct alternative sequences according to the corresponding feature identifiers not only for alternative splicing events, as with the old syntax, but also for alternative promoter usage and alternative initiation events.
The new format of ALTERNATIVE PRODUCTS is:
CC -!- ALTERNATIVE PRODUCTS: CC Event=Event(, Event)*; Named isoforms=Number_of_isoforms; (CC Comment=Free_text;)? (CC Name=Isoform_name;( Synonyms=Synonym(, Synonym)*;)? CC IsoId=Isoform_identifier(, Isoform_identifer)*; CC Sequence=(Displayed|External|Not described|Feature_identifier(, Feature_identifier)*); (CC Note=Free_text;)?)+
Note: Variable values are represented in italics. Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?), may occur 0 or more times (*), or 1 or more times (+). Alternative values are separated by a pipe symbol (|).
The "Event" item lists one or a combination of the following values:
- Alternative promoter usage
- Alternative splicing
- Alternative initiation
- Ribosomal frameshifting
The "Note" item may specify the event(s), if there are several.
CC -!- ALTERNATIVE PRODUCTS: CC Event=Alternative splicing, Alternative initiation; Named isoforms=3; CC Comment=Isoform 1 and isoform 2 arise due to the use of two CC alternative first exons joined to a common exon 2 at the same CC acceptor site but in different reading frames, resulting in two CC completely different isoforms; CC Name=1; Synonyms=p16INK4a; CC IsoId=O77617-1; Sequence=Displayed; CC Name=3; CC IsoId=O77617-2; Sequence=VSP_004099; CC Note=Produced by alternative initiation at Met-35 of isoform 1; CC Name=2; Synonyms=p19ARF; CC IsoId=O77618-1; Sequence=External; .. FT VAR_SEQ 1 34 Missing (in isoform 3). FT /FTId=VSP_004099.
Replacement of the comment line (CC) topic DATABASE by WEB RESOURCE
We have replaced the CC line topic DATABASE by WEB RESOURCE to clarify the conceptual difference between the content of these lines and the DR (Database cross-Reference) lines. At the same time we have simplified the format by suppressing the 'FTP=' field, which is no longer in use.
The format of the DATABASE topic was:
CC -!- DATABASE: NAME=ResourceName[; NOTE=FreeText][; WWW=WWWAddress][; FTP=FTPAddress].
The format of the WEB RESOURCE topic is:
CC -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText]; URL=WWWAddress.
The length of these lines may exceed 75 characters because long URL addresses are not wrapped into multiple lines.
Introduction of the new line type OH (Organism Host) for viral hosts
A virus is a living organism only if we consider it associated with its host. The viral taxonomy is arbitrarily based on the nature of viral genomes, and viruses of the same family can infect a wide range of hosts. There are numerous virus-host interactions, which we intend to annotate. We have therefore introduced to viral UniProtKB entries a new line type, the OH line, to indicate the host(s) either as a specific organism or taxonomic group of organisms.
The format of the OH line is:
OH NCBI_TaxID=TaxID; HostName.
The HostName consists of the official name and, optionally, a common name and/or synonym. The length of an OH line may exceed 75 characters.
OS Tomato black ring virus (strain E) (TBRV). OC Viruses; ssRNA positive-strand viruses, no DNA stage; Comoviridae; OC Nepovirus; Subgroup B. OX NCBI_TaxID=12277; OH NCBI_TaxID=4681; Allium porrum (Leek). OH NCBI_TaxID=4045; Apium graveolens (Celery). OH NCBI_TaxID=161934; Beta vulgaris (Sugar beet). OH NCBI_TaxID=38871; Fraxinus (ash trees). OH NCBI_TaxID=4236; Lactuca sativa (Garden lettuce). OH NCBI_TaxID=4081; Lycopersicon esculentum (Tomato). OH NCBI_TaxID=39639; Narcissus pseudonarcissus (Daffodil). OH NCBI_TaxID=3885; Phaseolus vulgaris (Kidney bean) (French bean). OH NCBI_TaxID=35938; Robinia pseudoacacia (Black locust). OH NCBI_TaxID=23216; Rubus (bramble). OH NCBI_TaxID=4113; Solanum tuberosum (Potato). OH NCBI_TaxID=13305; Tulipa. OH NCBI_TaxID=3603; Vitis.