UniProt release 12.0
Published July 24, 2007
UniProtKB/Swiss-Prot major release (54.0)
Release 54.0 of 24-Jul-07 of UniProtKB/Swiss-Prot contains 276'256 sequence entries, comprising 101'466'206 amino acids abstracted from 158'294 references. 7'104 sequences have been added since release 53.0: this represents a 3% increase. In addition, the sequence data of 690 existing entries have been updated and the annotations of 269'152 entries have been revised.
The following improvements were carried out in the last 2 months:
- We have introduced a new topic to indicate the evidence for the existence of a given protein. 5 levels of evidence have been defined: 1. evidence at protein level (e.g. clear identification by mass spectrometry), 2. evidence at transcript level (e.g. Northern blot), 3. inferred from homology (strong sequence similarity to known proteins in related species), 4. predicted and 5. uncertain (e.g. dubious sequences that could be the erroneous translation of a pseudogene).
- We have added cross-references to 3 new databases: DisProt, PeptideAtlas, and PharmGKB.
- We have enriched the controlled vocabulary for post-translational modification description with 14 new terms: 8 for the feature key 'Cross-link' (mostly for the description of cyclopeptides), 1 for the feature key 'Lipidation' and 5 for the feature key 'Modified residue'.
- We have added 13 new keywords, modified 6 and deleted 1.
UniProt Knowledgebase release 12.0 includes Swiss-Prot release 54.0 and TrEMBL release 37.0.
Introduction of a new topic to document protein existence
Most protein sequences are derived from translations of gene predictions. Some of them exhibit strong sequence similarity to known proteins in closely related species. For other proteins there is experimental evidence, such as Edman sequencing, clear identification by mass spectrometry (MSI), X-ray or NMR structure, detection by antibodies, etc. To indicate these different levels of evidence for the existence of a protein, we have introduced a new topic in each entry. The 'protein existence' topic does not describe the accuracy or correctness of a sequence displayed in UniProtKB, but the evidence for the existence of a protein. It may happen that the protein sequence is not entirely accurate, especially for sequences derived from gene predictions from genomic sequences.
In the flat file, this information is provided in the new PE line. The format is:
PE Level: Evidence;
With the following values:
- 1: Evidence at protein level
- 2: Evidence at transcript level
- 3: Inferred from homology
- 4: Predicted
- 5: Uncertain
Example:
PE 1: Evidence at protein level;
The document 'Criteria for protein existence' lists the criteria used to assign a PE level to entries.
The PE line is found between the DR and KW lines of UniProtKB entries.
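As an illustration of the format above, here is a minimal Python sketch that reads a PE line and checks it against the five values listed in this note. The function names (parse_pe_line, protein_existence) and the regular expression are illustrative assumptions, not part of any UniProt tool; the sketch also assumes that the line code PE is separated from the level by one or more spaces.

```python
import re

# The five values listed in this release note.
PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}

# Assumption: the line code "PE" is followed by one or more spaces,
# then "Level: Evidence;" as described above.
PE_LINE = re.compile(r"^PE\s+(\d):\s*(.+?);\s*$")


def parse_pe_line(line):
    """Return (level, evidence) for a PE line, or None for any other line."""
    match = PE_LINE.match(line)
    if match is None:
        return None
    level = int(match.group(1))
    evidence = match.group(2)
    # Reject level/evidence pairs that are not in the documented list.
    if PE_LEVELS.get(level) != evidence:
        raise ValueError(f"unexpected PE value: {line!r}")
    return level, evidence


def protein_existence(entry_lines):
    """Return (level, evidence) from the first PE line of an entry, or None."""
    for line in entry_lines:
        parsed = parse_pe_line(line)
        if parsed is not None:
            return parsed
    return None
```

Entries that predate the PE line simply yield None from protein_existence, so a caller can tell "no PE line" apart from a malformed one, which raises ValueError.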
Modification of the citation of direct submissions to UniProtKB
In the flat file, the format of the RL (Reference Location) line for submissions used to be:
RL Submitted (MMM-YYYY) to DatabaseName.
where:
- MMM is the month
- YYYY is the year
- DatabaseName is one of:
  - the EMBL/GenBank/DDBJ databases
  - the PDB data bank
  - the PIR data bank
We have replaced the DatabaseName value Swiss-Prot by UniProtKB.
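For readers who process older flat files, the following Python sketch shows one way to parse a submission RL line and to rewrite the old Swiss-Prot value to UniProtKB. The function names and the regular expression are illustrative assumptions; in particular, the sketch assumes a three-letter month abbreviation and one or more spaces after the RL line code. The dates used with it are made-up examples.

```python
import re

# Assumptions: "RL" is followed by one or more spaces, MMM is a
# three-letter month abbreviation and YYYY is a four-digit year.
RL_SUBMISSION = re.compile(
    r"^RL\s+Submitted \(([A-Za-z]{3})-(\d{4})\) to (.+)\.\s*$"
)


def parse_submission(line):
    """Return (month, year, database_name), or None if not a submission RL line."""
    match = RL_SUBMISSION.match(line)
    if match is None:
        return None
    month, year, database = match.groups()
    return month, int(year), database


def modernize_submission(line):
    """Rewrite an old-style 'Swiss-Prot' submission line to 'UniProtKB'."""
    parsed = parse_submission(line)
    if parsed is None or parsed[2] != "Swiss-Prot":
        return line  # leave every other line untouched
    return line.replace(" to Swiss-Prot.", " to UniProtKB.")
```

Lines that cite other databases, and lines that are not submission references at all, are returned unchanged, so the function can be applied to every line of an entry.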
Change in UniProtKB release frequency
UniProtKB release frequency used to be every other week. From now on, UniProtKB will be updated every 3 weeks.
Changes concerning keywords (KW line)
- Bacterial capsule -> Capsule
- Coated pits -> Coated pit
- Hypothetical protein (deleted)