Skip Header

You are using a version of browser that may not display all the features of this website. Please consider upgrading your browser.

UniProt release 12.0

Published July 24, 2007


UniProtKB/Swiss-Prot major release (54.0)

Release 54.0 of 24-Jul-07 of UniProtKB/Swiss-Prot contains 276'256 sequence entries, comprising 101'466'206 amino acids abstracted from 158'294 references. 7'104 sequences have been added since release 53.0: this represents a 3% increase. In addition, the sequence data of 690 existing entries have been updated and the annotations of 269'152 entries have been revised.

The following improvements were carried out in the last 2 months:

  • We have introduced a new topic to indicate the evidences for the existence of a given protein. 5 levels of evidence have been defined: 1. evidence at protein level (e.g. clear identification by mass spectrometry), 2. evidence at transcript level (e.g. Northern blot), 3. inferred by homology (strong sequence similarity to known proteins in related species), 4. predicted and 5. uncertain (e.g. dubious sequences that could be the erroneous translation of a pseudogene).
  • We have added cross-references to 3 new databases: DisProt, PeptideAtlas, and PharmGKB.
  • We have enriched the controlled vocabulary for post-translational modification description with 14 new terms: 8 for the feature key 'Cross-link' (mostly for the description of cyclopeptides), 1 for the feature key 'Lipidation' and 5 for the feature key 'Modified residue'.
  • We have added 13 new keywords, 6 have been modified and 1 deleted.

UniProt Knowledgebase release 12.0 includes Swiss-Prot release 54.0 and TrEMBL release 37.0.

UniProtKB News

Introduction of a new topic to document protein existence

Most protein sequences are derived from translations of gene predictions. Some of them exhibit strong sequence similarity to known proteins in closely related species. For other proteins there is experimental evidence, such as Edman sequencing, clear identification by mass spectrometry (MSI), X-ray or NMR structure, detection by antibodies, etc. To indicate these different levels of evidence for the existence of a protein, we have introduced a new topic in each entry. The 'protein existence' topic does not describe the accuracy or correctness of a sequence displayed in UniProtKB, but the evidence for the existence of a protein. It may happen that the protein sequence is not entirely accurate, especially for sequences derived from gene predictions from genomic sequences.

In the flat file, this information is provided in the new PE line. The format is:

     PE   Level: Evidence;
With the following values:
  • 1: Evidence at protein level
  • 2: Evidence at transcript level
  • 3: Inferred from homology
  • 4: Predicted
  • 5: Uncertain


     PE   1: Evidence at protein level;

The document 'Criteria for protein existence' lists the criteria used to assign a PE level to entries.

The PE line is found between the DR and KW lines of UniProtKB entries.

Modification of the citation of direct submissions to UniProtKB

In the flat file, the format of the RL (Reference Location) line for submissions used to be:

     RL   Submitted (MMM-YYYY) to DatabaseName.


  • MMM is the month
  • YYYY is the year
  • DatabaseName is one of:
    • the EMBL/GenBank/DDBJ databases
    • Swiss-Prot
    • the PDB data bank
    • the PIR data bank

We have replaced the DatabaseName value Swiss-Prot by UniProtKB.

Change in UniProtKB release frequency

UniProtKB release frequency used to be every other week. From now on, UniProtKB will be updated every 3 weeks.

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keyword:

  • Hypothetical protein