

Why have some UniProtKB accession numbers been deleted? How can I track them?

Last modified September 21, 2015

An accession number (AC) is assigned to each protein sequence when it is included in UniProtKB. Accession numbers are stable from release to release (see: What is the difference between an accession number (AC) and the entry name?). However, a protein sequence, and with it its accession number, can occasionally be deleted from UniProtKB.

Most UniProtKB/TrEMBL deletions result from the deletion of the corresponding coding sequence (CDS) from the source nucleotide sequence databases (EMBL-Bank/DDBJ/GenBank) at the request of the original submitters. The same data is occasionally resubmitted later, and UniProt works closely with EMBL-Bank/DDBJ/GenBank to track such deletions and updates appropriately; however, this is not always possible. In addition, curators sometimes recognize protein sequences as Open Reading Frames (ORFs) that were wrongly predicted to code for proteins, or as pseudogenes. When there is enough evidence that such hypothetical proteins are not real, we remove them from UniProtKB/TrEMBL.

Another frequent reason for deletion from UniProtKB/TrEMBL is proteome redundancy reduction.

Deleted entries in UniProtKB/Swiss-Prot are mostly Open Reading Frames (ORFs) or pseudogenes that have been wrongly predicted to code for proteins.

Deleted protein sequences remain available in UniParc. For example, the deleted entry O00597 can still be found in UniParc, where it carries the tag ‘Active=No’.

The history of a deleted entry (for example, O00597) can be tracked, and its previous entry and sequence versions can be displayed.
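If you keep your own list of accessions and want to know which of them have been deleted, you can compare it against a downloaded list of deleted accession numbers (such lists are mentioned below). The sketch below is a minimal illustration, assuming the list has been saved locally as plain text; the exact layout of the official files is not described here, so the code simply collects every token that matches the UniProtKB accession format and ignores everything else. The function names `load_deleted_accessions` and `find_deleted` are hypothetical helpers, not part of any UniProt tool.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# UniProtKB accession format (6- or 10-character accessions).
# Non-capturing groups so that findall() returns whole accessions.
ACCESSION_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def load_deleted_accessions(text: str) -> set[str]:
    """Collect every accession-shaped token from the text of a deleted-accessions list.

    Assumption: the list is plain text. Header or comment lines without an
    accession-shaped token are skipped implicitly; a header token that happens
    to match the format would be picked up as a false positive.
    """
    return set(ACCESSION_RE.findall(text))


def find_deleted(query: Iterable[str], deleted: set[str]) -> list[str]:
    """Return the (normalized) query accessions found in the deleted set, in input order."""
    normalized = (acc.strip().upper() for acc in query)
    return [acc for acc in normalized if acc in deleted]


# Example usage with an illustrative, made-up list text:
list_text = "Deleted accessions\nO00597\nA0A022YWF9\n"
deleted = load_deleted_accessions(list_text)
print(find_deleted(["o00597", "P69905"], deleted))
```

Normalizing the query to upper case and stripping whitespace makes the check tolerant of hand-edited input. A deleted accession found this way can then be looked up in UniParc or in its entry history.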

Two documents list the deleted accession numbers:
