Why have some UniProtKB accession numbers been deleted? How can I track them?
Last modified March 17, 2010
An accession number (AC) is assigned to each protein sequence upon inclusion into UniProtKB. Accession numbers are stable from release to release (see "What is the difference between an accession number (AC) and the entry name?"). However, a protein sequence, and with it its accession number, can occasionally be deleted from UniProtKB.
Most UniProtKB/TrEMBL deletions are due to the deletion of the corresponding coding sequence (CDS) in the source nucleotide sequence databases EMBL-Bank/DDBJ/GenBank, as requested by the original submitters. The same data is occasionally resubmitted at a later date, and UniProt works closely with EMBL-Bank/DDBJ/GenBank to ensure appropriate tracking of deletions and updates. However, such tracking is not always possible. In addition, curators recognize some protein sequences as pseudogenes or as open reading frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/TrEMBL.
Deleted entries in UniProtKB/Swiss-Prot are mostly open reading frames (ORFs) or pseudogenes that were wrongly predicted to code for proteins.
The history of a deleted entry can still be tracked (example: O00597), and its previous entry and sequence versions can be displayed.
Two documents list the deleted accession numbers:
- delac_sp.txt for deleted ACs in UniProtKB/Swiss-Prot
- delac_tr.txt for deleted ACs in UniProtKB/TrEMBL
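As a rough illustration of how these lists might be used, the Python sketch below checks whether an accession number appears in the text of one of the two files. It assumes you have already downloaded the file locally and that, apart from header and footer lines, it contains one accession number per line. That layout is an assumption, so inspect the file before relying on it. The function names parse_deleted_accessions and is_deleted are hypothetical helpers, not part of any UniProt tool. The regular expression is the accession number pattern given in the UniProt help pages.

```python
import re

# UniProtKB accession number pattern (6 or 10 characters), anchored so that
# only lines consisting of a single accession are accepted.
AC_PATTERN = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def parse_deleted_accessions(text):
    """Collect accession numbers from the text of a delac file.

    Assumption: accessions are listed one per line; any line that is not
    exactly one accession (headers, separators, blanks) is skipped.
    """
    deleted = set()
    for line in text.splitlines():
        candidate = line.strip()
        if AC_PATTERN.match(candidate):
            deleted.add(candidate)
    return deleted


def is_deleted(accession, deleted):
    """Return True if the accession (case-insensitive) is in the deleted set."""
    return accession.strip().upper() in deleted
```

With a local copy of the file, usage could look like `deleted = parse_deleted_accessions(open("delac_sp.txt").read())` followed by `is_deleted("O00597", deleted)`. Because delac_tr.txt can be very large, a line-by-line scan of the file may be preferable to loading everything into a set.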