Why have some UniProtKB accession numbers been deleted? How can I track them?
Last modified October 23, 2007
An accession number (AC) is assigned to each protein sequence when it is included in UniProtKB. Accession numbers are stable from release to release (see "What is the difference between an accession number (AC) and the entry name?"). However, a protein sequence, together with its accession number, can be deleted from UniProtKB.
Most UniProtKB/TrEMBL deletions are due to the deletion of the corresponding coding sequence (CDS) in the source nucleotide sequence databases EMBL/DDBJ/GenBank. In addition, curators recognize some protein sequences as pseudogenes or as open reading frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/TrEMBL.
Deleted entries in UniProtKB/Swiss-Prot are almost exclusively pseudogenes or open reading frames (ORFs) that were wrongly predicted to code for proteins.
Deleted protein sequences can be found in UniParc. Example: O00597 can be found in UniParc (with the tag 'Active=No').
The history of a deleted entry can still be tracked (example: O00597), and its previous entry and sequence versions can be displayed.
Two documents list the deleted accession numbers:
- delac_sp.txt for deleted ACs in UniProtKB/Swiss-Prot
- delac_tr.txt for deleted ACs in UniProtKB/TrEMBL
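
To check whether a given accession number appears in one of these documents, a minimal Python sketch could look like the following. It assumes that you have a local copy of the file and that each deleted accession number sits alone on its own line; the layout of the files is not described here, so treat that as an assumption. The function names are hypothetical. Header and footer lines are skipped because they do not match the accession number pattern.

```python
import re

# Pattern for UniProtKB accession numbers (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def read_deleted_accessions(lines):
    """Collect accession numbers from the lines of a delac_*.txt file.

    Assumption: one accession number per line. Any line that is not
    exactly one accession number (headers, footers, blanks) is ignored.
    """
    deleted = set()
    for line in lines:
        token = line.strip()
        if ACCESSION_RE.fullmatch(token):
            deleted.add(token)
    return deleted


def is_deleted(accession, deleted):
    """Return True if the accession number is in the deleted set."""
    return accession.strip().upper() in deleted


# Usage with a local copy of the file (the path is an assumption):
#
#     with open("delac_sp.txt", encoding="utf-8") as handle:
#         deleted = read_deleted_accessions(handle)
#     print(is_deleted("O00597", deleted))
```

Because delac_tr.txt can be large, the sketch reads the file line by line and keeps only the matching accession numbers in a set, so each later lookup is a single set membership test.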