Does UniProtKB contain all protein sequences?
Last modified January 15, 2010
Together, the two sections of UniProtKB, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL, give access to all publicly available protein sequences, with the exception of the following classes, which UniProtKB excludes:
- Most non-germline immunoglobulins and T-cell receptors
- Synthetic sequences
- Most patent application sequences
- Small fragments (fewer than 8 amino acids) translated from nucleotide sequences
- Pseudogenes
- Fusion/truncated proteins
- Not real proteins
The first five classes are identified automatically by the UniProtKB/TrEMBL creation program and do not enter UniProtKB. However, some proteins belonging to these classes are also identified by curators during the UniProtKB/Swiss-Prot annotation process and are then removed from UniProtKB.
Fusion/truncated proteins and sequences classified as not real proteins are identified only manually, by curators, and are removed from UniProtKB/TrEMBL or UniProtKB/Swiss-Prot. All of these excluded sequences remain available in UniParc, where the corresponding entries are flagged with the reason the sequence is absent from UniProtKB.
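As a rough illustration of the automatic exclusion step described above, the following Python sketch applies the five automatically detected classes as a simple filter. It is not the UniProtKB/TrEMBL creation program: the `Record` type, its fields, the label names, and the `exclusion_reason` function are all hypothetical, and the sketch ignores the fact that only "most" immunoglobulin, T-cell receptor, and patent sequences are excluded. The only value taken from the text is the 8 amino acid threshold for fragments translated from nucleotide sequences.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# From the text: translated fragments shorter than 8 amino acids are excluded.
MIN_FRAGMENT_LENGTH = 8

# Hypothetical label names for the other automatically detected classes.
AUTO_EXCLUDED_LABELS = {
    "non_germline_ig_tcr",
    "synthetic",
    "patent",
    "pseudogene",
}


@dataclass
class Record:
    """Hypothetical, simplified view of an incoming sequence record."""
    identifier: str
    sequence: str
    from_nucleotide_translation: bool = True
    labels: Set[str] = field(default_factory=set)


def exclusion_reason(record: Record) -> Optional[str]:
    """Return a reason string if the record would be excluded, else None."""
    # Length rule: applies only to fragments translated from nucleotide data.
    if (record.from_nucleotide_translation
            and len(record.sequence) < MIN_FRAGMENT_LENGTH):
        return "small_fragment"
    # Label rule: report the first matching class in alphabetical order.
    matches = sorted(record.labels & AUTO_EXCLUDED_LABELS)
    return matches[0] if matches else None


records = [
    Record("A", "MKT"),
    Record("B", "MKTAYIAKQR", labels={"synthetic"}),
    Record("C", "MKTAYIAKQR"),
]
kept = [r.identifier for r in records if exclusion_reason(r) is None]
```

In this sketch, the reason string plays the role of the flag that UniParc attaches to an excluded sequence. The manually identified classes (fusion/truncated proteins and sequences that are not real proteins) are deliberately left out, since the text says they are found only by curators.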