Does UniProtKB contain all protein sequences?

Last modified July 24, 2007

The two sections of UniProtKB, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL, together give access to all publicly available protein sequences, with the following exceptions:

  1. Most non-germline immunoglobulins and T-cell receptors
  2. Synthetic sequences
  3. Most patent application sequences
  4. Small fragments (<8 amino acids) translated from nucleotide sequences
  5. Pseudogenes
  6. Fusion/truncated proteins
  7. Not real proteins

The first five classes are identified automatically by the UniProtKB/TrEMBL creation program, and sequences flagged this way never enter UniProtKB. However, some proteins belonging to these classes are also identified by curators during the UniProtKB/Swiss-Prot annotation process and are then removed from UniProtKB.

Fusion/truncated proteins and sequences classified as not real proteins are identified only manually, by curators, and are then removed from UniProtKB/TrEMBL or UniProtKB/Swiss-Prot. All of these excluded sequences remain available in UniParc, where the corresponding entries are flagged with the reason the sequence is absent from UniProtKB.
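The exclusion classes and the way each is detected can be summarised in a short Python sketch. This is only an illustration of the rules listed above, not the UniProtKB/TrEMBL creation program itself: the names `ExclusionReason`, `AUTOMATICALLY_DETECTED`, `is_small_fragment` and `can_be_detected_automatically` are hypothetical and were chosen for this example.

```python
from enum import Enum

# Threshold taken from the list above: fragments of fewer than 8 amino acids
# translated from nucleotide sequences are excluded.
MIN_FRAGMENT_LENGTH = 8


class ExclusionReason(Enum):
    """Hypothetical labels for the seven exclusion classes listed above."""
    IMMUNOGLOBULIN_OR_TCR = 1   # most non-germline immunoglobulins and T-cell receptors
    SYNTHETIC = 2               # synthetic sequences
    PATENT = 3                  # most patent application sequences
    SMALL_FRAGMENT = 4          # small fragments (<8 amino acids)
    PSEUDOGENE = 5              # pseudogenes
    FUSION_OR_TRUNCATED = 6     # fusion/truncated proteins
    NOT_REAL_PROTEIN = 7        # not real proteins


# The first five classes can be caught automatically (and, in some cases,
# later by curators); the last two are identified only manually.
AUTOMATICALLY_DETECTED = frozenset({
    ExclusionReason.IMMUNOGLOBULIN_OR_TCR,
    ExclusionReason.SYNTHETIC,
    ExclusionReason.PATENT,
    ExclusionReason.SMALL_FRAGMENT,
    ExclusionReason.PSEUDOGENE,
})


def is_small_fragment(sequence: str) -> bool:
    """Return True if the sequence is shorter than 8 amino acids."""
    return len(sequence) < MIN_FRAGMENT_LENGTH


def can_be_detected_automatically(reason: ExclusionReason) -> bool:
    """Return True for classes the automatic step can identify."""
    return reason in AUTOMATICALLY_DETECTED
```

Under these assumptions, `is_small_fragment("MKTAYIA")` is true for a 7-residue fragment, while `can_be_detected_automatically(ExclusionReason.FUSION_OR_TRUNCATED)` is false because that class is left to the curators.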
