The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
The UniProt Knowledgebase consists of two sections: a section containing manually annotated records with information extracted from literature and curator-evaluated computational analysis, and a section with computationally analyzed records that await full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as "UniProtKB/Swiss-Prot" (reviewed, manually annotated) and "UniProtKB/TrEMBL" (unreviewed, automatically annotated), respectively.
Where do the protein sequences come from?
About 98% of the protein sequences provided by UniProtKB are derived from the translation of coding sequences (CDS) that have been submitted to the public nucleic acid databases EMBL-Bank, GenBank and DDBJ (together, the INSDC). All these sequences, as well as the related data submitted by the authors, are automatically integrated into UniProtKB/TrEMBL.
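As an illustration of what "translation of a coding sequence" means, here is a minimal sketch in Python. It assumes the standard genetic code and a CDS that is already in frame; the function name translate_cds is hypothetical and this is not the pipeline UniProt itself uses. Real nucleotide records can specify other genetic codes and translation exceptions, which this sketch ignores.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS into a one-letter protein sequence.

    Assumptions (not taken from the UniProt documentation): standard
    genetic code, translation stops at the first stop codon, and any
    codon with ambiguous bases is reported as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")

    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)
```

Calling translate_cds on a short made-up sequence such as "ATGGCCTGA" reads the codons ATG, GCC and TGA in turn and stops at the final stop codon.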
What are the differences between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?
UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization. UniProtKB/Swiss-Prot (reviewed) is a high-quality, manually annotated and non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions.
How redundant are sequences in UniProtKB?
To minimize redundancy and improve sequence reliability, all protein sequences encoded by the same gene are merged into a single UniProtKB/Swiss-Prot entry. Differences found between the various sequencing reports (for example, alternative splicing events, polymorphisms or conflicts) are analysed and fully described in the feature table. Once an entry is in UniProtKB/Swiss-Prot, it is removed from UniProtKB/TrEMBL.
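The merging idea can be sketched as a simple grouping step. The following Python snippet is only an illustration under stated assumptions: the record fields (gene, organism, sequence, source) and the function merge_by_gene are hypothetical, and taking the first distinct sequence as the displayed one is a placeholder for the curated decision described above, not the rule curators apply.

```python
from collections import defaultdict


def merge_by_gene(reports):
    """Group sequence reports so that each gene yields one merged entry.

    Each report is a dict with the hypothetical keys 'gene', 'organism',
    'sequence' and 'source'. Sequences that differ from the displayed one
    are kept so that the differences can be documented later.
    """
    grouped = defaultdict(list)
    for report in reports:
        grouped[(report["organism"], report["gene"])].append(report)

    entries = []
    for (organism, gene), group in grouped.items():
        # Preserve first-seen order while dropping identical sequences.
        distinct = list(dict.fromkeys(r["sequence"] for r in group))
        entries.append({
            "organism": organism,
            "gene": gene,
            # Placeholder choice: the first distinct sequence is displayed.
            "displayed_sequence": distinct[0],
            "other_sequences": distinct[1:],
            "sources": [r["source"] for r in group],
        })
    return entries
```

In this sketch, three reports for the same gene with two distinct sequences collapse into one entry with one displayed sequence, one alternative sequence and three sources.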
What is manual annotation?
Manual annotation consists of a critical review of experimentally proven or computer-predicted data about each protein, including the protein sequences. Data are continuously updated by an expert team of biologists.
How are entry versions archived?
All changed UniProtKB entries are loaded into the UniSave Sequence/Annotation Version Archive as part of the public four-weekly UniProtKB releases. Unlike UniProtKB, which contains only the latest Swiss-Prot and TrEMBL entry versions, UniSave provides access to previous versions of these entries.
Archived versions of a UniProtKB entry are accessible through the History link located at the top of the entry view.