Chordata protein annotation project
The Chordata protein annotation project focuses on the manual annotation of chordate-specific proteins as well as those that are widely conserved. The project has two aims: (1) to keep the existing human entries up to date, and (2) to broaden manual annotation to other vertebrate species, especially model organisms, including great apes, cow, mouse, rat, chicken, zebrafish, Xenopus laevis and Xenopus tropicalis.
Update of the human proteome
A draft of the complete human proteome has been available in UniProtKB/Swiss-Prot since 2008, and one of the current priorities of the Chordata protein annotation project is to improve the quality of the human sequences provided.
To this end, we are updating sequences that show discrepancies with those predicted from the genome sequence. We also revisit dubious isoforms, sequences based on experimental artefacts and protein products derived from erroneous gene model predictions. This work is done in part in collaboration with the Hinxton Sequence Forum (HSF), which allows active exchange between the UniProt, HAVANA, Ensembl and HGNC groups, as well as with the RefSeq database. UniProt is a member of the Consensus CDS (CCDS) project, and we are reviewing our records to support convergence towards a standard set of protein annotations.
We also continuously update human entries with functional annotation, including novel structural, post-translational modification, interaction and enzymatic activity data. To identify candidates for re-annotation, we use, among other resources, information extraction tools such as the STRING database. In addition, we regularly add new sequence variants and maintain disease information. This work is carried out within the Variation Annotation project, which is part of this annotation project and aims to annotate all known human genetic diseases and disease-linked protein variants, as well as neutral polymorphisms.
Annotation of other mammalian and chordate proteins
In addition to the review of the human proteome, other mammalian and non-mammalian chordate proteins are increasingly being manually annotated. Special emphasis is placed on species such as Xenopus laevis, Xenopus tropicalis and Danio rerio (zebrafish), which are important model organisms for studying embryonic development and cell biology. We work in close collaboration with species-specific resources and model organism databases, such as HGNC, MGI, RGD, ZFIN and Xenbase, to ensure consistency between UniProt and these resources.
- All manually reviewed mouse entries can be found here (statistics)
- All manually reviewed rat entries can be found here (statistics)
- All manually reviewed Xenopus laevis entries can be found here
- All manually reviewed Xenopus tropicalis entries can be found here
- All manually reviewed Danio rerio (Zebrafish) entries can be found here
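The lists above could also be retrieved programmatically. The following is a minimal sketch, assuming the UniProt REST search endpoint at rest.uniprot.org and its `reviewed` and `organism_id` query fields; the names `TAXON_IDS`, `BASE_URL` and `reviewed_entries_url` are illustrative and not part of UniProt. It only builds the request URL and performs no network access.

```python
from urllib.parse import urlencode

# NCBI taxonomy identifiers for the species listed above.
TAXON_IDS = {
    "mouse": 10090,
    "rat": 10116,
    "Xenopus laevis": 8355,
    "Xenopus tropicalis": 8364,
    "Danio rerio": 7955,
}

# Assumption: base URL of the UniProtKB search endpoint.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def reviewed_entries_url(species, fmt="tsv", size=25):
    """Build a search URL for manually reviewed (Swiss-Prot) entries of one species.

    `species` must be a key of TAXON_IDS; `fmt` and `size` are passed through
    as the `format` and `size` query parameters.
    """
    query = f"reviewed:true AND organism_id:{TAXON_IDS[species]}"
    return BASE_URL + "?" + urlencode({"query": query, "format": fmt, "size": size})


if __name__ == "__main__":
    # Print one URL per species; fetch them with any HTTP client.
    for name in TAXON_IDS:
        print(name, reviewed_entries_url(name))
```

Restricting the query to `reviewed:true` is what limits the results to manually annotated entries; dropping that term would also return unreviewed records for the same organism.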