UniProt release 2011_11
Published November 16, 2011
Who wants to be a millionaire? The first million HAMAP-annotated entries in UniProtKB/TrEMBL
As humanity explores more environmental and ecological niches, we are discovering a treasure-trove of organisms about which very little, if anything, is known. Sequencing genomes is becoming cheaper, and so we sequence to understand this diversity; but to begin to appreciate a genome’s possibilities, quality annotation is required. HAMAP is an annotation project started over 10 years ago to annotate the massive influx of completely sequenced bacterial and archaeal genomes, and it is now an integral part of the UniProt Automatic Annotation program.
The HAMAP rules automatically annotate bacterial and archaeal proteins, as well as related plastid-encoded proteins, based on manually annotated, characterized template entries. These template entries are used to generate the HAMAP profiles. UniProtKB/TrEMBL entries that belong to a family, i.e. that match a HAMAP profile, acquire annotation based on the manually annotated templates as well as template-based feature propagation. The propagated annotation also includes protein and gene names, general annotation (comments), keywords and GO terms. The annotation templates (http://hamap.expasy.org/families.html), the seed alignments used to generate the HAMAP profiles, and much more are available on the HAMAP website and will be integrated into the www.uniprot.org automatic annotation portal in the future.
Two years ago we wrote a headline highlighting the incorporation of 300,000 HAMAP-annotated entries into UniProtKB/Swiss-Prot. Since that time we have discontinued incorporation of these semi-automatically annotated entries into UniProtKB/Swiss-Prot; this annotation is now added to UniProtKB/TrEMBL entries instead, while manually annotated ‘template’ entries (see above) are still integrated into UniProtKB/Swiss-Prot. With this release there are over 1 million bacterial, archaeal and plastid-encoded proteins in UniProtKB/TrEMBL that have been annotated by the HAMAP rules. With each UniProt release, and as families and new template entries are created or updated based on new experiments, entries from all genomes are (re)annotated, enriching them beyond what was known when the genomes were originally submitted to the DNA databases. All these entries are thus improved by this high-quality semi-automated annotation, making them more useful to the community.
Cross-references to KO (KEGG Orthology)
Cross-references have been added to KO, which consists of manually defined ortholog groups that correspond to KEGG pathway nodes, BRITE hierarchy nodes, and KEGG module nodes.
KO is available at http://www.genome.jp/kegg/ko.html.
The format of the explicit links in the flat file is:
Resource identifier: KO identifier
Example: DR KO; K06630; -.
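As an illustration of the link format above, the following is a minimal sketch of how KO identifiers could be pulled out of the DR lines of a flat-file entry. The function name `extract_ko_ids` and the regular expression are assumptions made for this example and are not part of any UniProt tool; the pattern assumes that KO identifiers have the form `K` followed by five digits, as in the example line, and that the third field is always `-`.

```python
import re

# Assumed pattern for a KO cross-reference line such as "DR KO; K06630; -."
# \s+ tolerates one or more spaces after the "DR" line code.
KO_LINE = re.compile(r"^DR\s+KO;\s*(K\d{5});\s*-\.\s*$")


def extract_ko_ids(flat_file_text):
    """Return the KO identifiers found in the DR lines of a flat-file entry."""
    ko_ids = []
    for line in flat_file_text.splitlines():
        match = KO_LINE.match(line)
        if match:
            # group(1) is the KO identifier, e.g. "K06630"
            ko_ids.append(match.group(1))
    return ko_ids


if __name__ == "__main__":
    sample_entry = "DR KO; K06630; -."
    print(extract_ko_ids(sample_entry))
```

Lines that do not match the assumed pattern, including cross-references to other resources, are simply skipped.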
Changes to keywords
New keyword:
Changes in the controlled vocabulary for PTMs
New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):
- 3-hydroxypyridine-2,5-dicarboxylic acid (Ser-Cys) (with S-...)
- 3-hydroxypyridine-2,5-dicarboxylic acid (Ser-Ser) (with C-...)
- Thiazole-4-carboxylic acid (Glu-Cys)