UniProt release 2011_11

Published November 16, 2011

Headline

Who wants to be a millionaire? The first million HAMAP-annotated entries in UniProtKB/TrEMBL

As humanity explores more environmental and ecological niches, we are discovering a treasure trove of organisms about which very little, if anything, is known. Sequencing genomes is becoming cheaper, and so we sequence in order to understand this diversity; but to begin to appreciate a genome’s possibilities, quality annotation is required. HAMAP is an annotation project started over 10 years ago to provide annotation for the massive influx of completely sequenced bacterial and archaeal genomes, and it is now an integral part of the UniProt Automatic Annotation program.

The HAMAP rules automatically annotate bacterial and archaeal proteins, as well as related plastid-encoded proteins, based on manually annotated, characterized template entries. These template entries are used to generate the HAMAP profiles. UniProtKB/TrEMBL entries that belong to a family, i.e. that match a HAMAP profile, acquire annotation from the manually annotated templates, including template-based feature propagation. The propagated annotation also includes protein and gene names, general annotation (comments), keywords and GO terms. The annotation templates (http://hamap.expasy.org/families.html), the seed alignments used to generate the HAMAP profiles and much more are available on the HAMAP website and will be integrated into the www.uniprot.org automatic annotation portal in the future.

Two years ago we wrote a headline highlighting the incorporation of 300,000 HAMAP-annotated entries into UniProtKB/Swiss-Prot. Since that time we have discontinued incorporation of these semi-automatically annotated entries into UniProtKB/Swiss-Prot; this annotation is now added to UniProtKB/TrEMBL entries instead, while manually annotated ‘template’ entries (see above) are still integrated into UniProtKB/Swiss-Prot. With this release there are over 1 million bacterial, archaeal and plastid-encoded proteins in UniProtKB/TrEMBL that have been annotated by the HAMAP rules. With each UniProt release, as families and template entries are created or updated based on new experiments, entries from all genomes are (re)annotated, enriching them beyond what was known when the genomes were originally submitted to the DNA databases. All these entries are thus improved by this high-quality semi-automated annotation, rendering them more useful to the community.

UniProtKB news

Cross-references to KO (KEGG Orthology)

Cross-references have been added to KO (KEGG Orthology), which consists of manually defined ortholog groups that correspond to KEGG pathway nodes, BRITE hierarchy nodes, and KEGG module nodes.

KO is available at http://www.genome.jp/kegg/ko.html.

The format of the explicit links in the flat file is:

Resource abbreviation: KO
Resource identifier: KO identifier
Example (P41932):
DR   KO; K06630; -.
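
As an illustration of how such lines might be read programmatically, here is a minimal Python sketch that extracts KO identifiers from the DR lines of a flat-file entry. The function name ko_identifiers and the sample text are hypothetical, and the pattern assumes that a KO identifier is the letter K followed by five digits, as in the example above; it is not an official UniProt parser.

```python
import re

# Matches a KO cross-reference line such as "DR   KO; K06630; -."
# Assumption: a KO identifier is "K" followed by exactly five digits.
KO_LINE = re.compile(r"^DR\s+KO;\s*(K\d{5});\s*-\.\s*$")


def ko_identifiers(flat_file_text):
    """Return the KO identifiers found in the DR lines of a flat-file entry."""
    identifiers = []
    for line in flat_file_text.splitlines():
        match = KO_LINE.match(line)
        if match:
            identifiers.append(match.group(1))
    return identifiers


if __name__ == "__main__":
    # The first line is the example from this release note; the second is a
    # made-up non-KO line included only to show that it is ignored.
    sample = "DR   KO; K06630; -.\nDR   OtherDB; X00000; -.\n"
    print(ko_identifiers(sample))
```

Lines for other resources are skipped because the pattern requires the resource abbreviation KO immediately after the DR line code.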

Show all the entries having a cross-reference to KO.

Changes to keywords

New keyword:

Changes in the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):
  • 4-hydroxyglutamate
New terms for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):
  • 3-hydroxypyridine-2,5-dicarboxylic acid (Ser-Cys) (with S-...)
  • 3-hydroxypyridine-2,5-dicarboxylic acid (Ser-Ser) (with C-...)
  • Thiazole-4-carboxylic acid (Glu-Cys)