Last modified April 14, 2015
The annotation score provides a heuristic measure of the annotation content of a UniProtKB entry or proteome.
It is computed in the following way:
- Different UniProtKB annotation types (e.g. protein names, gene names, functional annotations and sequence annotations, GO annotations, cross-references) are scored either by presence or by number of occurrences. Annotations with experimental evidence score higher than equivalent predicted/inferred annotations, thereby favoring expert literature-based curation over automatic annotation.
- The score of an individual entry is the sum of the scores of its annotations.
- The score of a proteome is the sum of the scores of the entries that are part of the proteome.
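The computation in the list above can be sketched as follows. This is a minimal illustration under stated assumptions. The weights `EXPERIMENTAL_WEIGHT` and `PREDICTED_WEIGHT`, the set `PRESENCE_ONLY` of annotation types scored by presence, and the `Annotation` type are all hypothetical, because the actual per-type scores are not given here.

```python
from dataclasses import dataclass

# Hypothetical weights: the real per-type scores are not published in this text.
EXPERIMENTAL_WEIGHT = 1.0  # annotation backed by experimental evidence
PREDICTED_WEIGHT = 0.5     # equivalent predicted/inferred annotation scores lower

# Hypothetical: annotation types scored by presence rather than by count.
PRESENCE_ONLY = {"protein_name", "gene_name"}


@dataclass
class Annotation:
    kind: str           # annotation type, e.g. "protein_name", "go", "xref"
    experimental: bool  # True if supported by experimental evidence


def annotation_score(ann: Annotation) -> float:
    """Score one annotation, favoring experimental over predicted evidence."""
    return EXPERIMENTAL_WEIGHT if ann.experimental else PREDICTED_WEIGHT


def entry_score(annotations: list[Annotation]) -> float:
    """Entry score: the sum of the scores of its annotations."""
    total = 0.0
    best_presence: dict[str, float] = {}  # presence-scored types count once
    for ann in annotations:
        weight = annotation_score(ann)
        if ann.kind in PRESENCE_ONLY:
            # Scored by presence: keep only the best-supported occurrence.
            best_presence[ann.kind] = max(best_presence.get(ann.kind, 0.0), weight)
        else:
            # Scored by number of occurrences: every occurrence adds to the total.
            total += weight
    return total + sum(best_presence.values())


def proteome_score(entries: list[list[Annotation]]) -> float:
    """Proteome score: the sum of the scores of its member entries."""
    return sum(entry_score(entry) for entry in entries)


entry = [
    Annotation("protein_name", True),
    Annotation("protein_name", False),  # same presence-scored type, counted once
    Annotation("go", True),
    Annotation("go", False),
]
print(entry_score(entry))
print(proteome_score([entry, [Annotation("xref", False)]]))
```

With these made-up weights the example entry scores 2.5 (1.0 for the protein name, 1.0 + 0.5 for the two GO annotations), and the two-entry proteome scores 3.0.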
The range of these absolute scores is open-ended. It is translated into a 5-point system by splitting it into 5 sub-intervals. Scores in the first interval are represented by “1 point out of 5”, those in the second by “2 points out of 5”, and so on. An annotation score of 5 points is therefore associated with the best-annotated entries, while a 1-point score denotes an entry with rather basic annotation.
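The translation onto the 5-point scale can be sketched with a sorted list of cut-offs. The values in `CUTOFFS` are invented for illustration. The text does not state where the sub-interval boundaries lie, only that the top interval is open-ended.

```python
from bisect import bisect_right

# Hypothetical lower bounds of sub-intervals 2 to 5. The first interval
# starts at 0; the fifth has no upper bound (the range is open-ended).
CUTOFFS = [10.0, 20.0, 40.0, 80.0]


def points_out_of_5(raw_score: float) -> int:
    """Map an absolute annotation score to 1..5 points."""
    if raw_score < 0:
        raise ValueError("annotation scores are non-negative")
    # bisect_right counts how many cut-offs are <= raw_score (0..4).
    return bisect_right(CUTOFFS, raw_score) + 1


for score in (0, 9.9, 10.0, 79.9, 1000):
    print(score, points_out_of_5(score))
```

In this sketch a score that falls exactly on a cut-off is placed in the higher interval. That is a choice made here, not something the text specifies.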
There are several contexts in which annotation scores can be used:
- Search results
The annotation scores give you a quick idea of the relative level of annotation of the entries in your search results. Please note that search results are not ranked by the annotation score but by a query score. The query score considers not only the annotation scores of the entries that match your query, but also how often (and where) your query terms appear in a matching entry and across the whole database, and the importance of each term relative to the total number of terms. For this reason, the top-ranked entries are not necessarily those with the highest annotation scores.
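To illustrate why the best-annotated entry need not rank first, here is a toy ranking. The `query_score` formula and its weights are hypothetical. It also ignores the database-wide term frequency and term importance that the real query score considers. The entry identifiers are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str          # placeholder identifier, not a real entry
    annotation_points: int  # annotation score on the 1-5 scale
    term_matches: int       # how often the query term occurs in the entry


def query_score(hit: Hit) -> float:
    # Hypothetical weighting: term matches dominate, annotation adds a small bonus.
    return 2.0 * hit.term_matches + 0.5 * hit.annotation_points


hits = [Hit("ENTRY_A", 5, 1), Hit("ENTRY_B", 2, 4), Hit("ENTRY_C", 3, 2)]
ranked = sorted(hits, key=query_score, reverse=True)
print([h.accession for h in ranked])
```

Here ENTRY_A has the highest annotation score (5 points) but ranks last, because the other entries match the query term more often.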
- UniRef
UniProt uses annotation scores to select the representative member of a UniRef cluster.
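Choosing a cluster representative by annotation score can be sketched as taking the member with the highest score. This assumes the annotation score is the only criterion. The real UniRef selection may weigh additional factors. The alphabetical tie-break and the placeholder accessions are hypothetical.

```python
def pick_representative(cluster: dict[str, float]) -> str:
    """Return the member with the highest annotation score.

    `cluster` maps member accession -> annotation score. Iterating over
    sorted accessions makes max() break ties alphabetically (a hypothetical rule).
    """
    if not cluster:
        raise ValueError("cluster has no members")
    return max(sorted(cluster), key=lambda acc: cluster[acc])


cluster = {"MEMBER_2": 3.0, "MEMBER_1": 5.0, "MEMBER_3": 5.0}
print(pick_representative(cluster))
```

Because `max()` returns the first maximal item it sees, sorting the accessions first makes the tie between MEMBER_1 and MEMBER_3 resolve deterministically to MEMBER_1.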
- Reference proteomes
UniProt uses annotation scores to assist the selection of reference proteomes.
Please note that the annotation score cannot be used as a measure of the accuracy of the annotation, as we cannot define the “correct annotation” for any given protein.