What are reference proteomes?
Last modified March 10, 2015
UniProt provides several sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced, termed “proteomes”.
In the past, these sets were based on the taxonomy of the organisms, combined with the keyword Complete proteome. As more and more genomes of the same organism were sequenced, we introduced unique proteome identifiers to distinguish individual proteomes.
These proteomes can be queried and downloaded from the Proteomes section of the UniProt website. UniProtKB entries that are part of a proteome have a cross-reference to their proteome.
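As a minimal sketch of how the proteome cross-reference can be used, the snippet below groups entries by the proteome identifier they point to. The record layout, the accessions, and the identifiers "UP-A" and "UP-B" are all hypothetical placeholders, not real UniProt data or an official format; the sketch also assumes that an entry may cross-reference more than one proteome.

```python
from collections import defaultdict

# Hypothetical, simplified records: each entry lists the proteome
# identifiers it cross-references (placeholder values, not real data).
entries = [
    {"accession": "ACC0001", "proteomes": ["UP-A"]},
    {"accession": "ACC0002", "proteomes": ["UP-A", "UP-B"]},
    {"accession": "ACC0003", "proteomes": []},  # not part of any proteome
]


def group_by_proteome(records):
    """Map each proteome identifier to the accessions that reference it."""
    grouped = defaultdict(list)
    for record in records:
        for proteome_id in record["proteomes"]:
            grouped[proteome_id].append(record["accession"])
    return dict(grouped)


proteome_members = group_by_proteome(entries)
```

Entries without a cross-reference, such as the third placeholder record, simply do not appear in any group.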
The number of completely sequenced genomes, and thus of proteomes, has increased significantly. It is therefore critically important to organise these data in a way that allows users to navigate the growing number of complete proteome sequences effectively. UniProt's approach to this challenge is to define a set of “reference proteomes”, which serve as “landmarks” in proteome space.
Reference proteomes have been selected from all proteomes (manually and algorithmically, according to a number of criteria) to provide broad coverage of the tree of life. They constitute a representative cross-section of the taxonomic diversity found within UniProtKB, and include the proteomes of well-studied model organisms and other proteomes of interest for biomedical and biotechnological research. Species of particular importance may be represented by several reference proteomes for specific ecotypes or strains of interest.
UniProtKB entries from these reference proteomes are tagged with the keyword Reference proteome.
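The keyword tag makes it straightforward to pick out reference-proteome entries from a set of records that has already been downloaded. The following is only an illustrative sketch: the record layout and the accessions are hypothetical, and only the keyword strings come from the text above.

```python
REFERENCE_KEYWORD = "Reference proteome"

# Hypothetical, simplified records (placeholder accessions, not real data).
records = [
    {"accession": "ACC0001", "keywords": ["Complete proteome", "Reference proteome"]},
    {"accession": "ACC0002", "keywords": ["Complete proteome"]},
    {"accession": "ACC0003", "keywords": []},
]


def is_reference_entry(record):
    """Return True if the entry carries the Reference proteome keyword."""
    return REFERENCE_KEYWORD in record["keywords"]


# Keep only the accessions of entries tagged as reference-proteome members.
reference_accessions = [r["accession"] for r in records if is_reference_entry(r)]
```

The match is an exact string comparison on the keyword, so an entry tagged only with Complete proteome is not selected.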