Last modified March 16, 2015
The taxonomy database maintained by the UniProt group is based on the NCBI taxonomy database, supplemented with data specific to the UniProt Knowledgebase (UniProtKB). While the NCBI taxonomy is updated daily to stay in sync with GenBank/EMBL-Bank/DDBJ, the UniProt taxonomy is updated only at UniProt releases to stay in sync with UniProtKB. Between two UniProt releases, you may therefore find new taxa at the NCBI that are not yet in UniProt, and taxa that have been deleted at the NCBI but are still present in UniProt.
Species with manually annotated and reviewed protein sequences in the Swiss-Prot section of UniProtKB are named according to UniProt nomenclature. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates.
Links to external sites are chosen by the UniProt taxonomy team and show pictures and various scientific data of interest (taxonomy, biology, physiology, etc.). Due to the sheer volume of data present on the world-wide web, it is unfortunately not possible to contact each site individually. Should you wish to have your site linked from uniprot.org, or would prefer us to remove a link to your site, please do not hesitate to contact us.
You can query the UniProt taxonomy by taxon names or NCBI taxonomy identifiers. Searches by names are case-insensitive and you may use asterisks as wildcards anywhere in the query. When you search for taxon names, the results that match a UniProt organism denomination are ranked higher than those which match other organism names.
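The matching and ranking rules described above can be illustrated on a small in-memory list. This is only a sketch of the described behaviour, not the search engine used on the website: the function names, the dictionary layout and the choice to match against the whole name are assumptions, and the two taxa are toy data built from names used on this page.

```python
import re
from typing import Dict, List


def wildcard_to_regex(query: str) -> "re.Pattern[str]":
    """Turn a query with '*' wildcards into a case-insensitive regex."""
    # Escape everything except the asterisks, which become '.*'.
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def search_taxa(taxa: List[Dict], query: str) -> List[Dict]:
    """Return matching taxa, organism-denomination matches first."""
    pattern = wildcard_to_regex(query)
    primary, secondary = [], []
    for taxon in taxa:
        if pattern.fullmatch(taxon["denomination"]):
            primary.append(taxon)      # matches the organism denomination
        elif any(pattern.fullmatch(name) for name in taxon["other_names"]):
            secondary.append(taxon)    # matches only another name
    return primary + secondary


# Toy data (hypothetical layout).
TAXA = [
    {"denomination": "Radianthus magnifica",
     "other_names": ["Magnificent sea anemone", "Heteractis magnifica"]},
    {"denomination": "Cardamine pratensis",
     "other_names": ["Cuckoo flower", "Alpine bitter cress"]},
]

if __name__ == "__main__":
    for hit in search_taxa(TAXA, "*NE*"):
        print(hit["denomination"])
```

With the query `*NE*`, Cardamine pratensis is listed before Radianthus magnifica, because the former matches on its denomination and the latter only on a common name.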
The organism denomination used in UniProtKB consists of the Latin scientific name, usually composed of the genus and species names (binomial system developed by Linnaeus), followed optionally by the English common name and a synonym.
Example: Cardamine pratensis (Cuckoo flower) (Alpine bitter cress)
The synonym can be a common name in English or in Latin in the case of some historical legacy names.
Example: Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)
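As an illustration, a denomination in this format can be split into its parts with a few lines of Python. This is a minimal sketch: the `Denomination` type and `parse_denomination` function are hypothetical names, and the parser assumes that parentheses are never nested and never occur inside the scientific name itself.

```python
import re
from typing import NamedTuple, Optional


class Denomination(NamedTuple):
    scientific_name: str
    common_name: Optional[str]
    synonym: Optional[str]


def parse_denomination(text: str) -> Denomination:
    """Split 'Scientific name (Common name) (Synonym)' into its parts."""
    text = text.strip()
    # Everything before the first " (" is taken as the scientific name.
    scientific = text.split(" (", 1)[0].strip()
    # Each parenthesised group follows in order: common name, then synonym.
    groups = re.findall(r"\(([^()]*)\)", text)
    common = groups[0] if len(groups) > 0 else None
    synonym = groups[1] if len(groups) > 1 else None
    return Denomination(scientific, common, synonym)


if __name__ == "__main__":
    print(parse_denomination(
        "Cardamine pratensis (Cuckoo flower) (Alpine bitter cress)"))
    print(parse_denomination(
        "Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)"))
```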
In the case of viruses, the denomination does not follow the binomial system. The English common name is used as the scientific name, sometimes followed by an acronym. Where possible, viruses are named according to the nomenclature of the International Committee on Taxonomy of Viruses.
A mnemonic organism identification code of at most 5 alphanumeric characters is used in the entry name of UniProtKB entries, e.g. SP0A_BACSU. This code is generally made of the first three letters of the genus name and the first two letters of the species name.
PSEPU is for Pseudomonas putida
NAJNI is for Naja nivea.
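The general rule can be written down in a few lines. This is a sketch of the default rule only: `default_organism_code` is a hypothetical helper, and since the rule applies only "generally", the code actually assigned to a given species may differ.

```python
def default_organism_code(scientific_name: str) -> str:
    """First three letters of the genus + first two of the species, upper-cased."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected a binomial name such as 'Naja nivea'")
    genus, species = parts[0], parts[1]
    return (genus[:3] + species[:2]).upper()


if __name__ == "__main__":
    for name in ("Pseudomonas putida", "Naja nivea"):
        print(default_organism_code(name), "is for", name)
```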
However, for a number of species commonly encountered in UniProtKB, we use self-explanatory codes. There are 16 of those codes:
- BOVIN for Bovine
- CHICK for Chicken
- ECOLI for Escherichia coli
- HORSE for Horse
- HUMAN for Homo sapiens
- MAIZE for Maize (Zea mays)
- MOUSE for Mouse
- PEA for Garden pea (Pisum sativum)
- PIG for Pig
- RABIT for Rabbit
- RAT for Rat
- SHEEP for Sheep
- SOYBN for Soybean (Glycine max)
- TOBAC for Common tobacco (Nicotiana tabacum)
- WHEAT for Wheat (Triticum aestivum)
- YEAST for Baker’s yeast (Saccharomyces cerevisiae)
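The list above fits naturally into a lookup table. The sketch below also extracts the organism code from an entry name such as SP0A_BACSU. It assumes that the code is the part after the last underscore; the function names are hypothetical.

```python
from typing import Optional

# The 16 self-explanatory codes listed above.
SELF_EXPLANATORY_CODES = {
    "BOVIN": "Bovine",
    "CHICK": "Chicken",
    "ECOLI": "Escherichia coli",
    "HORSE": "Horse",
    "HUMAN": "Homo sapiens",
    "MAIZE": "Maize (Zea mays)",
    "MOUSE": "Mouse",
    "PEA": "Garden pea (Pisum sativum)",
    "PIG": "Pig",
    "RABIT": "Rabbit",
    "RAT": "Rat",
    "SHEEP": "Sheep",
    "SOYBN": "Soybean (Glycine max)",
    "TOBAC": "Common tobacco (Nicotiana tabacum)",
    "WHEAT": "Wheat (Triticum aestivum)",
    "YEAST": "Baker's yeast (Saccharomyces cerevisiae)",
}


def organism_code(entry_name: str) -> str:
    """Return the organism code of an entry name (assumed: text after the last '_')."""
    _, separator, code = entry_name.rpartition("_")
    if not separator or not code:
        raise ValueError(f"not an entry name: {entry_name!r}")
    return code


def explain_code(entry_name: str) -> Optional[str]:
    """Return the organism for a self-explanatory code, or None for any other code."""
    return SELF_EXPLANATORY_CODES.get(organism_code(entry_name))


if __name__ == "__main__":
    print(organism_code("SP0A_BACSU"))   # BACSU is not in the table above
    print(explain_code("SP0A_BACSU"))
```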
Since the above rules cannot apply to viruses, we give them arbitrary, but generally easy-to-remember, identification codes.
Codes starting with the digit 9 are used for higher nodes that group together organisms at a given taxonomic level.
9AMPH is for Amphibia
9COLE is for Coleoptera.
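A simple check can tell higher-node codes apart from organism codes. This is a sketch with hypothetical function names. It relies on the "at most 5 alphanumeric characters" rule stated earlier and assumes that codes are written in upper case, as in all the examples on this page.

```python
def is_valid_organism_code(code: str) -> bool:
    """At most 5 ASCII alphanumeric characters; upper case is assumed."""
    return (
        1 <= len(code) <= 5
        and code.isascii()
        and code.isalnum()
        and code == code.upper()
    )


def is_higher_node_code(code: str) -> bool:
    """Codes starting with the digit 9 denote higher taxonomic nodes."""
    return is_valid_organism_code(code) and code.startswith("9")


if __name__ == "__main__":
    for code in ("9AMPH", "9COLE", "PSEPU"):
        kind = "higher node" if is_higher_node_code(code) else "organism"
        print(code, "->", kind)
```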
Other organism names
Organism nomenclature has always been an area where the creativity of biologists has consistently reached unsuspected heights. In practice, this means that one organism is frequently described by many different names. In addition to the organism denomination that is displayed in UniProtKB entries, the UniProt taxonomy entries also show all other names that are archived in the NCBI taxonomy database. These include names classified as misspellings and misnomers, which have been collected from various external sources and can be considered legacy data.
Lineage and taxonomy node rank
Taxonomy is organized in a tree structure that represents the taxonomic lineage. The position of each node on the tree is determined by its rank in the taxonomy hierarchy, so that the last ranks (usually subspecies) represent the leaves on the tree’s branches and higher ranks (e.g. family) are placed higher on the tree. The ordered list of the nodes forms the lineage.
The UniProt taxonomy database stores the taxonomy tree structure, thus making it possible to navigate from one node to another and to access the lineage for each node.
For convenience, both GenBank/EMBL-Bank/DDBJ and UniProtKB entries store an abbreviated lineage, which contains only the familiar taxon names. When you look at a UniProtKB entry on this website, however, you can configure its ‘Taxonomic lineage’ field to display the full lineage, including the so-called “hidden nodes”, which do not appear in the abbreviated lineage. Likewise, when you search for a taxon in UniProtKB on this website, the taxon is searched for in the full lineage of the entries.
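The difference between the full and the abbreviated lineage can be sketched with a small parent-pointer tree. Everything here is illustrative: the `Node` type, the `lineage` function and the `hidden` flag are hypothetical, the tree is heavily truncated, and which nodes are marked as hidden is an assumption made for the example rather than data taken from the UniProt taxonomy.

```python
from typing import Dict, List, NamedTuple, Optional


class Node(NamedTuple):
    parent: Optional[str]   # name of the parent node, None for the root
    hidden: bool            # True if omitted from the abbreviated lineage


# Truncated toy tree; the 'hidden' flags are illustrative assumptions.
TREE: Dict[str, Node] = {
    "Eukaryota": Node(None, False),
    "Opisthokonta": Node("Eukaryota", True),
    "Metazoa": Node("Opisthokonta", False),
    "Chordata": Node("Metazoa", False),
    "Mammalia": Node("Chordata", False),
    "Primates": Node("Mammalia", False),
    "Hominidae": Node("Primates", False),
    "Homo": Node("Hominidae", False),
    "Homo sapiens": Node("Homo", False),
}


def lineage(name: str, tree: Dict[str, Node], include_hidden: bool = True) -> List[str]:
    """Return the ancestors of a node, ordered from the root downwards."""
    path: List[str] = []
    current = tree[name].parent
    while current is not None:          # walk up until the root is passed
        node = tree[current]
        if include_hidden or not node.hidden:
            path.append(current)
        current = node.parent
    path.reverse()                      # root first, direct parent last
    return path


if __name__ == "__main__":
    print("full:       ", "; ".join(lineage("Homo sapiens", TREE)))
    print("abbreviated:", "; ".join(lineage("Homo sapiens", TREE, include_hidden=False)))
```

Storing only the parent of each node is enough to recover any lineage, and filtering on the flag yields the abbreviated form without a second copy of the tree.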
A list of strains may be provided for organisms with at least one entry in UniProtKB/Swiss-Prot. Where available, synonyms for particular strain names are listed in grey after the main name (see example ECOLX). In UniProtKB entries, strain names are displayed in the Strain lines under ‘Publications’ (see example P42652).
Note: Some of the strains present in the strain list might have their own taxon in the NCBI taxonomy database. The policy for the description of the source organism for a sequence has changed over the years from species to strain and back to species, so you will find a mixture of species and strain assignments in the nucleotide and protein databases.
A list of natural hosts is given for all viruses with at least one entry in UniProtKB/Swiss-Prot. Viral host data appear in the ‘Virus host’ field of UniProtKB entries (see example Q8JP02).
A virus is an inert particle outside its hosts. The virion (so called because it is not visible under the microscope), on its own, has neither metabolism, nor any replication capability, nor autonomous evolution. A virus cannot be considered a living organism outside its host. The viral taxonomy is arbitrarily based on the nature of viral genomes, and viruses in the same family can infect a wide range of hosts, from mammals to insects. The nature of the host does not always appear in the virus name, e.g. the hosts of the Yellow head virus are shrimps.
There are numerous virus-host interactions:
- on the virus side: shut-off of translation, oncogenesis
- on the host side: antiviral state, antigen presentation, immune system
These interactions appear in the annotation of viral UniProtKB entries under various annotation types such as function, subunit, subcellular location and PTM.
Related document: Controlled vocabulary of species