Taxonomy

NEWT is the taxonomy database maintained by the UniProt group. It integrates taxonomy data compiled in the NCBI database with data specific to the UniProt Knowledgebase [Reference].

Species with protein sequences stored in the UniProt Knowledgebase are named according to UniProt nomenclature [Guide to organism denomination]. We endeavour to maintain a list of manually curated species names for which protein sequence data is available. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates.

Links to external sites are chosen by the UniProt taxonomy team and lead to pictures and various scientific data of interest (taxonomy, biology, physiology, etc.). Due to the sheer volume of data present on the world-wide web, it is unfortunately not possible to contact each site individually. Should you wish to have your site linked to NEWT, or would prefer us to remove the link to your site, please do not hesitate to contact us.

Query the database by keywords (species name) or by NCBI taxonomic identifier. NOTE: keyword search is case-insensitive and accepts both scientific names and common names in plain English; you may use asterisks as wildcards anywhere in the query.
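
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the stated semantics (case-insensitive, asterisk as wildcard), not the actual NEWT search code. The function name `name_matches` is hypothetical, and the sketch assumes that a query must match the whole name, which the text does not specify.

```python
import re

def name_matches(query: str, name: str) -> bool:
    """Case-insensitive match where '*' stands for any run of characters."""
    # Escape every literal fragment so that only '*' acts as a wildcard.
    fragments = [re.escape(part) for part in query.split("*")]
    pattern = ".*".join(fragments)
    # Assumption: the query has to cover the whole name, not just a substring.
    return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None

print(name_matches("homo sap*", "Homo sapiens"))   # wildcard at the end
print(name_matches("*COLI", "Escherichia coli"))   # wildcard at the start
print(name_matches("homo", "Homo sapiens"))        # no wildcard, partial name
```

Under the whole-name assumption, a query such as `homo` without a trailing asterisk would not match `Homo sapiens`; a substring-based search would behave differently.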

Why is it called NEWT? Because we like that name. It also happens that newt is the English translation of the French word salamandre, which is the name of a cute little animal that slides through the most impenetrable-looking walls. French speakers may enjoy reading la Salamandre.

Organism name

The organism denomination used in the UniProt Knowledgebase consists of the Latin scientific name, usually composed of the genus and species names (binomial system developed by Linnaeus), followed optionally by the English common name and a synonym.
For example: Cardamine pratensis (Cuckoo flower) (Alpine bitter cress)
The synonym can be a common name in English or in Latin in the case of some historical legacy names.
For example: Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)
In the case of viruses, the denomination does not follow the binomial system. The English common name is used as the scientific name, sometimes followed by an acronym. Where possible, viruses are named according to the nomenclature of the International Committee on Taxonomy of Viruses (ICTV).
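
As a rough illustration of the name layout described above (scientific name, then optional common name and synonym in parentheses), the following Python sketch splits a denomination into its parts. The function name `split_denomination` is hypothetical, and the sketch assumes that parentheses are never nested and never occur inside the scientific name itself, which may not hold for every entry (virus names in particular).

```python
import re

def split_denomination(text: str) -> tuple[str, list[str]]:
    """Return (scientific name, list of parenthesised names in order)."""
    # Everything before the first '(' is taken as the scientific name.
    scientific = text.split("(", 1)[0].strip()
    # Each '(...)' group that follows is a common name or a synonym.
    extras = re.findall(r"\(([^()]*)\)", text)
    return scientific, extras

name, extras = split_denomination(
    "Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)"
)
print(name)    # scientific name
print(extras)  # common name first, then the synonym
```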

Organism mnemonic

A mnemonic organism identification code of at most 5 alphanumeric characters is used in the identification name of UniProt protein sequence entries, e.g. SP0A_BACSU. This code is generally made of the first three letters of the genus and the first two letters of the species.
For example: PSEPU is for Pseudomonas putida and NAJNI is for Naja nivea.

However, for the species most commonly encountered in the database, self-explanatory codes are used. There are 16 of those codes: BOVIN for Bovine, CHICK for Chicken, ECOLI for Escherichia coli, HORSE for Horse, HUMAN for Human, MAIZE for Maize (Zea mays), MOUSE for Mouse, PEA for Garden pea (Pisum sativum), PIG for Pig, RABIT for Rabbit, RAT for Rat, SHEEP for Sheep, SOYBN for Soybean (Glycine max), TOBAC for Common tobacco (Nicotiana tabacum), WHEAT for Wheat (Triticum aestivum), and YEAST for Baker's yeast (Saccharomyces cerevisiae).
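
The general rule and the self-explanatory exceptions can be sketched as follows. This is a guess at a code, not the curated assignment: the text says the rule applies "generally", and real codes are assigned by curators. The function name `guess_mnemonic` and the `SPECIAL_CODES` table are hypothetical, and the table only contains the special codes for which the list above gives a Latin name.

```python
# Subset of the self-explanatory codes, keyed by the Latin names quoted above.
SPECIAL_CODES = {
    "Escherichia coli": "ECOLI",
    "Zea mays": "MAIZE",
    "Pisum sativum": "PEA",
    "Glycine max": "SOYBN",
    "Nicotiana tabacum": "TOBAC",
    "Triticum aestivum": "WHEAT",
    "Saccharomyces cerevisiae": "YEAST",
}

def guess_mnemonic(scientific_name: str) -> str:
    """Guess an organism code from a binomial scientific name."""
    if scientific_name in SPECIAL_CODES:
        return SPECIAL_CODES[scientific_name]
    words = scientific_name.split()
    if len(words) < 2:
        raise ValueError("expected a genus and a species name")
    genus, species = words[0], words[1]
    # General rule: first three letters of the genus + first two of the species.
    return (genus[:3] + species[:2]).upper()

print(guess_mnemonic("Pseudomonas putida"))
print(guess_mnemonic("Naja nivea"))
print(guess_mnemonic("Zea mays"))
```

Because two different species can share the same five letters under this rule, a guessed code should always be checked against the curated species list.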

As it was not possible to apply the above rules to viruses, they were given arbitrary, but generally easy-to-remember identification codes.

Codes starting with the digit 9 are used for higher nodes that group together organisms at a given taxonomic level.
For example: 9AMPH is for Amphibia and 9COLE is for Coleoptera.
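
A small Python sketch of how an organism code might be pulled out of an entry name and checked against the conventions above. The helper names `organism_code` and `is_higher_node` are hypothetical; the sketch assumes that the organism code is the part after the last underscore of the entry name, as in SP0A_BACSU.

```python
def organism_code(entry_name: str) -> str:
    """Return the organism code, i.e. the part after the last underscore."""
    prefix, sep, code = entry_name.rpartition("_")
    if not sep or not prefix:
        raise ValueError("entry name has no '<protein>_<organism>' structure")
    # Codes are at most 5 alphanumeric characters.
    if not (1 <= len(code) <= 5 and code.isascii() and code.isalnum()):
        raise ValueError("organism code must be 1 to 5 alphanumeric characters")
    return code

def is_higher_node(code: str) -> bool:
    """Codes starting with the digit 9 denote higher taxonomic nodes."""
    return code.startswith("9")

print(organism_code("SP0A_BACSU"))
print(is_higher_node("9AMPH"))
print(is_higher_node("PSEPU"))
```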

Other organism names used in the taxonomy view

In addition to the manually curated species list, the taxonomy view shows all other names archived in the NCBI taxonomy database. This includes names classified as misspellings and misnomers, which have been collected from various external sources and can be considered legacy data.

Name priority ranking: organism naming has always been an area where the creativity of biologists has consistently reached unsuspected heights. In practice, this means that one organism is frequently described by many different names. Attributing a priority ranking to each name avoids redundancy in search results: the taxonomy search looks first in the priority UniProtKB organism names, and falls back to all other names only if no name with a higher priority matches the search term(s).
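
The fallback behaviour can be illustrated with a short Python sketch. It is a simplified model, not the actual search implementation: the `Name` record, the `search` function, the numeric priority levels and the two sample entries are all assumptions made for the example, and matching is reduced to a case-insensitive substring test.

```python
from typing import NamedTuple

class Name(NamedTuple):
    text: str
    priority: int  # assumption: 0 = UniProtKB organism name, larger = lower priority
    taxon_id: int

def search(term: str, names: list[Name]) -> list[Name]:
    """Return only the hits from the highest-priority level that has a hit."""
    needle = term.lower()
    hits = [n for n in names if needle in n.text.lower()]
    if not hits:
        return []
    best = min(n.priority for n in hits)
    return [n for n in hits if n.priority == best]

# Illustrative entries only; the priority values are invented for this sketch.
NAMES = [
    Name("Escherichia coli", 0, 562),
    Name("Bacillus coli", 1, 562),
]

print(search("coli", NAMES))      # both names match, only the priority name is kept
print(search("bacillus", NAMES))  # no priority name matches, so the other name is used
```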

Lineage and taxonomy node rank definition

Taxonomy is organized in a tree structure, which represents the taxonomic lineage. The position of each node in the tree is determined by its rank in the taxonomy hierarchy, so that the lowest ranks (usually species or sub-species) represent the leaves on the tree's branches, and higher ranks such as phylum, order and family are placed higher in the tree. The ordered list of nodes from the root down to a given node forms the lineage of that node.

The UniProtKB taxonomy database stores the taxonomy tree structure, thus making it possible to navigate from one node to another and to access the lineage for each node.

For convenience, the nucleotide and protein sequence database entries store an abbreviated lineage, which contains only the familiar taxon names. The taxonomy view displays the full lineage, including the so-called hidden nodes, which do not appear in the abbreviated lineage.
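
The relation between the tree, the full lineage and the abbreviated lineage can be sketched in Python as below. The `Node` class, the `lineage` function and the toy tree are assumptions made for illustration: the tree is heavily truncated, and the choice of which node is flagged as hidden is only an example, not a statement about the database content.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    name: str
    parent: Optional[str]   # name of the parent node, None for the root
    hidden: bool = False    # hidden nodes are left out of the abbreviated lineage

# Toy tree for illustration; many intermediate nodes are omitted.
TREE = {
    node.name: node
    for node in [
        Node("Eukaryota", None),
        Node("Opisthokonta", "Eukaryota", hidden=True),
        Node("Metazoa", "Opisthokonta"),
        Node("Chordata", "Metazoa"),
        Node("Mammalia", "Chordata"),
        Node("Primates", "Mammalia"),
        Node("Hominidae", "Primates"),
        Node("Homo sapiens", "Hominidae"),
    ]
}

def lineage(name: str, tree: dict[str, Node], include_hidden: bool = True) -> list[str]:
    """Walk from a node up to the root and return the names from root to node."""
    path = []
    current: Optional[str] = name
    while current is not None:
        node = tree[current]
        if include_hidden or not node.hidden:
            path.append(node.name)
        current = node.parent
    path.reverse()
    return path

print("; ".join(lineage("Homo sapiens", TREE)))                        # full lineage
print("; ".join(lineage("Homo sapiens", TREE, include_hidden=False)))  # abbreviated
```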

Organism strains

For a given organism, the list of strains contains all strains present in UniProtKB/Swiss-Prot entries for that organism. Where available, synonyms for particular strain names are listed in grey after the main name.
Warning: Some of the strains present in the strain list might have their own taxonomy identifiers in the NCBI taxonomy database, but these taxonomy identifiers are not used in UniProtKB/Swiss-Prot entries.
Strain data appears in the reference comment (RC lines) of UniProtKB/Swiss-Prot entries.
See for example: P42652.
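
For readers working with the flat-file form of an entry, a minimal Python sketch of pulling strain names out of RC lines follows. It assumes the common `RC   TOKEN=value;` layout, with each token contained on a single line; wrapped lines and evidence tags are not handled. The function name `strains_from_rc` is hypothetical and the sample lines are made up for the example, not copied from P42652.

```python
def strains_from_rc(lines: list[str]) -> list[str]:
    """Collect the values of STRAIN tokens found in RC lines."""
    strains = []
    for line in lines:
        if not line.startswith("RC   "):
            continue  # ignore every other line type
        for token in line[5:].split(";"):
            key, sep, value = token.strip().partition("=")
            if sep and key == "STRAIN":
                strains.append(value.strip())
    return strains

# Made-up sample lines, for illustration only.
sample = [
    "RN   [1]",
    "RC   STRAIN=cv. Columbia; TISSUE=Leaf;",
]
print(strains_from_rc(sample))
```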

Viral hosts

A list of natural hosts is given for all viruses with at least one protein sequence entry in UniProtKB/Swiss-Prot.
Viral host data appears in the organism host section (OH lines) of UniProtKB/Swiss-Prot entries. See for example: Q8JP02.
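
A minimal Python sketch of reading a host from an OH line in the flat-file form of an entry. It assumes the layout `OH   NCBI_TaxID=<number>; <host name>.` on a single line. The function name `parse_oh_line` is hypothetical, and the sample line is an illustration of the layout, not a quotation from Q8JP02.

```python
import re
from typing import Optional

# Assumed layout: 'OH   NCBI_TaxID=<digits>; <host name>.'
OH_PATTERN = re.compile(r"^OH   NCBI_TaxID=(\d+);\s*(.*?)\.?$")

def parse_oh_line(line: str) -> Optional[tuple[int, str]]:
    """Return (taxonomy identifier, host name), or None if the line is not an OH line."""
    match = OH_PATTERN.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

# Illustrative line only.
print(parse_oh_line("OH   NCBI_TaxID=9606; Homo sapiens (Human)."))
```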

* What is the purpose of the host information? *
A virus is an inert particle outside its hosts. The virion (so called because it is not visible under the microscope), on its own, has neither metabolism, nor any replication capability, nor autonomous evolution. A virus cannot be considered a living organism outside its host. Viral taxonomy is arbitrarily based on the nature of viral genomes, and viruses in the same family can infect a wide range of hosts, from mammals to insects. The nature of the host does not always appear in the virus name, e.g. Yellow Head Virus (host = shrimp!).
There are numerous virus-host interactions:

  • on the virus side: shut-off of translation, immunoevasion, latency, oncogenesis
  • on the host side: antiviral state, antigen presentation, immune system

These interactions appear in the annotation of viral entries under various comment types such as function, subunit, subcellular location and PTM.