Taxonomy
NEWT is the taxonomy database maintained by the UniProt group. It integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase. [Reference].
Species with protein sequences stored in the UniProt Knowledgebase are named according to UniProt nomenclature [Guide to organism denomination]. We endeavour to maintain a list of manually curated species names for which protein sequence data is available. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates.
Links to external sites are chosen by the UniProt taxonomy team and show pictures and various scientific data of interest (taxonomy, biology, physiology, etc.). Due to the sheer volume of data on the World Wide Web, it is unfortunately not possible to contact each site individually. Should you wish to have your site linked to NEWT, or prefer that we remove the link to your site, please do not hesitate to contact us.
Query the database by keywords (species name) or NCBI taxonomic identifier. NOTE: search by keywords is case-insensitive and scientific as well as common names in plain English can be used; you may use asterisks as wildcards anywhere in the query.
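The matching rules described above (case-insensitive, asterisk as wildcard anywhere in the query) can be sketched in a few lines of Python. This is only an illustration of the behaviour, not the actual NEWT search code: the function names are hypothetical, and the sketch assumes that a query without wildcards must match the whole name.

```python
import re

def query_to_regex(query):
    """Turn a keyword query into a regex; '*' matches any run of characters."""
    # Escape every literal piece so that only '*' acts as a wildcard.
    parts = (re.escape(piece) for piece in query.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)

def search(query, names):
    """Return the names that match the query, ignoring case."""
    pattern = query_to_regex(query.strip())
    # Assumption: the whole name must match unless wildcards are used.
    return [name for name in names if pattern.fullmatch(name)]

names = ["Homo sapiens", "Human", "Mus musculus", "Mouse"]
print(search("hom*", names))
print(search("*mus*", names))
print(search("HUMAN", names))
```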
Why is it called NEWT? Because we like that name. It also happens that
newt is the English translation of the French word salamandre, which is
the name of a cute little animal that slides through the most
impenetrable-looking walls.
French speakers may enjoy reading la Salamandre.
Organism name
The organism denomination used in the UniProt Knowledgebase consists of the Latin scientific name, usually composed of the genus and species names (binomial system developed by Linnaeus), followed optionally by the English common name and a synonym.
For example: Cardamine pratensis (Cuckoo flower) (Alpine bitter cress)
The synonym can be a common name in English or, in the case of some historical legacy names, a name in Latin.
For example: Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)
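As an illustration of this layout, a denomination can be split into its scientific name, common name and synonym. The sketch below is a hypothetical helper, not UniProt code; it assumes that the optional parts are the first and second parenthesised groups and that parentheses are not nested.

```python
import re

def parse_organism(denomination):
    """Split 'Scientific name (Common name) (Synonym)' into its parts."""
    # Everything before the first parenthesis is the scientific name.
    scientific = denomination.split("(", 1)[0].strip()
    # Assumption: parenthesised groups are not nested.
    groups = re.findall(r"\(([^()]*)\)", denomination)
    return {
        "scientific": scientific,
        "common": groups[0] if len(groups) > 0 else None,
        "synonym": groups[1] if len(groups) > 1 else None,
    }

print(parse_organism("Radianthus magnifica (Magnificent sea anemone) (Heteractis magnifica)"))
```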
In the case of viruses, the denomination does not follow the binomial system.
The English common name is used as the scientific name, sometimes followed by an acronym.
Where possible, viruses are named according to the nomenclature of the International Committee on Taxonomy of Viruses (ICTV).
Organism mnemonic
A mnemonic organism identification code of at most 5 alphanumeric characters is used in the identification name of UniProt protein sequence entries, e.g. SP0A_BACSU.
This code is generally made of the first three letters of the genus and the first two letters of the species.
For example: PSEPU is for Pseudomonas putida and NAJNI is for Naja nivea.
However, for the species most commonly encountered in the database, self-explanatory codes are used. There are 16 of those codes: BOVIN for Bovine, CHICK for Chicken, ECOLI for Escherichia coli, HORSE for Horse, HUMAN for Human, MAIZE for Maize (Zea mays), MOUSE for Mouse, PEA for Garden pea (Pisum sativum), PIG for Pig, RABIT for Rabbit, RAT for Rat, SHEEP for Sheep, SOYBN for Soybean (Glycine max), TOBAC for Common tobacco (Nicotiana tabacum), WHEAT for Wheat (Triticum aestivum), and YEAST for Baker's yeast (Saccharomyces cerevisiae).
As it was not possible to apply the above rules to viruses, they were given arbitrary, but generally easy-to-remember identification codes.
Codes starting with the digit 9 are used for higher nodes that group together organisms at a given taxonomic level.
For example: 9AMPH is for Amphibia and 9COLE is for Coleoptera.
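The general rule can be sketched as follows. This is a hypothetical illustration only: the function names are ours, the table of special codes lists only the species whose scientific names are given above, and real codes are assigned by curators, so the default rule will not reproduce every code in the database (virus codes in particular are arbitrary).

```python
# Self-explanatory codes, limited to the scientific names quoted in the text.
SPECIAL_CODES = {
    "Escherichia coli": "ECOLI",
    "Zea mays": "MAIZE",
    "Pisum sativum": "PEA",
    "Glycine max": "SOYBN",
    "Nicotiana tabacum": "TOBAC",
    "Triticum aestivum": "WHEAT",
    "Saccharomyces cerevisiae": "YEAST",
}

def default_mnemonic(scientific_name):
    """First three letters of the genus plus first two letters of the species."""
    genus, species = scientific_name.split()[:2]
    return (genus[:3] + species[:2]).upper()

def organism_mnemonic(scientific_name):
    """Use a special code when one is listed, otherwise the general rule."""
    if scientific_name in SPECIAL_CODES:
        return SPECIAL_CODES[scientific_name]
    return default_mnemonic(scientific_name)

def split_entry_name(entry_name):
    """Split an entry name such as 'SP0A_BACSU' into protein and organism codes."""
    protein_code, _, organism_code = entry_name.partition("_")
    return protein_code, organism_code

def is_higher_node_code(code):
    """Codes starting with the digit 9 denote higher taxonomic nodes."""
    return code.startswith("9")

print(organism_mnemonic("Pseudomonas putida"))
print(split_entry_name("SP0A_BACSU"))
```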
Other organism names used in the taxonomy view
In addition to the manually curated species list, the taxonomy view shows all other names archived in the NCBI taxonomy database.
This includes names classified as misspellings and misnomers, which have been collected from various external sources and can be considered legacy data.
Name priority ranking:
Organism naming has always been an area where the creativity of biologists has consistently
reached unsuspected heights. In practice, this means that one organism is frequently described by
many different names. Attributing a priority ranking to each name avoids redundancy in search
results: a taxonomy search looks first in the UniProtKB organism names, which have the highest
priority, and then in all other names only if no name with a higher priority matches the search term(s).
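The fallback behaviour can be sketched like this. The data layout (a list of tiers, highest priority first), the function name and the taxon labels are all hypothetical; the sketch only shows the idea of stopping at the first tier that yields a match.

```python
def search_by_priority(term, tiers):
    """Search name tiers in priority order and stop at the first tier with hits.

    tiers: list of lists of (name, taxon_label) pairs, highest priority first.
    """
    needle = term.lower()
    for tier in tiers:
        hits = [(name, taxon) for name, taxon in tier if needle in name.lower()]
        if hits:
            # Lower-priority names are not consulted, which avoids
            # returning the same organism several times.
            return hits
    return []

# 'taxon-A' is a placeholder label, not a real taxonomy identifier.
tiers = [
    [("Radianthus magnifica", "taxon-A")],   # UniProtKB organism names
    [("Heteractis magnifica", "taxon-A")],   # all other names
]
print(search_by_priority("magnifica", tiers))
print(search_by_priority("heteractis", tiers))
```

With this layout, a search for "magnifica" returns the organism once, under its UniProtKB name, while "heteractis" still finds it through the lower-priority synonym.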
Lineage and taxonomy node rank definition
Taxonomy is organized in a tree structure, which represents the taxonomic lineage.
The position of each node on the tree is determined by its rank in the taxonomy hierarchy,
so that the lowest ranks (usually species or sub-species) represent the leaves on the
tree's branches, and higher ranks like phylum, order and family are placed higher on the tree.
The ordered list of the nodes from the top of the tree down to a given node forms its lineage.
The UniProtKB taxonomy database stores the taxonomy tree structure, thus making it possible to navigate from one node to another and to access the lineage for each node.
For convenience, the nucleotide and protein sequence database entries store an abbreviated lineage, which contains only the familiar taxon names. The taxonomy view displays the full lineage, including the so-called hidden nodes, which do not appear in the abbreviated lineage.
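The relation between the tree, the full lineage and the abbreviated lineage can be sketched with parent pointers. The class, the field names and the choice of which node is flagged as hidden are assumptions made for illustration, and the example chain is heavily simplified compared with the real taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    rank: str
    parent: Optional["Node"] = None
    hidden: bool = False  # hidden nodes are left out of the abbreviated lineage

def lineage(node, include_hidden=True):
    """Return the names of the ancestors of a node, from the top of the tree down."""
    path = []
    current = node.parent
    while current is not None:
        if include_hidden or not current.hidden:
            path.append(current.name)
        current = current.parent
    return list(reversed(path))

# Simplified, illustrative chain; the hidden flag is an assumption.
metazoa = Node("Metazoa", "kingdom")
eumetazoa = Node("Eumetazoa", "no rank", parent=metazoa, hidden=True)
chordata = Node("Chordata", "phylum", parent=eumetazoa)
mammalia = Node("Mammalia", "class", parent=chordata)
primates = Node("Primates", "order", parent=mammalia)
hominidae = Node("Hominidae", "family", parent=primates)
homo = Node("Homo", "genus", parent=hominidae)
homo_sapiens = Node("Homo sapiens", "species", parent=homo)

print(lineage(homo_sapiens))                        # full lineage
print(lineage(homo_sapiens, include_hidden=False))  # abbreviated lineage
```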
Organism strains
For a given organism, the list of strains contains all strains present
in UniProtKB/Swiss-Prot entries for that organism.
Where available, synonyms for particular strain names are listed in grey
after the main name.
Warning: Some of the strains present in the strain list might have their
own taxonomy identifiers in the NCBI taxonomy database, but these taxonomy
identifiers are not used in UniProtKB/Swiss-Prot entries.
Strain data appears in the reference comment (RC lines) of UniProtKB/Swiss-Prot entries.
See for example: P42652.
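As a rough sketch, the strain can be read from an RC line in the flat-file format by splitting its TOKEN=value; items. The function name is hypothetical, the sample line is made up for illustration (it is not taken from P42652), and the sketch assumes a single-line RC comment whose values contain no semicolons.

```python
def parse_rc_line(line):
    """Parse a single 'RC' line into a dict of token -> value."""
    if not line.startswith("RC "):
        raise ValueError("not an RC line")
    result = {}
    # Drop the two-letter line code, then split the 'TOKEN=value;' items.
    for item in line[2:].split(";"):
        item = item.strip()
        if not item:
            continue
        token, _, value = item.partition("=")
        result[token.strip()] = value.strip()
    return result

# Made-up sample line, for illustration only.
sample = "RC   STRAIN=Sprague-Dawley; TISSUE=Brain;"
print(parse_rc_line(sample).get("STRAIN"))
```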
Viral hosts
A list of natural hosts is given for all viruses with at least one protein sequence entry in UniProtKB/Swiss-Prot.
Viral host data appears in the organism hosts section (OH lines) of UniProtKB/Swiss-Prot entries.
See for example: Q8JP02.
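A host can be extracted from an OH line with a small regular expression. The sketch assumes the usual flat-file layout of one host per line, with an NCBI taxonomy identifier followed by the host name; the function name is hypothetical and the sample line is illustrative rather than copied from Q8JP02.

```python
import re

# Assumed layout: 'OH   NCBI_TaxID=<digits>; <host name>.'
OH_PATTERN = re.compile(r"^OH\s+NCBI_TaxID=(\d+);\s*(.*?)\.?\s*$")

def parse_oh_line(line):
    """Return (taxonomy identifier, host name), or None if the line does not match."""
    match = OH_PATTERN.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

# Illustrative sample line.
print(parse_oh_line("OH   NCBI_TaxID=9606; Homo sapiens (Human)."))
```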
What is the purpose of the host information?
A virus is an inert particle outside its hosts. The virion (so called because it is
not visible under the microscope), on its own, has neither metabolism, nor any
replication capability, nor autonomous evolution.
A virus cannot be considered a living organism outside its host.
The viral taxonomy is arbitrarily based on the nature of viral genomes, and viruses
in the same family can infect a wide range of hosts, from mammals to insects. The
nature of the host does not always appear in the virus name, e.g. Yellow head virus
(host: shrimp).
There are numerous virus-host interactions:
- on the virus side: shut-off of translation, immunoevasion, latency, oncogenesis
- on the host side: antiviral state, antigen presentation, immune system
These interactions appear in the annotation of viral entries under various comment types such as function, subunit, subcellular location and PTM.
