        UniProt - Swiss-Prot Protein Knowledgebase
        SIB Swiss Institute of Bioinformatics; Geneva, Switzerland
        European Bioinformatics Institute (EBI); Hinxton, United Kingdom
        Protein Information Resource (PIR); Washington DC, USA

Description: Prokaryotic protein naming guidelines
Name:        Proknameprot.txt
Release:     2015_11 of 11-Nov-2015


This is a subset of the UniProtKB document nameprot.txt which has been
developed with the International Nucleotide Sequence Database Collaboration
(INSDC) to provide guidelines for submitters of prokaryotic


RN : Recommended name (RecName)
AN : Alternative name (AltName)
GS : Gene symbol
OLN: Ordered locus name

General naming rules

If it exists, use the approved nomenclature.
See: nomlist.txt, a list of nomenclature-related references for proteins.

If no accepted unification exists, and several alternatives are of equal
frequency in the literature, we use the one with the easiest extensibility or
standardization. In addition, preference is given to names that best reflect
the common acronym or gene symbol.

The protein naming guidelines are based on the premise that a good and stable
recommended name (referred to hereafter as "RN") for a protein is a name that
is as neutral as possible.

An RN should be, as far as possible, unique and attributed to all orthologs.

To facilitate attribution of the RN to all orthologs it should not include
references to specific characteristics of the protein in one particular species;
in particular it should not reflect the function or role of the protein, nor
its subcellular location, its domain structure, its tissue specificity, its
molecular weight or its species of origin.

The following examples illustrate cases where the use of such terminology
makes consistent application of a recommended name difficult, and explain
why:

- An RN should not contain information about the molecular weight of the
  protein, which may vary between orthologs.
  e.g. "unicornase subunit A" is preferred to "unicornase 52 kDa subunit".

- An RN should not be based on the name of a disease in which the protein may be
  implicated, because the disease association may apply to only a single species.
  e.g. "Bloom syndrome protein" is not suitable.

- An RN should not be based on species-specific patterns of expression or
  e.g. "testis-specific protein ..." is not suitable.
  e.g. "androgen-induced protein 1" is not suitable.

- Finally, an RN must not include mention of a particular species.
  e.g. "yeast Ku70 protein" is not suitable.

The ideal RN is a word that ends in "in" and can be easily pronounced in
English.
  e.g. "zyxin", "insulin", "hemoglobin", "caveolin", "desmoglein", "secretin".

Names ending in "ine" should be avoided.
  e.g. "maurocalcin" instead of "maurocalcine".

Wherever American and British spelling conventions differ, the RN should use
the American form.
  e.g. "hemoglobin" instead of "haemoglobin".

An RN should not contain a Roman numeral.
  e.g. "caveolin-2" instead of "caveolin-II".

Exceptions are allowed for historical cases.
  e.g. "coagulation factor IX", "casein kinase II", "HLA class I", etc.
  e.g. "type III restriction enzyme", "DNA helicase I", and "type IV pilus
  assembly protein".

Abbreviations should not refer to the molecular weight of a protein.
  e.g. abbreviations such as "p123", "Gp62" and "p34" are not suitable.
Exceptions are allowed for cases where historically the molecular weight
has been consistently and generally applied as part of the accepted name.
  e.g. "p53".
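
Some of the general rules above lend themselves to a simple automated check.
The following is a minimal sketch in Python, not an official UniProt validator:
the function name lint_name, the regular expressions and the exception list are
all illustrative assumptions, and the patterns are heuristics that will miss
cases and raise false alarms.

```python
import re

# Historical exceptions copied from the examples in the guidelines
# (an illustrative list, not an exhaustive one).
HISTORICAL_EXCEPTIONS = {
    "coagulation factor IX",
    "casein kinase II",
    "HLA class I",
    "type III restriction enzyme",
    "DNA helicase I",
    "type IV pilus assembly protein",
}

# (description, pattern) pairs; each pattern is a rough heuristic.
CHECKS = [
    ("mentions a molecular weight",
     re.compile(r"\b\d+(?:\.\d+)?\s*kDa\b", re.IGNORECASE)),
    ("contains a Roman numeral",
     re.compile(r"(?<![A-Za-z0-9])[IVX]{1,4}(?![A-Za-z0-9])")),
    ("ends in 'ine'", re.compile(r"ine$")),
    ("uses British spelling", re.compile(r"haem", re.IGNORECASE)),
]


def lint_name(name):
    """Return descriptions of the rules a candidate RN appears to break."""
    if name in HISTORICAL_EXCEPTIONS:
        return []
    return [desc for desc, pattern in CHECKS if pattern.search(name)]


for candidate in ("unicornase 52 kDa subunit", "caveolin-II", "maurocalcine",
                  "haemoglobin", "casein kinase II", "unicornase subunit A"):
    print(candidate, "->", lint_name(candidate))
```

A name that returns an empty list has only passed these few heuristics; the
remaining rules (neutrality, uniqueness, applicability to all orthologs) still
need human judgment.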

General syntax

Greek letters are written entirely in lower case with the exception of "Delta"
in the context of the steroid/fatty acid metabolism nomenclature. Greek letters
must be written in full.
  e.g. "alpha", "omega".

If a Greek letter is preceded or followed by a number or letter, then it must
be separated by a dash "-".
  e.g. "unicornase alpha-1", "myprotease A-beta".

An RN should not use diacritics, such as accents, umlauts and so on.
  e.g. "Krüppel" is not suitable.

Eponyms should be used in the non-possessive form (a name should not be
followed by "'s").
  e.g. "Alzheimer disease amyloid A4 protein" instead of "Alzheimer's disease
  amyloid A4 protein".
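
The syntax rules above for Greek letters, diacritics and eponyms can be
sketched as small helper functions. This is an illustration under stated
assumptions: the helper names are hypothetical, the Greek table covers only a
few lower-case letters (the "Delta" exception is not handled), and diacritics
are only detected, because the guidelines do not say how they should be
transliterated.

```python
import re
import unicodedata

# Illustrative subset of lower-case Greek letters and their spelled-out forms.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "ω": "omega"}
_GREEK_PATTERN = re.compile(
    r"([A-Za-z0-9]?)([" + "".join(GREEK) + r"])([A-Za-z0-9]?)"
)


def spell_out_greek(name):
    """Write Greek letters in full, dash-separated from adjacent letters/digits."""
    def repl(match):
        before, symbol, after = match.groups()
        left = before + "-" if before else ""
        right = "-" + after if after else ""
        return left + GREEK[symbol] + right
    return _GREEK_PATTERN.sub(repl, name)


def has_diacritics(name):
    """True if the name contains accents, umlauts or other combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return any(unicodedata.combining(ch) for ch in decomposed)


def drop_possessive(name):
    """Turn possessive eponyms into the non-possessive form."""
    return re.sub(r"'s\b", "", name)


print(spell_out_greek("unicornase α1"))
print(spell_out_greek("myprotease Aβ"))
print(has_diacritics("Krüppel"))
print(drop_possessive("Alzheimer's disease amyloid A4 protein"))
```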

An RN based on the gene symbol (GS) should be in the form "protein <GS>"
instead of "<GS> protein". When no other descriptor is available, the word
"protein" should be added rather than using the symbol by itself.
Note that for prokaryotic proteins, the gene symbol is capitalized and not
  e.g. "protein AbcD" instead of "AbcD protein" or "AbcD".
  e.g. "response regulator AlgR".
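
The "<descriptor> <GS>" form can be illustrated with a short sketch. The
function name name_from_gene_symbol is hypothetical, and the code assumes the
gene symbol is supplied in the usual prokaryotic form with a lower-case first
letter (such as "abcD"), so that only the first letter needs capitalizing.

```python
def name_from_gene_symbol(gene_symbol, descriptor="protein"):
    """Build an RN of the form '<descriptor> <GS>'.

    Falls back to the word 'protein' when no better descriptor is given.
    """
    # Capitalize the first letter only; the rest of the symbol is kept as-is.
    symbol = gene_symbol[:1].upper() + gene_symbol[1:]
    return f"{descriptor} {symbol}"


print(name_from_gene_symbol("abcD"))
print(name_from_gene_symbol("algR", "response regulator"))
```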

Some examples where the addition of the GS is useful include:

mismatch repair proteins
DNA repair proteins
DNA/RNA polymerases
DNA/RNA helicases
GTP-binding proteins
transcriptional regulators
cell division proteins
outer membrane proteins
recombination proteins
conjugation proteins
flagellar proteins
sporulation proteins
secretion proteins
[Note: this list is not exhaustive]

Whenever possible, commas should be avoided in an RN except when their usage is
obligatory in accepted chemical names.
  e.g. "acyl-CoA dehydrogenase, short-chain specific" should be
       "short-chain specific acyl-CoA dehydrogenase".

Symbols of chemical elements can be used in abbreviations.
  e.g. "magnesium/calcium co-transporter" can be abbreviated as "Mg/Ca

For ions, chemical element symbols (e.g. Cu(+), Mg(2+), etc.) are preferred to
systematic names (copper(I), magnesium ion, etc.) and common names (cupric,
ferrous, etc). When necessary, the valence should be indicated within
  e.g. "Fe(2+)", "Fe(3+)", "Cl(-)", etc.
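
The ion notation shown in these examples can be generated mechanically. The
sketch below uses a hypothetical function, format_ion, and assumes the charge
is passed as a signed non-zero integer; a magnitude of 1 is omitted, as in
"Cu(+)" and "Cl(-)".

```python
def format_ion(element, charge):
    """Format an ion as '<element>(<valence><sign>)', e.g. 'Fe(2+)'."""
    if charge == 0:
        raise ValueError("an ion must carry a non-zero charge")
    sign = "+" if charge > 0 else "-"
    magnitude = abs(charge)
    # A single charge is written without the digit: Cu(+), Cl(-).
    count = "" if magnitude == 1 else str(magnitude)
    return f"{element}({count}{sign})"


print(format_ion("Fe", 2))
print(format_ion("Cu", 1))
print(format_ion("Cl", -1))
```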

Abbreviations should not appear inside an RN, with the exception of:

  Deoxyribonucleic acid: DNA
  Ribonucleic acid: RNA
  Mono-, di-, tri- nucleoside phosphates:

[Note: this list is not exhaustive]

Note: protein name abbreviations should not be used.
  e.g. "acyl carrier protein" instead of "ACP".

Charged tRNAs are indicated by "tRNA" followed by the three-letter amino acid
code, with the first letter capitalized, in brackets.
  e.g. "Glu-tRNA(Gln) amidotransferase subunit B".

Hyphens should be used to form compound modifiers (i.e. two or more words that
act as a single modifier for a noun). The following terms are commonly used in
compound modifiers:

  activated, activating, adapting, adding, amplified, anchored, anchoring,
  antagonizing, associated, associating, attracting, binding, blocking, bound,
  branching, bridging, bundling, capping, complementing, concentrating,
  conjugating, containing, controlled, controlling, converting, coupled,
  coupling, decapping, degrading, dependent, depolymerizing, derepressing,
  derived, deriving, destabilizing, docking, editing, enhanced, enhancing,
  enriched, exposed, expressed, flanking, forming, gated, grabbing, harvesting,
  independent, induced, inducible, inducing, inhibited, inhibiting, insensitive,
  interacting, laying, like, linked, linking, metabolizing, modifying,
  modulating, polymerizing, potentiating, preventing, processing, promoting,
  recognizing, recruited, recruiting, regulated, regulating, related, released,
  releasing, remodeling, removing, repressing, required, requiring, resistant,
  responsive, rich, ripening, scaffolding, sensing, sensitive, signaling,
  specific, splicing, spreading, stabilized, stabilizing, stacking, stimulated,
  stimulating, structuring, sulfating, suppressing, trafficking, transformed,
  transforming, transporting [Note: This list is not exhaustive].

  e.g. "secretin-binding protein", "pyrophosphate-dependent


Specific rules for enzymes

Enzymes commonly have RNs ending in "ase".
  e.g. "aminoacylase", "arginase", "caspase", "elastase", etc.

Transfer enzymes are often named in such a way as to describe the source and
target of the transfer reaction, with the two separated by a double dash (--).
This is an IUBMB recommendation.
  e.g. "formylmethanofuran--tetrahydromethanopterin formyltransferase".

For protein kinases and phosphatases, use the format:
"<modified_residues>-protein <activity>".
  e.g. "serine/threonine-protein kinase", "tyrosine-protein phosphatase".
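
As an illustration of the "<modified_residues>-protein <activity>" format, the
following sketch builds such names from their parts. The function name is
hypothetical, and joining several residues with a slash is inferred from the
"serine/threonine" example rather than stated as a rule.

```python
def modification_enzyme_name(residues, activity):
    """Build '<modified_residues>-protein <activity>' for kinases/phosphatases."""
    if activity not in ("kinase", "phosphatase"):
        raise ValueError("this format is described for kinases and phosphatases")
    # Several modified residues are joined with a slash, as in the example.
    return "/".join(residues) + "-protein " + activity


print(modification_enzyme_name(["serine", "threonine"], "kinase"))
print(modification_enzyme_name(["tyrosine"], "phosphatase"))
```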

In cases where the protein is possibly an inactive version of an enzyme, avoid
mentioning the activity in the name unless in expressions such as "X domain-
containing protein". Inactive versions refer to proteins where active site
residues are altered, for example, and do not refer to pseudogenes.
  e.g. "protease domain-containing protein".

In some cases, the protein is named after the pathway in which it is involved.
In such cases the following format is suitable:
"<Pathway> biosynthesis protein <GS>".
  e.g. "thiamine biosynthesis protein ThiC".

Specific rules for multiprotein complexes

Proteins that belong to well-defined multi-subunit complexes can be named
according to the complex, followed by the specific subunit name. This type of
nomenclature is only allowed for well-defined complexes of known composition.
  e.g. "26S proteasome non-ATPase regulatory subunit 1".

The word "subunit" is preferred to "chain" or "component".
Chain refers to proteolytically processed polypeptides arising from a common
precursor protein.
  e.g. "unicornase heavy chain", "unicornase light chain".

If the name contains a "type" of subunit, then precede the word "subunit" with
the "type". The "type" is a controlled vocabulary:

  [Note: This list is not exhaustive]

  e.g. "unicornase regulatory subunit".

Avoid the word "subunit" with a size indicator:
  e.g. "unicornase large subunit", "ribosomal large subunit pseudouridine synthase", etc.

If the name contains a "designator" of the subunit, then the "designator" must
follow the word "subunit":

  Numbers               unicornase subunit 2
  Letters               unicornase subunit A
  GS                    unicornase subunit AbcD
  Greek letters         unicornase subunit alpha

The preference is to use Numbers > Letters > GS > Greek letters.

An RN can include both a "type" and a "designator".
  e.g. "unicornase regulatory subunit 1".
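
The ordering of "type", the word "subunit" and "designator", together with the
stated designator preference, can be sketched as follows. The function names
and the small table of Greek letter names are illustrative assumptions; any
designator that is not a number, a single letter or a listed Greek letter is
treated here as a gene symbol.

```python
# Illustrative subset of spelled-out Greek letters.
GREEK_WORDS = {"alpha", "beta", "gamma", "delta", "epsilon"}


def designator_rank(designator):
    """Rank designators: numbers (0) > letters (1) > GS (2) > Greek (3)."""
    if designator.isdigit():
        return 0
    if len(designator) == 1 and designator.isalpha():
        return 1
    if designator in GREEK_WORDS:
        return 3
    return 2  # assumed to be a gene symbol


def preferred_designator(candidates):
    """Pick the designator that the guidelines prefer among the candidates."""
    return min(candidates, key=designator_rank)


def subunit_name(complex_name, designator=None, subunit_type=None):
    """Build '<complex> [<type>] subunit [<designator>]'."""
    parts = [complex_name]
    if subunit_type:
        parts.append(subunit_type)      # the type precedes the word "subunit"
    parts.append("subunit")
    if designator is not None:
        parts.append(str(designator))   # the designator follows "subunit"
    return " ".join(parts)


print(subunit_name("unicornase", preferred_designator(["alpha", "AbcD", "A", "2"])))
print(subunit_name("unicornase", 1, subunit_type="regulatory"))
```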

Additional rules

For some proteins of unknown or uncertain function, only family/domain
identification or sequence similarity is available, or no information at all.
In these cases, we recommend the following.

"Hypothetical protein" or "Uncharacterized protein".
These two are the only recommended terms for naming proteins of unknown

The following words should be avoided in an RN:

  Protein of unknown function
  Similar to

  Note: these words can be used IF they are 'internal' to the RN and
        do not convey a 'global' meaning.
  e.g.  "high-potential iron-sulfur protein".

When an RN is based on the predicted activity of the protein, the RN can be
preceded by "putative".
  e.g. "putative acetylornithine deacetylase".

Proteins of unknown function which nevertheless contain a defined domain or
motif (that itself does not specify a particular function) are generally named
according to the domain(s) or repeat(s) present. The name should then be of the
following type: "<domain|repeat>-containing protein".
  e.g. "PAS domain-containing protein 5", "thioredoxin-domain containing

If there is more than one domain/repeat, use a slash for all items preceding
"containing" in accordance with grammatical rules. This also helps
differentiate specific domains.
  e.g. "ankyrin repeat/SAM domain-containing protein 1".

Do not use plurals.
  e.g. "ankyrin repeats-containing protein 8" is wrong.
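
The "<domain|repeat>-containing protein" pattern, including the slash for
multiple features and the ban on plurals, can be sketched as below. The
function name is hypothetical, and the plural check is a simple heuristic that
only looks for features ending in "domains" or "repeats".

```python
def domain_containing_name(features, number=None):
    """Build '<feature>[/<feature>...]-containing protein [<number>]'."""
    for feature in features:
        # Heuristic plural check: "ankyrin repeats" should be "ankyrin repeat".
        if feature.endswith(("domains", "repeats")):
            raise ValueError(f"do not use plurals: {feature!r}")
    name = "/".join(features) + "-containing protein"
    if number is not None:
        name += f" {number}"
    return name


print(domain_containing_name(["PAS domain"], 5))
print(domain_containing_name(["ankyrin repeat", "SAM domain"], 1))
```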

Proteins of unknown function which exhibit significant sequence similarity to a
defined protein family have been named in accordance with other members of that
family. The word protein should be added after family if no other descriptor is
  e.g. "Holliday junction resolvase family endonuclease", "LysR family
  transcriptional regulator".

It is also possible to use "-like" in the name. Bear in mind that this
should only be used for cases that are outliers to a tight homomorphic
family. Family is preferred over '-like'.
  e.g. "Holliday junction resolvase-like protein".

Certain proteins have multiple functions, and the RN can reflect this.
For multifunctional proteins which do not yet have a single unique name, a name
can be formed by combining individual functions along with a prefix specifying
the number of functions ('bi', 'tri', etc.). Each function should be separated
by a forward slash "/".
  e.g. "bifunctional adenylyltransferase/ADP-heptose synthase cyclohydrolase".
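
The construction of a multifunctional name can be sketched as follows. The
function name is hypothetical, only the "bi" and "tri" prefixes named in the
text are supported, and the enzyme names in the usage lines are the placeholder
names used elsewhere in these guidelines, not real proteins.

```python
# Only the prefixes given explicitly in the guidelines.
PREFIXES = {2: "bi", 3: "tri"}


def multifunctional_name(functions):
    """Build '<prefix>functional <function>/<function>[/...]'."""
    count = len(functions)
    if count not in PREFIXES:
        raise ValueError("this sketch only handles two or three functions")
    return f"{PREFIXES[count]}functional " + "/".join(functions)


print(multifunctional_name(["unicornase", "myprotease"]))
print(multifunctional_name(["unicornase", "myprotease", "arginase"]))
```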