Sequence annotation (Features)

Last modified June 13, 2017

Sequence annotations describe regions or sites of interest in the protein sequence, such as post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also described in this manner.

Sequence annotations (position-specific annotations) used to be found in the ‘Sequence annotation (Features)’ section in the previous version of the UniProtKB entry view. The flat file and XML formats still group all position-specific annotation together in a “feature table” (FT, <feature>). Each sequence annotation consists of a “feature key”, “from” and “to” positions as well as a short description.
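As a rough illustration of the four parts of a sequence annotation listed above, the record can be modelled as a small data class. This is only a sketch: the class name `Feature`, its field names, and the sample values are assumptions made for illustration and are not part of any UniProt format or API. Positions are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical in-memory model of one position-specific annotation."""

    key: str               # feature key, e.g. "Chain" or "Signal"
    start: int             # "from" position (assumed 1-based)
    end: int               # "to" position (assumed inclusive)
    description: str = ""  # short free-text description

    @property
    def length(self) -> int:
        # With 1-based inclusive positions, a feature from 1 to 22 spans 22 residues.
        return self.end - self.start + 1


# Made-up example values, for illustration only.
example = Feature(key="Signal", start=1, end=22)
print(example.key, example.start, example.end, example.length)
```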

The current entry view displays annotation by subject (Function, PTM & processing, etc.), and the various position-specific annotations are now distributed across the relevant new sections.

Subsection | Content
Molecule processing
Initiator methionine | Cleavage of the initiator methionine
Signal | Sequence targeting proteins to the secretory pathway or periplasmic space
Transit peptide | Extent of a transit peptide for organelle targeting
Propeptide | Part of a protein that is cleaved during maturation or activation
Chain | Extent of a polypeptide chain in the mature protein
Peptide | Extent of an active peptide in the mature protein
Topological domain | Location of non-membrane regions of membrane-spanning proteins
Transmembrane | Extent of a membrane-spanning region
Intramembrane | Extent of a region located in a membrane without crossing it
Domain | Position and type of each modular protein domain
Repeat | Positions of repeated sequence motifs or repeated domains
Calcium binding | Position(s) of calcium binding region(s) within the protein
Zinc finger | Position(s) and type(s) of zinc fingers within the protein
DNA binding | Position and type of a DNA-binding domain
Nucleotide binding | Nucleotide phosphate binding region
Region | Region of interest in the sequence
Coiled coil | Positions of regions of coiled coil within the protein
Motif | Short (up to 20 amino acids) sequence motif of biological interest
Compositional bias | Region of compositional bias in the protein
Active site | Amino acid(s) directly involved in the activity of an enzyme
Metal binding | Binding site for a metal ion
Binding site | Binding site for any chemical group (co-enzyme, prosthetic group, etc.)
Site | Any interesting single amino acid site on the sequence
Amino acid modifications
Non-standard residue | Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence
Modified residue | Modified residues excluding lipids, glycans and protein cross-links
Lipidation | Covalently attached lipid group(s)
Glycosylation | Covalently attached glycan group(s)
Disulfide bond | Cysteine residues participating in disulfide bonds
Cross-link | Residues participating in covalent linkage(s) between proteins
Natural variations
Alternative sequence | Amino acid change(s) producing alternate protein isoforms
Natural variant | Description of a natural variant of the protein
Experimental info
Mutagenesis | Site which has been experimentally altered by mutagenesis
Sequence uncertainty | Regions of uncertainty in the sequence
Sequence conflict | Description of sequence discrepancies of unknown origin
Non-adjacent residues | Indicates that two residues in a sequence are not consecutive
Non-terminal residue | The sequence is incomplete: indicates that a residue is not the terminal residue of the complete protein
Secondary structure
Helix | Helical regions within the experimentally determined protein structure
Turn | Turns within the experimentally determined protein structure
Beta strand | Beta strand regions within the experimentally determined protein structure

The exact boundaries of the described sequence feature, as well as its length, are provided. When a feature is known to extend beyond the position given in this section, the endpoint is preceded by ‘<’ (less than) for features that continue in the N-terminal direction, or by ‘>’ (greater than) for features that continue in the C-terminal direction.

Example: P62756

Unknown endpoints are denoted by a question mark ‘?’.
Example: P78586

Uncertain endpoints are denoted by a question mark ‘?’ before the position, e.g. ‘?42’.
Example: Q3ZC31
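The endpoint notations described above (‘<’, ‘>’, a bare ‘?’, and ‘?’ before a position) can be told apart with a small parser. The following is a minimal sketch, not an official UniProt tool: the names `Endpoint` and `parse_endpoint` are hypothetical, and it assumes each endpoint arrives as a single short string such as `<1` or `?42`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical representation of one feature endpoint."""

    position: Optional[int]  # None when the endpoint is unknown ('?')
    modifier: str            # '' (exact), '<', '>', or '?'


# Optional single modifier character followed by optional digits.
_ENDPOINT = re.compile(r"([<>?]?)(\d*)")


def parse_endpoint(text: str) -> Endpoint:
    """Parse strings such as '42', '<1', '>85', '?' or '?42'."""
    cleaned = text.strip()
    match = _ENDPOINT.fullmatch(cleaned)
    if match is None or cleaned == "":
        raise ValueError(f"not a valid endpoint: {text!r}")
    modifier, digits = match.groups()
    if not digits:
        # Only a bare '?' may appear without a position (unknown endpoint).
        if modifier != "?":
            raise ValueError(f"missing position in endpoint: {text!r}")
        return Endpoint(position=None, modifier="?")
    return Endpoint(position=int(digits), modifier=modifier)


for sample in ["42", "<1", ">85", "?", "?42"]:
    print(sample, parse_endpoint(sample))
```

Keeping the modifier separate from the numeric position lets downstream code decide how to treat open-ended or uncertain boundaries, for example by skipping length calculations when `position` is `None`.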

Feature identifiers

Some features are associated with a unique and stable feature identifier, which makes it possible to link directly from position-specific annotation to specialized protein-related databases.

A feature identifier has the format XXX_number, where XXX is a 3-letter code specific to the type of feature described, separated by an underscore from a 6- to 10-digit number.

Feature identifiers currently exist for the following topics: Propeptide, Chain, Peptide, Glycosylation, Alternative sequence and Natural variant.

Subsection | Format of the identifier | Availability | Example
Molecule processing
Propeptide | PRO_number | Any processed propeptide | Q7XAD0
Chain | PRO_number | Any mature polypeptide | Q9W568
Amino acid modifications
Glycosylation | CAR_number | Currently only for residues attached to an oligosaccharide structure annotated in the UniCarbKB database | P02771
Natural variations
Alternative sequence | VSP_number | Any sequence with an ‘Alternative sequence’ feature | P81278
Natural variant | VAR_number | Currently only for protein sequence variants of Hominidae (great apes and humans) | P11171
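
The identifier format described in this section (a 3-letter code, an underscore, then 6 to 10 digits) can be checked with a regular expression. This is a sketch under stated assumptions: the function name `parse_feature_id` and the `KNOWN_PREFIXES` mapping are hypothetical, the prefixes are taken from the table above, and the sample identifiers are made up for illustration rather than copied from real entries.

```python
import re
from typing import Optional, Tuple

# Prefixes taken from the table above; the mapping itself is an illustrative assumption.
KNOWN_PREFIXES = {
    "PRO": "Molecule processing (Propeptide, Chain, Peptide)",
    "CAR": "Glycosylation",
    "VSP": "Alternative sequence",
    "VAR": "Natural variant",
}

# Three uppercase letters, an underscore, then a 6- to 10-digit number.
_FEATURE_ID = re.compile(r"([A-Z]{3})_(\d{6,10})")


def parse_feature_id(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, number) for a well-formed identifier, else None."""
    match = _FEATURE_ID.fullmatch(identifier)
    if match is None:
        return None
    prefix, number = match.groups()
    if prefix not in KNOWN_PREFIXES:
        return None
    return prefix, number


# Made-up identifiers, used only to exercise the pattern.
for candidate in ["VAR_000001", "PRO_0000000001", "VAR_123", "ABC_000001"]:
    print(candidate, parse_feature_id(candidate))
```

The number is returned as a string so that leading zeros, which are part of the identifier, are preserved.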