Sequence annotation (Features)
Last modified June 13, 2017
Sequence annotations describe regions or sites of interest in the protein sequence, such as post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also described in this manner.
Sequence annotations (position-specific annotations) used to be found in the ‘Sequence annotation (Features)’ section in the previous version of the UniProtKB entry view. The flat file and XML formats still group all position-specific annotations together in a “feature table” (FT, <feature>). Each sequence annotation consists of a “feature key”, “from” and “to” positions, and a short description.
The current entry view displays annotation by subject (Function, PTM & processing, etc.), and the various position-specific annotations are now distributed across the relevant new sections, as listed below.
| Subsection | Content |
| Molecule processing | |
| Initiator methionine | Cleavage of the initiator methionine |
| Signal | Sequence targeting proteins to the secretory pathway or periplasmic space |
| Transit peptide | Extent of a transit peptide for organelle targeting |
| Propeptide | Part of a protein that is cleaved during maturation or activation |
| Chain | Extent of a polypeptide chain in the mature protein |
| Peptide | Extent of an active peptide in the mature protein |
| Regions | |
| Topological domain | Location of non-membrane regions of membrane-spanning proteins |
| Transmembrane | Extent of a membrane-spanning region |
| Intramembrane | Extent of a region located in a membrane without crossing it |
| Domain | Position and type of each modular protein domain |
| Repeat | Positions of repeated sequence motifs or repeated domains |
| Calcium binding | Position(s) of calcium binding region(s) within the protein |
| Zinc finger | Position(s) and type(s) of zinc fingers within the protein |
| DNA binding | Position and type of a DNA-binding domain |
| Nucleotide binding | Nucleotide phosphate binding region |
| Region | Region of interest in the sequence |
| Coiled coil | Positions of regions of coiled coil within the protein |
| Motif | Short (up to 20 amino acids) sequence motif of biological interest |
| Compositional bias | Region of compositional bias in the protein |
| Sites | |
| Active site | Amino acid(s) directly involved in the activity of an enzyme |
| Metal binding | Binding site for a metal ion |
| Binding site | Binding site for any chemical group (co-enzyme, prosthetic group, etc.) |
| Site | Any interesting single amino acid site on the sequence |
| Amino acid modifications | |
| Non-standard residue | Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence |
| Modified residue | Modified residues excluding lipids, glycans and protein cross-links |
| Lipidation | Covalently attached lipid group(s) |
| Glycosylation | Covalently attached glycan group(s) |
| Disulfide bond | Cysteine residues participating in disulfide bonds |
| Cross-link | Residues participating in covalent linkage(s) between proteins |
| Natural variations | |
| Alternative sequence | Amino acid change(s) producing alternate protein isoforms |
| Natural variant | Description of a natural variant of the protein |
| Experimental info | |
| Mutagenesis | Site which has been experimentally altered by mutagenesis |
| Sequence uncertainty | Regions of uncertainty in the sequence |
| Sequence conflict | Description of sequence discrepancies of unknown origin |
| Non-adjacent residues | Indicates that two residues in a sequence are not consecutive |
| Non-terminal residue | The sequence is incomplete. Indicates that a residue is not the terminal residue of the complete protein |
| Secondary structure | |
| Helix | Helical regions within the experimentally determined protein structure |
| Turn | Turns within the experimentally determined protein structure |
| Beta strand | Beta strand regions within the experimentally determined protein structure |
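The regrouping described above, from one flat feature table into subject-based subsections, can be sketched in a few lines. This is only an illustration: the `Feature` class, the `SECTION_BY_KEY` mapping and the `group_by_subsection` function are hypothetical names, the mapping covers just a handful of rows from the table, and the keys use the display names shown here rather than the feature keys of the flat file or XML formats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One position-specific annotation (hypothetical container)."""
    key: str            # feature key, written here as the display name
    start: int          # "from" position
    end: int            # "to" position
    description: str = ""

    @property
    def length(self) -> int:
        # Positions are inclusive, so a single-residue site has length 1.
        return self.end - self.start + 1


# Partial mapping copied from the table above; extend it as needed.
SECTION_BY_KEY = {
    "Signal": "Molecule processing",
    "Chain": "Molecule processing",
    "Transmembrane": "Regions",
    "Domain": "Regions",
    "Active site": "Sites",
    "Modified residue": "Amino acid modifications",
    "Natural variant": "Natural variations",
    "Mutagenesis": "Experimental info",
    "Helix": "Secondary structure",
}


def group_by_subsection(features):
    """Group a flat list of features by the subsection they belong to."""
    grouped = defaultdict(list)
    for feature in features:
        section = SECTION_BY_KEY.get(feature.key, "Unclassified")
        grouped[section].append(feature)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up features, for illustration only.
    table = [
        Feature("Signal", 1, 22),
        Feature("Chain", 23, 310, "Example protein"),
        Feature("Helix", 30, 41),
    ]
    for section, items in group_by_subsection(table).items():
        print(section, [f.key for f in items])
```

Keys that are missing from the mapping fall into an "Unclassified" bucket instead of raising an error, which keeps the sketch usable while the mapping is incomplete.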
The exact boundaries of the described sequence feature, as well as its length, are provided. When a feature is known to extend beyond the position given in this section, the endpoint is preceded by ‘<’ (less than) for features that continue in the N-terminal direction, or by ‘>’ (greater than) for features that continue in the C-terminal direction.
Example: P62756
Unknown endpoints are denoted by a question mark ‘?’.
Example: P78586
Uncertain endpoints are denoted by a question mark ‘?’ before the position, e.g. ‘?42’.
Example: Q3ZC31
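The endpoint notation above is compact enough to parse with a single regular expression. The following is a minimal sketch, assuming each endpoint arrives as a standalone string such as `42`, `<1`, `>120`, `?` or `?42`. The `Endpoint` class, the status labels and the `parse_endpoint` function are hypothetical names chosen for this example, not part of any UniProt tool.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    position: Optional[int]  # None when the endpoint is unknown ('?')
    status: str              # one of the labels in _STATUS_BY_PREFIX, or "unknown"


# Optional single modifier character followed by optional digits.
_ENDPOINT_RE = re.compile(r"([<>?]?)(\d*)")

_STATUS_BY_PREFIX = {
    "": "exact",
    "<": "extends_n_terminal",   # feature continues towards the N-terminus
    ">": "extends_c_terminal",   # feature continues towards the C-terminus
    "?": "uncertain",            # position given, but not certain
}


def parse_endpoint(text: str) -> Endpoint:
    """Parse one endpoint string into a position and a status label."""
    text = text.strip()
    match = _ENDPOINT_RE.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"not a valid endpoint: {text!r}")
    prefix, digits = match.groups()
    if not digits:
        # A bare '?' means the endpoint is unknown; bare '<' or '>' is invalid.
        if prefix == "?":
            return Endpoint(None, "unknown")
        raise ValueError(f"missing position in endpoint: {text!r}")
    return Endpoint(int(digits), _STATUS_BY_PREFIX[prefix])


if __name__ == "__main__":
    for raw in ("42", "<1", ">120", "?", "?42"):
        print(raw, parse_endpoint(raw))
```

Keeping the status separate from the numeric position means downstream code can still sort or compare positions while deciding for itself how to treat uncertain or open-ended boundaries.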
Feature identifiers
Some features are associated with a unique and stable feature identifier, which makes it possible to construct links directly from position-specific annotation to specialized protein-related databases.
A feature identifier has the format XXX_number, where XXX is a 3-letter code specific to the type of feature described, separated by an underscore from a 6- to 10-digit number.
Feature identifiers currently exist for the following topics: Propeptide, Chain, Peptide, Glycosylation, Alternative sequence and Natural variant.
| Subsection | Format of the identifier | Availability | Example |
| Molecule processing | |||
| Propeptide | PRO_number | Any processed propeptide | Q7XAD0 |
| Chain, Peptide | PRO_number | Any mature polypeptide | Q9W568, P15515 |
| Amino acid modifications | |||
| Glycosylation | CAR_number | Currently only for residues attached to an oligosaccharide structure annotated in the UniCarbKB database | P02771 |
| Natural variations | |||
| Alternative sequence | VSP_number | Any sequence with an ‘Alternative sequence’ feature | P81278 |
| Natural variant | VAR_number | Currently only for protein sequence variants of Hominidae (great apes and humans) | P11171 |
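The identifier format described in this section (a 3-letter code, an underscore, then 6 to 10 digits) lends itself to a simple validity check. The sketch below is an assumption-laden illustration: `FEATURE_ID_PREFIXES` and `parse_feature_id` are hypothetical names, the prefix list is copied from the table above, and the identifiers in the usage lines are made-up strings that were not looked up in UniProtKB.

```python
import re

# Prefixes and the subsections they apply to, as listed in the table above.
FEATURE_ID_PREFIXES = {
    "PRO": ("Propeptide", "Chain", "Peptide"),
    "CAR": ("Glycosylation",),
    "VSP": ("Alternative sequence",),
    "VAR": ("Natural variant",),
}

# Three uppercase letters, an underscore, then 6 to 10 digits.
_FEATURE_ID_RE = re.compile(r"([A-Z]{3})_(\d{6,10})")


def parse_feature_id(identifier: str):
    """Split a feature identifier into (prefix, number) or raise ValueError."""
    match = _FEATURE_ID_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"malformed feature identifier: {identifier!r}")
    prefix, number = match.groups()
    if prefix not in FEATURE_ID_PREFIXES:
        raise ValueError(f"unknown feature identifier prefix: {prefix!r}")
    # The number is kept as a string so that leading zeros are preserved.
    return prefix, number


if __name__ == "__main__":
    # Illustrative strings only, not real identifiers.
    for candidate in ("VAR_000001", "PRO_0000000001", "XYZ_000001", "VAR_12"):
        try:
            prefix, number = parse_feature_id(candidate)
            print(candidate, "->", prefix, FEATURE_ID_PREFIXES[prefix])
        except ValueError as error:
            print(candidate, "->", error)
```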
