Sequence annotation (Features)
Last modified April 15, 2016
Sequence annotations describe regions or sites of interest in the protein sequence, such as post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also described in this manner.
Sequence annotations (position-specific annotations) used to be found in the 'Sequence annotation (Features)' section in the previous version of the UniProtKB entry view. The flat file and XML formats still group all position-specific annotation together in a "feature table" (FT, <feature>). Each sequence annotation consists of a "feature key", "from" and "to" positions as well as a short description.
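As an illustration of the four parts listed above, a feature can be modelled as a small record. This is only a sketch: the class name `SequenceFeature`, its field names and the example values are hypothetical and are not taken from any UniProt format or library. It also assumes that positions are 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceFeature:
    """Hypothetical container for one position-specific annotation."""

    key: str               # feature key, e.g. "Chain" or "Transmembrane"
    start: int             # "from" position (assumed 1-based, inclusive)
    end: int               # "to" position (assumed 1-based, inclusive)
    description: str = ""  # short free-text description

    @property
    def length(self) -> int:
        # Number of residues covered, counting both endpoints.
        return self.end - self.start + 1


# Made-up example values, for illustration only.
feature = SequenceFeature("Transmembrane", 23, 43, "Helical")
print(feature.key, feature.start, feature.end, feature.length)
```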
The current entry view displays annotation by subject (Function, PTM & processing, etc.), and the various position-specific annotations are now distributed across the relevant new sections.
|Initiator methionine||Cleavage of the initiator methionine|
|Signal||Sequence targeting proteins to the secretory pathway or periplasmic space|
|Transit peptide||Extent of a transit peptide for organelle targeting|
|Propeptide||Part of a protein that is cleaved during maturation or activation|
|Chain||Extent of a polypeptide chain in the mature protein|
|Peptide||Extent of an active peptide in the mature protein|
|Topological domain||Location of non-membrane regions of membrane-spanning proteins|
|Transmembrane||Extent of a membrane-spanning region|
|Intramembrane||Extent of a region located in a membrane without crossing it|
|Domain||Position and type of each modular protein domain|
|Repeat||Positions of repeated sequence motifs or repeated domains|
|Calcium binding||Position(s) of calcium binding region(s) within the protein|
|Zinc finger||Position(s) and type(s) of zinc fingers within the protein|
|DNA binding||Position and type of a DNA-binding domain|
|Nucleotide binding||Nucleotide phosphate binding region|
|Region||Region of interest in the sequence|
|Coiled coil||Positions of regions of coiled coil within the protein|
|Motif||Short (up to 20 amino acids) sequence motif of biological interest|
|Compositional bias||Region of compositional bias in the protein|
|Active site||Amino acid(s) directly involved in the activity of an enzyme|
|Metal binding||Binding site for a metal ion|
|Binding site||Binding site for any chemical group (co-enzyme, prosthetic group, etc.)|
|Site||Any interesting single amino acid site on the sequence|
|Amino acid modifications|
|Non-standard residue||Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence|
|Modified residue||Modified residues excluding lipids, glycans and protein cross-links|
|Lipidation||Covalently attached lipid group(s)|
|Glycosylation||Covalently attached glycan group(s)|
|Disulfide bond||Cysteine residues participating in disulfide bonds|
|Cross-link||Residues participating in covalent linkage(s) between proteins|
|Alternative sequence||Amino acid change(s) producing alternate protein isoforms|
|Natural variant||Description of a natural variant of the protein|
|Mutagenesis||Site which has been experimentally altered by mutagenesis|
|Sequence uncertainty||Regions of uncertainty in the sequence|
|Sequence conflict||Description of sequence discrepancies of unknown origin|
|Non-adjacent residues||Indicates that two residues in a sequence are not consecutive|
|Non-terminal residue||The sequence is incomplete: indicates that a residue is not the terminal residue of the complete protein|
|Helix||Helical regions within the experimentally determined protein structure|
|Turn||Turns within the experimentally determined protein structure|
|Beta strand||Beta strand regions within the experimentally determined protein structure|
The exact boundaries of the described sequence feature, as well as its length, are provided. When a feature is known to extend beyond the position given in this section, the endpoint is preceded by ‘<’ for features that continue in the N-terminal direction, or by ‘>’ for features that continue in the C-terminal direction.
Unknown endpoints are denoted by ‘?’.
Uncertain endpoints are denoted by a ‘?’ before the position, e.g. ‘?42’.
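The endpoint notation described above can be decoded with a few lines of code. The following is a minimal sketch, not an official parser: the names `Endpoint` and `parse_endpoint` are hypothetical, and it assumes each endpoint arrives as a bare string such as `<1`, `>100`, `?42`, `?` or `128`.

```python
import re
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    position: Optional[int]  # None when the endpoint is unknown ("?")
    uncertain: bool          # True for an uncertain endpoint such as "?42"
    extends_beyond: bool     # True when prefixed with "<" or ">"


# Optional single modifier (<, > or ?) followed by optional digits.
_ENDPOINT = re.compile(r"([<>?]?)(\d*)")


def parse_endpoint(text: str) -> Endpoint:
    """Decode one endpoint string (hypothetical helper)."""
    match = _ENDPOINT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised endpoint: {text!r}")
    modifier, digits = match.groups()
    if not digits:
        # Only a bare "?" is meaningful without a position.
        if modifier != "?":
            raise ValueError(f"unrecognised endpoint: {text!r}")
        return Endpoint(None, False, False)
    return Endpoint(int(digits), modifier == "?", modifier in ("<", ">"))


for raw in ("<1", ">100", "?42", "?", "128"):
    print(raw, parse_endpoint(raw))
```

Keeping "unknown" (no position at all) separate from "uncertain" (a position with a `?` prefix) mirrors the distinction made in the text.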
Some features are associated with a unique and stable feature identifier, which makes it possible to construct links directly from position-specific annotation to specialized protein-related databases.
The format of a feature identifier is XXX_number, where XXX is a 3-letter code specific to the feature described, separated by an underscore from a 6- to 10-digit number.
Feature identifiers currently exist for the following topics: Propeptide, Chain, Peptide, Glycosylation, Alternative sequence and Natural variant.
|Subsection||Format of the identifier||Availability||Example|
|Propeptide||PRO_number||Any processed propeptide||Q7XAD0|
|Chain, Peptide||PRO_number||Any mature polypeptide||Q9W568|
|Amino acid modifications|
|Glycosylation||CAR_number||Currently only for residues attached to an oligosaccharide structure annotated in the GlycoSuiteDB database||P02771|
|Alternative sequence||VSP_number||Any sequence with an ‘Alternative sequence’ feature||P81278|
|Natural variant||VAR_number||Currently only for protein sequence variants of Hominidae (great apes and humans)||P11171|
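The identifier format described above (a 3-letter code, an underscore, then a 6- to 10-digit number) lends itself to a simple pattern check. The sketch below uses the four prefixes from the table; the function name `split_feature_id` is hypothetical, and the identifiers in the usage lines are made up for illustration rather than taken from real entries.

```python
import re
from typing import Optional, Tuple

# Prefixes from the table above; the number part is 6 to 10 digits.
_FEATURE_ID = re.compile(r"(PRO|CAR|VSP|VAR)_(\d{6,10})")


def split_feature_id(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, number) for a well-formed identifier, else None.

    The number is kept as a string so that leading zeros are preserved.
    """
    match = _FEATURE_ID.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up identifiers, for illustration only.
print(split_feature_id("VAR_000001"))  # well-formed
print(split_feature_id("VAR_123"))     # too few digits
```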