Sequence annotation (Features)
Last modified December 15, 2008
This section provides a precise but simple means of annotating sequence data.
It describes regions or sites of interest in the protein sequence. In general, this section lists post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also included in this section.
| Subsection | Content |
| Molecule processing | |
| Initiator methionine | Cleavage of the initiator methionine |
| Signal | Sequence targeting proteins to the secretory pathway or periplasmic space |
| Transit peptide | Extent of a transit peptide for organelle targeting |
| Propeptide | Part of a protein that is cleaved during maturation or activation |
| Chain | Extent of a polypeptide chain in the mature protein |
| Peptide | Extent of an active peptide in the mature protein |
| Regions | |
| Topological domain | Location of non-membrane regions of membrane-spanning proteins |
| Transmembrane | Extent of a membrane-spanning region |
| Domain | Position and type of each modular protein domain |
| Repeat | Positions of repeated sequence motifs or repeated domains |
| Calcium binding | Position(s) of calcium binding region(s) within the protein |
| Zinc finger | Position(s) and type(s) of zinc fingers within the protein |
| DNA binding | Position and type of a DNA-binding domain |
| Nucleotide binding | Nucleotide phosphate binding region |
| Region | Region of interest in the sequence |
| Coiled coil | Positions of regions of coiled coil within the protein |
| Motif | Short (up to 20 amino acids) sequence motif of biological interest |
| Compositional bias | Region of compositional bias in the protein |
| Sites | |
| Active site | Amino acid(s) directly involved in the activity of an enzyme |
| Metal binding | Binding site for a metal ion |
| Binding site | Binding site for any chemical group (co-enzyme, prosthetic group, etc.) |
| Site | Any interesting single amino acid site on the sequence |
| Amino acid modifications | |
| Non-standard residue | Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence |
| Modified residue | Modified residues excluding lipids, glycans and protein cross-links |
| Lipidation | Covalently attached lipid group(s) |
| Glycosylation | Covalently attached glycan group(s) |
| Disulfide bond | Cysteine residues participating in disulfide bonds |
| Cross-link | Residues participating in covalent linkage(s) between proteins |
| Natural variations | |
| Alternative sequence | Amino acid change(s) producing alternate protein isoforms |
| Natural variant | Description of a natural variant of the protein |
| Experimental info | |
| Mutagenesis | Site which has been experimentally altered by mutagenesis |
| Sequence uncertainty | Regions of uncertainty in the sequence |
| Sequence conflict | Description of sequence discrepancies of unknown origin |
| Non-adjacent residues | Indicates that two residues in a sequence are not consecutive |
| Non-terminal residue | Indicates that the sequence is incomplete: the residue is not the terminal residue of the complete protein |
| Secondary structure | |
| Helix | Helical regions within the experimentally determined protein structure |
| Turn | Turns within the experimentally determined protein structure |
| Beta strand | Beta strand regions within the experimentally determined protein structure |
The exact boundaries of each described sequence feature, as well as its length, are provided. When a feature is known to extend beyond the position given in this section, the endpoint is preceded by ‘<’ for features that continue in the N-terminal direction or by ‘>’ for features that continue in the C-terminal direction.
Example: P62756
Unknown endpoints are denoted by ‘?’.
Example: P78586
Uncertain endpoints are denoted by a ‘?’ before the position, e.g. ‘?42’.
Example: Q3ZC31
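The endpoint notation described above can be read mechanically. The following is a minimal Python sketch, not part of any UniProt tool: the names `Endpoint` and `parse_endpoint` and the status labels are hypothetical, and the sketch assumes that an endpoint is written as an optional ‘<’, ‘>’ or ‘?’ marker followed by a position number, or as ‘?’ alone.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """One feature endpoint (hypothetical helper type, not a UniProt API)."""
    position: Optional[int]  # None when the endpoint is unknown ('?')
    status: str              # one of the labels in _STATUS, or 'unknown'


# Marker characters described in the text, mapped to illustrative labels.
_STATUS = {
    "": "exact",
    "<": "extends_n_terminal",  # feature continues towards the N-terminus
    ">": "extends_c_terminal",  # feature continues towards the C-terminus
    "?": "uncertain",           # position given, but not certain
}

# Optional single marker followed by optional digits.
_ENDPOINT = re.compile(r"([<>?]?)(\d*)")


def parse_endpoint(text: str) -> Endpoint:
    """Parse an endpoint such as '17', '<1', '>120', '?42' or '?'."""
    match = _ENDPOINT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid endpoint: {text!r}")
    marker, digits = match.groups()
    if not digits:
        # A bare '?' means the endpoint is unknown; anything else is invalid.
        if marker == "?":
            return Endpoint(None, "unknown")
        raise ValueError(f"endpoint has no position: {text!r}")
    return Endpoint(int(digits), _STATUS[marker])
```

Under these assumptions, `parse_endpoint("?42")` yields an endpoint with position 42 and status `uncertain`, while `parse_endpoint("?")` yields an endpoint with no position and status `unknown`.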
Feature identifiers
Some features are associated with a unique and stable feature identifier, which allows links to be constructed directly from position-specific annotation in the ‘Sequence annotation (Features)’ section to specialized protein-related databases.
The format of a feature identifier is XXX_number, where XXX is a 3-letter code specific to the type of feature described, separated by an underscore from a 6- to 10-digit number.
Feature identifiers currently exist for the following topics: Propeptide, Chain, Peptide, Glycosylation, Alternative sequence and Natural variant.
| Subsection | Format of the identifier | Availability | Example |
| Molecule processing | | | |
| Propeptide | PRO_number | Any processed propeptide | Q7XAD0 |
| Chain, Peptide | PRO_number | Any mature polypeptide | Q9W568, P15515 |
| Amino acid modifications | | | |
| Glycosylation | CAR_number | Currently only for residues attached to an oligosaccharide structure annotated in the GlycoSuiteDB database | P02771 |
| Natural variations | | | |
| Alternative sequence | VSP_number | Any sequence with an ‘Alternative sequence’ feature | P81278 |
| Natural variant | VAR_number | Currently only for protein sequence variants of Hominidae (great apes and humans) | P11171 |
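The identifier format and the prefixes listed in the table can be checked with a short pattern match. The following Python sketch is illustrative only: the function name `classify_feature_id` is hypothetical, the prefix table simply restates the rows above, and the identifiers used in the usage note are made-up values chosen to fit the XXX_number format, not real database entries.

```python
import re
from typing import Optional

# 3-letter codes from the table above, mapped to the subsections they cover.
FEATURE_ID_PREFIXES = {
    "PRO": "Propeptide, Chain or Peptide",
    "CAR": "Glycosylation",
    "VSP": "Alternative sequence",
    "VAR": "Natural variant",
}

# Three uppercase letters, an underscore, then a 6- to 10-digit number.
_FEATURE_ID = re.compile(r"([A-Z]{3})_(\d{6,10})")


def classify_feature_id(identifier: str) -> Optional[str]:
    """Return the subsection for a well-formed feature identifier.

    Returns None when the identifier does not follow the XXX_number
    format or uses a 3-letter code that is not listed in the table.
    """
    match = _FEATURE_ID.fullmatch(identifier)
    if match is None:
        return None
    return FEATURE_ID_PREFIXES.get(match.group(1))
```

With this sketch, a made-up identifier such as `VAR_000001` is classified as a natural variant, whereas `VAR_12345` is rejected because its number has fewer than six digits.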



