Sequence annotation (Features)
Last modified September 9, 2013
This section provides a precise but simple means of annotating sequence data.
It describes regions or sites of interest in the protein sequence. In general, this section lists post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also included in this section.
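As a minimal illustration of how a feature marks either a region or a single site, the sketch below models a feature as a type plus start and end positions. It assumes 1-based positions that are inclusive at both ends, as in UniProtKB entries. The `Feature` class, the helper functions and the toy sequence are hypothetical and are not part of any UniProt API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record with 1-based, inclusive positions."""
    type: str
    start: int
    end: int
    description: str = ""

    def length(self) -> int:
        # Both ends are inclusive, so a single-residue site has length 1.
        return self.end - self.start + 1


def feature_sequence(seq: str, feature: Feature) -> str:
    """Return the residues covered by a feature (1-based inclusive -> Python slice)."""
    return seq[feature.start - 1 : feature.end]


def features_at(features: list[Feature], position: int) -> list[Feature]:
    """Return every feature whose span includes the given 1-based position."""
    return [f for f in features if f.start <= position <= f.end]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # toy 22-residue sequence, not a real entry
    feats = [
        Feature("Initiator methionine", 1, 1, "Removed"),
        Feature("Chain", 2, 22, "Hypothetical mature protein"),
        Feature("Modified residue", 3, 3, "Phosphothreonine"),
    ]
    print(feature_sequence(seq, feats[1]), feats[1].length())
    print([f.type for f in features_at(feats, 3)])
```

Converting to a Python slice only requires subtracting 1 from the start; the inclusive end already matches Python's exclusive slice end.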
|Subsection||Description|
|Initiator methionine||Cleavage of the initiator methionine|
|Signal||Sequence targeting proteins to the secretory pathway or periplasmic space|
|Transit peptide||Extent of a transit peptide for organelle targeting|
|Propeptide||Part of a protein that is cleaved during maturation or activation|
|Chain||Extent of a polypeptide chain in the mature protein|
|Peptide||Extent of an active peptide in the mature protein|
|Topological domain||Location of non-membrane regions of membrane-spanning proteins|
|Transmembrane||Extent of a membrane-spanning region|
|Intramembrane||Extent of a region located in a membrane without crossing it|
|Domain||Position and type of each modular protein domain|
|Repeat||Positions of repeated sequence motifs or repeated domains|
|Calcium binding||Position(s) of calcium binding region(s) within the protein|
|Zinc finger||Position(s) and type(s) of zinc fingers within the protein|
|DNA binding||Position and type of a DNA-binding domain|
|Nucleotide binding||Nucleotide phosphate binding region|
|Region||Region of interest in the sequence|
|Coiled coil||Positions of regions of coiled coil within the protein|
|Motif||Short (up to 20 amino acids) sequence motif of biological interest|
|Compositional bias||Region of compositional bias in the protein|
|Active site||Amino acid(s) directly involved in the activity of an enzyme|
|Metal binding||Binding site for a metal ion|
|Binding site||Binding site for any chemical group (co-enzyme, prosthetic group, etc.)|
|Site||Any interesting single amino acid site on the sequence|
|Amino acid modifications|
|Non-standard residue||Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence|
|Modified residue||Modified residues excluding lipids, glycans and protein cross-links|
|Lipidation||Covalently attached lipid group(s)|
|Glycosylation||Covalently attached glycan group(s)|
|Disulfide bond||Cysteine residues participating in disulfide bonds|
|Cross-link||Residues participating in covalent linkage(s) between proteins|
|Alternative sequence||Amino acid change(s) producing alternate protein isoforms|
|Natural variant||Description of a natural variant of the protein|
|Mutagenesis||Site which has been experimentally altered by mutagenesis|
|Sequence uncertainty||Regions of uncertainty in the sequence|
|Sequence conflict||Description of sequence discrepancies of unknown origin|
|Non-adjacent residues||Indicates that two residues in a sequence are not consecutive|
|Non-terminal residue||Indicates that a residue is not the terminal residue of the complete protein, i.e. the sequence is incomplete|
|Helix||Helical regions within the experimentally determined protein structure|
|Turn||Turns within the experimentally determined protein structure|
|Beta strand||Beta strand regions within the experimentally determined protein structure|
The exact boundaries of each sequence feature are given, along with its length. When a feature is known to extend beyond the given position, the endpoint is preceded by ‘<’ if the feature continues in the N-terminal direction or by ‘>’ if it continues in the C-terminal direction.
Unknown endpoints are denoted by ‘?’.
Uncertain endpoints are denoted by ‘?’ placed before the position, e.g. ‘?42’.
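The endpoint notation above can be parsed mechanically. The following is a sketch assuming each endpoint arrives as a string such as `15`, `<1`, `>120`, `?` or `?42`; `Endpoint`, `parse_endpoint` and `feature_length` are hypothetical names. It treats a length as defined only when both endpoints are exact, because ‘<’ and ‘>’ make the computed value a lower bound and ‘?’ makes it unknown or approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    position: Optional[int]  # None when the endpoint is unknown ('?')
    qualifier: str           # '' exact, '<' continues N-terminally, '>' continues C-terminally, '?' uncertain/unknown


def parse_endpoint(text: str) -> Endpoint:
    """Parse one endpoint string such as '15', '<1', '>120', '?' or '?42'."""
    text = text.strip()
    if text == "?":
        return Endpoint(None, "?")  # unknown endpoint
    if text[:1] in ("<", ">", "?"):
        # '<n' / '>n': feature extends beyond n; '?n': uncertain position n
        return Endpoint(int(text[1:]), text[0])
    return Endpoint(int(text), "")  # exact position


def feature_length(start: str, end: str) -> Optional[int]:
    """Length of a feature, or None unless both endpoints are exact positions."""
    s, e = parse_endpoint(start), parse_endpoint(end)
    if s.qualifier or e.qualifier or s.position is None or e.position is None:
        return None
    return e.position - s.position + 1
```

For example, `feature_length("10", "25")` gives 16, while `feature_length("<1", "25")` gives `None`. A caller that wants a lower bound for ‘<’/‘>’ endpoints could relax the check; this sketch stays conservative.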
Some features are associated with a unique and stable feature identifier, which allows links to be constructed directly from position-specific annotation in the ‘Sequence annotation (Features)’ section to specialized protein-related databases.
Feature identifiers have the format XXX_number, where XXX is a 3-letter code specific to the type of feature described, followed by an underscore and a 6- to 10-digit number.
Feature identifiers currently exist for the following topics: Propeptide, Chain, Peptide, Glycosylation, Alternative sequence and Natural variant.
|Subsection||Format of the identifier||Availability||Example|
|Propeptide||PRO_number||Any processed propeptide||Q7XAD0|
|Chain, Peptide||PRO_number||Any mature polypeptide||Q9W568|
|Amino acid modifications|
|Glycosylation||CAR_number||Currently only for residues attached to an oligosaccharide structure annotated in the GlycoSuiteDB database||P02771|
|Alternative sequence||VSP_number||Any sequence with an ‘Alternative sequence’ feature||P81278|
|Natural variant||VAR_number||Currently only for protein sequence variants of Hominidae (great apes and humans)||P11171|
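A feature identifier can be checked against the XXX_number format and mapped back to its subsection with a regular expression. This is a sketch assuming identifiers are written in upper case exactly as described above; `feature_id_subsections` is a hypothetical helper, and the identifiers used below are made-up strings that only illustrate the format. Because PRO is shared by Propeptide, Chain and Peptide, the helper returns a tuple of candidate subsections rather than a single name.

```python
import re
from typing import Optional

# Prefix-to-subsection mapping taken from the identifier list and table above.
FEATURE_ID_PREFIXES = {
    "PRO": ("Propeptide", "Chain", "Peptide"),
    "CAR": ("Glycosylation",),
    "VSP": ("Alternative sequence",),
    "VAR": ("Natural variant",),
}

# XXX_number: a 3-letter code, an underscore, then a 6- to 10-digit number.
FEATURE_ID_RE = re.compile(r"([A-Z]{3})_(\d{6,10})")


def feature_id_subsections(feature_id: str) -> Optional[tuple[str, ...]]:
    """Return the subsections an identifier may belong to, or None if it is not recognized."""
    m = FEATURE_ID_RE.fullmatch(feature_id)
    if m is None:
        return None  # wrong shape, e.g. too few digits or lower-case prefix
    return FEATURE_ID_PREFIXES.get(m.group(1))  # None for an unknown prefix
```

For instance, `feature_id_subsections("VAR_012345")` returns `("Natural variant",)`, while `VAR_12345` (only five digits) returns `None`.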