Last modified December 6, 2011
This subsection of the ‘Sequence annotation (Features)’ section indicates the positions and types of repeated sequence motifs or repeated domains within the protein.
Repeats range from short amino acid repetitions, such as the polyglutamine tracts of huntingtin (the product of the Huntington disease gene), to large repetitions containing multiple domains, such as those in the cytoskeletal protein titin. One likely reason for their evolutionary success is that repeat-containing proteins are relatively cheap to evolve: large, thermodynamically stable proteins can arise by the simple expedient of intragenic duplication rather than through the more complex de novo creation of alpha-helices and beta-sheets.
1. Annotation of specific repeated sequence motifs
When no standard name exists, the ‘Region’ subsection of the ‘Sequence annotation (Features)’ section is used to describe the region containing all the repeats and, when possible, the pattern of the repeat. The ‘Repeat’ subsection is then used simply to specify the number and position of each such repeat.
Conventions: We describe repeats as ‘half-length’ or ‘truncated’ when appropriate, and as ‘approximate’, ‘degenerate’ or ‘atypical’ when they deviate significantly from the consensus sequence. When the repeats are extremely abundant, we omit the position of each individual repeat.
Examples: P15497, P38479, P15305
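The split between the ‘Region’ and ‘Repeat’ subsections can be pictured as a small data structure: one Region feature spanning all copies, plus one numbered Repeat feature per copy. The sketch below is purely illustrative and assumes a simplified feature record; the `Feature` class, the function `annotate_unnamed_repeats`, the description strings (modelled loosely on UniProtKB-style text such as "3 X 20 AA repeats") and the `max_individual` cutoff are all hypothetical, since this section does not state a numeric limit for "extremely abundant".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """Hypothetical simplified feature: type, 1-based inclusive span, description."""
    kind: str
    start: int
    end: int
    description: str


def annotate_unnamed_repeats(
    spans: list[tuple[int, int]],
    pattern: Optional[str] = None,
    qualifiers: Optional[dict[int, str]] = None,
    max_individual: int = 50,  # hypothetical cutoff for "extremely abundant"
) -> list[Feature]:
    """Return one Region feature covering all repeats, followed by one
    numbered Repeat feature per copy (unless there are too many copies)."""
    if not spans:
        return []
    qualifiers = qualifiers or {}
    spans = sorted(spans)
    region_start = spans[0][0]
    region_end = max(end for _, end in spans)
    # State the repeat length only when every copy has the same length.
    lengths = {end - start + 1 for start, end in spans}
    size = f"{next(iter(lengths))} AA " if len(lengths) == 1 else ""
    desc = f"{len(spans)} X {size}repeats"
    if pattern:  # include the repeat pattern only when one is known
        desc += f" of {pattern}"
    features = [Feature("Region", region_start, region_end, desc)]
    if len(spans) > max_individual:
        return features  # too many copies: keep only the Region feature
    for number, (start, end) in enumerate(spans, start=1):
        label = str(number)
        if number in qualifiers:  # e.g. "truncated", "approximate"
            label += f"; {qualifiers[number]}"
        features.append(Feature("Repeat", start, end, label))
    return features
```

For example, `annotate_unnamed_repeats([(10, 29), (30, 49), (50, 64)], pattern="P-X-X-P", qualifiers={3: "truncated"})` yields a Region feature from 10 to 64 and three Repeat features labelled "1", "2" and "3; truncated". The Region text carries the pattern once, so each Repeat feature only needs its number, position and any qualifier.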
2. Annotation of predicted repeats
A large number of repeated domains have been modelled by InterPro and by the REP program of Andrade and Bork, and repeats predicted with these resources are annotated in UniProtKB. When using REP, the e-value thresholds for reporting matches are initially set to their most conservative values, but may be relaxed so that consistent numbers of repeats are annotated in orthologous proteins. The thresholds may also be adjusted so that the predicted number of repeats is consistent with the three-dimensional structure the repeats may adopt. For instance, Kelch repeats form a propeller-like structure containing 5-7 tandem repeats.

Armadillo (Arm) and HEAT repeats are very similar, and to date every known protein possesses only one of these two repeat types. Therefore, if REP detects both repeat types in a single protein, all repeats are annotated as whichever of the two types REP reports more often.
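The two curation rules above can be sketched as small functions. This is a hypothetical illustration, not REP's interface or the curators' actual tooling: it assumes REP hits have already been parsed into plain Python values, and the names `choose_evalue_cutoff` and `resolve_arm_heat`, the candidate cutoffs and the tie-breaking behaviour are all assumptions. The first function relaxes the cutoff step by step until all orthologs report the same, structurally plausible number of repeats; the second relabels mixed Arm/HEAT predictions as the more frequent type.

```python
from collections import Counter

ARM_HEAT = {"ARM", "HEAT"}


def choose_evalue_cutoff(orthologs, cutoffs, expected):
    """Hypothetical sketch of threshold relaxation.

    orthologs: one list of hit e-values per orthologous protein.
    cutoffs:   candidate e-value cutoffs (smaller = more conservative).
    expected:  (min, max) plausible copy number, e.g. (5, 7) for a
               Kelch propeller.
    """
    low, high = expected
    for cutoff in sorted(cutoffs):  # most conservative cutoff first
        counts = [sum(e <= cutoff for e in evalues) for evalues in orthologs]
        # Accept the first cutoff giving equal, plausible counts in all orthologs.
        if len(set(counts)) == 1 and low <= counts[0] <= high:
            return cutoff
    return min(cutoffs)  # otherwise keep the most conservative cutoff


def resolve_arm_heat(hits):
    """Hypothetical sketch: hits are (repeat_type, start, end) tuples.

    If both ARM and HEAT are predicted, relabel all of them as the type
    reported more often. Ties go to the type seen first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    counts = Counter(kind for kind, _, _ in hits if kind in ARM_HEAT)
    if len(counts) < 2:
        return list(hits)  # only one of the two types present: no change
    winner = counts.most_common(1)[0][0]
    return [(winner if kind in ARM_HEAT else kind, start, end)
            for kind, start, end in hits]
```

With two orthologs whose hit e-values are `[1e-10, 1e-8, 1e-6, 1e-4, 1e-3, 0.5]` and `[1e-9, 1e-7, 1e-5, 1e-4, 1e-2]`, candidate cutoffs `[1e-6, 1e-3, 1e-2]` and a Kelch-like range of (5, 7), the stricter cutoffs give unequal counts (3 vs 2, then 5 vs 4), so the sketch relaxes to `1e-2`, where both proteins have five repeats.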
See also: Non-experimental qualifiers