Last modified March 23, 2009
This subsection of the ‘Sequence annotation (Features)’ section describes natural variant(s) of the protein sequence.
We annotate natural variants, including polymorphisms, variations between strains, isolates or cultivars, disease-associated mutations and RNA editing events. We report the nature of the amino acid change, the name of the variant (or allele), when available, and the effect(s) of the variation on the protein, the cell or the complete organism.
Note that mutations that induce major changes in the protein sequence, such as frameshifts or premature stops, are not annotated, as their deleterious effects on protein function are often obvious. Although these mutations are not described in this subsection, their phenotype, if known, is reported in the ‘Polymorphism’ or ‘Involvement in disease’ subsection of the ‘General annotation (Comments)’ section.
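The distinction drawn above can be illustrated with a short sketch. This is not UniProt's annotation pipeline. It assumes a coding DNA sequence over A/C/G/T that starts at the first codon and uses the standard genetic code, and the helper name `classify_change` is hypothetical. The sketch labels a change as a frameshift, a premature stop, a silent change or an amino acid change. Only the last kind would be a candidate for a ‘Natural variant’ feature. In-frame insertions and deletions are not treated separately here.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate the complete codons of a coding DNA sequence."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify_change(reference_cds: str, variant_cds: str) -> str:
    """Hypothetical helper: classify a DNA change by its effect on the protein."""
    # A length change that is not a multiple of 3 shifts the reading frame.
    if (len(variant_cds) - len(reference_cds)) % 3 != 0:
        return "frameshift"
    reference_protein = translate(reference_cds).rstrip("*")
    variant_protein = translate(variant_cds)
    # A stop codon before the end of the reference protein truncates it.
    stop = variant_protein.find("*")
    if stop != -1 and stop < len(reference_protein):
        return "premature stop"
    if variant_protein.rstrip("*") == reference_protein:
        return "silent"
    return "amino acid change"


reference = "ATGCGTTGGTAA"  # made-up sequence encoding M-R-W-stop
print(classify_change(reference, "ATGTGTTGGTAA"))   # amino acid change (Arg -> Cys)
print(classify_change(reference, "ATGCGTTGATAA"))   # premature stop (TGG -> TGA)
print(classify_change(reference, "ATGCGATGGTAA"))   # silent (CGT -> CGA, still Arg)
print(classify_change(reference, "ATGCGTTGGGTAA"))  # frameshift (one-base insertion)
```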
1. Polymorphisms
The sequence displayed by default in the entry (also called the canonical sequence) is usually the most common polymorphic variant or the most conserved in orthologous species. All polymorphisms are described with regard to this canonical sequence.
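Because every variant is described relative to the canonical sequence, a single amino acid variant can be represented by its position, the canonical residue and the substituted residue. The Python sketch below uses a hypothetical `NaturalVariant` record with 1-based positions (it is not a UniProt data structure or API). It checks that the stated original residue matches the canonical sequence before applying the substitution. The compact label it produces ("R10W") is a common shorthand assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaturalVariant:
    """Hypothetical record for a single amino acid variant (1-based position)."""
    position: int
    original: str      # residue in the canonical sequence
    replacement: str   # residue observed in the variant
    description: str = ""

    def label(self) -> str:
        return f"{self.original}{self.position}{self.replacement}"


def apply_variant(canonical: str, v: NaturalVariant) -> str:
    """Return the variant protein sequence, checked against the canonical one."""
    index = v.position - 1
    if not 0 <= index < len(canonical):
        raise ValueError(f"position {v.position} is outside a sequence of length {len(canonical)}")
    if canonical[index] != v.original:
        raise ValueError(f"canonical residue at {v.position} is {canonical[index]}, not {v.original}")
    return canonical[:index] + v.replacement + canonical[index + 1:]


canonical = "MKTAYIAKQR"  # made-up canonical sequence
variant = NaturalVariant(position=10, original="R", replacement="W")
print(variant.label())                    # R10W
print(apply_variant(canonical, variant))  # MKTAYIAKQW
```

Validating the original residue catches descriptions that refer to a different isoform or an outdated sequence version.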
Most naturally occurring protein polymorphisms, also called single amino acid polymorphisms (SAPs), result from a single nucleotide change within a codon (a single nucleotide polymorphism, or SNP). When a polymorphism results from more than one nucleotide change, this is indicated in the ‘Description’ field.
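Whether an amino acid change can be explained by a single nucleotide change can be checked against the genetic code. The Python sketch below uses a hypothetical helper, `min_nucleotide_changes`, which returns the smallest number of base substitutions between any codon for the canonical residue and any codon for the variant residue. A result above 1 means the variant cannot arise from a single nucleotide change. When the actual codons are known, counting their differences directly is more informative. The standard genetic code is an assumption; mitochondrial and other codes differ.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode.
CODONS_BY_AA = {}
for codon, aa in CODON_TABLE.items():
    CODONS_BY_AA.setdefault(aa, []).append(codon)


def min_nucleotide_changes(original_aa: str, variant_aa: str) -> int:
    """Fewest base substitutions turning a codon for original_aa into one for variant_aa."""
    return min(
        sum(a != b for a, b in zip(codon1, codon2))
        for codon1 in CODONS_BY_AA[original_aa]
        for codon2 in CODONS_BY_AA[variant_aa]
    )


print(min_nucleotide_changes("R", "W"))  # 1 (CGG -> TGG)
print(min_nucleotide_changes("W", "F"))  # 2 (TGG -> TTT or TTC)
```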
Additional information, such as the cell type or tissue in which the variant was found, or the distribution or frequency of the allele in a given population, is given in the ‘Description’ field when available.
Examples: P30154, O14896
When several natural variants exist for a single position, they are annotated in distinct ‘Natural variant’ subsections.
Note that polymorphisms are, by definition, naturally occurring variants. Mutations resulting from (large-scale) mutagenesis screens of genetically tractable organisms, such as yeast or fly, are not considered natural variants; they are described in the ‘Mutagenesis’ subsection of the ‘Sequence annotation (Features)’ section.
Related keyword: Polymorphism
2. Disease-associated mutations
Information on disease-associated mutations is mostly restricted to human proteins. We describe the amino acid change, the abbreviation of the associated disease and the effect(s) of the variation on the protein, the cell and/or the organism, if known. Additional information about the disease itself is provided in the ‘Involvement in disease’ subsection of the ‘General annotation (Comments)’ section.
Examples: O43593, P26439
Validated human disease-associated polymorphisms from the NCBI dbSNP database are annotated and tagged with their dbSNP identifier in the ‘Description’ field. Such dbSNP-tagged SAPs are relatively rare, because many disease-associated mutations occur at frequencies too low to be reported in dbSNP.
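When processing annotations programmatically, dbSNP-tagged variants can be picked out of the ‘Description’ text. The sketch below assumes the tag appears as `dbSNP:` followed by a reference SNP identifier (`rs` plus digits). That exact format, the helper name `dbsnp_ids` and the example descriptions are assumptions for illustration, so check them against the release you actually parse.

```python
import re
from typing import List

# Assumed tag format: "dbSNP:" followed by a reference SNP ID such as rs12345.
DBSNP_TAG = re.compile(r"\bdbSNP:(rs\d+)\b")


def dbsnp_ids(description: str) -> List[str]:
    """Return the dbSNP reference SNP IDs tagged in a variant description."""
    return DBSNP_TAG.findall(description)


# Made-up description text, not taken from a real entry.
print(dbsnp_ids("in DISEASE1; dbSNP:rs12345"))  # ['rs12345']
print(dbsnp_ids("in DISEASE2"))                 # []
```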
Nucleotide insertions and deletions are not described in detail in UniProtKB/Swiss-Prot, as they usually produce a non-functional protein. Additional information can be found by following the links to the OMIM database.
Example: P51681 and allele ‘delta32 deletion’ (OMIM)
Related keyword: Disease mutation
3. RNA editing
RNA editing events include conversion, insertion and deletion of nucleotides. We annotate only those RNA editing events that change the protein sequence; silent events are not annotated. For partial RNA editing events, we show the translation of the underlying genomic DNA sequence and indicate the potential amino acid change due to RNA editing in the ‘Natural variant’ subsection.
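For a single-nucleotide conversion, the annotated change can be worked out by translating the genomic codon and the edited codon. The Python sketch below uses a hypothetical helper, `edited_residue_change`. It models C-to-U editing as C to T on the DNA alphabet and A-to-I editing as A to G (inosine is read as guanosine during translation), and it assumes the standard genetic code. The genomic translation is reported as the original residue, and `None` is returned for silent edits, which are not annotated. Editing by nucleotide insertion or deletion is not covered.

```python
from itertools import product
from typing import Optional, Tuple

# Standard genetic code; codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def edited_residue_change(
    genomic_cds: str, nt_position: int, edited_base: str
) -> Optional[Tuple[int, str, str]]:
    """Return (residue position, genomic residue, edited residue), or None if silent.

    nt_position is 1-based within the coding sequence.
    """
    index = nt_position - 1
    codon_start = index - index % 3
    offset = index % 3
    genomic_codon = genomic_cds[codon_start:codon_start + 3].upper()
    edited_codon = genomic_codon[:offset] + edited_base + genomic_codon[offset + 1:]
    before, after = CODON_TABLE[genomic_codon], CODON_TABLE[edited_codon]
    if before == after:
        return None  # silent edit: not annotated
    return codon_start // 3 + 1, before, after


genomic = "ATGTCACGGTAA"  # made-up sequence encoding M-S-R-stop
print(edited_residue_change(genomic, 5, "T"))  # C-to-U, TCA -> TTA: (2, 'S', 'L')
print(edited_residue_change(genomic, 7, "T"))  # C-to-U, CGG -> TGG: (3, 'R', 'W')
print(edited_residue_change(genomic, 6, "G"))  # A-to-I, TCA -> TCG: None (silent)
```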
Additional information on RNA editing events can be found in the ‘RNA editing’ subsection of the ‘General annotation (Comments)’ section.
Related keyword: RNA editing