Sequences

Last modified December 15, 2015

This section displays by default the canonical protein sequence and upon request all isoforms described in the entry. It also includes information pertinent to the sequence(s), including length and molecular weight.

The protein sequence displayed by default is the protein sequence to which all positional annotation refers. We call it the ‘canonical’ sequence.

We use the official IUPAC amino acid one-letter code. For the amino acids selenocysteine (Sec; U) and pyrrolysine (Pyl; O), we follow the proposed nomenclature.

For each isoform, the name of the isoform is provided, as well as its length and molecular mass in daltons. The mass is calculated from the amino acid composition of the entire sequence. It does not take into account post-translational modifications (PTMs), including proteolytic processing.
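As a rough illustration of a composition-based mass calculation, the sketch below sums average residue masses over the full, unprocessed sequence and adds one water molecule for the free termini. The residue masses are commonly tabulated average values and are an assumption here; the table actually used to produce the displayed mass is not stated in this document. The names `AVERAGE_RESIDUE_MASS`, `WATER_MASS` and `molecular_mass` are hypothetical, and selenocysteine (U) and pyrrolysine (O) are left out of the table for brevity.

```python
# Average residue masses in daltons (assumed, commonly tabulated values;
# not necessarily the table used to compute the mass shown in an entry).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# One water molecule accounts for the N-terminal H and C-terminal OH.
WATER_MASS = 18.0153


def molecular_mass(sequence: str) -> float:
    """Return the average mass of an unmodified polypeptide chain.

    No PTMs and no proteolytic processing are considered: every residue
    of the given sequence contributes to the total.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = sorted(set(seq) - set(AVERAGE_RESIDUE_MASS))
    if unknown:
        raise ValueError(f"no mass available for: {', '.join(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical short peptide, used only to exercise the function.
    print(round(molecular_mass("MKTAYIAK"), 2))
```

Because the sum runs over the entire sequence, a signal peptide or propeptide that is cleaved in the mature protein still counts toward the result, which is why such a value can differ from an experimentally measured mass.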

The checksum of the displayed sequence is also given. Currently the checksum is a 64-bit CRC (Cyclic Redundancy Check) value (‘CRC64’) based on an algorithm described in the ISO 3309 standard. The generator polynomial used is x^64 + x^4 + x^3 + x + 1 (see reference). Although in theory two different sequences could have the same CRC64 value, the likelihood that this would happen is extremely low.

References:

Nomenclature and symbolism for amino acids and peptides (Recommendation 1983)(IUPAC)

IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN) and Nomenclature Committee of IUBMB (NC-IUBMB), newsletter 1999

Press W.H., Flannery B.P., Teukolsky S.A., Vetterling W.T. “Numerical recipes in C”, 2nd ed., pp. 896-902, Cambridge University Press (1993)
