Last modified December 15, 2015
By default, this section displays the canonical protein sequence and, upon request, all isoforms described in the entry. It also includes information pertinent to the sequence(s), including length and molecular mass.
The protein sequence displayed by default is the protein sequence to which all positional annotation refers. We call it the ‘canonical’ sequence.
We use the official IUPAC amino acid one-letter code. For the amino acids selenocysteine (Sec; U) and pyrrolysine (Pyl; O), we follow the proposed nomenclature.
For each isoform, the name of the isoform is provided, as well as its length and molecular mass in daltons. The mass is calculated on the basis of the amino acid composition of the entire sequence. It does not take post-translational modifications (PTMs) into account, and therefore also ignores any proteolytic processing.
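A mass computed from composition alone can be sketched as follows. This is a minimal illustration, not the database's own implementation: the function name `sequence_mass` is hypothetical, and the table holds commonly tabulated average residue masses for the 20 standard amino acids, which is an assumption since the text above does not say which mass table is used. The residue masses are summed and one water molecule is added for the free termini.

```python
# Average residue masses in daltons (free amino acid minus one water).
# ASSUMPTION: commonly tabulated average values; the source does not
# specify the exact table. Sec (U) and Pyl (O) are deliberately omitted.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the N- and C-terminal groups


def sequence_mass(sequence: str) -> float:
    """Return the average mass of an unmodified, unprocessed chain."""
    if not sequence:
        raise ValueError("empty sequence")
    try:
        residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    except KeyError as exc:
        raise ValueError(f"no mass tabulated for residue {exc.args[0]!r}") from None
    return residues + WATER_MASS


if __name__ == "__main__":
    # Mass of a short illustrative peptide, rounded to the nearest dalton.
    print(round(sequence_mass("MKTAYIAKQR")))
```

Because the whole sequence is used and no modifications are applied, the result corresponds to the unprocessed chain; a mature protein that has lost a signal peptide or carries PTMs will have a different mass.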
The checksum of the displayed sequence is also given. Currently the checksum is a 64-bit CRC (Cyclic Redundancy Check) value (‘CRC64’) based on an algorithm described in the ISO 3309 standard. The generator polynomial used is x^64 + x^4 + x^3 + x + 1 (see reference). Although in theory two different sequences could have the same CRC64 value, the likelihood that this would happen is extremely low.
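A table-driven CRC over this generator polynomial can be sketched as follows. The names `crc64` and `crc64_hex` are hypothetical, and several details are assumptions not stated in the text above: bits are processed least-significant first (so the polynomial appears in its reflected form, 0xD800000000000000), the register starts at zero, no final XOR is applied, and the result is printed as 16 uppercase hexadecimal digits. The resource's own implementation may differ in any of these conventions.

```python
# Reflected (LSB-first) form of x^64 + x^4 + x^3 + x + 1.
# The x^64 term is implicit; x^0, x^1, x^3, x^4 map to bits 63, 62, 60, 59.
POLY_REFLECTED = 0xD800000000000000


def _build_table() -> list[int]:
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for byte in range(256):
        value = byte
        for _ in range(8):
            if value & 1:
                value = (value >> 1) ^ POLY_REFLECTED
            else:
                value >>= 1
        table.append(value)
    return table


_TABLE = _build_table()


def crc64(data: bytes) -> int:
    """CRC-64 with initial value 0 and no final XOR (assumed conventions)."""
    crc = 0
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def crc64_hex(sequence: str) -> str:
    """Checksum of a one-letter-code sequence as 16 hex digits."""
    return f"{crc64(sequence.encode('ascii')):016X}"


if __name__ == "__main__":
    print(crc64_hex("MKTAYIAKQR"))
```

Two sequences of the same length that differ in a single residue always get different values under this scheme; collisions are only possible between sequences that differ more extensively.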
Press W.H., Flannery B.P., Teukolsky S.A., Vetterling W.T. “Numerical recipes in C”, 2nd ed., pp. 896-902, Cambridge University Press (1993).