UniProt release 12.4
Published October 23, 2007
More controlled vocabulary in the 'Subcellular location' subsection
Over 160'000 UniProtKB/Swiss-Prot entries (56%) contain a subcellular location description in the General Annotation section (CC lines in the flat file). We have standardized the content of these comments with the concomitant creation of a controlled vocabulary and a new, parsable flat-file format.
The subcellular location controlled vocabularies are stored in a new document (subcell.txt) which provides, for each individual UniProtKB location, topology or orientation term, the corresponding definition, as well as other relevant information, such as synonyms, hierarchies or mapped GO terms.
The format of the 'Subcellular location' subtopic has changed from free text to a more structured format. When required for the accurate description of a complex biological situation, free text is still used in the 'Note' (see for example O43918). In addition, since release 11.0, this subsection can occur more than once per entry, allowing specific annotation for each isoform, chain or peptide in separate subsections.
New document listing the controlled vocabularies used in the 'Subcellular location' subsection
The document subcell.txt, available by ftp and on the Web site, lists the controlled vocabularies used in the 'Subcellular location' subsection (CC SUBCELLULAR LOCATION lines in the flat file), their definitions and further information such as synonyms or relevant GO terms in the following format:
---------  -------------------------------  -----------------------------------
Line code  Content                          Occurrence in an entry
---------  -------------------------------  -----------------------------------
ID         Identifier (location)            Once; starts an entry
IT         Identifier (topology)            Once; starts a 'topology' entry
IO         Identifier (orientation)         Once; starts an 'orientation' entry
AC         Accession (SL-xxxx)              Once
DE         Definition                       Once or more
SY         Synonyms                         Optional; Once or more
SL         Content of subc. loc. lines      Once
HI         Hierarchy ('is-a')               Optional; Once or more
HP         Hierarchy ('part-of')            Optional; Once or more
KW         Associated keyword (accession)   Optional; Once or more
GO         Gene ontology (GO) mapping       Optional; Once or more
WW         Interesting links or references  Optional; Once or more
//         Terminator                       Once; ends an entry
---------  -------------------------------  -----------------------------------
ID   Cyanelle.
AC   SL-0082
DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
DE   Cyanelles are surrounded by a double membrane and, in between, a
DE   peptidoglycan wall. Thylakoid membrane architecture and the presence
DE   of carboxysomes are cyanobacteria-like. Historically, the term
DE   cyanelle is derived from a classification as endosymbiotic
DE   cyanobacteria, and thus is not fully correct.
SY   Muroplast; Cyanoplast.
SL   Plastid, cyanelle.
HI   Plastid.
KW   KW-0194
GO   GO:0009842; cyanelle
//
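The line-code layout of subcell.txt lends itself to a simple line-oriented reader. The following is a minimal sketch, not an official parser: the function names `parse_subcell` and `entry_kind` are hypothetical, and the code assumes that each line starts with a two-character line code followed by whitespace and that any header text in the real file has already been removed.

```python
from collections import defaultdict

# Line codes that start an entry, as listed in the format table.
IDENTIFIER_CODES = {"ID": "location", "IT": "topology", "IO": "orientation"}


def parse_subcell(text):
    """Split subcell.txt-style text into entries (dicts of line code -> list of values)."""
    entries = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):
            # The terminator closes the current entry.
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        code, _, content = line.partition(" ")
        if len(code) != 2:
            # Assumption: anything without a two-character code is not entry data.
            continue
        current[code].append(content.strip())
    return entries


def entry_kind(entry):
    """Return 'location', 'topology' or 'orientation' depending on the identifier line."""
    for code, kind in IDENTIFIER_CODES.items():
        if code in entry:
            return kind
    return None


SAMPLE = """\
ID   Cyanelle.
AC   SL-0082
DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
SY   Muroplast; Cyanoplast.
SL   Plastid, cyanelle.
HI   Plastid.
KW   KW-0194
GO   GO:0009842; cyanelle
//
"""

entries = parse_subcell(SAMPLE)
cyanelle = entries[0]
# Multi-line fields such as DE are kept as lists and can be joined when needed.
definition = " ".join(cyanelle["DE"])
synonyms = [s.strip() for s in cyanelle["SY"][0].rstrip(".").split(";")]
```

Keeping every field as a list mirrors the "Once or more" occurrence rules, so repeated DE, SY, HI or GO lines need no special handling.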
Syntax modification of the 'Subcellular location' subtopic
We have structured the 'Subcellular location' subtopic (CC SUBCELLULAR LOCATION lines in the flat file) in order to improve the consistency of annotation and to make its content parsable.
The new format of SUBCELLULAR LOCATION in the flat file is:
CC   -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\.)+)?( Note=Free_text( Flag)?\.)?
Where:
- Molecule: Isoform, chain or peptide name
- Location =
Subcellular_location( Flag)?(; Topology( Flag)?)?(; Orientation( Flag)?)?
- Subcellular_location: SL-line of subcell.txt ID-record
- Topology: SL-line of subcell.txt IT-record
- Orientation: SL-line of subcell.txt IO-record
- Flag =
Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+). Alternative values are separated by a pipe symbol (|).
CC   -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;
CC       Peripheral membrane protein. Golgi apparatus membrane; Peripheral
CC       membrane protein.
Q96QV1:
CC   -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein
CC       (By similarity). Secreted (By similarity). Note=The last 22 C-
CC       terminal amino acids may participate in cell membrane attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable).
P35670:
CC   -!- SUBCELLULAR LOCATION: Golgi apparatus, trans-Golgi network
CC       membrane; Multi-pass membrane protein (By similarity).
CC       Note=Predominantly found in the trans-Golgi network (TGN). Not
CC       redistributed to the plasma membrane in response to elevated
CC       copper levels.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm.
CC   -!- SUBCELLULAR LOCATION: WND/140 kDa: Mitochondrion.
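The structured syntax can be taken apart with a few string operations. The sketch below is illustrative only, and all function names are hypothetical. It rests on several assumptions: continuation lines are joined with a space unless the previous line ends in a hyphen; a molecule name contains no period, colon or semicolon; any trailing parenthesized text (such as the "By similarity" and "Probable" seen in the examples) is treated as a flag; and the terms after the first in a location are returned as undifferentiated qualifiers, since telling a topology from an orientation requires a lookup against the IT and IO records of subcell.txt.

```python
import re


def join_cc_lines(lines):
    """Join the CC lines of one comment block into a single string."""
    text = ""
    for line in lines:
        body = re.sub(r"^CC\s+(-!-\s+)?", "", line).strip()
        # Assumption: a line ending in a hyphen continues without a space.
        if text and not text.endswith("-"):
            text += " "
        text += body
    return text


# A term optionally followed by one parenthesized flag at the very end.
FLAGGED = re.compile(r"^(?P<term>.*?)(?: \((?P<flag>[^()]+)\))?$")


def split_flag(part):
    m = FLAGGED.match(part.strip())
    return {"term": m.group("term"), "flag": m.group("flag")}


def parse_subcellular_location(comment):
    """Parse one joined SUBCELLULAR LOCATION comment into molecule, locations and note."""
    prefix = "SUBCELLULAR LOCATION:"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a SUBCELLULAR LOCATION comment")
    text = text[len(prefix):].strip()

    # Everything after 'Note=' is free text and is kept verbatim.
    note = None
    note_at = text.find("Note=")
    if note_at != -1:
        note = text[note_at + len("Note="):].strip()
        text = text[:note_at].strip()

    # Optional 'Molecule:' prefix (isoform, chain or peptide name).
    molecule = None
    m = re.match(r"^([^:.;]+): ", text)
    if m:
        molecule = m.group(1)
        text = text[m.end():]

    locations = []
    for chunk in text.split(". "):
        chunk = chunk.strip().rstrip(".")
        if not chunk:
            continue
        parts = [split_flag(p) for p in chunk.split("; ")]
        locations.append({"location": parts[0], "qualifiers": parts[1:]})
    return {"molecule": molecule, "locations": locations, "note": note}


cc_lines = [
    "CC   -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein",
    "CC       (By similarity). Secreted (By similarity). Note=The last 22 C-",
    "CC       terminal amino acids may participate in cell membrane attachment.",
]
parsed = parse_subcellular_location(join_cc_lines(cc_lines))
isoform = parse_subcellular_location(
    "SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable)."
)
```

Here `parsed["locations"]` holds two locations ("Cell membrane" with one qualifier, and "Secreted"), while `isoform["molecule"]` is "Isoform 2".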
Modification of the EC (Enzyme Commission) number format
EC numbers are used to describe enzyme reactions and are based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The EC numbers and the reactions they describe are stored in the ENZYME and IntEnz databases.
In the UniProt Knowledgebase, some enzymes are assigned so-called partial EC numbers, in which part of the number is replaced by dashes (e.g. EC 3.4.24.-). This happens in the following situations:
- The catalytic activity of the protein is not known exactly.
- The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list.
To distinguish these two meanings, we have started to use the letter 'n' with a preliminary number instead of a dash '-' for the latter case. Existing EC numbers of UniProtKB proteins that catalyze a known reaction not yet included in the IUBMB EC list will be updated to this format progressively.
The catalytic activity of the protein is not known exactly:
Q9VAC5:
DE   ADAM 17-like protease precursor (EC 3.4.24.-).
The protein catalyzes a reaction that is known, but not yet included in the IUBMB's EC list:
Q01468:
DE   4-oxalocrotonate tautomerase (EC 5.3.2.n1) (4-OT).
Q8IV42:
DE   L-seryl-tRNA(Sec) kinase (EC 2.7.1.n3) (O-phosphoseryl-tRNA(Sec)
DE   kinase) (PSTK).
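Code that reads EC numbers from DE lines may need to tell the three forms apart. The sketch below is one possible approach, with hypothetical function names (`ec_numbers_in`, `classify_ec`) and category labels chosen for illustration. It assumes an EC number has four dot-separated fields, that the first is always numeric, and that each remaining field is a number, a dash, or 'n' followed by digits; the examples in this document only show 'n' in the last field.

```python
import re

# One EC field after the first: digits, a preliminary 'n' number, or a dash.
FIELD = r"(\d+|n\d+|-)"
EC_RE = re.compile(rf"^(?:EC\s+)?(\d+)\.{FIELD}\.{FIELD}\.{FIELD}$")


def ec_numbers_in(de_text):
    """Extract the EC numbers quoted as '(EC x.x.x.x)' in a DE line."""
    return re.findall(r"\(EC ([^)]+)\)", de_text)


def classify_ec(ec):
    """Return 'complete', 'partial' (dash) or 'preliminary' ('n' number)."""
    m = EC_RE.match(ec.strip())
    if m is None:
        raise ValueError(f"unrecognized EC number: {ec!r}")
    fields = m.groups()
    if any(f.startswith("n") for f in fields):
        # Known reaction that is not yet included in the IUBMB EC list.
        return "preliminary"
    if "-" in fields:
        # Catalytic activity not known exactly.
        return "partial"
    return "complete"


de_line = "DE   4-oxalocrotonate tautomerase (EC 5.3.2.n1) (4-OT)."
kinds = {ec: classify_ec(ec) for ec in ec_numbers_in(de_line)}
```

Parsers that previously accepted only digits and dashes in EC fields would reject values such as 5.3.2.n1, so the field pattern is the part most likely to need updating in existing code.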