Skip Header

You are using a version of browser that may not display all the features of this website. Please consider upgrading your browser.

UniProt release 12.4

Published October 23, 2007

Headlines

More controlled vocabulary in the 'Subcellular location' subsection

Over 160'000 UniProtKB/Swiss-Prot entries (56%) contain a subcellular location description in the General Annotation section (CC lines in the flat file). We have standardized the content of these comments with the concomitant creation of a controlled vocabulary and a new, parsable flat-file format.

The subcellular location controlled vocabularies are stored in a new document (subcell.txt) which provides, for each individual UniProtKB location, topology or orientation term, the corresponding definition, as well as other relevant information, such as synonyms, hierarchies or mapped GO terms.

The format of the 'Subcellular location' subtopic has changed from free text to a more structured format. When required for the accurate description of a complex biological situation, free text is still used in the 'Note' (see for example O43918). In addition, since release 11.0, this subsection can occur more than once per entry, allowing specific annotation for each isoform, chain or peptide in separate subsections.

UniProtKB News

New document listing the controlled vocabularies used in the 'Subcellular location' subsection

The document subcell.txt, available by ftp and on the Web site, lists the controlled vocabularies used in the in the 'Subcellular location' subsection (CC SUBCELLULAR LOCATION lines in the flat file), their definitions and further information such as synonyms or relevant GO terms in the following format:

     ---------  -------------------------------   ----------------------------------------------
     Line code  Content                           Occurrence in an entry
     ---------  -------------------------------   ----------------------------------------------
     ID         Identifier (location)             Once; starts an entry
     IT         Identifier (topology)             Once; starts a 'topology' entry
     IO         Identifier (orientation)          Once; starts an 'orientation' entry
     AC         Accession (SL-xxxx)               Once
     DE         Definition                        Once or more
     SY         Synonyms                          Optional; Once or more
     SL         Content of subc. loc. lines       Once
     HI         Hierarchy ('is-a')                Optional; Once or more
     HP         Hierarchy ('part-of')             Optional; Once or more
     KW         Associated keyword (accession)    Optional; Once or more
     GO         Gene ontology (GO) mapping        Optional; Once or more
     WW         Interesting links or references   Optional; Once or more
     //         Terminator                        Once; ends an entry
    

Example:

     ID   Cyanelle.
     AC   SL-0082
     DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
     DE   Cyanelles are surrounded by a double membrane and, in between, a
     DE   peptidoglycan wall. Thylakoid membrane architecture and the presence
     DE   of carboxysomes are cyanobacteria-like. Historically, the term
     DE   cyanelle is derived from a classification as endosymbiotic
     DE   cyanobacteria, and thus is not fully correct.
     SY   Muroplast; Cyanoplast.
     SL   Plastid, cyanelle.
     HI   Plastid.
     KW   KW-0194
     GO   GO:0009842; cyanelle
     //
    

Syntax modification of the 'Subcellular location' subtopic

We have structured the 'Subcellular location' subtopic (CC SUBCELLULAR LOCATION lines in the flat file) in order to improve the consistency of annotation and to allow to parse its content.

The new format of SUBCELLULAR LOCATION in the flat file is:

     CC   -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\\\\.)+)?( Note=Free_text( Flag)?\\\\.)?
    
Where:
  • Molecule: Isoform, chain or peptide name
  • Location = Subcellular_location( Flag)?(; Topology( Flag)?)?(; Orientation( Flag)?)?
    • Subcellular_location: SL-line of subcell.txt ID-record
    • Topology: SL-line of subcell.txt IT-record
    • Orientation: SL-line of subcell.txt IO-record
    • Flag = \\\\(By similarity|Probable|Potential\\\\)

Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+). Alternative values are separated by a pipe symbol (|).

Examples:

P32755:
     CC   -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;
     CC       Peripheral membrane protein. Golgi apparatus membrane; Peripheral
     CC       membrane protein.
    
Q96QV1:
     CC   -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein
     CC       (By similarity). Secreted (By similarity). Note=The last 22 C-
     CC       terminal amino acids may participate in cell membrane attachment.
     CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable).
    
P35670:
     CC   -!- SUBCELLULAR LOCATION: Golgi apparatus, trans-Golgi network
     CC       membrane; Multi-pass membrane protein (By similarity).
     CC       Note=Predominantly found in the trans-Golgi network (TGN). Not
     CC       redistributed to the plasma membrane in response to elevated
     CC       copper levels.
     CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm.
     CC   -!- SUBCELLULAR LOCATION: WND/140 kDa: Mitochondrion.
    

Modification of the EC (Enzyme Commission) number format

EC numbers are used to describe enzyme reactions and are based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The EC numbers and the reactions they describe are stored in the ENZYME and IntEnz databases.

In the UniProt Knowledgebase some enzymes are assigned so-called partial EC numbers where part of the numbers are replaced by dashes (e.g. EC 3.4.24.-). This happens in the following situations:

  1. The catalytic activity of the protein is not known exactly.
  2. The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list.

To distinguish these two meanings, we have started to use the letter 'n' with a preliminary number instead of a dash '-' for the latter case. The retrofit of those existing EC numbers of proteins in UniProtKB that catalyze a reaction that is known, but not yet included in the IUBMB EC list will be an ongoing process.

Examples:

The catalytic activity of the protein is not known exactly:

Q9VAC5:
     DE   ADAM 17-like protease precursor (EC 3.4.24.-).
    

The protein catalyzes a reaction that is known, but not yet included in the IUBMB's EC list:

Q01468:
     DE   4-oxalocrotonate tautomerase (EC 5.3.2.n1) (4-OT).
    
Q8IV42:
     DE   L-seryl-tRNA(Sec) kinase (EC 2.7.1.n3) (O-phosphoseryl-tRNA(Sec)
     DE   kinase) (PSTK).
    
UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health

We'd like to inform you that we have updated our Privacy Notice to comply with Europe’s new General Data Protection Regulation (GDPR) that applies since 25 May 2018.

Do not show this banner again