UniProt release 2014_11

Published November 26, 2014

Headline

Higher and higher

It is in human nature to push back the frontiers of what is possible. Modern humans left Africa and conquered the world. During their exploration, they met other humans who had already colonized the most improbable places tens of thousands of years earlier, maybe themselves being driven by the same urge to discover new horizons. Among the most challenging dwelling places is the Tibetan plateau, with an average elevation exceeding 4,500 meters. At this altitude, the oxygen concentration is only 60% of that available at sea level. Nevertheless, the Tibetan plateau is thought to have been inhabited for some 25,000 years.

To maintain oxygen homeostasis at high altitude (over 2,500 meters), the body responds in various ways, including increasing ventilation over the short term and increasing red blood cell production over the long term (see review). Hypoxia-inducible factor (HIF) plays a key role in the regulation of gene transcription in this process. HIF is a dimer composed of a common beta subunit, called ARNT, and 1 of 3 alpha subunits, called HIF1A, EPAS1, or HIF3A. Under normoxic conditions, HIF-alpha subunits are hydroxylated by the prolyl hydroxylases EGLN1 (also known as PHD2), EGLN2 or EGLN3. Hydroxylation allows interaction with an E3 ubiquitin ligase, named VHL, followed by proteasomal degradation. Under hypoxic conditions, hydroxylation is arrested and HIF-alpha subunits are stabilized. They dimerize with ARNT and initiate the hypoxia response transcriptional program, which includes the stimulation of erythropoiesis. Strikingly, Tibetans exhibit a blunted erythropoietic response and their hemoglobin concentration is maintained at values expected at sea level.

In 2010, 3 independent publications identified genes or loci showing evidence of hypoxia adaptation in Tibetans. All 3 studies pointed to 2 genes, among many others, being significantly associated with the decreased hemoglobin phenotype: EPAS1 and EGLN1. Interestingly, Tibetans may have inherited EPAS1 SNPs from Denisova man, an archaic Homo species identified in the Altai mountains of Siberia. The Tibetan-specific EGLN1 variant is more recent, currently estimated to have appeared some 8,000 years ago. It contains 2 single amino acid polymorphisms: p.Asp4Glu and p.Cys127Ser. Some characterization of this double variant came in September 2014. Lorenzo et al. showed that it exhibited a lower K(m) value for oxygen, suggesting that it promotes increased HIF-alpha hydroxylation and degradation under hypoxic conditions. It could hence abrogate hypoxia-induced and HIF-mediated augmentation of erythropoiesis. Song et al. reported that the double variant specifically interferes with binding to PTGES3 (also called HSP90 cochaperone p23), but not to other known EGLN1 ligands, including FKBP8 or HSP90AB. As PTGES3 binding may facilitate HIF-alpha hydroxylation, a perturbation in this interaction would instead decrease HIF-alpha hydroxylation, hence decreasing degradation and consequently increasing HIF activity. The central question about the functional consequences of the Tibetan EGLN1 variant remains open…

It is not yet clear how high-altitude populations adapted to their harsh environment, but at least we begin to grasp the amazing complexity of this phenomenon. The scientific community has mostly studied 3 populations: Tibetans, Andeans and Ethiopians settled on the Simien plateau. They all exhibit patterns of genetic adaptation that are largely distinct from one another, with surprisingly little overlap. The polymorphisms identified so far may not be straightforward loss- or gain-of-function variants; they may instead fine-tune complex interactions in which several proteins, possibly themselves carrying adaptive variations, are involved in a tissue-specific context.

As of this release, the UniProtKB/Swiss-Prot human EGLN1 entry has been updated with the new characterization data of the p.[Asp4Glu; Cys127Ser] polymorphism. On the new UniProt website, this information can be found in the ‘Sequences’ section, ‘Polymorphism’ and ‘Natural variant’ subsections.

UniProtKB news

New mouse and zebrafish variation files

We would like to announce the addition of two species, mouse and zebrafish, to the set of variation files available in the dedicated variants directory on the UniProt FTP sites. Both files catalogue protein-altering Single Nucleotide Variants (SNVs or SNPs), stop-gained and stop-lost variants for the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL sequences of each species. These variants have been automatically mapped to UniProtKB sequences, including isoform sequences, through Ensembl. We very much welcome feedback from the community on our efforts.

Structuring of ‘cofactor’ annotations

We have structured the previously free-text cofactor annotations in UniProtKB and mapped individual cofactors to ChEBI identifiers. How this affects the different UniProtKB distribution formats is described below.

Text format

 CC   -!- COFACTOR:( <molecule>:)?
(CC       Name=<cofactor>; Xref=<database>:<identifier>;( Evidence={<evidence>};)?)* 
(CC       Note=<free text>;)?

Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 0 or more times (*).

A cofactor annotation consists of:

  • An optional <molecule> value that indicates the isoform, chain or peptide to which this annotation applies.
  • Zero or more cofactors that are each described with:
    • A Name= field that shows the cofactor name.
    • A Xref= field that shows a cross-reference to the corresponding ChEBI record.
    • An optional Evidence= field that provides the evidence for the cofactor (see Evidence in the UniProtKB flat file format).
  • An optional Note= field that provides additional information.

Each cofactor description and the optional Note= field start on a new line. Lines are wrapped at a line length of 75 characters and indented to increase readability.

Examples:

  • Protein binds alternate/several cofactors
    CC   -!- COFACTOR:
    CC       Name=Mg(2+); Xref=ChEBI:CHEBI:18420;
    CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
    CC       Name=Co(2+); Xref=ChEBI:CHEBI:48828;
    CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
    CC       Note=Binds 2 divalent ions per subunit (magnesium or cobalt).
    CC       {ECO:0000255|HAMAP-Rule:MF_00086};
    CC   -!- COFACTOR:
    CC       Name=K(+); Xref=ChEBI:CHEBI:29103;
    CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
    CC       Note=Binds 1 potassium ion per subunit. {ECO:0000255|HAMAP-
    CC       Rule:MF_00086};
    
  • Isoforms
    CC   -!- COFACTOR: Isoform 1:
    CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
    CC         Evidence={ECO:0000269|PubMed:16683188};
    CC       Note=Isoform 1 binds 3 Zn(2+) ions. {ECO:0000269|PubMed:16683188};
    CC   -!- COFACTOR: Isoform 2:
    CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
    CC         Evidence={ECO:0000269|PubMed:16683188};
    CC       Note=Isoform 2 binds 2 Zn(2+) ions. {ECO:0000269|PubMed:16683188};
    
  • Chains
    CC   -!- COFACTOR: Serine protease NS3:
    CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
    CC         Evidence={ECO:0000269|PubMed:9060645};
    CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};
    CC   -!- COFACTOR: Non-structural protein 5A:
    CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};
    CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain.
    CC       {ECO:0000250};
    
  • Cofactor unknown
    CC   -!- COFACTOR:
    CC       Note=Does not require a metal cofactor.
    CC       {ECO:0000269|PubMed:24450804};
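
The grammar and examples above can be read back with a few regular expressions. The following is a minimal Python sketch, not an official UniProt parser: the names parse_cofactor_comments, HEADER, COFACTOR, NOTE and SAMPLE are hypothetical, and the rule used to rejoin wrapped lines (no space after a trailing hyphen, one space otherwise) is an assumption inferred from the examples rather than a documented rule.

```python
import re

# Header line: "CC   -!- COFACTOR:" with an optional " <molecule>:" suffix.
HEADER = re.compile(r"^CC\s+-!-\s+COFACTOR:(?:\s+(?P<molecule>.+):)?\s*$")
# One cofactor description: Name, Xref and an optional Evidence field.
COFACTOR = re.compile(
    r"Name=(?P<name>[^;]+); Xref=(?P<database>[^:;]+):(?P<identifier>[^;]+);"
    r"(?: Evidence=\{(?P<evidence>[^}]*)\};)?"
)
# Optional trailing note, with its evidence in braces if present.
NOTE = re.compile(r"Note=(?P<text>.*?)(?:\s*\{(?P<evidence>[^}]*)\})?;\s*$")
# Continuation line of a comment: "CC" + spaces, but not a new "-!-" topic.
CONTINUATION = re.compile(r"^CC\s+(?!-!-)\S")

SAMPLE = """\
CC   -!- COFACTOR:
CC       Name=Mg(2+); Xref=ChEBI:CHEBI:18420;
CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
CC       Name=Co(2+); Xref=ChEBI:CHEBI:48828;
CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
CC       Note=Binds 2 divalent ions per subunit (magnesium or cobalt).
CC       {ECO:0000255|HAMAP-Rule:MF_00086};
CC   -!- COFACTOR:
CC       Name=K(+); Xref=ChEBI:CHEBI:29103;
CC         Evidence={ECO:0000255|HAMAP-Rule:MF_00086};
CC       Note=Binds 1 potassium ion per subunit. {ECO:0000255|HAMAP-
CC       Rule:MF_00086};
"""


def parse_cofactor_comments(flat_text):
    """Return one dict per COFACTOR comment found in flat-file text."""
    bodies = []  # each item: [molecule or None, unwrapped body text]
    current = None
    for raw in flat_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = [header.group("molecule"), ""]
            bodies.append(current)
        elif current is not None and CONTINUATION.match(line):
            piece = line[2:].strip()
            # Assumption: a wrap right after a hyphen continues without a
            # space; any other wrap stands for a single space.
            glue = "" if not current[1] or current[1].endswith("-") else " "
            current[1] += glue + piece
        else:
            current = None  # any other line ends the COFACTOR comment

    results = []
    for molecule, body in bodies:
        note = NOTE.search(body)
        results.append({
            "molecule": molecule,
            "cofactors": [m.groupdict() for m in COFACTOR.finditer(body)],
            "note": note.group("text") if note else None,
            "note_evidence": note.group("evidence") if note else None,
        })
    return results


for comment in parse_cofactor_comments(SAMPLE):
    print(comment["molecule"], [c["name"] for c in comment["cofactors"]])
```

Unwrapping each comment into a single string before matching keeps the patterns close to the grammar given above, at the cost of relying on the line-joining assumption.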
    

XML format

We modified the XSD type commentType and introduced a new XSD type cofactorType, as shown below. We also moved the declaration of the molecule element (already used in the comment type "subcellular location") to a more generic context so that it can also be used by other comment types such as "cofactor".

    <xs:complexType name="commentType">
        ...
        <xs:sequence>
            <xs:element name="molecule" type="moleculeType" minOccurs="0"/>
            <xs:choice minOccurs="0">
            ...
                <xs:sequence>
                    <xs:annotation>
                        <xs:documentation>Used in 'cofactor' annotations.</xs:documentation>
                    </xs:annotation>
                    <xs:element name="cofactor" type="cofactorType" maxOccurs="unbounded"/>
                </xs:sequence>

                <xs:sequence>
                    <xs:annotation>
                        <xs:documentation>Used in 'subcellular location' annotations.</xs:documentation>
                    </xs:annotation>
                    <!-- <xs:element name="molecule" type="moleculeType" minOccurs="0"/> -->
                    <xs:element name="subcellularLocation" type="subcellularLocationType" maxOccurs="unbounded"/>
                </xs:sequence>
                ...
            </xs:choice>
            ...
            <xs:element name="text" type="evidencedStringType" minOccurs="0">
                <xs:annotation>
                    <xs:documentation>Used to store non-structured types of annotations,
                    as well as optional free-text notes of structured types of annotations.</xs:documentation>
                </xs:annotation>
            </xs:element>
            ...
        </xs:sequence>
        ...
    </xs:complexType>
    ...
    <xs:complexType name="cofactorType">
        <xs:annotation>
            <xs:documentation>Describes a cofactor.</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="dbReference" type="dbReferenceType"/>
        </xs:sequence>
        <xs:attribute name="evidence" type="intListType" use="optional"/>
    </xs:complexType>

A cofactor annotation consists of a sequence of:

  • An optional molecule element that indicates the isoform, chain or peptide to which this annotation applies.
  • Zero or more cofactor elements that each describe an individual cofactor, with an optional evidence attribute and the following child elements:
    • A name element that shows the cofactor name.
    • A dbReference element that represents a cross-reference to the corresponding ChEBI record.
  • An optional text element that provides additional information.

Examples:

  • Protein binds alternate/several cofactors
    <comment type="cofactor">
      <cofactor evidence="1">
        <name>Mg(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:18420"/>
      </cofactor>
      <cofactor evidence="1">
        <name>Co(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:48828"/>
      </cofactor>
      <text evidence="1">Binds 2 divalent ions per subunit (magnesium or cobalt).</text>
    </comment>
    <comment type="cofactor">
      <cofactor evidence="1">
        <name>K(+)</name>
        <dbReference type="ChEBI" id="CHEBI:29103"/>
      </cofactor>
      <text evidence="1">Binds 1 potassium ion per subunit.</text>
    </comment>
    ...
    <evidence key="1" type="ECO:0000255">
      <source>
        <dbReference type="HAMAP-Rule" id="MF_00086"/>
      </source>
    </evidence>
    
  • Isoforms
    <comment type="cofactor">
      <molecule>Isoform 1</molecule>
      <cofactor evidence="9">
        <name>Zn(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:29105"/>
      </cofactor>
      <text evidence="9">Isoform 1 binds 3 Zn(2+) ions.</text>
    </comment>
    <comment type="cofactor">
      <molecule>Isoform 2</molecule>
      <cofactor evidence="9">
        <name>Zn(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:29105"/>
      </cofactor>
      <text evidence="9">Isoform 2 binds 2 Zn(2+) ions.</text>
    </comment>
    ...
    <evidence key="9" type="ECO:0000269">
      <source>
        <dbReference type="PubMed" id="16683188"/>
      </source>
    </evidence>
    
  • Chains
    <comment type="cofactor">
      <molecule>Serine protease NS3</molecule>
      <cofactor evidence="13">
        <name>Zn(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:29105"/>
      </cofactor>
      <text evidence="13">Binds 1 zinc ion.</text>
    </comment>
    <comment type="cofactor">
      <molecule>Non-structural protein 5A</molecule>
      <cofactor evidence="3">
        <name>Zn(2+)</name>
        <dbReference type="ChEBI" id="CHEBI:29105"/>
      </cofactor>
      <text evidence="3">Binds 1 zinc ion in the NS5A N-terminal domain.</text>
    </comment>
    ...
    <evidence key="3" type="ECO:0000250"/>
    ...
    <evidence key="13" type="ECO:0000269">
      <source>
        <dbReference type="PubMed" id="9060645"/>
      </source>
    </evidence>
    
  • Cofactor unknown
    <comment type="cofactor">
      <text evidence="1">Does not require a metal cofactor.</text>
    </comment>
    ...
    <evidence key="1" type="ECO:0000269">
      <source>
        <dbReference type="PubMed" id="24450804"/>
      </source>
    </evidence>
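
As an illustration of how the structure above might be consumed, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree to read cofactor comments and resolve their evidence keys. The function names (evidence_index, cofactor_comments) and the bare <entry> wrapper in SAMPLE are assumptions made for this example. Like the snippets above, the sample omits XML namespaces; a real UniProt XML file declares a default namespace, so the element paths would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample built from the 'Isoforms' example.
SAMPLE = """<entry>
  <comment type="cofactor">
    <molecule>Isoform 1</molecule>
    <cofactor evidence="9">
      <name>Zn(2+)</name>
      <dbReference type="ChEBI" id="CHEBI:29105"/>
    </cofactor>
    <text evidence="9">Isoform 1 binds 3 Zn(2+) ions.</text>
  </comment>
  <evidence key="9" type="ECO:0000269">
    <source>
      <dbReference type="PubMed" id="16683188"/>
    </source>
  </evidence>
</entry>"""


def evidence_index(entry):
    """Map each evidence key to an (ECO code, source or None) pair."""
    index = {}
    for evidence in entry.findall("evidence"):
        ref = evidence.find("source/dbReference")
        source = None
        if ref is not None:
            source = f"{ref.get('type')}:{ref.get('id')}"
        index[evidence.get("key")] = (evidence.get("type"), source)
    return index


def cofactor_comments(entry):
    """Return one dict per <comment type="cofactor"> element."""
    evidences = evidence_index(entry)

    def resolve(element):
        # The evidence attribute is a whitespace-separated list of keys.
        keys = (element.get("evidence") or "").split()
        return [evidences[key] for key in keys if key in evidences]

    comments = []
    for comment in entry.findall("comment[@type='cofactor']"):
        text = comment.find("text")
        comments.append({
            "molecule": comment.findtext("molecule"),
            "cofactors": [
                {
                    "name": cofactor.findtext("name"),
                    "chebi": cofactor.find("dbReference").get("id"),
                    "evidence": resolve(cofactor),
                }
                for cofactor in comment.findall("cofactor")
            ],
            "note": text.text if text is not None else None,
            "note_evidence": resolve(text) if text is not None else [],
        })
    return comments


print(cofactor_comments(ET.fromstring(SAMPLE)))
```

Because both the molecule and text elements are optional, the sketch returns None for whichever is absent, which also covers the "Cofactor unknown" case where no cofactor element is present.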
    

RDF format

We introduced a new cofactor property to list individual cofactors as ChEBI resource descriptions. As for other types of annotations, an optional sequence property may describe the molecule to which the annotation applies and an optional rdfs:comment property may provide additional information.

Examples:

Note: Evidence tags are omitted from the examples for readability. As for all other types of annotations, they are represented by reification of the statements concerned.

  • Protein binds alternate/several cofactors
    uniprot:Q5M434
      up:annotation SHA:1, SHA:2 ;
      ...
    SHA:1
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Binds 2 divalent ions per subunit (magnesium or cobalt)." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_18420> ,
                  <http://purl.obolibrary.org/obo/CHEBI_48828> .
    SHA:2
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Binds 1 potassium ion per subunit." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_29103> .
    
  • Isoforms
    uniprot:O15304
      up:annotation SHA:1, SHA:2 ;
      ...
    SHA:1
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Isoform 1 binds 3 Zn(2+) ions." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_29105> ;
      up:sequence isoform:O15304-1 .
    SHA:2
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Isoform 2 binds 2 Zn(2+) ions." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_29105> ;
      up:sequence isoform:O15304-2 .
    
  • Chains
    uniprot:P26662
      up:annotation SHA:1, SHA:2 ;
      ...
    SHA:1
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Binds 1 zinc ion." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_29105> ;
      up:sequence annotation:PRO_0000037644 .
    SHA:2
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Binds 1 zinc ion in the NS5A N-terminal domain." ;
      up:cofactor <http://purl.obolibrary.org/obo/CHEBI_29105> ;
      up:sequence annotation:PRO_0000037647 .
    
  • Cofactor unknown
    uniprot:A9CEQ7
      up:annotation SHA:1 ;
      ...
    SHA:1
      rdf:type up:Cofactor_Annotation ;
      rdfs:comment "Does not require a metal cofactor." .
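
The ChEBI resources in the RDF examples use the same numeric identifiers as the Xref= fields of the text format and the dbReference elements of the XML format. The following minimal Python sketch converts between the two notations; the function names are hypothetical, and the IRI pattern is simply the one visible in the examples above.

```python
import re

# IRI prefix as it appears in the RDF examples.
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def chebi_xref_to_iri(xref):
    """Turn 'ChEBI:CHEBI:18420' or 'CHEBI:18420' into the RDF resource IRI."""
    match = re.fullmatch(r"(?:ChEBI:)?CHEBI:(\d+)", xref)
    if match is None:
        raise ValueError(f"not a ChEBI cross-reference: {xref!r}")
    return f"{OBO_PREFIX}CHEBI_{match.group(1)}"


def iri_to_chebi_id(iri):
    """Turn a ChEBI resource IRI back into a 'CHEBI:<number>' identifier."""
    match = re.fullmatch(re.escape(OBO_PREFIX) + r"CHEBI_(\d+)", iri)
    if match is None:
        raise ValueError(f"not a ChEBI resource IRI: {iri!r}")
    return f"CHEBI:{match.group(1)}"


print(chebi_xref_to_iri("ChEBI:CHEBI:18420"))
print(iri_to_chebi_id("http://purl.obolibrary.org/obo/CHEBI_29105"))
```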
    

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases: