UniProt release 2020_02

Published April 22, 2020

Headline

Genome integrity maintenance by HMCES

Apurinic or apyrimidinic sites, also known as abasic or AP sites, are one of the most common DNA lesions. They occur at a frequency of about 15,000 per day in human cells. In double-stranded DNA, the majority of AP sites are removed by base excision repair. After removal of the lesion, the undamaged strand is used as a template for repair synthesis. AP sites also form in single-stranded DNA (ssDNA), but until recently no mechanism was known to repair them in this context. A major breakthrough in the field was reported last year in Cell.

Mohni et al. were interested in HMCES, whose full name is 'stem cell-specific 5-hydroxymethylcytosine-binding protein'. It was originally thought to be a regulator of 5-hydroxymethylcytosine. However, it had also been identified in the replisome, the large protein machine that carries out DNA replication. HMCES is conserved in almost all organisms, even in those that do not use methylcytosine for epigenetic control. Taken together, these observations suggested that HMCES could have another crucial function, possibly in replication. Surprisingly, HMCES knockout in cells affected neither DNA replication nor cell division, but instead increased cell sensitivity to several DNA-damaging agents. Knockout cells accumulated DNA damage and exhibited increased genetic instability. Several DNA-damaging agents were tested, and the only kind of lesion they all induced was the AP site.

HMCES appears to initiate a replication-coupled repair mechanism for abasic sites in ssDNA. In eukaryotic cells, HMCES interacts with proliferating cell nuclear antigen (PCNA), an essential replication factor, and travels with replication forks. When it senses an AP site in ssDNA, it covalently crosslinks to the site, generating a DNA-protein intermediate. Crystallographic studies have identified this crosslink as a stable thiazolidine DNA-protein linkage formed between the N-terminal cysteine and the aldehyde form of the AP deoxyribose. The crosslink is so stable that its resolution requires HMCES degradation via the proteasome. This sequence of events may appear counterintuitive: it is almost as if HMCES takes a bad situation and makes it worse. However, the crosslink effectively shields the lesion from endonucleases and from error-prone translesion synthesis (TLS) polymerases, such as REV1 and REV3L, and so prevents the mutagenesis they might cause. The DNA repair mechanism acting downstream of HMCES is not known.

As of this release, human HMCES and YedK, an Escherichia coli homolog, have been updated and are available in UniProtKB/Swiss-Prot. The exact structure of the chemical crosslink was submitted to ChEBI, where more details are provided.

UniProtKB news

Change of annotation topic 'Interaction'

The annotation topic 'Interaction' provides information about binary protein-protein interactions. This data is curated in the IntAct database and a quality-filtered subset is imported into UniProtKB at each release.

In the context of improving the functional annotation of different gene products in UniProtKB/Swiss-Prot, we have started to import more detailed data from IntAct. Our previous representation of a binary protein-protein interaction provided details only for the protein that was described in another entry. This left ambiguity in UniProtKB/Swiss-Prot entries that describe more than one protein (isoforms and/or products of proteolytic cleavage). To address this, we now describe both interacting proteins by unique UniProtKB identifiers.

This change affects the three main UniProtKB distribution formats (text, XML, RDF). The details are described for each format in a separate section below. The following placeholders are used in the format descriptions:

  • <Interactant> represents a UniProtKB protein.
    • <Accession> is a UniProtKB accession number.
    • <IsoId> is a UniProtKB isoform ID.
    • <ProductId> is a UniProtKB product ID.
    • <Gene> is either the gene name, ordered locus name or ORF name of the gene that encodes the UniProtKB protein (see Gene names).
  • <Experiments> is the number of experiments in IntAct that support an interaction.
  • <IntActId> is an IntAct protein ID.

Note: The format descriptions make use of POSIX ERE syntax.

Text format

Previous format:

CC   -!- INTERACTION:
CC       <Interactant>( \(xeno\))?; NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       <Interactant>( \(xeno\))?; NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       ...

The <Interactant> was described in the following way:

Self|(<Accession>|<IsoId>):(<Gene>|-)

Where Self represents a self-interaction and a dash is shown for proteins with an undefined <Gene>. xeno is an optional flag that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.

New format:

CC   -!- INTERACTION:
CC       <Interactant>; <Interactant>;( Xeno;)? NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       <Interactant>; <Interactant>;( Xeno;)? NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       ...

Where

  • the first <Interactant> is represented by:
    (<Accession>|<IsoId>|<ProductId>)
    
  • the second <Interactant> is represented by:
    (<Accession>|<IsoId>|<ProductId> [<Accession>])(: <Gene>)?
    

Example: P11309

Binary interactions with different isoforms that are described in P11309.

Previous format:

CC   -!- INTERACTION:
CC       Q9BZS1-1:FOXP3; NbExp=3; IntAct=EBI-1018629, EBI-9695448;
CC       Q9UNQ0:ABCG2; NbExp=5; IntAct=EBI-1018633, EBI-1569435;

New format:

CC   -!- INTERACTION:
CC       P11309-1; Q9BZS1-1: FOXP3; NbExp=3; IntAct=EBI-1018629, EBI-9695448;
CC       P11309-2; Q9UNQ0: ABCG2; NbExp=5; IntAct=EBI-1018633, EBI-1569435;

Example: P27958 and Q9NPY3

Binary interaction with a product of proteolytic cleavage. Interactions involving products of proteolytic cleavage were previously not imported from IntAct, therefore only the new data/format is shown.

New data and format of P27958:

CC   -!- INTERACTION:
CC       PRO_0000037566; Q9NPY3: CD93; Xeno; NbExp=2; IntAct=EBI-6377335, EBI-1755002;

New data and format of Q9NPY3:

CC   -!- INTERACTION:
CC       Q9NPY3; PRO_0000037566 [P27958]; Xeno; NbExp=2; IntAct=EBI-1755002, EBI-6377335;
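
The new text format can be read with a single regular expression. The following is a minimal sketch in Python, not an official UniProt parser: the names `INTERACTION_RE`, `Interaction` and `parse_interaction_line` are hypothetical, and the pattern assumes that identifiers contain no whitespace, semicolons or colons and that each interaction occupies one CC line, as in the examples above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Pattern for one new-format INTERACTION line (assumption: one interaction per line).
INTERACTION_RE = re.compile(
    r"^CC\s+"
    r"(?P<first>[^;\s]+); "                      # first interactant
    r"(?P<second>[^;\s:\[]+)"                    # second interactant
    r"(?: \[(?P<second_entry>[^\]]+)\])?"        # accession that hosts a product ID
    r"(?:: (?P<gene>[^;]+))?;"                   # optional gene name
    r"(?P<xeno> Xeno;)?"                         # optional different-species flag
    r" NbExp=(?P<experiments>\d+);"
    r" IntAct=(?P<intact_a>[^,;\s]+), (?P<intact_b>[^,;\s]+);$"
)


@dataclass
class Interaction:
    first: str
    second: str
    second_entry: Optional[str]
    gene: Optional[str]
    xeno: bool
    experiments: int
    intact_ids: Tuple[str, str]


def parse_interaction_line(line: str) -> Interaction:
    """Parse one new-format 'CC       ...' interaction line."""
    match = INTERACTION_RE.match(line.rstrip())
    if match is None:
        raise ValueError(f"not a new-format interaction line: {line!r}")
    return Interaction(
        first=match.group("first"),
        second=match.group("second"),
        second_entry=match.group("second_entry"),
        gene=match.group("gene"),
        xeno=match.group("xeno") is not None,
        experiments=int(match.group("experiments")),
        intact_ids=(match.group("intact_a"), match.group("intact_b")),
    )
```

Applied to the Q9NPY3 line above, the sketch yields `PRO_0000037566` as the second interactant, `P27958` as the entry that describes it, and no gene name.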

XML format

The UniProtKB XSD represents a binary interaction with:

  • two interactant elements of interactantType
  • a boolean organismsDiffer element that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.
  • an experiments element that gives the number of experiments in IntAct that support an interaction.

The interactantType uses an interactantGroup to represent a sequence of:

  • an id element
  • an optional label element

We have added an optional dbReference element to the interactantGroup to allow us to represent the UniProtKB <Accession> for a <ProductId>:

<xs:group name="interactantGroup">
  <xs:sequence>
    <xs:element name="id" type="xs:string"/>
    <xs:element name="label" type="xs:string" minOccurs="0"/>
    <xs:element name="dbReference" type="dbReferenceType" minOccurs="0"/>
  </xs:sequence>
</xs:group>

Previous format:

<comment type="interaction">
  <interactant intactId="<IntActId>"/>
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId></id>
    <label><Gene></label>
  </interactant>
  <organismsDiffer>true|false</organismsDiffer>
  <experiments><Experiments></experiments>
</comment>

New format:

<comment type="interaction">
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId>|<ProductId></id>
  </interactant>
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId>|<ProductId></id>
    <label><Gene></label>
    <!-- If <id> is a <ProductId>: -->
    <dbReference type="UniProtKB" id="<Accession>"/>
  </interactant>
  <organismsDiffer>true|false</organismsDiffer>
  <experiments><Experiments></experiments>
</comment>

Example: P11309

Binary interactions with different isoforms that are described in P11309.

Previous format:

<comment type="interaction">
  <interactant intactId="EBI-1018629"/>
  <interactant intactId="EBI-9695448">
    <id>Q9BZS1-1</id>
    <label>FOXP3</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>3</experiments>
</comment>
<comment type="interaction">
  <interactant intactId="EBI-1018633"/>
  <interactant intactId="EBI-1569435">
    <id>Q9UNQ0</id>
    <label>ABCG2</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>5</experiments>
</comment>

New format:

<comment type="interaction">
  <interactant intactId="EBI-1018629">
    <id>P11309-1</id>
  </interactant>
  <interactant intactId="EBI-9695448">
    <id>Q9BZS1-1</id>
    <label>FOXP3</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>3</experiments>
</comment>
<comment type="interaction">
  <interactant intactId="EBI-1018633">
    <id>P11309-2</id>
  </interactant>
  <interactant intactId="EBI-1569435">
    <id>Q9UNQ0</id>
    <label>ABCG2</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>5</experiments>
</comment>

Example: P27958 and Q9NPY3

Binary interaction with a product of proteolytic cleavage. Interactions involving products of proteolytic cleavage had previously not been imported from IntAct, therefore only the new data/format is shown.

New data and format of P27958:

<comment type="interaction">
  <interactant intactId="EBI-6377335">
    <id>PRO_0000037566</id>
  </interactant>
  <interactant intactId="EBI-1755002">
    <id>Q9NPY3</id>
    <label>CD93</label>
  </interactant>
  <organismsDiffer>true</organismsDiffer>
  <experiments>2</experiments>
</comment>

New data and format of Q9NPY3:

<comment type="interaction">
  <interactant intactId="EBI-1755002">
    <id>Q9NPY3</id>
  </interactant>
  <interactant intactId="EBI-6377335">
    <id>PRO_0000037566</id>
    <dbReference type="UniProtKB" id="P27958"/>
  </interactant>
  <organismsDiffer>true</organismsDiffer>
  <experiments>2</experiments>
</comment>
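
Code that reads interaction comments may need to handle the `id` that is now present on the first interactant and the optional `dbReference` on the second. The following Python sketch uses the standard `xml.etree.ElementTree` module; the function name `parse_interaction_comment` is hypothetical, and the code assumes a bare fragment without an XML namespace, as in the examples above. Full UniProtKB XML files declare a namespace, which would have to be added to the element lookups.

```python
import xml.etree.ElementTree as ET


def parse_interaction_comment(xml_text: str) -> dict:
    """Read one <comment type="interaction"> fragment (old or new format)."""
    comment = ET.fromstring(xml_text)
    interactants = []
    for node in comment.findall("interactant"):
        db_ref = node.find("dbReference")
        interactants.append({
            "intact_id": node.get("intactId"),
            # <id> is missing on the first interactant in the previous format.
            "id": node.findtext("id"),
            "label": node.findtext("label"),
            # Entry accession, given only when <id> is a product ID.
            "entry": db_ref.get("id") if db_ref is not None else None,
        })
    return {
        "interactants": interactants,
        "organisms_differ": comment.findtext("organismsDiffer") == "true",
        "experiments": int(comment.findtext("experiments")),
    }
```

Because missing elements are returned as `None`, the same function accepts the previous format, in which the first interactant carries only the `intactId` attribute.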

RDF format

The UniProt RDF schema ontology represents a binary interaction with an interaction property whose rdfs:range is the Interaction class. This class is the domain of the following properties that describe the interaction:

  • xeno is a boolean that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.
  • experiments gives the number of experiments in IntAct that support an interaction.

A Participant is identified by its unique IntAct identifier. It also refers to the corresponding UniProtKB protein which is represented as described in the news article about the functional annotation of different gene products in UniProtKB/Swiss-Prot. An optional rdfs:label property may provide the gene name, ordered locus name or ORF name of the gene that encodes the UniProtKB protein.

The RDF schema ontology required no changes to represent the more detailed data that we now import from IntAct. Due to the symmetry of binary interactions, the UniProt SPARQL server already provided access to the full details about both interacting proteins. We have, however, taken this opportunity to normalize the URI of a binary interaction so that the two UniProtKB entries that describe the interacting proteins refer to the interaction with the same URI:

Previous format:

<<Accession>#interaction-<IntActId>-<IntActId>> .

New format:

<http://purl.uniprot.org/intact/<IntActId>-<IntActId>> .

Example: P11309 and Q8N9N5

Previous format:

P11309:

<P11309#interaction-696621-744695>

Q8N9N5:

<Q8N9N5#interaction-744695-696621>

New format:

P11309 and Q8N9N5:

<http://purl.uniprot.org/intact/EBI-696621-EBI-744695>

Cross-references to Antibodypedia

Cross-references have been added to Antibodypedia, a portal providing access to publicly available research antibodies towards human protein targets from many different providers.

Antibodypedia is available at https://www.antibodypedia.com/.

The format of the explicit links is:

Resource abbreviation    Antibodypedia
Resource identifier      Resource identifier
Optional information 1   Number of antibodies

Example: P04626

Show all entries having a cross-reference to Antibodypedia.

Text format

Example: P04626

DR   Antibodypedia; 740; 5394 antibodies.
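
As an illustration, the resource identifier and the antibody count can be pulled out of such a line with a short Python sketch. The names `ANTIBODYPEDIA_RE` and `parse_antibodypedia_line` are hypothetical, and the singular form "antibody" is accepted only as an assumption, since the example above shows just the plural.

```python
import re

# Matches e.g. "DR   Antibodypedia; 740; 5394 antibodies."
ANTIBODYPEDIA_RE = re.compile(
    r"^DR\s+Antibodypedia; (?P<id>[^;]+); (?P<count>\d+) antibod(?:y|ies)\.$"
)


def parse_antibodypedia_line(line: str) -> tuple:
    """Return (resource identifier, number of antibodies) from a DR line."""
    match = ANTIBODYPEDIA_RE.match(line.rstrip())
    if match is None:
        raise ValueError(f"not an Antibodypedia DR line: {line!r}")
    return match.group("id"), int(match.group("count"))
```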

XML format

Example: P04626

<dbReference type="Antibodypedia" id="740">
   <property type="antibodies" value="5394 antibodies"/>
</dbReference>

RDF format

Example: P04626

uniprot:P04626
  rdfs:seeAlso <http://purl.uniprot.org/antibodypedia/740> .

<http://purl.uniprot.org/antibodypedia/740>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Antibodypedia> ;
  rdfs:comment "5394 antibodies" .

Cross-references to MetOSite

Cross-references have been added to MetOSite, a database of methionine sulfoxide sites. Each collected site has been classified according to the effect of its sulfoxidation on the biological properties of the modified protein. Thus, MetOSite documents cases where the sulfoxidation of methionine leads to gain or loss of activity, increased or decreased protein-protein interaction susceptibility, and to changes in protein stability or in subcellular location.

MetOSite is available at https://metosite.uma.es/.

The format of the explicit links is:

Resource abbreviation    MetOSite
Resource identifier      UniProtKB accession number

Example: P10987

Show all entries having a cross-reference to MetOSite.

Text format

Example: P10987

DR   MetOSite; P10987; -.

XML format

Example: P10987

<dbReference type="MetOSite" id="P10987"/>

RDF format

Example: P10987

uniprot:P10987
  rdfs:seeAlso <http://purl.uniprot.org/metosite/P10987> .
<http://purl.uniprot.org/metosite/P10987>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MetOSite> .

Cross-references to PHI-base

Cross-references have been added to PHI-base, a database providing expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions.

PHI-base is available at http://www.phi-base.org/.

The format of the explicit links is:

Resource abbreviation    PHI-base
Resource identifier      Resource identifier

Example: Q00310

Show all entries having a cross-reference to PHI-base.

Text format

Example: Q00310

DR   PHI-base; PHI:104; -.

XML format

Example: Q00310

<dbReference type="PHI-base" id="PHI:104"/>

RDF format

Example: Q00310

uniprot:Q00310
  rdfs:seeAlso <http://purl.uniprot.org/phi-base/PHI:104> .
<http://purl.uniprot.org/phi-base/PHI:104>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/PHI-base> .

Change to the cross-references to Human Protein Atlas (HPA)

We have changed the way we present the Human Protein Atlas database cross-references. Links between UniProtKB entries and HPA used to be established by HPA antibody identifier, but are now based on Ensembl Gene identifiers.

We have also introduced an additional field in these cross-references to indicate the level of RNA tissue specificity. The RNA specificity category is based on mRNA expression levels in the analyzed samples. The categories include: 'Tissue enriched', 'Group enriched', 'Tissue enhanced', 'Low tissue specificity' and 'Not detected'. For more details on these categories, see the Classification of transcriptomics data by Human Protein Atlas.

Text format

Example: Q9NSG2

Previous format:

DR   HPA; HPA023778; -.
DR   HPA; HPA024451; -.

New format:

DR   HPA; ENSG00000000460; Tissue enhanced (lymphoid).
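
A parser that has to accept both the previous and the new HPA lines could look like the following Python sketch. The names `parse_hpa_line`, `DR_HPA_RE`, `VALUE_RE` and `RNA_CATEGORIES` are hypothetical. The category list is taken from the text above; treating the parenthesized part (for example "lymphoid") as an optional detail is an assumption based on the single example shown.

```python
import re

# RNA tissue specificity categories listed in this release note.
RNA_CATEGORIES = {
    "Tissue enriched",
    "Group enriched",
    "Tissue enhanced",
    "Low tissue specificity",
    "Not detected",
}

DR_HPA_RE = re.compile(r"^DR\s+HPA; (?P<id>[^;]+); (?P<info>.+)\.$")
VALUE_RE = re.compile(r"^(?P<category>[^()]+?)(?: \((?P<detail>.+)\))?$")


def parse_hpa_line(line: str) -> tuple:
    """Return (identifier, category, detail) from an HPA DR line."""
    match = DR_HPA_RE.match(line.rstrip())
    if match is None:
        raise ValueError(f"not an HPA DR line: {line!r}")
    identifier, info = match.group("id"), match.group("info")
    if info == "-":
        # Previous format: antibody identifier, no extra field.
        return identifier, None, None
    value = VALUE_RE.match(info)
    if value is None or value.group("category") not in RNA_CATEGORIES:
        raise ValueError(f"unknown RNA specificity value: {info!r}")
    return identifier, value.group("category"), value.group("detail")
```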

XML format

Example: Q9NSG2

Previous format:

<dbReference type="HPA" id="HPA023778"/>
<dbReference type="HPA" id="HPA024451"/>

New format:

<dbReference type="HPA" id="ENSG00000000460">
  <property type="expression patterns" value="Tissue enhanced (lymphoid)"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
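
One kind of code change this may involve is sketched below in Python: a reader that no longer assumes the `id` is an HPA antibody identifier and that picks up the new property when it is present. The function name `read_hpa_reference` is hypothetical. Distinguishing the two identifier kinds by the `ENSG` prefix is an assumption based on the examples, and the fragment is parsed without an XML namespace, unlike a full UniProtKB XML file.

```python
import xml.etree.ElementTree as ET


def read_hpa_reference(xml_text: str) -> dict:
    """Read one <dbReference type="HPA"> fragment in either format."""
    ref = ET.fromstring(xml_text)
    if ref.get("type") != "HPA":
        raise ValueError("not an HPA cross-reference")
    identifier = ref.get("id")
    # Assumption: Ensembl gene identifiers start with "ENSG".
    kind = "ensembl_gene" if identifier.startswith("ENSG") else "antibody"
    specificity = None
    for prop in ref.findall("property"):
        if prop.get("type") == "expression patterns":
            specificity = prop.get("value")
    return {"id": identifier, "kind": kind, "rna_specificity": specificity}
```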

RDF format

Example: Q9NSG2

Previous format:

uniprot:Q9NSG2
  rdfs:seeAlso <http://purl.uniprot.org/hpa/HPA023778> ,
               <http://purl.uniprot.org/hpa/HPA024451> .
<http://purl.uniprot.org/hpa/HPA023778>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> .
<http://purl.uniprot.org/hpa/HPA024451>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> .

New format:

uniprot:Q9NSG2
  rdfs:seeAlso <http://www.proteinatlas.org/ENSG00000000460> .
<http://www.proteinatlas.org/ENSG00000000460>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> ;
  rdfs:comment "Tissue enhanced (lymphoid)" .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • Thiazolidine linkage to a ring-opened DNA abasic site
  • Deoxyhypusine

RDF news

Change of URIs for the Human Protein Atlas (HPA) database

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these with the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the Human Protein Atlas database have therefore been updated from:

http://purl.uniprot.org/hpa/<ID>

to:

http://www.proteinatlas.org/<ID>

If required for backward compatibility, you will be able to use the following query to add the old URIs:

PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/HPA> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/HPA> .
   BIND(iri(concat('http://purl.uniprot.org/hpa/', substr(str(?new),29))) AS ?old)
}

The dereferencing of existing http://purl.uniprot.org/hpa/<ID> URIs will be maintained.
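
For client code that keeps old URIs in its own data, the same mapping can be done outside SPARQL. The following Python sketch mirrors the `substr(str(?new),29)` step of the query above, which skips the 28 characters of the new prefix. The names `legacy_hpa_uri`, `NEW_PREFIX` and `OLD_PREFIX` are hypothetical.

```python
NEW_PREFIX = "http://www.proteinatlas.org/"
OLD_PREFIX = "http://purl.uniprot.org/hpa/"


def legacy_hpa_uri(new_uri: str) -> str:
    """Map a new HPA URI to the old purl.uniprot.org form."""
    if not new_uri.startswith(NEW_PREFIX):
        raise ValueError(f"not a proteinatlas.org URI: {new_uri!r}")
    # Keep only the <ID> part and attach it to the old prefix.
    return OLD_PREFIX + new_uri[len(NEW_PREFIX):]
```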

Standardized MD5 checksums in UniProt RDF

The UniProt databases UniProtKB, UniRef and UniParc have historically provided a CRC-64 checksum for the amino acid sequences. In the UniParc RDF representation we had already introduced an MD5 checksum, and we have now replaced it with a SPARQL 1.1 compliant MD5 representation (lowercase string) that is used across all databases. This makes it possible to use the MD5 function defined in SPARQL 1.1 to check that the sequence string is not corrupted, without the lowercase (LCASE) function and the cast to string that were formerly needed:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?computedMD5 ((?uniprotMD5 = ?computedMD5) AS ?md5SumsMatch)
WHERE
{
  ?protein a up:Protein ;
    up:sequence ?sequence .
  ?sequence rdf:value ?value ;
    up:md5Checksum ?uniprotMD5 .
  BIND(MD5(?value) AS ?computedMD5)
}
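
The same check can be made on downloaded data without a SPARQL endpoint. This is a minimal Python sketch using the standard `hashlib` module, whose `hexdigest()` already returns lowercase hexadecimal. The function names `sequence_md5` and `md5_matches` are hypothetical, and the sketch assumes the checksum is computed over the plain sequence string with no whitespace or line breaks.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Lowercase hexadecimal MD5 of a sequence string."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def md5_matches(sequence: str, stored_checksum: str) -> bool:
    """Compare a computed MD5 with the stored lowercase checksum."""
    return sequence_md5(sequence) == stored_checksum
```

The comparison is deliberately case-sensitive, so an uppercase checksum in the former style does not match unless it is lowercased first.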