
UniProt release 2010_04

Published March 23, 2010

Headlines

UniProtKB wonder web

UniProtKB/Swiss-Prot was the first biomolecular database to include cross-references in its entries, long before the advent of the internet, and a high level of integration with other databases is a hallmark of the resource. UniProtKB is indeed a general interest database, and the cross-references it includes provide users with easy access to relevant additional information from more specialized resources.

The number of cross-references keeps growing. Over the past year, 21 new databases have been added and 6 out of the 8 phylogenomic databases cross-referenced in UniProtKB have been added during the last 10 months. Today 126 databases are explicitly cross-referenced in the knowledgebase. Most links are stored in the ‘Cross-references’ section.

As of this release, the total number of cross-references in UniProtKB/Swiss-Prot has passed 13 million and the average number per entry is over 25. In TrEMBL, the unreviewed section of UniProtKB, the average number of cross-references per entry is roughly half that (over 11). For both sections, the most represented databases reflect our information sources and annotation strategies. They are:
  1. EMBL-Bank (on average 1.7 cross-references per entry): the vast majority of UniProtKB sequences come from translated CDS submitted to EMBL-Bank/GenBank/DDBJ. It is therefore not surprising that more than 98% of UniProtKB/Swiss-Prot entries contain a cross-reference to the original nucleotide submission(s). For extensively studied organisms, such as human, the average number of EMBL-Bank cross-references may exceed 7.
  2. InterPro (on average 3.1 cross-references per entry): this integrated database classifies proteins at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. In UniProtKB, we have always paid special attention to domain and family annotation. InterPro predictions are automatically integrated into TrEMBL entries and domain/family annotation is later manually reviewed and completed before integration into Swiss-Prot.
  3. Gene Ontology (GO): UniProtKB annotators manually assign GO terms to all entries they curate, and high-quality manually assigned GO terms from other GO Consortium groups are imported to ensure that a comprehensive collection of GO annotations is available through UniProtKB. In addition, UniProtKB incorporates GO terms generated by a range of electronic mapping methods. As a result, the number of GO cross-references per entry is expected to grow significantly in the near future.

In addition to the “regular” ‘Cross-references’ section, the ‘Web resources’ section offers links to specific web pages or databases whose scope is too specialized to warrant the creation of specific cross-references. For instance, the IARC TP53 mutation database, a repository of somatic and germline TP53 mutations in human cancers, is only available from the human p53 entry. Currently more than 6,500 entries contain ‘Web resources’ sections, which represent some 8,500 additional links. Note that links to relevant databases pepper all sections of Swiss-Prot entries: cross-references to ENZYME are available from the EC numbers provided in the ‘Protein names’ subsection, links to PubMed from the ‘References’ section, etc.

In conclusion, for a complete overview of a given protein, users should consult different resources, each of them shedding complementary light on the field. The coexistence of various databases does not imply competition between them, but rather collaboration, to better serve the life science community. UniProtKB may be used to get a manually reviewed summary of the current knowledge and to direct users to more specialized databases, such as organism-oriented, phylogenomic or genome annotation databases, for more detailed information.

For detailed statistics on cross-references, see our release notes, section 5 (‘Statistics for some line types’).

UniProtKB News

Change of release numbers

In the past, we have distinguished major and minor releases of the UniProt knowledgebase and this was reflected in the release number format: major releases were numbered x.0, minor releases were x.1, x.2, etc. We have abandoned this distinction and changed the format to YYYY_XX where YYYY is the calendar year and XX a 2-digit number that is incremented for each release of a given year, e.g. 2010_01, 2010_02, etc. We will archive previous releases on our ftp site for at least 2 years.
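As an illustration of the new format, the following minimal Python sketch splits a YYYY_XX release label into its year and its per-year release number, and uses that pair to sort labels chronologically. The function name `parse_release` is hypothetical and not part of any UniProt tool; the sketch deliberately rejects the old x.0/x.1 style numbers rather than guessing how they map onto the new scheme.

```python
import re
from typing import Tuple

# YYYY_XX: a 4-digit calendar year, an underscore, and a 2-digit counter.
RELEASE_RE = re.compile(r"^(\d{4})_(\d{2})$")


def parse_release(label: str) -> Tuple[int, int]:
    """Return (year, release number within that year) for a YYYY_XX label.

    Hypothetical helper for illustration only.
    """
    match = RELEASE_RE.match(label)
    if match is None:
        raise ValueError(f"not a YYYY_XX release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


# Sorting on the parsed pair orders releases by year, then by counter.
releases = ["2010_04", "2010_01", "2010_02"]
ordered = sorted(releases, key=parse_release)
print(ordered)
```

Because the counter is zero-padded to two digits, plain string sorting would give the same order for labels of this form; parsing first simply makes malformed labels fail loudly.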

Change of release cycle

UniProt releases are now published every 4 weeks.

Cross-references to GenoList

Cross-references have been added to the GenoList Integrated Environment for the Analysis of Microbial Genomes. GenoList hosts numerous model organism databases for complete microbial genomes, including BuruList, ListiList, MypuList, PhotoList, SagaList and SubtiList, which used to be cross-referenced from UniProtKB individually.
Relevant UniProtKB entries from the following organisms are therefore now linked to GenoList:

  • Mycobacterium ulcerans (strain Agy99) (formerly linked to BuruList)
  • Listeria monocytogenes and Listeria innocua (formerly linked to ListiList)
  • Mycoplasma pulmonis (formerly linked to MypuList)
  • Photorhabdus luminescens subsp. laumondii (formerly linked to PhotoList)
  • Streptococcus agalactiae serotype III (formerly linked to SagaList)
  • Bacillus subtilis (formerly linked to SubtiList)

GenoList is available at http://genodb.pasteur.fr/cgi-bin/WebObjects/GenoList.woa/

The format of the explicit links in the flat file is:

Resource abbreviation: GenoList
Resource identifier: Ordered locus name
Examples:
Q925X3:
DR   GenoList; LIN0124; -.
DR   GenoList; LIN2378; -.
DR   GenoList; LIN2564; -.
P37551:
DR   GenoList; BSU00470; -.

Show all the entries having a cross-reference to GenoList.
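For readers who process the flat file programmatically, here is a minimal Python sketch of how a DR line of the shape shown in the examples could be split into its fields. It assumes, based only on those examples, that fields are separated by semicolons, that the line ends with a period, and that a lone "-" stands for "no optional information". The names `CrossReference` and `parse_dr_line` are hypothetical, and DR lines for other resources may need extra handling.

```python
from typing import NamedTuple, Optional


class CrossReference(NamedTuple):
    """One DR line, split into fields (hypothetical container)."""
    resource: str            # resource abbreviation, e.g. "GenoList"
    identifier: str          # resource identifier, e.g. an ordered locus name
    info: Optional[str]      # optional information, or None when it is "-"


def parse_dr_line(line: str) -> CrossReference:
    """Parse a single 'DR   Resource; Identifier; Info.' line."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip()
    if body.endswith("."):
        body = body[:-1]                 # drop only the terminating period
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 2 or not fields[0] or not fields[1]:
        raise ValueError(f"malformed DR line: {line!r}")
    rest = "; ".join(fields[2:])         # keep any further fields together
    info = rest if rest and rest != "-" else None
    return CrossReference(fields[0], fields[1], info)


for dr in ("DR   GenoList; LIN0124; -.", "DR   GenoList; BSU00470; -."):
    print(parse_dr_line(dr))
```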

Changes concerning cross-references to BuruList, ListiList, MypuList, PhotoList, SagaList and SubtiList

Cross-references to BuruList, ListiList, MypuList, PhotoList, SagaList and SubtiList have been removed.

Cross-references to ConoServer

Cross-references have been added to the Cone snail toxin database ConoServer. The ConoServer database is a manually curated database dedicated to conopeptides. ConoServer uses standardized names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, the Protein Data Bank and the literature.

The ConoServer web site incorporates specialized features such as the graphic display of post-translational modifications, which are extensively present in conopeptides. ConoServer manages nucleic acid sequences, protein sequences and 3D structures. The aim of this resource is to give a comprehensive overview of the diversity of conopeptides and their uses as drugs, drug leads and diagnostic tools.

ConoServer is available at http://www.conoserver.org/.

The format of the explicit links in the flat file is:

Resource abbreviation: ConoServer
Resource identifier: ConoServer identifier
Optional information 1: Toxin name
Examples:
P0C8R2:
DR   ConoServer; 2838; ArIA precursor.
DR   ConoServer; 3450; Sequence 299 from Patent EP1852440.
P0C1W3:
DR   ConoServer; 1574; RVIIIA.

Show all the entries having a cross-reference to ConoServer.

Cross-references to MINT

Cross-references have been added to the Molecular INTeraction database MINT, which focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.

MINT is available at http://mint.bio.uniroma2.it/mint/.

The format of the explicit links in the flat file is:

Resource abbreviation: MINT
Resource identifier: MINT interactor ID
Examples:
P00925:
DR   MINT; MINT-517950; -.
P0A887:
DR   MINT; MINT-1243319; -.

Show all the entries having a cross-reference to MINT.
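To show how per-database and per-entry cross-reference counts can be derived from DR lines, the following Python sketch tallies the example lines quoted in this release for GenoList, ConoServer and MINT. The dictionary holds only those example lines, not the complete entries, so the resulting figures illustrate the calculation and say nothing about the real averages; the names `EXAMPLE_DR_LINES` and `resource_of` are hypothetical.

```python
from collections import Counter

# Example DR lines quoted in this release, keyed by accession number.
# These are excerpts only: real entries carry many more DR lines.
EXAMPLE_DR_LINES = {
    "Q925X3": [
        "DR   GenoList; LIN0124; -.",
        "DR   GenoList; LIN2378; -.",
        "DR   GenoList; LIN2564; -.",
    ],
    "P37551": ["DR   GenoList; BSU00470; -."],
    "P0C8R2": [
        "DR   ConoServer; 2838; ArIA precursor.",
        "DR   ConoServer; 3450; Sequence 299 from Patent EP1852440.",
    ],
    "P0C1W3": ["DR   ConoServer; 1574; RVIIIA."],
    "P00925": ["DR   MINT; MINT-517950; -."],
    "P0A887": ["DR   MINT; MINT-1243319; -."],
}


def resource_of(dr_line: str) -> str:
    """Return the resource abbreviation: the text before the first ';'."""
    return dr_line[2:].split(";", 1)[0].strip()


# Number of cross-references per resource across all example lines.
per_resource = Counter(
    resource_of(line)
    for lines in EXAMPLE_DR_LINES.values()
    for line in lines
)

# Average number of (example) cross-references per accession.
total = sum(per_resource.values())
average_per_entry = total / len(EXAMPLE_DR_LINES)

print(per_resource.most_common())
print(average_per_entry)
```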

Changes concerning keywords

New keywords: Modified keyword:

Changes in subcellular location controlled vocabulary

New subcellular location:
  • Host basolateral cell membrane

Changes concerning the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):
  • S-(dipyrrolylmethanemethyl)cysteine