UniProt release 2013_05
Published May 1, 2013
Human genetic diseases in UniProtKB/Swiss-Prot
During the past decade, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. More than 100 causative genes in various Mendelian disorders have been identified by means of whole exome sequencing. However, the wealth of variation data made available by NGS is not sufficient, alone, to understand the mechanisms underlying disease pathogenesis and manifestation. Diseases are the consequences of series of events that include not only primary mutations in disease-causing genes, but also variations in disease-modifying genes, as well as the combined effects of gene-gene and gene-environment interactions. That is why new approaches to unravel disease mechanisms are based on biological network analysis.
In addition to providing a large amount of information on protein functions, interactions and biological pathways, UniProt pays particular attention to the annotation of human genetic diseases and disease-linked variants. Information on genetic diseases is shown in the ‘Involvement in disease’ subsection of the ‘General Annotation (Comments)’ section. In the current release, over 4,600 phenotypes are described in close to 3,000 human entries. The great majority of UniProtKB disease descriptions have links to the Online Mendelian Inheritance in Man knowledgebase (OMIM), allowing users to retrieve more detailed information.
In order to improve the clarity of medical annotation and to facilitate the retrieval of disease information from UniProtKB, we have modified the format of the subsection ‘Involvement in disease’. The newly modified subsection is organized in 2 parts. Firstly, the disease name, acronym and features are defined using a controlled vocabulary. Secondly, the role of the gene/protein in the disease is described in a ‘Note:’, that allows discrimination between disease-causing, disease-modifying and susceptibility genes. This note, partly written in free text, provides information on the biological context or other interesting information that may not be directly related to the phenotype description, such as the involvement of different proteins in the pathological mechanism. For example, multiple sulfatase deficiency (MSD) is due to the simultaneous decrease of activity of all sulfatases. However, the primary cause is a mutation in SUMF1, an enzyme required for post-translational modification and catalytic activation of these enzymes. This additional information is stored in the ‘Involvement in disease’ note.
Genetic diseases annotated in UniProtKB/Swiss-Prot are indexed in the humdisease.txt file, available for our users as of this release. Each record in this file consists of a disease identifier, acronym, and description, as well as known disease synonyms, links to OMIM, Medical Subject Headings (MeSH) and associated UniProtKB keywords.
Complete proteomes for Ensembl species
For UniProt release 2013_05, one new species from Ensembl vertebrates and 3 new Ensembl Genomes have been made available. These are:
In addition to the new imports, existing proteomes derived from Ensembl species have been updated with data from Ensembl release 70.
All predicted protein sequences from an Ensembl Genome are mapped to their UniProtKB counterparts under stringent conditions: 100% identity over 100% of the length of the two sequences is required. Any sequence found to be absent from UniProtKB is imported into the unreviewed component of UniProtKB, UniProtKB/TrEMBL. All UniProtKB entries that map to an Ensembl Genome are used to build the proteome; they are tagged with the keyword Complete proteome and an Ensembl Genome cross-reference is added.
We very much welcome the feedback of the community on our efforts. In future UniProt releases, we expect to make proteomes for the remaining Ensembl and Ensembl Genomes species currently absent from UniProtKB.
Removal of the cross-reference to GenomeReviews
Cross-references to GenomeReviews have been removed.
Changes to keywordsNew keywords:
- Acetylcholine receptor inhibitor -> Acetylcholine receptor inhibiting toxin