UniProt release 2010_08
Published July 13, 2010
Headline
Viral reference strains: a virtual vaccine against virus pandemic in sequence databases
Viruses are not only the most abundant biological entities on the planet, they are also the most represented taxonomic group in UniProtKB. The uncontested title holder is HIV-1, with about 350’000 entries. Given that an HIV genome encodes about 9 proteins, these entries correspond to the equivalent of about 35’000 complete genomes!
While these numbers reflect the tremendous sequence diversity of viruses, they also make it difficult to find one’s way around, and users looking for general information on a viral species face a dilemma: which entry to choose? Retrieving only manually reviewed proteins still leaves the user in doubt, as the same viral protein can be present by the dozen in UniProtKB/Swiss-Prot. For example, which influenza A hemagglutinin proteins should be selected preferentially among the 170 reviewed entries?
The UniProt solution to this problem is to define viral reference strains, each representative of one virus genus, to curate them to the highest quality standards and to continuously maintain their annotation. The reference strains that have been selected are those whose genomes belong to the NCBI Reference Sequence collection (RefSeq). Therefore, not only their proteomes but also their genomes are carefully reviewed. The keyword ‘Virus reference strain’ has been created to allow their easy retrieval. We have currently defined 355 viral reference strains. These reference strains contain 12’576 proteins, of which 4’500 entries, most representing double-stranded DNA viruses, have been tagged with the ‘Virus reference strain’ keyword. We are actively updating the remaining 8’000 entries to provide a full set of tagged entries reflecting the diversity of the virus world.
Reference strains allow users to identify the strain with the best and most up-to-date information for any given virus. For bioinformaticians, they offer another interesting feature: they can serve as templates for high-quality automated annotation of other viruses of the same genus, following a pipeline analogous to the one used in UniProtKB for microbial proteins (see HAMAP program).
The viral reference strains are also accessible via the ViralZone fact sheet which provides links to the corresponding UniProtKB proteome and RefSeq genome (see for instance Influenza A).
UniProtKB News
Format change in the cross-references to WormBase
C. elegans and C. briggsae entries used to have cross-references to both the WormPep and WormBase databases. WormPep is no longer active, and all worm sequences are contained in WormBase, a comprehensive database for biological information on worm sequences and annotation. We have therefore removed cross-references to WormPep and modified the WormBase cross-references to include the transcript and protein identifiers from WormPep. Proteins with alternative products have one WormBase cross-reference per gene product.
Previous format in the flat file:
DR WormPep; TranscriptIdentifier; ProteinIdentifier.
DR WormBase; GeneIdentifier; GeneName.
New format:
DR WormBase; TranscriptIdentifier; ProteinIdentifier; GeneIdentifier; GeneName.
If there is no GeneName, a dash ('-') is stored in that position.
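As an illustration of the new layout, the following Python sketch splits one new-format line into its four fields. It is not part of any UniProt tooling: the names WormBaseXref and parse_wormbase_dr are hypothetical, the line layout is taken from the format shown above, and turning the dash placeholder into None is an assumption made for convenience.

```python
from typing import NamedTuple, Optional


class WormBaseXref(NamedTuple):
    """Fields of one new-format WormBase cross-reference (hypothetical type)."""
    transcript_id: str
    protein_id: str
    gene_id: str
    gene_name: Optional[str]  # None when the flat file stores '-'


def parse_wormbase_dr(line: str) -> WormBaseXref:
    """Parse a line of the form
    'DR WormBase; TranscriptIdentifier; ProteinIdentifier; GeneIdentifier; GeneName.'
    """
    parts = line.strip().split(None, 1)  # tolerate any spacing after the line code
    if len(parts) != 2 or parts[0] != "DR":
        raise ValueError(f"not a DR line: {line!r}")
    body = parts[1]
    if body.endswith("."):
        body = body[:-1]  # drop the terminating full stop only
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 5 or fields[0] != "WormBase":
        raise ValueError(f"not a new-format WormBase cross-reference: {line!r}")
    _, transcript_id, protein_id, gene_id, gene_name = fields
    # Assumption: a lone dash means "no gene name" and is mapped to None.
    return WormBaseXref(transcript_id, protein_id, gene_id,
                        None if gene_name == "-" else gene_name)
```

Called on a line such as 'DR WormBase; T25E12.4a; CE18967; WBGene00012019; dkf-2.', the function returns the transcript, protein and gene identifiers together with the gene name; a line ending in '; -.' yields a gene name of None.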
Example: O45818
Previous format in the flat file:
DR WormBase; WBGene00012019; dkf-2.
DR WormPep; T25E12.4a; CE18967.
DR WormPep; T25E12.4b; CE18283.
DR WormPep; T25E12.4c; CE42507.
New format:
DR WormBase; T25E12.4a; CE18967; WBGene00012019; dkf-2.
DR WormBase; T25E12.4b; CE18283; WBGene00012019; dkf-2.
DR WormBase; T25E12.4c; CE42507; WBGene00012019; dkf-2.
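The mapping from the previous to the new format can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure UniProt used: the function name merge_wormbase_xrefs is hypothetical, and the sketch assumes that an entry carries exactly one old-format WormBase line (the gene) and one WormPep line per gene product.

```python
from typing import Iterable, List, Optional, Tuple


def merge_wormbase_xrefs(old_lines: Iterable[str]) -> List[str]:
    """Combine old-format WormPep and WormBase DR lines of one entry
    into new-format WormBase DR lines (one per gene product)."""
    gene: Optional[Tuple[str, str]] = None   # (GeneIdentifier, GeneName)
    products: List[Tuple[str, str]] = []     # (TranscriptIdentifier, ProteinIdentifier)
    for line in old_lines:
        body = line.strip()
        if body.endswith("."):
            body = body[:-1]                 # drop the terminating full stop
        parts = body.split(None, 1)
        if len(parts) != 2 or parts[0] != "DR":
            continue                         # ignore anything that is not a DR line
        fields = [field.strip() for field in parts[1].split(";")]
        if len(fields) != 3:
            continue                         # not an old-format cross-reference
        database, first, second = fields
        if database == "WormBase":
            gene = (first, second)
        elif database == "WormPep":
            products.append((first, second))
    if gene is None:
        # Assumption: every WormPep line needs a gene line to merge with.
        raise ValueError("no old-format WormBase line found")
    gene_id, gene_name = gene
    return [
        f"DR WormBase; {transcript}; {protein}; {gene_id}; {gene_name}."
        for transcript, protein in products
    ]
```

Applied to the four previous-format lines of O45818 shown above, the sketch produces the three new-format lines, one per gene product, in the order of the WormPep lines.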
Cross-references to WormPep have been removed.
Changes concerning keywords
New keywords:
- Ligand-gated ion channel
- Activation of host autophagy by virus
- Activation of host caspases by virus
- Activation of host NF-kappa-B by virus
- Cleavage of host translation factors by virus
- Dephosphorylation of host translation factors by virus
- G0/G1 host cell cycle checkpoint dysregulation by virus
- G1/S host cell cycle checkpoint dysregulation by virus
- Host G2/M cell cycle arrest by virus
- Inhibition of host adaptive immune response by virus
- Inhibition of host apoptosis by viral BCL2-like protein
- Inhibition of host apoptosis by viral FLIP-like protein
- Inhibition of host autophagy by virus
- Inhibition of host tetherin by virus
- Inhibition of host caspases by virus
- Inhibition of host chemokines by virus
- Inhibition of host complement factors by virus
- Inhibition of host RIG-I by virus
- Inhibition of host MDA5 by virus
- Inhibition of host innate immune response by virus
- Inhibition of host interferon receptors by virus
- Inhibition of host IRF3 by virus
- Inhibition of host IRF7 by virus
- Inhibition of host IRF9 by virus
- Inhibition of host ISG15 by virus
- Inhibition of host JAK1 by virus
- Inhibition of host MAVS by virus
- Inhibition of host mitotic exit by virus
- Inhibition of host mRNA nuclear export by virus
- Inhibition of host NF-kappa-B by virus
- Inhibition of host poly(A)-binding protein by virus
- Inhibition of host PKR by virus
- Inhibition of host pre-mRNA processing by virus
- Inhibition of host RNA polymerase II by virus
- Inhibition of host STAT1 by virus
- Inhibition of host STAT2 by virus
- Inhibition of host TAP by virus
- Inhibition of host tapasin by virus
- Inhibition of host TBK1-IKBKE-DDX3 complex by virus
- Inhibition of host TRAFs by virus
- Inhibition of host transcription initiation by virus
- Inhibition of host TYK2 by virus
- Inhibition of host IFN-mediated response initiation by virus
- Inhibition of host interferon signaling pathway by virus
- Inhibition of host MHC class I molecule presentation by virus
- Inhibition of host MHC class II molecule presentation by virus
- Inhibition of host proteasome antigen processing by virus
- Modulation of host dendritic cell activity by virus
- Modulation of host cell apoptosis by virus
- Modulation of host cell cycle by viral cyclin-like protein
- Modulation of host cell cycle by virus
- Modulation of host chromatin by virus
- Modulation of host E3 ubiquitin ligases by virus
- Modulation of host immunity by viral IgG Fc receptor-like protein
- Evasion of host immunity by viral interleukin-like protein
- Modulation of host PP1 activity by virus
- Modulation of host ubiquitin pathway by viral deubiquitinase
- Modulation of host ubiquitin pathway by viral E3 ligase
- Modulation of host ubiquitin pathway by viral ubl
- Modulation of host ubiquitin pathway by virus
- Modulation of host NK-cell activity by virus
- Virus-mediated host mRNA decay by hyperadenylation
Changes concerning the controlled vocabulary for PTMs
New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):
- S-(coelenterazin-3a-yl)cysteine
Deleted terms:
- Glutamyl lysine isopeptide (Gln-Lys) (interchain with K-...)
- Glutamyl lysine isopeptide (Lys-Gln) (interchain with Q-...)
