UniProt release 12.5
Published November 13, 2007
Headlines
Acanthamoeba polyphaga mimivirus, a "giant" virus in UniProtKB/Swiss-Prot
Mimivirus (for mimicking microbe) is a new viral genus containing a single identified species, Acanthamoeba polyphaga mimivirus (APMV), discovered by Didier Raoult's lab in 1992 within the amoeba Acanthamoeba polyphaga while working on Legionellosis. The virion has a non-enveloped, icosahedral capsid with a diameter of 400 nm and protein filaments projecting from its surface. The capsid contains the internal core surrounded by an internal lipid layer. Its linear, double- stranded DNA genome is roughly 1.2 million bp in length, the largest viral genome known so far. Its replication cycle, genome and capsid structure place it into the nucleocytoplasmic large DNA viruses (NCLDVs), which include amongst others the poxviruses and iridoviruses.
This virus is amazing in many ways. It is the largest virus ever isolated, with a genome size and complexity comparable to that of a small bacterium. A thorough bioinformatics analysis carried out by the group of Jean-Michel Claverie uncovered 909 potential protein-coding genes. Some of these proteins belong to families that are shared with all or some NCLDVs, many have eukaryotic counterparts and there are quite a number of ORFans (no sequence similarity to proteins from other genomes). It was a surprise to find an appreciable number of genes coding for proteins involved in metabolism, DNA repair pathways and, most surprising, genes encoding a partially functional protein translation apparatus. Mimivirus does indeed encode four aminoacyl-tRNA synthetases (ArgRS, CysRS, MetRS, TyrRS), as well as various translation initiation, elongation and termination factors. It is very intriguing to find, in a virus, genes corresponding to central components of the protein translation machinery, a biochemical process widely thought to be an exclusive signature of cellular organisms.
The discovery of this amazing virus has lead to the concept of "giant" virus and implies that there is an overlap in terms of particle dimension, genome size, and genetic complexity between the viral and cellular organism worlds.
A special effort has been made in UniProtKB/Swiss-Prot database to provide the complete, fully annotated mimivirus proteome. We have also integrated all proteomics and structural information that has been made available by the groups of Jean-Michel Claverie and Chantal Abergel.
To get all UniProtKB mimivirus entries, click here.
UniProtKB News
Format change in the ptmlist.txt document file
The ptmlist.txt document, which is available by ftp and on the Web site, describes the post-translational modifications (PTMs) that are annotated in UniProtKB/Swiss-Prot entries in the sequence annotation section (Features) (FT lines in the flat file) in the subsections "Cross-link" (CROSSLNK key in the flat file), "Lipidation" (LIPID key in the flat file) and "Modified residue" (MOD_RES key in the flat file). The document was in a format that was suitable for computer applications (e.g. ExPASy's proteomics tools), but which was not very human readable. The new file format should improve this.
Previous format:
N,N-dimethylproline MOD_RES P BB Nter C2H4 28.031300 28.06 in e:6446,7586,33682 Methylation FT=MOD_RES%20dimethylproline&wild=1 AA0066 MOD:00075
New format:
ID N,N-dimethylproline AC PTM-0179 FT MOD_RES TG Proline. PA Amino acid backbone. PP N-terminal. CF C2 H4 MM 28.031300 MA 28.06 LC Intracellular localisation. TR Eukaryota; taxId:6446 (Sipunculus nudus), taxId:7586 (Echinodermata), taxId:33682 (Euglenozoa). KW Methylation. DR RESID:AA0066. DR MOD:00075. //
With the following definitions of the line types:
--------- --------------------------- ----------------------
Line code Content Occurrence in an entry
--------- --------------------------- ----------------------
ID Identifier (FT description) Once; starts a PTM entry.
AC Accession (PTM-xxxx) Once.
FT Feature key Once.
TG Target Once; two targets separated
by a dash in case of intrachain
crosslinks.
PA Position of the modified Optional, once.
amino acid
PP Position of the modification Optional, once.
in the polypeptide
CF Correction formula Optional, once.
MM Monoisotopic mass difference Optional, once.
MA Average mass difference Optional, once.
LC Cellular location Optional, once; alternatives
can be proposed.
TR Taxonomic range Optional, once or more.
KW Keyword Optional, once or more.
DR Cross-reference to PTM Optional, once or more.
databases
// Terminator Once; ends an entry.
Changes concerning cross-references to PDB
We added an additional field to the cross-reference (DR line in the flat file) to the PDB database to show the resolution of structures that were determined by X-ray crystallography or electron microscopy.
For the chain names we use now the remediated data from wwPDB, therefore the chain names have changed for some entries.
Previous format:
DR PDB; ENTRY_NAME; METHOD; CHAIN.
New format:
DR PDB; ENTRY_NAME; METHOD; RESOLUTION; CHAIN.
Examples:
Q20728:DR PDB; 1LPL; X-ray; 1.77 A; A=135-229.Q5HEB7:
DR PDB; 2I8C; X-ray; 2.46 A; A/B=1-356.
A dash indicates that we found no information about the resolution or that the field is not applicable (for NMR structures and theoretical models).
Examples:
P02768:DR PDB; 2ESG; X-ray; -; C=25-609.P12872:
DR PDB; 1LBJ; NMR; -; A=26-47.P0AC41:
DR PDB; 2AD0; Model; -; A=1-588.
Cross-references to CleanEx
Cross-references have been added to the CleanEx database of gene expression profiles. CleanEx is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparison.
The CleanEx database is available at http://www.cleanex.isb-sib.ch/.
The format of the explicit links in the flat file is:
| Resource abbreviation | CleanEx |
|---|---|
| Resource identifier | Combination of a species code and a gene identifier. |
| Examples |
O08788: DR CleanEx; MM_DCTN1; -. P78358: DR CleanEx; HS_CTAG1A; -. DR CleanEx; HS_CTAG1B; -. |
Changes concerning keywords (KW line)
Modified keyword:
- Phosphorylation -> Phosphoprotein
