UniProt release 15.15
Published March 2, 2010
Headlines
Bacillus subtilis, a Gram-positive model bacterium fully annotated in UniProtKB/Swiss-Prot
We are all aware of the importance of model bacterial systems. Escherichia coli K12 is the paradigm for Gram-negative bacteria, but what of Gram-positive bacteria? There is a large variety of these bacteria: some serve us, some are neutral and some infect us, and model systems for them are in demand.
Bacillus subtilis, a rod-shaped, soil- and water-dwelling bacterium originally described as Vibrio subtilis in 1835 by Ehrenberg and renamed in 1872 by Cohn, has served this role for over a century. B.subtilis differentiates to produce endospores, can be made naturally competent for DNA uptake and is a bacteriophage host. In the wild it has been seen to produce over two dozen different antibiotics. These characteristics make it an obvious choice as a model system for bacterial differentiation and genetics, as well as a model for other, often more dangerous, bacteria such as Bacillus anthracis, Mycobacterium tuberculosis or Staphylococcus aureus. Additionally, it is used for the production of various industrially interesting enzymes such as amylases and proteases. A substrain, B.subtilis natto, is used to prepare natto, a traditional Japanese dish made from fermented soybeans. Although B.subtilis is not considered pathogenic for any known organism, it has been isolated from patients suffering from various illnesses such as endocarditis and pneumonia, and occasionally from spoiled food, where it might be responsible for cases of food poisoning.
The genome of B.subtilis 168, a widely used laboratory strain, was sequenced by a large international consortium in 1997, making it the sixth bacterium to be fully sequenced. The sequence was updated and reannotated in 2009 by the Institut Pasteur and the Génoscope. In coordination with them we have annotated the complete proteome, providing all 4,192 B.subtilis proteins in UniProtKB/Swiss-Prot, each of which has a cross-reference to the dedicated B.subtilis database SubtiList/GenoList as well as to other databases. A list of all B.subtilis UniProtKB/Swiss-Prot entries is available in the bacsu.txt file. This is of course only a snapshot of the knowledge about this first fully manually annotated Gram-positive model organism, and it will quickly become dated. Despite having been studied so intensively for so long, there are many B.subtilis proteins about which we know very little. There will be work for years to come for the B.subtilis community (and the larger scientific community) as these proteins and their homologues are characterized.
All B.subtilis entries can be retrieved from UniProtKB/Swiss-Prot by combining the organism name "Bacillus subtilis" (or the taxonomy identifier 1423) with the keyword "Complete proteome": organism:"Bacillus subtilis" AND keyword:"Complete proteome", or equivalently organism:1423 AND keyword:181.
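The two query forms above follow the same pattern: a field name, a colon, and either a quoted name or a bare numeric identifier, joined by AND. The following is a minimal sketch of how such a query string could be assembled; the function name `complete_proteome_query` is hypothetical and is not part of any UniProt tool.

```python
def complete_proteome_query(organism, keyword="Complete proteome"):
    """Build a query string of the form used in the text.

    Assumption for this sketch: integers are identifiers and are written
    bare, strings are names and are written in double quotes.
    """
    def term(field, value):
        if isinstance(value, int):
            return f"{field}:{value}"
        return f'{field}:"{value}"'

    return f"{term('organism', organism)} AND {term('keyword', keyword)}"


# The two equivalent forms given in the text.
by_name = complete_proteome_query("Bacillus subtilis")
by_id = complete_proteome_query(1423, 181)
print(by_name)
print(by_id)
```

Either string can then be pasted into the UniProtKB search box.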
UniProtKB News
Cross-references to EuPathDB
Cross-references have been added to the Eukaryotic Pathogen Database Resources EuPathDB (formerly ApiDB), an integrated database covering the eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources ("child databases": e.g. ToxoDB, PlasmoDB, CryptoDB...), and the opportunity to leverage orthology for searches across genera.
EuPathDB is available at http://www.eupathdb.org/.
The format of the explicit links in the flat file is:
| Resource abbreviation | EuPathDB |
|---|---|
| Resource identifier | Combination of the child database name and the accession number in this database, joined by a ":". |

Examples:
P84155:
DR EuPathDB; TritrypDB:LmjF06.1270; -.
Q38FA5:
DR EuPathDB; TritrypDB:Tb09.160.2970; -.
Show all the entries having a cross-reference to EuPathDB.
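Because the EuPathDB resource identifier packs two pieces of information into one field, a reader of the flat file has to split it at the ":" to recover the child database and the accession number. A minimal sketch of that step follows; the names `EuPathDBRef` and `parse_eupathdb_dr` are hypothetical, and the code assumes the line layout shown in the examples above (fields separated by ";", line terminated by ".").

```python
from typing import NamedTuple


class EuPathDBRef(NamedTuple):
    child_db: str    # e.g. "TritrypDB"
    accession: str   # accession number within the child database


def parse_eupathdb_dr(line: str) -> EuPathDBRef:
    """Split a 'DR EuPathDB; <child>:<accession>; -.' line into its parts."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1]          # drop only the terminating period
    fields = [f.strip() for f in body.split(";")]
    if len(fields) < 2 or fields[0].split() != ["DR", "EuPathDB"]:
        raise ValueError(f"not an EuPathDB DR line: {line!r}")
    # Split at the first ":" only; the accession itself may contain dots.
    child_db, sep, accession = fields[1].partition(":")
    if not sep or not child_db or not accession:
        raise ValueError(f"malformed resource identifier: {fields[1]!r}")
    return EuPathDBRef(child_db, accession)


ref = parse_eupathdb_dr("DR EuPathDB; TritrypDB:Tb09.160.2970; -.")
print(ref.child_db, ref.accession)
```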
Cross-references to ProtClustDB
Cross-references have been added to Entrez Protein Clusters ProtClustDB, a collection of related protein sequences (clusters) which consists of Reference Sequence proteins encoded by complete genomes. This database contains both curated and non-curated clusters. The Protein Clusters database provides easy access to annotation information, publications, domains, structures, and external links and analysis tools including multiple alignments, phylogenetic trees, and genomic neighborhoods (ProtMap).
ProtClustDB is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
The format of the explicit links in the flat file is:
| Resource abbreviation | ProtClustDB |
|---|---|
| Resource identifier | ProtClustDB accession number. |

Examples:
P99178:
DR ProtClustDB; PRK05431; -.
P92693:
DR ProtClustDB; MTH00098; -.
Show all the entries having a cross-reference to ProtClustDB.
Cross-references to SUPFAM
Cross-references have been added to SUPFAM, the Superfamily database of structural and functional annotation for all proteins and genomes. The SUPFAM annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 1,200 completely sequenced genomes against the hidden Markov models.
SUPFAM is available at http://supfam.org.
The format of the explicit links in the flat file is:
| Resource abbreviation | SUPFAM |
|---|---|
| Resource identifier | SUPFAM superfamily identifier. |
| Optional information 1 | SUPFAM superfamily domain name. |
| Optional information 2 | Number of hits found. |

Examples:
P08519:
DR SUPFAM; SSF57440; Kringle-like; 38.
DR SUPFAM; SSF50494; Pept_Ser_Cys; 1.
P00967:
DR SUPFAM; SSF56042; AIR_synth_C; 2.
DR SUPFAM; SSF53328; formyl_transf; 1.
DR SUPFAM; SSF52440; PreATP-grasp-like; 1.
DR SUPFAM; SSF55326; PurM_N-like; 2.
DR SUPFAM; SSF51246; Rudmnt_hyb_motif; 1.
Show all the entries having a cross-reference to SUPFAM.
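Unlike the other new cross-references, SUPFAM lines carry two optional fields after the identifier: the domain name and the number of hits. As an illustration, the sketch below reads such lines and totals the hits for one entry. The function name `parse_supfam_dr` is hypothetical, and treating "-" as an empty optional field is an assumption carried over from the way "-" is used in the other DR lines of this release.

```python
def parse_supfam_dr(line):
    """Parse 'DR SUPFAM; <id>; <domain name>; <hits>.' into a dict."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1]          # drop only the terminating period
    head, *rest = [f.strip() for f in body.split(";")]
    if head.split() != ["DR", "SUPFAM"] or not rest:
        raise ValueError(f"not a SUPFAM DR line: {line!r}")

    def optional(index):
        # Assumption: a missing field or "-" means "no information".
        if index < len(rest) and rest[index] not in ("", "-"):
            return rest[index]
        return None

    hits = optional(2)
    return {
        "superfamily_id": rest[0],
        "domain_name": optional(1),
        "hits": int(hits) if hits is not None else None,
    }


# The two SUPFAM lines of entry P08519 from the examples above.
p08519 = [
    parse_supfam_dr("DR SUPFAM; SSF57440; Kringle-like; 38."),
    parse_supfam_dr("DR SUPFAM; SSF50494; Pept_Ser_Cys; 1."),
]
total_hits = sum(ref["hits"] or 0 for ref in p08519)
print(total_hits)
```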
Format change in the cross-references to HOVERGEN
The format of the cross-references to the HOVERGEN project has changed: the resource identifier, which was a UniProtKB accession number, has been replaced by a HOVERGEN identifier.
Example:
Previous format:
DR HOVERGEN; P32754; -.
New format:
DR HOVERGEN; HBG005987; -.
Show all the entries having a cross-reference to HOVERGEN.
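Software that reads files from both before and after this release may need to tell the two HOVERGEN formats apart. The sketch below is one possible check and makes two labeled assumptions: that HOVERGEN identifiers consist of "HBG" followed by digits (inferred only from the single example above), and that the previous identifiers match the published UniProtKB accession number pattern. The function name `hovergen_format` is hypothetical.

```python
import re

# UniProtKB accession number pattern as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: inferred from the one example in this release note.
HOVERGEN_ID = re.compile(r"HBG[0-9]+")


def hovergen_format(line):
    """Return 'new', 'previous' or 'unknown' for a DR HOVERGEN line."""
    fields = [f.strip() for f in line.strip().split(";")]
    if len(fields) < 2 or fields[0].split() != ["DR", "HOVERGEN"]:
        raise ValueError(f"not a HOVERGEN DR line: {line!r}")
    identifier = fields[1]
    if HOVERGEN_ID.fullmatch(identifier):
        return "new"
    if UNIPROT_ACCESSION.fullmatch(identifier):
        return "previous"
    return "unknown"


print(hovergen_format("DR HOVERGEN; P32754; -."))
print(hovergen_format("DR HOVERGEN; HBG005987; -."))
```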
Changes concerning keywords
New keywords:
Changes concerning the controlled vocabulary for PTMs
Modified term for the feature key 'Cross-link' ('CROSSLNK' in the flat file):
New terms:
- Alanine isoaspartyl cyclopeptide (Ala-Asn)
- Glycyl cysteine dithioester (Cys-Gly) (interchain with G-...)
- Trithiocysteine (Cys-Cys)
Modified terms for the feature key 'Lipidation' ('LIPID' in the flat file):
New terms:
- N-[(12R)-12-hydroxymyristoyl]cysteine
- N-(12-oxomyristoyl)cysteine
Modified terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):
New terms:
- S-(4-hydroxycinnamyl)cysteine
- S-cysteinyl cysteine
- Tele-(1,2,3-trihydroxypropan-2-yl)histidine
