UniProt release 13.0
Published February 26, 2008
Headlines
UniProtKB major release (13.0)
UniProt Knowledgebase release 13.0 includes Swiss-Prot release 55.0 and TrEMBL release 38.0.
Release 55.0 of 26-Feb-08 of UniProtKB/Swiss-Prot contains 356'194 sequence entries, comprising 127'836'513 amino acids abstracted from 165776 references. 80'183 sequences have been added since release 54.0, the sequence data of 1'411 existing entries has been updated and the annotations of 262'009 entries have been revised.
The following improvements were carried out in the last 7 months:
- We have changed the representation of non-standard amino acids (selenocysteine and pyrrolysine). They are now annotated under a new feature key 'Non- standard residue' ('NON_STD' in the flat file), and are represented in the sequence using the IUPAC/IUBMB recommended one-letter codes 'U' for selenocysteine and 'O' for pyrrolysine.
- We have structured the 'Subcellular location' subsection in the 'General annotation' section (CC line, topic SUBCELLULAR LOCATION in the flat file) in order to improve the consistency of annotation and to allow to parse its content.
- To be consistent with other comment line topics, we have applied small changes to the topics WEB RESOURCE and MASS SPECTROMETRY.
- We have enriched the controlled vocabulary for post-translational modification description with 14 new terms for the feature key 'MOD_RES'.
- We have added cross-references to 8 new databases: 2DBase-Ecoli, CleanEx, GeneID, PDBsum, PhosphoSite, RefSeq, VectorBase, and World-2DPAGE. We have removed cross-references to RZPD-ProtExp, and added resolution information to our PDB cross-references.
- We have added 9 new keywords, 3 have been modified and 1 deleted.
- We have added 11 new documents:
- Human and mouse protein kinases: classification and index
- Controlled vocabulary of subcellular locations and membrane topologies and orientations
- Brucella abortus strain 2308: entries and gene names
- bruab.txtBrucella abortus: entries and gene names
- Brucella melitensis: entries and gene names
- Brucella suis: entries and gene names
- Coxiella burnetii: entries and gene names
- Rickettsia bellii strain RML369-C: entries and gene names
- Rickettsia conorii: entries and gene names
- Rickettsia felis (Rickettsia azadi): entries and gene names
- Rickettsia typhi: entries and gene names
- We have created a non-redundant set of Arabidopsis proteins, including nuclear, mitochondrial and chloroplastic proteins, and the selected entries have been labelled with the keyword 'Complete proteome' to allow easy retrieval.
- We have begun the procedure to integrate automatically entries annotated by the HAMAP automated pipeline. In this pipeline, proteins from complete bacterial and archaeal proteomes, together with the related plastid proteins, are automatically annotated based on manually created family rules for complete protein annotation, with template-based feature propagation. We have established many checks and conditional propagation in order to ensure that automatic annotation will produce data of a quality up to that of manual curation.
UniProtKB News
New representation of non-standard amino acids (selenocysteine and pyrrolysine)
The non-standard amino acids selenocysteine and pyrrolysine used to be annotated in the 'Sequence annotation' section, 'Amino acid modifications' subsection, under the feature keys 'Selenocysteine' and 'Modified residue', respectively. In the sequence, selenocysteine was represented by the one-letter code 'C' and pyrrolysine by 'K'. In order to annotate these and future non-standard amino acids more adequately, we created a new key 'Non- standard residue'. The type of non-standard residue involved is indicated in the 'description' of the feature key. Sequences will accomodate the IUPAC/IUBMB recommended one-letter codes 'U' for selenocysteine and 'O' for pyrrolysine.
In the flat file, selenocysteine used to be described with the feature key SE_CYS and pyrrolysine with the more generic feature key MOD_RES. We have replaced these keys with the new feature key NON_STD (non-standard). The type of non-standard residue involved is indicated in the 'description' of the NON_STD key.
Former annotation in the flat file:
ID BTHD_DROME Reviewed; 249 AA.
..
FT SE_CYS 37 37
..
MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSCRVF RRRAEELHSA LRERGLQQLQ
*
ID MTBB1_METAC Reviewed; 467 AA.
..
FT MOD_RES 356 356 Pyrrolysine (Probable).
..
RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIKIGVG
*
Current annotation:
ID BTHD_DROME Reviewed; 249 AA.
..
FT NON_STD 37 37 Selenocysteine.
..
MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSURVF RRRAEELHSA LRERGLQQLQ
*
ID MTBB1_METAC Reviewed; 467 AA.
..
FT NON_STD 356 356 Pyrrolysine (Probable).
..
RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIOIGVG
*
UniProtKB/Swiss-Prot entries describing a selenocysteine- and pyrrolysine- containing sequences can be retrieved with the 'Selenocysteine' and 'Pyrrolysine' keywords, respectively.
Cross-references to PhosphoSite
Cross-references have been added to PhosphoSite, an expert-curated knowledgebase of information focused on protein phosphorylation mainly in vertebrates. In addition to phosphorylation sites curated from the literature, large numbers of new unpublished sites discovered by MS/MS analyses are being added regularly.
The Phosphorylation site database is available at http://phosphosite.cellsignal.com/.
The format of the explicit links in the flat file is:
| Resource abbreviation | PhosphoSite |
|---|---|
| Resource identifier | UniProtKB accession number. |
| Examples |
P01266: DR PhosphoSite; P01266; -. Q9JMH6: DR PhosphoSite; Q9JMH6; -. |
Cross-references to 2DBase-Ecoli
Cross-references have been added to 2DBase-Ecoli, the 2D-PAGE database of Escherichia coli. The 2DBase-Ecoli database currently contains 12 gels consisting of 1185 protein spots information in which 723 proteins where identified and annotated. Individual protein spots in the existing gels can be displayed, queried, analysed and compared in a tabular format based on various functional categories enabling quick and subsequent analysis.
The 2D-PAGE Database of Escherichia coli is available at http://2dbase.techfak.uni-bielefeld.de/.
The format of the explicit links in the flat file is:
| Resource abbreviation | 2DBase-Ecoli |
|---|---|
| Resource identifier | UniProtKB accession number. |
| Examples |
P02930: DR 2DBase-Ecoli; P02930; -. P04816: DR 2DBase-Ecoli; P04816; -. |
Changes concerning keywords
New keywords:
Changes in subcellular location controlled vocabulary
New subcellular locations:
- Extravirionic side
- Intravirionic side
Changes in PTM controlled vocabulary
New terms in the 'Amino acid modifications' subsection (feature key 'MOD_RES' in the flat file):
- 2'-methylsulfonyltryptophan
- 4,5-dihydroxyleucine
- (3R,4S)-3,4-dihydroxyproline
- Cyclopeptide (Ala-Pro)
- D-serine (Cys)
- D-serine (Ser)
- D-threonine
- N,N-dimethylalanine
- N-acetylisoleucine
- O-(5'-phospho-DNA)-serine
- O-(5'-phospho-DNA)-tyrosine
- O-(5'-phospho-RNA)-serine
- O-(5'-phospho-RNA)-tyrosine
- S-glutathionyl cysteine
