UniProt release 9.0
Published October 31, 2006
UniProtKB/Swiss-Prot major release (51.0)
UniProt Knowledgebase release 9.0 includes Swiss-Prot release 51.0 and TrEMBL release 34.0.
Release 51.0 of 31-Oct-06 of UniProtKB/Swiss-Prot contains 241'242 sequence entries, comprising 88'541'632 amino acids abstracted from 148'048 references.
19'061 sequences have been added since release 50.0, the sequence data of 1'336 existing entries has been updated and the annotations of 222'181 entries have been revised.
Many improvements were carried out in the last 5 months:
- We have changed the format of the ID line, replacing the terms STANDARD/PRELIMINARY by Reviewed/Unreviewed, and removing the legacy field MoleculeType.
- We have also standardized the FASTA header lines of UniProtKB and UniRef entries.
- Cross-references to five new databases have been added, and the cross-references to Gene Ontology (GO) have been modified to include the source database for the mapping.
- There are two new documents: Index of Oryza sativa entries and their corresponding gene designations, and Protein naming guidelines.
ID (IDentification) line
The format of the ID line was:
ID EntryName DataClass; MoleculeType; SequenceLength.
We have changed the values of the
as described in this table:
||Entries that have been manually reviewed and annotated by UniProtKB curators (Swiss-Prot section of the UniProt Knowledgebase).|
||Computer-annotated entries that have not been reviewed by UniProtKB curators (TrEMBL section of the UniProt Knowledgebase).|
We have also dropped the field
which was a legacy of compatibility with the EMBL flat file
format. The new format of the ID line is:
ID EntryName DataClass; SequenceLength.
ID CYC_PIG Reviewed; 104 AA.
ID Q3ASY8_CHLCH Unreviewed; 36805 AA.
FASTA header line
We have standardized the FASTA header line of UniProtKB and UniRef entries in the following way:
Format for UniProtKB
>UniqueIdentifier|EntryName ProteinName - OrganismName
UniqueIdentifieris the primary accession number of the UniProtKB entry, or, in the case of entries that describe several protein isoforms, an isoform identifier.
EntryNameis the entry name of the UniProtKB entry.
ProteinNameis the recommended or submitted protein name of the UniProtKB entry. (This is the name before the first bracket, excluding 'precursor' but including 'Fragment' if appropriate - see the Protein name documentation.)
OrganismNameis the scientific name of the organism of the UniProtKB entry.
>P24856|ANP_NOTCO Ice-structuring glycoprotein (Fragment) - Notothenia coriiceps neglecta
>P51650-1|SSDH_RAT Succinate semialdehyde dehydrogenase - Rattus norvegicus
Cross-references to Human Protein Atlas
Cross-references have been added to the Human Protein Atlas. The Human Protein Atlas shows the expression and localization of proteins in a large variety of normal human tissues and cancer cells. The data is presented as high resolution images representing immunohistochemically stained tissue sections. Each antibody in the database has been used for immunohistochemical staining of both normal and cancer tissue. The immunohistochemical protocols used result in a brown-black staining, localized where an antibody has bound to its corresponding antigen.
The format of the explicit links in the flat file is:
|Resource identifier||Antibody identifier.|
Cross-references to Gene Ontology (GO)
The last field of the cross-references to the Gene Ontology (GO) database has been modified. This field displays the GO evidence code and we have appended to this the source database from which the cross-reference was obtained, separated by a colon.
Q15738: DR GO; GO:0005783; C:endoplasmic reticulum; IDA:LIFEdb. P13569: DR GO; GO:0005524; F:ATP binding; TAS:ProtInc.
Changes concerning keywords
- S-adenosyl-L-methionine (new)