UniProt release 11.0
Published May 29, 2007
UniProtKB/Swiss-Prot major release (53.0)
Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269'293 sequence entries, comprising 98'902758 amino acids abstracted from 156'204 references. 9'228 sequences have been added since release 52.0: this represents a 3.5% increase. In addition, the sequence data of 734 existing entries have been updated and the annotations of 210'454 entries have been revised.
The following improvements were carried out in the last 3 months:
- We have created a new comment subsection: SEQUENCE CAUTION. This subsection is used for all the comments related to submitted sequences which differ from the sequence shown in the entry because of conflicts, such as frameshifts, erroneous gene model predictions, erroneous translation or other discrepancies that cannot be described in the feature 'Conflict'.
- We have added cross-references to 3 new databases: BuruList, a database dedicated to the analysis of Mycobacterium ulcerans genome, Orphanet, a database dedicated to information on rare diseases and orphan drugs, and PseudoCAP, a Pseudomonas genome database.
UniProt Knowledgebase release 11.0 includes Swiss-Prot release 53.0 and TrEMBL release 36.0.
We are pleased to announce a new UniProt database, the UniProt Metagenomic and Environmental Sequences (UniMES) database, a repository specifically developed for metagenomic and environmental data. UniMES is available in FASTA format on the UniProt ftp servers, in the new subdirectory current_release/unimes:
New ftp directory for UniProt Metagenomic and Environmental Sequences (UniMES)
We have added a new subdirectory, current_release/unimes, to the UniProt ftp servers
New comment subsection SEQUENCE CAUTION
We have introduced a new subsection in the General Annotation (Comments) section, SEQUENCE CAUTION, to describe protein sequence reports that differ from the sequence that is shown in UniProtKB due to conflicts that are not described in the SEQUENCE CONFLICT lines (Features), such as frameshifts, erroneous gene model predictions, etc. This kind of information was before reported in the topic CAUTION together with other warnings that are unrelated to sequence conflicts.
The format of the SEQUENCE CAUTION topic is:
CC -!- SEQUENCE CAUTION: Sequence=Sequence; Type=Type;[ Positions=Positions;][ Note=Note;]
- Sequence is the sequence which differs from the UniProtKB sequence. It is described by one of:
- an EMBL protein identifier (with version number)
- an EMBL accession number.
- a literature reference (e.g. Ref. 3).
- Type describes the cause for the sequence difference(s) and is one of:
- Erroneous initiation
- Erroneous termination
- Erroneous gene model prediction
- Erroneous translation
- Miscellaneous discrepancy
- Positions describes the UniProtKB sequence position(s) or range(s) of the difference(s) where possible. Sometimes the term 'Several' is used to indicate that there are many differences.
- Note is an optional free text explanation.
These lines are not wrapped and their length may therefore exceed 75 characters.
Previous annotation: CC -!- CAUTION: Ref.2 (BAA97015) sequence differs from that shown due to CC erroneous gene model prediction. The predicted gene At5g49940 has CC been split into 2 genes: At5g49940 and At5g49945. New annotation: CC -!- SEQUENCE CAUTION: CC Sequence=BAA97015.1; Type=Erroneous gene model prediction; Note=The predicted gene At5g49940 has been split into 2 genes: At5g49940 and At5g49945;Q83M39:
Previous annotation: CC -!- CAUTION: Ref.1 and Ref.2 sequences differ from that shown due to a CC stop codon at position 273 which was translated as Gln to extend CC the sequence. New annotation: CC -!- SEQUENCE CAUTION: CC Sequence=AAN42076.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln; CC Sequence=AAP15953.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln;P17814:
Previous annotation: CC -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to CC a frameshift in position 496. CC -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to CC erroneous gene model prediction. New annotation: CC -!- SEQUENCE CAUTION: CC Sequence=CAA36850.1; Type=Erroneous gene model prediction; CC Sequence=CAA36850.1; Type=Frameshift; Positions=496;P0A7B3:
Previous annotation: CC -!- CAUTION: Ref.4 (X07863) sequence differs from that shown due to CC several frameshifts. CC -!- CAUTION: Ref.5 (Y00357) sequence differs from that shown due to CC frameshifts in positions 204, 215 and 282. New annotation: CC -!- SEQUENCE CAUTION: CC Sequence=X07863; Type=Frameshift; Positions=Several; CC Sequence=Y00357; Type=Frameshift; Positions=204, 215, 282;P27612:
Previous annotation: CC -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to CC frameshifts in positions 4, 32, and 42. CC -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to CC contaminating sequence. CC -!- CAUTION: Ref.3 sequence differs from that shown due to a CC frameshift in position 697. Current annotation: CC -!- SEQUENCE CAUTION: CC Sequence=AAA39943.1; Type=Miscellaneous discrepancy; Note=Several frameshifts and contaminating sequence; CC Sequence=Ref.3; Type=Frameshift; Positions=697;
Change in comment subsection SUBCELLULAR LOCATION
From now on, the comment subsection SUBCELLULAR LOCATION may occur more than once per entry.
Cross-references to PseudoCAP
Cross-references have been added to the Pseudomonas aeruginosa Community Annotation Project database. This database provides genome annotation of P. aeruginosa strain PAO1 and of other Pseudomonas species, acting as a valuable comparative resource for P. aeruginosa research, as well as being useful for the larger Pseudomonas research community. Over the coming year this database will be further enhanced toward more focus on comparative analysis of P. aeruginosa isolates and more specific information about putative drug and vaccine targets.
The Pseudomonas aeruginosa Community Annotation Project database is available at http://www.pseudomonas.com/.
The format of the explicit links in the flat file is:
|Resource identifier||Ordered locus name.|
Q9I576: DR PseudoCAP; PA0865; -.
Cross-references to Orphanet
Cross-references have been added to the Orphanet database. This database is dedicated to information on rare diseases and orphan drugs. It aims to improve management and treatment of genetic, auto-immune or infectious rare diseases, rare cancers, or not yet classified rare diseases. ORPHANET offers services adapted to the needs of patients and their families, health professionals and researchers, support groups and industry.
The Orphanet database is available at http://www.orpha.net/consor/cgi-bin/home.php?Lng=GB.
The format of the explicit links in the flat file is:
|Resource identifier||Orphanet unique disease identifier.|
|Optional information 1||Name of the disease.|
P26439: DR Orphanet; 418; Adrenal hyperplasia, congenital. DR Orphanet; 3185; Stein-Leventhal syndrome.
Changes concerning keywords