TrEMBL release 26.0
Published March 2, 2004
TrEMBL Release Notes
Release 26, March 2004

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address: email@example.comfirstname.lastname@example.org
WWW server: http://www.ebi.ac.uk/

Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 702 50 50
Fax: (+41 22) 702 58 58
Electronic mail address: Amos.Bairoch@isb-sib.ch
WWW server: http://www.expasy.org/

Acknowledgements

UniProtKB/TrEMBL has been prepared by:

o Maria Jesus Martin, Claire O'Donovan, Nicola Althorpe, Rolf Apweiler, Daniel Barrell, Kirsty Bates, Paul Browne, Kirill Degtyarenko, Ruth Eberhardt, Gill Fraser, Alexander Fedetov, Andre Hackmann, Alexander Kanapin, Youla Karavidopoulou, Paul Kersey, Ernst Kretschmann, Kati Laiho, Minna Lehvaslaiho, Michele Magrane, Michelle McHale, Virginie Mittard, Nicola Mulder, John F. O'Rourke, Markiyan Oliynyk, Sandra Orchard, Astrid Rakow, Sandra van den Broek, Eleanor Whitfield and Allyson Williams at the EMBL Outstation - European Bioinformatics Institute (EBI) in Hinxton, UK;

o Amos Bairoch, Alexandre Gattiker, Karine Michoud, Isabelle Phan and Sandrine Pilbout at the Swiss Institute of Bioinformatics in Geneva, Switzerland.

Copyright Notice

TrEMBL copyright (c) 2004 EMBL-EBI

This manual and the database it accompanies may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.

Citation

If you want to cite TrEMBL in a publication please use the following reference:

Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N. and Yeh L.L.
UniProt: the Universal Protein knowledgebase
Nucleic Acids Res. 32: D115-D119 (2004)

1. Introduction

UniProtKB/TrEMBL is a computer-annotated protein sequence database complementing the Swiss-Prot Protein Knowledgebase. UniProtKB/TrEMBL contains the translations of all coding sequences (CDS) present in the EMBL/GenBank/DDBJ Nucleotide Sequence Databases, and also protein sequences extracted from the literature or submitted to Swiss-Prot that are not yet integrated into UniProtKB/Swiss-Prot. UniProtKB/Swiss-Prot accession numbers have been assigned to all UniProtKB/TrEMBL entries.

2. Why a complement to UniProtKB/Swiss-Prot?

The ongoing gene sequencing and mapping projects have dramatically increased the number of protein sequences to be incorporated into UniProtKB/Swiss-Prot. We do not want to dilute the quality standards of UniProtKB/Swiss-Prot by incorporating sequences without proper sequence analysis and annotation, but we do want to make the sequences available as quickly as possible. UniProtKB/TrEMBL achieves this second goal, and is a major step towards speeding up the subsequent upgrading of annotation to the standard UniProtKB/Swiss-Prot quality.

To address the problem of redundancy, the translations of all coding sequences (CDS) in the EMBL Nucleotide Sequence Database that are already included in Swiss-Prot have been removed from UniProtKB/TrEMBL.

3. The Release

This UniProtKB/TrEMBL release has been produced in sync with Swiss-Prot release 43. It was created from the EMBL Nucleotide Sequence Database release 77 and updates up to 26-January-2004, and contains 1'069'649 entries and 335'331'748 amino acids.
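As a consistency check, the entry total quoted above can be compared with the sum of the per-subsection counts given in this section's subsection list. The Python sketch below is illustrative only: the dictionary and function names are assumptions made for this example, and the numbers are simply copied from these release notes.

```python
# Entry counts per subsection file, copied from the subsection list
# in section 3 of these release notes.
SUBSECTION_COUNTS = {
    "arc.dat": 1850,
    "arp.dat": 30436,
    "fun.dat": 26545,
    "hum.dat": 33713,
    "inv.dat": 116930,
    "mam.dat": 14217,
    "mhc.dat": 9401,
    "org.dat": 87676,
    "phg.dat": 9349,
    "pln.dat": 95077,
    "pro.dat": 94217,
    "prp.dat": 293602,
    "rod.dat": 42282,
    "unc.dat": 456,
    "vrl.dat": 96447,
    "vrt.dat": 20986,
    "vrv.dat": 96465,
}

# Release total as stated in the text of section 3.
STATED_TOTAL = 1_069_649


def total_entries(counts):
    """Return the sum of all per-subsection entry counts."""
    return sum(counts.values())


if __name__ == "__main__":
    computed = total_entries(SUBSECTION_COUNTS)
    print(f"computed={computed} stated={STATED_TOTAL} "
          f"match={computed == STATED_TOTAL}")
```

The same comparison can be repeated against a local copy of the subsection files if the counts are obtained independently.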
UniProtKB/TrEMBL is organized in subsections:

arc.dat (Archaea):                          1850 entries
arp.dat (Complete Archaeal proteomes):     30436 entries
fun.dat (Fungi):                           26545 entries
hum.dat (Human):                           33713 entries
inv.dat (Invertebrates):                  116930 entries
mam.dat (Other Mammals):                   14217 entries
mhc.dat (MHC proteins):                     9401 entries
org.dat (Organelles):                      87676 entries
phg.dat (Bacteriophages):                   9349 entries
pln.dat (Plants):                          95077 entries
pro.dat (Prokaryotes):                     94217 entries
prp.dat (Complete Prokaryotic Proteomes): 293602 entries
rod.dat (Rodents):                         42282 entries
unc.dat (Unclassified):                      456 entries
vrl.dat (Viruses):                         96447 entries
vrt.dat (Other Vertebrates):               20986 entries
vrv.dat (Retroviruses):                    96465 entries

67'168 new entries have been integrated in UniProtKB/TrEMBL. The sequences of 2'443 UniProtKB/TrEMBL entries have been updated, and the annotation has been updated in 441'151 entries.

The document deleteac.txt lists all accession numbers that were previously present in UniProtKB/TrEMBL but have now been deleted from the database.

4. Format Differences Between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL

The format and conventions used by UniProtKB/TrEMBL follow those of UniProtKB/Swiss-Prot as closely as possible. Hence, it is not necessary to produce an additional user manual and extensive release notes for UniProtKB/TrEMBL. The information given in the UniProtKB/Swiss-Prot release notes and user manual is in general valid for UniProtKB/TrEMBL. The differences are mentioned below.

The general structure of an entry is identical in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. The data class used in UniProtKB/TrEMBL (in the ID line) is always 'PRELIMINARY', whereas in UniProtKB/Swiss-Prot it is always 'STANDARD'.

Differences in line types present in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL:

The ID line (IDentification):
The entry name used in UniProtKB/TrEMBL is the same as the Accession Number of the entry.
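The two ID-line conventions described so far (the data class, and the entry name matching the accession number) can be checked mechanically. The Python sketch below is a minimal illustration, not an official parser: it assumes the flat-file layout "ID   ENTRY_NAME  DATA_CLASS; ..." and an AC line of the form "AC   ACCESSION;", and the function names and the sample lines in the comments are made up for this example.

```python
import re

# Assumed flat-file layout of an ID line:
#   ID   ENTRY_NAME  DATA_CLASS; MOLECULE_TYPE; LENGTH AA.
ID_LINE = re.compile(r"^ID\s+(?P<name>\S+)\s+(?P<data_class>[A-Z]+);")


def parse_id_line(id_line):
    """Return (entry_name, data_class) extracted from an ID line."""
    match = ID_LINE.match(id_line)
    if match is None:
        raise ValueError(f"not an ID line: {id_line!r}")
    return match.group("name"), match.group("data_class")


def source_database(id_line):
    """Map the data class on the ID line to the database it implies."""
    _, data_class = parse_id_line(id_line)
    if data_class == "PRELIMINARY":
        return "UniProtKB/TrEMBL"
    if data_class == "STANDARD":
        return "UniProtKB/Swiss-Prot"
    raise ValueError(f"unexpected data class: {data_class}")


def name_matches_accession(id_line, ac_line):
    """True if the entry name equals the first accession on the AC line."""
    if not ac_line.startswith("AC"):
        raise ValueError(f"not an AC line: {ac_line!r}")
    name, _ = parse_id_line(id_line)
    accessions = [a.strip() for a in ac_line[2:].split(";") if a.strip()]
    return bool(accessions) and accessions[0] == name


if __name__ == "__main__":
    # Hypothetical sample lines, invented for illustration only.
    sample_id = "ID   Q00001      PRELIMINARY;      PRT;   120 AA."
    sample_ac = "AC   Q00001;"
    print(source_database(sample_id))
    print(name_matches_accession(sample_id, sample_ac))
```

For a UniProtKB/TrEMBL entry both checks are expected to succeed; a 'STANDARD' entry would normally carry an entry name that differs from its accession number.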
The DT line (DaTe):
The format of the DT lines, which indicate when an entry was created and updated, is identical to that defined in UniProtKB/Swiss-Prot, but the DT lines in UniProtKB/TrEMBL refer to the UniProtKB/TrEMBL release. The difference is shown in the example below.

DT lines in a UniProtKB/Swiss-Prot entry:

DT   01-JAN-1988 (Rel. 06, Created)
DT   01-JUL-1989 (Rel. 11, Last sequence update)
DT   01-AUG-1992 (Rel. 23, Last annotation update)

DT lines in a UniProtKB/TrEMBL entry:

DT   01-NOV-1996 (TrEMBLrel. 01, Created)
DT   01-NOV-1999 (TrEMBLrel. 12, Last sequence update)
DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)

5. Biweekly updates of UniProtKB/TrEMBL and non-redundant data sets

5.1 UniProtKB/TrEMBL updates

Biweekly cumulative updates of TrEMBL are available by anonymous FTP and from the EBI SRS server.

5.2 UniProtKB

We also produce biweekly a complete non-redundant protein sequence collection, provided as three compressed files: uniprot.sprot.dat.gz and uniprot.trembl.dat.gz are in /pub/databases/uniprot/knowledgebase, and uniprot.trembl_new.dat.gz is in /pub/databases/uniprot/knowledgebase/new on the EBI ftp server.

This set of non-redundant files is especially important for two types of users:

(i) Managers of similarity search services. They can now provide what is currently the most comprehensive and non-redundant data set of protein sequences.

(ii) Anybody wanting to update their full copy of UniProtKB/Swiss-Prot + UniProtKB/TrEMBL to their own schedule without having to wait for full releases of UniProtKB/Swiss-Prot or UniProtKB/TrEMBL (UniProtKB).

5.3 XML

A version of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL in XML format has been developed and is provided with this release. More information is available at http://www.ebi.uniprot.org/support/documents.shtml and the data can be downloaded from ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase

We would welcome any feedback from the user community.
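For users who maintain a local copy on their own schedule, the file locations given in section 5.2 can be assembled into download URLs, and a downloaded file can be sanity-checked by counting its entries. The Python sketch below is illustrative only: the host name is taken from the FTP address quoted in section 5.3, the function names are assumptions, the paths may no longer exist on the server, and the entry count relies on the flat-file convention that each entry ends with a line starting with "//".

```python
import gzip

# Host as quoted in the FTP address in section 5.3; directories and
# file names as listed in section 5.2 of these release notes.
FTP_HOST = "ftp.ebi.ac.uk"
KNOWLEDGEBASE_DIR = "/pub/databases/uniprot/knowledgebase"

FILE_DIRECTORIES = {
    "uniprot.sprot.dat.gz": KNOWLEDGEBASE_DIR,
    "uniprot.trembl.dat.gz": KNOWLEDGEBASE_DIR,
    "uniprot.trembl_new.dat.gz": KNOWLEDGEBASE_DIR + "/new",
}


def ftp_url(filename):
    """Build the FTP URL for one of the three non-redundant files."""
    directory = FILE_DIRECTORIES[filename]  # KeyError for unknown names
    return f"ftp://{FTP_HOST}{directory}/{filename}"


def count_entries(path):
    """Count entries in a gzip-compressed flat file.

    Assumes every entry is terminated by a line beginning with '//'.
    """
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith("//"))


if __name__ == "__main__":
    for name in FILE_DIRECTORIES:
        print(ftp_url(name))
```

Comparing the result of count_entries on a downloaded file with the entry count published for the corresponding release is one simple way to detect a truncated transfer.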
5.4 Varsplic Expand

We also provide Varsplic Expand, a program that generates "expanded" sequences from Swiss-Prot and TrEMBL records, i.e. sequences including the variants specified by the varsplic, variant and conflict annotations. New records are produced in either pseudo-Swiss-Prot or FASTA format for each specified variant. More information and the data are available at ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase

6. Access/Data Distribution

FTP server: ftp.ebi.ac.uk/pub/databases/trembl
SRS server: http://srs.ebi.ac.uk/

UniProtKB/TrEMBL is also available on the UniProtKB/Swiss-Prot CD-ROM.

UniProtKB/Swiss-Prot + UniProtKB/TrEMBL is searchable on the following servers at the EBI:

FASTA3 (http://www.ebi.ac.uk/fasta33/)
BLAST2 (http://www.ebi.ac.uk/blast2/)
Scanps (http://www.ebi.ac.uk/scanps/)
MPSrch (http://www.ebi.ac.uk/MPsrch/)

For each UniProtKB/TrEMBL release, a synchronized version of the concurrent UniProtKB/Swiss-Prot release is distributed at ftp.ebi.ac.uk/pub/databases/trembl/swissprot/

7. General announcements and Forthcoming changes

7.1 What's new

We have introduced new resources that let us tell users, between releases, what is new in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL and what is planned for the future. These are available at:

http://us.expasy.org/sprot/relnotes/sp_news.html
http://us.expasy.org/sprot/relnotes/sp_soon.html