Swiss-Prot release 18.0
Published May 1, 1991
SWISS-PROT RELEASE 18.0 RELEASE NOTES


1. INTRODUCTION

1.1 Evolution

Release 18.0 of SWISS-PROT contains 20772 sequence entries, comprising
6'792'034 amino acids abstracted from 20580 references. This represents an
increase of 4% over release 17. The recent growth of the data bank is
summarized below:

   Release   Date    Number of entries   Nb of amino acids

     3.0     11/86          4160               969 641
     4.0     04/87          4387             1 036 010
     5.0     09/87          5205             1 327 683
     6.0     01/88          6102             1 653 982
     7.0     04/88          6821             1 885 771
     8.0     08/88          7724             2 224 465
     9.0     11/88          8702             2 498 140
    10.0     03/89         10008             2 952 613
    11.0     07/89         10856             3 265 966
    12.0     10/89         12305             3 797 482
    13.0     01/90         13837             4 347 336
    14.0     04/90         15409             4 914 264
    15.0     08/90         16941             5 486 399
    16.0     11/90         18364             5 986 949
    17.0     02/91         20024             6 524 504
    18.0     05/91         20772             6 792 034

1.2 Source of data

Release 18.0 has been updated using protein sequence data from release 27.0
of the PIR (Protein Identification Resource) protein data bank, as well as
translation of nucleotide sequence data from release 26.0 of the EMBL
Nucleotide Sequence Data Library.

As an indication of the source of the sequence data in the SWISS-PROT data
bank we list here the statistics concerning the DR (Databank Reference)
pointer lines:

   Entries with pointer(s) to only PIR entry(ies):            3863
   Entries with pointer(s) to only EMBL entry(ies):           3435
   Entries with pointer(s) to both EMBL and PIR entry(ies):  13001
   Entries with no pointer lines:                              473


2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 17

2.1 Sequences and annotations

About 780 sequences have been added since release 17, the sequence data of
139 existing entries has been updated and the annotations of 3010 entries
have been revised.
In particular we have used review articles to update the annotations of the
following groups or families of proteins:

   - Bacterial luciferase subunits
   - Beta-amylases
   - Carboxylesterases type-B
   - Class-I aminoacyl-tRNA synthetases
   - Clusterins
   - Fumarate reductases / succinate dehydrogenases
   - G-protein coupled receptors
   - GTPase-activating proteins
   - IMP dehydrogenase / GMP reductase
   - Insulin-like growth factor binding proteins
   - Leucine/isoleucine-binding proteins
   - Lipocalins
   - Manganese-dependent dipeptidases
   - Nickel-dependent hydrogenases
   - P-II proteins (glnB)
   - Pectinesterases
   - Phenylalanine and histidine ammonia-lyases
   - Phosphoenolpyruvate carboxykinase (ATP and GTP)
   - Polygalacturonases
   - Prokaryotic molybdopterin oxidoreductases
   - Receptors tyrosine kinase class IV (FGF receptors)
   - Ribosomal proteins
   - Signal recognition particle 54 Kd protein
   - Somatostatins
   - Thymosin beta-4 family
   - Tyrosinases
   - Wnt-1 family

2.2 Change in the OS line

As previously announced we have inverted the order of the information in
the OS line. We switched from 'English common name (Latin name)' to
'Latin name (English common name)'. Example:

   OS   HUMAN (HOMO SAPIENS).

has been changed to:

   OS   HOMO SAPIENS (HUMAN).

2.3 Cross-references to the Escherichia coli gene-protein database

We have added cross-references to the Escherichia coli gene-protein
database (2D gel spots) (for a description see: VanBogelen R.A.,
Hutton M.E., and Neidhardt F.C., in Electrophoresis (1990), 12:1131-1166).
These cross-references are present in the DR lines. The data bank
identifier is EC-2D-GEL, the primary identifier is the 2D gel spot
alphanumeric designation, and the secondary identifier is the latest
edition of the data bank that we have used to derive the cross-reference.

Example of a DR line for the Escherichia coli gene-protein database:

   DR   EC-2D-GEL; G052.0; 3RD EDITION.
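
The DR line layout described above (data bank identifier, primary
identifier, secondary identifier) can be read mechanically. The following is
a minimal Python sketch, not part of the SWISS-PROT distribution. It assumes,
based only on the example line, that the three fields are separated by
semicolons and that the line ends with a period; the names CrossReference and
parse_dr_line are hypothetical.

```python
from typing import NamedTuple


class CrossReference(NamedTuple):
    """One DR (Databank Reference) line, split into its three fields."""
    databank: str       # e.g. EC-2D-GEL
    primary_id: str     # e.g. the 2D gel spot designation
    secondary_id: str   # e.g. the edition used to derive the cross-reference


def parse_dr_line(line: str) -> CrossReference:
    """Parse a DR line.

    Assumption (from the example only): a 'DR' line code, then three
    ';'-separated fields, with a terminating '.' after the last field.
    """
    code, _, rest = line.strip().partition(" ")
    if code != "DR":
        raise ValueError(f"not a DR line: {line!r}")
    # Drop only the trailing period; periods inside fields (G052.0) are kept.
    fields = [field.strip() for field in rest.strip().rstrip(".").split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, found {len(fields)}: {line!r}")
    return CrossReference(*fields)


ref = parse_dr_line("DR   EC-2D-GEL; G052.0; 3RD EDITION.")
print(ref.databank, ref.primary_id, ref.secondary_id, sep=" | ")
```

Only the trailing period is removed, so identifiers that themselves contain a
period, such as G052.0, are preserved intact.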

2.4 Status of cross-references to PIR

We have continued adding cross-references to entries in the unannotated
sections of PIR (known as PIR2 and PIR3); currently we have
cross-references to 13118 sequence entries in PIR2/3 out of a total of
19051 entries in those sections in release 27 of PIR.

2.5 Documentation changes

- The EC2DTOSP.TXT document is an index of Escherichia coli gene-protein
  database entries referenced in SWISS-PROT (see section 2.3).

- The SPEINDEX.TXT document is a species index.

- The JOURLIST.TXT document now indicates, when it exists, the
  six-character CODEN designation of the journals cited in SWISS-PROT and
  in PROSITE. Example of an entry in the JOURLIST.TXT file:

     Abbrev: EMBO J.
     Title : EMBO Journal
     ISSN  : 0261-4189
     Coden : EMJ0DG

- The SPECODES.TXT document is no longer distributed. The information it
  contained duplicated that found in the species index.

2.6 Absence of the line-types: CA and CF

We announced in the last two release notes that, starting with release 18,
the enzyme entries in SWISS-PROT would have two new line-types:

   CA   Description_of_catalytic_activity.
   CF   Description_of_cofactor.

We finally decided not to implement this change, as it would create
line-types specific to a subset of entries (enzymes) and would open the
door to the creation of too many types of lines. We believe that the use
of topics in the comment line is a better approach for the storage of such
information.


3. FORTHCOMING CHANGES

3.1 New line-types: RP and RC

We plan to implement the following change in release 19; the current RN
line will be replaced by three line types: a modified RN (Reference Number)
line type containing just the reference number, a new RP (Reference
Position) line type containing the extent of the work carried out by the
authors of the reference, and a new RC (Reference Comment) line type
containing comments relevant to the reference (strain, tissue, etc.).
Three examples of the usage of these new lines are given below.

   RN
   RP   SEQUENCE FROM N.A., AND SEQUENCE OF 1-23.
   RC   STRAIN=K12;

   RN
   RP   SEQUENCE OF 24-56 AND 67-89.
   RC   STRAIN=BALB/C; TISSUE=BRAIN;

   RN
   RP   X-RAY CRYSTALLOGRAPHY 1.8 ANGSTROMS.

- Each reference block will continue to have exactly one RN line.

- There will always be a single RP line, which will be in free text format.

- As many RC lines as are needed to display the comments will appear; if a
  reference has no comment then the RC line will not appear.

- A precise syntax will be used to display the information that appears on
  the RC line. The syntax of the RC line is:

     RC   TOKEN=Text; TOKEN=Text;....

  The following tokens are already defined: MEDLINE, PLASMID, SPECIES,
  STRAIN, and TISSUE. Additional tokens will probably be added to this
  list.

3.2 MEDLINE unique identifiers

Starting with release 19, each journal reference listed in SWISS-PROT which
exists in the MEDLINE bibliographic data bank will include the "Unique
Identifier" (UI) of that reference in MEDLINE. This information will be
stored in the new RC line using the "MEDLINE" token. Example:

   RC   MEDLINE=90205618;

It is planned that, in a few months, MEDLINE will add cross-references to
SWISS-PROT.


4. ENZYME AND PROSITE

4.1 The ENZYME data bank

Release 5.0 of the ENZYME data bank is distributed along with release 18 of
SWISS-PROT. ENZYME release 5.0 contains information relative to 3072
enzymes. The data bank is complete and up to date. Until new enzyme
nomenclature data is published, we only plan to update the SWISS-PROT
pointers at each release of the protein sequence data bank, correct any
errors that are found, and complete the information concerning synonyms and
cofactors using the literature.

4.2 The PROSITE data bank

Release 7.0 of the PROSITE data bank is distributed along with release 18
of SWISS-PROT. Release 7.0 contains 441 documentation chapters that
describe 508 different patterns.
Since the last major release of PROSITE (release 6.0 of November 1990),
69 new chapters have been added and 163 chapters have been updated.


5. WE NEED YOUR HELP!

We welcome feedback from our users. We would especially appreciate being
notified if you find that sequences belonging to your field of expertise
are missing from the data bank. We would also like to be notified about
annotations that need updating, for example if the function of a protein
has been clarified or if new post-translational information has become
available.


APPENDIX A: SOME STATISTICS

A.1 Amino acid composition

A.1.1 Composition in percent for the complete data bank

   Ala (A) 7.65   Gln (Q) 4.09   Leu (L) 9.11   Ser (S) 7.08
   Arg (R) 5.23   Glu (E) 6.28   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.44   Gly (G) 7.12   Met (M) 2.32   Trp (W) 1.30
   Asp (D) 5.24   His (H) 2.27   Phe (F) 3.95   Tyr (Y) 3.21
   Cys (C) 1.83   Ile (I) 5.44   Pro (P) 5.09   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03

A.1.2 Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp

A.2 Distribution of the sequences by their organism of origin

Total number of species represented in this release of SWISS-PROT: 2864

A.2.1 Table of the frequency of occurrence of species

   Species represented      1x:  1266
                            2x:   520
                            3x:   284
                            4x:   172
                            5x:   121
                            6x:    94
                            7x:    73
                            8x:    37
                            9x:    58
                           10x:    29
                       11- 20x:   101
                       21-100x:    86
                         >100x:    23

A.2.2 Table of the most represented species

   Number   Frequency   Species

      1       1715      Human
      2       1454      Escherichia coli
      3       1001      Mouse
      4        933      Rat
      5        675      Baker's yeast (Saccharomyces cerevisiae)
      6        486      Bovine
      7        388      Fruit fly (Drosophila melanogaster)
      8        354      Chicken
      9        282      Bacillus subtilis
     10        255      Rabbit
     11        251      Vaccinia virus (strain Copenhagen)
     12        246      African clawed frog (Xenopus laevis)
     13        230      Pig
     14        193      Human cytomegalovirus (strain AD169)
     15        191      Salmonella typhimurium
     16        160      Bacteriophage T4
     17        152      Maize
     18        128      Rice
     19        113      Vaccinia virus (strain WR)
     20        112      Tobacco
                        Pea
     22        106      Wheat
     23        102      Staphylococcus aureus
     24         96      Sheep
     25         93      Barley
                        Slime mold (Dictyostelium discoideum)
     27         84      Agrobacterium tumefaciens
                        Liverwort (Marchantia polymorpha)
                        Spinach
     30         83      Soybean
     31         80      Fission yeast (Schizosaccharomyces pombe)
     32         79      Pseudomonas aeruginosa
                        Klebsiella pneumoniae
     34         78      Dog
     35         77      Neurospora crassa

A.3 Distribution of the sequences by size

   From   To   Number         From   To   Number

      1-  50     1390         1001-1100      191
     51- 100     2369         1101-1200      122
    101- 150     3358         1201-1300       94
    151- 200     2034         1301-1400       57
    201- 250     1666         1401-1500       46
    251- 300     1490         1501-1600       24
    301- 350     1311         1601-1700       23
    351- 400     1281         1701-1800       23
    401- 450      979         1801-1900       27
    451- 500     1057         1901-2000       22
    501- 550      811         2001-2100        9
    551- 600      555         2101-2200       23
    601- 650      390         2201-2300       28
    651- 700      288         2301-2400       11
    701- 750      270         2401-2500       13
    751- 800      214         >2500           50
    801- 850      162
    851- 900      172
    901- 950      112
    951-1000      100

Currently the ten largest sequences are:

   RYNR$RABIT   5037 a.a.
   RYNR$HUMAN   5032 a.a.
   APB$HUMAN    4563 a.a.
   APOA$HUMAN   4548 a.a.
   POLG$BVDV    3988 a.a.
   POLG$HCVA    3898 a.a.
   POLG$HCVB    3898 a.a.
   TRX$DROME    3759 a.a.
   ACVA$PENCH   3746 a.a.
   DMD$HUMAN    3685 a.a.


APPENDIX B: ON-LINE EXPERTS

B.1 List of on-line experts for PROSITE and SWISS-PROT

Field of expertise          Name                Email address
--------------------------- ------------------- --------------------------
Alcohol dehydrogenases      Bengt P.            email@example.com
Aldehyde dehydrogenases     Bengt P.            firstname.lastname@example.org
Alpha-2-macroglobulins      Van Leuven F.       email@example.com
Apolipoproteins             Boguski M.S.        firstname.lastname@example.org
Arrestins                   Kolakowski L.F. Jr. email@example.com
Bacteriophage P4            Halling C.          firstname.lastname@example.org
Beta-lactamases             Brannigan J.        email@example.com
Chitinases                  Henrissat B.        firstname.lastname@example.org
CTF/NF-I                    Mermod N.           email@example.com
Cytochromes P450            Holsztynska E.J.    firstname.lastname@example.org
                                                email@example.com
EF-hand calcium-binding     Cox J.A.            firstname.lastname@example.org
                            Kretsinger R.H.     email@example.com
Eryf1-type zinc-fingers     Boguski M.S.        firstname.lastname@example.org
Glucanases                  Henrissat B.        email@example.com
                            Beguin P.           firstname.lastname@example.org
G-protein coupled receptors Chollet A.          email@example.com
                            Attwood T.K.        firstname.lastname@example.org
GTPase-activating proteins  Boguski M.S.        email@example.com
HMG1/2 and HMG-14/17        Landsman D.         firstname.lastname@example.org
Inorganic pyrophosphatases  Kolakowski L.F. Jr. email@example.com
Integrases                  Roy P.H.            firstname.lastname@example.org
Phytochromes                Partis M.D.         email@example.com
Protein kinases             Hanks S.            firstname.lastname@example.org
Restriction-modification    Bickle T.           email@example.com
                            Roberts R.J.        firstname.lastname@example.org
Ring-cleavage dioxygenases  Harayama S.         email@example.com
Subtilisin family proteases Brannigan J.        firstname.lastname@example.org
Thiol proteases             Turks B.            email@example.com
Thiol proteases inhibitors  Turks B.            firstname.lastname@example.org
TPR repeats                 Boguski M.S.        email@example.com
Transit peptides            von Heijne G.       firstname.lastname@example.org
Type-II membrane antigens   Levy S.             email@example.com
Xylose isomerase            Jenkins J.          firstname.lastname@example.org

B.2 Requirements to fulfill to become an on-line expert

An expert should be a scientist working with one or more specific families
of proteins (or specific domains) who would:

a) Review the protein sequences in SWISS-PROT and the patterns/matrices in
   PROSITE relevant to their field of research.

b) Agree to be contacted by people who have obtained new sequence(s) which
   seem to belong to "their" family(ies) of proteins.

c) Have access to electronic mail and be willing to use it to send and
   receive data.

If you are willing to be part of this scheme please contact Amos Bairoch at
one of the following electronic mail addresses:

   email@example.com
   firstname.lastname@example.org