Swiss-Prot release 19.0
Published August 1, 1991
SWISS-PROT RELEASE 19.0 RELEASE NOTES

1. INTRODUCTION

1.1 Evolution

Release 19.0 of SWISS-PROT contains 21795 sequence entries, comprising
7'173'785 amino acids abstracted from 21773 references. This represents an
increase of 6% over release 18. The recent growth of the data bank is
summarized below.

   Release   Date    Number of entries   Nb of amino acids
     3.0     11/86          4160                969 641
     4.0     04/87          4387              1 036 010
     5.0     09/87          5205              1 327 683
     6.0     01/88          6102              1 653 982
     7.0     04/88          6821              1 885 771
     8.0     08/88          7724              2 224 465
     9.0     11/88          8702              2 498 140
    10.0     03/89         10008              2 952 613
    11.0     07/89         10856              3 265 966
    12.0     10/89         12305              3 797 482
    13.0     01/90         13837              4 347 336
    14.0     04/90         15409              4 914 264
    15.0     08/90         16941              5 486 399
    16.0     11/90         18364              5 986 949
    17.0     02/91         20024              6 524 504
    18.0     05/91         20772              6 792 034
    19.0     08/91         21795              7 173 785

1.2 Source of data

Release 19.0 has been updated using protein sequence data from release 28.0
of the PIR (Protein Identification Resource) protein data bank, as well as
translation of nucleotide sequence data from release 27.0 of the EMBL
Nucleotide Sequence Database.

As an indication of the source of the sequence data in the SWISS-PROT data
bank, we list here the statistics concerning the DR (Database
cross-references) pointer lines:

   Entries with pointer(s) to only PIR entry(ies):           3912
   Entries with pointer(s) to only EMBL entry(ies):          3542
   Entries with pointer(s) to both EMBL and PIR entry(ies): 13835
   Entries with no pointer lines:                             506

2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 18

2.1 Sequences and annotations

About 1040 sequences have been added since release 18, the sequence data of
194 existing entries has been updated, and the annotations of 2220 entries
have been revised.
In particular, we have used review articles to update the annotations of the
following groups or families of proteins:

   - Adenylosuccinate synthetase
   - Aldose 1-epimerase
   - Argininosuccinate synthase
   - Amidases
   - Bacterial regulatory proteins, lacI family
   - Bacterial regulatory proteins, merR family
   - Bacterioferritin
   - Delta 1-pyrroline-5-carboxylate reductase
   - DNA polymerase family X
   - Eukaryotic molybdopterin-dependent oxidoreductases
   - Eukaryotic porin
   - Ferrochelatase
   - Fibrillarin
   - FMN-dependent alpha-hydroxy acid dehydrogenases
   - Hemerythrins
   - HlyD family secretion proteins
   - Hok/gef family cell toxic proteins
   - Matrixins
   - Methylmalonyl-CoA mutase
   - Phosphoribulokinase
   - Plant viruses icosahedral capsid proteins
   - Porphobilinogen deaminase
   - Pyrokinins
   - Ribonuclease T2 family
   - Stathmin family
   - Transglutaminases
   - Ubiquitin-activating enzyme
   - Zinc finger RFP/RPT-1 family

2.2 New line types: RP and RC

The following change has been implemented in release 19: the RN line has
been replaced by three line types:

   - a modified RN (Reference Number) line type containing just the
     reference number;
   - a new RP (Reference Position) line type containing the extent of the
     work carried out by the authors of the reference;
   - a new RC (Reference Comment) line type containing comments relevant to
     the reference (strain, tissue, etc.).

Three examples of the usage of these new line types are given below.

   RN
   RP   SEQUENCE FROM N.A., AND SEQUENCE OF 1-23.
   RC   STRAIN=K12;

   RN
   RP   SEQUENCE OF 24-56 AND 67-89.
   RC   STRAIN=BALB/C; TISSUE=BRAIN;

   RN
   RP   X-RAY CRYSTALLOGRAPHY, 1.8 ANGSTROMS.
   RC   MEDLINE=91002678;

Within a reference block the RN and RP lines each occur once; the RC line
occurs zero or more times. The format of the RC line is:

   RC   TOKEN1=Text; TOKEN2=Text;....

The following tokens are currently defined: MEDLINE, PLASMID, SPECIES,
STRAIN, TISSUE, and TRANSPOSON.
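The RC line format given above can be read mechanically. The following
Python sketch is an illustration only, not part of the SWISS-PROT
distribution: the function names parse_rc_line, parse_rc_lines and
unknown_tokens are hypothetical, and the sketch assumes that the text of a
token never contains a semicolon.

```python
from typing import Dict, Iterable, List

# Tokens listed in these release notes as currently defined.
KNOWN_TOKENS = {"MEDLINE", "PLASMID", "SPECIES", "STRAIN", "TISSUE", "TRANSPOSON"}


def parse_rc_line(line: str) -> Dict[str, str]:
    """Parse one line of the form 'RC   TOKEN1=Text; TOKEN2=Text;'."""
    if not line.startswith("RC"):
        raise ValueError(f"not an RC line: {line!r}")
    tokens: Dict[str, str] = {}
    # Assumption (not stated in the notes): token text never contains ';'.
    for item in line[2:].split(";"):
        item = item.strip()
        if not item:
            continue  # empty piece after the final ';'
        name, sep, text = item.partition("=")
        if not sep:
            raise ValueError(f"malformed RC item: {item!r}")
        tokens[name.strip()] = text.strip()
    return tokens


def parse_rc_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Merge the zero or more RC lines of one reference block."""
    merged: Dict[str, str] = {}
    for line in lines:
        if line.startswith("RC"):
            merged.update(parse_rc_line(line))
    return merged


def unknown_tokens(tokens: Dict[str, str]) -> List[str]:
    """Return, sorted, the token names that are not in KNOWN_TOKENS."""
    return sorted(name for name in tokens if name not in KNOWN_TOKENS)


if __name__ == "__main__":
    print(parse_rc_line("RC   STRAIN=BALB/C; TISSUE=BRAIN;"))
    print(parse_rc_line("RC   MEDLINE=91002678;"))
```

Unknown token names are reported rather than rejected, since the list of
tokens is described only as "currently defined" and may grow in later
releases.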
The `SPECIES' token is only used when an entry describes a sequence which is
identical in more than one species; similarly, the `PLASMID' token is only
used if an entry describes a sequence identical in more than one plasmid.

2.3 MEDLINE unique identifiers

Starting with release 19, each journal reference listed in SWISS-PROT which
exists in the National Library of Medicine (NLM) MEDLINE bibliographic data
bank includes the `Unique Identifier' (UI) of that reference. This
information is stored in the new RC line type using the `MEDLINE' token.
Example:

   RC   MEDLINE=90205618;

It is planned that, in a few months, MEDLINE will add cross-references to
SWISS-PROT.

2.4 New cross-references

We have added cross-references to the Transcription Factors Database (TFD)
of David Ghosh (for a description see: Nucleic Acids Res. (1990)
18:1749-1756), as well as to the Drosophila Genetic Maps database (DMAP)
prepared by Michael Ashburner at the Department of Genetics in Cambridge,
England. These cross-references are present in the DR lines:

   Data bank identifier: DMAP
   Primary identifier  : Gene unique identifier number (UID).
   Secondary identifier: Latest release of DMAP that was used to derive
                         the cross-references.
   Example             : DR   DMAP; 00055; RELEASE 9107.

   Data bank identifier: TFD
   Primary identifier  : Unique identifier for the corresponding TFD
                         POLYPEPTIDES table entry (the TFD_ID field).
   Secondary identifier: Latest release of TFD that was used to derive
                         the cross-references.
   Example             : DR   TFD; P00040; RELEASE 3.0.

2.5 Minor change in the DT line format

There is now a single space character between the date and the comment part
of a DT line instead of the two spaces that used to exist in previous
releases. Example:

   DT   01-MAY-1991  (REL. 18, CREATED)

has been changed to:

   DT   01-MAY-1991 (REL. 18, CREATED)

2.6 Minor change in the RL lines for submissions

References for sequence information submitted to the international nucleic
acid databases (DDBJ, EMBL, GenBank) were represented by the following
subtype of RL lines:

   RL   SUBMITTED (JAN-1991) TO EMBL/GENBANK DATA BANKS.

Starting with release 19, these RL lines use the following format:

   RL   SUBMITTED (JAN-1991) TO EMBL/GENBANK/DDBJ DATA BANKS.

2.7 Status of cross-references to PIR

We have continued adding cross-references to entries in the unannotated
sections of PIR (known as PIR2 and PIR3); currently we have cross-references
to 14078 sequence entries in PIR2/3 out of a total of 20265 entries in those
sections in release 28 of PIR.

3. FORTHCOMING CHANGES

3.1 Change in the format of the entry names

Starting with release 21 we will replace the dollar sign `$' in entry names
by the underscore character `_'. This change is made on behalf of users of
sequence analysis software running under the Unix operating system, where
the dollar sign is a reserved symbol. Example: the entry name `CYC$HUMAN'
will be changed to `CYC_HUMAN'.

4. ENZYME AND PROSITE

4.1 The ENZYME data bank

Release 6.0 of the ENZYME data bank is distributed along with release 19 of
SWISS-PROT. ENZYME release 6.0 contains information on 3072 enzymes. The
data bank is complete and up to date. Until new enzyme nomenclature data is
published, we only plan to update the SWISS-PROT pointers at each release of
the protein sequence data bank, correct any errors, and complete the
information concerning synonyms and cofactors using the literature.

4.2 The PROSITE data bank

Release 7.10 of the PROSITE data bank is distributed along with release 19
of SWISS-PROT. Release 7.10 contains 441 documentation chapters that
describe 508 different patterns.
Release 7.10 does not really represent a new release; the only changes
between releases 7.0 and 7.10 are updates of the pointers to the SWISS-PROT
entries whose names have been modified between releases 18 and 19. The next
release of PROSITE (8.0) will be distributed with release 20 of SWISS-PROT.

5. WE NEED YOUR HELP!

We welcome feedback from our users. We would especially appreciate being
notified if you find that sequences belonging to your field of expertise are
missing from the data bank. We would also like to be notified about
annotations that need updating, for example if the function of a protein has
been clarified or if new post-translational information has become
available.

APPENDIX A: SOME STATISTICS

A.1 Amino acid composition

A.1.1 Composition in percent for the complete data bank

   Ala (A) 7.64   Gln (Q) 4.09   Leu (L) 9.11   Ser (S) 7.10
   Arg (R) 5.23   Glu (E) 6.27   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.46   Gly (G) 7.11   Met (M) 2.32   Trp (W) 1.30
   Asp (D) 5.24   His (H) 2.27   Phe (F) 3.96   Tyr (Y) 3.21
   Cys (C) 1.82   Ile (I) 5.45   Pro (P) 5.09   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03

A.1.2 Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp

A.2 Distribution of the sequences by organism of origin

Total number of species represented in this release of SWISS-PROT: 2986

A.2.1 Table of the frequency of occurrence of species

   Species represented    1x: 1302
                          2x:  543
                          3x:  303
                          4x:  184
                          5x:  128
                          6x:   93
                          7x:   78
                          8x:   40
                          9x:   60
                         10x:   32
                     11- 20x:  111
                     21-100x:   88
                       >100x:   24

A.2.2 Table of the most represented species

   Number   Frequency   Species
      1       1790      Human
      2       1512      Escherichia coli
      3       1043      Mouse
      4        972      Rat
      5        735      Baker's yeast (Saccharomyces cerevisiae)
      6        499      Bovine
      7        414      Fruit fly (Drosophila melanogaster)
      8        362      Chicken
      9        286      Bacillus subtilis
     10        261      Rabbit
     11        252      African clawed frog (Xenopus laevis)
     12        251      Vaccinia virus (strain Copenhagen)
     13        238      Pig
     14        201      Salmonella typhimurium
     15        193      Human cytomegalovirus (strain AD169)
     16        166      Bacteriophage T4
     17        154      Maize
     18        132      Rice
     19        113      Vaccinia virus (strain WR)
     20        112      Tobacco
                        Pea
     22        110      Wheat
     23        105      Staphylococcus aureus
     24        101      Slime mold (Dictyostelium discoideum)
     25         98      Sheep
     26         97      Barley
     27         90      Fission yeast (Schizosaccharomyces pombe)
     28         89      Spinach
     29         87      Pseudomonas aeruginosa
     30         85      Soybean
     31         84      Liverwort (Marchantia polymorpha)
     32         81      Agrobacterium tumefaciens
     33         80      Dog
                        Klebsiella pneumoniae
     35         79      Neurospora crassa

A.3 Distribution of the sequences by size

   From   To   Number      From   To   Number
      1-  50     1458      1001-1100      203
     51- 100     2439      1101-1200      124
    101- 150     3466      1201-1300       97
    151- 200     2112      1301-1400       63
    201- 250     1757      1401-1500       49
    251- 300     1558      1501-1600       27
    301- 350     1395      1601-1700       24
    351- 400     1365      1701-1800       26
    401- 450     1052      1801-1900       29
    451- 500     1121      1901-2000       22
    501- 550      855      2001-2100        9
    551- 600      583      2101-2200       24
    601- 650      422      2201-2300       30
    651- 700      304      2301-2400       11
    701- 750      284      2401-2500       13
    751- 800      226      >2500           51
    801- 850      178
    851- 900      190
    901- 950      116
    951-1000      112

Currently the ten largest sequences are:

   RYNR$RABIT   5037 a.a.
   RYNR$HUMAN   5032 a.a.
   APB$HUMAN    4563 a.a.
   APOA$HUMAN   4548 a.a.
   POLG$BVDV    3988 a.a.
   POLG$HCVA    3898 a.a.
   POLG$HCVB    3898 a.a.
   TRX$DROME    3759 a.a.
   ACVA$PENCH   3746 a.a.
   DMD$HUMAN    3685 a.a.

APPENDIX B: ON-LINE EXPERTS

B.1 List of on-line experts for PROSITE and SWISS-PROT

Field of expertise                Name                Email address
--------------------------------  ------------------  -----------------------------
African swine fever virus         Yanez R.J.          firstname.lastname@example.org
Alcohol dehydrogenases            Bengt P.            email@example.com
Aldehyde dehydrogenases           Bengt P.            firstname.lastname@example.org
Alpha-crystallins/HSP-20          Leunissen J.A.M.    email@example.com
Alpha-2-macroglobulins            Van Leuven F.       firstname.lastname@example.org
Apolipoproteins                   Boguski M.S.        email@example.com
Arrestins                         Kolakowski L.F.Jr.  firstname.lastname@example.org
Bacteriophage P4 proteins         Halling C.          email@example.com
Beta-lactamases                   Brannigan J.        firstname.lastname@example.org
Chitinases                        Henrissat B.        email@example.com
CTF/NF-I                          Mermod N.           firstname.lastname@example.org
Cytochromes P450                  Holsztynska E.J.    email@example.com
                                                      firstname.lastname@example.org
EF-hand calcium-binding           Cox J.A.            email@example.com
                                  Kretsinger R.H.     firstname.lastname@example.org
Eryf1-type zinc-fingers           Boguski M.S.        email@example.com
fruR/lacI family HTH proteins     Reizer J.           firstname.lastname@example.org
Glucanases                        Henrissat B.        email@example.com
                                  Beguin P.           firstname.lastname@example.org
G-protein coupled receptors       Chollet A.          email@example.com
                                  Attwood T.K.        firstname.lastname@example.org
GTPase-activating proteins        Boguski M.S.        email@example.com
HMG1/2 and HMG-14/17              Landsman D.         firstname.lastname@example.org
Inorganic pyrophosphatases        Kolakowski L.F.Jr.  email@example.com
Integrases                        Roy P.H.            firstname.lastname@example.org
Phytochromes                      Partis M.D.         email@example.com
Prokaryotic carbohydrate kinases  Reizer J.           firstname.lastname@example.org
Protein kinases                   Hanks S.            email@example.com
Restriction-modification enzymes  Bickle T.           firstname.lastname@example.org
                                  Roberts R.J.        email@example.com
Ribosomal protein S3              Hallick R.          firstname.lastname@example.org
Ribosomal protein S15             Ellis S.R.          email@example.com
Ring-cleavage dioxygenases        Harayama S.         firstname.lastname@example.org
Sodium symporters                 Reizer J.           email@example.com
Subtilisin family proteases       Brannigan J.        firstname.lastname@example.org
Thiol proteases                   Turks B.            email@example.com
Thiol proteases inhibitors        Turks B.            firstname.lastname@example.org
TPR repeats                       Boguski M.S.        email@example.com
Transit peptides                  von Heijne G.       firstname.lastname@example.org
Type-II membrane antigens         Levy S.             email@example.com
Xylose isomerase                  Jenkins J.          firstname.lastname@example.org

B.2 Requirements to fulfill to become an on-line expert

An expert should be a scientist working with one or more specific families
of proteins (or specific domains) who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      which seem to belong to "their" family(ies) of proteins.
   c) Have access to electronic mail and be willing to use it to send and
      receive data.

If you are willing to be part of this scheme, please contact Amos Bairoch at
one of the following electronic mail addresses:

   email@example.com
   firstname.lastname@example.org