Swiss-Prot release 23.0

Published August 1, 1992




                    SWISS-PROT RELEASE 23.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 23.0 of SWISS-PROT contains 26706 sequence entries, comprising
   9 011 391 amino acids abstracted from 26485 references. This represents
   an increase of 7.6% in the number of amino acids (6.6% in the number of
   entries) over release 22. The recent growth of the data bank is
   summarized below.

   Release    Date   Number of entries    No. of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
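
   The growth figure quoted in section 1.1 can be rechecked from the last
   two rows of the table. The following is a minimal Python sketch, not part
   of the SWISS-PROT distribution; the names `releases` and
   `percent_increase` are chosen here for illustration only.

```python
# Entry and amino acid counts copied from the last two rows of the table.
releases = {
    "22.0": (25044, 8_375_696),
    "23.0": (26706, 9_011_391),
}


def percent_increase(old: int, new: int) -> float:
    """Return the relative growth from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


entries_growth = percent_increase(releases["22.0"][0], releases["23.0"][0])
residues_growth = percent_increase(releases["22.0"][1], releases["23.0"][1])

print(f"entries:     +{entries_growth:.1f}%")   # entries:     +6.6%
print(f"amino acids: +{residues_growth:.1f}%")  # amino acids: +7.6%
```

   The 7.6% figure is reproduced by the amino acid counts; the number of
   entries grew by about 6.6% over the same period.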

   1.2  Source of data

   Release 23.0 has been updated using protein sequence data from release
   33.0 of the PIR (Protein Identification Resource) protein data bank, as
   well as translations of nucleotide sequence data from release 31.0 of
   the EMBL Nucleotide Sequence Database.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Database
   cross-references) pointer lines:

   Entries with pointer(s) to PIR entries only:                 4368
   Entries with pointer(s) to EMBL entries only:                3365
   Entries with pointer(s) to both EMBL and PIR entries:       18444
   Entries with no pointer lines:                                529
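
   A tally of this kind could be produced by scanning the DR lines of each
   entry. The Python sketch below is illustrative only and is not the
   procedure used to build the table. It assumes that a DR line starts with
   the line code DR followed by three spaces and a database name terminated
   by a semicolon; the sample entries and their identifiers are invented.
   It also rechecks that the four published counts add up to the 26706
   entries of this release.

```python
from collections import Counter


def classify(entry_lines):
    """Classify one entry by the databases named on its DR lines."""
    databases = set()
    for line in entry_lines:
        if line.startswith("DR   "):
            # Assumed layout: "DR   <database>; <identifier>; ..."
            databases.add(line[5:].split(";", 1)[0].strip())
    has_pir = "PIR" in databases
    has_embl = "EMBL" in databases
    if has_pir and has_embl:
        return "both EMBL and PIR"
    if has_pir:
        return "PIR only"
    if has_embl:
        return "EMBL only"
    return "neither EMBL nor PIR"


# Invented entries, reduced to the lines that matter for this example.
entries = [
    ["ID   TEST1_HUMAN", "DR   EMBL; X00001; HSTEST1.", "DR   PIR; A00001; A00001."],
    ["ID   TEST2_ECOLI", "DR   PIR; B00002; B00002."],
    ["ID   TEST3_YEAST", "DR   EMBL; X00003; SCTEST3."],
    ["ID   TEST4_MOUSE"],
]
counts = Counter(classify(entry) for entry in entries)
print(dict(counts))

# The four published categories should cover every entry exactly once.
published = {"PIR only": 4368, "EMBL only": 3365,
             "both EMBL and PIR": 18444, "no pointer lines": 529}
print(sum(published.values()))  # 26706
```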


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 22


   2.1  Sequences and annotations

   About 1680 sequences have been added since release 22, the sequence data
   of 235 existing entries has been updated, and the annotations of 3400
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  AP endonucleases
   -  Bacterial regulatory proteins, lacI family
   -  Electron transfer flavoprotein alpha-subunit
   -  Enterobacterial virulence outer membrane protein
   -  Formate--tetrahydrofolate ligase
   -  Germin family
   -  Guanine-nucleotide releasing factors CDC25 family
   -  Lipoxygenases
   -  Prokaryotic ornithine and lysine decarboxylases
   -  Prokaryotic-type carbonic anhydrases
   -  Riboflavin synthase alpha chain family
   -  Ribosomal proteins
   -  Sigma-54 factors family
   -  Sigma-70 factors family
   -  Single strand binding protein family
   -  Stress-induced proteins SRP1/TIP1 family
   -  TNF family

                    3. CHANGES PLANNED FOR FUTURE RELEASES

   3.1  Change in the RA line concerning the author names format

   As of release 25 in March 1993, we will change the format of author
   names on RA lines to conform to that used by major bibliographic
   databases such as Medline. The main change is that the periods and
   hyphens ("-") which currently appear within initials will no longer
   appear. For example, the current:

   RA   Wilson A.C., Smith J.-C.;

   will appear as:

   RA   Wilson AC, Smith JC;
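
   For users who maintain their own parsing software, the conversion
   described in section 3.1 could be sketched in Python as follows. The
   function `convert_ra_line` is a hypothetical helper, not a SWISS-PROT
   tool, and it covers only the simple case shown in the example: the last
   space-separated token of each author is assumed to hold the initials,
   and name suffixes such as "Jr." are not treated specially.

```python
def convert_ra_line(line: str) -> str:
    """Drop periods and hyphens from the initials on one RA line."""
    prefix, body = line[:5], line[5:].rstrip()
    if prefix != "RA   ":
        raise ValueError("not an RA line")
    # The last RA line of a reference ends with ';'. A line ending with ','
    # is assumed to continue on the next RA line.
    terminated = body.endswith(";")
    authors = [a.strip() for a in body.rstrip(";").split(",") if a.strip()]
    converted = []
    for author in authors:
        # Only the final token (the initials) is altered, so hyphens that
        # belong to the surname itself are left alone.
        surname, _, initials = author.rpartition(" ")
        initials = initials.replace(".", "").replace("-", "")
        converted.append(f"{surname} {initials}" if surname else initials)
    return prefix + ", ".join(converted) + (";" if terminated else ",")


print(convert_ra_line("RA   Wilson A.C., Smith J.-C.;"))
# RA   Wilson AC, Smith JC;
```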

   3.2  Weekly update of SWISS-PROT

   Starting with release 24 in November 1992, we will provide weekly
   updates of SWISS-PROT. Instructions on how to access the update files
   will be given with the next release.



                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 10.0 of the ENZYME data bank is distributed along with release
   23 of SWISS-PROT. ENZYME release 10.0 contains information on 3183
   enzymes. The data bank will probably be significantly modified at the
   next release, owing to the publication of a new edition of the
   IUPAC-IUB Enzyme Nomenclature book, which describes many new enzymes
   and updates the information concerning existing ones.

   4.2  The PROSITE data bank

   Release 9.10 of the PROSITE data bank is distributed along with release
   23 of SWISS-PROT. Release 9.10 contains 580 documentation chapters that
   describe 689 different patterns. Release 9.10 does not really represent
   a new release; the only changes between releases 9.0 and 9.10 are
   updates to the pointers to SWISS-PROT entries whose names were
   modified between releases 22 and 23. The next release of PROSITE (10.0)
   will be distributed with release 24 of SWISS-PROT.


                            5. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate it if
   you would notify us when sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a protein
   has been clarified or if new information on post-translational
   modifications has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.66   Gln (Q) 4.06   Leu (L) 9.15   Ser (S) 7.07
   Arg (R) 5.24   Glu (E) 6.25   Lys (K) 5.82   Thr (T) 5.84
   Asn (N) 4.45   Gly (G) 7.10   Met (M) 2.34   Trp (W) 1.31
   Asp (D) 5.25   His (H) 2.26   Phe (F) 3.97   Tyr (Y) 3.21
   Cys (C) 1.80   Ile (I) 5.50   Pro (P) 5.06   Val (V) 6.50

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Ranking of the amino acids by decreasing frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
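
   The order given in A.1.2 follows directly from the percentages in A.1.1.
   As a quick check, the short Python sketch below sorts the twenty standard
   amino acids by decreasing percentage; the dictionary name `composition`
   is chosen here for illustration, and the ambiguous codes (Asx, Glx, Xaa)
   are left out.

```python
# Percentages copied from table A.1.1 (twenty standard amino acids only).
composition = {
    "Ala": 7.66, "Gln": 4.06, "Leu": 9.15, "Ser": 7.07,
    "Arg": 5.24, "Glu": 6.25, "Lys": 5.82, "Thr": 5.84,
    "Asn": 4.45, "Gly": 7.10, "Met": 2.34, "Trp": 1.31,
    "Asp": 5.25, "His": 2.26, "Phe": 3.97, "Tyr": 3.21,
    "Cys": 1.80, "Ile": 5.50, "Pro": 5.06, "Val": 6.50,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```

   No two amino acids share the same percentage in this release, so the
   resulting order is unambiguous.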



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 3497

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1537
                            2x:  612
                            3x:  345
                            4x:  222
                            5x:  148
                            6x:  117
                            7x:   76
                            8x:   60
                            9x:   71
                           10x:   30
                       11- 20x:  144
                       21- 50x:   78
                       51-100x:   24
                         >100x:   33


        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        2018          Human
         2        1918          Escherichia coli
         3        1220          Mouse
         4        1154          Rat
         5        1053          Baker's yeast (Saccharomyces cerevisiae)
         6         556          Bovine
         7         485          Fruit fly (Drosophila melanogaster)
         8         428          Chicken
         9         402          Bacillus subtilis
        10         311          Salmonella typhimurium
        11         310          African clawed frog (Xenopus laevis)
        12         297          Rabbit
        13         273          Pig
        14         251          Vaccinia virus (strain Copenhagen)
        15         197          Maize
        16         193          Human cytomegalovirus (strain AD169)
        17         168          Bacteriophage T4
        18         159          Vaccinia virus (strain WR)
        19         153          Rice
        20         140          Tobacco
        21         138          Wheat
        22         128          Pea
        23         120          Barley
        24         119          Pseudomonas aeruginosa
                   119          Staphylococcus aureus
        26         117          Marchantia polymorpha (liverwort)
        27         116          Arabidopsis thaliana (Mouse-ear cress)
        28         111          Slime mold (Dictyostelium discoideum)
        29         110          Fission yeast (Schizosaccharomyces pombe)
        30         106          Soybean
        31         104          Caenorhabditis elegans
                   104          Sheep
                   104          Spinach
        34         100          Klebsiella pneumoniae
                   100          Pseudomonas putida
                   100          Dog


   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1644             1001-1100      258
                 51- 100    2839             1101-1200      147
                101- 150    4010             1201-1300      129
                151- 200    2576             1301-1400       79
                201- 250    2168             1401-1500       64
                251- 300    1987             1501-1600       37
                301- 350    1804             1601-1700       32
                351- 400    1773             1701-1800       32
                401- 450    1340             1801-1900       35
                451- 500    1490             1901-2000       27
                501- 550    1053             2001-2100       10
                551- 600     742             2101-2200       32
                601- 650     512             2201-2300       39
                651- 700     378             2301-2400       13
                701- 750     367             2401-2500       14
                751- 800     291             >2500           73
                801- 850     216
                851- 900     220
                901- 950     140
                951-1000     135
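
   The size classes used in this table are 50 residues wide up to 1000
   residues and 100 residues wide from 1001 to 2500. The Python sketch
   below shows one way such a class could be assigned to a sequence length;
   `size_bin` is a name invented for this example, and its labels omit the
   padding spaces used in the table. The sketch also rechecks that the
   counts in the table add up to the 26706 entries of this release.

```python
def size_bin(length: int) -> str:
    """Return the label of the size class that a sequence length falls into."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100  # class width changes at 1000
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


# Counts copied from the table: left column first, then right column.
counts = [
    1644, 2839, 4010, 2576, 2168, 1987, 1804, 1773, 1340, 1490,
    1053, 742, 512, 378, 367, 291, 216, 220, 140, 135,
    258, 147, 129, 79, 64, 37, 32, 32, 35, 27,
    10, 32, 39, 13, 14, 73,
]

print(size_bin(50), size_bin(51), size_bin(1001), size_bin(5037))
# 1-50 51-100 1001-1100 >2500
print(sum(counts))
# 26706
```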


   Currently the ten largest sequences are:


                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            APB_HUMAN   4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            DYHC_TRIGR  4466 a.a.
                            POLG_BVDV   3988 a.a.
                            VGF1_IBVB   3951 a.a.
                            POLG_HCVA   3898 a.a.
                            POLG_HCVB   3898 a.a.
                            ACVT_PENCH  3791 a.a.


                         APPENDIX B: ON-LINE EXPERTS



   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman@frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Band 4.1 family proteins      Rees J.            jrees@vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski@ncbi.nlm.nih.gov
Chalcone/stilbene synthases   Schroeder J.       raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60       Georgopoulos C.    georgopo@cmu.unige.ch
Chaperonins TCP1 family       Willison K.R.      willison@icrf.ac.uk
Chitinases                    Henrissat B.       cermav@frgren81.bitnet
Clusterin                     Peitsch M.C.       peitsch@ulbio1.unil.ch
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
DEAD-box helicases            Linder P.          linder@urz.unibas.ch
dnaJ family                   Kelley W.          kelley@cmu.unige.ch
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
Glucanases                    Henrissat B.       cermav@frgren81.bitnet
                              Beguin P.          phycel@pasteur.bitnet
G-protein coupled receptors   Chollet A.         chollet@clients.switch.ch
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases                    Roy P.H.           2020000@saphir.ulaval.ca
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch@ulbio1.unil.ch
lysR family HTH proteins      Henikoff S.        henikoff@sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       peitsch@ulbio1.unil.ch
Malic enzymes                 Glynias M.         mglynias@ncsa.uiuc.edu
Myelin proteolipid protein    Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer@ucsd.edu
Phytochromes                  Partis M.D.        partis@gcri.afrc.ac.uk
Protein kinases               Hanks S.           hanks@vuctrvax.bitnet
                              Hunter T.          hunter@salk.bitnet
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@neb.com
Ribosomal protein S3          Hallick R.         hallick%biotec@arizona.edu
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        harayama@cmu.unige.ch
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TNF family                    Jongeneel C.V.     vjongene@isrec.arcom.ch
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gunnar@cbts.sunet.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@bio.uib.no
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc@ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork@embl-heidelberg.de

African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Bacteriophage P4              Halling C.         chh9@midway.uchicago.edu
Drosophila                    Ashburner M.       ma11@phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd@ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd@ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin@cmu.unige.ch
Yeast chromosome I            Ouellette F.       francis@monod.biol.mcgill.ca


   B.2  Requirements for becoming an on-line expert

   An expert should be a scientist working with one or more specific
   families of proteins (or specific domains) who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      which seem to belong to "their" family(ies) of proteins.
   c) Have access to electronic mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at one of the following electronic mail addresses:

                             bairoch@cmu.unige.ch
                           bairoch@cgecmu51.bitnet


           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

                                                       **********************
                        *********************** <----- * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * -----> **********************
                        *  Sequence Data      *
***************** ----> *  Library            *        **********************
* FLYBASE       * <---- *********************** <----- * ECD [E. coli map]  *
* [Drosophila   *                ^  |       ^          **********************
* genetic maps] * --------+      |  |       |
***************** <-----+ |      |  |       +--------- **********************
                        | |      |  |       +--------- * TFD [Trans. fact.] *
                        | |      |  |       | +------> **********************
                        | |      |  |       | |
*****************       | v      |  v       v |        **********************
* REBASE        *       ***********************        * ENZYME [Nomencl.]  *
* [Restriction  * <---- *  SWISS-PROT         * <----- **********************
*  enzymes]     *       *  Protein Sequence   *            |
*****************       *  Data Bank          *            v
                        ***********************        **********************
*****************         | ^  |  ^ |  ^ |  |          * OMIM   [Diseases]  *
* PROSITE       * <-------+ |  |  | |  | |  +--------> **********************
* [Patterns]    * ----------+  |  | |  | |
*****************              |  | |  | +-----------> **********************
             |                 |  | |  +-------------- * E. coli 2D gels    *
             |                 |  | |                  **********************
             |                 |  | |
             |                 |  | +----------------> **********************
             |                 |  +------------------- * EcoGene/EcoSeq     *
             |                 v                       **********************
             |          ***********************
             +--------> * PDB [3D structures] *
                        ***********************