Swiss-Prot release 22.0

Published May 1, 1992




                    SWISS-PROT RELEASE 22.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 22.0 of SWISS-PROT contains 25044 sequence entries, comprising
   8'375'696 amino acids abstracted from 25613 references. This represents
   an increase of 6.5% in amino acids (5.5% in entries) over release 21.
   The recent growth of the data bank is summarized below.

   Release    Date   Number of entries  Number of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
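
   The growth figure quoted in section 1.1 can be checked against the last
   two rows of the table. The sketch below is only an illustration: the
   dictionary layout and the function name are our own, and only the four
   numbers are taken from the table.

```python
# Figures copied from the growth table (releases 21.0 and 22.0).
RELEASES = {
    "21.0": {"entries": 23742, "amino_acids": 7866596},
    "22.0": {"entries": 25044, "amino_acids": 8375696},
}


def percent_increase(old: int, new: int) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


aa_growth = percent_increase(
    RELEASES["21.0"]["amino_acids"], RELEASES["22.0"]["amino_acids"]
)
entry_growth = percent_increase(
    RELEASES["21.0"]["entries"], RELEASES["22.0"]["entries"]
)

print(f"amino acids: +{aa_growth:.1f}%")
print(f"entries:     +{entry_growth:.1f}%")
```

   Under this reading, the 6.5% figure corresponds to the growth in amino
   acids rather than in the number of entries.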


   1.2  Source of data

   Release 22.0  has been  updated using protein sequence data from release
   31.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 30.0 of the
   EMBL Nucleotide Sequence Database.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Database
   cross-references) pointer lines:

   Entries with pointer(s) to only PIR entry(ies):              4309
   Entries with pointer(s) to only EMBL entry(ies):             3327
   Entries with pointer(s) to both EMBL and PIR entry(ies):    16868
   Entries with no pointer lines:                                540
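
   As a rough illustration of how such statistics could be gathered, the
   sketch below sorts an entry into one of the four categories from its DR
   lines. It assumes that a DR line starts with "DR" followed by three
   spaces and that the database name is the first semicolon-separated
   field. The function name and the sample lines are hypothetical. Entries
   that point only to other data banks are counted here as "none", which
   may differ from the way the figure of 540 was obtained.

```python
from typing import Iterable


def classify_pointers(entry_lines: Iterable[str]) -> str:
    """Classify one entry by the data banks named on its DR lines."""
    databases = set()
    for line in entry_lines:
        if line.startswith("DR   "):
            # Assumed layout: 'DR   DATABASE; identifier; ...'
            databases.add(line[5:].split(";", 1)[0].strip().upper())
    has_pir = "PIR" in databases
    has_embl = "EMBL" in databases
    if has_pir and has_embl:
        return "both"
    if has_pir:
        return "PIR only"
    if has_embl:
        return "EMBL only"
    return "none"


# Counts quoted in section 1.2; together they cover every entry.
COUNTS = {"PIR only": 4309, "EMBL only": 3327, "both": 16868, "none": 540}
assert sum(COUNTS.values()) == 25044

# Hypothetical entry fragment, for illustration only.
sample = ["DR   EMBL; X00000; EXAMPLE.", "DR   PIR; A00000; EXAMPLE."]
print(classify_pointers(sample))
```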









      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 21


   2.1  Sequences and annotations

   About 1320 sequences have been added since release 21, the sequence data
   of 180 existing entries has been updated and the annotations of 3800
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  Bacterial regulatory proteins, luxR family
   -  Bacterial regulatory proteins, ntrC family
   -  Band 4.1 proteins family
   -  Beta-amylases
   -  Beta-transducin family
   -  BetvI family pathogenesis proteins
   -  bglG-type antiterminator family
   -  Bromodomain proteins
   -  CBF/NF-Y subunits
   -  CDC48 / PAS1 / SEC18 / TBP-1 family
   -  Chaperonins cpn10
   -  Colipases
   -  Collagens
   -  D-amino acid oxidases
   -  D-isomer specific 2-hydroxyacid dehydrogenases
   -  Dihydrodipicolinate synthases
   -  DnaJ-like proteins
   -  Endoglucanases / exoglucanases / xylanases
   -  Fork head domain proteins
   -  G-protein coupled receptors family 2
   -  GMC oxidoreductases
   -  Herpes viruses proteins
   -  Inositol monophosphatase family
   -  LIM domain proteins
   -  Mevalonate and phosphomevalonate kinases
   -  Mitochondrial creatine kinases
   -  NADH-ubiquinone oxidoreductases peripheral subunits
   -  Orotidine 5'-phosphate decarboxylase
   -  papD family chaperonins
   -  Polygalacturonases
   -  Polyprenyl synthetases
   -  Regulator of chromosome condensation (RCC1)
   -  Respiratory-chain NADH dehydrogenase nuclear encoded subunits
   -  Ribonuclease P protein component
   -  Serine proteases, V8-type family
   -  Snake toxins
   -  Thiol (cysteine) proteases
   -  TNF receptors / low-affinity NGF receptor / CD40 family
   -  Vinculins and alpha-catenins
   -  ZP domain proteins









   2.2  New topic for the comments (CC) line type

   In this release we have introduced a new 'topic' for the comments (CC)
   line type: PTM. This topic is used to describe post-translational
   modification(s) whose position(s) on the sequence are not yet known and
   which therefore cannot be shown in the feature table.

   Examples of its usage:

   CC   -!- PTM: PHOSPHORYLATED IN VITRO ON SERINE(S) AND THREONINE(S)
   CC       BY PKC.

   CC   -!- PTM: HEAVILY GLYCOSYLATED WITH SULFATED N-LINKED CARBOHYDRATES.

   CC   -!- PTM: SIX DISULFIDE BONDS ARE PRESENT.
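
   The following sketch shows one way a program might pick up the new topic
   from the CC lines. It assumes, as in the examples above, that a comment
   block starts with "-!-" followed by the topic and a colon, and that
   continuation lines carry no marker. The function name is our own and the
   code is not part of any SWISS-PROT software.

```python
from typing import List, Tuple


def parse_comments(lines: List[str]) -> List[Tuple[str, str]]:
    """Collect (topic, text) pairs from the CC lines of an entry."""
    blocks: List[List[str]] = []
    for line in lines:
        if not line.startswith("CC"):
            continue  # ignore every other line type
        body = line[2:].strip()
        if body.startswith("-!-"):
            # A new comment block: split the topic from its text.
            topic, _, text = body[3:].partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks and body:
            # A continuation line: append it to the current block.
            blocks[-1][1] += " " + body
    return [(topic, text) for topic, text in blocks]


EXAMPLE = [
    "CC   -!- PTM: PHOSPHORYLATED IN VITRO ON SERINE(S) AND THREONINE(S)",
    "CC       BY PKC.",
    "CC   -!- PTM: SIX DISULFIDE BONDS ARE PRESENT.",
]

ptm = [text for topic, text in parse_comments(EXAMPLE) if topic == "PTM"]
for text in ptm:
    print(text)
```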



   2.3  New feature key

   The UNSURE key is used to describe region(s) of a sequence for which the
   authors are unsure about the sequence assignment.



                            3. ENZYME AND PROSITE

   3.1  The ENZYME data bank

   Release 9.0 of the ENZYME data bank is distributed along with release 22
   of SWISS-PROT. ENZYME release 9.0 contains information on 3179 enzymes.
   The data bank is complete and up to date. Until new enzyme nomenclature
   data is published, we plan only to update the SWISS-PROT pointers at each
   release of the protein sequence data bank, to correct any errors, and to
   complete the information concerning synonyms and cofactors using the
   literature.

   3.2  The PROSITE data bank

   Release 9.00 of the PROSITE data bank is distributed along with release
   22 of SWISS-PROT. Release 9.00 contains 580 documentation chapters that
   describe 689 different patterns.

   Since the last major release of PROSITE (release 8.00 of December 1991),
   50 new  chapters have  been added  and  about  200  chapters  have  been
   updated. The new chapters are listed below.

   -  Acid phosphatases signature
   -  Bacterial protein export pilT protein family signature
   -  Bacterial regulatory proteins, luxR family signature
   -  Bacterial Ribonuclease P protein component signature
   -  Band 4.1 family domain signatures
   -  Beta-transducin family Trp-Asp repeats signature
   -  Bromodomain
   -  CBF/NF-Y subunits signatures
   -  CDC48 / PAS1 / SEC18 family signature
   -  Chaperonins cpn10 signature
   -  C-type lectin domain signature
   -  Dihydrodipicolinate synthetase signatures
   -  dnaJ domains signatures
   -  D-amino acid oxidase signature
   -  Fork head domain signatures
   -  GHMP kinases putative ATP-binding domain
   -  Glycosyl hydrolases family 2 active site
   -  Glycosyl hydrolases family 32 active site
   -  Glycosyl hydrolases family 5 signature
   -  Glycosyl hydrolases family 6 signatures
   -  GMC oxidoreductases signatures
   -  Gram-negative pili assembly chaperone signature
   -  G-protein coupled receptors family 2 signatures
   -  Histidinol dehydrogenase active site
   -  Indole-3-glycerol phosphate synthase signature
   -  Inositol monophosphatase family signatures
   -  Leucine aminopeptidase signature
   -  Methionine aminopeptidase signature
   -  Neurotransmitters transporters signature
   -  NifH/frxC family signature
   -  Osteonectin domain signatures
   -  PTN/MK heparin-binding protein family signatures
   -  RecF protein signatures
   -  Regulator of chromosome condensation signatures
   -  Respiratory-chain NADH dehydrogenase subunit 1 signatures
   -  Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
   -  Respiratory-chain NADH dehydrogenase 75 Kd subunit signatures
   -  Ribosomal protein L30 signature
   -  Ribosomal protein L9 signature
   -  Ribosomal protein S13 signature
   -  Ribosomal protein S19e signature
   -  Ribosomal protein S4 signature
   -  Serine proteases, V8 family, active sites
   -  Sigma-54 interaction domain signatures
   -  Thymidine phosphorylase signature
   -  Tissue factor signature
   -  TNFR/NGFR family cysteine-rich region signature
   -  Transcriptional antiterminators bglG family signature
   -  Vinculin family signatures
   -  ZP domain signature

                            4. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate being
   notified if you find that sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a protein
   has been clarified or if new post-translational information has become
   available.







                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition


        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.64   Gln (Q) 4.07   Leu (L) 9.14   Ser (S) 7.10
   Arg (R) 5.24   Glu (E) 6.26   Lys (K) 5.82   Thr (T) 5.84
   Asn (N) 4.47   Gly (G) 7.09   Met (M) 2.33   Trp (W) 1.30
   Asp (D) 5.25   His (H) 2.27   Phe (F) 3.97   Tyr (Y) 3.21
   Cys (C) 1.81   Ile (I) 5.49   Pro (P) 5.06   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
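
   The classification above follows directly from the percentages in A.1.1.
   As a small illustration, the sketch below sorts the twenty standard amino
   acids by decreasing frequency. The values are copied from the table and
   the variable names are our own.

```python
# Percentages copied from table A.1.1 (standard amino acids only).
COMPOSITION = {
    "Ala": 7.64, "Gln": 4.07, "Leu": 9.14, "Ser": 7.10,
    "Arg": 5.24, "Glu": 6.26, "Lys": 5.82, "Thr": 5.84,
    "Asn": 4.47, "Gly": 7.09, "Met": 2.33, "Trp": 1.30,
    "Asp": 5.25, "His": 2.27, "Phe": 3.97, "Tyr": 3.21,
    "Cys": 1.81, "Ile": 5.49, "Pro": 5.06, "Val": 6.49,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```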




   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 3325


        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1465
                            2x:  599
                            3x:  323
                            4x:  197
                            5x:  143
                            6x:  111
                            7x:   74
                            8x:   51
                            9x:   65
                           10x:   35
                       11- 20x:  140
                       21- 50x:   68
                       51-100x:   24
                         >100x:   30














        A.2.2  Table of the most represented species


    Number   Frequency          Species
         1        1943          Human
         2        1785          Escherichia coli
         3        1152          Mouse
         4        1099          Rat
         5        1004          Baker's yeast (Saccharomyces cerevisiae)
         6         532          Bovine
         7         474          Fruit fly (Drosophila melanogaster)
         8         406          Chicken
         9         377          Bacillus subtilis
        10         290          African clawed frog (Xenopus laevis)
        11         282          Rabbit
        12         258          Pig
        13         251          Vaccinia virus (strain Copenhagen)
        14         245          Salmonella typhimurium
        15         193          Human cytomegalovirus (strain AD169)
        16         183          Maize
        17         168          Bacteriophage T4
        18         153          Vaccinia virus (strain WR)
        19         142          Rice
        20         129          Tobacco
        21         128          Wheat
        22         121          Pea
        23         113          Staphylococcus aureus
        24         112          Pseudomonas aeruginosa
        25         111          Barley
        26         109          Slime mold (Dictyostelium discoideum)
        27         106          Arabidopsis thaliana (Mouse-ear cress)
        28         104          Fission yeast (Schizosaccharomyces pombe)
        29         102          Sheep
                   102          Spinach
        31         100          Pseudomonas putida
                   100          Soybean
        33          97          Dog
        34          96          Caenorhabditis elegans
        35          95          Neurospora crassa




















   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1597             1001-1100      239
                 51- 100    2698             1101-1200      140
                101- 150    3791             1201-1300      117
                151- 200    2402             1301-1400       77
                201- 250    2024             1401-1500       62
                251- 300    1838             1501-1600       33
                301- 350    1683             1601-1700       29
                351- 400    1640             1701-1800       29
                401- 450    1257             1801-1900       34
                451- 500    1394             1901-2000       27
                501- 550     980             2001-2100       10
                551- 600     700             2101-2200       28
                601- 650     488             2201-2300       31
                651- 700     338             2301-2400       11
                701- 750     329             2401-2500       12
                751- 800     275             >2500           61
                801- 850     204
                851- 900     209
                901- 950     127
                951-1000     130
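
   The size classes used in the table are 50 residues wide up to 1000,
   100 residues wide up to 2500, and open-ended above that. The sketch below
   assigns a sequence length to its class under that reading. The function
   name is our own, the labels are written without the padding used in the
   table, and the list of lengths is made up for the example.

```python
from collections import Counter


def size_bin(length: int) -> str:
    """Return the size class of the table that contains this length."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    # Classes are 50 residues wide up to 1000 and 100 wide above.
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


# Hypothetical sequence lengths, for illustration only.
lengths = [37, 50, 51, 142, 1000, 1001, 2500, 3791]
histogram = Counter(size_bin(n) for n in lengths)
for label, count in histogram.items():
    print(f"{label:>10} {count:5d}")
```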



   Currently the ten largest sequences are:


                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            APB_HUMAN   4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            DYHC_TRIGR  4466 a.a.
                            POLG_BVDV   3988 a.a.
                            POLG_HCVA   3898 a.a.
                            POLG_HCVB   3898 a.a.
                            ACVT_PENCH  3791 a.a.
                            TRX_DROME   3759 a.a.



















                         APPENDIX B: ON-LINE EXPERTS


   B.1  List of on-line experts for PROSITE and SWISS-PROT

Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman@frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Band 4.1 family proteins      Rees J.            jrees@vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski@ncbi.nlm.nih.gov
Chalcone/stilbene synthases   Schroeder J.       raf@sun1.ruf.uni-freiburg.de
Chitinases                    Henrissat B.       cermav@frgren81.bitnet
Clusterin                     Peitsch M.C.       peitsch@ulbio1.unil.ch
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
DEAD-box helicases            Linder P.          linder@urz.unibas.ch
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
Glucanases                    Henrissat B.       cermav@frgren81.bitnet
                              Beguin P.          phycel@pasteur.bitnet
G-protein coupled receptors   Chollet A.         chollet@clients.switch.ch
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases                    Roy P.H.           2020000@lavalvx1.bitnet
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch@ulbio1.unil.ch
lysR family HTH proteins      Henikoff S.        henikoff@sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       peitsch@ulbio1.unil.ch
Myelin proteolipid protein    Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer@ucsd.edu
Phytochromes                  Partis M.D.        partis@gcri.afrc.ac.uk
Protein kinases               Hanks S.           hanks@vuctrvax.bitnet
                              Hunter T.          hunter@salk.bitnet
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@cshl.org
Ribosomal protein S3          Hallick R.         hallick%biotec@arizona.edu
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        harayama@cmu.unige.ch
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gunnar@cbts.sunet.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@bio.uib.no
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc@ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork@embl-heidelberg.de


Bacteriophage P4              Halling C.         chh9@midway.uchicago.edu
Drosophila                    Ashburner M.       ma11@phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd@ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd@ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin@cmu.unige.ch
Yeast chromosome I            Ouellette F.       francis@monod.biol.mcgill.ca



   B.2  Requirements for becoming an on-line expert

   An expert should be a scientist working with specific family(ies) of
   proteins (or specific domains) who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      which seem to belong to "their" family(ies) of proteins.
   c) Have access to electronic mail and be willing to use it to send and
      receive data.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme, please contact Amos Bairoch
   at one of the following electronic mail addresses:

                             bairoch@cmu.unige.ch
                           bairoch@cgecmu51.bitnet
















           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

============================================
Relationships between biomolecular databases
============================================

Last updated: March, 1992.

                                                       **********************
                        *********************** <----- * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * -----> **********************
                        *  Sequence Data      *
***************** ----> *  Library            *        **********************
* FLYBASE       * <---- *********************** <----- * ECD [E. coli map]  *
* [Drosophila   *                ^  |       ^          **********************
* genetic maps] * --------+      |  |       |
***************** <-----+ |      |  |       +--------- **********************
                        | |      |  |       +--------- * TFD [Trans. fact.] *
                        | |      |  |       | +------> **********************
                        | |      |  |       | |
*****************       | v      |  v       v |        **********************
* REBASE        *       ***********************        * ENZYME [Nomencl.]  *
* [Restriction  * <---- *  SWISS-PROT         * <----- **********************
*  enzymes]     *       *  Protein Sequence   *            |
*****************       *  Data Bank          *            v
                        ***********************        **********************
*****************         | ^  |  ^ |  ^ |  |          * OMIM   [Diseases]  *
* PROSITE       * <-------+ |  |  | |  | |  +--------> **********************
* [Patterns]    * ----------+  |  | |  | |
*****************              |  | |  | +-----------> **********************
             |                 |  | |  +-------------- * E. coli 2D gels    *
             |                 |  | |                  **********************
             |                 |  | |
             |                 |  | +----------------> **********************
             |                 |  +------------------- * EcoGene/EcoSeq     *
             |                 v                       **********************
             |          ***********************
             +--------> * PDB [3D structures] *
                        ***********************










