Swiss-Prot release 31.0

Published February 1, 1995


                    SWISS-PROT RELEASE 31.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 31.0  of SWISS-PROT  contains 43470 sequence entries, comprising
   15'335'248 amino acids abstracted from 39750 references. This represents
   an increase  of 8.3% over release 30. The recent growth of the data bank
   is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   2.0        09/86               3939               900 163
   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420
   29.0       06/94              38303            13 464 008
   30.0       10/94              40292            14 147 368
   31.0       02/95              43470            15 335 248
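
   As an illustration only, the growth between two rows of the table above
   can be recomputed as follows. The figures are copied from the rows for
   releases 30.0 and 31.0; the helper name percent_increase is hypothetical.
   The notes do not state which quantity the quoted percentage refers to or
   how it was rounded, so the recomputed values may differ slightly from it.

```python
# (entries, amino acids) copied from the last two rows of the growth table.
RELEASE_30 = (40292, 14_147_368)
RELEASE_31 = (43470, 15_335_248)


def percent_increase(old: int, new: int) -> float:
    """Relative growth from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


entries_growth = percent_increase(RELEASE_30[0], RELEASE_31[0])
residues_growth = percent_increase(RELEASE_30[1], RELEASE_31[1])
print(f"entries: {entries_growth:.1f}%  amino acids: {residues_growth:.1f}%")
```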


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 30

   2.1  Sequences and annotations

   About 3243 sequences have been added since release 30, the sequence data
   of 493  existing entries  has been  updated and  the annotations of 6729
   entries have been revised.






   2.2  What's happening with the model organisms

   As we announced in the last four releases, we have selected a number of
   organisms that are the target of genome sequencing and/or mapping
   projects and for which we intend to:

   -  Be as  complete as  possible. All sequences available at a given time
      should be  immediately included  in SWISS-PROT.  This  also  includes
      sequence corrections and updates;
   -  Provide a higher level of annotation;
   -  Provide cross-references  to specialized  database(s)  that  contain,
      among other  data, some genetic information about the genes that code
      for these proteins;
   -  Provide specific indices or documents.

   What was  done since  the last  release or  in preparation  for the next
   release:

   -  We have added four species to the list of model organisms:

        Salmonella typhimurium. We added all the missing publicly available
        protein sequences from this species and we cross-referenced the
        Salmonella sequences to the StyGene database from Ken Rudd (see
        section 2.3.2). A new documentation file (SALTY.TXT) is available
        (see section 2.4) that lists all the Salmonella typhimurium entries
        linked to StyGene.

        Schizosaccharomyces pombe (Fission yeast). For this organism we are
        looking for a genomic database to which the relevant SWISS-PROT
        entries can be linked. In this release we have added a significant
        number of S.pombe sequences. A new documentation file (POMBE.TXT)
        is available (see section 2.4) that lists all the
        Schizosaccharomyces pombe entries and their gene designation(s).

        Arabidopsis thaliana  (Mouse-ear cress). As of this release we have
        not yet  cross-linked the entries originating from this organism to
        a specific  genomic database  nor have  we yet  made a  significant
        effort to  enter all the Arabidopsis entries in SWISS-PROT. Such an
        effort is planned for the next release.

        Sulfolobus solfataricus. This archaebacterium is the target of a
        genomic project carried out at Dalhousie University in Halifax,
        Canada. It is expected that data originating from this project will
        be available very soon. In preparation for this development we have
        already entered in SWISS-PROT all the publicly available
        S.solfataricus sequences.

   -  SWISS-PROT is now cross-referenced to the Bacillus subtilis 168
      SubtiList database designed by Ivan Moszer of the Pasteur Institute
      in Paris (see section 2.3.3). SWISS-PROT now contains 100% of the
      publicly available protein sequences referenced in SubtiList. A new
      documentation file (SUBTILIS.TXT) is available (see section 2.4) that
      lists all the Bacillus subtilis entries linked to SubtiList.

   -  SWISS-PROT is now cross-referenced to the yeast LISTA database
      designed by Patrick Linder of the University of Geneva (see section
      2.3.1). We have worked together to ensure that the gene name used in
      LISTA always corresponds to the first one listed in the relevant
      SWISS-PROT entry (if more than one gene name exists). All proteins
      referenced in LISTA are now in SWISS-PROT. We also made a major
      effort to integrate in this release all the new sequence data from
      the complete sequences of chromosomes I, V, VIII and IX. New
      documentation files are available for each of these chromosomes (see
      section 2.4).


   Here is the current status of the model organisms:

   Organism        Database                    Index file       Number of
                   cross-referenced                             sequences
   --------------  --------------------------  --------------   ---------
   A.thaliana      None yet                    In preparation         271
   B.subtilis      SubtiList                   SUBTILIS.TXT          1083
   C.elegans       WormPep                     CELEGANS.TXT           690
   D.discoideum    DictyDB                     DICTY.TXT              199
   D.melanogaster  FlyBase                     In preparation         711
   E.coli          EcoGene                     ECOLI.TXT             3151
   H.sapiens       MIM                         MIMTOSP.TXT           3067
   S.cerevisiae    LISTA                       YEAST.TXT             3144
   S.typhimurium   StyGene                     SALTY.TXT              568
   S.pombe         None yet                    POMBE.TXT              279
   S.solfataricus  None yet                    None yet                58


        Other relevant information

   We apologize for running behind with new sequencing data from the C.
   elegans genomic project. Starting with the next release, we will have
   implemented a mechanism to ensure that data from the genome project is
   directly fed into the SWISS-PROT annotation 'pipeline'.

   With the  help of the group of Elizabeth Kutter from the Evergreen State
   College, we  have revised  and added  protein sequence  entries from the
   completed genome  of phage  T4. The next release will incorporate all of
   the data  from the complete genome of the Autographa californica nuclear
   polyhedrosis virus.


   2.3  Changes in the DR line

   In this  release, we have added cross-references from SWISS-PROT to five
   additional databases.  These cross-references  are  present  in  the  DR
   lines.

        2.3.1  LISTA

   The LISTA database of budding yeast (Saccharomyces cerevisiae) genes
   coding for proteins, prepared under the supervision of Patrick Linder
   at the University of Geneva (see: Doelz R., Mosse M.-O., Slonimski P.P.,
   Bairoch A., and Linder P.; Nucleic Acids Res. (1994), 22:3459-3461).


   Data bank identifier:    LISTA
   Primary identifier  :    Unique identifier  attributed by  LISTA to  the
                            gene coding for the protein
   Secondary identifier:    The gene designation (name)
   Example             :    DR   LISTA; SC00018; ACT1.


        2.3.2  StyGene

   The  StyGene   section  of   the  StySeq/StyMap   integrated  Salmonella
   typhimurium LT2  database, both  prepared by  Ken Rudd  at the  National
   Center for Biotechnology Information (NCBI).

   Data bank identifier:    STYGENE
   Primary identifier  :    Unique identifier  attributed by StyGene to the
                            gene coding for the protein
   Secondary identifier:    The gene designation (name)
   Example             :    DR   STYGENE; SG10312; PROV.


        2.3.3  SubtiList

   The SubtiList relational database for the Bacillus subtilis 168 genome
   prepared under the supervision of Ivan Moszer at the Pasteur Institute
   (see Moszer I., Glaser P., and Danchin A.; Microbiol. (1995), In press).

   Data bank identifier:    SUBTILIST
   Primary identifier  :    Unique identifier  attributed by  SubtiList  to
                            the gene coding for the protein
   Secondary identifier:    The gene designation (name)
   Example             :    DR   SUBTILIST; BG10774; OPPD.


        2.3.4  HSSP

   The database of Homology-derived Secondary Structure of Proteins (HSSP)
   prepared under the supervision of Chris Sander at the EMBL (see:
   Sander C., and Schneider R.; Nucleic Acids Res. (1993), 21:3105-3109).

   Data bank identifier:    HSSP
   Primary identifier  :    Accession number  of a  SWISS-PROT entry cross-
                            referenced to  a PDB  entry  whose structure is
                            expected to be  similar to that of the entry in
                            which the HSSP cross-reference is present
   Secondary identifier:    Entry name of the PDB structure related to that
                            of the entry in  which the HSSP cross-reference
                            is present
   Example             :    DR   HSSP; P00438; 1DOB.





        2.3.5  Transfac

   The  transcription   factor  database   (Transfac)  developed  by  Edgar
   Wingender   and    Rainer   Knueppel    from   the   Gesellschaft   fuer
   Biotechnologische Forschung mbH in Braunschweig.

   Data bank identifier:    TRANSFAC
   Primary identifier  :    Unique identifier (accession number) of the
                            Transfac entry
   Secondary identifier:    None; a dash '-' is stored in that field
   Example             :    DR   TRANSFAC; T00141; -.
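
   The five DR examples in this section share one layout: a data bank
   identifier, a primary identifier and a secondary identifier, separated
   by semicolons and ended by a period. A minimal parsing sketch, assuming
   that only this three-field form occurs, could look as follows. The names
   CrossRef and parse_dr_line are hypothetical and are not part of any
   SWISS-PROT software.

```python
from typing import NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str             # data bank identifier, e.g. "LISTA"
    primary: str              # primary identifier, e.g. "SC00018"
    secondary: Optional[str]  # secondary identifier, None when stored as '-'


def parse_dr_line(line: str) -> CrossRef:
    """Split one DR line into its three fields (assumed layout, see above)."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected three fields, got {len(fields)}: {line!r}")
    database, primary, secondary = fields
    # Transfac stores a dash when there is no secondary identifier.
    return CrossRef(database, primary, None if secondary == "-" else secondary)


print(parse_dr_line("DR   LISTA; SC00018; ACT1."))
print(parse_dr_line("DR   TRANSFAC; T00141; -."))
```

   Mapping the dash to None keeps the "no secondary identifier" case
   distinct from a real identifier; a caller preferring the raw field could
   drop that conversion.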


   2.4  Status of the documentation files

   SWISS-PROT is distributed with a large number of documentation files.
   Some of these files have been available for a long time (the user
   manual, release notes, the various indices for authors, citations,
   keywords, etc.), but many have been created recently and we are
   continuously adding new files. The following table lists all the
   documents that are currently available or that will be added in the next
   few months.

   USERMAN .TXT   User manual
   RELNOTES.TXT   Release notes
   SHORTDES.TXT   Short description of entries in SWISS-PROT

   JOURLIST.TXT   List of abbreviations for journals cited
   KEYWLIST.TXT   List of keywords in use
   SPECLIST.TXT   List of organism identification codes
   EXPERTS .TXT   List of on-line experts for PROSITE and SWISS-PROT

   ACINDEX .TXT   Accession number index
   AUTINDEX.TXT   Author index
   CITINDEX.TXT   Citation index
   KEYINDEX.TXT   Keyword index
   SPEINDEX.TXT   Species index

   7TMRLIST.TXT   List of 7-transmembrane G-linked receptor entries
   CDLIST  .TXT   CD nomenclature for surface proteins of human leucocytes
   CELEGANS.TXT   Index  of   Caenorhabditis  elegans   entries  and  their
                  corresponding  gene    designations  and  WormPep  cross-
                  references
   DICTY   .TXT   Index  of  Dictyostelium  discoideum  entries  and  their
                  corresponding   gene  designations  and   DictyDB  cross-
                  references
   EC2DTOSP.TXT   Index of  Escherichia coli  Gene-protein database entries
                  referenced in SWISS-PROT
   ECOLI   .TXT   Index of  Escherichia coli  K12 chromosomal  entries  and
                  their corresponding EcoGene cross-reference
   EMBLTOSP.TXT   Index of EMBL Database entries referenced in SWISS-PROT
   EXTRADOM.TXT   Nomenclature of extracellular domains





   GLYCOSYL.TXT   Index of  glycosyl hydrolases  classified by  families on
                  the basis of sequence similarities [2]
   HOXLIST .TXT   Vertebrate homeobox proteins: nomenclature and index
   HUMCHR21.TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome 21 [1]
   HUMCHRY .TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome Y [1]
   MIMTOSP .TXT   Index of MIM entries referenced in SWISS-PROT
   NOMLIST .TXT   List of nomenclature related references for proteins
   PDBTOSP .TXT   Index of Brookhaven PDB entries referenced in SWISS-PROT
   PLASTID .TXT   List of chloroplast and cyanelle encoded proteins
   POMBE   .TXT   Index of  Schizosaccharomyces pombe entries in SWISS-PROT
                  and their corresponding gene designations [1]
   RESTRIC .TXT   List of restriction enzyme and methylase entries
   RIBOSOMP.TXT   Index of ribosomal proteins classified by families on the
                  basis of sequence similarities [2]
   SALTY   .TXT   Index of  Salmonella typhimurium  LT2 chromosomal entries
                  and their corresponding StyGene cross-references [1]
   SUBTILIS.TXT   Index of  Bacillus subtilis  168 chromosomal  entries and
                  their corresponding SubtiList cross-references [1]
   YEAST   .TXT   Index  of  Saccharomyces  cerevisiae  entries  and  their
                  corresponding gene designations
   YEAST1  .TXT   Yeast Chromosome I entries [1]
   YEAST2  .TXT   Yeast Chromosome II entries
   YEAST3  .TXT   Yeast Chromosome III entries
   YEAST5  .TXT   Yeast Chromosome V entries [1]
   YEAST8  .TXT   Yeast Chromosome VIII entries [1]
   YEAST9  .TXT   Yeast Chromosome IX entries [1]
   YEAST11 .TXT   Yeast Chromosome XI entries

   Notes:

   [1]  New in release 31.
   [2]  Will be available starting with release 32 in June 1995.


   2.5  The ExPASy World-Wide Web server

        2.5.1 Background information

   The World-Wide Web (WWW), which originated at CERN, is a powerful global
   information  system   merging  networked   information   retrieval   and
   hypertext. It  gives access, using hypertext links, to the documents and
   information contained  in all the existing WWW servers around the world,
   as well  as to  the data  obtainable through other information retrieval
   systems like WAIS, Gopher, X500, etc. To access a WWW server, one has to
   run on a local computer a client program (a WWW browser), which displays
   hypertext documents.  The user  can then either request a keyword search
   or jump to another document by following a hypertext link. WWW has the
   outstanding advantages of extending the hypertext model to the whole
   world (by allowing hypertext jumps to documents anywhere on the
   Internet) and of being device and user-interface independent (browsers
   exist for a variety of computers and user interfaces, including Unix
   workstations running X Windows, Macintoshes and PCs with Microsoft
   Windows).

   The ExPASy  WWW server  allows access, using the user-friendly hypertext
   model, to  the SWISS-PROT,  PROSITE,  ENZYME,  SWISS-2DPAGE  and  SWISS-
   3DIMAGE databases and, through any SWISS-PROT protein sequence entry, to
   other  databases   such  as   EMBL,  REBASE,  FlyBase,  GCRDb,  MaizeDB,
   SubtiList, OMIM, PDB, HSSP, YEPD and Medline. Using a browser that is
   able to display images, one can also remotely access 2D gel image data
   from SWISS-2DPAGE.

   A WWW  server can  be accessed  on  the  internet  through  its  Uniform
   Resource Locator  (URL), the addressing system defined by the WWW model.
   The URL for the ExPASy WWW server is:

                           http://expasy.hcuge.ch/
   or
                            http://129.195.254.61/

   To access a WWW server, you need to run a browser (or client) program on
   your local computer. Browsers exist for a variety of machines and may be
   obtained by anonymous ftp. ExPASy can be used with any WWW browser, but
   we recommend either NCSA Mosaic or Netscape Navigator. Both are very
   flexible and powerful browsers with a graphical user interface; they are
   available for Unix boxes using X11/Motif, for Apple Macintoshes and for
   Microsoft Windows. You can get them from various FTP sites, for example:

      ftp.ncsa.uiuc.edu (for Mosaic)
      ftp1.netscape.com (for Netscape)

   For more  information on  the  ExPASy  WWW  server,  you  can  read  the
   following article:

      Appel R.D., Bairoch A., Hochstrasser D.F.
      A new  generation of  information retrieval tools for biologists: the
      example of the ExPASy WWW server.
      Trends Biochem. Sci. 19:258-260(1994).

   Or you can contact Dr. Ron Appel:

      Email: appel@cih.hcuge.ch
      Fax: +41-22-372 61 98


        2.5.2 SWISS-SHOP

   Thanks to the work of Manuel Peitsch from the Geneva Glaxo Institute for
   Molecular Biology, we can provide, on ExPASy, an important new service
   called SWISS-SHOP. SWISS-SHOP allows any user of SWISS-PROT to indicate
   which proteins he/she is interested in. This can be done using various
   criteria that can be combined:

   -  By entering one or more words that should be present in the
      description line;
   -  By entering one or more species name(s) or taxonomic division(s);
   -  By entering one or more keywords;
   -  By entering one or more author names;
   -  By entering the accession number (or entry name) of a PROSITE pattern
      or a user-defined sequence pattern;
   -  By entering  the accession  number (or  entry name)  of  an  existing
      SWISS-PROT entry or by entering a "private" sequence.

   Every week, the new sequences entered in SWISS-PROT are automatically
   compared with all the criteria that have been defined by the users. If a
   sequence corresponds to the selection criteria defined by a user, that
   sequence is sent to that user by electronic mail.


        2.5.3 What else is new on ExPASy

   Since the last release, there have been a number of new developments on
   the ExPASy WWW server. Here are some highlights of these changes:

   -  Access to  the ENZYME data bank has been fully implemented in ExPASy.
      Many different  access options  are allowed.  Hypertext links between
      ENZYME and  SWISS-PROT, PROSITE, MIM and the Japanese Ligand database
      are available.

   -  WWW  links   have  been  implemented  between  SWISS-PROT  and  HSSP,
      SubtiList, and YEPD.

   -  Cross-references from SWISS-PROT to FlyBase now use the links to the
      new server for that database (at Harvard).

   -  The  page   giving  access  to  the  SWISS-PROT  documents  has  been
      completely updated.  A new  page allows  access to  all  of  the  old
      release notes.


   2.6  Important forthcoming change

   In the next release, the RM (Reference Medline) line will be replaced by
   a more 'generic' line called RX (Reference cross-references). The format
   of that line will be:

   RX   bibliographic_database_name; identifier.

   As of the next release, the only "bibliographic_database_name" that will
   be used will be "MEDLINE" and the associated "identifier" is the eight
   digit Medline Unique Identifier (UID). But it is 'rumored' that
   additional bibliographic databases are interested in being linked to the
   sequence databases.

   Example:

   RM   91002678





   will be changed to:

   RX   MEDLINE; 91002678.
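
   For software that has to cope with both line types during the
   transition, the conversion shown in the example can be sketched as
   below. This is only an illustration under the assumption that an RM line
   carries nothing but the eight digit UID; the function name rm_to_rx is
   hypothetical.

```python
import re

# An RM line as shown in the example: the line code, then an 8-digit UID.
_RM_LINE = re.compile(r"^RM\s+(\d{8})\s*$")


def rm_to_rx(line: str) -> str:
    """Rewrite an RM line as the equivalent RX line; leave other lines as-is."""
    match = _RM_LINE.match(line)
    if match is None:
        return line
    return f"RX   MEDLINE; {match.group(1)}."


print(rm_to_rx("RM   91002678"))
```

   Lines that do not match the assumed RM layout are returned unchanged, so
   the function can be mapped over every line of an entry.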


   2.7  Weekly updates of SWISS-PROT

   Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
   are updated at each update:

   new_seq.dat    Contains all the new entries since the last full release.
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release.
   upd_ann.dat    Contains the entries for which one or more annotation
                  fields have been updated since the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates

   Organization   European Bioinformatics Institute (EBI)
   Address        ftp.ebi.ac.uk (or 193.62.196.6)
   Directory      /pub/databases/swissprot/new

   !!! Important notes !!!

   Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.

   Due to  the current  mechanism used  to build a release the entries that
   are provided in these updates are not guaranteed to be error free. Also,
   for the  same reason,  new  entries  do  not  contain  an  OC  (Organism
   Classification) line.
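
   The server list above can be written down as data, for instance to let a
   script build the location of each update file on a chosen mirror. The
   sketch below only composes ftp URLs and performs no network access. It
   assumes that the three files sit directly in the listed directories; the
   names MIRRORS, UPDATE_FILES and update_urls are hypothetical.

```python
from typing import Dict, List, Tuple

# File names as listed in section 2.7.
UPDATE_FILES: List[str] = ["new_seq.dat", "upd_seq.dat", "upd_ann.dat"]

# Mirror name -> (host, directory), copied from the server list above.
MIRRORS: Dict[str, Tuple[str, str]] = {
    "ExPASy": ("expasy.hcuge.ch", "/databases/swiss-prot/updates"),
    "NCBI": ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
    "EBI": ("ftp.ebi.ac.uk", "/pub/databases/swissprot/new"),
}


def update_urls(mirror: str) -> List[str]:
    """Return one ftp URL per weekly update file for the given mirror."""
    host, directory = MIRRORS[mirror]
    # Assumption: the files are stored directly inside `directory`.
    return [f"ftp://{host}{directory}/{name}" for name in UPDATE_FILES]


for url in update_urls("EBI"):
    print(url)
```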



                            3. ENZYME AND PROSITE

   3.1  The ENZYME data bank

   Release 18.0  of the  ENZYME data bank is distributed with release 31 of
   SWISS-PROT. ENZYME  release 18.0  contains information  relative to 3546
   enzymes.





   In this release we have added cross-references from the ENZYME data bank
   to the PROSITE data bank document entries that mention specific types of
   enzymes. These  lines are  present in  a new line type (PR) whose format
   is:

   PR   PROSITE; PSITE_DOC_AC_NB

   where 'PSITE_DOC_AC_NB'  is a  PROSITE document  entry accession number.
   Example:

   PR   PROSITE; PDOC00065;
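
   A minimal sketch of how such a line might be read is given below. It
   assumes the layout of the example (line code PR, the word PROSITE, then
   the document accession number) and accepts the line with or without the
   trailing semicolon. The function name parse_pr_line is hypothetical.

```python
def parse_pr_line(line: str) -> str:
    """Return the PROSITE document accession number found in a PR line."""
    if not line.startswith("PR"):
        raise ValueError(f"not a PR line: {line!r}")
    # Split on ';' and ignore the empty piece left by a trailing semicolon.
    fields = [field.strip() for field in line[2:].split(";") if field.strip()]
    if len(fields) != 2 or fields[0] != "PROSITE":
        raise ValueError(f"unexpected PR line layout: {line!r}")
    return fields[1]


print(parse_pr_line("PR   PROSITE; PDOC00065;"))
```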


   3.2  The PROSITE data bank

   Release 12.2 of the PROSITE data bank is distributed with release 31 of
   SWISS-PROT. Release 12.2 contains 785 documentation chapters that
   describe 1029 different patterns, rules and profiles/matrices. Release
   12.2 does not really represent a new release; the only changes between
   releases 12.1 and 12.2 are updates of the pointers to the SWISS-PROT
   entries whose names have been modified between releases 30 and 31. The
   next release of PROSITE (13.0) will be distributed with release 32 of
   SWISS-PROT.




                             WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate it if
   you notified us when sequences belonging to your field of expertise are
   missing from the data bank. We would also like to be notified about
   annotations to be updated, if, for example, the function of a protein
   has been clarified or if new post-translational information has become
   available.






















                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.57   Gln (Q) 4.02   Leu (L) 9.24   Ser (S) 7.19
   Arg (R) 5.21   Glu (E) 6.29   Lys (K) 5.88   Thr (T) 5.80
   Asn (N) 4.50   Gly (G) 6.91   Met (M) 2.36   Trp (W) 1.28
   Asp (D) 5.30   His (H) 2.24   Phe (F) 4.02   Tyr (Y) 3.21
   Cys (C) 1.73   Ile (I) 5.64   Pro (P) 4.98   Val (V) 6.51

   Asx (B) 0.002  Glx (Z) 0.002  Xaa (X) 0.02


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
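
   The ordering in A.1.2 follows from sorting the percentages of A.1.1 in
   decreasing order. The sketch below reproduces it from the values copied
   from the table; the names COMPOSITION and rank_by_frequency are
   hypothetical. Gln and Phe are tied at 4.02, and the sketch keeps them in
   the order in which they were entered.

```python
# Percent composition copied from table A.1.1 (the 20 standard residues).
COMPOSITION = {
    "Ala": 7.57, "Arg": 5.21, "Asn": 4.50, "Asp": 5.30, "Cys": 1.73,
    "Gln": 4.02, "Glu": 6.29, "Gly": 6.91, "His": 2.24, "Ile": 5.64,
    "Leu": 9.24, "Lys": 5.88, "Met": 2.36, "Phe": 4.02, "Pro": 4.98,
    "Ser": 7.19, "Thr": 5.80, "Trp": 1.28, "Tyr": 3.21, "Val": 6.51,
}


def rank_by_frequency(composition):
    """Return residue codes ordered from most to least frequent."""
    # sorted() is stable, so tied residues keep their dict insertion order.
    return sorted(composition, key=composition.get, reverse=True)


ranking = rank_by_frequency(COMPOSITION)
print(", ".join(ranking))
```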



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 4714

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 2139
                            2x:  756
                            3x:  426
                            4x:  264
                            5x:  191
                            6x:  191
                            7x:  117
                            8x:   80
                            9x:  102
                           10x:   46
                       11- 20x:  185
                       21- 50x:  128
                       51-100x:   43
                         >100x:   45













        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        3151          Escherichia coli
         2        3144          Baker's yeast (Saccharomyces cerevisiae)
         3        3067          Human
         4        1843          Mouse
         5        1683          Rat
         6        1083          Bacillus subtilis
         7         735          Bovine
         8         711          Fruit fly (Drosophila melanogaster)
         9         690          Caenorhabditis elegans
        10         573          Chicken
        11         568          Salmonella typhimurium
        12         449          African clawed frog (Xenopus laevis)
        13         403          Rabbit
        14         357          Pig
        15         279          Fission yeast (Schizosaccharomyces pombe)
        16         275          Bacteriophage T4
        17         271          Arabidopsis thaliana (Mouse-ear cress)
        18         258          Maize
        19         251          Vaccinia virus (strain Copenhagen)
        20         219          Rice
        21         214          Pseudomonas aeruginosa
        22         199          Slime mold (Dictyostelium discoideum)
        23         195          Tobacco
        24         193          Human cytomegalovirus (strain AD169)
        25         183          Vaccinia virus (strain WR)
        26         180          Pea
        27         173          Wheat
        28         168          Barley
        29         154          Staphylococcus aureus
        30         153          Dog
        31         151          Marchantia polymorpha (Liverwort)
                   151          Neurospora crassa
        33         147          Soybean
        34         146          Variola virus
        35         144          Pseudomonas putida
                   144          Rhodobacter capsulatus
        37         141          Sheep
        38         135          Spinach
        39         131          Klebsiella pneumoniae
        40         120          Bacillus stearothermophilus
        41         117          Tomato
        42         116          Agrobacterium tumefaciens
        43         111          Potato
        44         107          Rhizobium meliloti
        45         102          Lactococcus lactis (subsp. lactis)









   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    2381             1001-1100      404
                 51- 100    4198             1101-1200      292
                101- 150    5781             1201-1300      214
                151- 200    4200             1301-1400      140
                201- 250    3732             1401-1500      127
                251- 300    3303             1501-1600       70
                301- 350    3099             1601-1700       57
                351- 400    3136             1701-1800       52
                401- 450    2365             1801-1900       58
                451- 500    2477             1901-2000       38
                501- 550    1775             2001-2100       21
                551- 600    1247             2101-2200       51
                601- 650     898             2201-2300       55
                651- 700     682             2301-2400       22
                701- 750     629             2401-2500       26
                751- 800     497             >2500          132
                801- 850     375
                851- 900     405
                901- 950     281
                951-1000     250


   A.4  Longest sequences

   The longest sequences (>=4000 residues) are listed here:

                               HTS1_COCCA  5217
                               FAT_DROME   5147
                               RYNR_RABIT  5037
                               RYNR_HUMAN  5032
                               RYNC_RABIT  4969
                               DYHC_DICDI  4725
                               DYHC_RAT    4644
                               DYHC_DROME  4639
                               APB_HUMAN   4563
                               APOA_HUMAN  4548
                               RRPA_CVMJH  4488
                               DYHC_TRIGR  4466
                               DYHC_ANTCR  4466
                               GRSB_BACBR  4451
                               PKSK_BACSU  4447
                               PKSL_BACSU  4427
                               PLEC_RAT    4140
                               DYHC_YEAST  4092
                               RRPA_CVH22  4085









   A.5  List of the most cited journals in SWISS-PROT

   Citations            Journal abbreviation

   4517                 J. BIOL. CHEM.
   3097                 NUCLEIC ACIDS RES.
   2886                 PROC. NATL. ACAD. SCI. U.S.A.
   1869                 J. BACTERIOL.
   1571                 FEBS LETT.
   1543                 GENE
   1454                 EUR. J. BIOCHEM.
   1313                 EMBO J.
   1263                 NATURE
   1215                 BIOCHEM. BIOPHYS. RES. COMMUN.
   1183                 BIOCHEMISTRY
    933                 J. MOL. BIOL.
    931                 BIOCHIM. BIOPHYS. ACTA
    914                 CELL
    860                 MOL. CELL. BIOL.
    738                 MOL. GEN. GENET.
    689                 VIROLOGY
    650                 BIOCHEM. J.
    647                 PLANT MOL. BIOL.
    566                 SCIENCE
    539                 J. BIOCHEM.
    509                 MOL. MICROBIOL.
    443                 J. VIROL.
    392                 J. GEN. VIROL.
    284                 J. CELL BIOL.
    264                 GENOMICS
    253                 GENES DEV.
    249                 BIOL. CHEM. HOPPE-SEYLER
    229                 CURR. GENET.
    218                 ARCH. BIOCHEM. BIOPHYS.
    214                 YEAST
    213                 HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
    212                 J. IMMUNOL.
    190                 MOL. BIOCHEM. PARASITOL.
    184                 J. GEN. MICROBIOL.
    173                 MOL. ENDOCRINOL.
    164                 INFECT. IMMUN.
    156                 J. CLIN. INVEST.
    149                 ONCOGENE
    143                 PLANT PHYSIOL.
    142                 DNA
    137                 FEMS MICROBIOL. LETT.
    132                 HUM. MOL. GENET.
    129                 J. EXP. MED.
    120                 AM. J. HUM. GENET.
    117                 J. MOL. EVOL.
    111                 GENETICS
    102                 BLOOD
    101                 AGRIC. BIOL. CHEM.




           APPENDIX B: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

                         ***********************
******************       *  EMBL Nucleotide    *       **********************
* EPD [Euk.Prom] * <---> *  Sequence Data      * <---- * ECD [E. coli map]  *
******************       *  Library      [EBI] *       **********************
                         ***********************
                          ^  ^ ^  ^  ^  ^  ^
******************        |  | |  I  |  |  |
* FlyBase        * <------+  | |  I  |  |  |           **********************
******************        |  | |  I  |  |  +---------> * GCRDb [7TM recep.] *
                          |  | |  I  |  |  |           **********************
******************        |  | |  I  |  |  |
* SubtiList      * <---------+ |  I  |  |  |           **********************
******************        |  | |  I  |  +------------> * EcoGene [E.coli]   *
                          |  | |  I  |  |  |           **********************
******************        |  | |  I  |  |  |
* MaizeDB        * <-----------+  I  |  |  |           **********************
******************        |  | |  I  +---------------> * LISTA (Yeast)      *
                          |  | |  I  |  |  |           **********************
******************        |  | |  I  |  |  |
* WormPep        *        |  | |  I  |  |  |           **********************
* [C.elegans]    * <----+ |  | |  I  |  |  |  +------> * DictyDB [D.disco.] *
******************      | |  | |  I  |  |  |  |        **********************
                        | |  | |  I  |  |  |  |
******************      | v  v v  v  v  v  v  v        **********************
* REBASE         *      ***********************        * ENZYME [Nomencl.]  *
* [Restriction   * <--- *  SWISS-PROT         * <----- **********************
*  enzymes]      *      *  Protein Sequence   *            |
******************      *  Data Bank          *            v
                        ***********************        **********************
******************      ^ ^ ^  ^ ^  ^ | ^ ^ |          * OMIM   [Diseases]  *
* StyGene        *      | | |  | |  | | | | +--------> **********************
* [S.typhimurium]* <----+ | |  | |  | | | |
******************        | |  | |  | | | |            **********************
                          | |  | |  | | | +----------> * ECO2DBASE     [2D] *
******************        | |  | |  | | |              **********************
* Transfac       * <------+ |  | |  | | |
******************          |  | |  | | |              **********************
                            |  | |  | | +------------> * SWISS-2DPAGE  [2D] *
******************          |  | |  | |                **********************
* PROSITE        * <--------+  | |  | |
* [Patterns]     *             | |  | |                **********************
******************             | |  | +--------------> * Aarhus/Ghent  [2D] *
             |                 | |  |                  **********************
             |                 | |  |
             |                 | |  +----------------> **********************
             |                 | |                     * YEPD (Yeast)  [2D] *
             |                 | +-----------------+   **********************
             |                 v                   |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP (3D simil.)   *
                        ***********************        **********************

