Swiss-Prot release 20.0

Published November 1, 1991




                    SWISS-PROT RELEASE 20.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 20.0  of SWISS-PROT  contains 22654 sequence entries, comprising
   7'500'130 amino  acids abstracted from 22830 references. This represents
   an increase of 5% over release 19. The recent growth of the data bank is
   summarized below.

   Release    Date   Number of entries   Number of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
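
   As a quick check of the growth figures, the relative increase between
   releases 19.0 and 20.0 can be recomputed from the last two rows of the
   table. The following is a minimal Python sketch; the function name is
   ours and is not part of any SWISS-PROT tool.

```python
def percent_increase(old, new):
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0

# Figures copied from the last two rows of the table (releases 19.0 and 20.0).
entries_growth = percent_increase(21795, 22654)
residues_growth = percent_increase(7173785, 7500130)

print(f"entries:     {entries_growth:.1f}%")
print(f"amino acids: {residues_growth:.1f}%")
```

   Both values come out between 3.9% and 4.6%, which is in line with the
   rounded figure of 5% quoted in the text.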

   1.2  Source of data

   Release 20.0  has been  updated using protein sequence data from release
   29.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 28.0 of the
   EMBL Nucleotide Sequence Database.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Database
   cross-references) pointer lines:

   Entries with pointer(s) to PIR entries only:            4129
   Entries with pointer(s) to EMBL entries only:           2970
   Entries with pointer(s) to both EMBL and PIR entries:  15061
   Entries with no pointer lines:                            494


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 19


   2.1  Sequences and annotations

   About 890 sequences have been added since release 19, the sequence data
   of 187 existing entries has been updated, and the annotations of 3030
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  Aminotransferases class-II and class-III
   -  Aromatic amino acids permeases
   -  Avidin/Streptavidin
   -  Bacterial porins
   -  Bacterial regulatory proteins, asnC family
   -  Bacterial regulatory proteins, crp family
   -  Bacterial ring hydroxylating dioxygenases
   -  Beta-ketoacyl synthases
   -  C3HC4-class zinc finger proteins
   -  Chalcone and resveratrol synthases
   -  Chromo domain proteins
   -  DEAD-box family ATP-dependent helicases
   -  Flagella basal body rod proteins
   -  GATA transcription factors
   -  Glutamate / Leucine / Phenylalanine dehydrogenases
   -  Glycosyl hydrolase family 1
   -  Glycosyl hydrolase family 9
   -  Glycosyl hydrolase family 10
   -  Glycosyl hydrolase family 17
   -  Heme oxygenase
   -  High potential iron-sulfur proteins
   -  HIV envelope proteins
   -  Initiation factor 5a (eIF-5a)
   -  Interferon regulatory factors
   -  Myelin P0 protein
   -  Myelin proteolipid protein
   -  PEP-utilizing enzymes
   -  pfkB family prokaryotic carbohydrate kinases
   -  Plant lipid transfer proteins
   -  PTS Hpr component
   -  Ribosomal proteins
   -  Saposins
   -  Serine/threonine protein kinases

   2.2  Changes in the feature table

        2.2.1  The LIPID key

   Until this release, SWISS-PROT was inconsistent in its use of the
   feature table to annotate the covalent binding of lipids (fatty acids,
   prenyl groups, or glycolipids) to a specific position in a protein
   sequence. The attachment of a myristate group was indicated by the
   `MYRISTYL' key; the attachment of a palmitate group was indicated by
   either the `MOD_RES' or the `BINDING' key; prenylation and GPI-anchors
   were indicated by the `BINDING' key.

   To correct these inconsistencies, we have introduced in this release a
   new key, `LIPID', and we have deleted the `MYRISTYL' key.

   Definition of the new key:

                LIPID  - Covalent binding of a lipid moiety

   The  chemical  nature  of  the  bound  lipid  moiety  is  given  in  the
   description. The general format of the LIPID description field is:

        FT   LIPID       xxx    xxx       MODIFICATION (COMMENT).

   The modifications which are currently defined are the following:

   MYRISTATE          Myristate group attached through an amide bond to the
                      N-terminal glycine  residue of  the mature  form of a
                      protein [1,2].

   PALMITATE          Palmitate group attached through a thioester bond to
                      a cysteine residue or through an ester bond to a
                      serine or threonine residue [1,2].

   FARNESYL           Farnesyl group attached through a thioether bond to a
                      cysteine residue [3].

   GERANYL-GERANYL    Geranyl-geranyl group  attached through  a  thioether
                      bond to a cysteine residue [3].

   GPI-ANCHOR         Glycosyl-phosphatidylinositol (GPI)  group linked  to
                      the alpha-carboxyl group of the C-terminal residue of
                      the mature form of a protein [4,5].

   N-ACYL DIGLYCERIDE N-terminal  cysteine   of  the   mature  form   of  a
                      prokaryotic lipoprotein  with an  amide-linked  fatty
                      acid and  a glyceryl  group to  which two fatty acids
                      are linked by ester linkages [6].

   [1] Grand R.J.A.
       Biochem. J. 258:626-638(1989).
   [2] McIlhinney R.A.J.
       Trends Biochem. Sci. 15:387-391(1990).
   [3] Glomset J.A., Gelb M.H., Farnsworth C.C.
       Trends Biochem. Sci. 15:139-142(1990).
   [4] Low M.G.
       FASEB J. 3:1600-1608(1989).
   [5] Low M.G.
       Biochim. Biophys. Acta 988:427-454(1989).
   [6] Hayashi S., Wu H.C.
       J. Bioenerg. Biomembr. 22:451-471(1990).

   Examples of LIPID key feature lines:

      FT   LIPID         1      1       MYRISTATE.
      FT   LIPID        65     65       PALMITATE (BY SIMILARITY).
      FT   LIPID       354    354       GPI-ANCHOR.
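
   For software authors, the LIPID lines shown above can be read with a
   short routine such as the Python sketch below. It is only an
   illustration: it splits on white space instead of relying on fixed
   column positions (an assumption on our part), and the names
   `parse_lipid' and `KNOWN_MODIFICATIONS' are hypothetical.

```python
import re

# The modifications currently defined for the LIPID key.
KNOWN_MODIFICATIONS = {
    "MYRISTATE", "PALMITATE", "FARNESYL",
    "GERANYL-GERANYL", "GPI-ANCHOR", "N-ACYL DIGLYCERIDE",
}

LIPID_RE = re.compile(
    r"^FT\s+LIPID\s+(\d+)\s+(\d+)\s+"   # key, then the two positions
    r"([^(.]+?)"                         # modification name
    r"(?:\s*\(([^)]*)\))?"               # optional (COMMENT)
    r"\.\s*$"                            # terminating period
)

def parse_lipid(line):
    """Return a dict describing a LIPID feature line, or None if no match."""
    match = LIPID_RE.match(line.strip())
    if match is None:
        return None
    start, end, modification, comment = match.groups()
    modification = modification.strip()
    return {
        "start": int(start),
        "end": int(end),
        "modification": modification,
        "comment": comment,
        "known": modification in KNOWN_MODIFICATIONS,
    }

print(parse_lipid("FT   LIPID        65     65       PALMITATE (BY SIMILARITY)."))
```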


        2.2.2  The VARSPLIC key

   There are some genes which, by the mechanism of alternative splicing,
   encode closely related proteins that differ only by the presence or
   absence of one or more domains. Generally a single sequence entry
   represents the longest form of the protein and the feature table is used
   to indicate the regions which differ in alternatively spliced forms.
   The `VARIANT' key was used for this purpose. For example (from entry
   P04085; PGDA$HUMAN):

      FT   VARIANT     194    196       GRP -> DVR (IN SHORT FORM).
      FT   VARIANT     197    211       MISSING (IN SHORT FORM).

   It was  very difficult  to write  software tools  that could distinguish
   between the  above usage of the `VARIANT' key and the more classical use
   to describe  polymorphisms  or  natural  mutations.  We  have  therefore
   introduced  in   this  release  a  new  key  `VARSPLIC'  which  is  used
   specifically to describe splicing variants.

   The example shown above is now represented by:

      FT   VARSPLIC    194    196       GRP -> DVR (IN SHORT FORM).
      FT   VARSPLIC    197    211       MISSING (IN SHORT FORM).
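
   With a dedicated key, software can derive an alternatively spliced form
   from the sequence stored in the entry. The Python sketch below shows one
   possible way to do so. It assumes that each description is either
   `MISSING' or of the form `XXX -> YYY', and that features do not overlap;
   the function name and the toy sequence are ours and are not taken from
   a real entry.

```python
import re

CHANGE_RE = re.compile(r"^([A-Z]+) -> ([A-Z]+)")

def apply_varsplic(sequence, features):
    """Apply VARSPLIC features to a sequence.

    features is a list of (start, end, description) tuples with 1-based,
    inclusive positions, as in the feature table.
    """
    residues = sequence
    # Work from the C-terminal end so that earlier positions stay valid.
    for start, end, description in sorted(features, reverse=True):
        if description.startswith("MISSING"):
            replacement = ""
        else:
            match = CHANGE_RE.match(description)
            if match is None:
                raise ValueError("unsupported description: " + description)
            original, replacement = match.groups()
            if residues[start - 1:end] != original:
                raise ValueError("sequence does not match: " + description)
        residues = residues[:start - 1] + replacement + residues[end:]
    return residues

# Toy sequence (hypothetical): GRP at 3-5, a deletable region at 6-8.
toy = "AAGRPCCCTT"
short_form = apply_varsplic(toy, [
    (3, 5, "GRP -> DVR (IN SHORT FORM)."),
    (6, 8, "MISSING (IN SHORT FORM)."),
])
print(short_form)
```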


   2.3  Changes in the cross-reference lines (DR)

        2.3.1  Cross-references to EcoGene

   Starting with this release, we have added cross-references to the
   EcoGene section of the EcoSeq/EcoMap integrated Escherichia coli
   database prepared by Ken Rudd at the National Center for Biotechnology
   Information (NCBI). For a description see: Rudd K.E., Miller W., Werner
   C., Ostell J., Tolstoshev C., and Satterfield S.G.; Nucleic Acids Res.
   19:637-647(1991).

   These cross-references are present in the DR lines:

   Data bank identifier: ECOGENE
   Primary identifier  : EcoGene gene accession number
   Secondary identifier: Gene designation
   Example             : DR   ECOGENE; EG10075; AROC.
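
   A DR line of this kind can be split into its three fields with a few
   lines of code. The Python sketch below is a minimal illustration that
   assumes exactly three semicolon-separated fields followed by a period;
   the function name `parse_dr' is hypothetical.

```python
def parse_dr(line):
    """Split a DR line into (data bank, primary id, secondary id)."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line: " + line)
    text = line[2:].strip()
    if not text.endswith("."):
        raise ValueError("DR line must end with a period: " + line)
    fields = [field.strip() for field in text[:-1].split(";")]
    if len(fields) != 3:
        raise ValueError("expected three fields: " + line)
    return tuple(fields)

data_bank, primary, secondary = parse_dr("DR   ECOGENE; EG10075; AROC.")
print(data_bank, primary, secondary)
```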

   The collaboration with Ken Rudd goes much further than simply adding
   these cross-references. Thanks to this collaboration we have been able
   to update hundreds of Escherichia coli sequence entries (adding data
   concerning the function of some proteins, resolving sequence conflicts,
   adding references and comments, etc.). We are also using his master
   list of sequenced genes to pinpoint missing sequences, and we have
   implemented his gene name nomenclature for hypothetical proteins. This
   scheme is described below.

   Unnamed Escherichia coli hypothetical proteins and proteins of unknown
   function are assigned gene names based upon their position on the E.
   coli genomic physical map. They all begin with the letter `Y'. The next
   two letters designate which 1/100th of the map (starting at the thr
   locus) contains the ORF, in the order YAA, YAB, ..YAJ, YBA, YBB, ..YBJ,
   YCA, YCB, ..YJJ. ORFs within any one of these 100 intervals are given a
   fourth letter (A-Z) that serves to distinguish them but is not meant to
   convey position information.
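
   The naming scheme can be expressed compactly in code. In the Python
   sketch below the 100 map intervals are numbered 0 to 99 starting at the
   thr locus; this numbering, the function names and the gene name used in
   the example are our own assumptions for illustration.

```python
LETTERS = "ABCDEFGHIJ"  # ten letters for each of the two interval positions

def interval_prefix(interval):
    """Return the three-letter prefix (YAA..YJJ) for interval 0..99."""
    if not 0 <= interval < 100:
        raise ValueError("interval must be in the range 0..99")
    return "Y" + LETTERS[interval // 10] + LETTERS[interval % 10]

def interval_of(gene_name):
    """Return the interval number (0..99) encoded in a four-letter Y name."""
    name = gene_name.upper()
    if (len(name) != 4 or name[0] != "Y"
            or name[1] not in LETTERS or name[2] not in LETTERS
            or not "A" <= name[3] <= "Z"):
        raise ValueError("not a valid Y gene name: " + gene_name)
    # The fourth letter only distinguishes ORFs; it carries no position.
    return LETTERS.index(name[1]) * 10 + LETTERS.index(name[2])

print(interval_prefix(0), interval_prefix(9), interval_prefix(10), interval_prefix(99))
print(interval_of("YBJC"))  # hypothetical gene name
```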

        2.3.2  FlyBase

   The official name of Michael Ashburner's Drosophila Genetic Maps
   Database has been changed from `DMAP' to `FlyBase'; we have therefore
   changed the data bank identifier in the DR line from `DMAP' to
   `FLYBASE' for all cross-references to that data collection.



   2.4  Minor change in the RL lines for thesis references

   Until now, thesis references have been formatted as in the following
   example:

        RL   UNPUBLISHED (1972) THESIS, GEORGE WASHINGTON UNIVERSITY, USA.

   In recognition  of the  fact  that  theses  are  generally  regarded  as
   published references  we will  format them  as follows starting with the
   current release:

        RL   THESIS (19YY), INSTITUTION_NAME, COUNTRY.

   Example:

        RL   THESIS (1972), GEORGE WASHINGTON UNIVERSITY, USA.

   For those of you who write software to parse reference blocks, the
   presence of the word `THESIS' as the first word on the first RL line of
   a reference block will thus indicate a thesis reference. The remaining
   text consists of a parenthesized year, followed by the institution name,
   followed by the country where that institution is located.
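
   As an illustration, a thesis RL line in the new format could be
   recognized as in the Python sketch below. The regular expression is our
   own reading of the format; it assumes a four-digit year and treats the
   text after the last comma as the country.

```python
import re

THESIS_RE = re.compile(r"^RL\s+THESIS \((\d{4})\), (.+), ([^,]+)\.$")

def parse_thesis(rl_line):
    """Return (year, institution, country), or None if not a thesis line."""
    match = THESIS_RE.match(rl_line.strip())
    if match is None:
        return None
    year, institution, country = match.groups()
    return int(year), institution, country

print(parse_thesis("RL   THESIS (1972), GEORGE WASHINGTON UNIVERSITY, USA."))
```

   Lines in the old `UNPUBLISHED ... THESIS' format do not match and yield
   None, so they can be handled separately.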


                            3. FORTHCOMING CHANGES

   The following changes will be implemented starting with release 21.

   3.1  Change in the format of the entry names

   The dollar sign `$' in entry names will be replaced by the underscore
   character `_'. This change is made on behalf of users of sequence
   analysis software running under the Unix operating system, where the
   dollar sign is a reserved symbol. Example: the entry name `CYC$HUMAN'
   will be changed to `CYC_HUMAN'.
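
   Programs that store entry names may wish to convert them ahead of the
   change. A minimal Python sketch follows; the function name is
   hypothetical, and it assumes that an entry name contains exactly one
   dollar sign separating two non-empty parts.

```python
def new_entry_name(old_name):
    """Convert an entry name such as CYC$HUMAN to the release 21 format."""
    protein, separator, species = old_name.partition("$")
    if not separator or not protein or not species or "$" in species:
        raise ValueError("unexpected entry name: " + old_name)
    return protein + "_" + species

print(new_entry_name("CYC$HUMAN"))
```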

   3.2  New line type GN

   The GN (Gene Name) line is a new line type that will be used to indicate
   the name(s) of the gene(s) that encode the protein being described.
   Currently this information is found in the DE line, as shown in the
   following example:

        DE   SERUM ALBUMIN PRECURSOR (GENE NAME: ALB).

   The format of the GN line will be:

        GN   NAME1[ AND|OR NAME2...].

   Examples:

        GN   ALB.
        GN   REX-1.

   It often  occurs that  more than  one gene  name has been assigned to an
   individual locus. In that case all the synonyms will be listed. The word
   `OR' separates the different designations. The first name in the list is
   assumed to be the most correct (or most current) designation. Example:

        GN   HNS OR DRDX OR OSMZ OR BGLY.

   In a few cases, multiple genes encode an identical protein sequence.
   In that case all the different gene names will be listed. The word `AND'
   separates the designations. Example:

        GN   CECA1 AND CECA2.

   In very rare cases (only one occurrence has been found in the current
   release) both `AND' and `OR' may be present. In that case parentheses
   are used, as shown in the following example:

        GN   GVPA AND (GVPB OR GVPA2).
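
   The GN examples above can be parsed with a short routine such as the
   Python sketch below. It returns one list of synonyms per gene. It
   assumes that `AND' only appears at the top level and that parentheses
   only enclose `OR' groups, which covers the examples given here but is
   not a statement about every future entry; the function name is ours.

```python
def parse_gn(line):
    """Return a list of genes, each given as a list of synonymous names."""
    if not line.startswith("GN"):
        raise ValueError("not a GN line: " + line)
    text = line[2:].strip()
    if not text.endswith("."):
        raise ValueError("GN line must end with a period: " + line)
    genes = []
    for part in text[:-1].split(" AND "):
        part = part.strip()
        # Parentheses group the synonyms of one gene; drop them.
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        genes.append([name.strip() for name in part.split(" OR ")])
    return genes

print(parse_gn("GN   HNS OR DRDX OR OSMZ OR BGLY."))
print(parse_gn("GN   GVPA AND (GVPB OR GVPA2)."))
```

   In each inner list the first name is the one assumed to be the most
   correct (or most current) designation.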


   3.3  New line type RM

   The RM  (Reference Medline)  line will  be used  to indicate the Medline
   Unique Identifier  (UID) of  a reference.  This information is currently
   listed in  the RC  line using  the  `MEDLINE'  token  as  shown  in  the
   following example:

        RC   MEDLINE=90205618;

   The format of the RM line will be:

        RM   nnnnnnnn

   where `nnnnnnnn' is the eight-digit Medline Unique Identifier (UID).

   Example:

        RM   90205618
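
   Software that currently reads the UID from the RC line can be adapted
   along the lines of the Python sketch below, which builds the future RM
   line from an existing RC line. The function name is hypothetical, and
   the sketch does not attempt to remove the token from the RC line.

```python
import re

MEDLINE_RE = re.compile(r"MEDLINE=(\d{8});")

def rc_to_rm(rc_line):
    """Return the RM line for an RC line holding a MEDLINE token, or None."""
    match = MEDLINE_RE.search(rc_line)
    if match is None:
        return None
    return "RM   " + match.group(1)

print(rc_to_rm("RC   MEDLINE=90205618;"))
```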


   3.4  Secondary structure information

   Thanks to the help of Chris Sander of the Biocomputing group at EMBL, we
   are going to add secondary structure information to the feature table of
   each sequence entry that describes a protein whose tertiary structure is
   known. Complete details regarding this new feature will be communicated
   in the next release notes.



                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 7.0 of the ENZYME data bank is distributed along with release 19
   of SWISS-PROT. ENZYME release 7.0 contains information on 3072 enzymes.
   The data bank is complete and up to date. Until new enzyme nomenclature
   data is published, we only plan to update the SWISS-PROT pointers at
   each release of the protein sequence data bank, correct any errors, and
   complete the information concerning synonyms and cofactors using the
   literature.

   4.2  The PROSITE data bank

   Release 8.00 of the PROSITE data bank is distributed along with release
   20 of SWISS-PROT. Release 8.00 contains 530 documentation chapters that
   describe 605 different patterns.

        4.2.1  What's new in release 8.00

   Since the  last major release of PROSITE (release 7.01 of June 1991), 88
   new chapters have been added and 210 chapters have been updated. The new
   chapters are:

   -  Fibrinogen beta and gamma chains C-terminal domain signature
   -  Somatomedin B domain signature
   -  Cellulose-binding domain, bacterial type
   -  Cellulose-binding domain, fungal type
   -  Zinc finger, C3HC4 type, signature
   -  IRF family signature
   -  TEA domain signature
   -  Fibrillarin signature
   -  Bacterial regulatory proteins, asnC family signature
   -  Bacterial regulatory proteins, merR family signature
   -  HMG-I and HMG-Y DNA-binding domain (A+T-hook)
   -  Chromo domain
   -  Nuclear transition protein 1 signature
   -  Ribosomal protein L6 signature
   -  Ribosomal protein L16 signature
   -  Ribosomal protein L29 signature
   -  Ribosomal protein L33 signature
   -  Ribosomal protein L19e signature
   -  Ribosomal protein L32e signature
   -  Ribosomal protein S3 signature
   -  Ribosomal protein S5 signature
   -  Ribosomal protein S14 signature
   -  Ribosomal protein S4e signature
   -  Ribosomal protein S6e signature
   -  Ribosomal protein S24e signature
   -  FMN-dependent alpha-hydroxy acid dehydrogenases active site
   -  Eukaryotic molybdopterin oxidoreductases signature
   -  Delta 1-pyrroline-5-carboxylate reductase signature
   -  Pyridine nucleotide-disulphide oxidoreductases class-II active site
   -  Respiratory chain NADH dehydrogenase 30 Kd subunit signature
   -  Respiratory chain NADH dehydrogenase 49 Kd subunit signature
   -  Bacterial ring hydroxylating dioxygenases alpha-subunit signature
   -  Heme oxygenase signature
   -  Beta-ketoacyl synthases active site
   -  Transglutaminases active site
   -  Aminotransferases class-II pyridoxal-phosphate attachment site
   -  Aminotransferases class-III pyridoxal-phosphate attachment site
   -  Phosphoserine aminotransferase signature
   -  pfkB family prokaryotic carbohydrate kinases signatures
   -  Phosphoribulokinase signature
   -  Thymidine kinase cellular-type signature
   -  DNA polymerase family X signature
   -  PEP-utilizing enzymes phosphorylation site signature
   -  cAMP phosphodiesterases class-II signature
   -  Ribonuclease III family signature
   -  Ribonuclease T2 family histidine active sites
   -  Glycosyl hydrolases family 1 active site
   -  Glycosyl hydrolases family 9 active site
   -  Glycosyl hydrolases family 10 active site
   -  Glycosyl hydrolases family 17 signature
   -  Alkylbase DNA glycosidases alkA family signature
   -  Matrixins cysteine switch
   -  Amidases signature
   -  ATP synthase c subunit signature
   -  Phosphoenolpyruvate carboxykinase (ATP) signature
   -  Fructose-bisphosphate aldolase class-II signature
   -  Porphobilinogen deaminase cofactor-binding site
   -  Ferrochelatase signature
   -  Aldose 1-epimerase putative active site
   -  Methylmalonyl-CoA mutase signature
   -  Ubiquitin-activating enzyme signature
   -  Adenylosuccinate synthetase active site
   -  Argininosuccinate synthase signatures
   -  Cytochrome b559 subunits heme-binding site signature
   -  High potential iron-sulfur proteins signature
   -  Bacterioferritin signature
   -  Hemerythrins signature
   -  Avidin / Streptavidin family signature
   -  Plant lipid transfer proteins signature
   -  Aromatic amino acids permeases signature
   -  General diffusion gram-negative porins signature
   -  Eukaryotic porin signature
   -  Myelin basic protein signature
   -  Myelin P0 protein signature
   -  Myelin proteolipid protein signature
   -  Synaptophysin / synaptoporin signature
   -  Flagella basal body rod proteins signature
   -  Plant viruses icosahedral capsid proteins 'S' region signature
   -  Bacterial chemotaxis sensory transducers signature
   -  Interleukin-10 signature
   -  LIF / OSM family signature
   -  Pyrokinins signature
   -  Hok/gef family cell toxic proteins signature
   -  Lambdoid phages regulatory protein CIII signature
   -  Stathmin family signature
   -  HlyD family secretion proteins signature
   -  Seminal vesicle protein II repeats signature


        4.2.2  New /SKIP-FLAG qualifier for CC lines

   Some  PROSITE  keys  such  as  those  describing  commonly  found  post-
   translational modifications  (a typical  example is N-glycosylation) are
   found in  the majority of known protein sequences. While it is generally
   useful to note their presence, some programs may want, in some cases, to
   ignore those  keys. For  this purpose  these keys are indicated with the
   following qualifier in their CC lines:

   CC   /SKIP-FLAG=TRUE;
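
   A program that scans sequences can use this qualifier to leave out the
   flagged entries. The Python sketch below shows one possible test; the
   function name and the identifiers in the sample entries are invented
   for illustration and are not taken from PROSITE.

```python
def has_skip_flag(entry_lines):
    """Return True if any CC line of the entry carries /SKIP-FLAG=TRUE;."""
    for line in entry_lines:
        if line.startswith("CC") and "/SKIP-FLAG=TRUE;" in line:
            return True
    return False

# Two hypothetical entries, reduced to the lines that matter here.
frequent_site = ["ID   EXAMPLE_FREQUENT_SITE; PATTERN.", "CC   /SKIP-FLAG=TRUE;"]
specific_site = ["ID   EXAMPLE_SPECIFIC_SITE; PATTERN."]

to_scan = [e for e in (frequent_site, specific_site) if not has_skip_flag(e)]
print(len(to_scan))
```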


        4.2.3  The new 3D line type

   We have introduced a new line type: 3D (3D-structure), which is used to
   list the code(s) of X-ray crystallography Protein Data Bank (PDB)
   entries that contain structural data corresponding to the sequence
   region described in a PROSITE entry. The format of the 3D line is:

   3D   name; [name2;...]

   Example:

   3D   7WGA; 9WGA; 1WGC; 2WGC;
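
   The PDB codes can be extracted from a 3D line by splitting on the
   semicolons, as in the following Python sketch (the function name is
   ours):

```python
def parse_3d(line):
    """Return the list of PDB codes found on a PROSITE 3D line."""
    if not line.startswith("3D"):
        raise ValueError("not a 3D line: " + line)
    # Every code is followed by a semicolon; ignore the empty last piece.
    return [code.strip() for code in line[2:].split(";") if code.strip()]

print(parse_3d("3D   7WGA; 9WGA; 1WGC; 2WGC;"))
```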



                            5. WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate being
   notified if you find that sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a protein
   has been clarified or if new information on post-translational
   modifications has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.65   Gln (Q) 4.08   Leu (L) 9.12   Ser (S) 7.10
   Arg (R) 5.23   Glu (E) 6.27   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.46   Gly (G) 7.10   Met (M) 2.32   Trp (W) 1.30
   Asp (D) 5.25   His (H) 2.27   Phe (F) 3.95   Tyr (Y) 3.21
   Cys (C) 1.81   Ile (I) 5.45   Pro (P) 5.08   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
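
   This ranking follows directly from the percentages in A.1.1 and can be
   reproduced with the Python sketch below. The ambiguity codes B, Z and X
   are left out, and residues with equal percentages keep the order in
   which they are listed, which is an arbitrary choice on our part.

```python
# Percentages copied from section A.1.1.
COMPOSITION = [
    ("Ala", 7.65), ("Arg", 5.23), ("Asn", 4.46), ("Asp", 5.25),
    ("Cys", 1.81), ("Gln", 4.08), ("Glu", 6.27), ("Gly", 7.10),
    ("His", 2.27), ("Ile", 5.45), ("Leu", 9.12), ("Lys", 5.85),
    ("Met", 2.32), ("Phe", 3.95), ("Pro", 5.08), ("Ser", 7.10),
    ("Thr", 5.85), ("Trp", 1.30), ("Tyr", 3.21), ("Val", 6.49),
]

# sorted() is stable, so ties (Gly/Ser, Lys/Thr) keep their input order.
ranking = [name for name, percent in
           sorted(COMPOSITION, key=lambda item: item[1], reverse=True)]

print(", ".join(ranking))
```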



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 3029

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1303
                            2x:  548
                            3x:  308
                            4x:  189
                            5x:  131
                            6x:  100
                            7x:   73
                            8x:   47
                            9x:   64
                           10x:   32
                       11- 20x:  117
                       21-100x:   92
                         >100x:   25


        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1847          Human
         2        1596          Escherichia coli
         3        1083          Mouse
         4        1014          Rat
         5         778          Baker's yeast (Saccharomyces cerevisiae)
         6         511          Bovine
         7         435          Fruit fly (Drosophila melanogaster)
         8         374          Chicken
         9         311          Bacillus subtilis
        10         266          Rabbit
                   266          African clawed frog (Xenopus laevis)
        12         251          Vaccinia virus (strain Copenhagen)
        13         245          Pig
        14         208          Salmonella typhimurium
        15         193          Human cytomegalovirus (strain AD169)
        16         166          Bacteriophage T4
        17         160          Maize
        18         133          Rice
        19         118          Vaccinia virus (strain WR)
                   118          Tobacco
        21         113          Pea
        22         111          Wheat
        23         110          Staphylococcus aureus
        24         101          Slime mold (Dictyostelium discoideum)
                   101          Barley
        26         100          Sheep
        27          94          Fission yeast (Schizosaccharomyces pombe)
        28          93          Pseudomonas aeruginosa
        29          92          Spinach
        30          89          Pseudomonas putida
        31          86          Soybean
                    86          Neurospora crassa
        33          85          Dog
        34          84          Liverwort (Marchantia polymorpha)
        35          82          Klebsiella pneumoniae


   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1494             1001-1100      210
                 51- 100    2500             1101-1200      131
                101- 150    3559             1201-1300      108
                151- 200    2181             1301-1400       64
                201- 250    1817             1401-1500       54
                251- 300    1634             1501-1600       32
                301- 350    1494             1601-1700       25
                351- 400    1436             1701-1800       26
                401- 450    1104             1801-1900       30
                451- 500    1188             1901-2000       23
                501- 550     881             2001-2100        9
                551- 600     619             2101-2200       25
                601- 650     438             2201-2300       31
                651- 700     319             2301-2400       11
                701- 750     297             2401-2500       13
                751- 800     232             >2500           53
                801- 850     187
                851- 900     194
                901- 950     119
                951-1000     116


   Currently the ten largest sequences are:


                            RYNR$RABIT  5037 a.a.
                            RYNR$HUMAN  5032 a.a.
                            APB$HUMAN   4563 a.a.
                            APOA$HUMAN  4548 a.a.
                            DYHC$TRIGR  4466 a.a.
                            POLG$BVDV   3988 a.a.
                            POLG$HCVA   3898 a.a.
                            POLG$HCVB   3898 a.a.
                            TRX$DROME   3759 a.a.
                            ACVA$PENCH  3746 a.a.



                         APPENDIX B: ON-LINE EXPERTS


   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt@medfys.ki.se
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Bacteriophage P4 proteins     Halling C.         chh9@midway.uchicago.edu
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Chitinases                    Henrissat B.       cermav@frgren81.bitnet
Clusterin                     Peitsch M.C.       peitsch@ulbio1.unil.ch
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
Glucanases                    Henrissat B.       cermav@frgren81.bitnet
                              Beguin P.          phycel@pasteur.bitnet
G-protein coupled receptors   Chollet A.         chollet@clients.switch.ch
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases                    Roy P.H.           2020000@lavalvx1.bitnet
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch@ulbio1.unil.ch
MAC components / perforin     Peitsch M.C.       peitsch@ulbio1.unil.ch
Myelin proteolipid protein    Hofmann K.O.       khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
Phytochromes                  Partis M.D.        partis@gcri.afrc.ac.uk
Prokaryotic carbohydrate      Reizer J.          jreizer@ucsd.edu
            kinases
Protein kinases               Hanks S.           hanks@vuctrvax.bitnet
                              Hunter T.          hunter@salk.bitnet
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@cshl.org
Ribosomal protein S3          Hallick R.         hallick%biotec@arizona.edu
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        harayama@cmu.unige.ch
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gunnar@cbts.sunet.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@bio.uib.no
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk



   B.2  Requirements to fulfill to become an on-line expert

   An expert should be a scientist working with specific family(ies) of
   proteins (or specific domains) who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      which seem to belong to "their" family(ies) of proteins.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at one of the following electronic mail addresses:

                           bairoch@cgecmu51.bitnet
                             bairoch@cmu.unige.ch



           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

                                                       **********************
                        *********************** <----- * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * -----> **********************
                        *  Sequence Data      *
***************** ----> *  Library            *        **********************
* FLYBASE       *       *********************** <----- * ECD [E. coli map]  *
* [Drosophila   *                ^  |       ^          **********************
* genetic maps] * --------+      |  |       |
***************** <-----+ |      |  |       +--------- **********************
                        | |      |  |       +--------- * TFD [Trans. fact.] *
                        | |      |  |       | +------> **********************
                        | |      |  |       | |
*****************       | v      |  v       v |        **********************
* REBASE        *       ***********************        * ENZYME [Nomencl.]  *
* [Restriction  * <---- *  SWISS-PROT         * <----- **********************
*  enzymes]     *       *  Protein Sequence   *            |
*****************       *  Data Bank          *            v
                        ***********************        **********************
*****************         | ^     |  ^ | |  |          * OMIM   [Diseases]  *
* PROSITE       * <-------+ |     |  | | |  +--------> **********************
* [Patterns]    * ----------+     |  | | |
*****************                 |  | | +-----------> **********************
             |                    |  | |               * E. coli 2D gels    *
             |                    |  | |               **********************
             |                    |  | |
             |                    |  | +-------------> **********************
             |                    v  +---------------- * EcoGene/EcoSeq     *
             |          ***********************        **********************
             +--------> * PDB [3D structures] *
                        ***********************