Swiss-Prot release 34.0

Published October 1, 1996


                    SWISS-PROT RELEASE 34.0 RELEASE NOTES


                               1.  INTRODUCTION

   Release 34.0  of SWISS-PROT contains 59'021 sequence entries, comprising
   21'210'389  amino   acids  abstracted   from  50'052   references.  This
   represents an  increase of 14.5% in the number of amino acids over
   release 33. The growth of the data
   bank is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   2.0        09/86               3939               900 163
   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420
   29.0       06/94              38303            13 464 008
   30.0       10/94              40292            14 147 368
   31.0       02/95              43470            15 335 248
   32.0       11/95              49340            17 385 503
   33.0       02/96              52205            18 531 384
   34.0       10/96              59021            21 210 389
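
   As a quick check of the growth figures, the following sketch recomputes
   the increase from release 33.0 to release 34.0 using the last two rows
   of the table above. The helper name percent_increase is ours, chosen for
   illustration; it is not part of any SWISS-PROT tool.

```python
# Last two rows of the growth table: (number of entries, number of amino acids).
RELEASE_33 = (52205, 18531384)
RELEASE_34 = (59021, 21210389)


def percent_increase(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


entry_growth = percent_increase(RELEASE_33[0], RELEASE_34[0])
residue_growth = percent_increase(RELEASE_33[1], RELEASE_34[1])

print(f"entries:     +{entry_growth:.1f}%")
print(f"amino acids: +{residue_growth:.1f}%")
```

   Under this calculation, the 14.5% figure matches the growth in amino
   acids; the number of entries grew by about 13.1%.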



      2.  DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 33

   2.1  Sequences and annotations

   6'892 sequences  have been  added since release 33, the sequence data of
   1118 existing  entries has  been updated  and the  annotations of 10'629
   entries have been revised.

   2.2  What's happening with the model organisms

   We have  selected a  number of  organisms that  are the target of genome
   sequencing and/or mapping projects and for which we intend to:

   -  Be as  complete as  possible. All sequences available at a given time
      should be  immediately included  in SWISS-PROT.  This  also  includes
      sequence corrections and updates;
   -  Provide a higher level of annotation;
   -  Provide cross-references  to specialized  database(s)  that  contain,
      among other  data, some genetic information about the genes that code
      for these proteins;
   -  Provide specific indices or documents.

   What was  done since  the last  release or  in preparation  for the next
   release concerning model organisms:

   -  We have  added  Mycobacterium  tuberculosis  to  the  list  of  model
      organisms. The  genome  of  this  important  pathogenic bacterium  is
      currently being  sequenced at the Sanger Genome Center in Hinxton. We
      have already annotated 474 putative proteins from M.tuberculosis.

   -  We have  continued our  effort in  catching up  with the  backlog  of
      sequences from eukaryotic model organisms. In particular we added 687
      entries from  yeast, 525  from human,  316  from  S.pombe,  202  from
      C.elegans, 62 from A.thaliana and 92 from Drosophila.

   -  We have  added to SWISS-PROT all the sequences from yeast chromosomes
      VII and XIV. We plan to integrate data from the remaining chromosomes
      (IV, XII, XIII, XV and XVI) very soon so as to have a complete set of
      annotated yeast sequences.

   -  We plan  to finish,  for the  next release,  the annotation  of  the
      Haemophilus influenzae  and Mycoplasma  genitalium  sequence  entries
      which are not yet part of SWISS-PROT.


   Here is the current status of the model organisms:


   Organism         Database               Index file       Number of
                    cross-referenced                        sequences
   --------------   ---------------------  --------------   ---------
   A.thaliana       None yet               In preparation         562
   B.subtilis       SubtiList              SUBTILIS.TXT          1783
   C.albicans       None yet               CALBICAN.TXT           124
   C.elegans        WormPep                CELEGANS.TXT          1208
   D.discoideum     DictyDB                DICTY.TXT              265
   D.melanogaster   FlyBase                In preparation         910
   E.coli           EcoGene                ECOLI.TXT             3606
   H.influenzae     None yet               HAEINFLU.TXT          1591
   H.sapiens        MIM                    MIMTOSP.TXT           4000
   M.genitalium     None yet               In preparation         425
   M.tuberculosis   None yet               None yet               474
   S.cerevisiae     LISTA/SGD              YEAST.TXT             4340
   S.typhimurium    StyGene                SALTY.TXT              617
   S.pombe          None yet               POMBE.TXT              956
   S.solfataricus   None yet               None yet                42


   Collectively the  entries from the above model organisms represent 35.4%
   of all SWISS-PROT entries.
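
   The 35.4% share can be reproduced from the table above. The sketch below
   simply sums the "Number of sequences" column and divides by the total
   number of entries in release 34.0; the variable names are ours.

```python
TOTAL_ENTRIES = 59021  # entries in release 34.0

# Sequence counts copied from the model organism table.
MODEL_ORGANISMS = {
    "A.thaliana": 562,
    "B.subtilis": 1783,
    "C.albicans": 124,
    "C.elegans": 1208,
    "D.discoideum": 265,
    "D.melanogaster": 910,
    "E.coli": 3606,
    "H.influenzae": 1591,
    "H.sapiens": 4000,
    "M.genitalium": 425,
    "M.tuberculosis": 474,
    "S.cerevisiae": 4340,
    "S.typhimurium": 617,
    "S.pombe": 956,
    "S.solfataricus": 42,
}

model_total = sum(MODEL_ORGANISMS.values())
model_share = model_total / TOTAL_ENTRIES * 100.0

print(f"{model_total} model organism entries = {model_share:.1f}% of SWISS-PROT")
```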



   2.3  Change in the GN line

   Starting with  release 34,  we allow  more than  a single  GN line to be
   present in  an entry.  This small change was rendered necessary to allow
   the representation  of all  gene names for a number of protein sequences
   encoded by a multiplicity of genes or for genes with many synonyms.

   Examples:

   GN   (MSP-31 OR R05F9.13) AND (MSP-40 OR C33F10.9) AND (MSP-142 OR
   GN   K05F1.2) AND C34F11.4 AND F58A6.8 AND K07F5.1 AND ZK1248.6.

   GN   (RPL44A OR RPL44 OR SCL41A OR RPL41A OR YNL162W OR N1722) AND
   GN   (RPL44B OR RPL44 OR SCL41B OR RPL41B OR MAK18 OR YHR141C).
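
   Software that reads the GN line now has to join several lines before
   interpreting them. The following is a minimal sketch of such a reader,
   assuming only the syntax visible in the two examples above (groups
   joined by AND, synonyms joined by OR, a final period). The function
   name parse_gn_lines is hypothetical.

```python
def parse_gn_lines(lines):
    """Return a list of gene groups; each group is a list of synonyms."""
    # Drop the two-character 'GN' line code and rejoin continuation lines.
    text = " ".join(line[2:].strip() for line in lines)
    # Only the final period is a terminator; gene names may contain dots.
    if text.endswith("."):
        text = text[:-1]
    groups = []
    for part in text.split(" AND "):
        part = part.strip()
        # Parentheses enclose the synonyms of a single gene.
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        groups.append([name.strip() for name in part.split(" OR ")])
    return groups


example = [
    "GN   (MSP-31 OR R05F9.13) AND (MSP-40 OR C33F10.9) AND (MSP-142 OR",
    "GN   K05F1.2) AND C34F11.4 AND F58A6.8 AND K07F5.1 AND ZK1248.6.",
]

for group in parse_gn_lines(example):
    print(group)
```

   Joining the lines before splitting matters here: in the first example
   the group (MSP-142 OR K05F1.2) is broken across two GN lines.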


   2.4  Changes concerning cross-references (DR line)

   We have  added cross-references  from SWISS-PROT  to the Maize genome 2D
   Electrophoresis database.  These cross-references  are present in the DR
   lines:

   Data bank identifier:  MAIZE-2DPAGE
   Primary identifier:    The protein spot unique identifier [1]
   Secondary identifier:  The tissue of origin [2]
   Example:               MAIZE-2DPAGE; P80607; COLEOPTILE.


   [1]  The Maize-2DPAGE database uses SWISS-PROT primary accession numbers
      as the  alphanumeric designation  of spots  that are linked to SWISS-
      PROT entries.
   [2]  Currently only `COLEOPTILE' is used.


   Small changes  have been  made to  the syntax of cross-references to the
   MIM and REBASE databases:

   o  In DR  lines pointing  to MIM, the secondary identifier, which used to
      be the  release number  of that  database, has been replaced by a '-'
      (dash). This  change became  necessary because MIM is now updated on
      a daily basis and no longer has release numbers.

   o  REBASE  has  recently  introduced  accession  numbers.  We  therefore
      changed the  format of  DR lines  pointing to  this database. The new
      REBASE accession  numbers are  used as  primary identifiers  and  the
      names of the restriction systems as secondary identifiers.

   Examples:

   DR   MIM; 249900; -.
   DR   REBASE; RB0005; ECORI.



                              3.  PLANNED CHANGES

   3.1  Accession numbers

   With the  creation of  the TREMBL database (see section 6) and the rapid
   increase in  the amount of sequence data, we are faced with a problem of
   availability of  accession numbers. Currently we use a system based on a
   one-letter prefix followed by 5 digits. This system was also used by the
   nucleotide sequence  databases, which had originally reserved the prefix
   letters 'P' and 'Q' for SWISS-PROT. The nucleotide databases, having run
   out of  space (due  mainly to ESTs),  have been forced to start using a
   new format based on a two-letter prefix followed by 6 digits.

   We will  soon have used up all possible numbers with 'P' and 'Q' and the
   only letter prefix not used by the nucleotide databases is 'O'. As we
   believe that  changing the format of the accession numbers to that now
   used by  the nucleotide databases would wreak havoc on the numerous
   software packages  using SWISS-PROT, we have decided to keep a system of
   accession numbers  based on a six-character code, but with the following
   planned changes:

   1) As soon  as we  have used  up all  'P' and 'Q' numbers, we will start
      using 'O'.  This extra  letter should  allow the  continuation of the
      present format (1 prefix letter + 5 digits) for at least a year.

   2) Once 'O' has  also been  used up, we will introduce a system
      based on the following format:

       1        2       3          4            5            6
       [O,P,Q]  [0-9]  [A-Z, 0-9]  [A-Z, 0-9]   [A-Z, 0-9]   [0-9]

      What the  above means is that we  will keep a six-character code, but
      that in  positions 3, 4 and 5 of this code any combination of letters
      and numbers  can be present. This format allows a total of 14 million
      accession numbers (up from 300'000 with the current system).

      We only  allow digits  in positions  2 and  6 so that the SWISS-PROT
      accession numbers  cannot be  mistaken for  gene  names,  acronyms,
      other types of accession numbers or ordinary words.

      Examples: P0A3S2, Q2ASD4, O13YX2, P9B123
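
   The planned format translates directly into a regular expression, and
   the quoted capacities follow from counting the choices per position.
   The sketch below is only an illustration of the rules stated above; the
   names NEW_ACCESSION and is_valid_accession are ours.

```python
import re

# Position:   1      2     3-5           6
# Allowed:    O/P/Q  0-9   A-Z or 0-9    0-9
NEW_ACCESSION = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")


def is_valid_accession(text):
    """Return True if text follows the planned six-character format."""
    return NEW_ACCESSION.fullmatch(text) is not None


# Capacity: choices per position multiplied together.
new_capacity = 3 * 10 * 36**3 * 10   # planned format
old_capacity = 3 * 10**5             # one of O, P, Q followed by 5 digits

for accession in ("P0A3S2", "Q2ASD4", "O13YX2", "P9B123", "P80607"):
    print(accession, is_valid_accession(accession))
print(new_capacity, old_capacity)
```

   Accession numbers in the present format (for example P80607) remain
   valid under the planned pattern, while gene names such as RPL44A are
   rejected because of the restrictions on positions 1, 2 and 6.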



   3.2  Introduction of a new CC line-type topic (DATABASE)

   There is  an increasing  number of databases that cater for a specific
   protein or  for  a very  limited number  of proteins.  Most  of  these
   databases are  mutation databases, reporting defects linked to a genetic
   disease. We  want to  add cross-references  to these databases when they
   are available  electronically, either by WWW or by FTP. We are therefore
   adding, in  the next  release, a  new comments  (CC) line-type  'topic',
   "DATABASE", whose syntax will be the following:

   CC   -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][;
            FTP="Address"].

   Where:

   NAME is the name of the database;
   NOTE is an optional free text note;
   WWW  is the WWW address (URL) of the database;
   FTP  is the  anonymous FTP  address (including the directory name) where
        the database file(s) are stored.


   Examples of its usage:

   CC   -!- DATABASE: NAME=CD40LBASE;
   CC       WWW="HTTP://www.expasy.ch/www/cd40lbase.html";
   CC       FTP="www.expasy.ch/databases/cd40lbase".

   CC   -!- DATABASE: NAME=HAEMB; NOTE=HAEMOPHILIA B DATABASE;
   CC       FTP="ftp.ebi.ac.uk/pub/databases/haemb/".

   Please note  that this  is the  first part  of SWISS-PROT to allow lower
   case characters (yes, we plan to go to mixed cases soon !).
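
   A reader for the planned DATABASE topic could look like the sketch
   below. It assumes only the syntax given above (fields separated by
   semicolons, optional double quotes around addresses, a final period)
   and that field values contain no semicolons. The function name
   parse_database_comment is hypothetical.

```python
def parse_database_comment(lines):
    """Return the fields of a 'CC   -!- DATABASE:' comment as a dict."""
    # Drop the two-character 'CC' line code and rejoin continuation lines.
    text = " ".join(line[2:].strip() for line in lines)
    prefix = "-!- DATABASE:"
    if not text.startswith(prefix):
        raise ValueError("not a DATABASE comment")
    text = text[len(prefix):].strip()
    # Remove the terminating period only.
    if text.endswith("."):
        text = text[:-1]
    fields = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split at the first '=' so that addresses are kept intact.
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


cd40l = [
    'CC   -!- DATABASE: NAME=CD40LBASE;',
    'CC       WWW="HTTP://www.expasy.ch/www/cd40lbase.html";',
    'CC       FTP="www.expasy.ch/databases/cd40lbase".',
]
print(parse_database_comment(cd40l))
```

   The values are returned exactly as written, so programs that convert
   entries to upper case should leave the WWW and FTP fields untouched.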



                    4.  STATUS OF THE DOCUMENTATION FILES

   SWISS-PROT is  distributed with  a large  number of documentation files.
   Some of  these files  have been  available for  a long  time  (the  user
   manual, release  notes, the  various  indices  for  authors,  citations,
   keywords, etc.),  but  many  have  been  created  recently  and  we  are
   continuously adding  new files.  Since release  33, we  have added 6 new
   document files.  The following  table lists all the  documents that  are
   either currently  available or  that we  plan to  add in  the  next  few
   months.

   USERMAN .TXT   User manual
   RELNOTES.TXT   Release notes
   SHORTDES.TXT   Short description of entries in SWISS-PROT

   JOURLIST.TXT   List of abbreviations for journals cited
   KEYWLIST.TXT   List of keywords in use
   SPECLIST.TXT   List of organism identification codes
   TISSLIST.TXT   List of tissues (in RC line) [1]
   EXPERTS .TXT   List of on-line experts for PROSITE and SWISS-PROT
   SUBMIT  .TXT   Submission of sequence data to SWISS-PROT

   ACINDEX .TXT   Accession number index
   AUTINDEX.TXT   Author index
   CITINDEX.TXT   Citation index
   KEYINDEX.TXT   Keyword index
   SPEINDEX.TXT   Species index

   7TMRLIST.TXT   List of 7-transmembrane G-linked receptors entries
   AATRNASY.TXT   List of aminoacyl-tRNA synthetases
   ALLERGEN.TXT   Nomenclature and index of allergen sequences
   CALBICAN.TXT   Index of Candida albicans entries and their corresponding
                  gene designations
   CDLIST  .TXT   CD nomenclature for surface proteins of human leucocytes
   CELEGANS.TXT   Index  of   Caenorhabditis  elegans   entries  and  their
                  corresponding gene designations and WormPep cross-
                  references
   DICTY   .TXT   Index  of  Dictyostelium  discoideum  entries  and  their
                  corresponding gene designations and DictyDB cross-
                  references
   EC2DTOSP.TXT   Index of  Escherichia coli  Gene-protein database entries
                  referenced in SWISS-PROT
   ECOLI   .TXT   Index of  Escherichia coli  K12 chromosomal  entries  and
                  their corresponding EcoGene cross-references
   EMBLTOSP.TXT   Index of EMBL Database entries referenced in SWISS-PROT
   EXTRADOM.TXT   Nomenclature of extracellular domains
   GLYCOSID.TXT   Classification of  glycosyl hydrolases families and index
                  of glycosyl hydrolase entries
   HAEINFLU.TXT   Index of Haemophilus influenzae RD chromosomal entries
   HOXLIST .TXT   Vertebrate homeotic Hox proteins: nomenclature and index
   HUMCHR20.TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome 20 [1]
   HUMCHR21.TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome 21
   HUMCHR22.TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome 22
   HUMCHRX .TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome X [1]
   HUMCHRY .TXT   Index  of  protein  sequence  entries  encoded  on  human
                  chromosome Y
   MIMTOSP .TXT   Index of MIM entries referenced in SWISS-PROT
   MYGENIT .TXT   Index of Mycoplasma genitalium chromosomal entries [2]
   NOMLIST .TXT   List of nomenclature related references for proteins
   PDBTOSP .TXT   Index of  X-ray crystallography  Protein Data  Bank (PDB)
                  entries referenced in SWISS-PROT
   PEPTIDAS.TXT   Classification  of   peptidase  families   and  index  of
                  peptidase entries
   PLASTID .TXT   List of chloroplast and cyanelle encoded proteins
   POMBE   .TXT   Index of  Schizosaccharomyces pombe entries in SWISS-PROT
                  and their corresponding gene designations
   RESTRIC .TXT   List of restriction enzyme and methylase entries
   RIBOSOMP.TXT   Index of ribosomal proteins classified by families on the
                  basis of sequence similarities [1]
   SALTY   .TXT   Index of  Salmonella typhimurium  LT2 chromosomal entries
                  and their corresponding StyGene cross-references
   SUBTILIS.TXT   Index of  Bacillus subtilis  168 chromosomal  entries and
                  their corresponding SubtiList cross-references
   YEAST   .TXT   Index  of  Saccharomyces  cerevisiae  entries  and  their
                  corresponding gene designations
   YEAST1  .TXT   Yeast Chromosome I entries
   YEAST2  .TXT   Yeast Chromosome II entries
   YEAST3  .TXT   Yeast Chromosome III entries
   YEAST5  .TXT   Yeast Chromosome V entries
   YEAST6  .TXT   Yeast Chromosome VI entries
   YEAST7  .TXT   Yeast Chromosome VII entries [1]
   YEAST8  .TXT   Yeast Chromosome VIII entries
   YEAST9  .TXT   Yeast Chromosome IX entries
   YEAST10 .TXT   Yeast Chromosome X entries
   YEAST11 .TXT   Yeast Chromosome XI entries
   YEAST13 .TXT   Yeast Chromosome XIII entries [2]
   YEAST14 .TXT   Yeast Chromosome XIV entries [1]

   Notes:

   [1]  New in release 34.
   [2]  Will be available starting with release 35 of February 1997.


   We have  continued to  include in  some SWISS-PROT  document  files  the
   references of  World-Wide  Web  sites  relevant  to  the  subject  under
   consideration. There are now 12 documents that include such links.



                     5.  THE EXPASY WORLD-WIDE WEB SERVER

   5.1  Background information

   The most  efficient and  user-friendly way  to browse  interactively  in
   SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and other databases  is to use
   the World-Wide  Web (WWW)  molecular biology  server ExPASy.  WWW  is  a
   global information  retrieval system  merging the  power  of  world-wide
   networks, hypertext  and multimedia.  Through hypertext  links, it gives
   access to  documents and  information available  on thousands of servers
   around the  world. To  access a  WWW server  one needs  a  WWW  browser.
   Currently, the  most popular  browser  is  Netscape  Navigator(TM)  from
   Netscape Communications Corp. (available from ftp.netscape.com). Using a
   WWW browser, one has access to all the hypertext documents stored on the
   ExPASy server as well as many other WWW servers.

   The ExPASy server was made available to the public in September 1993. In
   October 1996  a cumulative  total of 8 million connections was reached.
   It may  be accessed  through its  Uniform Resource  Locator (URL  -  the
   addressing system defined in WWW), which is:

        http://www.expasy.ch/

   The ExPASy  WWW server  allows access, using the user-friendly hypertext
   model, to  the SWISS-PROT,  PROSITE, ENZYME, SWISS-2DPAGE, SWISS-3DIMAGE
   and CD40Lbase  databases and,  through any  SWISS-PROT protein  sequence
   entry, to  other databases  such as  EMBL, Eco2DBASE,  EcoCyc,  FlyBase,
   GCRDb, MaizeDB,  SubtiList/NRSub, OMIM,  PDB, HSSP, ProDom, REBASE, SGD,
   YEPD and  Medline. Using  a browser  that is able to display images, one
   can also  remotely access  2D gel  image data from SWISS-2DPAGE. ExPASy
   also offers  many tools  for the  analysis of  protein sequences  and 2D
   gels.

   For more  information on  the  ExPASy  WWW  server,  you  can  read  the
   following article:

      Appel R.D., Bairoch A., Hochstrasser D.F.
      A new  generation of  information retrieval tools for biologists: the
      example of the ExPASy WWW server.
      Trends Biochem. Sci. 19:258-260(1994).

   Or you can contact Dr. Ron Appel:

      Email: ron.appel@dim.hcuge.ch


   5.2  SWISS-SHOP

   Thanks to the work of Manuel Peitsch from the Geneva Glaxo Institute for
   Molecular Biology,  we can  provide, on ExPASy, a  service called SWISS-
   SHOP. SWISS-SHOP  allows  any  user  of  SWISS-PROT  to  indicate  which
   proteins he/she  is interested  in.  This  can  be  done  using  various
   criteria that can be combined:

   -  By entering  one  or  more  words  that  should  be  present  in  the
      description line;
   -  By entering one or more species name(s) or taxonomic division(s);
   -  By entering one or more keywords;
   -  By entering one or more author names;
   -  By entering the accession number (or entry name) of a PROSITE pattern
      or a user-defined sequence pattern;
   -  By entering  the accession  number (or  entry name)  of  an  existing
      SWISS-PROT entry or by entering a "private" sequence.

   Every week,  the new  sequences entered  in SWISS-PROT are automatically
   compared with all the criteria that have been defined by the users. If a
   sequence corresponds  to the  selection criteria defined by a user, that
   sequence is sent to that user by electronic mail.


   5.3  What is new on ExPASy

   Since  the   last  release,  there  has  been  a  large  number  of  new
   developments on the ExPASy WWW server. Here are some highlights of these
   changes:

   -  CD40Lbase, The  European CD40L  Defect Database  prepared  by  Manuel
      Peitsch, has  been made accessible through the ExPASy WWW server. The
      purpose of  CD40Lbase is  to collect  clinical and  molecular data on
      CD40 ligand defects leading to X-linked Hyper-IgM syndrome.

   -  Two new tools are available from the "Tools" page:

      PeptideMass: this  program is  designed to  calculate the theoretical
      masses of peptides generated by the chemical or enzymatic cleavage of
      proteins,  to   assist  in   the  interpretation   of  peptide   mass
      fingerprinting and  peptide mapping  experiments.  When  proteins  of
      interest are  specified from  SWISS-PROT, the  program considers  all
      annotations for that protein in the database, and uses these in order
      to generate  the correct peptide masses and warn users about peptides
      that are  not likely  to  be  found  when  undertaking  peptide  mass
      fingerprinting. Many  protein post-translational  modifications which
      affect the masses of peptides can thus be taken into consideration.

      TagIdent: this is a protein identification tool which improves on and
      speeds up the tool previously known as 'GuessProt'. The user can now
      identify  proteins  from  2-D  gels  by  giving  protein  pI  and  MW
      estimates, a  species or  organism classification  of interest, and a
      short sequence  tag of  up to  6 amino acids. This tag can be derived
      from the  N-terminus, the  C-terminus or  from internal peptides of a
      protein. The  results are  now sent  to the  user by e-mail, allowing
      many searches to be done at the same time.

   -  In PROSITE  and Enzyme,  we have  added the  possibility to  save all
      referenced SWISS-PROT entries to a user-defined file on our anonymous
      FTP server "outgoing" directory.

   -  At the  end of  each page displaying a SWISS-PROT entry we have added
      links to  some of our sequence analysis tools so as to allow users to
      directly submit the displayed sequence to these tools.

   -  An email  option has  been added to the tool ScanProsite. If you want
      to scan  a pattern  against SWISS-PROT,  you now  have the  option of
      having the  results of  your query  sent by  email, which should avoid
      the previously frequent timeout problems and is particularly useful
      for complex patterns.

   -  WWW links  have  been  implemented  between  SWISS-PROT  entries  and
      nucleotide entries from DDBJ, the DNA Data Bank of Japan (in addition
      to the  existing links  to EMBL  at EBI and GenBank at NCBI). We have
      also added  direct WWW  links to:  SubtiList, the  Bacillus  subtilis
      genomic database (http://www.pasteur.fr/Bio/SubtiList.html); YPD, the
      Yeast Protein  Database (http://quest7.proteome.com/YPDhome.html) and
      ECO2DBASE,     the      Escherichia     coli      2DPAGE     database
      (http://pcsf.brcf.med.umich.edu/eco2dbase).

   -  Links have  been established  from most  feature (FT) lines of SWISS-
      PROT entries  to  pages  that  highlight the subsequence in question,
      both in 1- and in 3-letter amino acid codes.

   -  2D Hunt,  a database  created and  continuously updated by the Marvin
      robot (http://www.hon.ch/MedHunt/Marvin.html), contains sites related
      to electrophoresis and  more specifically  to 2-D electrophoresis. It
      is accessible from the SWISS-2DPAGE top page of ExPASy.

   -  We have  continued to build a list of biomolecular servers. This list
      is available on the ExPASy top page or directly from:

                http://www.expasy.ch/www/amos_www_links.html

   -  Many other changes have been made to all parts of the server.


                   6.  TREMBL - A SUPPLEMENT TO SWISS-PROT

   The ongoing  genome sequencing  and mapping  projects have  dramatically
   increased the number of protein sequences to be incorporated into SWISS-
   PROT. Since we do not want to dilute the quality standards of SWISS-PROT
   by incorporating  sequences into  the database  without proper  sequence
   analysis and  annotation, we  cannot speed  up the  incorporation of new
   incoming data  indefinitely. But  as we  also want to make the sequences
   available as  fast as  possible, we  have introduced  with SWISS-PROT  a
   computer-annotated supplement. This  supplement consists  of entries in
   SWISS-PROT-like format  derived  from  the  translation  of  all  coding
   sequences (CDS)  in the  EMBL nucleotide sequence database, except those
   already  included   in  SWISS-PROT.   We  name  this  supplement  TREMBL
   (TRanslation from  EMBL). It  can be considered as a preliminary section
   of SWISS-PROT.  TREMBL is split into two main sections, SP-TREMBL and
   REM-TREMBL:

   SP-TREMBL (SWISS-PROT TREMBL) contains the entries (86'040) which should
   be incorporated  into SWISS-PROT. SWISS-PROT accession numbers have been
   assigned for all SP-TREMBL entries.

   REM-TREMBL (REMaining  TREMBL) contains  the entries (19'255) that we do
   not want  to include  in SWISS-PROT  for a variety of reasons (synthetic
   sequences, pseudogenes,  translations of  incorrect open reading frames,
   fragments with  fewer than  eight amino acids, patent-derived sequences,
   immunoglobulins and T-cell receptors, etc.).

   TREMBL is  available by  FTP from  the EBI server (ftp.ebi.ac.uk) in the
   directory '/pub/databases/trembl'.  It can  be queried on WWW by the EBI
   SRS server (http://www.ebi.ac.uk/srs/srsc). It  is also available on the
   SWISS-PROT CD-ROM and is searchable on the FASTA and BLITZ email servers
   of the EBI.



                       7.  WEEKLY UPDATES OF SWISS-PROT

   Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
   are updated at each update:

   new_seq.dat    Contains all the new entries since the last full release;
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release;
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        www.expasy.ch
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov
   Directory      /repository/swiss-prot/updates

   Organization   European Bioinformatics Institute (EBI)
   Address        ftp.ebi.ac.uk
   Directory      /pub/databases/swissprot/new

   Organization   Bioinformatics Unit, Weizmann Institute of Science (WIS)
   Address        bioinformatics.weizmann.ac.il
   Directory      /pub/databases/swiss-prot/updates
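
   For scripted retrieval, the server addresses, directories and file names
   listed above can be combined into locations as sketched below. The
   ftp:// URL form and the assumption that the three files sit directly in
   the listed directories are ours; the sketch only builds strings and
   does not contact any server.

```python
UPDATE_FILES = ["new_seq.dat", "upd_seq.dat", "upd_ann.dat"]

# (address, directory) pairs copied from the list of servers above.
SERVERS = [
    ("www.expasy.ch", "/databases/swiss-prot/updates"),
    ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
    ("ftp.ebi.ac.uk", "/pub/databases/swissprot/new"),
    ("bioinformatics.weizmann.ac.il", "/pub/databases/swiss-prot/updates"),
]


def update_urls():
    """Return one candidate FTP URL per server and update file."""
    return [
        f"ftp://{host}{directory}/{name}"
        for host, directory in SERVERS
        for name in UPDATE_FILES
    ]


for url in update_urls():
    print(url)
```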


   !!! Important notes !!!

   Although we  try to  follow a  regular schedule,  we do  not promise  to
   update these  files every  week. In some cases two weeks will elapse
   between two updates.

   Due to  the current mechanism used to build a release, the entries that
   are provided in these updates are not guaranteed to be error-free.



                            8.  ENZYME AND PROSITE

   8.1  The ENZYME data bank

   Release 21.0  of the  ENZYME data bank is distributed with release 34 of
   SWISS-PROT. ENZYME  release 21.0  contains information  relative to 3646
   enzymes.

   8.2  The PROSITE data bank

   Release 13.2  of the PROSITE data bank is distributed with release 34 of
   SWISS-PROT. This  release of  PROSITE contains 889 documentation entries
   that describe  1'167 different  patterns, rules  and  profiles/matrices.
   Release 13.2  does not  really represent a new release; the only changes
   between releases  13.0 and  13.2 are  updates to  the pointers  to  the
   SWISS-PROT entries whose names were modified between releases 32 and
   34. The  next release of PROSITE (14.0) will be distributed with release
   35 of SWISS-PROT.



                           9.  WE NEED YOUR HELP !

   We welcome  feedback from our users. We would especially appreciate being
   notified  if you  find  that sequences  belonging to  your field of
   expertise are  missing from  the data  bank. We  would also  like to  be
   notified about  annotations to be updated, if, for example, the function
   of a protein has been clarified or if new post-translational information
   has become available.


   ========================================================================


                         APPENDIX A: SOME STATISTICS


   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.55   Gln (Q) 4.02   Leu (L) 9.33   Ser (S) 7.22
   Arg (R) 5.15   Glu (E) 6.32   Lys (K) 5.93   Thr (T) 5.74
   Asn (N) 4.52   Gly (G) 6.84   Met (M) 2.35   Trp (W) 1.25
   Asp (D) 5.30   His (H) 2.24   Phe (F) 4.07   Tyr (Y) 3.19
   Cys (C) 1.69   Ile (I) 5.72   Pro (P) 4.92   Val (V) 6.52

   Asx (B) 0.001  Glx (Z) 0.001  Xaa (X) 0.01


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
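
   The ranking in A.1.2 follows from sorting the percentages of A.1.1 in
   decreasing order, as the sketch below shows. Only the twenty standard
   amino acids are included; the ambiguity codes (Asx, Glx, Xaa) are left
   out. The variable names are ours.

```python
# Percentages copied from table A.1.1.
COMPOSITION = {
    "Ala": 7.55, "Arg": 5.15, "Asn": 4.52, "Asp": 5.30, "Cys": 1.69,
    "Gln": 4.02, "Glu": 6.32, "Gly": 6.84, "His": 2.24, "Ile": 5.72,
    "Leu": 9.33, "Lys": 5.93, "Met": 2.35, "Phe": 4.07, "Pro": 4.92,
    "Ser": 7.22, "Thr": 5.74, "Trp": 1.25, "Tyr": 3.19, "Val": 6.52,
}

# Most frequent amino acid first.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
total_percent = sum(COMPOSITION.values())

print(", ".join(ranking))
print(f"sum of the 20 standard amino acids: {total_percent:.2f}%")
```

   The twenty percentages sum to slightly less than 100; the remainder is
   accounted for by the ambiguity codes and by rounding.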



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 5389

   The first twenty species represent 28511 sequences: 48.3 % of the total
   number of entries.


   A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 2447
                            2x:  844
                            3x:  477
                            4x:  299
                            5x:  220
                            6x:  199
                            7x:  131
                            8x:   98
                            9x:  116
                           10x:   51
                       11- 20x:  229
                       21- 50x:  162
                       51-100x:   54
                         >100x:   62
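
   The rows of table A.2.1 can be checked against the total number of
   species given at the start of A.2. The sketch below sums the table and
   also derives the share of species represented by a single sequence; the
   variable names are ours.

```python
# (occurrence class, number of species), copied from table A.2.1.
SPECIES_FREQUENCY = [
    ("1x", 2447), ("2x", 844), ("3x", 477), ("4x", 299), ("5x", 220),
    ("6x", 199), ("7x", 131), ("8x", 98), ("9x", 116), ("10x", 51),
    ("11-20x", 229), ("21-50x", 162), ("51-100x", 54), (">100x", 62),
]

species_total = sum(count for _, count in SPECIES_FREQUENCY)
singleton_share = SPECIES_FREQUENCY[0][1] / species_total * 100.0

print(f"species in table: {species_total}")
print(f"species with a single sequence: {singleton_share:.1f}%")
```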


   A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        4340          Baker's yeast (Saccharomyces cerevisiae)
         2        4000          Human
         3        3606          Escherichia coli
         4        2429          Mouse
         5        2121          Rat
         6        1783          Bacillus subtilis
         7        1591          Haemophilus influenzae
         8        1208          Caenorhabditis elegans
         9         956          Fission yeast (Schizosaccharomyces pombe)
        10         910          Fruit fly (Drosophila melanogaster)
        11         899          Bovine
        12         709          Chicken
        13         617          Salmonella typhimurium
        14         582          African clawed frog (Xenopus laevis)
        15         562          Arabidopsis thaliana (Mouse-ear cress)
        16         502          Rabbit
        17         474          Mycobacterium tuberculosis
        18         446          Pig
        19         425          Mycoplasma genitalium
        20         381          Maize
        21         276          Rice
        22         275          Bacteriophage T4
        23         265          Slime mold (Dictyostelium discoideum)
        24         262          Pseudomonas aeruginosa
        25         253          Vaccinia virus (strain Copenhagen)
        26         229          Tobacco
        27         219          Porphyra purpurea
        28         217          Pea
        29         207          Dog
        30         193          Wheat
                   193          Human cytomegalovirus (strain AD169)
        32         190          Barley
        33         186          Staphylococcus aureus
                   186          Soybean
        35         184          Vaccinia virus (strain WR)
        36         183          Sheep
        37         173          Neurospora crassa
        38         172          Pseudomonas putida
        39         171          Rhodobacter capsulatus
        40         169          Mycobacterium leprae
        41         161          Potato
        42         157          Synechocystis sp. (strain PCC 6803)
                   157          Klebsiella pneumoniae
        44         154          Tomato
                   154          Bacillus stearothermophilus
                   154          Autographa californica nuclear polyhedrosis virus
        47         150          Marchantia polymorpha (Liverwort)
        48         148          Spinach
        49         146          Variola virus
        50         142          Cyanophora paradoxa
        51         139          Agrobacterium tumefaciens
        52         138          Odontella sinensis
        53         137          Rhizobium meliloti
        54         127          Lactococcus lactis (subsp. lactis)
        55         125          Chlamydomonas reinhardtii
        56         124          Candida albicans
        57         121          Guinea pig
        58         116          Streptomyces coelicolor
        59         109          Aspergillus nidulans
        60         108          Horse
        61         107          Trypanosoma brucei brucei
        62         101          Anabaena sp. (strain PCC 7120)



   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    2831             1001-1100      534
                 51- 100    5243             1101-1200      405
                101- 150    7359             1201-1300      290
                151- 200    5678             1301-1400      186
                201- 250    5207             1401-1500      165
                251- 300    4745             1501-1600       99
                301- 350    4445             1601-1700       84
                351- 400    4533             1701-1800       70
                401- 450    3420             1801-1900       80
                451- 500    3320             1901-2000       47
                501- 550    2455             2001-2100       30
                551- 600    1735             2101-2200       53
                601- 650    1292             2201-2300       63
                651- 700     971             2301-2400       27
                701- 750     877             2401-2500       34
                751- 800     721             >2500          176
                801- 850     544
                851- 900     570
                901- 950     391
                951-1000     341
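
   The size classes above can be summed to check them against the 59021
   entries stated in the introduction, and to see how much of the data bank
   consists of short sequences. A sketch in Python, with the counts copied
   from the table in reading order; the list names are illustrative only.

```python
# Entries per 50-residue class for lengths 1-1000 (left column of A.3).
bins_1_to_1000 = [2831, 5243, 7359, 5678, 5207, 4745, 4445, 4533, 3420, 3320,
                  2455, 1735, 1292, 971, 877, 721, 544, 570, 391, 341]

# Entries per 100-residue class for lengths 1001-2500 (right column of A.3).
bins_1001_to_2500 = [534, 405, 290, 186, 165, 99, 84, 70, 80, 47,
                     30, 53, 63, 27, 34]

# Entries longer than 2500 residues.
over_2500 = 176

total = sum(bins_1_to_1000) + sum(bins_1001_to_2500) + over_2500

# The first ten 50-residue classes cover lengths 1-500.
up_to_500 = sum(bins_1_to_1000[:10])

print(total)       # prints 59021
print(up_to_500)   # prints 46781
print(f"{100.0 * up_to_500 / total:.1f} % of entries have at most 500 residues")
```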



   A.4  Longest sequences

   The longest sequences (>=4000 residues) are listed here:

                               HTS1_COCCA  5217
                               FAT_DROME   5147
                               RYNR_RABIT  5037
                               RYNR_PIG    5035
                               RYNR_HUMAN  5032
                               RYNC_RABIT  4969
                               LRP_CAEEL   4753
                               DYHC_DICDI  4725
                               PLEC_RAT    4687
                               LRP2_RAT    4660
                               DYHC_RAT    4644
                               DYHC_DROME  4639
                               APB_HUMAN   4563
                               APOA_HUMAN  4548
                               LRP1_HUMAN  4544
                               LRP1_CHICK  4543
                               RRPA_CVMJH  4488
                               DYHC_ANTCR  4466
                               DYHC_TRIGR  4466
                               GRSB_BACBR  4451
                               PKSK_BACSU  4447
                               PKSL_BACSU  4427
                               PGBM_HUMAN  4393
                               YP73_CAEEL  4385
                               DYHC_NEUCR  4367
                               DYHC_EMENI  4344
                               PKD1_HUMAN  4303
                               DYHC_YEAST  4092
                               RRPA_CVH22  4085
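
   The entry names in this list can be split at the underscore into a
   protein mnemonic and a species code. The Python sketch below groups the
   list by each part; it assumes that every name has the form
   <protein>_<species> with a single underscore, which holds for the 29
   names shown here. The variable names are illustrative only.

```python
from collections import Counter

# Entry names copied from the list in A.4, in the same order.
LONGEST = """
HTS1_COCCA FAT_DROME RYNR_RABIT RYNR_PIG RYNR_HUMAN RYNC_RABIT LRP_CAEEL
DYHC_DICDI PLEC_RAT LRP2_RAT DYHC_RAT DYHC_DROME APB_HUMAN APOA_HUMAN
LRP1_HUMAN LRP1_CHICK RRPA_CVMJH DYHC_ANTCR DYHC_TRIGR GRSB_BACBR
PKSK_BACSU PKSL_BACSU PGBM_HUMAN YP73_CAEEL DYHC_NEUCR DYHC_EMENI
PKD1_HUMAN DYHC_YEAST RRPA_CVH22
""".split()

# Assumption: each name is "<protein>_<species>" with one underscore.
pairs = [name.split("_", 1) for name in LONGEST]

# Count how often each species code and each protein mnemonic occurs.
by_species = Counter(species for _, species in pairs)
by_protein = Counter(protein for protein, _ in pairs)

print(len(LONGEST))            # prints 29
print(by_species["HUMAN"])     # prints 6
print(by_protein["DYHC"])      # prints 8
```

   In this list the species code HUMAN occurs six times and the protein
   mnemonic DYHC eight times.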


   A.5  Statistics for journal citations


   Total number of journals cited in this release of SWISS-PROT: 776


   A.5.1 Table of the frequency of journal citations

        Journals cited 1x: 295 
                       2x:  97 
                       3x:  64 
                       4x:  31 
                       5x:  29 
                       6x:  23 
                       7x:   9 
                       8x:   8 
                       9x:  13 
                      10x:  10 
                  11- 20x:  68 
                  21- 50x:  42 
                  51-100x:  23 
                    >100x:  64 


   A.5.2  List of the most cited journals in SWISS-PROT

   Citations          Journal abbreviation
   ---------          ----------------------------------
   5458               J. BIOL. CHEM.
   3394               PROC. NATL. ACAD. SCI. U.S.A.
   3266               NUCLEIC ACIDS RES.
   2322               J. BACTERIOL.
   2059               GENE
   1825               FEBS LETT.
   1713               EUR. J. BIOCHEM.
   1540               EMBO J.
   1526               BIOCHEM. BIOPHYS. RES. COMMUN.
   1425               NATURE
   1384               BIOCHEMISTRY
   1235               BIOCHIM. BIOPHYS. ACTA
   1090               J. MOL. BIOL.
   1069               CELL
   1043               MOL. CELL. BIOL.
    860               MOL. GEN. GENET.
    834               PLANT MOL. BIOL.
    768               BIOCHEM. J.
    736               VIROLOGY
    677               SCIENCE
    645               MOL. MICROBIOL.
    613               J. BIOCHEM.
    535               GENOMICS
    486               J. VIROL.
    423               J. GEN. VIROL.
    378               J. CELL BIOL.
    370               PLANT PHYSIOL.
    349               YEAST
    341               GENES DEV.
    288               CURR. GENET.
    286               HUM. MOL. GENET.
    282               J. IMMUNOL.
    267               ARCH. BIOCHEM. BIOPHYS.
    259               BIOL. CHEM. HOPPE-SEYLER
    256               INFECT. IMMUN.
    252               MOL. BIOCHEM. PARASITOL.
    231               ONCOGENE
    214               MOL. ENDOCRINOL.
    213               HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
    208               J. GEN. MICROBIOL.
    201               AM. J. HUM. GENET.
    198               FEMS MICROBIOL. LETT.
    195               J. CLIN. INVEST.
    179               DEVELOPMENT
    165               NAT. GENET.
    164               J. MOL. EVOL.
    160               GENETICS
    151               DNA
    150               HUM. MUTAT.
    148               J. EXP. MED.
    143               BLOOD
    140               DNA CELL BIOL.
    138               HUM. GENET.
    129               NEURON
    128               DEV. BIOL.
    123               APPL. ENVIRON. MICROBIOL.
    114               PLANT CELL
    109               IMMUNOGENETICS
    109               HEMOGLOBIN
    105               AGRIC. BIOL. CHEM.
    103               DNA SEQ.
    101               BIOCHIMIE
    101               BIOORG. KHIM.
    101               ENDOCRINOLOGY

   ========================================================================

           APPENDIX B: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:


                         ***********************
******************       *  EMBL Nucleotide    *       **********************
* EPD [Euk.Prom] * <---> *  Sequence Database  * <---- * ECDC [E.coli map]  *
******************       *       [EBI]         *       **********************
                         ***********************
                          ^  ^ ^  ^  ^ ^ ^  ^
******************        |  | |  I  | | |  |
* FlyBase        * <------+  | |  I  | | |  |          **********************
* [D.melanogas.] *        |  | |  I  | | |  +--------> * GCRDb [7TM recep.] *
******************        |  | |  I  | | |  |          **********************
                          |  | |  I  | | |  |
******************        |  | |  I  | | |  |          **********************
* SubtiList      * <---------+ |  I  | | +-----------> * EcoGene [E.coli]   *
* [B.subtilis]   *        |  | |  I  | | |  |          **********************
******************        |  | |  I  | | |  |
                          |  | |  I  | | |  |          **********************
******************        |  | |  I  +---------------> * SGD [Yeast]        *
* MaizeDb        * <-----------+  I  | | |  |          **********************
* [Zea mays]     *        |  | |  I  | | |  |
******************        |  | |  I  | | |  |          **********************
                          |  | |  I  | +-------------> * DictyDB [D.disco.] *
******************        |  | |  I  | | |  |          **********************
* WormPep        *        |  | |  I  | | |  |
* [C.elegans]    * <----+ |  | |  I  | | |  |          **********************
******************      | |  | |  I  | | |  | +------  * ENZYME [Nomencl.]  *
                        | |  | |  I  | | |  | |        **********************
******************      | v  v v  v  v v v  v v            v
* REBASE         *      ***********************        **********************
* [Restriction   * <--- *  SWISS-PROT         * -----> * OMIM [Human]       *
*  enzymes]      *      *  Protein Sequence   *        **********************
******************      *  Data Bank          *            
                        ***********************        **********************
******************      ^ ^ ^ ^ ^ ^ ^ | ^ ^ ^          * ECO2DBASE     [2D] *
* StyGene        *      | | | | | | | | | | +--------> **********************
* [S.typhimurium]* <----+ | | | | | | | | |
******************        | | | | | | | | |            **********************
                          | | | | | | | | +----------> * Maize-2DPAGE  [2D] *
******************        | | | | | | | |              **********************
* Transfac       * <------+ | | | | | | |
******************          | | | | | | |              **********************
                            | | | | | | +------------> * SWISS-2DPAGE  [2D] *
******************          | | | | | |                **********************
* Harefield [2D] * <--------+ | | | | |
******************            | | | | |                **********************
                              | | | | +--------------> * Aarhus/Ghent  [2D] *
******************            | | | |                  **********************
* PROSITE        *            | | | |
* [Patterns and  * <----------+ | | +----------------> **********************
* profiles]      *              | |                    * YEPD [Yeast]  [2D] *
******************              | +----------------+   **********************
             |                  v                  |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP [3D similar.] *
                        ***********************        **********************

   =End=of=SWISS-PROT=release=34=notes=====================================