
Swiss-Prot release 36.0

Published July 1, 1998

                   SWISS-PROT RELEASE 36.0 RELEASE NOTES

 !! Important: do not forget to read section 11 of these release notes. It
 contains an important announcement relevant to SWISS-PROT and PROSITE !!


                   1.  INTRODUCTION


 Release 36.0 of  SWISS-PROT contains 74'019  sequence entries,  comprising
 26'840'295 amino acids abstracted from 59'911 references.  This represents
 an increase  of  7% over  release  35. The  growth  of the  data  bank  is
 summarized below.

 Release      Date           Number of       Number of amino
                               entries                 acids
    2.0       09/86               3939               900 163
    3.0       11/86               4160               969 641
    4.0       04/87               4387             1 036 010
    5.0       09/87               5205             1 327 683
    6.0       01/88               6102             1 653 982
    7.0       04/88               6821             1 885 771
    8.0       08/88               7724             2 224 465
    9.0       11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420
   29.0       06/94              38303            13 464 008
   30.0       10/94              40292            14 147 368
   31.0       02/95              43470            15 335 248
   32.0       11/95              49340            17 385 503
   33.0       02/96              52205            18 531 384
   34.0       10/96              59021            21 210 389
   35.0       11/97              69113            25 083 768
   36.0       07/98              74019            26 840 295
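
 The 7% growth quoted above can be checked against the last two rows of the
 table. A minimal Python sketch; the RELEASES dictionary and the
 percent_increase helper are illustrative names, not part of any SWISS-PROT
 tool:

```python
# Entries and amino acids for the last two releases, copied from the table above.
RELEASES = {
    "35.0": (69113, 25083768),
    "36.0": (74019, 26840295),
}


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


old_entries, old_residues = RELEASES["35.0"]
new_entries, new_residues = RELEASES["36.0"]
entry_growth = percent_increase(old_entries, new_entries)
residue_growth = percent_increase(old_residues, new_residues)
print(f"entries: +{entry_growth:.1f}%, amino acids: +{residue_growth:.1f}%")
```

 This prints a growth of 7.1% in entries and 7.0% in amino acids, both
 consistent with the rounded figure.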



     2.  DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 35

 2.1  Sequences and annotations

 4'976 sequences have been added since release 35, the sequence data of 712
 existing entries has  been updated  and the annotations  of 9'954  entries
 have been revised.


 2.2  What's happening with the model organisms

 We have  selected a  number of  organisms that  are the  target of  genome
 sequencing and/or mapping projects and for which we intend to:

 . Be as  complete as possible.  All sequences  available at  a given  time
   should  be  immediately  included  in  SWISS-PROT.  This  also  includes
   sequence corrections and updates;
 . Provide a higher level of annotation;
 . Provide cross-references to specialized database(s) that  contain, among
   other data, some genetic information about the genes that code for these
   proteins;
 . Provide specific indices or documents.

 What has been done since the last release, or in preparation for the next
 release, concerning the model organisms:

 - We have continued our effort to catch up with the backlog of sequences
   from other model organisms. In particular, we added about 350 entries
   each from human and E.coli, 300 from mouse, 250 from S.pombe, 200 from
   M.jannaschii, 150 from C.elegans, and 100 each from B.subtilis, H.pylori
   and M.tuberculosis.
   
 - We  plan to  finish  as  quickly  as  possible  the  annotation  of  the
   Escherichia coli and Haemophilus  influenzae sequence entries which  are
   not yet part of SWISS-PROT.

 Here is the current status of the model organisms in SWISS-PROT:

 Organism        Database            Index file       Number of
                 cross-referenced                     sequences
 --------------  ----------------    --------------   ---------
 A.thaliana      None yet            In preparation         719
 B.subtilis      SubtiList           SUBTILIS.TXT          1970
 C.albicans      None yet            CALBICAN.TXT           192
 C.elegans       Wormpep             CELEGANS.TXT          1887
 D.discoideum    DictyDB             DICTY.TXT              280
 D.melanogaster  FlyBase             FLY.TXT               1042
 E.coli          EcoGene             ECOLI.TXT             4416
 H.influenzae    HiDB (TIGR)         HAEINFLU.TXT          1693
 H.sapiens       MIM                 MIMTOSP.TXT           4980
 H.pylori        HpDB (TIGR)         HPYLORI.TXT            334
 M.genitalium    MgDB (TIGR)         MGENITAL.TXT           470
 M.musculus      MGD                 MGDTOSP.TXT           3253
 M.jannaschii    MjDB (TIGR)         MJANNASC.TXT          1283
 M.tuberculosis  None yet            None yet               873
 S.cerevisiae    SGD                 YEAST.TXT             4787
 S.typhimurium   StyGene             SALTY.TXT              706
 S.pombe         None yet            POMBE.TXT             1315
 S.solfataricus  None yet            None yet                72

 Collectively the entries from the above model organisms represent 40.9% of
 all SWISS-PROT entries.
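
 The 40.9% figure follows from summing the last column of the table and
 dividing by the release total given in section 1. A minimal Python sketch
 (the variable names are illustrative only):

```python
# Sequence counts per model organism, copied from the status table above.
MODEL_ORGANISM_COUNTS = {
    "A.thaliana": 719, "B.subtilis": 1970, "C.albicans": 192,
    "C.elegans": 1887, "D.discoideum": 280, "D.melanogaster": 1042,
    "E.coli": 4416, "H.influenzae": 1693, "H.sapiens": 4980,
    "H.pylori": 334, "M.genitalium": 470, "M.musculus": 3253,
    "M.jannaschii": 1283, "M.tuberculosis": 873, "S.cerevisiae": 4787,
    "S.typhimurium": 706, "S.pombe": 1315, "S.solfataricus": 72,
}
TOTAL_ENTRIES = 74019  # entries in release 36.0 (section 1)

model_total = sum(MODEL_ORGANISM_COUNTS.values())
share = 100.0 * model_total / TOTAL_ENTRIES
print(f"{model_total} of {TOTAL_ENTRIES} entries = {share:.1f}%")
```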


 2.3  Changes affecting the accession numbers

 With the creation of the TrEMBL database (see section 6) and the rapid
 increase in the amount of sequence data, we are running out of accession
 numbers. We currently use a system based on a one-letter prefix followed
 by 5 digits. This system was also used by the nucleotide sequence
 databases, which had originally reserved the prefix letters 'P' and 'Q'
 for SWISS-PROT. Having run out of numbers (mainly because of ESTs), the
 nucleotide databases have been forced to adopt a new format based on a
 two-letter prefix followed by 6 digits.

 We have used up all possible numbers with 'P' and 'Q', and the only prefix
 letter not used by the nucleotide databases is 'O'. As we believe that
 adopting the format now used by the nucleotide databases would create
 havoc in the numerous software packages that use SWISS-PROT, we have
 decided to keep a system of accession numbers based on a six-character
 code, with the following changes:

 1)   We have started using 'O'. This extra letter should allow the
      present format (1 prefix letter + 5 digits) to continue for
      approximately one year.
 2)   Once 'O' is used up, we will introduce a system based on the
      following format:

      1        2       3          4            5            6
     [O,P,Q]  [0-9]  [A-Z, 0-9]  [A-Z, 0-9]   [A-Z, 0-9]   [0-9]

 In other words, we will keep a six-character code, but positions 3, 4 and
 5 of this code may contain any combination of letters and digits. This
 format allows a total of about 14 million accession numbers (up from
 300'000 with the current system).

 We only allow digits in positions 2 and 6 so that SWISS-PROT accession
 numbers cannot be mistaken for gene names, acronyms, other types of
 accession numbers or ordinary words!

 Examples: P0A3S2, Q2ASD4, O13YX2, P9B123
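
 The position table translates directly into a regular expression, and the
 capacity figures into a product of per-position choices. A minimal Python
 sketch; the regular expressions are our own reading of the table (assuming
 upper-case letters only), not an official validator, and the names are
 hypothetical:

```python
import re

# Current scheme: prefix letter (O, P or Q) followed by 5 digits.
CURRENT_FORMAT = re.compile(r"[OPQ][0-9]{5}")
# Extended scheme (section 2.3): digits in positions 2 and 6,
# any upper-case letter or digit in positions 3, 4 and 5.
EXTENDED_FORMAT = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")


def is_accession(text: str) -> bool:
    """True if text is well formed under the extended scheme.

    Every current-format number also matches, because digits are
    allowed in positions 3-5.
    """
    return EXTENDED_FORMAT.fullmatch(text) is not None


# Capacity = product of the number of allowed characters per position.
current_capacity = 3 * 10**5             # [O,P,Q] x 5 digits
extended_capacity = 3 * 10 * 36**3 * 10  # [O,P,Q] x digit x 3 x (26+10) x digit

for ac in ["P0A3S2", "Q2ASD4", "O13YX2", "P9B123", "P12345", "PA1234"]:
    print(ac, is_accession(ac))
print(current_capacity, extended_capacity)
```

 The four examples above and the old-style P12345 are accepted, PA1234 is
 rejected (letter in position 2), and the capacities come out as 300000 and
 13996800.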


 2.4  Changes concerning the reference location line (RL)

 The (IN) prefix used for books is now also used for references to the
 electronic Plant Gene Register (see http://www.tarweed.com/pgr/). Example:

 RL   (IN) PLANT GENE REGISTER PGR98-023.


 2.5  Cleaning up of the SIMILARITY comment line (CC) topic

 We have started a major overhaul of the "SIMILARITY" topic. We would like
 most of the information stored in this topic to be usable by computer
 programs while remaining human-readable. We are therefore standardizing
 the format of this topic using two subformats. One describes the family to
 which a protein belongs:

 CC   -!- SIMILARITY: BELONGS TO THE {Name1} FAMILY [OF {Name2}].
 CC       [{Name3} SUBFAMILY.]

 Examples:

 CC   -!- SIMILARITY: BELONGS TO THE 14-3-3 FAMILY.
 CC   -!- SIMILARITY: BELONGS TO THE 6-PHOSPHOGLUCONATE DEHYDROGENASE
 CC       FAMILY.
 CC   -!- SIMILARITY: BELONGS TO THE AAA FAMILY OF ATPASES.
 CC   -!- SIMILARITY: BELONGS TO THE IRON/ASCORBATE-DEPENDENT FAMILY OF
 CC       OXIDOREDUCTASES.
 CC   -!- SIMILARITY: BELONGS TO THE ANTP FAMILY OF HOMEOBOX PROTEINS.
 CC       "DEFORMED" SUBFAMILY.
 CC   -!- SIMILARITY: BELONGS TO THE KINESIN-LIKE PROTEIN FAMILY. KINESIN
 CC       SUBFAMILY.

 And one to describe which domains are found in a given protein:

 CC   -!- SIMILARITY: CONTAINS n {Name} [DOMAIN|REPEAT][S].

 Examples:

 CC   -!- SIMILARITY: CONTAINS 1 FHA DOMAIN.
 CC   -!- SIMILARITY: CONTAINS 45 EGF-LIKE DOMAINS.
 CC   -!- SIMILARITY: CONTAINS 2 SH3 DOMAINS.
 CC   -!- SIMILARITY: CONTAINS 2 SUSHI (SCR) REPEATS.
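
 Since the point of the two subformats is machine readability, here is a
 minimal Python sketch of a parser for both. It assumes upper-case text as
 in the examples, joins continuation lines with single spaces, accepts the
 topic marker written either as "-!-" or with a stray space, and reads
 [DOMAIN|REPEAT] as a required choice. The class and function names are
 hypothetical, not part of any SWISS-PROT distribution:

```python
import re
from typing import NamedTuple, Optional, Union


class Family(NamedTuple):
    name: str                 # {Name1}
    of: Optional[str]         # {Name2}, from the optional "OF ..." part
    subfamily: Optional[str]  # {Name3}, from the optional "... SUBFAMILY." sentence


class Contains(NamedTuple):
    count: int                # n
    name: str                 # {Name}
    kind: str                 # "DOMAIN" or "REPEAT"


# "CC" plus the optional topic marker ("-!-", or "- !-" with a stray space).
_PREFIX = re.compile(r"^CC\s+(?:-\s*!-\s*)?")
_FAMILY = re.compile(
    r"SIMILARITY: BELONGS TO THE (?P<name>.+?) FAMILY"
    r"(?: OF (?P<of>.+?))?\.(?: (?P<sub>.+?) SUBFAMILY\.)?$"
)
_CONTAINS = re.compile(
    r"SIMILARITY: CONTAINS (?P<n>\d+) (?P<name>.+?) (?P<kind>DOMAIN|REPEAT)S?\.$"
)


def join_comment(lines: list[str]) -> str:
    """Join a CC comment and its continuation lines into a single string."""
    return " ".join(_PREFIX.sub("", line).strip() for line in lines)


def parse_similarity(lines: list[str]) -> Union[Family, Contains, None]:
    """Return a Family or Contains tuple, or None for free-text comments."""
    text = join_comment(lines)
    if m := _FAMILY.search(text):
        return Family(m["name"], m["of"], m["sub"])
    if m := _CONTAINS.search(text):
        return Contains(int(m["n"]), m["name"], m["kind"])
    return None


print(parse_similarity([
    "CC   -!- SIMILARITY: BELONGS TO THE ANTP FAMILY OF HOMEOBOX PROTEINS.",
    'CC       "DEFORMED" SUBFAMILY.',
]))
print(parse_similarity(["CC   -!- SIMILARITY: CONTAINS 2 SUSHI (SCR) REPEATS."]))
```

 Comments still in the old free-text style simply return None, so the
 parser can be run over a mix of converted and unconverted entries.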

 We have already updated many entries in this release and plan to continue
 doing so for the next release.


 2.6  Changes concerning cross-references (DR line)

 We have added cross-references from  SWISS-PROT to the Mendel database,  a
 plant gene  nomenclature  database  from the  Commission  for  Plant  Gene
 Nomenclature (CPGN). These cross-references are present in the DR lines:

 Data bank identifier:  MENDEL
 Primary identifier  :  The Mendel accession number for a gene  in a  given
                        species.
 Secondary identifier:  Composed of the acronym of  the species  (generally
                        the same five-letter  code as that defined and used
                        by SWISS-PROT in the entry name), the gene name and
                        a number.
 Example:               DR   MENDEL; 294; Amahy;psbA;1.
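
 A minimal Python sketch of splitting such a line into its parts. It
 assumes, from the single example above, that the top-level fields are
 separated by a semicolon followed by a space, that the secondary
 identifier uses bare semicolons between species acronym, gene name and
 number, and that the line ends with a period. The function and type names
 are hypothetical:

```python
from typing import NamedTuple


class MendelXref(NamedTuple):
    accession: str  # primary identifier: Mendel accession number
    species: str    # species acronym, e.g. "Amahy"
    gene: str       # gene name
    number: str     # trailing number, kept as text


def parse_mendel_dr(line: str) -> MendelXref:
    """Parse a 'DR   MENDEL; ...' line of the form shown above."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line")
    body = line[5:].rstrip().rstrip(".")
    # Top-level fields use "; " (semicolon + space); the secondary
    # identifier itself uses bare ";" between its three parts.
    database, primary, secondary = body.split("; ", 2)
    if database != "MENDEL":
        raise ValueError(f"not a MENDEL cross-reference: {database}")
    species, gene, number = secondary.split(";")
    return MendelXref(primary, species, gene, number)


print(parse_mendel_dr("DR   MENDEL; 294; Amahy;psbA;1."))
```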



                  3. PLANNED CHANGES

 3.1  Extension of the accession number system

 As explained in detail in section 2.3, we will extend the accession number
 system once we have used up the 'O' series of accession numbers. We expect
 this to happen in October 1998.


 3.2  Switch to the NCBI taxonomy

 To standardize the taxonomies used by different databases, we will change
 our taxonomy with release 37 and switch to the NCBI taxonomy, which is
 already used as the common taxonomy by the DDBJ/EMBL/GenBank nucleotide
 sequence databases.


 3.3  Introduction of RT lines

 With release 37 we will introduce a new line type, the RT (Reference
 Title) line. This optional line will be placed between the RA and RL
 lines. The RT line gives the title of the paper (or other work) as exactly
 as possible, given the limitations of the computer character set. The form
 used is the one that would appear in a citation, rather than the one
 displayed at the top of the published paper. For instance, where journals
 capitalize the major words of a title, this capitalization is not
 preserved. The title is enclosed in double quotes and may be continued
 over several lines as necessary. The title lines are terminated by a
 semicolon. An example of the use of RT lines is shown below:

 RT   "Sequence analysis of the genome of the unicellular cyanobacterium
 RT   Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb
 RT   region from map positions 64% to 92% of the genome.";
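
 Reading a title back from RT lines reverses the rules above: drop the line
 code, join the continuation lines, then strip the terminating semicolon and
 the enclosing double quotes. A minimal Python sketch, assuming the
 five-character "RT   " prefix used in the example and that lines break
 between words (a word hyphenated across two lines would come back with a
 stray space). The function name is hypothetical:

```python
def parse_rt_title(lines: list[str]) -> str:
    """Rebuild a reference title from its RT lines."""
    # Drop the 5-character "RT   " line code and join the pieces with spaces.
    text = " ".join(line[5:].strip() for line in lines if line.startswith("RT"))
    if not text.endswith(";"):
        raise ValueError("RT block must end with a semicolon")
    text = text[:-1]  # remove the terminating semicolon
    if not (text.startswith('"') and text.endswith('"')):
        raise ValueError("title must be enclosed in double quotes")
    return text[1:-1]


example = [
    'RT   "Sequence analysis of the genome of the unicellular cyanobacterium',
    "RT   Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb",
    'RT   region from map positions 64% to 92% of the genome.";',
]
print(parse_rt_title(example))
```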



                  4. STATUS OF THE DOCUMENTATION FILES

 SWISS-PROT is distributed with a large number of documentation files. Some
 of these  files have  been available  for a  long time  (the user  manual,
 release notes,  the  various  indices for  authors,  citations,  keywords,
 etc.), but many have been created recently and we are  continuously adding
 new files. Since release 35,  we have added three new document  files. The
 following table lists all the documents that are currently available.

 USERMAN.TXT    User manual
 RELNOTES.TXT   Release notes
 OLDRLNOT.TXT   Release notes for previous release [1,2]
 SHORTDES.TXT   Short description of entries in SWISS-PROT
 JOURLIST.TXT   List of abbreviations for journals cited [3]
 KEYWLIST.TXT   List of keywords in use
 SPECLIST.TXT   List of organism identification codes
 TISSLIST.TXT   List of tissues [4]
 EXPERTS.TXT    List of on-line experts for PROSITE and SWISS-PROT
 SUBMIT.TXT     Submission of sequence data to SWISS-PROT

 ACINDEX.TXT    Accession number index
 AUTINDEX.TXT   Author index
 CITINDEX.TXT   Citation index
 KEYINDEX.TXT   Keyword index
 SPEINDEX.TXT   Species index
 DELETEAC.TXT   Deleted accession number index

 7TMRLIST.TXT   List of 7-transmembrane G-linked receptor entries
 AATRNASY.TXT   List of aminoacyl-tRNA synthetases
 ALLERGEN.TXT   Nomenclature and index of allergen sequences
 BLOODGRP.TXT   List of blood group antigen proteins
 CALBICAN.TXT   Index   of  Candida  albicans  entries   and  their
                corresponding gene designations
 CDLIST.TXT     CD  nomenclature  for  surface  proteins  of  human
                leucocytes
 CELEGANS.TXT   Index  of Caenorhabditis elegans entries  and their
                corresponding gene Wormpep cross-references
 DICTY.TXT      Index   of  Dictyostelium  discoideum  entries  and
                their  corresponding gene designations  and DictyDB
                cross-references
 EC2DTOSP.TXT   Index  of  Escherichia coli  Gene-protein  database
                entries referenced in SWISS-PROT
 ECOLI.TXT      Index  of Escherichia coli K12  chromosomal entries
                and their corresponding EcoGene cross-references
 EMBLTOSP.TXT   Index  of   EMBL  Database  entries  referenced  in
                SWISS-PROT
 EXTRADOM.TXT   Nomenclature of extracellular domains
 FLY.TXT        Index  of  Drosophila  entries and  FlyBase  cross-
                references
 GLYCOSID.TXT   Classification  of glycosyl hydrolase  families and
                index of glycosyl hydrolase entries
 HAEINFLU.TXT   Index  of  Haemophilus  influenzae  RD  chromosomal
                entries
 HOXLIST.TXT    Vertebrate  homeotic Hox proteins: nomenclature and
                index
 HPYLORI.TXT    Index   of   Helicobacter   pylori   strain   26695
                chromosomal entries
 HUMCHR17.TXT   Index of protein  sequence entries encoded on human
                chromosome 17 [1]
 HUMCHR18.TXT   Index of protein  sequence entries encoded on human
                chromosome 18
 HUMCHR19.TXT   Index of protein  sequence entries encoded on human
                chromosome 19
 HUMCHR20.TXT   Index of protein  sequence entries encoded on human
                chromosome 20
 HUMCHR21.TXT   Index of protein  sequence entries encoded on human
                chromosome 21
 HUMCHR22.TXT   Index of protein  sequence entries encoded on human
                chromosome 22
 HUMCHRX.TXT    Index of protein  sequence entries encoded on human
                chromosome X
 HUMCHRY.TXT    Index of protein  sequence entries encoded on human
                chromosome Y
 HUMPVAR.TXT    Index of human proteins with sequence variants [1]
 INITFACT.TXT   List and index of translation initiation factors
 MIMTOSP.TXT    Index of MIM entries referenced in SWISS-PROT
 METALLO.TXT    Classification  of  metallothioneins and  index  of
                entries in SWISS-PROT
 MGDTOSP.TXT    Index of MGD entries referenced in SWISS-PROT
 MGENITAL.TXT   Index  of Mycoplasma genitalium chromosomal entries
 MJANNASC.TXT   Index of Methanococcus jannaschii entries
 NGR234.TXT     Table  of   putative  genes  in  Rhizobium  plasmid
                pNGR234a
 NOMLIST.TXT    List   of  nomenclature   related  references   for
                proteins
 PCC6803.TXT    Index of Synechocystis strain PCC 6803 entries
 PDBTOSP.TXT    Index  of X-ray  crystallography Protein Data  Bank
                (PDB) entries referenced in SWISS-PROT
 PEPTIDAS.TXT   Classification  of peptidase families and  index of
                peptidase entries
 PLASTID.TXT    List of chloroplast and cyanelle encoded proteins
 POMBE.TXT      Index   of  Schizosaccharomyces  pombe  entries  in
                SWISS-PROT    and    their    corresponding    gene
                designations
 RESTRIC.TXT    List of restriction enzyme and methylase entries
 RIBOSOMP.TXT   Index of  ribosomal proteins classified by families
                on the basis of sequence similarities
 SALTY.TXT      Index  of  Salmonella typhimurium  LT2  chromosomal
                entries  and  their  corresponding  StyGene  cross-
                references
 SUBTILIS.TXT   Index of  Bacillus subtilis 168 chromosomal entries
                and their corresponding SubtiList cross-references
 UPFLIST.TXT    UPF  (Uncharacterized  Protein Families)  list  and
                index of members
 YEAST.TXT      Index   of  Saccharomyces  cerevisiae  entries  and
                their corresponding gene designations
 YEAST1.TXT     Yeast Chromosome I entries
 YEAST2.TXT     Yeast Chromosome II entries
 YEAST3.TXT     Yeast Chromosome III entries
 YEAST5.TXT     Yeast Chromosome V entries
 YEAST6.TXT     Yeast Chromosome VI entries
 YEAST7.TXT     Yeast Chromosome VII entries
 YEAST8.TXT     Yeast Chromosome VIII entries
 YEAST9.TXT     Yeast Chromosome IX entries
 YEAST10.TXT    Yeast Chromosome X entries
 YEAST11.TXT    Yeast Chromosome XI entries
 YEAST13.TXT    Yeast Chromosome XIII entries
 YEAST14.TXT    Yeast Chromosome XIV entries

 Notes:

 1    New in release 36.
 2    We apologize for not having included the corresponding release notes
      with release 35; we are therefore including them with this release.
      As we believe it is useful to always distribute the release notes of
      the previous release, we will do so from now on, in a file named
      "OLDRLNOT.TXT".
 3    Has been extensively updated and contains Web links to more  than 640
      journals.
 4    Has been  extensively  updated and  now  includes synonyms  for  many
      tissues.

 We have continued to include, in some SWISS-PROT document files,
 references to Web sites relevant to the subject under consideration. There
 are now 24 documents that include such links.



                  5. THE EXPASY WORLD-WIDE WEB SERVER

 5.1  Background information

 The most efficient and user-friendly way to browse SWISS-PROT, PROSITE,
 ENZYME, SWISS-2DPAGE and other databases interactively is to use the
 World-Wide Web (WWW) molecular biology server ExPASy. The ExPASy server was
 made available to the public in September 1993; it is reachable at the
 following address:

                              http://www.expasy.ch/

 The ExPASy WWW server  allows access, using  the user-friendly hypertext
 model, to the  SWISS-PROT, PROSITE,  ENZYME, SWISS-2DPAGE, SWISS-3DIMAGE
 and CD40Lbase  databases and,  through any  SWISS-PROT  protein sequence
 entry, to  other databases  such  as EMBL,  Eco2DBASE,  EcoCyc, FlyBase,
 GCRDb, MaizeDB, SubtiList/NRSub,  OMIM, PDB, HSSP,  ProDom, REBASE, SGD,
 YEPD and Medline. ExPASy also offers many tools for the analysis of
 protein sequences and 2D gels.


 5.2  SWISS-SHOP

 We provide, on ExPASy, a service called SWISS-Shop, which allows any
 SWISS-PROT user to specify the proteins he or she is interested in. This
 can be done using various criteria that can be combined:

 -    By entering  one  or  more words  that  should  be  present  in the
      description line;
 -    By entering one or more species name(s) or taxonomic division(s);
 -    By entering one or more keywords;
 -    By entering one or more author names;
 -    By entering  the  accession number  (or  entry name)  of  a PROSITE
      pattern or a user-defined sequence pattern;
 -    By entering the  accession number  (or entry  name) of  an existing
      SWISS-PROT entry or by entering a "private" sequence.

 Every week, the new sequences entered in SWISS-PROT are automatically
 compared with all the criteria that have been defined by the users. If a
 sequence matches the selection criteria defined by a user, it is sent to
 that user by electronic mail.


 5.3  What is new on ExPASy

 ExPASy is constantly modified and improved. If you wish to be informed of
 the changes made to the server, you can either:

 -    Read  the  document  "History  of  changes,  improvements  and  new
      features" which is available at the address:

              http://www.expasy.ch/www/history.html

 -    Subscribe to SWISS-Flash, a service that reports news of databases,
      software and services developments. By subscribing to this service,
      you will  automatically  get  SWISS-Flash  bulletins  by electronic
      mail. To subscribe use the address:

              http://www.expasy.ch/www/swiss-flash.html



                  6. TREMBL - A SUPPLEMENT TO SWISS-PROT

 The ongoing genome sequencing and mapping projects have dramatically
 increased the number of protein sequences to be incorporated into
 SWISS-PROT. Since we do not want to dilute the quality standards of
 SWISS-PROT by incorporating sequences into the database without proper
 sequence analysis and annotation, we cannot speed up the incorporation of
 incoming data indefinitely. But as we also want to make the sequences
 available as fast as possible, we have introduced a computer-annotated
 supplement to SWISS-PROT. This supplement consists of entries in
 SWISS-PROT-like format derived from the translation of all coding
 sequences (CDS) in the EMBL nucleotide sequence database, except those
 already included in SWISS-PROT.

 We name this supplement TrEMBL (Translation from EMBL). It can be
 considered a preliminary section of SWISS-PROT. This SWISS-PROT release is
 supplemented by TrEMBL release 6. TrEMBL is split into two main sections,
 SP-TrEMBL and REM-TrEMBL:

 - SP-TrEMBL (SWISS-PROT TrEMBL) contains the entries (150'329 in release
   6) that should be incorporated into SWISS-PROT. SWISS-PROT accession
   numbers have been assigned to all SP-TrEMBL entries.

 - REM-TrEMBL (REMaining TrEMBL) contains the entries (27'428 in release
   6) that we do not want to include in SWISS-PROT for a variety of
   reasons (synthetic sequences, pseudogenes, translations of incorrect
   open reading frames, fragments of fewer than eight amino acids,
   patent-derived sequences, immunoglobulins and T-cell receptors, etc.).

 TrEMBL is available by FTP from the EBI server (ftp.ebi.ac.uk) in the
 directory '/pub/databases/trembl'. It can be queried on the WWW through the
 EBI SRS server (http://www.ebi.ac.uk/). It is also available on the
 SWISS-PROT CD-ROM and is searchable on the FASTA, BIC and BLAST servers of
 the EBI.
   

                  7. WEEKLY UPDATES OF SWISS-PROT

 Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
 are rebuilt at each update:

 new_seq.dat    Contains all the new entries since the last full release;
 upd_seq.dat    Contains the entries for which the sequence data has been
                updated since the last release;
 upd_ann.dat    Contains  the entries  for which  one or  more annotation
                fields have been updated since the last release.
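
 Because each of these files holds the entries changed since the last full
 release (not since the previous week), a local copy can be refreshed by
 overlaying the most recent set of files on the release. A minimal Python
 sketch, assuming entries are keyed by their first (primary) accession
 number and that a file given later in the argument list takes precedence.
 The function names are hypothetical, and entries whose primary accession
 changed (for instance after a merge) would need extra handling not shown
 here:

```python
from collections.abc import Iterator


def split_entries(text: str) -> Iterator[str]:
    """Yield the entries of a flat file; each entry ends with a '//' line."""
    current: list[str] = []
    for line in text.splitlines():
        current.append(line)
        if line.startswith("//"):
            yield "\n".join(current)
            current = []
    # Any lines after the final '//' are incomplete and ignored.


def primary_accession(entry: str) -> str:
    """Return the first accession number on the entry's first AC line."""
    for line in entry.splitlines():
        if line.startswith("AC   "):
            return line[5:].split(";")[0].strip()
    raise ValueError("entry has no AC line")


def apply_updates(release: str, *update_files: str) -> dict[str, str]:
    """Overlay update files on a full release, keyed by primary accession.

    Files are applied in argument order, so a later file wins (assumption).
    """
    db: dict[str, str] = {}
    for text in (release, *update_files):
        for entry in split_entries(text):
            db[primary_accession(entry)] = entry
    return db
```

 A typical call would be apply_updates(release_text, new_seq_text,
 upd_seq_text, upd_ann_text), each argument holding the contents of the
 corresponding file.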

 Currently these  files  are  available on  the  following  anonymous FTP
 servers:

 Organization   Swiss Institute of Bioinformatics (SIB)
 Address        ftp.expasy.ch
 Directory      /databases/swiss-prot/updates

 Organization   European Bioinformatics Institute (EBI)
 Address        ftp.ebi.ac.uk
 Directory      /pub/databases/swissprot/new
  
 !! Important notes !!
 
 - Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases, two weeks will elapse
   between updates.
 - Due to the mechanism currently used to build a release, the entries
   provided in these updates are not guaranteed to be error-free.
 - Instead of using the above files, you can download an updated copy of
   the SWISS-PROT database every week. This file is available in the
   directory containing the non-redundant database (see next section).



                  8. NON-REDUNDANT DATABASE

 A few months ago, we started distributing, on the ExPASy and EBI FTP
 servers, files that make up a non-redundant (see below) and complete
 protein sequence database consisting of three components:

 1) SWISS-PROT
 2) TrEMBL
 3) New  entries to be  later integrated  into TrEMBL  (hereafter known  as
    TrEMBL_New)

 Every week  three files  are completely  rebuilt. These  files are  named:
 sprot.dat.Z, trembl.dat.Z and trembl_new.dat.Z. As indicated by their ".Z"
 extension these are Unix "compress" format files which, when decompressed,
 will produce ASCII files in SWISS-PROT format.

 Three other files are also available (sprot.fas.Z, trembl.fas.Z and
 trembl_new.fas.Z), which are compressed "fasta" format sequence files
 useful for building the databases used by FASTA, BLAST and other sequence
 similarity search programs. Please do not use these files for any other
 purpose, as all annotations are lost in this very primitive format.

 The files for the non-redundant database are stored in the directory
 "/databases/sp_tr_nrdb" on the ExPASy FTP server (ftp.expasy.ch) and in
 the directory "/pub/databases/sp_tr_nrdb" on the EBI FTP server
 (ftp.ebi.ac.uk).

 Additional notes

 - The SWISS-PROT  file continuously grows as  new annotated sequences  are
   added.
 - The TrEMBL file decreases in size as sequences are annotated and moved
   out of that section into SWISS-PROT. Four times a year, a new release of
   TrEMBL is built at the EBI; at this point the TrEMBL file increases in
   size, as it then includes all of the new data that has accumulated since
   the last release (the TrEMBL_New entries described below).
 - The TrEMBL_New file starts as a very small file and grows  in size until
   a new release of TrEMBL is available.

 - SWISS-PROT and TrEMBL share the same system of accession numbers.
   Therefore, you will not find any primary accession number duplicated
   between the two sections. A TrEMBL entry (and its associated accession
   number(s)) can either move to SWISS-PROT as a new entry or be merged
   with an existing SWISS-PROT entry. In the latter case, the accession
   number(s) of that TrEMBL entry are added to those of the SWISS-PROT
   entry.
 - TrEMBL_New does not have real accession numbers. However, an "AC" line
   was necessary so that the file can be used with different software
   products. This AC line contains a temporary identifier, which is the pID
   (protein identifier) of the coding sequence in the parent nucleotide
   sequence.

 - While these three files allow you to build what we call a
   "non-redundant" database, it must be noted that this label is not
   entirely accurate. Without going into a long explanation, we can say
   that this is currently the best attempt at providing a complete
   selection of protein sequence entries while trying to eliminate
   redundancies. While SWISS-PROT is completely (well, 99.994%!)
   non-redundant, TrEMBL is far from being non-redundant, and the
   combination of SWISS-PROT and TrEMBL is even less so.
 - To describe to your users the version of the non-redundant database
   that you are providing to them, you should use a statement of the form:

      SWISS-PROT release 36 and updates until {current_date};
      TrEMBL  release  6  minus  data  integrated  into  SWISS-PROT  as  of
      {current_date};
      New preliminary TrEMBL entries created since release 6 of TrEMBL
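
 The accession-number rules in the notes above can be sketched as two small
 operations: merging a TrEMBL entry's accession numbers into an existing
 SWISS-PROT entry, and checking that no primary accession number occurs in
 both sections. A minimal Python sketch in which each entry is represented
 only by its list of accession numbers, primary first; the function names
 are hypothetical, and keeping the SWISS-PROT primary accession first is
 our assumption:

```python
from collections import Counter


def merge_accessions(sprot_acs: list[str], trembl_acs: list[str]) -> list[str]:
    """Accession numbers of a SWISS-PROT entry after a TrEMBL entry is merged in.

    The SWISS-PROT numbers keep their order, so its primary accession stays
    first (assumption); the TrEMBL numbers are appended, skipping duplicates.
    """
    merged = list(sprot_acs)
    for ac in trembl_acs:
        if ac not in merged:
            merged.append(ac)
    return merged


def duplicated_primaries(*sections: list[list[str]]) -> set[str]:
    """Primary accessions (first of each entry's list) seen more than once."""
    counts = Counter(acs[0] for section in sections for acs in section)
    return {ac for ac, n in counts.items() if n > 1}


sprot = [["P12345", "Q11111"], ["P23456"]]
trembl = [["O60001"], ["O60002", "O60003"]]
print(duplicated_primaries(sprot, trembl))
print(merge_accessions(sprot[0], trembl[1]))
```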


                  9.  ENZYME and PROSITE

 9.1  The ENZYME data bank

 Release 23.0 of the ENZYME data bank is distributed with release 36 of
 SWISS-PROT. ENZYME release 23.0 contains information on 3'704 enzymes. It
 also differs from the previous release (release 22, November 1997) in
 that the "DE" (Description), "AN" (Alternative Names), "CF" (Cofactor) and
 "CC" (Comments) lines are now in mixed case instead of all upper case.

 For example, the following entry:

 ID   1.4.4.2
 DE   GLYCINE DEHYDROGENASE (DECARBOXYLATING).
 AN   GLYCINE DECARBOXYLASE.
 AN   GLYCINE CLEAVAGE SYSTEM P-PROTEIN.
 CA   GLYCINE + LIPOYLPROTEIN = S-AMINOMETHYLDIHYDROLIPOYLPROTEIN + CO(2).
 CF   PYRIDOXAL-PHOSPHATE.
 CC   -!- LIPOAMIDE CAN ALSO ACT AS ACCEPTOR.
 CC   -!- A COMPONENT, WITH EC 2.1.2.10, OF THE GLYCINE CLEAVAGE SYSTEM,
 CC       PREVIOUSLY KNOWN AS GLYCINE SYNTHASE.
 DI   NONKETOTIC HYPERGLYCINEMIA TYPE II; MIM:238310.
 DR   P54376, GCS1_BACSU;  P54377, GCS2_BACSU;  P49361, GCSA_FLAPR;
 DR   P49362, GCSB_FLAPR;  P15505, GCSP_CHICK;  P33195, GCSP_ECOLI;
 DR   O49850, GCSP_FLAAN;  O49852, GCSP_FLATR;  P23378, GCSP_HUMAN;
 DR   Q50601, GCSP_MYCTU;  P26969, GCSP_PEA  ;  Q09785, GCSP_SCHPO;
 DR   O49954, GCSP_SOLTU;  P49095, GCSP_YEAST;
 //

 is now:

 ID   1.4.4.2
 DE   Glycine dehydrogenase (decarboxylating).
 AN   Glycine decarboxylase.
 AN   Glycine cleavage system P-protein.
 CA   GLYCINE + LIPOYLPROTEIN = S-AMINOMETHYLDIHYDROLIPOYLPROTEIN + CO(2).
 CF   Pyridoxal-phosphate.
 CC   -!- Lipoamide can also act as acceptor.
 CC   -!- A component, with EC 2.1.2.10, of the glycine cleavage system,
 CC       previously known as glycine synthase.
 DI   NONKETOTIC HYPERGLYCINEMIA TYPE II; MIM:238310.
 DR   P54376, GCS1_BACSU;  P54377, GCS2_BACSU;  P49361, GCSA_FLAPR;
 DR   P49362, GCSB_FLAPR;  P15505, GCSP_CHICK;  P33195, GCSP_ECOLI;
 DR   O49850, GCSP_FLAAN;  O49852, GCSP_FLATR;  P23378, GCSP_HUMAN;
 DR   Q50601, GCSP_MYCTU;  P26969, GCSP_PEA  ;  Q09785, GCSP_SCHPO;
 DR   O49954, GCSP_SOLTU;  P49095, GCSP_YEAST;
 //

 We plan to convert the  "CA" (Catalytic Activity) lines to mixed-case  for
 the next release.
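
 The ENZYME DR lines shown above pair a SWISS-PROT accession number with an
 entry name, separated by a comma; pairs are separated by semicolons and
 padded with spaces for alignment (as in "GCSP_PEA  ;"). A minimal Python
 sketch of reading them back; the function name is hypothetical:

```python
def parse_enzyme_dr(lines: list[str]) -> list[tuple[str, str]]:
    """Return (accession, entry name) pairs from ENZYME DR lines."""
    pairs: list[tuple[str, str]] = []
    for line in lines:
        if not line.startswith("DR   "):
            continue
        for chunk in line[5:].split(";"):
            chunk = chunk.strip()
            if not chunk:
                continue  # empty piece after the trailing ';'
            accession, name = (part.strip() for part in chunk.split(","))
            pairs.append((accession, name))
    return pairs


print(parse_enzyme_dr([
    "DR   Q50601, GCSP_MYCTU;  P26969, GCSP_PEA  ;  Q09785, GCSP_SCHPO;",
    "DR   O49954, GCSP_SOLTU;  P49095, GCSP_YEAST;",
]))
```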


 9.2  The PROSITE data bank

 Release 15.0 of the PROSITE data bank is distributed with release 36 of
 SWISS-PROT. This release of PROSITE contains 1'014 documentation entries
 that describe 1'352 different patterns, rules and profiles/matrices.



                  10. WE NEED YOUR HELP !

 We welcome feedback from our users. We would especially appreciate being
 notified if you find that sequences belonging to your field of expertise
 are missing from the data bank. We would also like to be notified about
 annotations that need updating, for example if the function of a protein
 has been clarified or if new post-translational information has become
 available. To facilitate such feedback, we offer on the ExPASy WWW server
 a form that allows the submission of updates and/or corrections to
 SWISS-PROT:

               http://www.expasy.ch/sprot/sp_update_form.html

 It is also possible, from any SWISS-PROT entry displayed by the ExPASy
 server, to submit updates and/or corrections for that particular entry.
 Finally, you can also send your comments by electronic mail to the
 address:

                            swiss-prot@expasy.ch



                  11. IMPORTANT ANNOUNCEMENT

 It has become obvious in recent years that the tremendous increase in data
 flow has created a requirement for resources that cannot be met in full by
 public funding. This is causing databases to fall behind the research. We
 believe that the only solution to the resource shortfall is to ask
 commercial users to participate by paying a license fee. No fee will be
 charged to academic users, nor will any restriction be imposed on their
 use or reuse of the data. These changes concern both SWISS-PROT and
 PROSITE, but not ENZYME.

 A document fully describing the impact of this change on SWISS-PROT is
 available with the SWISS-PROT distribution files on FTP (SP_98.TXT). You
 can also access this document, as well as other relevant ones, from:

                       http://www.expasy.ch/announce/
                       http://www.ebi.ac.uk/news.html

 If you do not have the time to read this document, the most important
 take-home message is that these changes should not have any impact on the
 way SWISS-PROT or PROSITE are accessed or redistributed. Academic users
 will not be affected by these changes. Industrial end-users will also not
 be directly affected as long as their employer pays the license fee. The
 same holds true for bioinformatics companies. Academic software or
 database developers, as well as providers of database distribution
 services, will be only minimally affected by these changes. We hope to be
 able to keep the spirit of SWISS-PROT and PROSITE alive while ensuring
 their long-term financial survival. We sincerely hope and believe that in
 the next two years the only change that will matter will be the increase
 in scope and timeliness of the databases.

 Finally, it should be noted that release 36 of SWISS-PROT and release 15
 of PROSITE are not affected by these changes. There are no restrictions
 on their use or distribution.

   ========================================================================


                         APPENDIX A: SOME STATISTICS


   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.58   Gln (Q) 3.99   Leu (L) 9.42   Ser (S) 7.15
   Arg (R) 5.14   Glu (E) 6.35   Lys (K) 5.93   Thr (T) 5.69
   Asn (N) 4.47   Gly (G) 6.83   Met (M) 2.37   Trp (W) 1.24
   Asp (D) 5.28   His (H) 2.24   Phe (F) 4.08   Tyr (Y) 3.18
   Cys (C) 1.67   Ile (I) 5.80   Pro (P) 4.91   Val (V) 6.56

   Asx (B) 0.001  Glx (Z) 0.001  Xaa (X) 0.01


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
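
   The ranking in A.1.2 is the A.1.1 percentages sorted in decreasing order.
   A minimal Python sketch reproducing it from the table values (the names
   COMPOSITION and rank_by_frequency are illustrative only):

```python
# Percentages for the 20 standard residues, copied from table A.1.1.
COMPOSITION = {
    "Ala": 7.58, "Gln": 3.99, "Leu": 9.42, "Ser": 7.15,
    "Arg": 5.14, "Glu": 6.35, "Lys": 5.93, "Thr": 5.69,
    "Asn": 4.47, "Gly": 6.83, "Met": 2.37, "Trp": 1.24,
    "Asp": 5.28, "His": 2.24, "Phe": 4.08, "Tyr": 3.18,
    "Cys": 1.67, "Ile": 5.80, "Pro": 4.91, "Val": 6.56,
}


def rank_by_frequency(composition: dict[str, float]) -> list[str]:
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=lambda aa: composition[aa], reverse=True)


print(", ".join(rank_by_frequency(COMPOSITION)))
```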



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 6002

   The twenty most represented species account for 35826 sequences, i.e.
   48.4 % of the total number of entries.


   A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 2754
                            2x:  951
                            3x:  479
                            4x:  332
                            5x:  238
                            6x:  212
                            7x:  159
                            8x:   99
                            9x:  102
                           10x:   73
                       11- 20x:  277
                       21- 50x:  176
                       51-100x:   72
                         >100x:   78
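
   The frequency classes in A.2.1 partition all species in the release, so
   their counts should add up to the 6002 species stated above. A small
   Python check (the dictionary name is illustrative):

```python
# Number of species in each frequency class of table A.2.1.
SPECIES_PER_CLASS = {
    "1x": 2754, "2x": 951, "3x": 479, "4x": 332, "5x": 238,
    "6x": 212, "7x": 159, "8x": 99, "9x": 102, "10x": 73,
    "11-20x": 277, "21-50x": 176, "51-100x": 72, ">100x": 78,
}

total = sum(SPECIES_PER_CLASS.values())
print(total)  # 6002
```

   The ">100x" class (78 species) also matches the 78 rows of table A.2.2.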


   A.2.2  Table of the most represented species

      Rank   Sequences          Species
         1        4980          Human
         2        4787          Baker's yeast (Saccharomyces cerevisiae)
         3        4416          Escherichia coli
         4        3253          Mouse
         5        2491          Rat
         6        1970          Bacillus subtilis
         7        1887          Caenorhabditis elegans
         8        1693          Haemophilus influenzae
         9        1315          Fission yeast (Schizosaccharomyces pombe)
        10        1283          Methanococcus jannaschii
        11        1088          Bovine
        12        1042          Fruit fly (Drosophila melanogaster)
        13         873          Mycobacterium tuberculosis
        14         840          Chicken
        15         719          Arabidopsis thaliana (Mouse-ear cress)
        16         706          Salmonella typhimurium
        17         697          African clawed frog (Xenopus laevis)
        18         616          Synechocystis sp. (strain PCC 6803)
        19         607          Pig
        20         563          Rabbit
        21         489          Mycoplasma pneumoniae
        22         470          Mycoplasma genitalium
        23         406          Maize
        24         403          Rhizobium sp. (strain NGR234)
        25         345          Pseudomonas aeruginosa
        26         334          Helicobacter pylori
        27         304          Rice
        28         284          Dog
        29         280          Slime mold (Dictyostelium discoideum)
        30         278          Tobacco
        31         273          Bacteriophage T4
        32         253          Vaccinia virus (strain Copenhagen)
        33         250          Mycobacterium leprae
        34         244          Sheep
        35         240          Pea
        36         219          Porphyra purpurea
        37         215          Barley
        38         212          Staphylococcus aureus
        39         209          Neurospora crassa
        40         208          Soybean
        41         205          Wheat
        42         195          Tomato
        43         193          Rhodobacter capsulatus
                   193          Human cytomegalovirus (strain AD169)
        45         192          Candida albicans
                   192          Potato
        47         191          Klebsiella pneumoniae
        48         190          Methanobacterium thermoautotrophicum
        49         185          Bacillus stearothermophilus
        50         184          Vaccinia virus (strain WR)
        51         178          Pseudomonas putida
        52         164          Agrobacterium tumefaciens
        53         160          Spinach
                   160          Guinea pig
        55         158          Chlamydomonas reinhardtii
        56         157          Rhizobium meliloti
        57         154          Autographa californica nuclear polyhedrosis virus
        58         150          Marchantia polymorpha (Liverwort)
        59         146          Variola virus
                   146          Cyanophora paradoxa
        61         145          Aspergillus nidulans
        62         139          Odontella sinensis
        63         136          Streptomyces coelicolor
                   136          Golden hamster
                   136          Lactococcus lactis (subsp. lactis)
        66         134          Orgyia pseudotsugata multicapsid polyhedrosis virus
        67         130          Horse
        68         127          Kluyveromyces lactis
        69         125          Thermus aquaticus (subsp. thermophilus)
        70         124          Trypanosoma brucei brucei
        71         122          Synechococcus sp. (strain PCC 7942)
        72         114          Anabaena sp. (strain PCC 7120)
        73         113          Bradyrhizobium japonicum
        74         111          Alcaligenes eutrophus
        75         110          Bombyx mori (Silk moth)
        76         107          Archaeoglobus fulgidus
        77         105          Yersinia enterocolitica
        78         101          Brassica napus (Rape)
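
   The figure quoted in A.2 for the twenty most represented species can be
   recomputed from the first twenty rows of this table and the 74'019
   entries of the release. A Python sketch (names are illustrative):

```python
# Sequence counts of the twenty most represented species (table A.2.2).
TOP20_COUNTS = [
    4980, 4787, 4416, 3253, 2491, 1970, 1887, 1693, 1315, 1283,
    1088, 1042, 873, 840, 719, 706, 697, 616, 607, 563,
]
TOTAL_ENTRIES = 74019  # number of entries in release 36.0

subtotal = sum(TOP20_COUNTS)
percent = 100.0 * subtotal / TOTAL_ENTRIES
print(f"{subtotal} sequences: {percent:.1f} % of the total")
# 35826 sequences: 48.4 % of the total
```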



   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    3048             1001-1100      667
                 51- 100    6272             1101-1200      511
                101- 150    9004             1201-1300      348
                151- 200    7032             1301-1400      233
                201- 250    6626             1401-1500      193
                251- 300    6172             1501-1600      119
                301- 350    5852             1601-1700      112
                351- 400    5882             1701-1800       86
                401- 450    4500             1801-1900       91
                451- 500    4176             1901-2000       58
                501- 550    3138             2001-2100       33
                551- 600    2191             2101-2200       68
                601- 650    1688             2201-2300       67
                651- 700    1221             2301-2400       35
                701- 750    1095             2401-2500       41
                751- 800     891             >2500          207
                801- 850     685
                851- 900     736
                901- 950     509
                951-1000     432
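
   Table A.3 is a length histogram: 50-residue classes up to 1000 residues,
   100-residue classes up to 2500, and one open class above that. The Python
   sketch below rebuilds the classes, checks that they account for all
   74'019 entries, and finds the class containing the median length. The
   helper names are hypothetical.

```python
from typing import Optional

# Counts from table A.3, in increasing order of length.
COUNTS_50 = [3048, 6272, 9004, 7032, 6626, 6172, 5852, 5882, 4500, 4176,
             3138, 2191, 1688, 1221, 1095, 891, 685, 736, 509, 432]
COUNTS_100 = [667, 511, 348, 233, 193, 119, 112, 86, 91, 58,
              33, 68, 67, 35, 41]
OVER_2500 = 207


def size_classes() -> list[tuple[int, Optional[int], int]]:
    """Return (first, last, count) for each class; last is None for >2500."""
    classes: list[tuple[int, Optional[int], int]] = [
        (1 + 50 * i, 50 * (i + 1), n) for i, n in enumerate(COUNTS_50)
    ]
    classes += [(1001 + 100 * i, 1100 + 100 * i, n)
                for i, n in enumerate(COUNTS_100)]
    classes.append((2501, None, OVER_2500))
    return classes


def median_class(classes):
    """Return (first, last) of the class holding the median entry."""
    total = sum(n for _, _, n in classes)
    target = (total + 1) // 2  # rank of the (lower) median entry
    running = 0
    for first, last, n in classes:
        running += n
        if running >= target:
            return first, last
    raise ValueError("empty histogram")


classes = size_classes()
print(sum(n for _, _, n in classes))  # 74019
print(median_class(classes))          # (251, 300)
```

   With the release 36.0 counts, the median length falls in the 251-300
   class.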



   A.4  Longest sequences

   The longest sequences (>=4000 residues) are listed here:

                               HTS1_COCCA  5217
                               MUC2_HUMAN  5179
                               FAT_DROME   5147
                               RYNR_RABIT  5037
                               RYNR_PIG    5035
                               RYNR_HUMAN  5032
                               RYNC_RABIT  4969
                               LRP_CAEEL   4753
                               DYHC_DICDI  4725
                               PLEC_RAT    4687
                               LRP2_RAT    4660
                               DYHC_RAT    4644
                               DYHC_DROME  4639
                               DYHC_CAEEL  4568
                               DYHB_CHLRE  4568
                               APB_HUMAN   4563
                               APOA_HUMAN  4548
                               LRP1_HUMAN  4544
                               LRP1_CHICK  4543
                               DYHC_PARTE  4540
                               RRPA_CVMJH  4488
                               DYHG_CHLRE  4485
                               DYHC_ANTCR  4466
                               DYHC_TRIGR  4466
                               GRSB_BACBR  4451
                               PKSK_BACSU  4447
                               PKSL_BACSU  4427
                               PGBM_HUMAN  4393
                               YP73_CAEEL  4385
                               DYHC_NEUCR  4367
                               DYHC_NECHA  4349
                               DYHC_EMENI  4344
                               PKD1_HUMAN  4303
                               DYHC_SCHPO  4196
                               DYHC_YEAST  4092
                               RRPA_CVH22  4085


   A.5  Statistics for journal citations


   Total number of journals cited in this release of SWISS-PROT: 913


   A.5.1 Table of the frequency of journal citations

        Journals cited 1x: 339
                       2x: 124
                       3x:  70
                       4x:  39
                       5x:  37
                       6x:  23
                       7x:  17
                       8x:  15
                       9x:  14
                      10x:  10
                  11- 20x:  63
                  21- 50x:  65
                  51-100x:  24
                    >100x:  73


   A.5.2  List of the most cited journals in SWISS-PROT

   Nb    Citations       Journal abbreviation
   --    ---------       ----------------------------------
    1    6303            J. BIOL. CHEM.
    2    3814            PROC. NATL. ACAD. SCI. U.S.A.
    3    3384            NUCLEIC ACIDS RES.
    4    2714            J. BACTERIOL.
    5    2498            GENE
    6    2058            FEBS LETT.
    7    1932            EUR. J. BIOCHEM.
    8    1780            BIOCHEM. BIOPHYS. RES. COMMUN.
    9    1732            BIOCHEMISTRY
   10    1713            EMBO J.
   11    1617            NATURE
   12    1438            BIOCHIM. BIOPHYS. ACTA
   13    1339            J. MOL. BIOL.
   14    1228            CELL
   15    1184            MOL. CELL. BIOL.
   16     953            MOL. GEN. GENET.
   17     929            PLANT MOL. BIOL.
   18     888            BIOCHEM. J.
   19     873            GENOMICS
   20     808            SCIENCE
   21     768            MOL. MICROBIOL.
   22     764            VIROLOGY
   23     682            J. BIOCHEM.
   24     515            J. VIROL.
   25     464            YEAST
   26     461            J. CELL BIOL.
   27     445            J. GEN. VIROL.
   28     417            PLANT PHYSIOL.
   29     407            GENES DEV.
   30     376            HUM. MOL. GENET.
   31     346            J. IMMUNOL.
   32     342            HUM. MUTAT.
   33     323            ARCH. BIOCHEM. BIOPHYS.
   34     319            CURR. GENET.
   35     312            ONCOGENE
   36     312            INFECT. IMMUN.
   37     305            MOL. BIOCHEM. PARASITOL.
   38     270            FEMS MICROBIOL. LETT.
   39     264            BIOL. CHEM. HOPPE-SEYLER
   40     261            STRUCTURE
   41     254            AM. J. HUM. GENET.
   42     247            NAT. GENET.
   43     239            DEVELOPMENT
   44     237            MOL. ENDOCRINOL.
   45     234            J. CLIN. INVEST.
   46     218            J. MOL. EVOL.
   47     218            J. GEN. MICROBIOL.
   48     213            HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
   49     204            MICROBIOLOGY
   50     202            GENETICS
   51     191            HUM. GENET.
   52     188            NAT. STRUCT. BIOL.
   53     186            DNA CELL BIOL.
   54     182            J. EXP. MED.
   55     181            BLOOD
   56     175            DEV. BIOL.
   57     174            APPL. ENVIRON. MICROBIOL.
   58     172            NEURON
   59     157            PROTEIN SCI.
   60     153            DNA
   61     145            IMMUNOGENETICS
   62     137            ENDOCRINOLOGY
   63     136            DNA SEQ.
   64     125            PLANT CELL
   65     115            HEMOGLOBIN
   66     113            CANCER RES.
   67     113            BIOCHIMIE
   68     109            J. NEUROCHEM.
   69     109            BIOORG. KHIM.
   70     108            MOL. BIOL. EVOL.
   71     107            AGRIC. BIOL. CHEM.
   72     106            BRAIN RES. MOL. BRAIN RES.
   73     105            PLANT J.
         
         
   ========================================================================


   APPENDIX B: RELATIONSHIPS BETWEEN SWISS-PROT AND SOME BIOMOLECULAR
               DATABASES

   The current  status of  the relationships (cross-references) between
   SWISS-PROT and some biomolecular databases is shown in the following
   schematic:


                         ***********************
                         *  EMBL Nucleotide    *
                         *  Sequence Database  *
                         *       [EBI]         *
                         ***********************
                           ^ ^ ^  ^  ^ ^ ^ ^ ^
******************         | | |  I  | | | | |         **********************
* FlyBase        * <-------+ | |  I  | | | | +-------> * MGD [Mouse]        *
******************         | | |  I  | | | | |         **********************
                           | | |  I  | | | | |
******************         | | |  I  | | | | |         **********************
* SubtiList      * <---------+ |  I  | | | +---------> * GCRDb [7TM recep.] *
* [B.subtilis]   *         | | |  I  | | | | |         **********************
******************         | | |  I  | | | | |
                           | | |  I  | | | | |         **********************
******************         | | |  I  | | +-----------> * EcoGene [E.coli]   *
* Mendel [Plant] * <-----+ | | |  I  | | | | |         **********************
******************       | | | |  I  | | | | |
                         | | | |  I  | | | | |         **********************
******************       | | | |  I  +---------------> * SGD [Yeast]        *
* MaizeDb        * <-----------+  I  | | | | |         **********************
* [Zea mays]     *       | | | |  I  | | | | |
******************       | | | |  I  | | | | |         **********************
                         | | | |  I  | +-------------> * DictyDB [D.disco.] *
******************       | | | |  I  | | | | |         **********************
* WormPep        *       | | | |  I  | | | | |
* [C.elegans]    * <---+ | | | |  I  | | | | |         **********************
******************     | | | | |  I  | | | | | +-----> * ENZYME [Nomencl.]  *
                       | | | | |  I  | | | | | |       **********************
******************     | v v v v  v  v v v v v v           v
* REBASE         *     *************************       **********************
* [Restriction   * <-- *   SWISS-PROT          * ----> * OMIM [Human]       *
*  enzymes]      *     *   Protein Sequence    *       **********************
******************     *   Data Bank           *
                       *************************       **********************
******************      ^ ^ ^ ^ ^ ^ ^ | ^ ^ ^          * ECO2DBASE     [2D] *
* StyGene        *      | | | | | | | | | | +--------> **********************
* [S.Typhimurium]* <----+ | | | | | | | | |
******************        | | | | | | | | |            **********************
                          | | | | | | | | +----------> * Maize-2DPAGE  [2D] *
******************        | | | | | | | |              **********************
* Transfac       * <------+ | | | | | | |
******************          | | | | | | |              **********************
                            | | | | | | +------------> * SWISS-2DPAGE  [2D] *
******************          | | | | | |                **********************
* Harefield [2D] * <--------+ | | | | |
******************            | | | | |                **********************
                              | | | | +--------------> * Aarhus/Ghent  [2D] *
******************            | | | |                  **********************
* PROSITE        *            | | | |
* [Patterns and  * <----------+ | | +----------------> **********************
* profiles]      *              | |                    * YEPD [Yeast]  [2D] *
******************              | +----------------+   **********************
             |                  v                  |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP [3D similar.] *
                        ***********************        **********************

   =End=of=SWISS-PROT=release=36=notes=====================================