Swiss-Prot release 18.0

Published May 1, 1991



                    SWISS-PROT RELEASE 18.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 18.0  of SWISS-PROT  contains 20772 sequence entries, comprising
   6'792'034 amino  acids abstracted from 20580 references. This represents
   an increase of 4% over release 17. The recent growth of the data bank is
   summarized below:

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
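
   The growth figure quoted above can be checked against the last two rows
   of the table. The following Python sketch is only an illustration; the
   function name percent_increase is our own choice and does not belong to
   any SWISS-PROT tool.

```python
def percent_increase(previous: int, current: int) -> float:
    """Return the relative growth from previous to current, in percent."""
    return (current - previous) / previous * 100.0

# Figures copied from the table above (releases 17.0 and 18.0).
entries_growth = percent_increase(20024, 20772)
residues_growth = percent_increase(6524504, 6792034)

print(f"entries:     {entries_growth:.1f}%")
print(f"amino acids: {residues_growth:.1f}%")
```

   Both values round to the 4% stated in the text.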


   1.2  Source of data

   Release 18.0  has been  updated using protein sequence data from release
   27.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 26.0 of the
   EMBL Nucleotide Sequence Data Library.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Databank
   Reference) pointer lines:

   Entries with pointer(s) to only PIR entry(ies):           3863
   Entries with pointer(s) to only EMBL entry(ies):          3435
   Entries with pointer(s) to both EMBL and PIR entry(ies): 13001
   Entries with no pointer lines:                             473



      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 17


   2.1  Sequences and annotations

   About 780 sequences have been added since release 17, the sequence data
   of 139 existing entries has been updated, and the annotations of 3010
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  Bacterial luciferase subunits
   -  Beta-amylases
   -  Carboxylesterases type-B
   -  Class-I aminoacyl-tRNA synthetases
   -  Clusterins
   -  Fumarate reductases / succinate dehydrogenases
   -  G-protein coupled receptors
   -  GTPase-activating proteins
   -  IMP dehydrogenase / GMP reductase
   -  Insulin-like growth factor binding proteins
   -  Leucine/isoleucine-binding proteins
   -  Lipocalins
   -  Manganese-dependent dipeptidases
   -  Nickel-dependent hydrogenases
   -  P-II proteins (glnB)
   -  Pectinesterases
   -  Phenylalanine and histidine ammonia-lyases
   -  Phosphoenolpyruvate carboxykinase (ATP and GTP)
   -  Polygalacturonases
   -  Prokaryotic molybdopterin oxidoreductases
   -  Receptor tyrosine kinases class IV (FGF receptors)
   -  Ribosomal proteins
   -  Signal recognition particle 54 Kd protein
   -  Somatostatins
   -  Thymosin beta-4 family
   -  Tyrosinases
   -  Wnt-1 family


   2.2  Change in the OS line

   As previously announced, we have inverted the order of the information
   in the OS line. We switched from 'English common name (Latin name)' to
   'Latin name (English common name)'. Example:

      OS   HUMAN (HOMO SAPIENS).

   has been changed to:

      OS   HOMO SAPIENS (HUMAN).
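
   For users who maintain local files in the old layout, the inversion can
   be sketched in Python as follows. This is a minimal illustration, not
   the procedure we used: the function name invert_os_line is hypothetical,
   and the sketch assumes a single-line OS entry with exactly one pair of
   parentheses. It cannot tell whether a line is already in the new layout,
   so it should only be applied to old-style lines.

```python
import re

# Old layout: "OS   COMMON NAME (LATIN NAME)."
_OLD_OS = re.compile(r"^OS   (?P<common>[^()]+?) \((?P<latin>[^()]+)\)\.$")

def invert_os_line(line: str) -> str:
    """Swap the two names of an OS line; return non-matching lines unchanged."""
    match = _OLD_OS.match(line)
    if match is None:
        return line
    return f"OS   {match['latin']} ({match['common']})."

print(invert_os_line("OS   HUMAN (HOMO SAPIENS)."))
```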


   2.3  Cross-references to the Escherichia coli gene-protein database

   We have added cross-references to the Escherichia coli gene-protein
   database (2D gel spots) (for a description see: VanBogelen R.A., Hutton
   M.E., and Neidhardt F.C., in Electrophoresis (1990), 11:1131-1166).

   These cross-references  are present  in the  DR  lines.  The  data  bank
   identifier is  EC-2D-GEL, the  primary identifier  is the  2D  gel  spot
   alphanumeric designation,  and the  secondary identifier  is the  latest
   edition of  the data  bank that  we  have  used  to  derive  the  cross-
   reference. Example  of a  DR line  for the Escherichia coli gene-protein
   database:

      DR   EC-2D-GEL; G052.0; 3RD EDITION.
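
   A program reading these lines could split them into the three fields
   described above. The Python sketch below is an assumption-laden example:
   the names CrossReference and parse_dr_line are ours, and the sketch
   assumes that fields are separated by semicolons and that the line ends
   with a period, as in the example shown.

```python
from typing import NamedTuple

class CrossReference(NamedTuple):
    databank: str      # data bank identifier, e.g. EC-2D-GEL
    primary_id: str    # for EC-2D-GEL: the 2D gel spot designation
    secondary_id: str  # for EC-2D-GEL: the edition used

def parse_dr_line(line: str) -> CrossReference:
    """Split a DR line into its three semicolon-separated fields."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop only the terminating period
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, found {len(fields)}: {line!r}")
    return CrossReference(*fields)

ref = parse_dr_line("DR   EC-2D-GEL; G052.0; 3RD EDITION.")
print(ref.databank, ref.primary_id, ref.secondary_id, sep=" | ")
```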


   2.4  Status of cross-references to PIR

   We have  continued adding cross-references to entries in the unannotated
   sections of  PIR (known  as PIR2  and PIR3);  currently we  have  cross-
   references to  13118 sequence  entries in PIR2/3 out of a total of 19051
   entries in those sections in release 27 of PIR.


   2.5  Documentation changes

   -  The EC2DTOSP.TXT  document is  an index  of  Escherichia  coli  Gene-
      protein database entries referenced in SWISS-PROT (see section 2.3).

   -  The SPEINDEX.TXT document is a species index.

   -  The JOURLIST.TXT document now indicates, where one exists, the
      six-character CODEN designation of the journals cited in SWISS-PROT
      and in PROSITE. Example of an entry in the JOURLIST.TXT file:

             Abbrev: EMBO J.
             Title : EMBO Journal
             ISSN  : 0261-4189
             Coden : EMJ0DG

   -  The SPECODES.TXT document is no longer distributed. The information
      contained in this document duplicated that found in the species
      index.


   2.6  Absence of the line-types: CA and CF

   We announced  in the  last two release notes that, starting with release
   18, the enzyme entries in SWISS-PROT would have two new line-types:

      CA   Description_of_catalytic_activity.
      CF   Description_of_cofactor.


   We have ultimately decided not to implement this change, as it would
   create line types specific to a subset of entries (enzymes) and would
   open the door to the creation of too many line types. We believe that
   the use of topics in the comment line is a better approach for the
   storage of such information.



                            3. FORTHCOMING CHANGES

   3.1  New line-types: RP and RC

   We plan  to implement the following change in release 19; the current RN
   line will  be replaced  by three  line types:  a modified  RN (Reference
   Number) line  type containing  just  the  reference  number,  a  new  RP
   (Reference Position) line type containing the extent of the work carried
   out by  the authors  of the  reference, and a new RC (Reference Comment)
   line type containing comments relevant to the reference (strain, tissue,
   etc.). Three examples of the usage of these new lines are given below.

      RN   [1]
      RP   SEQUENCE FROM N.A., AND SEQUENCE OF 1-23.
      RC   STRAIN=K12;

      RN   [1]
      RP   SEQUENCE OF 24-56 AND 67-89.
      RC   STRAIN=BALB/C; TISSUE=BRAIN;

      RN   [2]
      RP   X-RAY CRYSTALLOGRAPHY 1.8 ANGSTROMS.

   -  Each reference block will continue to have exactly one RN line.
   -  There will  always be  a single  RP line  which will  be in free text
      format.
   -  As many  RC lines  as are needed to display the comments will appear;
      if a reference has no comment then the RC line will not appear.
   -  A precise syntax will be used to display the information that appears
      on the RC line.

   The syntax of the RC line is:

      RC   TOKEN=Text; TOKEN=Text;....

   Where the following tokens are already defined: MEDLINE, PLASMID,
   SPECIES, STRAIN, and TISSUE. Additional tokens will probably be added to
   this list.
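
   The TOKEN=Text syntax lends itself to simple parsing. The Python sketch
   below shows one possible reading of a single RC line; the function name
   parse_rc_line is hypothetical, and the sketch assumes that the text
   part never contains a semicolon and that each pair fits on one line.
   Comments spread over several RC lines would have to be merged by the
   caller.

```python
from typing import Dict

def parse_rc_line(line: str) -> Dict[str, str]:
    """Return the TOKEN=Text pairs of one RC line as a dictionary."""
    if not line.startswith("RC   "):
        raise ValueError(f"not an RC line: {line!r}")
    pairs: Dict[str, str] = {}
    for item in line[5:].split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final ';'
        token, separator, text = item.partition("=")
        if not separator:
            raise ValueError(f"missing '=' in {item!r}")
        pairs[token.strip()] = text.strip()
    return pairs

print(parse_rc_line("RC   STRAIN=BALB/C; TISSUE=BRAIN;"))
```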


   3.2  MEDLINE unique identifiers

   Starting with  release 19  each journal  reference listed  in SWISS-PROT
   which exists  in the  MEDLINE bibliographic  data bank  will include the
   "Unique Identifier"  (UI) of that reference in MEDLINE. This information
   will be stored in the new RC line using the "MEDLINE" token. Example:

        RC   MEDLINE=90205618;

   It is  planned that,  in a few months, MEDLINE will add cross-references
   to SWISS-PROT.


                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 5.0 of the ENZYME data bank is distributed along with release 18
   of SWISS-PROT. ENZYME release 5.0 contains information relating to 3072
   enzymes. The data bank is complete and up to date. Until new enzyme
   nomenclature data is published, we only plan to update the SWISS-PROT
   pointers at each release of the protein sequence data bank, correct
   any errors, and complete the information concerning synonyms and
   cofactors using the literature.

   4.2  The PROSITE data bank

   Release 7.0  of the  PROSITE data bank is distributed along with release
   18 of SWISS-PROT. Release 7.0 contains 441 documentation chapters that
   describe 508 different patterns. Since the last major release of
   PROSITE (release  6.0 of November 1990), 69 new chapters have been added
   and 163 chapters have been updated.


                            5. WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate being
   notified if you find that sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a protein
   has been clarified or if new post-translational information has become
   available.




                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.65   Gln (Q) 4.09   Leu (L) 9.11   Ser (S) 7.08
   Arg (R) 5.23   Glu (E) 6.28   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.44   Gly (G) 7.12   Met (M) 2.32   Trp (W) 1.30
   Asp (D) 5.24   His (H) 2.27   Phe (F) 3.95   Tyr (Y) 3.21
   Cys (C) 1.83   Ile (I) 5.44   Pro (P) 5.09   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
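
   The ordering above follows directly from table A.1.1. As an illustrative
   check, the Python sketch below sorts the published percentages; the
   variable names are our own. Because the table is rounded to two
   decimals, Thr and Lys tie at 5.85 and their relative order cannot be
   recovered from it.

```python
# Percentages copied from table A.1.1 (ambiguity codes B, Z and X omitted).
composition = {
    "Ala": 7.65, "Gln": 4.09, "Leu": 9.11, "Ser": 7.08,
    "Arg": 5.23, "Glu": 6.28, "Lys": 5.85, "Thr": 5.85,
    "Asn": 4.44, "Gly": 7.12, "Met": 2.32, "Trp": 1.30,
    "Asp": 5.24, "His": 2.27, "Phe": 3.95, "Tyr": 3.21,
    "Cys": 1.83, "Ile": 5.44, "Pro": 5.09, "Val": 6.49,
}

# Most frequent residue first; tied residues keep their dictionary order.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```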



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 2864

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1266
                            2x:  520
                            3x:  284
                            4x:  172
                            5x:  121
                            6x:   94
                            7x:   73
                            8x:   37
                            9x:   58
                           10x:   29
                       11- 20x:  101
                       21-100x:   86
                         >100x:   23
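
   The classes in this table add up to the total number of species given
   above. The short Python sketch below, whose names are purely
   illustrative, performs that check and derives the share of species
   represented by a single sequence.

```python
# Number of species per frequency class, copied from table A.2.1.
species_per_class = {
    "1x": 1266, "2x": 520, "3x": 284, "4x": 172, "5x": 121,
    "6x": 94, "7x": 73, "8x": 37, "9x": 58, "10x": 29,
    "11-20x": 101, "21-100x": 86, ">100x": 23,
}

total_species = sum(species_per_class.values())
single_share = species_per_class["1x"] / total_species * 100.0

print(f"total species: {total_species}")
print(f"species represented once: {single_share:.1f}%")
```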



        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1715          Human
         2        1454          Escherichia coli
         3        1001          Mouse
         4         933          Rat
         5         675          Baker's yeast (Saccharomyces cerevisiae)
         6         486          Bovine
         7         388          Fruit fly (Drosophila melanogaster)
         8         354          Chicken
         9         282          Bacillus subtilis
        10         255          Rabbit
        11         251          Vaccinia virus (strain Copenhagen)
        12         246          African clawed frog (Xenopus laevis)
        13         230          Pig
        14         193          Human cytomegalovirus (strain AD169)
        15         191          Salmonella typhimurium
        16         160          Bacteriophage T4
        17         152          Maize
        18         128          Rice
        19         113          Vaccinia virus (strain WR)
        20         112          Tobacco
                                Pea
        22         106          Wheat
        23         102          Staphylococcus aureus
        24          96          Sheep
        25          93          Barley
                                Slime mold (Dictyostelium discoideum)
        27          84          Agrobacterium tumefaciens
                                Liverwort (Marchantia polymorpha)
                                Spinach
        30          83          Soybean
        31          80          Fission yeast (Schizosaccharomyces pombe)
        32          79          Pseudomonas aeruginosa
                                Klebsiella pneumoniae
        34          78          Dog
        35          77          Neurospora crassa



   A.3  Distribution of the sequences by size


               From   To  Number             From   To   Number
                  1-  50    1390             1001-1100      191
                 51- 100    2369             1101-1200      122
                101- 150    3358             1201-1300       94
                151- 200    2034             1301-1400       57
                201- 250    1666             1401-1500       46
                251- 300    1490             1501-1600       24
                301- 350    1311             1601-1700       23
                351- 400    1281             1701-1800       23
                401- 450     979             1801-1900       27
                451- 500    1057             1901-2000       22
                501- 550     811             2001-2100        9
                551- 600     555             2101-2200       23
                601- 650     390             2201-2300       28
                651- 700     288             2301-2400       11
                701- 750     270             2401-2500       13
                751- 800     214             >2500           50
                801- 850     162
                851- 900     172
                901- 950     112
                951-1000     100



   Currently the ten largest sequences are:


                            RYNR$RABIT  5037 a.a.
                            RYNR$HUMAN  5032 a.a.
                            APB$HUMAN   4563 a.a.
                            APOA$HUMAN  4548 a.a.
                            POLG$BVDV   3988 a.a.
                            POLG$HCVA   3898 a.a.
                            POLG$HCVB   3898 a.a.
                            TRX$DROME   3759 a.a.
                            ACVA$PENCH  3746 a.a.
                            DMD$HUMAN   3685 a.a.
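
   The size classes used in the table of section A.3 are 50 residues wide
   up to 1000 residues, 100 residues wide from 1001 to 2500, and open-ended
   above 2500. The Python sketch below assigns a sequence length to its
   class; the function name size_class is hypothetical and the code is only
   meant to make the class boundaries explicit.

```python
def size_class(length: int) -> str:
    """Return the label of the A.3 size class containing this length."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1  # first length of the class
    return f"{low}-{low + width - 1}"

# Lengths taken from the list of the ten largest sequences, plus boundaries.
for length in (5037, 3685, 1000, 1001, 100, 101):
    print(length, size_class(length))
```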



                         APPENDIX B: ON-LINE EXPERTS



   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise           Name                 Email address
---------------------------  -------------------  --------------------------
Alcohol dehydrogenases       Bengt P.             bengt@medfys.ki.se
Aldehyde dehydrogenases      Bengt P.             bengt@medfys.ki.se
Alpha-2-macroglobulins       Van Leuven F.        fred@blekul13.bitnet
Apolipoproteins              Boguski M.S.         boguski@ncbi.nlm.nih.gov
Arrestins                    Kolakowski L.F. Jr.  lfk@athena.mit.edu
Bacteriophage P4             Halling C.           chh9@midway.uchicago.edu
Beta-lactamases              Brannigan J.         jab5@vaxa.york.ac.uk
Chitinases                   Henrissat B.         cermav@frgren81.bitnet
CTF/NF-I                     Mermod N.            nmermod@clsuni51.bitnet
Cytochromes P450             Holsztynska E.J.     ela@netcom.uucp
                                                  netcom!ela@apple.com
EF-hand calcium-binding      Cox J.A.             cox@cgeuge52.bitnet
                             Kretsinger R.H.      rhk5i@virginia.bitnet
Eryf1-type zinc-fingers      Boguski M.S.         boguski@ncbi.nlm.nih.gov
Glucanases                   Henrissat B.         cermav@frgren81.bitnet
                             Beguin P.            phycel@pasteur.bitnet
G-protein coupled receptors  Chollet A.           chollet@clients.switch.ch
                             Attwood T.K.         bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins   Boguski M.S.         boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17         Landsman D.          landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases   Kolakowski L.F. Jr.  lfk@athena.mit.edu
Integrases                   Roy P.H.             2020000@lavalvx1.bitnet
Phytochromes                 Partis M.D.          partis@gcri.afrc.ac.uk
Protein kinases              Hanks S.             hanks@vuctrvax.bitnet
Restriction-modification     Bickle T.            bickle@urz.unibas.ch
                             Roberts R.J.         roberts@cshl.org
Ring-cleavage dioxygenases   Harayama S.          harayama@cgecmu51.bitnet
Subtilisin family proteases  Brannigan J.         jab5@vaxa.york.ac.uk
Thiol proteases              Turks B.             turk@ijs.ac.mail.yu
Thiol proteases inhibitors   Turks B.             turk@ijs.ac.mail.yu
TPR repeats                  Boguski M.S.         boguski@ncbi.nlm.nih.gov
Transit peptides             von Heijne G.        gunnar@cbts.sunet.se
Type-II membrane antigens    Levy S.              levy@cellbio.stanford.edu
Xylose isomerase             Jenkins J.           jenkins@frira.afrc.ac.uk


   B.2  Requirements for becoming an on-line expert

   An expert should be a scientist working with specific family(ies) of
   proteins (or specific domains) who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      which seem to belong to "their" family(ies) of proteins.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at one of the following electronic mail addresses:

                           bairoch@cgecmu51.bitnet
                           bairoch@cmu.unige.ch