Swiss-Prot release 10.0

Published March 5, 1989


             SWISS-PROT RELEASE 10.0 RELEASE NOTES


   Date:     March 5, 1989
   Author:   A. Bairoch


                         1. INTRODUCTION

   1.1  Evolution

   Release 10.0 of SWISS-PROT contains 10008 sequence entries,
   comprising  2'952'613  amino  acids  abstracted  from  9920
   references. This represents an increase of 15% over release
   9.0. The  recent growth  of the  data  bank  is  summarised
   below:

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
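
   The 15% increase quoted in this section can be checked from
   the last two rows of the table. The following is a minimal
   Python sketch; the function name percent_increase is
   illustrative and is not part of any SWISS-PROT software.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the growth from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


# Figures copied from the table above (releases 9.0 and 10.0).
entries_9, entries_10 = 8702, 10008
residues_9, residues_10 = 2_498_140, 2_952_613

entry_growth = percent_increase(entries_9, entries_10)
residue_growth = percent_increase(residues_9, residues_10)

print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```

   The entry count grows by about 15.0%, which matches the
   figure quoted in the text; the amino acid count grows
   somewhat faster, by about 18.2%.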


   1.2  Source of data

   Release 10.0  has been  updated using protein sequence data
   from  release  19.0  of  the  PIR  (Protein  Identification
   Resource) protein  data bank,  as well  as  translation  of
   nucleotide sequence  data from  release 18.0  of  the  EMBL
   nucleotide sequence Data Library.

   As an  indication of the source of the sequence data in the
   SWISS-PROT data bank we list here the statistics concerning
   the DR (Databank Reference) pointer lines:

   Entries with pointer(s) to PIR entries only:          3009
   Entries with pointer(s) to EMBL entries only:         3383
   Entries with pointer(s) to both EMBL and PIR entries: 2973
   Entries with no pointer lines (entered in house):      643


   2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE
      RELEASE 9

   2.1  Sequences and annotations

   Some 1306 new sequences have been added since the last
   release, the sequence data of 159 existing entries has been
   updated and the annotations of 1699 entries have been
   revised. In particular we have used review articles to
   update the annotations of the following groups or families
   of proteins:

      Aerolysin type toxins
      Asparaginases / Glutaminases
      Aspartyl proteases
      ATP-binding proteins 'active transport' family
      Bacterial and fungal ribonucleases
      Bowman-Birk serine protease inhibitors
      Cadherins
      Calcitonins
      Clathrin light chains
      Crystallins beta and gamma
      Cytosine-specific DNA methylases
      Colony stimulating factors
      Glucagon / GIP / secretin / VIP family
      E1-E2 ATPases
      Crystalline entomocidal toxin proteins
      Fos/jun proteins family
      Galactose-1-phosphate uridyl transferase
      Glutathione S-transferase
      Herpes and Varicella Zoster virus proteins
      Int-1 family
      Interferons alpha and beta
      Interferon-induced proteins
      Kazal serine protease inhibitors
      Lipoxygenases
      LysR bacterial activator proteins family
      Manganese and Iron superoxide dismutases
      Myb family proteins
      Myc family proteins
      Nicotinic acetylcholine receptors
      Pancreatic hormone / Neuropeptide Y family
      Pancreatic ribonucleases
      Peptidyl-prolyl cis-trans isomerase (PPIase) (cyclophilin)
      Platelet factor 4 family
      Shiga/Ricin ribosomal inactivating toxins
      Somatotropin, prolactin and related hormones
      Squash family of serine protease inhibitors
      Thymidylate synthase
      TNF alpha and beta
      Topoisomerases type II
      Tropomyosins


   2.2  New format for the date (DT) line type

   The format of the DT line has been changed and is now:

    DT   DD-MMM-YYYY  (REL. XX, COMMENT)

   where DD is the day, MMM the month, YYYY the year, and XX
   the SWISS-PROT release number. The comment portion of the
   line indicates the action taken on that date. There are
   always three DT lines in each entry, each associated with
   a specific comment:

   -  The  first  DT  line  indicates  when  the  entry  first
      appeared in  the data  bank. The  associated comment  is
      "CREATED".
   -  The second  DT line indicates when the sequence data was
      last modified.  The associated comment is "LAST SEQUENCE
      UPDATE".
   -  The third DT line indicates when any data other than the
      sequence was  last modified.  The associated  comment is
      "LAST ANNOTATION UPDATE".

   Example of a block of DT lines:

   DT   01-JAN-1988  (REL. 06, CREATED)
   DT   01-AUG-1988  (REL. 08, LAST SEQUENCE UPDATE)
   DT   01-MAR-1989  (REL. 10, LAST ANNOTATION UPDATE)
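
   The DT format described above can be read mechanically. The
   following Python sketch parses the example block; the names
   DTLine, MONTHS and parse_dt are illustrative, and the list
   of month abbreviations is an assumption (English
   three-letter forms), since only JAN, AUG and MAR appear in
   the example.

```python
import re
from datetime import date
from typing import NamedTuple


class DTLine(NamedTuple):
    day: date       # DD-MMM-YYYY
    release: int    # XX, the SWISS-PROT release number
    comment: str    # action taken on that date


# Assumption: months use English three-letter abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}

DT_PATTERN = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4})\s+\(REL\.\s*(\d+),\s*(.+?)\)\s*$"
)


def parse_dt(line: str) -> DTLine:
    """Parse one 'DT   DD-MMM-YYYY  (REL. XX, COMMENT)' line."""
    match = DT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a DT line: {line!r}")
    dd, mmm, yyyy, release, comment = match.groups()
    return DTLine(date(int(yyyy), MONTHS[mmm], int(dd)), int(release), comment)


example = """\
DT   01-JAN-1988  (REL. 06, CREATED)
DT   01-AUG-1988  (REL. 08, LAST SEQUENCE UPDATE)
DT   01-MAR-1989  (REL. 10, LAST ANNOTATION UPDATE)
"""
parsed = [parse_dt(line) for line in example.splitlines()]
for item in parsed:
    print(item.day.isoformat(), item.release, item.comment)
```

   Because every entry carries exactly three DT lines in a
   fixed order, a reader could also check that the three
   comments appear as CREATED, LAST SEQUENCE UPDATE and LAST
   ANNOTATION UPDATE.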


   2.3   Extension of  the taxonomic  classification in the OC
   lines

   In  previous   releases  of  SWISS-PROT  the  OC  (Organism
   Classification) lines  only contained the first node of the
   taxonomic tree (PROKARYOTA, EUKARYOTA or VIRIDAE). Starting
   with release  10  we  are  implementing  a  full  taxonomic
   classification. In  release  10,  164  different  taxonomic
   nodes have  been  defined.  The  list  of  these  nodes  is
   available in the SPECLIST.TXT document file.

   2.4  New topic for the comments (CC) line type

   As of  release 10  we have  added a  new  'topic'  for  the
   comments  (CC)   line-type:  COFACTOR,  which  is  used  to
   describe enzyme cofactor(s). Example of its usage:

     CC   -!- COFACTOR: REQUIRES PYRIDOXAL PHOSPHATE.
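
   A topic such as COFACTOR can be separated from its text
   with a small amount of code. The Python sketch below
   assumes that a topic line has the shape shown in the
   example ("CC", then "-!-", then the topic name and a
   colon); it does not handle comments that continue over
   several CC lines. The function name parse_cc_topic is
   illustrative.

```python
import re
from typing import Tuple

# Assumption: a topic starts with "-!-" and ends at the first colon.
CC_PATTERN = re.compile(r"^CC\s+-!-\s+([A-Z][A-Z ]*):\s*(.*)$")


def parse_cc_topic(line: str) -> Tuple[str, str]:
    """Split a 'CC   -!- TOPIC: text' line into (topic, text)."""
    match = CC_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a CC topic line: {line!r}")
    topic, text = match.groups()
    return topic.strip(), text.strip()


topic, text = parse_cc_topic("  CC   -!- COFACTOR: REQUIRES PYRIDOXAL PHOSPHATE.")
print(topic, "->", text)
```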


   2.5  New feature key

   A new  feature key  has been  introduced in  this  release:
   TRANSIT, which  describes the  extent of  a transit peptide
   (mitochondrial or  chloroplastic). Examples  of TRANSIT key
   feature lines:

     FT   TRANSIT       1     25       MITOCHONDRION.
     FT   TRANSIT       1     42       CHLOROPLAST.
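
   The TRANSIT lines above can be read as a key, two
   positions and a description. The Python sketch below
   splits on whitespace, which is a simplification of the
   column layout, and it assumes that the two positions are
   1-based and inclusive; neither point is stated in these
   notes. The names Feature, parse_ft and length are
   illustrative.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str
    start: int
    end: int
    description: str

    def length(self) -> int:
        # Assumption: both positions are inclusive.
        return self.end - self.start + 1


def parse_ft(line: str) -> Feature:
    """Parse one 'FT   KEY   FROM   TO   DESCRIPTION' line."""
    fields = line.split(None, 4)
    if len(fields) < 4 or fields[0] != "FT":
        raise ValueError(f"not an FT line: {line!r}")
    description = fields[4].strip() if len(fields) > 4 else ""
    return Feature(fields[1], int(fields[2]), int(fields[3]), description)


features = [
    parse_ft("  FT   TRANSIT       1     25       MITOCHONDRION."),
    parse_ft("  FT   TRANSIT       1     42       CHLOROPLAST."),
]
for feature in features:
    print(feature.key, feature.description, feature.length())
```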


   3. THE NEXT RELEASE

   SWISS-PROT release 11.0 will be available in June 1989.


   4. WE NEED YOUR HELP!

   We welcome any feedback from our users. We would especially
   appreciate being notified if sequences belonging to your
   field of expertise are missing from the data bank. We would
   also like to be notified about annotations that need
   updating, for example if the function of a protein has been
   clarified or if new post-translational information has
   become available.


   APPENDIX A: DISKS FOR SWISS-PROT

   A.1  IBM PC/AT 1.2 Mb disks

   SWISS-PROT is stored on twelve 1.2 Mb disks. Each of these
   disks contains a single bulk file (PRT10_01.BLK to
   PRT10_12.BLK):

   Disk     First sequence        Last Sequence
    1       10KA$MYCTU            C1QC$HUMAN
    2       C1S$HUMAN             CRA$PLAFA
    3       CRA2$MESAU            ENV$HIV1M
    4       ENV$HIV1P             H1$DROME
    5       H1$ECHCR              HYEP$HUMAN
    6       HYEP$RABIT            KV3T$MOUSE
    7       KV3U$MOUSE            NGFR$RAT
    8       NIF1$CLOPA            POLG$FMDVI
    9       POLG$FMDVO            RNP$DAMDA
   10       RNP$DAMKO             TKN$PHYFU
   11       TKN1$HYLMA            VIB1$AGRT9
   12       VIB2$AGRT6            ZIPP$DROME
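
   The table above can be used to find the disk on which a
   given entry name would fall. The Python sketch below
   assumes that entries are ordered by plain ASCII comparison
   of their names (so that "$" sorts before digits and digits
   before letters); this ordering is inferred from the table
   and is not stated in these notes. The names LAST_ON_DISK
   and disk_for are illustrative.

```python
from bisect import bisect_left

# Last sequence on disks 1 to 12, copied from the table above.
LAST_ON_DISK = [
    "C1QC$HUMAN", "CRA$PLAFA", "ENV$HIV1M", "H1$DROME",
    "HYEP$HUMAN", "KV3T$MOUSE", "NGFR$RAT", "POLG$FMDVI",
    "RNP$DAMDA", "TKN$PHYFU", "VIB1$AGRT9", "ZIPP$DROME",
]


def disk_for(entry_name: str) -> int:
    """Return the disk number whose name range would contain `entry_name`.

    This only locates the range; it does not prove the entry exists.
    """
    index = bisect_left(LAST_ON_DISK, entry_name)
    if index == len(LAST_ON_DISK):
        raise ValueError(f"{entry_name!r} sorts after the last entry")
    return index + 1


print(disk_for("CRA2$MESAU"))
print(disk_for("TKN1$HYLMA"))
```

   The bulk file name then follows from the disk number, for
   example PRT10_03.BLK for disk 3.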

   A.2  IBM PS/2 1.4 Mb disks

   The number and content of the 1.4 Mb disks for the PS/2
   systems are identical to those of the 1.2 Mb disks (see
   above).

   APPENDIX B: SOME STATISTICS


   B.1  Amino acid composition

        COMPOSITION IN PERCENT FOR THE COMPLETE DATA BANK

   Ala (A) 7.77   Gln (Q) 4.11   Leu (L) 9.08   Ser (S) 7.00
   Arg (R) 5.23   Glu (E) 6.15   Lys (K) 5.81   Thr (T) 5.84
   Asn (N) 4.36   Gly (G) 7.30   Met (M) 2.26   Trp (W) 1.35
   Asp (D) 5.21   His (H) 2.29   Phe (F) 3.94   Tyr (Y) 3.22
   Cys (C) 1.89   Ile (I) 5.30   Pro (P) 5.21   Val (V) 6.51

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


      CLASSIFICATION OF THE AMINO ACIDS BY THEIR FREQUENCY

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Arg, Asp, Pro,
   Asn, Gln, Phe, Tyr, His, Met, Cys, Trp
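
   The frequency classification above follows directly from
   the composition table. The Python sketch below sorts the
   twenty standard amino acids by their percentage; the names
   COMPOSITION and ranking are illustrative. Asp and Pro are
   tied at 5.21, and the stable sort keeps them in the order
   in which they were entered.

```python
# Percentages copied from the composition table (B, Z and X omitted).
COMPOSITION = {
    "Ala": 7.77, "Arg": 5.23, "Asn": 4.36, "Asp": 5.21, "Cys": 1.89,
    "Gln": 4.11, "Glu": 6.15, "Gly": 7.30, "His": 2.29, "Ile": 5.30,
    "Leu": 9.08, "Lys": 5.81, "Met": 2.26, "Phe": 3.94, "Pro": 5.21,
    "Ser": 7.00, "Thr": 5.84, "Trp": 1.35, "Tyr": 3.22, "Val": 6.51,
}

# Most frequent first; ties keep their dictionary order.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
total = sum(COMPOSITION.values())

print(", ".join(ranking))
print(f"sum of the twenty percentages: {total:.2f}")
```

   The twenty percentages add up to slightly less than 100
   because of the ambiguous residues (B, Z, X) and rounding.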



   B.2  Distribution of the sequences by their organism of
        origin

   Total number  of species represented in this release of the
   data bank: 1590

   Species represented 1x: 741
                       2x: 291
                       3x: 163
                       4x:  99
                       5x:  58
                       6x:  45
                       7x:  27
                       8x:  28
                       9x:  30
                      10x:  13
                  11- 20x:  43
                  21-100x:  42
                    >100x:  10



              TABLE OF THE MOST REPRESENTED SPECIES

   918: HUMAN       173: DROME       71: BPT4        59: VACCV
   838: ECOLI       143: RABIT       70: HSV11       57: WHEAT
   497: MOUSE       126: PIG         69: TOBAC       54: BPT7
   414: RAT          93: XENLA       67: VZVD        53: SOYBN
   324: YEAST        84: BACSU       62: LAMBD       53: SHEEP
   282: BOVIN        80: SALTY       60: MARPO
   176: CHICK        74: MAIZE       59: EPBAR


   B.3  Distribution of the sequences by size

     From  To   Number            From   To   Number
        1-  50     582            1001-1100       70
       51- 100    1291            1101-1200       46
      101- 150    2043            1201-1300       35
      151- 200    1066            1301-1400       24
      201- 250     805            1401-1500       17
      251- 300     662            1501-1600        8
      301- 350     588            1601-1700       11
      351- 400     549            1701-1800        9
      401- 450     420            1801-1900        8
      451- 500     461            1901-2000        5
      501- 550     369                >2000       49
      551- 600     216
      601- 650     168            Currently the two largest sequences are:
      651- 700     114            APB$HUMAN   4563 a.a.
      701- 750      99            APOA$HUMAN  4548 a.a.
      751- 800      70
      801- 850      66
      851- 900      85
      901- 950      37
      951-1000      35