Skip Header

You are using a version of browser that may not display all the features of this website. Please consider upgrading your browser.

Swiss-Prot release 19.0

Published August 1, 1991




                    SWISS-PROT RELEASE 19.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 19.0  of SWISS-PROT  contains 21795 sequence entries, comprising
   7'173'785 amino  acids abstracted from 21773 references. This represents
   an increase of 6% over release 18. The recent growth of the data bank is
   summarized below.

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785


   1.2  Source of data

   Release 19.0  has been  updated using protein sequence data from release
   28.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 27.0 of the
   EMBL Nucleotide Sequence Database.

   As an  indication to  the source  of the sequence data in the SWISS-PROT
   data bank we list here the statistics concerning the DR (Database cross-
   references) pointer lines:

   Entries with pointer(s) to only PIR entri(es):           3912
   Entries with pointer(s) to only EMBL entri(es):          3542
   Entries with pointer(s) to both EMBL and PIR entri(es): 13835
   Entries with no pointers lines:                           506









<PAGE>




      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 18


   2.1  Sequences and annotations

   About 1040 sequences have been added since release 18, the sequence data
   of 194  existing entries  has been  updated and  the annotations of 2220
   entries have  been revised.  In particular we have used reviews articles
   to update  the annotations  of  the  following  groups  or  families  of
   proteins:

   -  Adenylosuccinate synthetase
   -  Aldose 1-epimerase
   -  Argininosuccinate synthase
   -  Amidases
   -  Bacterial regulatory proteins, lacI family
   -  Bacterial regulatory proteins, merR family
   -  Bacterioferritin
   -  Delta 1-pyrroline-5-carboxylate reductase
   -  DNA polymerase family X
   -  Eukaryotic molybdopterin-dependent oxidoreductases
   -  Eukaryotic porin
   -  Ferrochelatase
   -  Fibrillarin
   -  FMN-dependent alpha-hydroxy acid dehydrogenases
   -  Hemerythrins
   -  HlyD family secretion proteins
   -  Hok/gef family cell toxic proteins
   -  Matrixins
   -  Methylmalonyl-CoA mutase
   -  Phosphoribulokinase
   -  Plant viruses icosahedral capsid proteins
   -  Porphobilinogen deaminase
   -  Pyrokinins
   -  Ribonuclease T2 family
   -  Stathmin family
   -  Transglutaminases
   -  Ubiquitin-activating enzyme
   -  Zinc finger RFP/RPT-1 family


   2.2  New line types: RP and RC

   The following change has been implemented in release 19; the RN line has
   been replaced by three line types: a modified RN (Reference Number) line
   type containing just the reference number, a new RP (Reference Position)
   line type  containing the  extent of the work carried out by the authors
   of the  reference, and a new RC (Reference Comment) line type containing
   comments  relevant  to  the  reference  (strain,  tissue,  etc.).  Three
   examples of the usage of these new line types are given below.






<PAGE>




      RN   [1]
      RP   SEQUENCE FROM N.A., AND SEQUENCE OF 1-23.
      RC   STRAIN=K12;

      RN   [1]
      RP   SEQUENCE OF 24-56 AND 67-89.
      RC   STRAIN=BALB/C; TISSUE=BRAIN;

      RN   [1]
      RP   X-RAY CRYSTALLOGRAPHY, 1.8 ANGSTROMS.
      RC   MEDLINE=91002678;

   Within a  reference block  the RN  and RP  lines occur once, the RC line
   occurs zero or more times.

   The format of the RC line is:

      RC   TOKEN1=Text; TOKEN2=Text;....

   Where the  following tokens  are currently  defined:  MEDLINE,  PLASMID,
   SPECIES, STRAIN, TISSUE, and TRANSPOSON.

   The `SPECIES'  token is  only used  when an  entry describes  a sequence
   which is  identical in more than one species; similarly the `PLASMID' is
   only used  if an  entry describes  a sequence identical in more than one
   plasmid.


   2.3  MEDLINE unique identifiers

   Starting with  release 19  each journal  reference listed  in SWISS-PROT
   which  exists   in  the  National  Library  of  Medicine  (NLM)  MEDLINE
   bibliographic data  bank includes  the `Unique  Identifier' (UI) of that
   reference. This  information is stored in the new RC line type using the
   `MEDLINE' token. Example:

      RC   MEDLINE=90205618;

   It is  planned that,  in a few months, MEDLINE will add cross-references
   to SWISS-PROT.
















<PAGE>




   2.4  New cross-references

   We have  added cross-references  to the  Transcription Factors  Database
   (TFD) of  David Ghosh  (for a description see: Nucleic Acids Res. (1990)
   18:1749-1756); as well as to the Drosophila Genetic Maps database (DMAP)
   prepared  by   Michael  Ashburner  at  the  Department  of  Genetics  in
   Cambridge, England. These cross-references are present in the DR lines:

   Data bank identifier: DMAP
   Primary identifier  : Gene unique identifier number (UID).
   Secondary identifier: Latest release of DMAP that was used to derive
                         the cross-references.
   Example             : DR   DMAP; 00055; RELEASE 9107.

   Data bank identifier: TFD
   Primary identifier  : Unique identifier for the corresponding TFD
                         POLYPEPTIDES table entry (the TFD_ID field).
   Secondary identifier: Latest release of TFD that was used to derive
                         the cross-references.
   Example             : DR   TFD; P00040; RELEASE 3.0.


   2.5  Minor change in the DT line format

   There is  now a  single space character between the date and the comment
   part of  a DT  line instead  of the  two spaces  that used  to exist  in
   previous releases. Example:

      DT   01-MAY-1991  (REL. 18, CREATED)

   has been changed to:

      DT   01-MAY-1991 (REL. 18, CREATED)


   2.6  Minor change in the RL lines for submissions

   References for  sequence  information  submitted  to  the  international
   nucleic acid  databases (DDBJ,  EMBL, Genbank)  were represented  by the
   following subtype of RL lines:

      RL   SUBMITTED (JAN-1991) TO EMBL/GENBANK DATA BANKS.

   Starting with release 19, these RL lines use the following format:

      RL   SUBMITTED (JAN-1991) TO EMBL/GENBANK/DDBJ DATA BANKS.










<PAGE>




   2.7  Status of cross-references to PIR

   We have  continued adding cross-references to entries in the unannotated
   sections of  PIR (known  as PIR2  and PIR3);  currently we  have  cross-
   references to  14078 sequence  entries in PIR2/3 out of a total of 20265
   entries in those sections in release 28 of PIR.


                            3. FORTHCOMING CHANGES

   3.1  Change in the format of the entry names

   Starting with  release 21  we will  replace the dollar sign `$' in entry
   names by the underscore character `_'. This change is made on the behalf
   of users  of sequence analysis software running under the Unix operating
   system, where  the dollar  sign is a reserved symbol. Example: the entry
   name `CYC$HUMAN' will be changed to `CYC_HUMAN'.


                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 6.0 of the ENZYME data bank is distributed along with release 19
   of SWISS-PROT.  ENZYME release 6.0 contains information relative to 3072
   enzymes. The  data bank  is complete  and up  to date.  Until new enzyme
   nomenclature data  is published  we only  plan to  update the SWISS-PROT
   pointers at  each release  of the  protein sequence  data bank,  correct
   eventual errors,  and complete  the information  concerning synonyms and
   cofactors using the literature.

   4.2  The PROSITE data bank

   Release 7.10  of the PROSITE data bank is distributed along with release
   19 of  SWISS-PROT. Release 7.10 contains 441 documentation chapters that
   describes 508 different patterns. Release 7.10 does not really represent
   a new release; the only changes between release 7.0 and 7.1 are updating
   of the  pointers to the SWISS-PROT entries whose name have been modified
   between release  18 and  19. The  next release  of PROSITE (8.0) will be
   distributed with release 20 of SWISS-PROT.


                            5. WE NEED YOUR HELP !

   We welcome  feedback from our users. We would especially appreciate that
   you notify  us if  you find  that sequences  belonging to  your field of
   expertise are  missing from  the data  bank. We  also would  like to  be
   notified about annotations to be updated, as for example if the function
   of a protein has been clarified or if new post-translational information
   has become available.






<PAGE>




                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.64   Gln (Q) 4.09   Leu (L) 9.11   Ser (S) 7.10
   Arg (R) 5.23   Glu (E) 6.27   Lys (K) 5.85   Thr (T) 5.85
   Asn (N) 4.46   Gly (G) 7.11   Met (M) 2.32   Trp (W) 1.30
   Asp (D) 5.24   His (H) 2.27   Phe (F) 3.96   Tyr (Y) 3.21
   Cys (C) 1.82   Ile (I) 5.45   Pro (P) 5.09   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp



   A.2  Repartition of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 2986

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1302
                            2x:  543
                            3x:  303
                            4x:  184
                            5x:  128
                            6x:   93
                            7x:   78
                            8x:   40
                            9x:   60
                           10x:   32
                       11- 20x:  111
                       21-100x:   88
                         >100x:   24













<PAGE>




        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1790          Human
         2        1512          Escherichia coli
         3        1043          Mouse
         4         972          Rat
         5         735          Baker's yeast (Saccharomyces cerevisiae)
         6         499          Bovine
         7         414          Fruit fly (Drosophila melanogaster)
         8         362          Chicken
         9         286          Bacillus subtilis
        10         261          Rabbit
        11         252          African clawed frog (Xenopus laevis)
        12         251          Vaccinia virus (strain Copenhagen)
        13         238          Pig
        14         201          Salmonella typhimurium
        15         193          Human cytomegalovirus (strain AD169)
        16         166          Bacteriophage T4
        17         154          Maize
        18         132          Rice
        19         113          Vaccinia virus (strain WR)
        20         112          Tobacco
                                Pea
        22         110          Wheat
        23         105          Staphylococcus aureus
        24         101          Slime mold (Dictyostelium discoideum)
        25          98          Sheep
        26          97          Barley
        27          90          Fission yeast (Schizosaccharomyces pombe)
        28          89          Spinach
        29          87          Pseudomonas aeruginosa
        30          85          Soybean
        31          84          Liverwort (Marchantia polymorpha)
        32          81          Agrobacterium tumefaciens
        33          80          Dog
                                Klebsiella pneumoniae
        35          79          Neurospora crassa


















<PAGE>




   A.3  Repartition of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1458             1001-1100      203
                 51- 100    2439             1101-1200      124
                101- 150    3466             1201-1300       97
                151- 200    2112             1301-1400       63
                201- 250    1757             1401-1500       49
                251- 300    1558             1501-1600       27
                301- 350    1395             1601-1700       24
                351- 400    1365             1701-1800       26
                401- 450    1052             1801-1900       29
                451- 500    1121             1901-2000       22
                501- 550     855             2001-2100        9
                551- 600     583             2101-2200       24
                601- 650     422             2201-2300       30
                651- 700     304             2301-2400       11
                701- 750     284             2401-2500       13
                751- 800     226             >2500           51
                801- 850     178
                851- 900     190
                901- 950     116
                951-1000     112


   Currently the ten largest sequences are:


                            RYNR$RABIT  5037 a.a.
                            RYNR$HUMAN  5032 a.a.
                            APB$HUMAN   4563 a.a.
                            APOA$HUMAN  4548 a.a.
                            POLG$BVDV   3988 a.a.
                            POLG$HCVA   3898 a.a.
                            POLG$HCVB   3898 a.a.
                            TRX$DROME   3759 a.a.
                            ACVA$PENCH  3746 a.a.
                            DMD$HUMAN   3685 a.a.
















<PAGE>




                         APPENDIX B: ON-LINE EXPERTS



   B.1  List of on-line experts for PROSITE and SWISS-PROT

Field of expertise            Name                Email address
----------------------------- ------------------  --------------------------
African swine fever virus     Yanez R.J.          ryanez@cbm2.uam.es
Alcohol dehydrogenases        Bengt P.            bengt@medfys.ki.se
Aldehyde dehydrogenases       Bengt P.            bengt@medfys.ki.se
Alpha-crystallins/HSP-20      Leunissen J.A.M.    jackl@caos.caos.kun.nl
Alpha-2-macroglobulins        Van Leuven F.       fred@blekul13.bitnet
Apolipoproteins               Boguski M.S.        boguski@ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F.Jr.  lfk@athena.mit.edu
Bacteriophage P4 proteins     Halling C.          chh9@midway.uchicago.edu
Beta-lactamases               Brannigan J.        jab5@vaxa.york.ac.uk
Chitinases                    Henrissat B.        cermav@frgren81.bitnet
CTF/NF-I                      Mermod N.           nmermod@clsuni51.bitnet
Cytochromes P450              Holsztynska E.J.    ela@netcom.uucp
                                                  netcom!ela@apple.com
EF-hand calcium-binding       Cox J.A.            cox@cgeuge52.bitnet
                              Kretsinger R.H.     rhk5i@virginia.bitnet
Eryf1-type zinc-fingers       Boguski M.S.        boguski@ncbi.nlm.nih.gov
fruR/lacI family HTH proteins Reizer J.           jreizer@ucsd.edu
Glucanases                    Henrissat B.        cermav@frgren81.bitnet
                              Beguin P.           phycel@pasteur.bitnet
G-protein coupled receptors   Chollet A.          chollet@clients.switch.ch
                              Attwood T.K.        bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.        boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17          Landsman D.         landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr.  lfk@athena.mit.edu
Integrases                    Roy P.H.            2020000@lavalvx1.bitnet
Phytochromes                  Partis M.D.         partis@gcri.afrc.ac.uk
Prokaryotic carbohydrate      Reizer J.           jreizer@ucsd.edu
            kinases
Protein kinases               Hanks S.            hanks@vuctrvax.bitnet
Restriction-modification      Bickle T.           bickle@urz.unibas.ch
            enzymes           Roberts R.J.        roberts@cshl.org
Ribosomal protein S3          Hallick R.          hallick%biotec@arizona.edu
Ribosomal protein S15         Ellis S.R.          srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.         harayama@cgecmu51.bitnet
Sodium symporters             Reizer J.           jreizer@ucsd.edu
Subtilisin family proteases   Brannigan J.        jab5@vaxa.york.ac.uk
Thiol proteases               Turks B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turks B.            turk@ijs.ac.mail.yu
TPR repeats                   Boguski M.S.        boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.       gunnar@cbts.sunet.se
Type-II membrane antigens     Levy S.             levy@cellbio.stanford.edu
Xylose isomerase              Jenkins J.          jenkins@frira.afrc.ac.uk






<PAGE>





   B.2  Requirements to fulfill to become an on-line expert

   An expert  should be  a scientist  working with  specific famili(es)  of
   proteins (or specific domains) and which would:

   a) Review the  protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to  be contacted  by people  that have obtained new sequence(s)
      which seem to belong to "their" familie(s) of proteins.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at one of the following electronic mail addresses:

                           bairoch@cgecmu51.bitnet
                             bairoch@cmu.unige.ch






































<PAGE>