Swiss-Prot release 9.0
Published December 1, 1988
SWISS-PROT RELEASE 9.0 RELEASE NOTES
Date: December 1, 1988
Author: A. Bairoch
1. INTRODUCTION
1.1 Evolution
Release 9.0 of SWISS-PROT contains 8702 sequence entries,
comprising 2'498'140 amino acids abstracted from 8735
references. This represents an increase of 12% over release
8.0. The recent growth of the data bank is summarised
below:
Release Date Number of entries Nb of amino acids
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
1.2 Source of data
Release 9.0 has been updated using protein sequence data
from release 17.0 of the PIR (Protein Identification
Resource) protein data bank, as well as translation of
nucleotide sequence data from release 17.0 of the EMBL
nucleotide sequence Data Library.
As an indication to the source of the sequence data in the
SWISS-PROT data bank we list here the statistics concerning
the DR (Databank Reference) pointer lines:
Entries with pointer(s) to only PIR entri(es): 3049
Entries with pointer(s) to only EMBL entri(es): 2640
Entries with pointer(s) to both EMBL and PIR entri(es): 2532
Entries with no pointers lines (entered in house): 481
2. Description of the changes made to SWISS-PROT since
release 8
2.1 Sequences and annotations
Some 978 new sequences have been added since the last
release, the sequence data of 111 existing entries has been
updated and the annotations of 1622 entries have been
revised. In particular we have used reviews articles to
update the annotations of the following groups or families
of proteins:
Acidic ribosomal proteins
Adipokinetic hormone family
ATP synthase subunits
Bacterial histone-like DNA-binding proteins
Bombesin-like peptides family
Chaperonins
Cholecystokinin/gastrin family
Ferredoxins
G-protein coupled receptors
Histones H2A
Histones H4
HIV/SIV viruses proteins
Homeobox proteins
Integrins alpha chains
Intermediate filaments
Myosin heavy chains
Myosin light chains
Ovomucoids
Oxytocin/vasopressin family
Potato inhibitor I family
Prion protein
Receptor tyrosine-protein kinase class II
Receptor tyrosine-protein kinase class III
Ribonucleoproteins
Serine carboxypeptidases
Silk moth chorion proteins
Thiol proteases inhibitors
2.2 Cross-references to the HIV Sequence Database
Starting with release 9 we have added cross-references to
entries in the Human Retroviruses and AIDS compilation of
nucleic and amino acid sequences (HIV Sequence
Database)(1). An example of a DR line pointing to an HIV
entry is shown here:
DR HIV; M15654; 3ORF$BH102.
2.3 New topic for the comments (CC) line type
As of release 9 we have added a new 'topic' for the
comments (CC) line-type: ALTERNATIVE SPLICING which is used
to describe the existence of related protein sequence(s)
produced by alternative splicing of the same gene(s).
Example of its usage:
CC -!- ALTERNATIVE SPLICING: SKELETAL MUSCLE AND FIBROBLAST
CC TROPOMYOSINS ARE OBTAINED BY ALTERNATIVE MRNA SPLICING.
3. THE NEXT RELEASE
SWISS-PROT release 10.0 will be available in February 1989.
We plan to upgrade the OC lines so as to add at least a few
level of taxonomic nodes (currently only the top-node:
viridae, prokaryota or eukaryota is available).
4. WE NEED YOUR HELP !
We welcome any feedback from our users. We especially would
appreciate that you notify us if you find that sequences
belonging to your field of expertise are missing from the
data bank. We also would like to be notified about
annotations to be updated, as for example if the function
of a protein has been clarified or if new post-
translational information has become available.
____________________
1 The HIV Sequence Database is edited by G. Myers, A.B.
Rabson, S.F. Josephs, T.F. Smith, F. Wong-Staal;
published by the Theoretical Biology and Biophysics
Group T-10 at Los Alamos National Laboratory; and funded
by the AIDS program of the National Institute of Allergy
and Infectious Diseases through an interagency agreement
with the United States Department of Energy.
APPENDIX A: DISKS FOR SWISS-PROT
- Release 9 is the first release to be available on a CD-
ROM disk.
- Starting with release 9 we have stopped distributing
SWISS-PROT on 360 Kb disks.
A.1 IBM PC/AT 1.2 Mb disks
SWISS-PROT is stored on ten 1.2 Mb disks. Each of these
disk contains a single bulk file (PROT9_01.BLK to
PROT9_10.BLK):
Disk First sequence Last Sequence
1 16K$TRVPS CAS2$BOVIN
2 CAS2$SHEEP CYSE$ECOLI
3 CYSP$MOUSE GENK$ECOLI
4 GENL$ECOLI IBB3$SOYBN
5 IBB4$MACAX LPID$EDWTA
6 LPIV$ECOLI PA21$PIG
7 PA21$PSEAU RENI$RAT
8 RENS$MOUSE TOLC$ECOLI
9 TOLL$DROME VSI1$REOV1
10 VSI1$REOV2 ZEB2$MAIZE
A.2 IBM PS/2 1.4 Mb disks
The number and content of the 1.4 Mb disks for the PS/2
systems are exactly identical to those of the 1.2 Mb disks
(see above).
APPENDIX B: SOME STATISTICS
B.1 Amino acid composition
COMPOSITION IN PERCENT FOR THE COMPLETE DATA BANK
Ala (A) 7.75 Gln (Q) 4.10 Leu (L) 9.06 Ser (S) 7.01
Arg (R) 5.15 Glu (E) 6.17 Lys (K) 5.89 Thr (T) 5.84
Asn (N) 4.37 Gly (G) 7.32 Met (M) 2.29 Trp (W) 1.36
Asp (D) 5.19 His (H) 2.28 Phe (F) 3.94 Tyr (Y) 3.23
Cys (C) 1.90 Ile (I) 5.32 Pro (P) 5.17 Val (V) 6.51
Asx (B) 0.02 Glx (Z) 0.01 Xaa (X) 0.03
CLASSIFICATION OF THE AMINO ACIDS BY THEIR FREQUENCY
Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Pro, Arg,
Asn, Gln, Phe, Tyr, Met, His, Cys, Trp
B.2 Repartition of the sequences by their organism of
origin
Total number of species represented in this release of the
data bank: 1345
Species represented 1x: 700
2x: 276
3x: 155
4x: 88
5x: 48
6x: 43
7x: 23
8x: 28
9x: 24
10x: 10
>10x: 83
TABLE OF THE MOST REPRESENTED SPECIES
795: HUMAN 142: DROME 69: SALTY 59: VACCV
703: ECOLI 141: CHICK 68: BPT4 54: BPT7
452: MOUSE 129: RABIT 67: MAIZE 52: MARPO
356: RAT 116: PIG 62: LAMBD 50: WHEAT
297: YEAST 85: XENLA TOBAC
264: BOVIN 81: BACSU 59: EPBAR
B.3 Repartition of the sequences by size
From To Number From To Number
1- 50 487 1001-1100 54
51- 100 1179 1101-1200 34
101- 150 1893 1201-1300 29
151- 200 938 1301-1400 21
201- 250 661 1401-1500 13
251- 300 554 1501-1600 4
301- 350 492 1601-1700 10
351- 400 483 1701-1800 8
401- 450 354 1801-1900 6
451- 500 387 1901-2000 3
501- 550 327 >2000 43
551- 600 176
601- 650 135 Currently the two largest sequences are:
651- 700 87 APB$HUMAN 4563 a.a.
701- 750 82 APOA$HUMAN 4548 a.a.
751- 800 61
801- 850 52
851- 900 68
901- 950 30
951-1000 31
