Swiss-Prot release 11.0
Published July 10, 1989
SWISS-PROT RELEASE 11.0 RELEASE NOTES
Date: July 10, 1989
Author: A. Bairoch
1. INTRODUCTION
1.1 Evolution
Release 11.0 of SWISS-PROT contains 10856 sequence entries,
comprising 3 265 966 amino acids abstracted from 10775
references. This represents an increase of 9% over release
10.0. The recent growth of the data bank is summarised
below:
Release   Date    Number of entries   Number of amino acids
 3.0      11/86                4160                 969 641
 4.0      04/87                4387               1 036 010
 5.0      09/87                5205               1 327 683
 6.0      01/88                6102               1 653 982
 7.0      04/88                6821               1 885 771
 8.0      08/88                7724               2 224 465
 9.0      11/88                8702               2 498 140
10.0      03/89               10008               2 952 613
11.0      07/89               10856               3 265 966
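
The growth between successive releases can be recomputed from the table above. The following is a minimal sketch; the names RELEASES, growth_percent and release_growth are illustrative and are not part of any SWISS-PROT distribution. For release 11.0 it gives roughly 8.5% growth in entries and 10.6% growth in amino acids.

```python
# Figures copied from the release table: (release, date, entries, amino acids).
RELEASES = [
    ("3.0", "11/86", 4160, 969641),
    ("4.0", "04/87", 4387, 1036010),
    ("5.0", "09/87", 5205, 1327683),
    ("6.0", "01/88", 6102, 1653982),
    ("7.0", "04/88", 6821, 1885771),
    ("8.0", "08/88", 7724, 2224465),
    ("9.0", "11/88", 8702, 2498140),
    ("10.0", "03/89", 10008, 2952613),
    ("11.0", "07/89", 10856, 3265966),
]


def growth_percent(previous, current):
    """Relative increase from previous to current, in percent."""
    return 100.0 * (current - previous) / previous


def release_growth(releases):
    """Return (release, entries growth %, amino acid growth %) per release."""
    rows = []
    for prev, cur in zip(releases, releases[1:]):
        rows.append(
            (cur[0], growth_percent(prev[2], cur[2]), growth_percent(prev[3], cur[3]))
        )
    return rows


if __name__ == "__main__":
    for release, entries_pct, residues_pct in release_growth(RELEASES):
        print(f"{release:>5}  entries +{entries_pct:5.1f}%  amino acids +{residues_pct:5.1f}%")
```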
1.2 Source of data
Release 11.0 has been updated using protein sequence data
from release 20.0 of the PIR (Protein Identification
Resource) protein data bank, as well as translation of
nucleotide sequence data from release 19.0 of the EMBL
nucleotide sequence Data Library.
As an indication of the source of the sequence data in the
SWISS-PROT data bank, we list here the statistics concerning
the DR (Databank Reference) pointer lines:
Entries with pointer(s) to PIR entries only:           2992
Entries with pointer(s) to EMBL entries only:          3875
Entries with pointer(s) to both EMBL and PIR entries:  3272
Entries with no pointer lines (entered in house):       717
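
As a consistency check, the four categories above should add up to the 10856 entries of the release. The sketch below verifies this and derives the share of each category; the dictionary labels and the function name pointer_shares are illustrative assumptions.

```python
# Counts copied from the DR pointer statistics above.
DR_POINTER_COUNTS = {
    "PIR only": 2992,
    "EMBL only": 3875,
    "EMBL and PIR": 3272,
    "no pointer (in house)": 717,
}
TOTAL_ENTRIES = 10856  # number of entries in release 11.0


def pointer_shares(counts):
    """Return the share of each category as a percentage of all entries."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    assert sum(DR_POINTER_COUNTS.values()) == TOTAL_ENTRIES
    for label, share in pointer_shares(DR_POINTER_COUNTS).items():
        print(f"{label:<22} {share:5.1f}%")
```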
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE
RELEASE 10
2.1 Sequences and annotations
Some 848 new sequences have been added since the last
release; the sequence data of 113 existing entries have been
updated and the annotations of 1366 entries have been
revised. In particular, we have used review articles to
update the annotations of the following groups or families
of proteins:
Adenylate kinases
Bacterial restriction systems proteins
Bacterial transduction systems proteins
Caseins
Chitin-binding proteins
Cutinases
Cytochromes P450
DNA polymerases
DNA topoisomerases type I
Esterases
Heat shock hsp70 proteins
Lipases
Microtubule-associated proteins
2-oxo acid dehydrogenases complex components
Paramyxoviruses proteins
Protein disulfide isomerases
Purine/pyrimidine phosphoribosyl transferases
Rhabdoviruses proteins
Ribonucleotide reductases
Rotaviruses proteins
Serine hydroxymethyltransferases
Small, acid-soluble spore proteins
Xylose isomerases
2.2 Standardized journal abbreviations
Journal names are now abbreviated according to the
conventions used by the National Library of Medicine
(Washington D.C., USA) and are based on the existing ISO
and ANSI standards. In most cases the changes are small,
and the new abbreviations are at least as meaningful as
the old ones. As in previous releases the abbreviations for
the journals cited in SWISS-PROT are listed in the document
file JOURLIST.TXT.
2.3 New feature key
A new feature key has been introduced in this release:
THIOETH, which describes a thioether bond between two
residues.
3. THE NEXT RELEASE
SWISS-PROT release 12.0 will be available in November 1989.
4. WE NEED YOUR HELP!
We welcome any feedback from our users. We would especially
appreciate being notified if you find that sequences
belonging to your field of expertise are missing from the
data bank. We would also like to be notified about
annotations that need updating, for example if the function
of a protein has been clarified or if new post-translational
information has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.74   Gln (Q) 4.11   Leu (L) 9.08   Ser (S) 7.01
Arg (R) 5.22   Glu (E) 6.19   Lys (K) 5.83   Thr (T) 5.84
Asn (N) 4.38   Gly (G) 7.27   Met (M) 2.27   Trp (W) 1.34
Asp (D) 5.22   His (H) 2.29   Phe (F) 3.94   Tyr (Y) 3.23
Cys (C) 1.88   Ile (I) 5.31   Pro (P) 5.17   Val (V) 6.51
Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Arg = Asp,
Pro, Asn, Gln, Phe, Tyr, His, Met, Cys, Trp
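
The ranking above follows directly from the composition table. The sketch below reproduces it by sorting the twenty standard amino acids by decreasing percentage and joining ties with "="; the ambiguous codes (Asx, Glx, Xaa) are left out, and the names COMPOSITION and rank_by_frequency are illustrative.

```python
from itertools import groupby

# Percentages copied from the composition table (standard residues only).
COMPOSITION = {
    "Ala": 7.74, "Gln": 4.11, "Leu": 9.08, "Ser": 7.01,
    "Arg": 5.22, "Glu": 6.19, "Lys": 5.83, "Thr": 5.84,
    "Asn": 4.38, "Gly": 7.27, "Met": 2.27, "Trp": 1.34,
    "Asp": 5.22, "His": 2.29, "Phe": 3.94, "Tyr": 3.23,
    "Cys": 1.88, "Ile": 5.31, "Pro": 5.17, "Val": 6.51,
}


def rank_by_frequency(composition):
    """Return residues in decreasing order of frequency, ties joined by ' = '."""
    # Sort by descending percentage, then alphabetically so ties are stable.
    ordered = sorted(composition.items(), key=lambda item: (-item[1], item[0]))
    groups = []
    for _, members in groupby(ordered, key=lambda item: item[1]):
        groups.append(" = ".join(name for name, _ in members))
    return ", ".join(groups)


if __name__ == "__main__":
    print(rank_by_frequency(COMPOSITION))
```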
A.2 Distribution of the sequences by their organism of origin
Total number of species represented in this release of the
data bank: 1687
Species represented       1x: 785
                          2x: 304
                          3x: 169
                          4x: 102
                          5x:  69
                          6x:  47
                          7x:  29
                          8x:  28
                          9x:  34
                         10x:  16
                     11- 20x:  51
                     21-100x:  42
                       >100x:  11
A.2.2 Table of the most represented species
Number   Frequency   Species
   1        1003     Human
   2         885     Escherichia coli
   3         555     Mouse
   4         458     Rat
   5         354     Baker's yeast (Saccharomyces cerevisiae)
   6         313     Bovine
   7         185     Fruit fly (Drosophila melanogaster)
   8         183     Chicken
   9         151     Rabbit
  10         131     Pig
  11         102     African clawed frog (Xenopus laevis)
  12          96     Bacillus subtilis
  13          84     Salmonella typhimurium
  14          83     Maize
  15          79     Bacteriophage T4
  16          70     Herpes virus (Type 1, Strain 17)
              70     Tobacco
  18          67     Varicella-Zoster virus (Strain Dumas)
  19          62     Bacteriophage Lambda
              62     Vaccinia Virus
              62     Wheat
A.3 Distribution of the sequences by size
 From  To    Number       From  To    Number
    1-  50      626       1001-1100       77
   51- 100     1368       1101-1200       53
  101- 150     2159       1201-1300       44
  151- 200     1130       1301-1400       26
  201- 250      876       1401-1500       17
  251- 300      729       1501-1600       10
  301- 350      643       1601-1700       14
  351- 400      608       1701-1800       12
  401- 450      468       1801-1900        8
  451- 500      527       1901-2000        7
  501- 550      401           >2000       54
  551- 600      244
  601- 650      188
  651- 700      130
  701- 750      108
  751- 800       76
  801- 850       77
  851- 900       94
  901- 950       43
  951-1000       39
Currently the two largest sequences are:
APB$HUMAN 4563 a.a.
APOA$HUMAN 4548 a.a.
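
Two summary figures can be derived from the size table: the bins add up to the 10856 entries of the release, the mean length is about 301 amino acids (3 265 966 residues over 10856 sequences), and the median sequence falls in the 201-250 bin. The sketch below shows the arithmetic; SIZE_BINS, mean_length and median_bin are illustrative names, and the open-ended ">2000" bin is represented with an upper bound of None.

```python
# (from, to, number of sequences), copied from the size table.
SIZE_BINS = [
    (1, 50, 626), (51, 100, 1368), (101, 150, 2159), (151, 200, 1130),
    (201, 250, 876), (251, 300, 729), (301, 350, 643), (351, 400, 608),
    (401, 450, 468), (451, 500, 527), (501, 550, 401), (551, 600, 244),
    (601, 650, 188), (651, 700, 130), (701, 750, 108), (751, 800, 76),
    (801, 850, 77), (851, 900, 94), (901, 950, 43), (951, 1000, 39),
    (1001, 1100, 77), (1101, 1200, 53), (1201, 1300, 44), (1301, 1400, 26),
    (1401, 1500, 17), (1501, 1600, 10), (1601, 1700, 14), (1701, 1800, 12),
    (1801, 1900, 8), (1901, 2000, 7), (2001, None, 54),
]
TOTAL_RESIDUES = 3265966  # amino acids in release 11.0


def total_sequences(bins):
    """Sum the sequence counts over all size bins."""
    return sum(count for _, _, count in bins)


def mean_length(total_residues, bins):
    """Average sequence length in amino acids."""
    return total_residues / total_sequences(bins)


def median_bin(bins):
    """Return the (from, to) bounds of the bin containing the median sequence."""
    half = total_sequences(bins) / 2
    running = 0
    for low, high, count in bins:
        running += count
        if running >= half:
            return (low, high)
    raise ValueError("empty distribution")


if __name__ == "__main__":
    print(total_sequences(SIZE_BINS))
    print(round(mean_length(TOTAL_RESIDUES, SIZE_BINS), 1))
    print(median_bin(SIZE_BINS))
```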
APPENDIX B: DISKS FOR SWISS-PROT
B.1 IBM PC/AT 1.2 Mb disks
SWISS-PROT is stored on fourteen 1.2 Mb disks. Each of
these disks contains a single bulk file (PRT11_01.BLK to
PRT11_14.BLK):
Disk   First sequence   Last sequence
  1    10KA$MYCTU       B1AR$HUMAN
  2    B1AR$MELGA       COLI$SQUAC
  3    COLI$STRCA       DPOL$HPBVY
  4    DPOL$HPBVZ       GC2$HUMAN
  5    GC3$HUMAN        HEMA$INCMI
  6    HEMA$INCP1       K1CS$BOVIN
  7    K1CS$HUMAN       MAP2$HUMAN
  8    MAS1$YEAST       ODB1$BOVIN
  9    ODB2$HUMAN       PRP2$MOUSE
 10    PRP2$RAT         RRPO$BPSP
 11    RRPO$CARMV       TKNG$RAT
 12    TKNK$BOVIN       VGLG$VSVJ
 13    VGLG$VSVO        YVL6$HCMVA
 14    YWL1$HCMVA       ZP3$MOUSE
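
The table above can be used to work out which disk holds a given entry. The sketch below does so with a binary search over the first entry name of each disk. It assumes that entries are stored in plain ASCII order of their names, which is consistent with the table but is not stated in these notes; the function names are illustrative. A name that lies within the range but does not exist in the data bank still maps to a disk, so the result only says where the entry would be.

```python
from bisect import bisect_right

# First entry on each of the fourteen disks, copied from the table above.
DISK_FIRST_ENTRIES = [
    "10KA$MYCTU", "B1AR$MELGA", "COLI$STRCA", "DPOL$HPBVZ", "GC3$HUMAN",
    "HEMA$INCP1", "K1CS$HUMAN", "MAS1$YEAST", "ODB2$HUMAN", "PRP2$RAT",
    "RRPO$CARMV", "TKNK$BOVIN", "VGLG$VSVO", "YWL1$HCMVA",
]
LAST_ENTRY = "ZP3$MOUSE"  # last entry on disk 14


def disk_for_entry(name):
    """Return the disk number (1-14) expected to hold the named entry.

    Assumes entries are sorted in ASCII order (an assumption, see above).
    """
    if name < DISK_FIRST_ENTRIES[0] or name > LAST_ENTRY:
        raise ValueError(f"{name} is outside the range covered by the disks")
    # Number of disks whose first entry sorts at or before the requested name.
    return bisect_right(DISK_FIRST_ENTRIES, name)


def bulk_file_name(disk):
    """Name of the bulk file on a given disk, e.g. PRT11_01.BLK."""
    return f"PRT11_{disk:02d}.BLK"


if __name__ == "__main__":
    for entry in ("APB$HUMAN", "MAP2$HUMAN", "ZP3$MOUSE"):
        disk = disk_for_entry(entry)
        print(entry, disk, bulk_file_name(disk))
```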
B.2 IBM PS/2 1.4 Mb disks
The number and content of the 1.4 Mb disks for the PS/2
systems are identical to those of the 1.2 Mb disks
(see above).
