Swiss-Prot release 23.0
Published August 1, 1992
SWISS-PROT RELEASE 23.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 23.0 of SWISS-PROT contains 26706 sequence entries, comprising
9 011 391 amino acids abstracted from 26485 references. This represents
an increase of 7.6% in the number of amino acids over release 22. The
recent growth of the data bank is summarized below.
Release   Date    Number of entries   Number of amino acids
    3.0   11/86                4160                 969 641
    4.0   04/87                4387               1 036 010
    5.0   09/87                5205               1 327 683
    6.0   01/88                6102               1 653 982
    7.0   04/88                6821               1 885 771
    8.0   08/88                7724               2 224 465
    9.0   11/88                8702               2 498 140
   10.0   03/89               10008               2 952 613
   11.0   07/89               10856               3 265 966
   12.0   10/89               12305               3 797 482
   13.0   01/90               13837               4 347 336
   14.0   04/90               15409               4 914 264
   15.0   08/90               16941               5 486 399
   16.0   11/90               18364               5 986 949
   17.0   02/91               20024               6 524 504
   18.0   05/91               20772               6 792 034
   19.0   08/91               21795               7 173 785
   20.0   11/91               22654               7 500 130
   21.0   03/92               23742               7 866 596
   22.0   05/92               25044               8 375 696
   23.0   08/92               26706               9 011 391
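The growth figure quoted in section 1.1 can be rechecked from the last two
rows of the table. The following is a minimal sketch; the variable and
function names are illustrative and not part of any SWISS-PROT tool. It
suggests that the 7.6% figure matches the amino acid count, while the
number of entries grew by about 6.6%.

```python
# Figures copied from the rows for releases 22.0 and 23.0 of the growth table.
RELEASE_22 = {"entries": 25044, "amino_acids": 8_375_696}
RELEASE_23 = {"entries": 26706, "amino_acids": 9_011_391}


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, expressed in percent."""
    return 100.0 * (new - old) / old


entry_growth = percent_increase(RELEASE_22["entries"], RELEASE_23["entries"])
residue_growth = percent_increase(
    RELEASE_22["amino_acids"], RELEASE_23["amino_acids"]
)

print(f"entries:     +{entry_growth:.1f}%")    # entries:     +6.6%
print(f"amino acids: +{residue_growth:.1f}%")  # amino acids: +7.6%
```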
1.2 Source of data
Release 23.0 has been updated using protein sequence data from release
33.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 31.0 of the
EMBL Nucleotide Sequence Database.
As an indication of the source of the sequence data in the SWISS-PROT
data bank, we list here the statistics concerning the DR (Database
cross-references) pointer lines:
Entries with pointer(s) to PIR entries only:            4368
Entries with pointer(s) to EMBL entries only:           3365
Entries with pointer(s) to both EMBL and PIR entries:  18444
Entries with no pointer lines:                           529
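As an illustration of how such statistics could be derived, here is a
minimal sketch that classifies entries by the databases named on their DR
lines. It assumes the usual flat-file layout (a two-letter line code, the
database name up to the first semicolon, and "//" closing each entry) and
it assumes that "no pointer lines" means no pointer to PIR or EMBL. The
sample entries and accession numbers are invented for the example.

```python
from collections import Counter


def classify_entries(flat_file_text: str) -> Counter:
    """Count entries by whether their DR lines point to PIR, EMBL, or both."""
    counts = Counter()
    databases = set()
    for line in flat_file_text.splitlines():
        if line.startswith("DR "):
            # The database name is the first semicolon-separated field.
            databases.add(line[2:].split(";")[0].strip())
        elif line.startswith("//"):
            # End of entry: decide its category, then reset for the next one.
            has_pir = "PIR" in databases
            has_embl = "EMBL" in databases
            if has_pir and has_embl:
                counts["both EMBL and PIR"] += 1
            elif has_pir:
                counts["only PIR"] += 1
            elif has_embl:
                counts["only EMBL"] += 1
            else:
                counts["neither"] += 1
            databases = set()
    return counts


# Hypothetical sample data, not taken from the data bank.
SAMPLE = """\
ID   TEST1_HUMAN
DR   EMBL; X00001; HSTEST1.
DR   PIR; A00001; A00001.
//
ID   TEST2_ECOLI
DR   PIR; B00002; B00002.
//
ID   TEST3_YEAST
DR   EMBL; X00003; SCTEST3.
DR   PROSITE; PS00001; HYPOTHETICAL.
//
ID   TEST4_MOUSE
//
"""

sample_counts = classify_entries(SAMPLE)
for category, number in sorted(sample_counts.items()):
    print(f"{category}: {number}")
```

Note that the four figures listed above add up to 26706, the total number
of entries in this release.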
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 22
2.1 Sequences and annotations
About 1680 sequences have been added since release 22, the sequence data
of 235 existing entries has been updated, and the annotations of 3400
entries have been revised. In particular, we have used review articles
to update the annotations of the following groups or families of
proteins:
- AP endonucleases
- Bacterial regulatory proteins, lacI family
- Electron transfer flavoprotein alpha-subunit
- Enterobacterial virulence outer membrane protein
- Formate--tetrahydrofolate ligase
- Germin family
- Guanine-nucleotide releasing factors CDC25 family
- Lipoxygenases
- Prokaryotic ornithine and lysine decarboxylases
- Prokaryotic-type carbonic anhydrases
- Riboflavin synthase alpha chain family
- Ribosomal proteins
- Sigma-54 factors family
- Sigma-70 factors family
- Single strand binding protein family
- Stress-induced proteins SRP1/TIP1 family
- TNF family
3. CHANGES PLANNED FOR FUTURE RELEASES
3.1 Change in the RA line concerning the author names format
As of release 25 in March 1993, we will change the format of author
names on RA lines to conform to that used by major bibliographic
databases such as Medline. The main change is that the periods and
hyphens ("-") which currently appear within initials will no longer
appear. For example, the current:
RA Wilson A.C., Smith J.-C.;
will appear as:
RA Wilson AC, Smith JC;
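For users who want to prepare their parsing software for this change, the
conversion can be sketched as follows. This is only an illustration under
simple assumptions: authors are separated by commas, the initials are the
last blank-separated token of each name, and the function names are
hypothetical. It is not the procedure that will be used to convert the
data bank.

```python
import re


def convert_author(author: str) -> str:
    """Strip periods and hyphens from the initials, leaving the surname as is."""
    surname, _, initials = author.rpartition(" ")
    if not surname:
        # Single token (no initials found): leave the name unchanged.
        return author
    return f"{surname} {initials.replace('.', '').replace('-', '')}"


def convert_ra_line(line: str) -> str:
    """Convert one RA line from the current format to the planned format."""
    match = re.match(r"^(RA\s+)(.*?)([;,]?)\s*$", line)
    if match is None:
        raise ValueError("not an RA line")
    head, body, terminator = match.groups()
    authors = [name.strip() for name in body.split(",") if name.strip()]
    return head + ", ".join(convert_author(name) for name in authors) + terminator


print(convert_ra_line("RA   Wilson A.C., Smith J.-C.;"))
# RA   Wilson AC, Smith JC;
```

Hyphens in surnames are left untouched because only the last token of each
name is modified. Names carrying a suffix inside the initials (for example
"L.F.Jr.") would need special handling that this sketch does not attempt.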
3.2 Weekly update of SWISS-PROT
Starting with release 24 in November 1992, we will provide weekly updates
of SWISS-PROT. Instructions on how to access the update files will be
given with the next release.
4. ENZYME AND PROSITE
4.1 The ENZYME data bank
Release 10.0 of the ENZYME data bank is distributed along with release
23 of SWISS-PROT. ENZYME release 10.0 contains information on 3183
enzymes. The data bank will probably be significantly modified at the
next release due to the publication of a new edition of the IUPAC-IUB
Enzyme Nomenclature book, which describes many new enzymes and updates
the information concerning existing ones.
4.2 The PROSITE data bank
Release 9.10 of the PROSITE data bank is distributed along with release
23 of SWISS-PROT. Release 9.10 contains 580 documentation chapters that
describe 689 different patterns. Release 9.10 is not really a new
release; the only changes between releases 9.0 and 9.10 are updates of
the pointers to the SWISS-PROT entries whose names have been modified
between releases 22 and 23. The next release of PROSITE (10.0) will be
distributed with release 24 of SWISS-PROT.
5. WE NEED YOUR HELP!
We welcome feedback from our users. We would especially appreciate being
notified if you find that sequences belonging to your field of expertise
are missing from the data bank. We would also like to be notified about
annotations that need updating, for example if the function of a protein
has been clarified or if new post-translational information has become
available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.66    Gln (Q) 4.06    Leu (L) 9.15    Ser (S) 7.07
Arg (R) 5.24    Glu (E) 6.25    Lys (K) 5.82    Thr (T) 5.84
Asn (N) 4.45    Gly (G) 7.10    Met (M) 2.34    Trp (W) 1.31
Asp (D) 5.25    His (H) 2.26    Phe (F) 3.97    Tyr (Y) 3.21
Cys (C) 1.80    Ile (I) 5.50    Pro (P) 5.06    Val (V) 6.50
Asx (B) 0.01    Glx (Z) 0.01    Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
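The classification in A.1.2 follows directly from the percentages in
A.1.1. A minimal sketch that reproduces it is shown here; the ambiguity
codes Asx, Glx and Xaa are left out, as they are in the list above, and
the variable names are illustrative.

```python
# Percent composition of the 20 standard amino acids, copied from A.1.1.
COMPOSITION = {
    "Ala": 7.66, "Gln": 4.06, "Leu": 9.15, "Ser": 7.07,
    "Arg": 5.24, "Glu": 6.25, "Lys": 5.82, "Thr": 5.84,
    "Asn": 4.45, "Gly": 7.10, "Met": 2.34, "Trp": 1.31,
    "Asp": 5.25, "His": 2.26, "Phe": 3.97, "Tyr": 3.21,
    "Cys": 1.80, "Ile": 5.50, "Pro": 5.06, "Val": 6.50,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```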
A.2 Distribution of the sequences by organism of origin
Total number of species represented in this release of SWISS-PROT: 3497
A.2.1 Table of the frequency of occurrence of species
Species represented      1x: 1537
                         2x:  612
                         3x:  345
                         4x:  222
                         5x:  148
                         6x:  117
                         7x:   76
                         8x:   60
                         9x:   71
                        10x:   30
                    11- 20x:  144
                    21- 50x:   78
                    51-100x:   24
                      >100x:   33
A.2.2 Table of the most represented species
Number  Frequency  Species
     1       2018  Human
     2       1918  Escherichia coli
     3       1220  Mouse
     4       1154  Rat
     5       1053  Baker's yeast (Saccharomyces cerevisiae)
     6        556  Bovine
     7        485  Fruit fly (Drosophila melanogaster)
     8        428  Chicken
     9        402  Bacillus subtilis
    10        311  Salmonella typhimurium
    11        310  African clawed frog (Xenopus laevis)
    12        297  Rabbit
    13        273  Pig
    14        251  Vaccinia virus (strain Copenhagen)
    15        197  Maize
    16        193  Human cytomegalovirus (strain AD169)
    17        168  Bacteriophage T4
    18        159  Vaccinia virus (strain WR)
    19        153  Rice
    20        140  Tobacco
    21        138  Wheat
    22        128  Pea
    23        120  Barley
    24        119  Pseudomonas aeruginosa
              119  Staphylococcus aureus
    26        117  Marchantia polymorpha (liverwort)
    27        116  Arabidopsis thaliana (Mouse-ear cress)
    28        111  Slime mold (Dictyostelium discoideum)
    29        110  Fission yeast (Schizosaccharomyces pombe)
    30        106  Soybean
    31        104  Caenorhabditis elegans
              104  Sheep
              104  Spinach
    34        100  Klebsiella pneumoniae
              100  Pseudomonas putida
              100  Dog
A.3 Distribution of the sequences by size
From   To   Number     From   To   Number
   1-  50     1644     1001-1100      258
  51- 100     2839     1101-1200      147
 101- 150     4010     1201-1300      129
 151- 200     2576     1301-1400       79
 201- 250     2168     1401-1500       64
 251- 300     1987     1501-1600       37
 301- 350     1804     1601-1700       32
 351- 400     1773     1701-1800       32
 401- 450     1340     1801-1900       35
 451- 500     1490     1901-2000       27
 501- 550     1053     2001-2100       10
 551- 600      742     2101-2200       32
 601- 650      512     2201-2300       39
 651- 700      378     2301-2400       13
 701- 750      367     2401-2500       14
 751- 800      291         >2500       73
 801- 850      216
 851- 900      220
 901- 950      140
 951-1000      135
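A few summary figures can be derived from this table together with the
total number of amino acids given in section 1.1. The sketch below is an
illustration only (the names are not part of any distributed tool). It
computes the number of sequences, the mean sequence length, and the size
class that contains the median sequence.

```python
# (lower bound, upper bound, number of sequences), copied from the table.
# None marks the open-ended ">2500" class.
SIZE_BINS = [
    (1, 50, 1644), (51, 100, 2839), (101, 150, 4010), (151, 200, 2576),
    (201, 250, 2168), (251, 300, 1987), (301, 350, 1804), (351, 400, 1773),
    (401, 450, 1340), (451, 500, 1490), (501, 550, 1053), (551, 600, 742),
    (601, 650, 512), (651, 700, 378), (701, 750, 367), (751, 800, 291),
    (801, 850, 216), (851, 900, 220), (901, 950, 140), (951, 1000, 135),
    (1001, 1100, 258), (1101, 1200, 147), (1201, 1300, 129), (1301, 1400, 79),
    (1401, 1500, 64), (1501, 1600, 37), (1601, 1700, 32), (1701, 1800, 32),
    (1801, 1900, 35), (1901, 2000, 27), (2001, 2100, 10), (2101, 2200, 32),
    (2201, 2300, 39), (2301, 2400, 13), (2401, 2500, 14), (2501, None, 73),
]
TOTAL_AMINO_ACIDS = 9_011_391  # from section 1.1


def median_bin(bins):
    """Return the (low, high) bounds of the class holding the median sequence."""
    half = sum(count for _, _, count in bins) / 2
    running = 0
    for low, high, count in bins:
        running += count
        if running >= half:
            return (low, high)
    raise ValueError("empty distribution")


total_sequences = sum(count for _, _, count in SIZE_BINS)
mean_length = TOTAL_AMINO_ACIDS / total_sequences

print(total_sequences)         # 26706
print(round(mean_length, 1))   # 337.4
print(median_bin(SIZE_BINS))   # (251, 300)
```

The mean (about 337 residues) lies above the median class because of the
long tail of very large sequences.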
Currently the ten largest sequences are:
RYNR_RABIT 5037 a.a.
RYNR_HUMAN 5032 a.a.
APB_HUMAN 4563 a.a.
APOA_HUMAN 4548 a.a.
DYHC_TRIGR 4466 a.a.
POLG_BVDV 3988 a.a.
VGF1_IBVB 3951 a.a.
POLG_HCVA 3898 a.a.
POLG_HCVB 3898 a.a.
ACVT_PENCH 3791 a.a.
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise              Name                Email address
------------------------------- ------------------- ------------------------------------
Alcohol dehydrogenases          Joernvall H.        hans.jornvall@k1m.ki.se
                                Persson B.          bengt@medfys.ki.se
Aldehyde dehydrogenases         Joernvall H.        hans.jornvall@k1m.ki.se
                                Persson B.          bengt@medfys.ki.se
Alpha-crystallins/HSP-20        Leunissen J.A.M.    jackl@caos.caos.kun.nl
                                de Jong W.          u629000@hnykun11.bitnet
Alpha-2-macroglobulins          Van Leuven F.       fred@blekul13.bitnet
AA-tRNA synthetases class II    Leberman R.         leberman@frembl51.bitnet
Apolipoproteins                 Boguski M.S.        boguski@ncbi.nlm.nih.gov
Arrestins                       Kolakowski L.F.Jr.  kolakowski@helix.mgh.harvard.edu
Band 4.1 family proteins        Rees J.             jrees@vax.oxford.ac.uk
Beta-lactamases                 Brannigan J.        jab5@vaxa.york.ac.uk
Beta-transducin family          Boguski M.S.        boguski@ncbi.nlm.nih.gov
Chalcone/stilbene synthases     Schroeder J.        raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60         Georgopoulos C.     georgopo@cmu.unige.ch
Chaperonins TCP1 family         Willison K.R.       willison@icrf.ac.uk
Chitinases                      Henrissat B.        cermav@frgren81.bitnet
Clusterin                       Peitsch M.C.        peitsch@ulbio1.unil.ch
CTF/NF-I                        Mermod N.           nmermod@ulys.unil.ch
Cytochromes P450                Holsztynska E.J.    ela@netcom.uucp
                                                    netcom!ela@apple.com
DEAD-box helicases              Linder P.           linder@urz.unibas.ch
dnaJ family                     Kelley W.           kelley@cmu.unige.ch
EF-hand calcium-binding         Cox J.A.            cox@sc2a.unige.ch
                                Kretsinger R.H.     rhk5i@virginia.bitnet
Enoyl-CoA hydratase             Hofmann K.O.        khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins   Reizer J.           jreizer@ucsd.edu
GATA-type zinc-fingers          Boguski M.S.        boguski@ncbi.nlm.nih.gov
Glucanases                      Henrissat B.        cermav@frgren81.bitnet
                                Beguin P.           phycel@pasteur.bitnet
G-protein coupled receptors     Chollet A.          chollet@clients.switch.ch
                                Attwood T.K.        bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins      Boguski M.S.        boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17            Landsman D.         landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases      Kolakowski L.F.Jr.  kolakowski@helix.mgh.harvard.edu
Integrases                      Roy P.H.            2020000@saphir.ulaval.ca
Lipocalins                      Boguski M.S.        boguski@ncbi.nlm.nih.gov
                                Peitsch M.C.        peitsch@ulbio1.unil.ch
lysR family HTH proteins        Henikoff S.         henikoff@sparky.fhcrc.org
MAC components / perforin       Peitsch M.C.        peitsch@ulbio1.unil.ch
Malic enzymes                   Glynias M.          mglynias@ncsa.uiuc.edu
Myelin proteolipid protein      Hofmann K.O.        khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes           Reizer J.           jreizer@ucsd.edu
pfkB carbohydrate kinases       Reizer J.           jreizer@ucsd.edu
Phytochromes                    Partis M.D.         partis@gcri.afrc.ac.uk
Protein kinases                 Hanks S.            hanks@vuctrvax.bitnet
                                Hunter T.           hunter@salk.bitnet
PTS proteins                    Reizer J.           jreizer@ucsd.edu
Restriction-modification        Bickle T.           bickle@urz.unibas.ch
  enzymes                       Roberts R.J.        roberts@neb.com
Ribosomal protein S3            Hallick R.          hallick%biotec@arizona.edu
Ribosomal protein S15           Ellis S.R.          srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases      Harayama S.         harayama@cmu.unige.ch
Sodium symporters               Reizer J.           jreizer@ucsd.edu
Subtilases                      Brannigan J.        jab5@vaxa.york.ac.uk
Thiol proteases                 Turk B.             turk@ijs.ac.mail.yu
Thiol proteases inhibitors      Turk B.             turk@ijs.ac.mail.yu
TNF family                      Jongeneel C.V.      vjongene@isrec.arcom.ch
TPR repeats                     Boguski M.S.        boguski@ncbi.nlm.nih.gov
Transit peptides                von Heijne G.       gunnar@cbts.sunet.se
Type-II membrane antigens       Levy S.             levy@cellbio.stanford.edu
Uracil-DNA glycosylase          Aasland R.          aasland@bio.uib.no
Xylose isomerase                Jenkins J.          jenkins@frira.afrc.ac.uk
WAP-type domain                 Claverie J.-M.      jmc@ncbi.nlm.nih.gov
ZP domain                       Bork P.             bork@embl-heidelberg.de
African swine fever virus       Yanez R.J.          ryanez@cbm2.uam.es
Bacteriophage P4                Halling C.          chh9@midway.uchicago.edu
Drosophila                      Ashburner M.        ma11@phx.cam.ac.uk
Escherichia coli                Rudd K.             rudd@ncbi.nlm.nih.gov
Salmonella typhimurium          Rudd K.             rudd@ncbi.nlm.nih.gov
Snakes                          Stocklin R.         stocklin@cmu.unige.ch
Yeast chromosome I              Ouellette F.        francis@monod.biol.mcgill.ca
B.2 Requirements for becoming an on-line expert
An expert should be a scientist who works with one or more specific
families of proteins (or specific domains) and who would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people who have obtained new sequence(s)
which seem to belong to "their" family or families of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme, please contact Amos Bairoch
at one of the following electronic mail addresses:
bairoch@cmu.unige.ch
bairoch@cgecmu51.bitnet
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
**********************
*********************** <----- * EPD [Euk. Promot.] *
* EMBL Nucleotide * -----> **********************
* Sequence Data *
***************** ----> * Library * **********************
* FLYBASE * <---- *********************** <----- * ECD [E. coli map] *
* [Drosophila * ^ | ^ **********************
* genetic maps] * --------+ | | |
***************** <-----+ | | | +--------- **********************
| | | | +--------- * TFD [Trans. fact.] *
| | | | | +------> **********************
| | | | | |
***************** | v | v v | **********************
* REBASE * *********************** * ENZYME [Nomencl.] *
* [Restriction * <---- * SWISS-PROT * <----- **********************
* enzymes] * * Protein Sequence * |
***************** * Data Bank * v
*********************** **********************
***************** | ^ | ^ | ^ | | * OMIM [Diseases] *
* PROSITE * <-------+ | | | | | | +--------> **********************
* [Patterns] * ----------+ | | | | |
***************** | | | | +-----------> **********************
| | | | +-------------- * E. coli 2D gels *
| | | | **********************
| | | |
| | | +----------------> **********************
| | +------------------- * EcoGene/EcoSeq *
| v **********************
| ***********************
+--------> * PDB [3D structures] *
***********************
