Swiss-Prot release 31.0
Published February 1, 1995
SWISS-PROT RELEASE 31.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 31.0 of SWISS-PROT contains 43470 sequence entries, comprising
15'335'248 amino acids abstracted from 39750 references. This represents
an increase of 8.3% over release 30. The recent growth of the data bank
is summarized below.
Release Date Number of entries Nb of amino acids
2.0 09/86 3939 900 163
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
10.0 03/89 10008 2 952 613
11.0 07/89 10856 3 265 966
12.0 10/89 12305 3 797 482
13.0 01/90 13837 4 347 336
14.0 04/90 15409 4 914 264
15.0 08/90 16941 5 486 399
16.0 11/90 18364 5 986 949
17.0 02/91 20024 6 524 504
18.0 05/91 20772 6 792 034
19.0 08/91 21795 7 173 785
20.0 11/91 22654 7 500 130
21.0 03/92 23742 7 866 596
22.0 05/92 25044 8 375 696
23.0 08/92 26706 9 011 391
24.0 12/92 28154 9 545 427
25.0 04/93 29955 10 214 020
26.0 07/93 31808 10 875 091
27.0 10/93 33329 11 484 420
28.0 02/94 36000 12 496 420
29.0 06/94 38303 13 464 008
30.0 10/94 40292 14 147 368
31.0 02/95 43470 15 335 248
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 30
2.1 Sequences and annotations
About 3243 sequences have been added since release 30, the sequence data
of 493 existing entries has been updated and the annotations of 6729
entries have been revised.
<PAGE>
2.2 What's happening with the model organisms
As we announced in the last four releases we have selected a number of
organisms that are the target of genome sequencing and/or mapping
projects and for which we intend to:
- Be as complete as possible. All sequences available at a given time
should be immediately included in SWISS-PROT. This also includes
sequence corrections and updates;
- Provide a higher level of annotation;
- Provide cross-references to specialized database(s) that contain,
among other data, some genetic information about the genes that code
for these proteins;
- Provide specific indices or documents.
What was done since the last release or in preparation for the next
release:
- We have added four species to the list of model organisms:
Salmonella typhimurium. We added all the missing publicly available
protein sequences from this species and we cross-referenced the
Salmonella sequences to the StyGene database from Ken Rudd (see
section 2.3.2). A new documentation file (SALTY.TXT) is available
(see section 2.4) that list all the Salmonella typhimurium entries
linked to StyGene.
Schizosaccharomyces pombe (Fission yeast). For this organism we are
looking for a genomic database to which the relevant SWISS-PROT
entries can be linked. In this release we have added a significant
number of S.pombe sequences. A new documentation file (POMBE.TXT)
is available (see section 2.4) that list all the
Schizosaccharomyces pombe entries and their gene designation(s).
Arabidopsis thaliana (Mouse-ear cress). As of this release we have
not yet cross-linked the entries originating from this organism to
a specific genomic database nor have we yet made a significant
effort to enter all the Arabidopsis entries in SWISS-PROT. Such an
effort is planned for the next release.
Sulfolobus solfataricus. This archebacteria is the target of a
genomic project carried out at the Dalhousie University in Halifax,
Canada. It is expected that data originating from this project will
be available very soon. In preparation for this development we have
already entered in SWISS-PROT all the available S.solfataricus
sequences that are publicly available.
- SWISS-PROT is now cross-referenced to the Bacillus subtilis 168
SubtiList database designed by Ivan Moszer of the Pasteur Institute
in Paris (see section 2.3.3). We now have in SWISS-PROT, 100% of the
publicly available protein sequences referenced in SubtiList. A new
documentation file (SUBTILIS.TXT) is available (see section 2.4) that
list all the Bacillus subtilis entries linked to SubtiList.
<PAGE>
- SWISS-PROT is now cross-referenced to the yeast LISTA database
designed by Patrick Linder of the University of Geneva (see section
2.3.1). We have worked together to insure that the gene name used in
LISTA always correspond to the first one (if more than one gene name
exist) listed in the relevant SWISS-PROT entry. All proteins
referenced in LISTA are now in SWISS-PROT. We also made a major
effort to integrate in this release all the new sequence data from
the complete sequences of chromosomes I, V, VIII and IX. New
documentations files are available for each of these chromosomes (see
section 2.4).
Here is the current status of the model organisms:
Organism Database Index file Number of
cross-referenced sequences
-------------- -------------------------- -------------- ---------
A.thaliana None yet In preparation 271
B.subtilis SubtiList SUBTILIS.TXT 1083
C.elegans WormPep CELEGANS.TXT 690
D.discoideum DictyDB DICTY.TXT 199
D.melanogaster FlyBase In preparation 711
E.coli EcoGene ECOLI.TXT 3151
H.sapiens MIM MIMTOSP.TXT 3067
S.cerevisiae LISTA YEAST.TXT 3144
S.typhimurium StyGene SALTY.TXT 568
S.pombe None yet POMBE.TXT 279
S.solfataricus None yet None yet 58
Other relevant information
We apologizes for running behind with new sequencing data from the C.
elegans genomic project. Starting with the next release, we will have
implemented a mechanism to insure that data from the genome project is
directly fed into the SWISS-PROT annotation 'pipeline'.
With the help of the group of Elizabeth Kutter from the Evergreen State
College, we have revised and added protein sequence entries from the
completed genome of phage T4. The next release will incorporate all of
the data from the complete genome of the Autographa californica nuclear
polyhedrosis virus.
2.3 Changes in the DR line
In this release, we have added cross-references from SWISS-PROT to five
additional databases. These cross-references are present in the DR
lines.
2.3.1 LISTA
The LISTA database of budding yeast (Saccharomyces cerevisiae) genes
coding for proteins prepared under the supervisation of Patrick Linder
<PAGE>
at the University of Geneva (See: Doelz R., Mosse M.-O., Slonimski P.P.,
Bairoch A., and Linder P.; Nucleic Acids Res. (1994), 22:3459-3461).
Data bank identifier: LISTA
Primary identifier : Unique identifier attributed by LISTA to the
gene coding for the protein
Secondary identifier: The gene designation (name)
Example : DR LISTA; SC00018; ACT1.
2.3.2 StyGene
The StyGene section of the StySeq/StyMap integrated Salmonella
typhimurium LT2 database, both prepared by Ken Rudd at the National
Center for Biotechnology Information (NCBI).
Data bank identifier: STYGENE
Primary identifier : Unique identifier attributed by StyGene to the
gene coding for the protein
Secondary identifier: The gene designation (name)
Example : DR STYGENE; SG10312; PROV.
2.3.3 SubtiList
The SubtiList relational database for the Bacillus subtilis 168 genome
prepared under the supervisation of Ivan Moszer at the Pasteur Institute
(See Moszer I., Glaser P., and Danchin A.; Microbiol. (1995), In press).
Data bank identifier: SUBTILIST
Primary identifier : Unique identifier attributed by SubtiList to
the gene coding for the protein
Secondary identifier: The gene designation (name)
Example : DR SUBTILIST; BG10774; OPPD.
2.3.4 HSSP
The database of Homology-derived Secondary Structure of Proteins (HSSP)
prepared under the supervisation of Chris Sander at the EMBL (See:
Sander C., and Schneider R.; Nucleic Acids Res. (1993), 21:3105-3109).
Data bank identifier: HSSP
Primary identifier : Accession number of a SWISS-PROT entry cross-
referenced to a PDB entry whose structure is
expected to be similar to that of the entry in
which the HSSP cross-reference is present
Secondary identifier: Entry name of the PDB structure related to that
of the entry in which the HSSP cross-reference
is present
Example : DR HSSP; P00438; 1DOB.
<PAGE>
2.3.5 Transfac
The transcription factor database (Transfac) developed by Edgar
Wingender and Rainer Knueppel from the Gesellschaft fuer
Biotechnologische Forschung mbH in Braunschweig.
Data bank identifier: TRANSFAC
Primary identifier : Unique identifier (accession number) of the
Tranfac entry
Secondary identifier: None; a dash '-' is stored in that field
Example : DR TRANSFAC; T00141; -.
2.4 Status of the documentation files
SWISS-PROT is distributed with a large number of documentation files.
Some of these files have been available for a long time (the user
manual, release notes, the various indices for authors, citations,
keywords, etc.), but many have been created recently and we are
continuously adding new files. The following table list all the
documents that are currently available or that will be added in the next
few months.
USERMAN .TXT User manual
RELNOTES.TXT Release notes
SHORTDES.TXT Short description of entries in SWISS-PROT
JOURLIST.TXT List of abbreviations for journals cited
KEYWLIST.TXT List of keywords in use
SPECLIST.TXT List of organism identification codes
EXPERTS .TXT List of on-line experts for PROSITE and SWISS-PROT
ACINDEX .TXT Accession number index
AUTINDEX.TXT Author index
CITINDEX.TXT Citation index
KEYINDEX.TXT Keyword index
SPEINDEX.TXT Species index
7TMRLIST.TXT List of 7-transmembrane G-linked receptors entries
CDLIST .TXT CD nomenclature for surface proteins of human leucocytes
CELEGANS.TXT Index of Caenorhabditis elegans entries and their
corresponding gene designations and WormPep cross-
references
DICTY .TXT Index of Dictyostelium discoideum entries and their
corresponding gene designations and DictyDB cross-
references
EC2DTOSP.TXT Index of Escherichia coli Gene-protein database entries
referenced in SWISS-PROT
ECOLI .TXT Index of Escherichia coli K12 chromosomal entries and
their corresponding EcoGene cross-reference
EMBLTOSP.TXT Index of EMBL Database entries referenced in SWISS-PROT
EXTRADOM.TXT Nomenclature of extracellular domains
<PAGE>
GLYCOSYL.TXT Index of glycosyl hydrolases classified by families on
the basis of sequence similarities [2]
HOXLIST .TXT Vertebrate homeobox proteins: nomenclature and index
HUMCHR21.TXT Index of protein sequence entries encoded on human
chromosome 21 [1]
HUMCHRY .TXT Index of protein sequence entries encoded on human
chromosome Y [1]
MIMTOSP .TXT Index of MIM entries referenced in SWISS-PROT
NOMLIST .TXT List of nomenclature related references for proteins
PDBTOSP .TXT Index of Brookhaven PDB entries referenced in SWISS-PROT
PLASTID .TXT List of chloroplast and cyanelle encoded proteins
POMBE .TXT Index of Schizosaccharomyces pombe entries in SWISS-PROT
and their corresponding gene designations [1]
RESTRIC .TXT List of restriction enzymes and methylases entries
RIBOSOMP.TXT Index of ribosomal proteins classified by families on the
basis of sequence similarities [2]
SALTY .TXT Index of Salmonella typhimurium LT2 chromosomal entries
and their corresponding StyGene cross-references [1]
SUBTILIS.TXT Index of Bacillus subtilis 168 chromosomal entries and
their corresponding SubtiList cross-references [1]
YEAST .TXT Index of Saccharomyces cerevisiae entries and their
corresponding gene designations
YEAST1 .TXT Yeast Chromosome I entries [1]
YEAST2 .TXT Yeast Chromosome II entries
YEAST3 .TXT Yeast Chromosome III entries
YEAST5 .TXT Yeast Chromosome V entries [1]
YEAST8 .TXT Yeast Chromosome VIII entries [1]
YEAST9 .TXT Yeast Chromosome IX entries [1]
YEAST11 .TXT Yeast Chromosome XI entries
Notes:
[1] New in release 31.
[2] Will be available starting with release 32 in June 1995.
2.5 The Expasy World-Wide Web server
2.5.1 Background information
The World-Wide Web (WWW), which originated at CERN, is a powerful global
information system merging networked information retrieval and
hypertext. It gives access, using hypertext links, to the documents and
information contained in all the existing WWW servers around the world,
as well as to the data obtainable through other information retrieval
systems like WAIS, Gopher, X500, etc. To access a WWW server, one has to
run on a local computer a client program (a WWW browser), which displays
hypertext documents. The user can then either request a keyword search
or jump to another document by following a hypertext link. WWW has the
outstanding advantage of extending the hypertext model to the whole
world (by allowing hypertext jumps to documents anywhere on the internet
network) and by being device and user-interface independent (browsers
exist for a variety of computers and user-interfaces, including Unix
<PAGE>
workstations running XWindows, MacIntoshes and PCs with Microsoft
Windows).
The ExPASy WWW server allows access, using the user-friendly hypertext
model, to the SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and SWISS-
3DIMAGE databases and, through any SWISS-PROT protein sequence entry, to
other databases such as EMBL, REBASE, FlyBase, GCRDb, MaizeDB,
SubtiList, OMIM, PDB, HSSP, YEPD and Medline. Using a browser which is
able to display images one can also remotely access 2D gels image data
from SWISS-2DPAGE.
A WWW server can be accessed on the internet through its Uniform
Resource Locator (URL), the addressing system defined by the WWW model.
The URL for the ExPASy WWW server is:
http://expasy.hcuge.ch/
or
http://129.195.254.61/
To access a WWW server, you need to run a browser (or client) program on
your local computer. Browsers exist for a variety of machines and may be
obtained by anonymous ftp. ExPASy can be used with any WWW browser, but
we recommend either NCSA Mosaic or Netscape Navigator. Both are very
flexible and powerful browsers with a graphical user interface; they are
available for Unix boxes using X11/Motif; for Apple McIntoshes and for
Microsoft Windows. You can get them from various FTP sites, for example:
ftp.ncsa.uiuc.edu (for Mosaic)
ftp1.netscape.com (for Netscape)
For more information on the ExPASy WWW server, you can read the
following article:
Appel R.D., Bairoch A., Hochstrasser D.F.
A new generation of information retrieval tools for biologists: the
example of the ExPASy WWW server.
Trends Biochem. Sci. 19:258-260(1994).
Or you can contact Dr. Ron Appel:
Email: appel@cih.hcuge.ch
Fax: +41-22-372 61 98
2.5.2 SWISS-SHOP
Thanks to the work of Manuel Peitsch from the Geneva Glaxo Institute for
Molecular Biology, we can provide, on ExPASy, an important new service
called SWISS-SHOP. SWISS-Shop allows any users of SWISS-PROT to indicate
what proteins he/she is interested in. This can be done using various
criteria that can be combined:
- By entering one or more words that should be present in the
description line;
<PAGE>
- By entering one or more species name(s) or taxonomic division(s);
- By entering one or more keywords;
- By entering one or more author names;
- By entering the accession number (or entry name) of a PROSITE pattern
or a user-defined sequence pattern;
- By entering the accession number (or entry name) of an existing
SWISS-PROT entry or by entering a "private" sequence.
Every week, the new sequences entered in SWISS-PROT are automatically
compared with all the criteria that have been defined by the users. If a
sequence corresponds to the selection criteria defined by a user, that
sequence is sent by electronic mail.
2.5.3 What else is new on ExPASy
Since the last release, there has been a number of new developments on
the ExPASy WWW server. Here are some highlights of these changes:
- Access to the ENZYME data bank has been fully implemented in ExPASy.
Many different access options are allowed. Hypertext links between
ENZYME and SWISS-PROT, PROSITE, MIM and the Japanese Ligand database
are available.
- WWW links have been implemented between SWISS-PROT and HSSP,
SubtiList, and YEPD.
- Cross-references from SWISS-PROT to Flybase now use the links to the
new server for that database (at Harvard).
- The page giving access to the SWISS-PROT documents has been
completely updated. A new page allows access to all of the old
release notes.
2.6 Important forthcoming change
In the next release, the RM (Reference Medline) line will be replaced by
a more 'generic' line called RX (Reference cross-references). The format
of that line will be:
RX bibliographic_database_name; identifier.
As of the next release, the only "bibliographic_database_name" that will
be used will be "MEDLINE" and the associated "identifier" is the eight
digit Medline Unique Identifier (UID). But it is 'rumored' that
additional bibliographic databases are interested to be linked to the
sequence databases.
Example:
RM 91002678
<PAGE>
will be changed to:
RX MEDLINE; 91002678.
2.7 Weekly updates of SWISS-PROT
Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
are updated at each update:
new_seq.dat Contains all the new entries since the last full release.
upd_seq.dat Contains the entries for which the sequence data has been
updated since the last release.
upd_ann.dat Contains the entries for which one or more annotation
fields have been updated since
the last release.
Currently these files are available on the following anonymous ftp
servers:
Organization ExPASy (Geneva University Expert Protein Analysis System)
Address expasy.hcuge.ch (or 129.195.254.61)
Directory /databases/swiss-prot/updates
Organization National Center for Biotechnology Information (NCBI)
Address ncbi.nlm.nih.gov (or 130.14.20.1)
Directory /repository/swiss-prot/updates
Organization European Bioinformatics Institute (EBI)
Address ftp.ebi.ac.uk (or 193.62.196.6)
Directory /pub/databases/swissprot/new
!! Important notes !!!
Although we try to follow a regular schedule, we do not promise to
update these files every week. In some cases two weeks will elapse in-
between two updates.
Due to the current mechanism used to build a release the entries that
are provided in these updates are not guaranteed to be error free. Also,
for the same reason, new entries do not contain an OC (Organism
Classification) line.
3. ENZYME AND PROSITE
3.1 The ENZYME data bank
Release 18.0 of the ENZYME data bank is distributed with release 31 of
SWISS-PROT. ENZYME release 18.0 contains information relative to 3546
enzymes.
<PAGE>
In this release we have added cross-references from the ENZYME data bank
to the PROSITE data bank document entries that mention specific types of
enzymes. These lines are present in a new line type (PR) whose format
is:
PR PROSITE; PSITE_DOC_AC_NB
where 'PSITE_DOC_AC_NB' is a PROSITE document entry accession number.
Example:
PR PROSITE; PDOC00065;
3.2 The PROSITE data bank
Release 12.2 of the PROSITE data bank is distributed with release 30 of
SWISS-PROT. Release 12.2 contains 785 documentation chapters that
describes 1029 different patterns, rules and profiles/matrices. Release
12.2 does not really represent a new release; the only changes between
releases 12.1 and 12.2 are updating of the pointers to the SWISS-PROT
entries whose name have been modified between releases 30 and 31. The
next release of PROSITE (13.0) will be distributed with release 32 of
SWISS-PROT.
WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate that
you notify us if you find that sequences belonging to your field of
expertise are missing from the data bank. We also would like to be
notified about annotations to be updated, if, for example, the function
of a protein has been clarified or if new post-translational information
has become available.
<PAGE>
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.57 Gln (Q) 4.02 Leu (L) 9.24 Ser (S) 7.19
Arg (R) 5.21 Glu (E) 6.29 Lys (K) 5.88 Thr (T) 5.80
Asn (N) 4.50 Gly (G) 6.91 Met (M) 2.36 Trp (W) 1.28
Asp (D) 5.30 His (H) 2.24 Phe (F) 4.02 Tyr (Y) 3.21
Cys (C) 1.73 Ile (I) 5.64 Pro (P) 4.98 Val (V) 6.51
Asx (B) 0.002 Glx (Z) 0.002 Xaa (X) 0.02
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
A.2 Repartition of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 4714
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 2139
2x: 756
3x: 426
4x: 264
5x: 191
6x: 191
7x: 117
8x: 80
9x: 102
10x: 46
11- 20x: 185
21- 50x: 128
51-100x: 43
>100x: 45
<PAGE>
A.2.2 Table of the most represented species
Number Frequency Species
1 3151 Escherichia coli
2 3144 Baker's yeast (Saccharomyces cerevisiae)
3 3067 Human
4 1843 Mouse
5 1683 Rat
6 1083 Bacillus subtilis
7 735 Bovine
8 711 Fruit fly (Drosophila melanogaster)
9 690 Caenorhabditis elegans
10 573 Chicken
11 568 Salmonella typhimurium
12 449 African clawed frog (Xenopus laevis)
13 403 Rabbit
14 357 Pig
15 279 Fission yeast (Schizosaccharomyces pombe)
16 275 Bacteriophage T4
17 271 Arabidopsis thaliana (Mouse-ear cress)
18 258 Maize
19 251 Vaccinia virus (strain Copenhagen)
20 219 Rice
21 214 Pseudomonas aeruginosa
22 199 Slime mold (Dictyostelium discoideum)
23 195 Tobacco
24 193 Human cytomegalovirus (strain AD169)
25 183 Vaccinia virus (strain WR)
26 180 Pea
27 173 Wheat
28 168 Barley
29 154 Staphylococcus aureus
30 153 Dog
31 151 Marchantia polymorpha (Liverwort)
151 Neurospora crassa
33 147 Soybean
34 146 Variola virus
35 144 Pseudomonas putida
144 Rhodobacter capsulatus
37 141 Sheep
38 135 Spinach
39 131 Klebsiella pneumoniae
40 120 Bacillus stearothermophilus
41 117 Tomato
42 116 Agrobacterium tumefaciens
43 111 Potato
44 107 Rhizobium meliloti
45 102 Lactococcus lactis (subsp. lactis)
<PAGE>
A.3 Repartition of the sequences by size
From To Number From To Number
1- 50 2381 1001-1100 404
51- 100 4198 1101-1200 292
101- 150 5781 1201-1300 214
151- 200 4200 1301-1400 140
201- 250 3732 1401-1500 127
251- 300 3303 1501-1600 70
301- 350 3099 1601-1700 57
351- 400 3136 1701-1800 52
401- 450 2365 1801-1900 58
451- 500 2477 1901-2000 38
501- 550 1775 2001-2100 21
551- 600 1247 2101-2200 51
601- 650 898 2201-2300 55
651- 700 682 2301-2400 22
701- 750 629 2401-2500 26
751- 800 497 >2500 132
801- 850 375
851- 900 405
901- 950 281
951-1000 250
A.4 Longest sequences
The longest sequences (>=4000 residues) are listed here:
HTS1_COCCA 5217
FAT_DROME 5147
RYNR_RABIT 5037
RYNR_HUMAN 5032
RYNC_RABIT 4969
DYHC_DICDI 4725
DYHC_RAT 4644
DYHC_DROME 4639
APB_HUMAN 4563
APOA_HUMAN 4548
RRPA_CVMJH 4488
DYHC_TRIGR 4466
DYHC_ANTCR 4466
GRSB_BACBR 4451
PKSK_BACSU 4447
PKSL_BACSU 4427
PLEC_RAT 4140
DYHC_YEAST 4092
RRPA_CVH22 4085
<PAGE>
A.5 List of the most cited journals in SWISS-PROT
Citations Journal abbreviation
4517 J. BIOL. CHEM.
3097 NUCLEIC ACIDS RES.
2886 PROC. NATL. ACAD. SCI. U.S.A.
1869 J. BACTERIOL.
1571 FEBS LETT.
1543 GENE
1454 EUR. J. BIOCHEM.
1313 EMBO J.
1263 NATURE
1215 BIOCHEM. BIOPHYS. RES. COMMUN.
1183 BIOCHEMISTRY
933 J. MOL. BIOL.
931 BIOCHIM. BIOPHYS. ACTA
914 CELL
860 MOL. CELL. BIOL.
738 MOL. GEN. GENET.
689 VIROLOGY
650 BIOCHEM. J.
647 PLANT MOL. BIOL.
566 SCIENCE
539 J. BIOCHEM.
509 MOL. MICROBIOL.
443 J. VIROL.
392 J. GEN. VIROL.
284 J. CELL BIOL.
264 GENOMICS
253 GENES DEV.
249 BIOL. CHEM. HOPPE-SEYLER
229 CURR. GENET.
218 ARCH. BIOCHEM. BIOPHYS.
214 YEAST
213 HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
212 J. IMMUNOL.
190 MOL. BIOCHEM. PARASITOL.
184 J. GEN. MICROBIOL.
173 MOL. ENDOCRINOL.
164 INFECT. IMMUN.
156 J. CLIN. INVEST.
149 ONCOGENE
143 PLANT PHYSIOL.
142 DNA
137 FEMS MICROBIOL. LETT.
132 HUM. MOL. GENET.
129 J. EXP. MED.
120 AM. J. HUM. GENET.
117 J. MOL. EVOL.
111 GENETICS
102 BLOOD
101 AGRIC. BIOL. CHEM.
<PAGE>
APPENDIX B: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
***********************
****************** * EMBL Nucleotide * **********************
* EPD [Euk.Prom] * <---> * Sequence Data * <---- * ECD [E. coli map] *
****************** * Library [EBI] * **********************
***********************
^ ^ ^ ^ ^ ^ ^
****************** | | | I | | |
* FlyBase * <------+ | | I | | | **********************
****************** | | | I | | +---------> * GCRDb [7TM recep.] *
| | | I | | | **********************
****************** | | | I | | |
* SubtiList * <---------+ | I | | | **********************
****************** | | | I | +------------> * EcoGene [E.coli] *
| | | I | | | **********************
****************** | | | I | | |
* MaizeDb * <-----------+ I | | | **********************
****************** | | | I +---------------> * LISTA (Yeast) *
| | | I | | | **********************
****************** | | | I | | |
* WormPep * | | | I | | | **********************
* [C.elegans] * <----+ | | | I | | | +------> * DictyDB [D.disco.] *
****************** | | | | I | | | | **********************
| | | | I | | | |
****************** | v v v v v v v v **********************
* REBASE * *********************** * ENZYME [Nomencl.] *
* [Restriction * <--- * SWISS-PROT * <----- **********************
* enzymes] * * Protein Sequence * |
****************** * Data Bank * v
*********************** **********************
****************** ^ ^ ^ ^ ^ ^ | ^ ^ | * OMIM [Diseases] *
* StyGene * | | | | | | | | | +--------> **********************
* [S.Typhimurium]* <----+ | | | | | | | |
****************** | | | | | | | | **********************
| | | | | | | +----------> * ECO2DBASE [2D] *
****************** | | | | | | | **********************
* Transfac * <------+ | | | | | |
****************** | | | | | | **********************
| | | | | +------------> * SWISS-2DPAGE [2D] *
****************** | | | | | **********************
* PROSITE * <--------+ | | | |
* [Patterns] * | | | | **********************
****************** | | | +--------------> * Aarhus/Ghent [2D] *
| | | | **********************
| | | |
| | | +----------------> **********************
| | | * YEPD (Yeast) [2D] *
| | +-----------------+ **********************
| v |
| *********************** +-> **********************
+--------> * PDB [3D structures] * <----- * HSSP (3D simil.) *
*********************** **********************
<PAGE>
