Swiss-Prot release 20.0
Published November 1, 1991
SWISS-PROT RELEASE 20.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 20.0 of SWISS-PROT contains 22654 sequence entries, comprising
7'500'130 amino acids abstracted from 22830 references. This represents
an increase of 5% over release 19. The recent growth of the data bank is
summarized below.
Release Date Number of entries Nb of amino acids
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
10.0 03/89 10008 2 952 613
11.0 07/89 10856 3 265 966
12.0 10/89 12305 3 797 482
13.0 01/90 13837 4 347 336
14.0 04/90 15409 4 914 264
15.0 08/90 16941 5 486 399
16.0 11/90 18364 5 986 949
17.0 02/91 20024 6 524 504
18.0 05/91 20772 6 792 034
19.0 08/91 21795 7 173 785
20.0 11/91 22654 7 500 130
1.2 Source of data
Release 20.0 has been updated using protein sequence data from release
29.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 28.0 of the
EMBL Nucleotide Sequence Database.
As an indication to the source of the sequence data in the SWISS-PROT
data bank we list here the statistics concerning the DR (Database cross-
references) pointer lines:
Entries with pointer(s) to only PIR entri(es): 4129
Entries with pointer(s) to only EMBL entri(es): 2970
Entries with pointer(s) to both EMBL and PIR entri(es): 15061
Entries with no pointers lines: 494
<PAGE>
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 19
2.1 Sequences and annotations
About 890 sequences have been added since release 19, the sequence data
of 187 existing entries has been updated and the annotations of 3030
entries have been revised. In particular we have used reviews articles
to update the annotations of the following groups or families of
proteins:
- Aminotransferases class-II and class-III
- Aromatic amino acids permeases
- Avidin/Streptavidin
- Bacterial porins
- Bacterial regulatory proteins, asnC family
- Bacterial regulatory proteins, crp family
- Bacterial ring hydroxylating dioxygenases
- Beta-ketoacyl synthases
- C3HC4-class zinc finger proteins
- Chalcone and resveratrol synthases
- Chromo domain proteins
- DEAD-box family ATP-dependent helicases
- Flagella basal body rod proteins
- GATA transcription factors
- Glutamate / Leucine / Phenylalanine dehydrogenases
- Glycosyl hydrolase family 1
- Glycosyl hydrolase family 9
- Glycosyl hydrolase family 10
- Glycosyl hydrolase family 17
- Heme oxygenase
- High potential iron-sulfur proteins
- HIV envelope proteins
- Initiation factor 5a (eIF-5a)
- Interferon regulatory factors
- Myelin P0 protein
- Myelin proteolipid protein
- PEP-utilizing enzymes
- pfkB family prokaryotic carbohydrate kinases
- Plant lipid transfer proteins
- PTS Hpr component
- Ribosomal proteins
- Saposins
- Serine/threonine protein kinases
<PAGE>
2.2 Changes in the feature table
2.2.1 The LIPID key
Until this release SWISS-PROT was very inconsistent in the use of the
feature table to annotate the covalent binding of lipids (fatty acids,
prenyl groups, or glycolipids) to a specific position in a protein
sequence. The attachment of a myristate group was indicated by the
`MYRISTYL' key, the attachment of a palmitate group was indicated using
either the `MOD_RES' or the `BINDING' keys, prenylation and GPI-anchor
were indicated using the `BINDING' key.
To correct these inconsistencies we have introduced in this release a
new key `LIPID' and we have deleted the `MYRISTYL' key.
Definition of the new key:
LIPID - Covalent binding of a lipidic moiety
The chemical nature of the bound lipid moiety is given in the
description. The general format of the LIPID description field is:
FT LIPID xxx xxx MODIFICATION (COMMENT).
The modifications which are currently defined are the following:
MYRISTATE Myristate group attached through an amide bond to the
N-terminal glycine residue of the mature form of a
protein [1,2].
PALMITATE Palmitate group attached through a thioether bond to
a cysteine residue or through an ester bond to a
serine or threonine residue [1,2].
FARNESYL Farnesyl group attached through a thioether bond to a
cysteine residue [3].
GERANYL-GERANYL Geranyl-geranyl group attached through a thioether
bond to a cysteine residue [3].
GPI-ANCHOR Glycosyl-phosphatidylinositol (GPI) group linked to
the alpha-carboxyl group of the C-terminal residue of
the mature form of a protein [4,5].
N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a
prokaryotic lipoprotein with an amide-linked fatty
acid and a glyceryl group to which two fatty acids
are linked by ester linkages [6].
[1] Grand R.J.A.
Biochem. J. 258:626-638(1989).
[2] McLhinney R.A.J.
Trends Biochem. Sci. 15:387-391(1990).
<PAGE>
[3] Glomset J.A., Gelb M.H., Farnsworth C.C.
Trends Biochem. Sci. 15:139-142(1990).
[4] Low M.G.
FASEB J. 3:1600-1608(1989).
[5] Low M.G.
Biochimica Biophysica Acta 988:427-454(1989).
[6] Hayashi S., Wu H.C.
J. Bioenerg. Biomembr. 22:451-471(1990).
Examples of LIPID key feature lines:
FT LIPID 1 1 MYRISTATE.
FT LIPID 65 65 PALMITATE (BY SIMILARITY).
FT LIPID 354 354 GPI-ANCHOR.
2.2.2 The VARSPLIC key
There are some genes which, by the mechanism of alternative splicing,
encode closely related proteins that differs only by the presence or
absence of one or more domains. Generally a single sequence entry
represents the longest form of the protein and the feature table is used
to indicate the regions which differ in alternative spliced forms. The
`VARIANT' key was used for such purpose. For example (from entry P04085;
PGDA$HUMAN):
FT VARIANT 194 196 GRP -> DVR (IN SHORT FORM).
FT VARIANT 197 211 MISSING (IN SHORT FORM).
It was very difficult to write software tools that could distinguish
between the above usage of the `VARIANT' key and the more classical use
to describe polymorphisms or natural mutations. We have therefore
introduced in this release a new key `VARSPLIC' which is used
specifically to describe splicing variants.
The example shown above is know represented by:
FT VARSPLIC 194 196 GRP -> DVR (IN SHORT FORM).
FT VARSPLIC 197 211 MISSING (IN SHORT FORM).
2.3 Changes in the cross-references lines (DR)
2.3.1 Cross-references to EcoGene
Starting with this release we have added cross-references to the EcoGene
section of the EcoSeq/EcoMap integrated Escherichia coli database
prepared by Ken Rudd at the National Center for Biotechnology
Information (NCBI) (for a description see: Rudd K.E., Miller W., Werner
C., Ostell J., Tolstoshev C., and Satterfield S.G.; Nucleic Acids Res.
(1991) 19:637-647).
<PAGE>
These cross-references are present in the DR lines:
Data bank identifier: ECOGENE
Primary identifier : EcoGene gene accession number
Secondary identifier: Gene designation
Example : DR ECOGENE; EG10075; AROC.
The collaboration with Ken Rudd goes much further than simply adding
these cross-references. Thanks to this collaboration we have been able
to update hundreds of Escherichia coli sequence entries (to add data
concerning the function of some proteins, to resolve sequence conflicts,
to add references and comments, etc.), we are also using his master list
of sequenced genes to pinpoint missing sequences. We have also
implemented his gene name nomenclature for hypothetical proteins. This
scheme is described below.
Unnamed Escherichia coli hypothetical proteins and proteins of unknown
function are assigned gene names based upon their position on the E.
coli genomic physical map. They all begin with the letter `Y'. The next
two letters designate which 1/100th of the map (starting at the thr
locus) contain the ORF in the order YAA, YAB, ..YAJ, YBA, YBB, ..YBJ,
YCA, YCB, ..YJJ. ORF's within any one of these 100 intervals are given a
fourth letter (A-Z) that serves to distinguish them but is not meant to
convey position information.
2.3.2 FlyBase
The official name of Michael Ashburner Drosophila Genetics Maps Database
has been changed from `DMAP' to `FlyBase'; we have therefore changed the
data bank identifier in the DR line from `DMAP' to `FLYBASE' for all
cross-references to that data collection.
2.4 Minor change in the RL lines for thesis references
Up till now, thesis references have been formatted as in the following
example:
RL UNPUBLISHED (1972) THESIS, GEORGE WASHINGTON UNIVERSITY, USA.
In recognition of the fact that theses are generally regarded as
published references we will format them as follows starting with the
current release:
RL THESIS (19YY), INSTITUTION_NAME, COUNTRY.
Example:
RL THESIS (1972), GEORGE WASHINGTON UNIVERSITY, USA.
<PAGE>
For those of you who write software to parse reference blocks, the
presence of the word `THESIS' as the first word on the first RL line of
a reference block will thus indicate a thesis reference. The remaining
text consists of a parenthesized year followed by the institution name
followed by the country where that institution is localized.
3. FORTHCOMING CHANGES
The following changes will be implemented starting with release 21.
3.1 Change in the format of the entry names
The dollar sign `$' in entry names will be replaced by the underscore
character `_'. This change is made on the behalf of users of sequence
analysis software running under the Unix operating system, where the
dollar sign is a reserved symbol. Example: the entry name `CYC$HUMAN'
will be changed to `CYC_HUMAN'.
3.2 New line type GN
The GN (Gene Name) line is a new line that will be used to indicate the
name(s) of the gene(s) that encodes for the protein being described.
Currently this information is found in the DE line as shown in the
following example:
DE SERUM ALBUMIN PRECURSOR (GENE NAME: ALB).
The format of the GN line will be:
GN NAME1[ AND|OR NAME2...].
Examples:
GN ALB.
GN REX-1.
It often occurs that more than one gene name has been assigned to an
individual locus. In that case all the synonyms will be listed. The word
`OR' separates the different designations. The first name in the list is
assumed to be the most correct (or most current) designation. Example:
GN HNS OR DRDX OR OSMZ OR BGLY.
In a few cases, multiple genes encode for an identical protein sequence.
In that case all the different gene names will be listed. The word `AND'
separates the designations. Example:
GN CECA1 AND CECA2.
<PAGE>
In very rare cases (only one occurrence has been found in the current
release) `AND' and `OR' could be both present. In that case parenthesis
are used as shown in the following example:
GN GVPA AND (GVPB OR GVPA2).
3.3 New line type RM
The RM (Reference Medline) line will be used to indicate the Medline
Unique Identifier (UID) of a reference. This information is currently
listed in the RC line using the `MEDLINE' token as shown in the
following example:
RC MEDLINE=90205618;
The format of the RM line will be:
RM nnnnnnnn
where `nnnnnnnn' is the eight digit Medline Unique Identifier (UID).
Example:
RM 90205618
3.4 Secondary structure information
Thanks to the help of Chris Sander of the Biocomputing group at EMBL we
are going to add in the feature table of each sequence entry that
belongs to a protein whose tertiary structure is known, the secondary
structure information corresponding to that protein. Complete details
regarding this new feature will be communicated in the next release
notes.
4. ENZYME AND PROSITE
4.1 The ENZYME data bank
Release 7.0 of the ENZYME data bank is distributed along with release 19
of SWISS-PROT. ENZYME release 7.0 contains information relative to 3072
enzymes. The data bank is complete and up to date. Until new enzyme
nomenclature data is published we only plan to update the SWISS-PROT
pointers at each release of the protein sequence data bank, correct
eventual errors, and complete the information concerning synonyms and
cofactors using the literature.
<PAGE>
4.2 The PROSITE data bank
Release 8.00 of the PROSITE data bank is distributed along with release
20 of SWISS-PROT. Release 8.00 contains 530 documentation chapters that
describes 605 different patterns.
4.2.1 What's new in release 8.0
Since the last major release of PROSITE (release 7.01 of June 1991), 88
new chapters have been added and 210 chapters have been updated. The new
chapters are:
- Fibrinogen beta and gamma chains C-terminal domain signature
- Somatomedin B domain signature
- Cellulose-binding domain, bacterial type
- Cellulose-binding domain, fungal type
- Zinc finger, C3HC4 type, signature
- IRF family signature
- TEA domain signature
- Fibrillarin signature
- Bacterial regulatory proteins, asnC family signature
- Bacterial regulatory proteins, merR family signature
- HMG-I and HMG-Y DNA-binding domain (A+T-hook)
- Chromo domain
- Nuclear transition protein 1 signature
- Ribosomal protein L6 signature
- Ribosomal protein L16 signature
- Ribosomal protein L29 signature
- Ribosomal protein L33 signature
- Ribosomal protein L19e signature
- Ribosomal protein L32e signature
- Ribosomal protein S3 signature
- Ribosomal protein S5 signature
- Ribosomal protein S14 signature
- Ribosomal protein S4e signature
- Ribosomal protein S6e signature
- Ribosomal protein S24e signature
- FMN-dependent alpha-hydroxy acid dehydrogenases active site
- Eukaryotic molybdopterin oxidoreductases signature
- Delta 1-pyrroline-5-carboxylate reductase signature
- Pyridine nucleotide-disulphide oxidoreductases class-II active site
- Respiratory chain NADH dehydrogenase 30 Kd subunit signature
- Respiratory chain NADH dehydrogenase 49 Kd subunit signature
- Bacterial ring hydroxylating dioxygenases alpha-subunit signature
- Heme oxygenase signature
- Beta-ketoacyl synthases active site
- Transglutaminases active site
- Aminotransferases class-II pyridoxal-phosphate attachment site
- Aminotransferases class-III pyridoxal-phosphate attachment site
- Phosphoserine aminotransferase signature
<PAGE>
- pfkB family prokaryotic carbohydrate kinases signatures
- Phosphoribulokinase signature
- Thymidine kinase cellular-type signature
- DNA polymerase family X signature
- PEP-utilizing enzymes phosphorylation site signature
- cAMP phosphodiesterases class-II signature
- Ribonuclease III family signature
- Ribonuclease T2 family histidine active sites
- Glycosyl hydrolases family 1 active site
- Glycosyl hydrolases family 9 active site
- Glycosyl hydrolases family 10 active site
- Glycosyl hydrolases family 17 signature
- Alkylbase DNA glycosidases alkA family signature
- Matrixins cysteine switch
- Amidases signature
- ATP synthase c subunit signature
- Phosphoenolpyruvate carboxykinase (ATP) signature
- Fructose-bisphosphate aldolase class-II signature
- Porphobilinogen deaminase cofactor-binding site
- Ferrochelatase signature
- Aldose 1-epimerase putative active site
- Methylmalonyl-CoA mutase signature
- Ubiquitin-activating enzyme signature
- Adenylosuccinate synthetase active site
- Argininosuccinate synthase signatures
- Cytochrome b559 subunits heme-binding site signature
- High potential iron-sulfur proteins signature
- Bacterioferritin signature
- Hemerythrins signature
- Avidin / Streptavidin family signature
- Plant lipid transfer proteins signature
- Aromatic amino acids permeases signature
- General diffusion gram-negative porins signature
- Eukaryotic porin signature
- Myelin basic protein signature
- Myelin P0 protein signature
- Myelin proteolipid protein signature
- Synaptophysin / synaptoporin signature
- Flagella basal body rod proteins signature
- Plant viruses icosahedral capsid proteins 'S' region signature
- Bacterial chemotaxis sensory transducers signature
- Interleukin-10 signature
- LIF / OSM family signature
- Pyrokinins signature
- Hok/gef family cell toxic proteins signature
- Lambdoid phages regulatory protein CIII signature
- Stathmin family signature
- HlyD family secretion proteins signature
- Seminal vesicle protein II repeats signature
<PAGE>
4.2.2 New /SKIP-FLAG qualifier for CC lines
Some PROSITE keys such as those describing commonly found post-
translational modifications (a typical example is N-glycosylation) are
found in the majority of known protein sequences. While it is generally
useful to note their presence, some programs may want, in some cases, to
ignore those keys. For this purpose these keys are indicated with the
following qualifier in their CC lines:
CC /SKIP-FLAG=TRUE;
4.2.3 The new 3D line type
We have introduced a new line: 3D (3D-structure), which is used to list
the code(s) of X-ray crystallography Protein Data Bank (PDB) entries
that contain structural data corresponding the sequence region described
in a PROSITE entry. The format of the 3D line is:
3D name; [name2;...]
Example:
3D 7WGA; 9WGA; 1WGC; 2WGC;
5. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate that
you notify us if you find that sequences belonging to your field of
expertise are missing from the data bank. We also would like to be
notified about annotations to be updated, as for example if the function
of a protein has been clarified or if new post-translational information
has become available.
<PAGE>
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.65 Gln (Q) 4.08 Leu (L) 9.12 Ser (S) 7.10
Arg (R) 5.23 Glu (E) 6.27 Lys (K) 5.85 Thr (T) 5.85
Asn (N) 4.46 Gly (G) 7.10 Met (M) 2.32 Trp (W) 1.30
Asp (D) 5.25 His (H) 2.27 Phe (F) 3.95 Tyr (Y) 3.21
Cys (C) 1.81 Ile (I) 5.45 Pro (P) 5.08 Val (V) 6.49
Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
A.2 Repartition of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 3029
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 1303
2x: 548
3x: 308
4x: 189
5x: 131
6x: 100
7x: 73
8x: 47
9x: 64
10x: 32
11- 20x: 117
21-100x: 92
>100x: 25
<PAGE>
A.2.2 Table of the most represented species
Number Frequency Species
1 1847 Human
2 1596 Escherichia coli
3 1083 Mouse
4 1014 Rat
5 778 Baker's yeast (Saccharomyces cerevisiae)
6 511 Bovine
7 435 Fruit fly (Drosophila melanogaster)
8 374 Chicken
9 311 Bacillus subtilis
10 266 Rabbit
266 African clawed frog (Xenopus laevis)
12 251 Vaccinia virus (strain Copenhagen)
13 245 Pig
14 208 Salmonella typhimurium
15 193 Human cytomegalovirus (strain AD169)
16 166 Bacteriophage T4
17 160 Maize
18 133 Rice
19 118 Vaccinia virus (strain WR)
118 Tobacco
21 113 Pea
22 111 Wheat
23 110 Staphylococcus aureus
24 101 Slime mold (Dictyostelium discoideum)
101 Barley
26 100 Sheep
27 94 Fission yeast (Schizosaccharomyces pombe)
28 93 Pseudomonas aeruginosa
29 92 Spinach
30 89 Pseudomonas putida
31 86 Soybean
86 Neurospora crassa
33 85 Dog
34 84 Liverwort (Marchantia polymorpha)
35 82 Klebsiella pneumoniae
<PAGE>
A.3 Repartition of the sequences by size
From To Number From To Number
1- 50 1494 1001-1100 210
51- 100 2500 1101-1200 131
101- 150 3559 1201-1300 108
151- 200 2181 1301-1400 64
201- 250 1817 1401-1500 54
251- 300 1634 1501-1600 32
301- 350 1494 1601-1700 25
351- 400 1436 1701-1800 26
401- 450 1104 1801-1900 30
451- 500 1188 1901-2000 23
501- 550 881 2001-2100 9
551- 600 619 2101-2200 25
601- 650 438 2201-2300 31
651- 700 319 2301-2400 11
701- 750 297 2401-2500 13
751- 800 232 >2500 53
801- 850 187
851- 900 194
901- 950 119
951-1000 116
Currently the ten largest sequences are:
RYNR$RABIT 5037 a.a.
RYNR$HUMAN 5032 a.a.
APB$HUMAN 4563 a.a.
APOA$HUMAN 4548 a.a.
DYHC$TRIGR 4466 a.a.
POLG$BVDV 3988 a.a.
POLG$HCVA 3898 a.a.
POLG$HCVB 3898 a.a.
TRX$DROME 3759 a.a.
ACVA$PENCH 3746 a.a.
<PAGE>
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise Name Email address
--------------------------- ------------------ ----------------------------
African swine fever virus Yanez R.J. ryanez@cbm2.uam.es
Alcohol dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt@medfys.ki.se
Aldehyde dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt@medfys.ki.se
Alpha-crystallins/HSP-20 Leunissen J.A.M. jackl@caos.caos.kun.nl
de Jong W. u629000@hnykun11.bitnet
Alpha-2-macroglobulins Van Leuven F. fred@blekul13.bitnet
Apolipoproteins Boguski M.S. boguski@ncbi.nlm.nih.gov
Arrestins Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Bacteriophage P4 proteins Halling C. chh9@midway.uchicago.edu
Beta-lactamases Brannigan J. jab5@vaxa.york.ac.uk
Chitinases Henrissat B. cermav@frgren81.bitnet
Clusterin Peitsch M.C. peitsch@ulbio1.unil.ch
CTF/NF-I Mermod N. nmermod@ulys.unil.ch
Cytochromes P450 Holsztynska E.J. ela@netcom.uucp
netcom!ela@apple.com
EF-hand calcium-binding Cox J.A. cox@sc2a.unige.ch
Kretsinger R.H. rhk5i@virginia.bitnet
Enoyl-CoA hydratase Hofmann K.O. khofmann@cipvax.biolan.uni-koeln.de
fruR/lacI family HTH proteins Reizer J. jreizer@ucsd.edu
GATA-type zinc-fingers Boguski M.S. boguski@ncbi.nlm.nih.gov
Glucanases Henrissat B. cermav@frgren81.bitnet
Beguin P. phycel@pasteur.bitnet
G-protein coupled receptors Chollet A. chollet@clients.switch.ch
Attwood T.K. bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins Boguski M.S. boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17 Landsman D. landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases Roy P.H. 2020000@lavalvx1.bitnet
Lipocalins Boguski M.S. boguski@ncbi.nlm.nih.gov
Peitsch M.C. peitsch@ulbio1.unil.ch
MAC components / perforin Peitsch M.C. peitsch@ulbio1.unil.ch
Myelin proteolipid protein Hofmann K.O. khofmann@cipvax.biolan.uni-koeln.de
PEP requiring enzymes Reizer J. jreizer@ucsd.edu
Phytochromes Partis M.D. partis@gcri.afrc.ac.uk
Prokaryotic carbohydrate Reizer J. jreizer@ucsd.edu
kinases
Protein kinases Hanks S. hanks@vuctrvax.bitnet
Hunter T. hunter@salk.bitnet
PTS proteins Reizer J. jreizer@ucsd.edu
Restriction-modification Bickle T. bickle@urz.unibas.ch
enzymes Roberts R.J. roberts@cshl.org
<PAGE>
Ribosomal protein S3 Hallick R. hallick%biotec@arizona.edu
Ribosomal protein S15 Ellis S.R. srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases Harayama S. harayama@cmu.unige.ch
Sodium symporters Reizer J. jreizer@ucsd.edu
Subtilases Brannigan J. jab5@vaxa.york.ac.uk
Thiol proteases Turk B. turk@ijs.ac.mail.yu
Thiol proteases inhibitors Turk B. turk@ijs.ac.mail.yu
TPR repeats Boguski M.S. boguski@ncbi.nlm.nih.gov
Transit peptides von Heijne G. gunnar@cbts.sunet.se
Type-II membrane antigens Levy S. levy@cellbio.stanford.edu
Uracil-DNA glycosylase Aasland R. aasland@bio.uib.no
Xylose isomerase Jenkins J. jenkins@frira.afrc.ac.uk
B.2 Requirements to fulfill to become an on-line expert
An expert should be a scientist working with specific famili(es) of
proteins (or specific domains) and which would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people that have obtained new sequence(s)
which seem to belong to "their" familie(s) of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme please contact Amos Bairoch
at one of the following electronic mail addresses:
bairoch@cgecmu51.bitnet
bairoch@cmu.unige.ch
<PAGE>
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
**********************
*********************** <----- * EPD [Euk. Promot.] *
* EMBL Nucleotide * -----> **********************
* Sequence Data *
***************** ----> * Library * **********************
* FLYBASE * *********************** <----- * ECD [E. coli map] *
* [Drosophila * ^ | ^ **********************
* genetic maps] * --------+ | | |
***************** <-----+ | | | +--------- **********************
| | | | +--------- * TFD [Trans. fact.] *
| | | | | +------> **********************
| | | | | |
***************** | v | v v | **********************
* REBASE * *********************** * ENZYME [Nomencl.] *
* [Restriction * <---- * SWISS-PROT * <----- **********************
* enzymes] * * Protein Sequence * |
***************** * Data Bank * v
*********************** **********************
***************** | ^ | ^ | | | * OMIM [Diseases] *
* PROSITE * <-------+ | | | | | +--------> **********************
* [Patterns] * ----------+ | | | |
***************** | | | +-----------> **********************
| | | | * E. coli 2D gels *
| | | | **********************
| | | |
| | | +-------------> **********************
| v +---------------- * EcoGene/EcoSeq *
| *********************** **********************
+--------> * PDB [3D structures] *
***********************
<PAGE>
