You are using a version of Internet Explorer that may not display all features of this website. Please upgrade to a modern browser.
Swiss-Prot release 20.0
Published November 1, 1991
SWISS-PROT RELEASE 20.0 RELEASE NOTES 1. INTRODUCTION 1.1 Evolution Release 20.0 of SWISS-PROT contains 22654 sequence entries, comprising 7'500'130 amino acids abstracted from 22830 references. This represents an increase of 5% over release 19. The recent growth of the data bank is summarized below. Release Date Number of entries Nb of amino acids 3.0 11/86 4160 969 641 4.0 04/87 4387 1 036 010 5.0 09/87 5205 1 327 683 6.0 01/88 6102 1 653 982 7.0 04/88 6821 1 885 771 8.0 08/88 7724 2 224 465 9.0 11/88 8702 2 498 140 10.0 03/89 10008 2 952 613 11.0 07/89 10856 3 265 966 12.0 10/89 12305 3 797 482 13.0 01/90 13837 4 347 336 14.0 04/90 15409 4 914 264 15.0 08/90 16941 5 486 399 16.0 11/90 18364 5 986 949 17.0 02/91 20024 6 524 504 18.0 05/91 20772 6 792 034 19.0 08/91 21795 7 173 785 20.0 11/91 22654 7 500 130 1.2 Source of data Release 20.0 has been updated using protein sequence data from release 29.0 of the PIR (Protein Identification Resource) protein data bank, as well as translation of nucleotide sequence data from release 28.0 of the EMBL Nucleotide Sequence Database. As an indication to the source of the sequence data in the SWISS-PROT data bank we list here the statistics concerning the DR (Database cross- references) pointer lines: Entries with pointer(s) to only PIR entri(es): 4129 Entries with pointer(s) to only EMBL entri(es): 2970 Entries with pointer(s) to both EMBL and PIR entri(es): 15061 Entries with no pointers lines: 494 <PAGE> 2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 19 2.1 Sequences and annotations About 890 sequences have been added since release 19, the sequence data of 187 existing entries has been updated and the annotations of 3030 entries have been revised. In particular we have used reviews articles to update the annotations of the following groups or families of proteins: - Aminotransferases class-II and class-III - Aromatic amino acids permeases - Avidin/Streptavidin - Bacterial porins - Bacterial regulatory proteins, asnC family - Bacterial regulatory proteins, crp family - Bacterial ring hydroxylating dioxygenases - Beta-ketoacyl synthases - C3HC4-class zinc finger proteins - Chalcone and resveratrol synthases - Chromo domain proteins - DEAD-box family ATP-dependent helicases - Flagella basal body rod proteins - GATA transcription factors - Glutamate / Leucine / Phenylalanine dehydrogenases - Glycosyl hydrolase family 1 - Glycosyl hydrolase family 9 - Glycosyl hydrolase family 10 - Glycosyl hydrolase family 17 - Heme oxygenase - High potential iron-sulfur proteins - HIV envelope proteins - Initiation factor 5a (eIF-5a) - Interferon regulatory factors - Myelin P0 protein - Myelin proteolipid protein - PEP-utilizing enzymes - pfkB family prokaryotic carbohydrate kinases - Plant lipid transfer proteins - PTS Hpr component - Ribosomal proteins - Saposins - Serine/threonine protein kinases <PAGE> 2.2 Changes in the feature table 2.2.1 The LIPID key Until this release SWISS-PROT was very inconsistent in the use of the feature table to annotate the covalent binding of lipids (fatty acids, prenyl groups, or glycolipids) to a specific position in a protein sequence. The attachment of a myristate group was indicated by the `MYRISTYL' key, the attachment of a palmitate group was indicated using either the `MOD_RES' or the `BINDING' keys, prenylation and GPI-anchor were indicated using the `BINDING' key. To correct these inconsistencies we have introduced in this release a new key `LIPID' and we have deleted the `MYRISTYL' key. Definition of the new key: LIPID - Covalent binding of a lipidic moiety The chemical nature of the bound lipid moiety is given in the description. The general format of the LIPID description field is: FT LIPID xxx xxx MODIFICATION (COMMENT). The modifications which are currently defined are the following: MYRISTATE Myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein [1,2]. PALMITATE Palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue [1,2]. FARNESYL Farnesyl group attached through a thioether bond to a cysteine residue . GERANYL-GERANYL Geranyl-geranyl group attached through a thioether bond to a cysteine residue . GPI-ANCHOR Glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein [4,5]. N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages .  Grand R.J.A. Biochem. J. 258:626-638(1989).  McLhinney R.A.J. Trends Biochem. Sci. 15:387-391(1990). <PAGE>  Glomset J.A., Gelb M.H., Farnsworth C.C. Trends Biochem. Sci. 15:139-142(1990).  Low M.G. FASEB J. 3:1600-1608(1989).  Low M.G. Biochimica Biophysica Acta 988:427-454(1989).  Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). Examples of LIPID key feature lines: FT LIPID 1 1 MYRISTATE. FT LIPID 65 65 PALMITATE (BY SIMILARITY). FT LIPID 354 354 GPI-ANCHOR. 2.2.2 The VARSPLIC key There are some genes which, by the mechanism of alternative splicing, encode closely related proteins that differs only by the presence or absence of one or more domains. Generally a single sequence entry represents the longest form of the protein and the feature table is used to indicate the regions which differ in alternative spliced forms. The `VARIANT' key was used for such purpose. For example (from entry P04085; PGDA$HUMAN): FT VARIANT 194 196 GRP -> DVR (IN SHORT FORM). FT VARIANT 197 211 MISSING (IN SHORT FORM). It was very difficult to write software tools that could distinguish between the above usage of the `VARIANT' key and the more classical use to describe polymorphisms or natural mutations. We have therefore introduced in this release a new key `VARSPLIC' which is used specifically to describe splicing variants. The example shown above is know represented by: FT VARSPLIC 194 196 GRP -> DVR (IN SHORT FORM). FT VARSPLIC 197 211 MISSING (IN SHORT FORM). 2.3 Changes in the cross-references lines (DR) 2.3.1 Cross-references to EcoGene Starting with this release we have added cross-references to the EcoGene section of the EcoSeq/EcoMap integrated Escherichia coli database prepared by Ken Rudd at the National Center for Biotechnology Information (NCBI) (for a description see: Rudd K.E., Miller W., Werner C., Ostell J., Tolstoshev C., and Satterfield S.G.; Nucleic Acids Res. (1991) 19:637-647). <PAGE> These cross-references are present in the DR lines: Data bank identifier: ECOGENE Primary identifier : EcoGene gene accession number Secondary identifier: Gene designation Example : DR ECOGENE; EG10075; AROC. The collaboration with Ken Rudd goes much further than simply adding these cross-references. Thanks to this collaboration we have been able to update hundreds of Escherichia coli sequence entries (to add data concerning the function of some proteins, to resolve sequence conflicts, to add references and comments, etc.), we are also using his master list of sequenced genes to pinpoint missing sequences. We have also implemented his gene name nomenclature for hypothetical proteins. This scheme is described below. Unnamed Escherichia coli hypothetical proteins and proteins of unknown function are assigned gene names based upon their position on the E. coli genomic physical map. They all begin with the letter `Y'. The next two letters designate which 1/100th of the map (starting at the thr locus) contain the ORF in the order YAA, YAB, ..YAJ, YBA, YBB, ..YBJ, YCA, YCB, ..YJJ. ORF's within any one of these 100 intervals are given a fourth letter (A-Z) that serves to distinguish them but is not meant to convey position information. 2.3.2 FlyBase The official name of Michael Ashburner Drosophila Genetics Maps Database has been changed from `DMAP' to `FlyBase'; we have therefore changed the data bank identifier in the DR line from `DMAP' to `FLYBASE' for all cross-references to that data collection. 2.4 Minor change in the RL lines for thesis references Up till now, thesis references have been formatted as in the following example: RL UNPUBLISHED (1972) THESIS, GEORGE WASHINGTON UNIVERSITY, USA. In recognition of the fact that theses are generally regarded as published references we will format them as follows starting with the current release: RL THESIS (19YY), INSTITUTION_NAME, COUNTRY. Example: RL THESIS (1972), GEORGE WASHINGTON UNIVERSITY, USA. <PAGE> For those of you who write software to parse reference blocks, the presence of the word `THESIS' as the first word on the first RL line of a reference block will thus indicate a thesis reference. The remaining text consists of a parenthesized year followed by the institution name followed by the country where that institution is localized. 3. FORTHCOMING CHANGES The following changes will be implemented starting with release 21. 3.1 Change in the format of the entry names The dollar sign `$' in entry names will be replaced by the underscore character `_'. This change is made on the behalf of users of sequence analysis software running under the Unix operating system, where the dollar sign is a reserved symbol. Example: the entry name `CYC$HUMAN' will be changed to `CYC_HUMAN'. 3.2 New line type GN The GN (Gene Name) line is a new line that will be used to indicate the name(s) of the gene(s) that encodes for the protein being described. Currently this information is found in the DE line as shown in the following example: DE SERUM ALBUMIN PRECURSOR (GENE NAME: ALB). The format of the GN line will be: GN NAME1[ AND|OR NAME2...]. Examples: GN ALB. GN REX-1. It often occurs that more than one gene name has been assigned to an individual locus. In that case all the synonyms will be listed. The word `OR' separates the different designations. The first name in the list is assumed to be the most correct (or most current) designation. Example: GN HNS OR DRDX OR OSMZ OR BGLY. In a few cases, multiple genes encode for an identical protein sequence. In that case all the different gene names will be listed. The word `AND' separates the designations. Example: GN CECA1 AND CECA2. <PAGE> In very rare cases (only one occurrence has been found in the current release) `AND' and `OR' could be both present. In that case parenthesis are used as shown in the following example: GN GVPA AND (GVPB OR GVPA2). 3.3 New line type RM The RM (Reference Medline) line will be used to indicate the Medline Unique Identifier (UID) of a reference. This information is currently listed in the RC line using the `MEDLINE' token as shown in the following example: RC MEDLINE=90205618; The format of the RM line will be: RM nnnnnnnn where `nnnnnnnn' is the eight digit Medline Unique Identifier (UID). Example: RM 90205618 3.4 Secondary structure information Thanks to the help of Chris Sander of the Biocomputing group at EMBL we are going to add in the feature table of each sequence entry that belongs to a protein whose tertiary structure is known, the secondary structure information corresponding to that protein. Complete details regarding this new feature will be communicated in the next release notes. 4. ENZYME AND PROSITE 4.1 The ENZYME data bank Release 7.0 of the ENZYME data bank is distributed along with release 19 of SWISS-PROT. ENZYME release 7.0 contains information relative to 3072 enzymes. The data bank is complete and up to date. Until new enzyme nomenclature data is published we only plan to update the SWISS-PROT pointers at each release of the protein sequence data bank, correct eventual errors, and complete the information concerning synonyms and cofactors using the literature. <PAGE> 4.2 The PROSITE data bank Release 8.00 of the PROSITE data bank is distributed along with release 20 of SWISS-PROT. Release 8.00 contains 530 documentation chapters that describes 605 different patterns. 4.2.1 What's new in release 8.0 Since the last major release of PROSITE (release 7.01 of June 1991), 88 new chapters have been added and 210 chapters have been updated. The new chapters are: - Fibrinogen beta and gamma chains C-terminal domain signature - Somatomedin B domain signature - Cellulose-binding domain, bacterial type - Cellulose-binding domain, fungal type - Zinc finger, C3HC4 type, signature - IRF family signature - TEA domain signature - Fibrillarin signature - Bacterial regulatory proteins, asnC family signature - Bacterial regulatory proteins, merR family signature - HMG-I and HMG-Y DNA-binding domain (A+T-hook) - Chromo domain - Nuclear transition protein 1 signature - Ribosomal protein L6 signature - Ribosomal protein L16 signature - Ribosomal protein L29 signature - Ribosomal protein L33 signature - Ribosomal protein L19e signature - Ribosomal protein L32e signature - Ribosomal protein S3 signature - Ribosomal protein S5 signature - Ribosomal protein S14 signature - Ribosomal protein S4e signature - Ribosomal protein S6e signature - Ribosomal protein S24e signature - FMN-dependent alpha-hydroxy acid dehydrogenases active site - Eukaryotic molybdopterin oxidoreductases signature - Delta 1-pyrroline-5-carboxylate reductase signature - Pyridine nucleotide-disulphide oxidoreductases class-II active site - Respiratory chain NADH dehydrogenase 30 Kd subunit signature - Respiratory chain NADH dehydrogenase 49 Kd subunit signature - Bacterial ring hydroxylating dioxygenases alpha-subunit signature - Heme oxygenase signature - Beta-ketoacyl synthases active site - Transglutaminases active site - Aminotransferases class-II pyridoxal-phosphate attachment site - Aminotransferases class-III pyridoxal-phosphate attachment site - Phosphoserine aminotransferase signature <PAGE> - pfkB family prokaryotic carbohydrate kinases signatures - Phosphoribulokinase signature - Thymidine kinase cellular-type signature - DNA polymerase family X signature - PEP-utilizing enzymes phosphorylation site signature - cAMP phosphodiesterases class-II signature - Ribonuclease III family signature - Ribonuclease T2 family histidine active sites - Glycosyl hydrolases family 1 active site - Glycosyl hydrolases family 9 active site - Glycosyl hydrolases family 10 active site - Glycosyl hydrolases family 17 signature - Alkylbase DNA glycosidases alkA family signature - Matrixins cysteine switch - Amidases signature - ATP synthase c subunit signature - Phosphoenolpyruvate carboxykinase (ATP) signature - Fructose-bisphosphate aldolase class-II signature - Porphobilinogen deaminase cofactor-binding site - Ferrochelatase signature - Aldose 1-epimerase putative active site - Methylmalonyl-CoA mutase signature - Ubiquitin-activating enzyme signature - Adenylosuccinate synthetase active site - Argininosuccinate synthase signatures - Cytochrome b559 subunits heme-binding site signature - High potential iron-sulfur proteins signature - Bacterioferritin signature - Hemerythrins signature - Avidin / Streptavidin family signature - Plant lipid transfer proteins signature - Aromatic amino acids permeases signature - General diffusion gram-negative porins signature - Eukaryotic porin signature - Myelin basic protein signature - Myelin P0 protein signature - Myelin proteolipid protein signature - Synaptophysin / synaptoporin signature - Flagella basal body rod proteins signature - Plant viruses icosahedral capsid proteins 'S' region signature - Bacterial chemotaxis sensory transducers signature - Interleukin-10 signature - LIF / OSM family signature - Pyrokinins signature - Hok/gef family cell toxic proteins signature - Lambdoid phages regulatory protein CIII signature - Stathmin family signature - HlyD family secretion proteins signature - Seminal vesicle protein II repeats signature <PAGE> 4.2.2 New /SKIP-FLAG qualifier for CC lines Some PROSITE keys such as those describing commonly found post- translational modifications (a typical example is N-glycosylation) are found in the majority of known protein sequences. While it is generally useful to note their presence, some programs may want, in some cases, to ignore those keys. For this purpose these keys are indicated with the following qualifier in their CC lines: CC /SKIP-FLAG=TRUE; 4.2.3 The new 3D line type We have introduced a new line: 3D (3D-structure), which is used to list the code(s) of X-ray crystallography Protein Data Bank (PDB) entries that contain structural data corresponding the sequence region described in a PROSITE entry. The format of the 3D line is: 3D name; [name2;...] Example: 3D 7WGA; 9WGA; 1WGC; 2WGC; 5. WE NEED YOUR HELP ! We welcome feedback from our users. We would especially appreciate that you notify us if you find that sequences belonging to your field of expertise are missing from the data bank. We also would like to be notified about annotations to be updated, as for example if the function of a protein has been clarified or if new post-translational information has become available. <PAGE> APPENDIX A: SOME STATISTICS A.1 Amino acid composition A.1.1 Composition in percent for the complete data bank Ala (A) 7.65 Gln (Q) 4.08 Leu (L) 9.12 Ser (S) 7.10 Arg (R) 5.23 Glu (E) 6.27 Lys (K) 5.85 Thr (T) 5.85 Asn (N) 4.46 Gly (G) 7.10 Met (M) 2.32 Trp (W) 1.30 Asp (D) 5.25 His (H) 2.27 Phe (F) 3.95 Tyr (Y) 3.21 Cys (C) 1.81 Ile (I) 5.45 Pro (P) 5.08 Val (V) 6.49 Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03 A.1.2 Classification of the amino acids by their frequency Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln, Phe, Tyr, Met, His, Cys, Trp A.2 Repartition of the sequences by their organism of origin Total number of species represented in this release of SWISS-PROT: 3029 A.2.1 Table of the frequency of occurrence of species Species represented 1x: 1303 2x: 548 3x: 308 4x: 189 5x: 131 6x: 100 7x: 73 8x: 47 9x: 64 10x: 32 11- 20x: 117 21-100x: 92 >100x: 25 <PAGE> A.2.2 Table of the most represented species Number Frequency Species 1 1847 Human 2 1596 Escherichia coli 3 1083 Mouse 4 1014 Rat 5 778 Baker's yeast (Saccharomyces cerevisiae) 6 511 Bovine 7 435 Fruit fly (Drosophila melanogaster) 8 374 Chicken 9 311 Bacillus subtilis 10 266 Rabbit 266 African clawed frog (Xenopus laevis) 12 251 Vaccinia virus (strain Copenhagen) 13 245 Pig 14 208 Salmonella typhimurium 15 193 Human cytomegalovirus (strain AD169) 16 166 Bacteriophage T4 17 160 Maize 18 133 Rice 19 118 Vaccinia virus (strain WR) 118 Tobacco 21 113 Pea 22 111 Wheat 23 110 Staphylococcus aureus 24 101 Slime mold (Dictyostelium discoideum) 101 Barley 26 100 Sheep 27 94 Fission yeast (Schizosaccharomyces pombe) 28 93 Pseudomonas aeruginosa 29 92 Spinach 30 89 Pseudomonas putida 31 86 Soybean 86 Neurospora crassa 33 85 Dog 34 84 Liverwort (Marchantia polymorpha) 35 82 Klebsiella pneumoniae <PAGE> A.3 Repartition of the sequences by size From To Number From To Number 1- 50 1494 1001-1100 210 51- 100 2500 1101-1200 131 101- 150 3559 1201-1300 108 151- 200 2181 1301-1400 64 201- 250 1817 1401-1500 54 251- 300 1634 1501-1600 32 301- 350 1494 1601-1700 25 351- 400 1436 1701-1800 26 401- 450 1104 1801-1900 30 451- 500 1188 1901-2000 23 501- 550 881 2001-2100 9 551- 600 619 2101-2200 25 601- 650 438 2201-2300 31 651- 700 319 2301-2400 11 701- 750 297 2401-2500 13 751- 800 232 >2500 53 801- 850 187 851- 900 194 901- 950 119 951-1000 116 Currently the ten largest sequences are: RYNR$RABIT 5037 a.a. RYNR$HUMAN 5032 a.a. APB$HUMAN 4563 a.a. APOA$HUMAN 4548 a.a. DYHC$TRIGR 4466 a.a. POLG$BVDV 3988 a.a. POLG$HCVA 3898 a.a. POLG$HCVB 3898 a.a. TRX$DROME 3759 a.a. ACVA$PENCH 3746 a.a. <PAGE> APPENDIX B: ON-LINE EXPERTS B.1 List of on-line experts for PROSITE and SWISS-PROT Field of expertise Name Email address --------------------------- ------------------ ---------------------------- African swine fever virus Yanez R.J. email@example.com Alcohol dehydrogenases Joernvall H. firstname.lastname@example.org Persson B. email@example.com Aldehyde dehydrogenases Joernvall H. firstname.lastname@example.org Persson B. email@example.com Alpha-crystallins/HSP-20 Leunissen J.A.M. firstname.lastname@example.org de Jong W. email@example.com Alpha-2-macroglobulins Van Leuven F. firstname.lastname@example.org Apolipoproteins Boguski M.S. email@example.com Arrestins Kolakowski L.F.Jr. firstname.lastname@example.org Bacteriophage P4 proteins Halling C. email@example.com Beta-lactamases Brannigan J. firstname.lastname@example.org Chitinases Henrissat B. email@example.com Clusterin Peitsch M.C. firstname.lastname@example.org CTF/NF-I Mermod N. email@example.com Cytochromes P450 Holsztynska E.J. firstname.lastname@example.org email@example.com EF-hand calcium-binding Cox J.A. firstname.lastname@example.org Kretsinger R.H. email@example.com Enoyl-CoA hydratase Hofmann K.O. firstname.lastname@example.org fruR/lacI family HTH proteins Reizer J. email@example.com GATA-type zinc-fingers Boguski M.S. firstname.lastname@example.org Glucanases Henrissat B. email@example.com Beguin P. firstname.lastname@example.org G-protein coupled receptors Chollet A. email@example.com Attwood T.K. firstname.lastname@example.org GTPase-activating proteins Boguski M.S. email@example.com HMG1/2 and HMG-14/17 Landsman D. firstname.lastname@example.org Inorganic pyrophosphatases Kolakowski L.F.Jr. email@example.com Integrases Roy P.H. firstname.lastname@example.org Lipocalins Boguski M.S. email@example.com Peitsch M.C. firstname.lastname@example.org MAC components / perforin Peitsch M.C. email@example.com Myelin proteolipid protein Hofmann K.O. firstname.lastname@example.org PEP requiring enzymes Reizer J. email@example.com Phytochromes Partis M.D. firstname.lastname@example.org Prokaryotic carbohydrate Reizer J. email@example.com kinases Protein kinases Hanks S. firstname.lastname@example.org Hunter T. email@example.com PTS proteins Reizer J. firstname.lastname@example.org Restriction-modification Bickle T. email@example.com enzymes Roberts R.J. firstname.lastname@example.org <PAGE> Ribosomal protein S3 Hallick R. email@example.com Ribosomal protein S15 Ellis S.R. firstname.lastname@example.org Ring-cleavage dioxygenases Harayama S. email@example.com Sodium symporters Reizer J. firstname.lastname@example.org Subtilases Brannigan J. email@example.com Thiol proteases Turk B. firstname.lastname@example.org Thiol proteases inhibitors Turk B. email@example.com TPR repeats Boguski M.S. firstname.lastname@example.org Transit peptides von Heijne G. email@example.com Type-II membrane antigens Levy S. firstname.lastname@example.org Uracil-DNA glycosylase Aasland R. email@example.com Xylose isomerase Jenkins J. firstname.lastname@example.org B.2 Requirements to fulfill to become an on-line expert An expert should be a scientist working with specific famili(es) of proteins (or specific domains) and which would: a) Review the protein sequences in SWISS-PROT and the patterns/matrices in PROSITE relevant to their field of research. b) Agree to be contacted by people that have obtained new sequence(s) which seem to belong to "their" familie(s) of proteins. c) Have access to electronic mail and be willing to use it to send and receive data. If you are willing to be part of this scheme please contact Amos Bairoch at one of the following electronic mail addresses: email@example.com firstname.lastname@example.org <PAGE> APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES The current status of the relationships (cross-references) between some biomolecular databases is shown in the following schematic: ********************** *********************** <----- * EPD [Euk. Promot.] * * EMBL Nucleotide * -----> ********************** * Sequence Data * ***************** ----> * Library * ********************** * FLYBASE * *********************** <----- * ECD [E. coli map] * * [Drosophila * ^ | ^ ********************** * genetic maps] * --------+ | | | ***************** <-----+ | | | +--------- ********************** | | | | +--------- * TFD [Trans. fact.] * | | | | | +------> ********************** | | | | | | ***************** | v | v v | ********************** * REBASE * *********************** * ENZYME [Nomencl.] * * [Restriction * <---- * SWISS-PROT * <----- ********************** * enzymes] * * Protein Sequence * | ***************** * Data Bank * v *********************** ********************** ***************** | ^ | ^ | | | * OMIM [Diseases] * * PROSITE * <-------+ | | | | | +--------> ********************** * [Patterns] * ----------+ | | | | ***************** | | | +-----------> ********************** | | | | * E. coli 2D gels * | | | | ********************** | | | | | | | +-------------> ********************** | v +---------------- * EcoGene/EcoSeq * | *********************** ********************** +--------> * PDB [3D structures] * *********************** <PAGE>