Swiss-Prot release 43.0
Published March 29, 2004
Swiss-Prot Protein Knowledgebase Release Notes, Release 43.0 of 29-Mar-2004
| Content |
|---|
Introduction
Status of the model organisms
Swiss-Prot protein knowledgebase release 43.0 statistics
We need your help
See also Recent changes and Forthcoming changes.
| Introduction |
|---|
Release 43.0 of Swiss-Prot, dated 29-Mar-2004, contains 146'720 sequence entries, comprising 54'093'154 amino acids abstracted from 113'719 references. Since release 42, 10'760 sequences have been added, the sequence data of 663 existing entries has been updated, and the annotations of 44'948 entries have been revised. This represents an increase of 8% in the number of entries over release 42.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
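The 8% figure quoted in the introduction can be rechecked from the last two rows of the release history. The following is a minimal Python sketch; the helper name `percent_increase` is illustrative and not part of any Swiss-Prot tooling.

```python
def percent_increase(previous: int, current: int) -> float:
    """Return the relative growth from `previous` to `current`, in percent."""
    return 100.0 * (current - previous) / previous

# Counts copied from the release history for releases 42.0 and 43.0.
entries_42, entries_43 = 135_850, 146_720
residues_42, residues_43 = 50_046_799, 54_093_154

entry_growth = percent_increase(entries_42, entries_43)
residue_growth = percent_increase(residues_42, residues_43)
print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```

Both the entry count and the amino acid count round to an 8% increase.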
| Status of the model organisms |
|---|
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
- be as complete as possible. All sequences available at a given time should be immediately included in Swiss-Prot. This also includes sequence corrections and updates;
- provide a higher level of annotation;
- provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
- provide specific indexes and documents.
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | None yet | arath.txt | 2'591 |
| C.albicans | None yet | calbican.txt | 286 |
| C.elegans | Wormpep | celegans.txt | 2'458 |
| D.discoideum | DictyDB | dicty.txt | 319 |
| D.melanogaster | FlyBase | fly.txt | 1'967 |
| M.musculus | MGD | mgdtosp.txt | 7'326 |
| S.cerevisiae | SGD | yeast.txt | 4'930 |
| S.pombe | GeneDB_SPombe | pombe.txt | 2'386 |
| Swiss-Prot protein knowledgebase release 43.0 statistics |
|---|
1. INTRODUCTION
Release 43.0 of 29-Mar-2004 of Swiss-Prot contains 146720 sequence entries,
comprising 54093154 amino acids abstracted from 113719 references.
10760 sequences have been added since release 42, the sequence data of
663 existing entries has been updated and the annotations of
44948 entries have been revised. This represents an increase of 8%.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 7.79 Gln (Q) 3.92 Leu (L) 9.60 Ser (S) 6.89
Arg (R) 5.28 Glu (E) 6.59 Lys (K) 5.93 Thr (T) 5.47
Asn (N) 4.23 Gly (G) 6.93 Met (M) 2.37 Trp (W) 1.16
Asp (D) 5.30 His (H) 2.27 Phe (F) 4.03 Tyr (Y) 3.09
Cys (C) 1.56 Ile (I) 5.91 Pro (P) 4.85 Val (V) 6.70
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.01
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
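The ordering in section 2.2 follows directly from the percentages in section 2.1. As an illustration only (the variable names are not from the source), sorting the composition values in descending order reproduces the list:

```python
# Percent composition copied from section 2.1 (three-letter codes only).
composition = {
    "Ala": 7.79, "Gln": 3.92, "Leu": 9.60, "Ser": 6.89,
    "Arg": 5.28, "Glu": 6.59, "Lys": 5.93, "Thr": 5.47,
    "Asn": 4.23, "Gly": 6.93, "Met": 2.37, "Trp": 1.16,
    "Asp": 5.30, "His": 2.27, "Phe": 4.03, "Tyr": 3.09,
    "Cys": 1.56, "Ile": 5.91, "Pro": 4.85, "Val": 6.70,
}

# Sort residue names from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```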
3. TAXONOMIC ORIGIN
Total number of species represented in this release of Swiss-Prot: 8424
The first twenty species represent 57715 sequences: 39.3 % of the total
number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x: 4065
2x: 1283
3x: 661
4x: 431
5x: 269
6x: 262
7x: 197
8x: 149
9x: 127
10x: 88
11- 20x: 344
21- 50x: 250
51-100x: 93
>100x: 205
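As a consistency check, the bins of table 3.1 should add up to the total number of species given at the start of section 3. A short Python sketch, with bin labels chosen here for readability:

```python
# Number of species per occurrence bin, copied from table 3.1.
species_per_bin = {
    "1x": 4065, "2x": 1283, "3x": 661, "4x": 431, "5x": 269,
    "6x": 262, "7x": 197, "8x": 149, "9x": 127, "10x": 88,
    "11-20x": 344, "21-50x": 250, "51-100x": 93, ">100x": 205,
}

total_species = sum(species_per_bin.values())
# Share of species that are represented by a single entry.
singleton_share = 100.0 * species_per_bin["1x"] / total_species
print(total_species, f"{singleton_share:.1f}%")
```

The bins sum to the stated 8424 species, and just under half of them are represented by a single sequence.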
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 10691 Homo sapiens (Human)
2 7326 Mus musculus (Mouse)
3 4930 Saccharomyces cerevisiae (Baker's yeast)
4 4835 Escherichia coli
5 3726 Rattus norvegicus (Rat)
6 2712 Bacillus subtilis
7 2591 Arabidopsis thaliana (Mouse-ear cress)
8 2458 Caenorhabditis elegans
9 2386 Schizosaccharomyces pombe (Fission yeast)
10 1967 Drosophila melanogaster (Fruit fly)
11 1773 Haemophilus influenzae
12 1772 Methanococcus jannaschii
13 1647 Escherichia coli O157:H7
14 1438 Bos taurus (Bovine)
15 1406 Salmonella typhimurium
16 1393 Mycobacterium tuberculosis
17 1284 Escherichia coli O6
18 1210 Shigella flexneri
19 1090 Gallus gallus (Chicken)
20 1080 Mycobacterium bovis
21 980 Salmonella typhi
22 962 Pseudomonas aeruginosa
23 941 Synechocystis sp. (strain PCC 6803)
24 937 Archaeoglobus fulgidus
25 873 Xenopus laevis (African clawed frog)
26 850 Sus scrofa (Pig)
27 766 Rhizobium meliloti (Sinorhizobium meliloti)
28 743 Vibrio cholerae
29 738 Aquifex aeolicus
30 725 Oryctolagus cuniculus (Rabbit)
31 695 Yersinia pestis
32 687 Mycoplasma pneumoniae
33 647 Pasteurella multocida
34 605 Mycobacterium leprae
35 603 Treponema pallidum
36 601 Streptomyces coelicolor
37 586 Bacillus halodurans
38 572 Buchnera aphidicola (subsp. Acyrthosiphon pisum)
39 570 Vibrio parahaemolyticus
40 560 Buchnera aphidicola (subsp. Schizaphis graminum)
41 557 Methanobacterium thermoautotrophicum
42 557 Helicobacter pylori (Campylobacter pylori)
43 543 Rickettsia prowazekii
44 542 Anabaena sp. (strain PCC 7120)
45 538 Helicobacter pylori J99 (Campylobacter pylori J99)
46 518 Vibrio vulnificus
47 504 Staphylococcus aureus (strain Mu50 / ATCC 700699)
48 503 Staphylococcus aureus (strain N315)
49 499 Zea mays (Maize)
50 495 Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
51 487 Staphylococcus aureus (strain MW2)
52 486 Mycoplasma genitalium
53 467 Ralstonia solanacearum (Pseudomonas solanacearum)
54 464 Staphylococcus epidermidis
55 463 Listeria monocytogenes
56 459 Neisseria meningitidis (serogroup B)
57 457 Listeria innocua
58 457 Neisseria meningitidis (serogroup A)
59 449 Pseudomonas putida (strain KT2440)
60 448 Thermotoga maritima
61 447 Rhizobium loti (Mesorhizobium loti)
62 447 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
63 443 Xanthomonas campestris (pv. campestris)
64 443 Clostridium acetobutylicum
65 438 Pseudomonas syringae (pv. tomato)
66 434 Caulobacter crescentus
67 424 Oryza sativa (Rice)
68 419 Deinococcus radiodurans
69 417 Chlamydia trachomatis
70 416 Streptococcus pneumoniae
71 414 Borrelia burgdorferi (Lyme disease spirochete)
72 412 Xylella fastidiosa
73 411 Canis familiaris (Dog)
74 407 Xanthomonas axonopodis (pv. citri)
75 406 Pyrococcus horikoshii
76 405 Chlamydia pneumoniae (Chlamydophila pneumoniae)
77 403 Rhizobium sp. (strain NGR234)
78 400 Buchnera aphidicola (subsp. Baizongia pistaciae)
79 400 Pyrococcus abyssi
80 400 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
81 395 Chlamydia muridarum
82 382 Clostridium perfringens
83 377 Brucella melitensis
84 375 Brucella suis
85 374 Bradyrhizobium japonicum
86 371 Corynebacterium glutamicum (Brevibacterium flavum)
87 365 Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
88 362 Campylobacter jejuni
89 361 Methanosarcina acetivorans
90 356 Methanosarcina mazei (Methanosarcina frisia)
91 355 Nicotiana tabacum (Common tobacco)
92 355 Pyrococcus furiosus
93 353 Sulfolobus solfataricus
94 353 Thermoanaerobacter tengcongensis
95 348 Streptococcus pyogenes
96 343 Rickettsia conorii
97 342 Ovis aries (Sheep)
98 330 Lactobacillus plantarum
99 330 Shewanella oneidensis
100 321 Aeropyrum pernix
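The share of the first twenty species quoted at the start of section 3 can be recomputed from ranks 1 to 20 of the table above. This is a sketch for verification only:

```python
# Frequencies of the twenty most represented species (ranks 1-20 of table 3.2).
top_twenty = [
    10691, 7326, 4930, 4835, 3726, 2712, 2591, 2458, 2386, 1967,
    1773, 1772, 1647, 1438, 1406, 1393, 1284, 1210, 1090, 1080,
]
total_entries = 146_720  # sequence entries in release 43.0

top_twenty_total = sum(top_twenty)
top_twenty_share = round(100.0 * top_twenty_total / total_entries, 1)
print(top_twenty_total, top_twenty_share)
```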
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 8393 ( 6%)
Bacteria 62334 ( 42%)
Eukaryota 67392 ( 46%)
Viruses 8601 ( 6%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 10691 ( 16%) ( 7%)
Other Mammalia 18197 ( 27%) ( 12%)
Other Vertebrata 6287 ( 9%) ( 4%)
Viridiplantae 10743 ( 16%) ( 7%)
Fungi 9849 ( 15%) ( 7%)
Insecta 3685 ( 5%) ( 3%)
Nematoda 2692 ( 4%) ( 2%)
Other 5248 ( 8%) ( 4%)
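The percentages in section 3.3 appear to be rounded to the nearest whole percent. Assuming that rounding rule, the following Python sketch reproduces the kingdom shares and the two percentages given for Human:

```python
# Sequence counts per kingdom, copied from section 3.3.
kingdom_counts = {
    "Archaea": 8393, "Bacteria": 62334, "Eukaryota": 67392, "Viruses": 8601,
}
total = sum(kingdom_counts.values())

# Whole-percent share of each kingdom in the complete database.
kingdom_shares = {
    name: round(100.0 * count / total) for name, count in kingdom_counts.items()
}

human = 10691
human_of_eukaryota = round(100.0 * human / kingdom_counts["Eukaryota"])
human_of_database = round(100.0 * human / total)
print(total, kingdom_shares, human_of_eukaryota, human_of_database)
```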
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 2672 1001-1100 1294
51- 100 9908 1101-1200 921
101- 150 14435 1201-1300 686
151- 200 13501 1301-1400 496
201- 250 14264 1401-1500 388
251- 300 12453 1501-1600 250
301- 350 12935 1601-1700 185
351- 400 11893 1701-1800 135
401- 450 9066 1801-1900 150
451- 500 7801 1901-2000 120
501- 550 5961 2001-2100 70
551- 600 3965 2101-2200 108
601- 650 3391 2201-2300 100
651- 700 2385 2301-2400 59
701- 750 2073 2401-2500 62
751- 800 1741 >2500 386
801- 850 1359
851- 900 1418
901- 950 1006
951-1000 862
The average sequence length in Swiss-Prot is 368 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is SNE1_HUMAN (Q8NF91): 8797 amino acids.
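Two figures in this section can be cross-checked against numbers elsewhere in these notes. The size bins should sum to the number of entries minus the number of fragments (section 7), and dividing total amino acids by total entries gives a mean close to the stated 368. The sketch below assumes the stated average is that quotient truncated to an integer; the source does not say how it was computed.

```python
total_entries = 146_720
total_residues = 54_093_154
fragments = 8_221  # "Number of fragments" from section 7

# Histogram counts from section 4: bins 1-50 through 951-1000,
# then 1001-1100 through 2401-2500, and finally >2500.
size_bins = [
    2672, 9908, 14435, 13501, 14264, 12453, 12935, 11893, 9066, 7801,
    5961, 3965, 3391, 2385, 2073, 1741, 1359, 1418, 1006, 862,
    1294, 921, 686, 496, 388, 250, 185, 135, 150, 120,
    70, 108, 100, 59, 62, 386,
]

non_fragment_entries = sum(size_bins)
mean_length = total_residues / total_entries
print(non_fragment_entries, total_entries - fragments, f"{mean_length:.1f}")
```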
5. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of Swiss-Prot: 1437
5.1 Table of the frequency of journal citations
Journals cited 1x: 529
2x: 181
3x: 103
4x: 66
5x: 56
6x: 35
7x: 32
8x: 26
9x: 24
10x: 15
11- 20x: 110
21- 50x: 110
51-100x: 46
>100x: 104
5.2 List of the most cited journals in Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 10139 Journal of Biological Chemistry
2 5380 Proceedings of the National Academy of Sciences of the U.S.A.
3 3835 Journal of Bacteriology
4 3693 Nucleic Acids Research
5 3568 Gene
6 2843 FEBS Letters
7 2789 Biochemical and Biophysical Research Communications
8 2573 Biochemistry
9 2543 European Journal of Biochemistry
10 2361 The EMBO Journal
11 2233 Nature
12 2157 Biochimica et Biophysica Acta
13 1949 Journal of Molecular Biology
14 1886 Genomics
15 1728 Cell
16 1700 Molecular and Cellular Biology
17 1353 Biochemical Journal
18 1293 Science
19 1153 Plant Molecular Biology
20 1147 Molecular Microbiology
21 1141 Molecular and General Genetics
22 887 Journal of Biochemistry
23 868 Virology
24 834 Human Molecular Genetics
25 788 Journal of Cell Biology
26 745 Nature Genetics
27 682 Genes and Development
28 657 Journal of Virology
29 641 The American Journal of Human Genetics
30 639 Plant Physiology
31 626 Human Mutation
32 621 Oncogene
33 568 Infection and Immunity
34 566 Journal of Immunology
35 551 Yeast
36 519 Journal of General Virology
37 517 Structure
38 505 Archives of Biochemistry and Biophysics
39 488 Microbiology
40 475 FEMS Microbiology Letters
41 475 Development
42 436 Nature Structural Biology
43 423 Genetics
44 416 Human Genetics
45 399 Current Genetics
46 383 Blood
47 367 Molecular and Biochemical Parasitology
48 345 Applied and Environmental Microbiology
49 336 Journal of Clinical Investigation
50 318 Mammalian Genome
51 316 Molecular Endocrinology
52 314 Developmental Biology
53 310 Protein Science
54 297 Immunogenetics
55 297 DNA and Cell Biology
56 293 Cancer Research
57 291 Journal of Molecular Evolution
58 279 Neuron
59 274 The Journal of Experimental Medicine
60 274 Molecular Biology of the Cell
61 271 Biological Chemistry Hoppe-Seyler
62 269 Mechanisms of Development
63 265 Acta Crystallographica, Section D
64 265 The Plant Cell
65 250 Endocrinology
66 246 Journal of Cell Science
67 239 DNA Sequence
68 234 The Plant Journal
69 232 Journal of General Microbiology
70 223 Journal of Neuroscience
71 222 Molecular Biology and Evolution
72 213 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
73 212 Journal of Neurochemistry
74 208 Brain Research. Molecular Brain Research
75 206 The Journal of Clinical Endocrinology and Metabolism
76 194 Cytogenetics and Cell Genetics
77 182 Toxicon
78 177 Comparative Biochemistry and Physiology
79 175 American Journal of Physiology
80 174 Bioscience, Biotechnology, and Biochemistry
81 167 Molecular Cell
82 163 Molecular Pharmacology
83 160 Antimicrobial Agents and Chemotherapy
84 158 Current Biology
85 156 DNA
86 144 Journal of Investigative Dermatology
87 142 Tissue Antigens
88 141 DNA Research
89 140 Proteins
90 136 Molecular Plant-Microbe Interactions
91 136 Biochimie
92 132 Peptides
93 132 Virus Research
94 132 Journal of Medical Genetics
95 129 Bioorganicheskaia Khimiia
96 125 American Journal of Medical Genetics
97 124 Genome Research
98 120 Hemoglobin
99 117 Molecular and Cellular Endocrinology
100 114 Agricultural and Biological Chemistry
101 108 Biology of Reproduction
102 107 Plant and Cell Physiology
103 105 European Journal of Immunology
104 102 Archives of Microbiology
6. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Line type / subtype               Total number  Number of entries  Average per entry
--------------------------------- ------------  -----------------  -----------------
References (RL) 283423 1.93
Journal 250489 136613 1.71
Submitted to EMBL/GenBank/DDBJ 30227 25422 0.21
Unpublished observations 536 532 <0.01
Submitted to Swiss-Prot 527 525 <0.01
Plant Gene Register 487 476 <0.01
Book citation 465 453 <0.01
Thesis 263 261 <0.01
Submitted to other databases 203 202 <0.01
Unpublished results 127 125 <0.01
Patent 97 96 <0.01
Worm Breeder's Gazette 2 2 <0.01
Comments (CC) 512278 3.49
SIMILARITY 147106 127783 1.00
FUNCTION 93944 92443 0.64
SUBCELLULAR LOCATION 69668 69668 0.47
CATALYTIC ACTIVITY 51230 48248 0.35
SUBUNIT 44297 44297 0.30
PATHWAY 24285 23209 0.17
COFACTOR 17058 17058 0.12
TISSUE SPECIFICITY 16130 16130 0.11
PTM 9012 8155 0.06
MISCELLANEOUS 8738 8050 0.06
ALTERNATIVE PRODUCTS 5272 5272 0.04
DOMAIN 4975 4457 0.03
CAUTION 4387 4105 0.03
INDUCTION 4067 4067 0.03
DEVELOPMENTAL STAGE 3868 3868 0.03
DISEASE 2514 1876 0.02
ENZYME REGULATION 2012 2012 0.01
DATABASE 1294 1217 0.01
MASS SPECTROMETRY 1213 1078 0.01
POLYMORPHISM 454 444 <0.01
ALLERGEN 335 335 <0.01
RNA EDITING 295 295 <0.01
BIOTECHNOLOGY 77 77 <0.01
PHARMACEUTICAL 47 47 <0.01
Features (FT) 831689 5.67
DOMAIN 116075 35669 0.79
TRANSMEM 91827 20000 0.63
TURN 62474 4662 0.43
STRAND 57252 4163 0.39
CONFLICT 55195 19373 0.38
METAL 54792 13543 0.37
CARBOHYD 50364 12429 0.34
DISULFID 46118 12311 0.31
HELIX 45117 4520 0.31
REPEAT 31810 4634 0.22
ACT_SITE 31418 19008 0.21
VARIANT 27420 5089 0.19
CHAIN 26007 21096 0.18
NP_BIND 18909 13304 0.13
SIGNAL 16306 16304 0.11
MOD_RES 14982 8452 0.10
NON_TER 10597 8092 0.07
SITE 10154 6238 0.07
VARSPLIC 9968 4490 0.07
BINDING 9770 7652 0.07
ZN_FING 9215 3288 0.06
MUTAGEN 6587 1880 0.04
INIT_MET 6135 6090 0.04
PROPEP 5196 4409 0.04
DNA_BIND 4618 4327 0.03
LIPID 4175 2790 0.03
PEPTIDE 2806 1101 0.02
TRANSIT 2791 2766 0.02
CA_BIND 1840 792 0.01
NON_CONS 835 428 0.01
CROSSLNK 458 360 <0.01
UNSURE 315 131 <0.01
SE_CYS 163 108 <0.01
Cross-references (DR) 1336204 9.11
EMBL 284537 140087 1.94
InterPro 264209 129846 1.80
Pfam 168429 124579 1.15
PROSITE 128678 81216 0.88
PIR 88842 81354 0.61
PRINTS 47994 42345 0.33
GO 47413 14722 0.32
SMART 42918 32422 0.29
HAMAP 40549 40436 0.28
TIGRFAMs 40318 37407 0.27
HSSP 38738 38738 0.26
ProDom 36531 35107 0.25
PDB 22244 6010 0.15
TIGR 14632 14556 0.10
Genew 9613 9565 0.07
MIM 9433 7904 0.06
MGD 6973 6952 0.05
SGD 4973 4919 0.03
GermOnline 4927 4876 0.03
EcoGene 4227 4225 0.03
MEROPS 3454 3339 0.02
WormPep 2730 2439 0.02
SubtiList 2667 2666 0.02
TRANSFAC 2648 2373 0.02
FlyBase 2520 2446 0.02
GeneDB_SPombe 2399 2369 0.02
RGD 2297 2297 0.02
TubercuList 1421 1385 0.01
StyGene 1362 1359 0.01
PIRSF 1168 1168 0.01
SWISS-2DPAGE 1075 1075 0.01
ListiList 921 860 0.01
Leproma 609 605 <0.01
GK 594 594 <0.01
Gramene 556 552 <0.01
MaizeDB 411 406 <0.01
HIV 370 354 <0.01
REBASE 361 356 <0.01
ECO2DBASE 351 299 <0.01
DictyBase 321 319 <0.01
ZFIN 260 260 <0.01
GlycoSuiteDB 259 259 <0.01
PHCI-2DPAGE 214 214 <0.01
SagaList 205 204 <0.01
PhotoList 175 175 <0.01
MypuList 159 159 <0.01
Aarhus/Ghent-2DPAGE 128 98 <0.01
Siena-2DPAGE 103 103 <0.01
HSC-2DPAGE 85 85 <0.01
PhosSite 53 53 <0.01
COMPLUYEAST-2DPAGE 50 50 <0.01
PMMA-2DPAGE 48 48 <0.01
Maize-2DPAGE 39 39 <0.01
ANU-2DPAGE 13 13 <0.01
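The "average per entry" column is consistent with dividing each total by the 146720 entries in the release and rounding to two decimals. A brief Python sketch under that assumption, using the four line-type totals:

```python
total_entries = 146_720

# Total number of lines per line type, copied from the table above.
line_totals = {
    "References (RL)": 283_423,
    "Comments (CC)": 512_278,
    "Features (FT)": 831_689,
    "Cross-references (DR)": 1_336_204,
}

# Average number of lines of each type per database entry.
averages = {
    name: round(count / total_entries, 2) for name, count in line_totals.items()
}
for name, value in averages.items():
    print(f"{name:<24}{value:.2f}")
```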
7. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in Swiss-Prot: 180569
Total number of entries encoded on a chloroplast: 3494
Total number of entries encoded on a mitochondrion: 2886
Total number of entries encoded on a cyanelle: 145
Total number of entries encoded on a plasmid: 2736
Number of fragments: 8221
Number of additional sequences encoded on splice variants: 7776
| We need your help |
|---|
We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available. To facilitate this feedback we offer, on the ExPASy WWW server, a form that allows the submission of updates and/or corrections to Swiss-Prot:
It is also possible, from any entry in Swiss-Prot displayed by the ExPASy server, to submit updates and/or corrections for that particular entry. Finally, you can also send your comments by electronic mail to the address:
Note that all update requests are assigned a unique identifier of the form UR-Xnnnn (example: UR-A0123). This identifier is used internally by the Swiss-Prot staff at SIB and EBI to track requests and is also used in e-mail exchanges with the persons who have submitted a request.
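For users who track their own requests, the identifier format can be checked with a simple pattern. The sketch below assumes that "X" stands for a single uppercase letter and "nnnn" for exactly four digits, as the example UR-A0123 suggests; the source does not spell this out, and the function name is hypothetical.

```python
import re

# Assumed pattern: "UR-", one uppercase ASCII letter, then four ASCII digits.
UPDATE_REQUEST_ID = re.compile(r"UR-[A-Z][0-9]{4}")

def is_update_request_id(text: str) -> bool:
    """Return True if `text` looks like an update request identifier."""
    return UPDATE_REQUEST_ID.fullmatch(text) is not None

print(is_update_request_id("UR-A0123"))
```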