UniProt Knowledgebase
Release notes: UniProtKB release 15.0 of 24-Mar-2009
Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.
| Introduction |
|---|
Release 15.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 57.0 and the UniProtKB/TrEMBL Protein Database release 40.0.
More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".
| UniProtKB/Swiss-Prot protein knowledgebase release 57.0 statistics |
|---|
The growth of the database is summarized below.
| Release | Date | Number of entries | Number of amino acids |
|---|---|---|---|
| 2.0 | 09/86 | 3'939 | 900'163 |
| 3.0 | 11/86 | 4'160 | 969'641 |
| 4.0 | 04/87 | 4'387 | 1'036'010 |
| 5.0 | 09/87 | 5'205 | 1'327'683 |
| 6.0 | 01/88 | 6'102 | 1'653'982 |
| 7.0 | 04/88 | 6'821 | 1'885'771 |
| 8.0 | 08/88 | 7'724 | 2'224'465 |
| 9.0 | 11/88 | 8'702 | 2'498'140 |
| 10.0 | 03/89 | 10'008 | 2'952'613 |
| 11.0 | 07/89 | 10'856 | 3'265'966 |
| 12.0 | 10/89 | 12'305 | 3'797'482 |
| 13.0 | 01/90 | 13'837 | 4'347'336 |
| 14.0 | 04/90 | 15'409 | 4'914'264 |
| 15.0 | 08/90 | 16'941 | 5'486'399 |
| 16.0 | 11/90 | 18'364 | 5'986'949 |
| 17.0 | 02/91 | 20'024 | 6'524'504 |
| 18.0 | 05/91 | 20'772 | 6'792'034 |
| 19.0 | 08/91 | 21'795 | 7'173'785 |
| 20.0 | 11/91 | 22'654 | 7'500'130 |
| 21.0 | 03/92 | 23'742 | 7'866'596 |
| 22.0 | 05/92 | 25'044 | 8'375'696 |
| 23.0 | 08/92 | 26'706 | 9'011'391 |
| 24.0 | 12/92 | 28'154 | 9'545'427 |
| 25.0 | 04/93 | 29'955 | 10'214'020 |
| 26.0 | 07/93 | 31'808 | 10'875'091 |
| 27.0 | 10/93 | 33'329 | 11'484'420 |
| 28.0 | 02/94 | 36'000 | 12'496'420 |
| 29.0 | 06/94 | 38'303 | 13'464'008 |
| 30.0 | 10/94 | 40'292 | 14'147'368 |
| 31.0 | 02/95 | 43'470 | 15'335'248 |
| 32.0 | 11/95 | 49'340 | 17'385'503 |
| 33.0 | 02/96 | 52'205 | 18'531'384 |
| 34.0 | 10/96 | 59'021 | 21'210'389 |
| 35.0 | 11/97 | 69'113 | 25'083'768 |
| 36.0 | 07/98 | 74'019 | 26'840'295 |
| 37.0 | 12/98 | 77'977 | 28'268'293 |
| 38.0 | 07/99 | 80'000 | 29'085'965 |
| 39.0 | 05/00 | 86'593 | 31'411'114 |
| 40.0 | 10/01 | 101'602 | 37'315'215 |
| 41.0 | 02/03 | 122'564 | 44'986'459 |
| 42.0 | 10/03 | 135'850 | 50'046'799 |
| 43.0 | 03/04 | 146'720 | 54'093'154 |
| 44.0 | 07/04 | 153'871 | 56'608'159 |
| 45.0 | 10/04 | 163'235 | 59'631'787 |
| 46.0 | 02/05 | 168'297 | 61'443'278 |
| 47.0 | 05/05 | 181'577 | 65'746'672 |
| 48.0 | 09/05 | 194'317 | 70'391'852 |
| 49.0 | 02/06 | 207'132 | 75'438'310 |
| 50.0 | 05/06 | 222'289 | 81'585'146 |
| 51.0 | 10/06 | 241'242 | 88'541'632 |
| 52.0 | 03/07 | 261'513 | 95'638'062 |
| 53.0 | 05/07 | 269'293 | 98'902'758 |
| 54.0 | 07/07 | 276'256 | 101'466'206 |
| 55.0 | 02/08 | 356'194 | 127'836'513 |
| 56.0 | 07/08 | 392'667 | 141'217'034 |
| 57.0 | 03/09 | 428'650 | 154'416'236 |
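As a rough illustration of how the figures in this table can be processed, the following Python sketch parses the apostrophe-separated counts and derives the net growth between the two most recent releases. The helper name `parse_count` and the tuple layout are assumptions made for this example; they are not part of any UniProt tool.

```python
def parse_count(text: str) -> int:
    """Convert a count such as "428'650" (apostrophe as thousands separator) to an int."""
    return int(text.replace("'", ""))


# Last two rows of the growth table: (release, date, entries, amino acids).
rows = [
    ("56.0", "07/08", "392'667", "141'217'034"),
    ("57.0", "03/09", "428'650", "154'416'236"),
]

previous, current = rows
net_new_entries = parse_count(current[2]) - parse_count(previous[2])
net_new_residues = parse_count(current[3]) - parse_count(previous[3])

print(f"Net growth {previous[0]} -> {current[0]}: "
      f"{net_new_entries} entries, {net_new_residues} amino acids")
```

The net figure need not equal the number of sequences added in a release, since entries can also be removed between releases.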
In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that were wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:
From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:
| Organism | Database cross-references | Index file | Number of sequences |
|---|---|---|---|
| A.thaliana | TAIR | arath.txt | 7'876 |
| C.albicans | None yet | calbican.txt | 767 |
| C.elegans | Wormpep | celegans.txt | 3'218 |
| D.discoideum | DictyBase | dicty.txt | 3'557 |
| D.melanogaster | FlyBase | fly.txt | 2'904 |
| M.musculus | MGD | mgdtosp.txt | 16'101 |
| S.cerevisiae | SGD | yeast.txt | 6'552 |
| S.pombe | GeneDB_SPombe | pombe.txt | 4'752 |
1. INTRODUCTION
Release 57.0 of 24-Mar-09 of UniProtKB/Swiss-Prot contains 428650 sequence entries,
comprising 154416236 amino acids abstracted from 177584 references.
36053 sequences have been added since release 56.0, the sequence data of
2010 existing entries has been updated and the annotations of
368500 entries have been revised.
Number of fragments: 8328
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 27591
Protein existence (PE): entries %
1: Evidence at protein level 63411 14.8%
2: Evidence at transcript level 64726 15.1%
3: Inferred from homology 285291 66.6%
4: Predicted 13812 3.2%
5: Uncertain 1410 0.3%
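The percentages in the protein existence table follow directly from the entry counts. A minimal Python sketch, assuming each percentage is the count divided by the sum of all five levels and rounded to one decimal place (the variable names are illustrative only):

```python
# Counts copied from the protein existence (PE) table.
pe_counts = {
    "1: Evidence at protein level": 63411,
    "2: Evidence at transcript level": 64726,
    "3: Inferred from homology": 285291,
    "4: Predicted": 13812,
    "5: Uncertain": 1410,
}

# The five levels together should cover every entry in the release.
total_entries = sum(pe_counts.values())

# Share of each level, in percent, rounded to one decimal place.
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, percent in pe_percent.items():
    print(f"{level:<35} {pe_counts[level]:>7} {percent:>5}%")
```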
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 11669
The first twenty species represent 103439 sequences: 24.1 % of the total
number of entries.
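The share of the first twenty species can be recomputed from the frequencies listed in section 2.2. The sketch below is only an illustration; the list simply copies the first twenty frequency values from that table, and the variable names are assumptions.

```python
# Frequencies of the twenty most represented species (section 2.2, ranks 1-20).
top_twenty = [
    20333, 16101, 7876, 7314, 6552, 5600, 4752, 4342, 3600, 3557,
    3218, 2980, 2904, 2429, 2199, 2104, 2044, 1979, 1782, 1773,
]

total_entries = 428650  # total number of sequence entries in release 57.0

top_twenty_total = sum(top_twenty)
share_percent = round(100 * top_twenty_total / total_entries, 1)

print(f"{top_twenty_total} sequences, {share_percent} % of all entries")
```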
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5235
2x: 1703
3x: 854
4x: 556
5x: 413
6x: 319
7x: 228
8x: 194
9x: 172
10x: 101
11- 20x: 515
21- 50x: 364
51-100x: 217
>100x: 798
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20333 Homo sapiens (Human)
2 16101 Mus musculus (Mouse)
3 7876 Arabidopsis thaliana (Mouse-ear cress)
4 7314 Rattus norvegicus (Rat)
5 6552 Saccharomyces cerevisiae (Baker's yeast)
6 5600 Bos taurus (Bovine)
7 4752 Schizosaccharomyces pombe (Fission yeast)
8 4342 Escherichia coli (strain K12)
9 3600 Bacillus subtilis
10 3557 Dictyostelium discoideum (Slime mold)
11 3218 Caenorhabditis elegans
12 2980 Xenopus laevis (African clawed frog)
13 2904 Drosophila melanogaster (Fruit fly)
14 2429 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2199 Pongo abelii (Sumatran orangutan)
16 2104 Gallus gallus (Chicken)
17 2044 Oryza sativa subsp. japonica (Rice)
18 1979 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1736 Salmonella typhimurium
22 1652 Escherichia coli O6
23 1649 Shigella flexneri
24 1462 Mycobacterium tuberculosis
25 1343 Sus scrofa (Pig)
26 1334 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
27 1323 Salmonella typhi
28 1260 Pseudomonas aeruginosa
29 1198 Mycobacterium bovis
30 1140 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1012 Synechocystis sp. (strain PCC 6803)
32 989 Archaeoglobus fulgidus
33 980 Yersinia pestis
34 927 Vibrio cholerae
35 909 Acanthamoeba polyphaga mimivirus (APMV)
36 904 Salmonella paratyphi A
37 898 Rhizobium meliloti (Sinorhizobium meliloti)
38 896 Staphylococcus aureus (strain N315)
39 896 Staphylococcus aureus (strain Mu50 / ATCC 700699)
40 881 Oryctolagus cuniculus (Rabbit)
41 869 Staphylococcus aureus (strain COL)
42 867 Staphylococcus aureus (strain MW2)
43 862 Staphylococcus aureus (strain MSSA476)
44 859 Staphylococcus aureus (strain MRSA252)
45 854 Salmonella choleraesuis
46 846 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
47 844 Yersinia pseudotuberculosis
48 842 Shigella sonnei (strain Ss046)
49 795 Escherichia coli O9:H4 (strain HS)
50 794 Shigella boydii serotype 4 (strain Sb227)
51 784 Ashbya gossypii (Yeast) (Eremothecium gossypii)
52 784 Escherichia coli O139:H28 (strain E24377A / ETEC)
53 783 Escherichia coli (strain UTI89 / UPEC)
54 782 Vibrio parahaemolyticus
55 776 Shigella dysenteriae serotype 1 (strain Sd197)
56 767 Candida albicans (Yeast)
57 765 Pasteurella multocida
58 764 Aquifex aeolicus
59 760 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
60 758 Kluyveromyces lactis (Yeast) (Candida sphaerica)
61 756 Canis familiaris (Dog)
62 751 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
63 745 Neurospora crassa
64 723 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
65 722 Streptomyces coelicolor
66 722 Staphylococcus epidermidis (strain ATCC 12228)
67 719 Shigella flexneri serotype 5b (strain 8401)
68 719 Vibrio vulnificus
69 716 Photorhabdus luminescens subsp. laumondii
70 715 Candida glabrata (Yeast) (Torulopsis glabrata)
71 709 Bacillus halodurans
72 703 Vibrio vulnificus (strain YJ016)
73 694 Bacillus anthracis
74 693 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
75 688 Yersinia pestis bv. Antiqua (strain Nepal516)
76 687 Mycoplasma pneumoniae
77 682 Yersinia pestis bv. Antiqua (strain Antiqua)
78 677 Pan troglodytes (Chimpanzee)
79 677 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
80 671 Staphylococcus aureus (strain NCTC 8325)
81 670 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
82 669 Escherichia coli O1:K1 / APEC
83 668 Anabaena sp. (strain PCC 7120)
84 662 Enterobacter sp. (strain 638)
85 660 Pseudomonas syringae pv. tomato
86 655 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
87 653 Pseudomonas putida (strain KT2440)
88 652 Mycobacterium leprae
89 637 Escherichia coli
90 635 Yersinia pestis (strain Pestoides F)
91 631 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
92 631 Bradyrhizobium japonicum
93 626 Staphylococcus aureus (strain USA300)
94 620 Zea mays (Maize)
95 615 Serratia proteamaculans (strain 568)
96 614 Treponema pallidum
97 613 Bacillus cereus (strain ATCC 14579 / DSM 31)
98 603 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
99 602 Staphylococcus aureus (strain bovine RF122 / ET3-1)
100 601 Shewanella oneidensis
101 600 Methanobacterium thermoautotrophicum
102 600 Ralstonia solanacearum (Pseudomonas solanacearum)
103 591 Rhizobium loti (Mesorhizobium loti)
104 590 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
105 583 Listeria monocytogenes
106 583 Rickettsia prowazekii
107 579 Photobacterium profundum (Photobacterium sp. (strain SS9))
108 579 Helicobacter pylori (Campylobacter pylori)
109 576 Xanthomonas campestris pv. campestris
110 575 Listeria innocua
111 573 Lactococcus lactis subsp. lactis (Streptococcus lactis)
112 573 Staphylococcus haemolyticus (strain JCSC1435)
113 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
114 570 Neisseria meningitidis serogroup B
115 569 Emericella nidulans (Aspergillus nidulans)
116 566 Enterobacter sakazakii (strain ATCC BAA-894)
117 565 Staphylococcus saprophyticus subsp. saprophyticus
118 563 Yarrowia lipolytica (Candida lipolytica)
119 562 Brucella melitensis
120 562 Buchnera aphidicola subsp. Schizaphis graminum
121 561 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
122 560 Helicobacter pylori J99 (Campylobacter pylori J99)
123 559 Bacillus cereus (strain ATCC 10987)
124 559 Brucella suis
125 546 Neisseria meningitidis serogroup A
126 540 Bacillus thuringiensis subsp. konkukian
127 539 Xanthomonas axonopodis pv. citri (Citrus canker)
128 536 Caulobacter crescentus (Caulobacter vibrioides)
129 534 Clostridium acetobutylicum
130 534 Pseudomonas syringae pv. syringae (strain B728a)
131 531 Bacillus cereus (strain ZK / E33L)
132 530 Oceanobacillus iheyensis
133 529 Pseudomonas aeruginosa (strain UCBPP-PA14)
134 526 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
135 525 Pseudomonas fluorescens (strain Pf0-1)
136 524 Vibrio fischeri (strain ATCC 700601 / ES114)
137 521 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
138 516 Listeria monocytogenes serotype 4b (strain F2365)
139 512 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
140 510 Streptococcus pneumoniae
141 510 Xylella fastidiosa
142 508 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
143 507 Buchnera aphidicola subsp. Baizongia pistaciae
144 502 Thermotoga maritima
145 501 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
146 496 Chromobacterium violaceum
147 493 Bordetella parapertussis
148 493 Rickettsia conorii
149 493 Sodalis glossinidius (strain morsitans)
150 493 Bordetella pertussis
151 492 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
152 491 Haemophilus ducreyi
153 485 Brucella abortus
154 483 Mycoplasma genitalium
155 483 Deinococcus radiodurans
156 480 Pseudomonas aeruginosa (strain PA7)
157 479 Clostridium perfringens
158 475 Corynebacterium glutamicum (Brevibacterium flavum)
159 474 Pseudomonas entomophila (strain L48)
160 473 Haemophilus influenzae (strain 86-028NP)
161 472 Methanosarcina acetivorans
162 472 Xanthomonas campestris pv. campestris (strain 8004)
163 470 Geobacillus kaustophilus
164 469 Streptomyces avermitilis
165 469 Bacillus clausii (strain KSM-K16)
166 468 Mannheimia succiniciproducens (strain MBEL55E)
167 468 Burkholderia pseudomallei (Pseudomonas pseudomallei)
168 463 Shewanella sp. (strain MR-7)
169 462 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
170 460 Pyrococcus horikoshii
171 460 Thermosynechococcus elongatus (strain BP-1)
172 460 Shewanella sp. (strain MR-4)
173 459 Staphylococcus aureus (strain Newman)
174 458 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
175 457 Oryza sativa subsp. indica (Rice)
176 456 Brucella abortus (strain 2308)
177 456 Pyrococcus abyssi
178 455 Enterococcus faecalis (Streptococcus faecalis)
179 453 Methanosarcina mazei (Methanosarcina frisia)
180 452 Halobacterium salinarium (Halobacterium halobium)
181 448 Rickettsia felis (Rickettsia azadi)
182 447 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
183 447 Aspergillus fumigatus (Sartorya fumigata)
184 446 Rhodopseudomonas palustris
185 446 Lactobacillus plantarum
186 445 Burkholderia mallei (Pseudomonas mallei)
187 445 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
188 444 Pseudomonas putida (strain F1 / ATCC 700007)
189 443 Burkholderia sp. (strain 383) (Burkholderia cepacia
190 443 Xanthomonas campestris pv. vesicatoria (strain 85-10)
191 441 Streptococcus mutans
192 441 Ovis aries (Sheep)
193 440 Acinetobacter sp. (strain ADP1)
194 440 Bacillus amyloliquefaciens (strain FZB42)
195 439 Chlamydia trachomatis
196 438 Thermoanaerobacter tengcongensis
197 438 Staphylococcus aureus (strain Mu3 / ATCC 700698)
198 437 Pyrococcus furiosus
199 435 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
200 435 Shewanella frigidimarina (strain NCIMB 400)
201 435 Rickettsia bellii (strain RML369-C)
202 434 Pseudomonas putida (strain GB-1)
203 434 Shewanella sp. (strain ANA-3)
204 433 Streptococcus pyogenes serotype M6
205 433 Nicotiana tabacum (Common tobacco)
206 433 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
207 430 Ralstonia eutropha (Cupriavidus necator
208 427 Borrelia burgdorferi (Lyme disease spirochete)
209 427 Methylococcus capsulatus
210 427 Campylobacter jejuni
211 426 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
212 422 Shewanella baltica (strain OS185)
213 422 Chlamydia pneumoniae (Chlamydophila pneumoniae)
214 418 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
215 418 Gloeobacter violaceus
216 418 Pseudoalteromonas haloplanktis (strain TAC 125)
217 417 Hahella chejuensis (strain KCTC 2396)
218 415 Streptococcus pyogenes serotype M1
219 414 Mycobacterium paratuberculosis
220 413 Pseudomonas mendocina (strain ymp)
221 412 Chlamydia muridarum
222 412 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
223 412 Sulfolobus solfataricus
224 412 Burkholderia xenovorans (strain LB400)
225 411 Staphylococcus aureus (strain JH1)
226 411 Nitrosomonas europaea
227 409 Streptococcus pyogenes serotype M18
228 409 Rhizobium sp. (strain NGR234)
229 409 Dechloromonas aromatica (strain RCB)
230 408 Shewanella sp. (strain W3-18-1)
231 408 Streptococcus pyogenes serotype M3
232 408 Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
233 407 Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1)
234 407 Shewanella baltica (strain OS195)
235 405 Staphylococcus aureus (strain JH9)
236 405 Aeromonas salmonicida (strain A449)
237 404 Rickettsia typhi
238 404 Shewanella denitrificans (strain OS217 / ATCC BAA-1090 / DSM 15013)
239 403 Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
240 401 Shewanella baltica (strain OS155 / ATCC BAA-1091)
241 400 Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
242 400 Chlorobium tepidum
243 400 Idiomarina loihiensis
244 400 Synechococcus sp. (strain WH8102)
245 399 Haemophilus influenzae (strain PittEE)
246 399 Burkholderia cenocepacia (strain AU 1054)
247 397 Shewanella amazonensis (strain ATCC BAA-1098 / SB2B)
248 397 Caenorhabditis briggsae
249 396 Actinobacillus pleuropneumoniae serotype 5b (strain L20)
250 396 Corynebacterium efficiens
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 15698 ( 4%)
Bacteria 249878 ( 58%)
Eukaryota 150533 ( 35%)
Viruses 12541 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20334 ( 14%) ( 5%)
Other Mammalia 43931 ( 29%) ( 10%)
Other Vertebrata 14925 ( 10%) ( 3%)
Viridiplantae 27014 ( 18%) ( 6%)
Fungi 23102 ( 15%) ( 5%)
Insecta 6145 ( 4%) ( 1%)
Nematoda 3869 ( 3%) ( 1%)
Other 11213 ( 7%) ( 3%)
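Each Eukaryota category is reported twice, once relative to all eukaryotic sequences and once relative to the complete database. A short Python sketch of that calculation, assuming percentages are rounded to the nearest whole number (names are illustrative):

```python
# Sequence counts per category, copied from the "Within Eukaryota" table.
eukaryota = {
    "Human": 20334,
    "Other Mammalia": 43931,
    "Other Vertebrata": 14925,
    "Viridiplantae": 27014,
    "Fungi": 23102,
    "Insecta": 6145,
    "Nematoda": 3869,
    "Other": 11213,
}

database_total = 428650                   # all entries in release 57.0
eukaryota_total = sum(eukaryota.values())  # should match the Eukaryota row

# For each category: (% of Eukaryota, % of the complete database).
shares = {
    name: (round(100 * n / eukaryota_total), round(100 * n / database_total))
    for name, n in eukaryota.items()
}

for name, (of_eukaryota, of_database) in shares.items():
    print(f"{name:<18} {eukaryota[name]:>6} ({of_eukaryota:>3}%) ({of_database:>3}%)")
```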
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 7410 1001-1100 3070
51- 100 31441 1101-1200 2119
101- 150 44644 1201-1300 1666
151- 200 45150 1301-1400 1581
201- 250 45195 1401-1500 1289
251- 300 39221 1501-1600 599
301- 350 39049 1601-1700 472
351- 400 34750 1701-1800 389
401- 450 27771 1801-1900 364
451- 500 22997 1901-2000 301
501- 550 16055 2001-2100 184
551- 600 11862 2101-2200 255
601- 650 10145 2201-2300 263
651- 700 7254 2301-2400 162
701- 750 6070 2401-2500 118
751- 800 4263 >2500 938
801- 850 3652
851- 900 4290
901- 950 3123
951-1000 2210
The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
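The reported average length is consistent with dividing the total number of amino acids by the total number of entries. The sketch below assumes the average is taken over all entries of the release; that assumption is not stated in the statistics themselves.

```python
total_amino_acids = 154416236  # amino acids in release 57.0
total_entries = 428650         # sequence entries in release 57.0

# Mean sequence length in amino acids.
average_length = total_amino_acids / total_entries

print(f"Average sequence length: {average_length:.2f} amino acids")
```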
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1975
4.1 Table of the frequency of journal citations
Journals cited 1x: 647
2x: 267
3x: 132
4x: 107
5x: 77
6x: 60
7x: 38
8x: 41
9x: 33
10x: 23
11- 20x: 151
21- 50x: 157
51-100x: 91
>100x: 151
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 16828 Journal of Biological Chemistry
2 7853 Proceedings of the National Academy of Sciences of the U.S.A.
3 4843 Journal of Bacteriology
4 4434 Gene
5 4294 Biochemical and Biophysical Research Communications
6 4221 Nucleic Acids Research
7 3817 FEBS Letters
8 3625 Biochemistry
9 3557 The EMBO Journal
10 3205 Molecular and Cellular Biology
11 3051 Nature
12 3045 European Journal of Biochemistry
13 2879 Biochimica et Biophysica Acta
14 2828 Journal of Molecular Biology
15 2489 Cell
16 2457 Genomics
17 2075 Biochemical Journal
18 1957 Science
19 1785 Journal of Virology
20 1652 Molecular Microbiology
21 1472 Journal of Cell Biology
22 1453 Plant Molecular Biology
23 1293 Molecular and General Genetics
24 1269 Virology
25 1247 Genes and Development
26 1247 Nature Genetics
27 1235 Human Molecular Genetics
28 1177 Plant Physiology
29 1142 The American Journal of Human Genetics
30 1132 Journal of Biochemistry
31 1129 Oncogene
32 1034 Development
33 972 Human Mutation
34 935 Journal of Immunology
35 911 Genetics
36 909 Molecular Biology of the Cell
37 836 Infection and Immunity
38 833 Structure
39 810 Journal of General Virology
40 779 Archives of Biochemistry and Biophysics
41 777 The Plant Cell
42 734 Blood
43 728 Yeast
44 706 Microbiology
45 696 Molecular Cell
46 645 Developmental Biology
47 641 The Plant Journal
48 640 Journal of Cell Science
49 624 FEMS Microbiology Letters
50 618 Cancer Research
51 580 Human Genetics
52 574 Nature Structural Biology
53 565 Current Biology
54 553 Mechanisms of Development
55 515 Current Genetics
56 495 Acta Crystallographica, Section D
57 494 Journal of Neuroscience
58 490 Applied and Environmental Microbiology
59 487 Protein Science
60 481 Journal of Clinical Investigation
61 472 Neuron
62 464 Mammalian Genome
63 461 Toxicon
64 428 Immunogenetics
65 422 The Journal of Experimental Medicine
66 418 Molecular Endocrinology
67 416 American Journal of Physiology
68 411 Molecular and Biochemical Parasitology
69 388 Journal of Neurochemistry
70 368 Endocrinology
71 367 Journal of Molecular Evolution
72 359 DNA and Cell Biology
73 358 The Journal of Clinical Endocrinology and Metabolism
74 351 DNA Sequence
75 342 Molecular Biology and Evolution
76 328 Bioscience, Biotechnology, and Biochemistry
77 324 Journal of Medical Genetics
78 308 Proteins
79 308 Brain Research. Molecular Brain Research
80 287 Biological Chemistry Hoppe-Seyler
81 273 Cytogenetics and Cell Genetics
82 267 Comparative Biochemistry and Physiology
83 266 Peptides
84 265 Journal of Investigative Dermatology
85 265 Antimicrobial Agents and Chemotherapy
86 256 Plant and Cell Physiology
87 250 Molecular Pharmacology
88 248 Biology of Reproduction
89 246 Nature Cell Biology
90 246 Experimental Cell Research
91 245 Journal of General Microbiology
92 234 Genome Research
93 221 Virus Research
94 218 Neurology
95 215 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
96 208 Developmental Dynamics
97 204 RNA
98 201 DNA Research
99 197 Molecular Plant-Microbe Interactions
100 193 Biochimie
101 192 European Journal of Immunology
102 184 Annals of Neurology
103 183 Tissue Antigens
104 182 European Journal of Human Genetics
105 181 Planta
106 179 Developmental Cell
107 173 Journal of Human Genetics
108 172 Genes to Cells
109 168 Immunity
110 166 Molecular and Cellular Endocrinology
111 161 Eukaryotic cell
112 161 Molecular Phylogenetics and Evolution
113 160 Archives of Microbiology
114 159 DNA
115 158 American Journal of Medical Genetics
116 157 The New England Journal of Medicine
117 152 Hemoglobin
118 150 Insect Biochemistry and Molecular Biology
119 148 Bioorganicheskaia Khimiia
120 147 Investigative Ophthalmology and Visual Science
121 144 Molecular Reproduction and Development
122 140 Diabetes
123 138 Molecular Immunology
124 138 Glycobiology
125 135 Animal Genetics
126 132 General and Comparative Endocrinology
127 128 Molecular and Cellular Neuroscience
128 128 International Journal of Cancer
129 127 Clinical Genetics
130 124 The FASEB Journal
131 124 Archives of Virology
132 123 EMBO Reports
133 119 Agricultural and Biological Chemistry
134 119 Molecular Genetics and Metabolism
135 115 British Journal of Haematology
136 114 Nature Structural and Molecular Biology
137 113 Molecular Genetics and Genomics
138 112 Journal of Cellular Biochemistry
139 111 Journal of Protein Chemistry
140 110 The FEBS Journal
141 109 Biological Chemistry
142 107 Thrombosis and Haemostasis
143 107 Journal of Neuroscience Research
144 107 Journal of the American Chemical Society
145 106 American Journal of Medical Genetics. Part A
146 105 Nature Immunology
147 105 Neuroscience Letters
148 105 Journal of Lipid Research
149 104 Journal of Molecular Endocrinology
150 103 Protein Expression and Purification
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 781540 1.82
Journal 628701 333076 1.47 1
Submitted to EMBL/GenBank/DDBJ 141218 129587 0.33 2
Submitted to other databases 9617 8507 0.02 3
Book citation 622 611 <0.01 4
Plant Gene Register 556 544 <0.01 5
Thesis 389 387 <0.01 6
Unpublished observations 288 284 <0.01 7
Patent 143 141 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 271220
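The "Average per entry" column can be reproduced by dividing the total number of lines by the number of entries in the release. A minimal Python sketch for three rows of the reference table, assuming the divisor is the full entry count of 428650 and values are rounded to two decimals (the dictionary name is illustrative):

```python
total_entries = 428650  # sequence entries in release 57.0

# Total number of lines per reference type, copied from the table.
reference_lines = {
    "References (RL)": 781540,
    "Journal": 628701,
    "Submitted to EMBL/GenBank/DDBJ": 141218,
}

# Average number of lines of each type per database entry.
average_per_entry = {
    line_type: round(total / total_entries, 2)
    for line_type, total in reference_lines.items()
}

for line_type, average in average_per_entry.items():
    print(f"{line_type:<32} {average:.2f}")
```

Note that the divisor is the size of the whole database, not the number of entries that carry at least one such line, which is why subtypes present in few entries show averages below 0.01.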
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 1789693 4.18
ALLERGEN 452 452 <0.01 26
ALTERNATIVE PRODUCTS 17720 17720 0.04 12
BIOPHYSICOCHEMICAL PROPERTIES 2500 2500 0.01 22
BIOTECHNOLOGY 241 239 <0.01 28
CATALYTIC ACTIVITY 175352 160149 0.41 4
CAUTION 6045 5925 0.01 19
COFACTOR 75945 69750 0.18 7
DEVELOPMENTAL STAGE 7930 7930 0.02 16
DISEASE 4495 3090 0.01 20
DISRUPTION PHENOTYPE 1609 1609 <0.01 23
DOMAIN 26533 23452 0.06 11
ENZYME REGULATION 6664 6664 0.02 18
FUNCTION 310429 299159 0.72 2
INDUCTION 9805 9805 0.02 15
INTERACTION 11265 11265 0.03 14
MASS SPECTROMETRY 3883 2946 0.01 21
MISCELLANEOUS 27141 24903 0.06 10
PATHWAY 98123 89621 0.23 6
PHARMACEUTICAL 80 80 <0.01 29
POLYMORPHISM 735 706 <0.01 24
PTM 31005 25377 0.07 8
RNA EDITING 560 560 <0.01 25
SEQUENCE CAUTION 11577 11577 0.03 13
SIMILARITY 497292 404571 1.16 1
SUBCELLULAR LOCATION 249793 245311 0.58 3
SUBUNIT 173827 173827 0.41 5
TISSUE SPECIFICITY 30654 30654 0.07 9
TOXIC DOSE 392 384 <0.01 27
WEB RESOURCE 7646 6129 0.02 17
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 2723263 6.35
ACT_SITE 104207 62057 0.24 11
BINDING 155163 49238 0.36 4
CA_BIND 3566 1449 0.01 35
CARBOHYD 89711 23184 0.21 12
CHAIN 434909 424497 1.01 1
COILED 16267 10833 0.04 26
COMPBIAS 43223 23345 0.10 18
CONFLICT 111609 38964 0.26 9
CROSSLNK 4122 2734 0.01 34
DISULFID 88560 23114 0.21 13
DNA_BIND 9381 8704 0.02 31
DOMAIN 126447 73205 0.29 6
HELIX 112953 11607 0.26 8
INIT_MET 12879 12879 0.03 27
LIPID 9803 6321 0.02 29
METAL 208340 52300 0.49 3
MOD_RES 129042 42164 0.30 5
MOTIF 28332 18294 0.07 22
MUTAGEN 26290 6325 0.06 25
NON_CONS 1569 627 <0.01 36
NON_STD 340 266 <0.01 38
NON_TER 11304 8588 0.03 28
NP_BIND 82019 55056 0.19 14
PEPTIDE 7852 4848 0.02 32
PROPEP 9800 8166 0.02 30
REGION 69832 39220 0.16 17
REPEAT 81125 11967 0.19 15
SIGNAL 30875 30865 0.07 20
SITE 29264 17057 0.07 21
STRAND 116537 10976 0.27 7
TOPO_DOM 107079 21833 0.25 10
TRANSIT 5932 5846 0.01 33
TRANSMEM 292168 59681 0.68 2
TURN 27890 9332 0.07 23
UNSURE 946 300 <0.01 37
VAR_SEQ 37208 15859 0.09 19
VARIANT 70313 15203 0.16 16
ZN_FING 26406 11048 0.06 24
Total number of feature keys: 38
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 8910885 20.79
2DBase-Ecoli 84 84 <0.01 102 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 99 2D gel databases
AGD 790 784 <0.01 77 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 109 2D gel databases
ArrayExpress 54151 54151 0.13 30 Gene expression databases
Bgee 35505 35495 0.08 34 Gene expression databases
BindingDB 297 297 <0.01 92 Other
BioCyc 157015 148907 0.37 14 Enzyme and pathway databases
BRENDA 65123 62330 0.15 26 Enzyme and pathway databases
BuruList 296 296 <0.01 93 Organism-specific databases
CGD 514 512 <0.01 82 Organism-specific databases
CleanEx 30264 29611 0.07 37 Gene expression databases
COMPLUYEAST-2DPAGE 59 59 <0.01 104 2D gel databases
Cornea-2DPAGE 67 67 <0.01 103 2D gel databases
CYGD 6628 6522 0.02 52 Organism-specific databases
dictyBase 3667 3557 0.01 65 Organism-specific databases
DIP 9016 8966 0.02 47 Protein-protein interaction databases
DisProt 397 394 <0.01 86 3D structure databases
DOSAC-COBS-2DPAGE 150 150 <0.01 98 2D gel databases
DrugBank 5316 1625 0.01 54 Other
EchoBASE 4159 4124 0.01 61 Organism-specific databases
ECO2DBASE 351 299 <0.01 90 2D gel databases
EcoGene 4331 4328 0.01 60 Organism-specific databases
EMBL 733511 419465 1.71 3 Sequence databases
Ensembl 68473 66943 0.16 25 Genome annotation databases
euHCVdb 55 44 <0.01 105 Organism-specific databases
FlyBase 4415 4043 0.01 59 Organism-specific databases
Gene3D 194637 161088 0.45 13 Family and domain databases
GeneCards 21183 19899 0.05 38 Organism-specific databases
GeneDB_Spombe 4793 4749 0.01 56 Organism-specific databases
GeneFarm 2504 2483 0.01 70 Organism-specific databases
GeneID 381309 363101 0.89 7 Genome annotation databases
GenomeReviews 284894 266392 0.66 9 Genome annotation databases
GermOnline 41962 41352 0.10 33 Gene expression databases
GlycoSuiteDB 280 280 <0.01 94 PTM databases
GO 1730543 399299 4.04 1 Ontologies
Gramene 3990 3990 0.01 62 Organism-specific databases
H-InvDB 11259 9565 0.03 46 Organism-specific databases
HAMAP 232695 232581 0.54 10 Family and domain databases
HGNC 19216 19059 0.04 40 Organism-specific databases
HOGENOM 204967 204967 0.48 12 Phylogenomic databases
HOVERGEN 76378 76378 0.18 24 Phylogenomic databases
HPA 6200 4994 0.01 53 Organism-specific databases
HSC-2DPAGE 85 85 <0.01 101 2D gel databases
HSSP 84683 84683 0.20 23 3D structure databases
IntAct 20253 20251 0.05 39 Protein-protein interaction databases
InterPro 1083300 399424 2.53 2 Family and domain databases
IPI 85696 61732 0.20 22 Sequence databases
KEGG 355038 334366 0.83 8 Genome annotation databases
LegioList 725 723 <0.01 78 Organism-specific databases
Leproma 655 652 <0.01 81 Organism-specific databases
LinkHub 18287 18287 0.04 41 Other
ListiList 1159 1151 <0.01 75 Organism-specific databases
MaizeGDB 469 464 <0.01 84 Organism-specific databases
MEROPS 7866 7604 0.02 49 Protein family/group databases
MGI 15977 15927 0.04 43 Organism-specific databases
MIM 15492 12279 0.04 45 Organism-specific databases
MypuList 201 201 <0.01 97 Organism-specific databases
NextBio 48267 48265 0.11 32 Other
NMPDR 122888 122860 0.29 16 Genome annotation databases
OGP 378 378 <0.01 88 2D gel databases
Orphanet 3382 1995 0.01 67 Organism-specific databases
PANTHER 155906 143918 0.36 15 Family and domain databases
Pathway_Interaction_DB 4568 1665 0.01 58 Enzyme and pathway databases
PDB 56257 13928 0.13 28 3D structure databases
PDBsum 56248 13927 0.13 29 3D structure databases
PeptideAtlas 5167 5167 0.01 55 Proteomic databases
PeroxiBase 662 646 <0.01 80 Protein family/group databases
Pfam 559239 391183 1.30 4 Family and domain databases
PharmGKB 15843 15831 0.04 44 Organism-specific databases
PHCI-2DPAGE 245 245 <0.01 96 2D gel databases
PhosphoSite 16726 16726 0.04 42 PTM databases
PhosSite 266 266 <0.01 95 PTM databases
PhotoList 716 716 <0.01 79 Organism-specific databases
PIR 113036 103221 0.26 19 Sequence databases
PIRSF 64427 64427 0.15 27 Family and domain databases
PMMA-2DPAGE 52 52 <0.01 106 2D gel databases
PptaseDB 34 34 <0.01 107 Protein family/group databases
PRIDE 33839 33839 0.08 35 Proteomic databases
PRINTS 114793 98243 0.27 18 Family and domain databases
ProDom 114838 111821 0.27 17 Family and domain databases
ProMEX 431 431 <0.01 85 Proteomic databases
PROSITE 385455 243280 0.90 6 Family and domain databases
PseudoCAP 1199 1190 <0.01 73 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 108 2D gel databases
Reactome 4620 2749 0.01 57 Enzyme and pathway databases
REBASE 354 345 <0.01 89 Protein family/group databases
RefSeq 396625 363318 0.93 5 Sequence databases
REPRODUCTION-2DPAGE 1030 942 <0.01 76 2D gel databases
RGD 7194 7189 0.02 50 Organism-specific databases
SagaList 381 380 <0.01 87 Organism-specific databases
SGD 6640 6537 0.02 51 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 100 2D gel databases
SMART 111692 85039 0.26 20 Family and domain databases
SMR 50798 50798 0.12 31 3D structure databases
SubtiList 3537 3535 0.01 66 Organism-specific databases
SWISS-2DPAGE 1182 1182 <0.01 74 2D gel databases
TAIR 7957 7843 0.02 48 Organism-specific databases
TCDB 3095 3060 0.01 69 Protein family/group databases
TIGR 32672 31933 0.08 36 Genome annotation databases
TIGRFAMs 215422 200843 0.50 11 Family and domain databases
TubercuList 1490 1454 <0.01 72 Organism-specific databases
UniGene 85716 78769 0.20 21 Sequence databases
VectorBase 305 296 <0.01 91 Genome annotation databases
World-2DPAGE 501 501 <0.01 83 2D gel databases
WormBase 3670 3585 0.01 64 Organism-specific databases
WormPep 3933 3209 0.01 63 Organism-specific databases
Xenbase 3227 3160 0.01 68 Organism-specific databases
ZFIN 2373 2357 0.01 71 Organism-specific databases
Total number of cross-referenced databases: 109
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.17 Gln (Q) 3.95 Leu (L) 9.67 Ser (S) 6.62
Arg (R) 5.50 Glu (E) 6.74 Lys (K) 5.87 Thr (T) 5.34
Asn (N) 4.07 Gly (G) 7.04 Met (M) 2.41 Trp (W) 1.09
Asp (D) 5.42 His (H) 2.28 Phe (F) 3.88 Tyr (Y) 2.93
Cys (C) 1.40 Ile (I) 5.94 Pro (P) 4.74 Val (V) 6.82
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
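The ordering in 6.2 follows directly from the percentages in 6.1. As a minimal sketch (the dictionary and the function name `rank_by_frequency` are illustrative and not part of any UniProt tool), sorting the values in descending order reproduces the list:

```python
# Composition in percent, copied from section 6.1 (three-letter codes).
composition = {
    "Ala": 8.17, "Gln": 3.95, "Leu": 9.67, "Ser": 6.62,
    "Arg": 5.50, "Glu": 6.74, "Lys": 5.87, "Thr": 5.34,
    "Asn": 4.07, "Gly": 7.04, "Met": 2.41, "Trp": 1.09,
    "Asp": 5.42, "His": 2.28, "Phe": 3.88, "Tyr": 2.93,
    "Cys": 1.40, "Ile": 5.94, "Pro": 4.74, "Val": 6.82,
}


def rank_by_frequency(comp):
    """Return residue names ordered from most to least frequent."""
    return sorted(comp, key=comp.get, reverse=True)


ranking = rank_by_frequency(composition)
print(", ".join(ranking))
```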
7. MISCELLANEOUS STATISTICS
4433 entries are encoded on a mitochondrion, and 3492 are encoded on a plasmid.
11919 entries are encoded on a plastid,
of which 20 are encoded on apicoplasts,
11406 on chloroplasts,
39 on chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 64801
| UniProtKB/TrEMBL protein database release 40.0 statistics |
|---|
1. INTRODUCTION
Release 40.0 of 24-Mar-2009 of UniProtKB/TrEMBL contains 7'537'442 sequence entries
comprising 2'459'135'421 amino acids.
1'700'878 sequences have been added since release 39, the sequence data of
24'829 existing entries has been updated and the annotations of
4'218'268 entries have been revised. This represents an increase of 31%.
2. AMINO ACID COMPOSITION
2.1 Composition in percent for the complete database
Ala (A) 8.54 Gln (Q) 3.93 Leu (L) 9.83 Ser (S) 6.84
Arg (R) 5.54 Glu (E) 6.09 Lys (K) 5.22 Thr (T) 5.60
Asn (N) 4.17 Gly (G) 7.05 Met (M) 2.42 Trp (W) 1.33
Asp (D) 5.26 His (H) 2.22 Phe (F) 4.02 Tyr (Y) 3.02
Cys (C) 1.36 Ile (I) 5.89 Pro (P) 4.84 Val (V) 6.65
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.07
2.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
Gln, Tyr, Met, His, Cys, Trp
3. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/TrEMBL: 193405
The first twenty species represent 1076730 sequences: 14.3 % of the
total number of entries.
3.1 Table of the frequency of occurrence of species
Species represented 1x:87309
2x:34896
3x:18055
4x:10752
5x: 6919
6x: 4612
7x: 3548
8x: 2634
9x: 2158
10x: 2513
11- 20x:11482
21- 50x: 4206
51-100x: 1591
>100x: 2730
3.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 262485 Human immunodeficiency virus 1
2 95008 Oryza sativa subsp. japonica (Rice)
3 67189 Homo sapiens (Human)
4 54387 Vitis vinifera (Grape)
5 50193 Branchiostoma floridae (Florida lancelet) (Amphioxus)
6 50188 Trichomonas vaginalis G3
7 46361 Hepatitis C virus
8 44805 Mus musculus (Mouse)
9 43980 Populus trichocarpa (Western balsam poplar)
10 43557 Arabidopsis thaliana (Mouse-ear cress)
11 39850 Paramecium tetraurelia
12 38756 Oryza sativa subsp. indica (Rice)
13 34771 Physcomitrella patens subsp. patens
14 33127 uncultured bacterium
15 31220 Ricinus communis (Castor bean)
16 30108 Zea mays (Maize)
17 29407 Drosophila melanogaster (Fruit fly)
18 28078 Tetraodon nigroviridis (Green puffer)
19 26658 Hepatitis B virus (HBV)
20 26602 Danio rerio (Zebrafish) (Brachydanio rerio)
21 24830 Nematostella vectensis (Starlet sea anemone)
22 21418 Caenorhabditis briggsae
23 21089 Ixodes scapularis (Black-legged tick) (Deer tick)
24 20639 Caenorhabditis elegans
25 20525 Trypanosoma cruzi
26 18820 Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
27 17880 Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver)
28 17513 Drosophila simulans (Fruit fly)
29 16989 Drosophila yakuba (Fruit fly)
30 16785 Drosophila persimilis (Fruit fly)
31 16779 Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
32 16685 Tetrahymena thermophila SB210
33 16281 Drosophila sechellia (Fruit fly)
34 16281 Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
35 16064 Drosophila pseudoobscura pseudoobscura (Fruit fly)
36 15883 Phaeosphaeria nodorum (Septoria nodorum)
37 15513 Drosophila willistoni (Fruit fly)
38 15064 Drosophila ananassae (Fruit fly)
39 15040 Drosophila erecta (Fruit fly)
40 14781 Drosophila mojavensis (Fruit fly)
41 14756 Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
42 14736 Drosophila virilis (Fruit fly)
43 14724 Chlamydomonas reinhardtii
44 14675 Plasmodium chabaudi
45 14301 Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold)
46 14296 Anopheles gambiae (African malaria mosquito)
47 13747 Aspergillus niger (strain CBS 513.88 / FGSC A1513)
48 13489 Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus)
49 13487 Aspergillus flavus NRRL3357
50 12996 Talaromyces stipitatus ATCC 10500
51 12810 Xenopus laevis (African clawed frog)
52 12772 Penicillium chrysogenum Wisconsin 54-1255
53 12737 Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
54 12057 Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus)
55 11927 Aspergillus oryzae
56 11793 Plasmodium berghei
57 11612 Thalassiosira pseudonana CCMP1335
58 11574 Trichoplax adhaerens
59 11562 Brugia malayi (Filarial nematode worm)
60 11045 Escherichia coli
61 10926 Hepatitis C virus subtype 1b
62 10892 Chaetomium globosum (Soil fungus)
63 10709 Podospora anserina
64 10559 Ralstonia solanacearum (Pseudomonas solanacearum)
65 10467 Dictyostelium discoideum (Slime mold)
66 10427 Neurospora crassa
67 10422 Penicillium marneffei ATCC 18224
68 10336 Phaeodactylum tricornutum CCAP 1055/1
69 10294 Coccidioides immitis
70 10288 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
71 10238 Aspergillus terreus (strain NIH 2624)
72 10230 Neosartorya fischeri (Aspergillus fischerianus)
73 9892 Schistosoma japonicum (Blood fluke)
74 9878 Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163)
75 9813 Bos taurus (Bovine)
76 9669 Cryptococcus neoformans (Filobasidiella neoformans)
77 9665 Aspergillus fumigatus (Sartorya fumigata)
78 9471 Trypanosoma brucei
79 9416 Emericella nidulans (Aspergillus nidulans)
80 9258 Monosiga brevicollis (Choanoflagellate)
81 9192 Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus)
82 9190 Candida albicans (Yeast)
83 9166 Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
84 9090 Rattus norvegicus (Rat)
85 8983 Postia placenta Mad-698-R
86 8954 Aspergillus clavatus
87 8932 Porcine reproductive and respiratory syndrome virus (PRRSV)
88 8915 Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
89 8884 Helicobacter pylori (Campylobacter pylori)
90 8809 Rhodococcus sp. (strain RHA1)
91 8731 Escherichia coli (strain 55989 / EAEC)
92 8607 Entamoeba dispar SAW760
93 8523 Stigmatella aurantiaca DW4/3-1
94 8437 Plesiocystis pacifica SIR-1
95 8275 Plasmodium falciparum
96 8253 Streptomyces sviceus ATCC 29083
97 8249 Microscilla marina ATCC 23134
98 8201 Microcoleus chthonoplastes PCC 7420
99 8180 Burkholderia xenovorans (strain LB400)
100 8129 Bradyrhizobium japonicum
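The share quoted at the start of this section can be recomputed from the first twenty rows of the table above. The sketch below assumes that the denominator is the sum of the kingdom counts given in section 3.3; the variable names are illustrative.

```python
# Frequencies of the twenty most represented species (rows 1 to 20 above).
top20 = [
    262485, 95008, 67189, 54387, 50193, 50188, 46361, 44805, 43980, 43557,
    39850, 38756, 34771, 33127, 31220, 30108, 29407, 28078, 26658, 26602,
]

# Kingdom counts from section 3.3; their sum is assumed to be the denominator.
kingdoms = {
    "Archaea": 137049,
    "Bacteria": 4165881,
    "Eukaryota": 2492847,
    "Viruses": 733065,
    "Other": 8599,
}

top20_total = sum(top20)
all_entries = sum(kingdoms.values())
share_percent = 100 * top20_total / all_entries

print(f"{top20_total} sequences: {share_percent:.1f} % of the total")
```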
3.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 137049 ( 2%)
Bacteria 4165881 ( 55%)
Eukaryota 2492847 ( 33%)
Viruses 733065 ( 10%)
Other 8599 ( <1%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 67203 ( 3%) ( 1%)
Other Mammalia 144489 ( 6%) ( 2%)
Other Vertebrata 247964 ( 10%) ( 3%)
Viridiplantae 608401 ( 24%) ( 8%)
Fungi 448052 ( 18%) ( 6%)
Insecta 366292 ( 15%) ( 5%)
Nematoda 59871 ( 2%) ( 1%)
Other 550575 ( 22%) ( 7%)
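Both percentage columns in the Eukaryota breakdown are plain ratios rounded to whole percent. A short sketch, assuming the complete-database denominator is the sum of the five kingdom counts above; `eukaryota_shares` is a hypothetical helper name:

```python
# Kingdom counts and Eukaryota categories, copied from the tables above.
kingdoms = {
    "Archaea": 137049,
    "Bacteria": 4165881,
    "Eukaryota": 2492847,
    "Viruses": 733065,
    "Other": 8599,
}
categories = {
    "Human": 67203,
    "Other Mammalia": 144489,
    "Other Vertebrata": 247964,
    "Viridiplantae": 608401,
    "Fungi": 448052,
    "Insecta": 366292,
    "Nematoda": 59871,
    "Other": 550575,
}


def eukaryota_shares(cats, kingdom_counts):
    """Map each category to (% of Eukaryota, % of the complete database)."""
    eukaryota = kingdom_counts["Eukaryota"]
    database = sum(kingdom_counts.values())
    return {
        name: (round(100 * n / eukaryota), round(100 * n / database))
        for name, n in cats.items()
    }


shares = eukaryota_shares(categories, kingdoms)
for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{categories[name]:>8} ({of_euk:>3}%) ({of_db:>3}%)")
```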
4. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 148439 1001-1100 47501
51- 100 564448 1101-1200 33699
101- 150 664698 1201-1300 23447
151- 200 640557 1301-1400 16104
201- 250 639142 1401-1500 12723
251- 300 614727 1501-1600 9280
301- 350 566716 1601-1700 7128
351- 400 445777 1701-1800 5819
401- 450 370905 1801-1900 4539
451- 500 311596 1901-2000 3897
501- 550 222942 2001-2100 3109
551- 600 167193 2101-2200 3213
601- 650 123928 2201-2300 2449
651- 700 97281 2301-2400 1977
701- 750 83493 2401-2500 1658
751- 800 74339 >2500 15038
801- 850 56358
851- 900 50046
901- 950 35680
951-1000 28236
The average sequence length in UniProtKB/TrEMBL is 326 amino acids.
The shortest sequence is Q16047_HUMAN: 4 amino acids.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
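As a rough consistency check, the size bins (which exclude fragments) can be added to the fragment count reported in section 6 to give the number of entries, and the amino acid total from the introduction can be divided by that number to recover the average length. The sketch assumes the average is taken over all entries, fragments included; the names are illustrative.

```python
# Bin counts from the size table: left column first, then right column.
size_bins = [
    148439, 564448, 664698, 640557, 639142, 614727, 566716, 445777,
    370905, 311596, 222942, 167193, 123928, 97281, 83493, 74339,
    56358, 50046, 35680, 28236,
    47501, 33699, 23447, 16104, 12723, 9280, 7128, 5819,
    4539, 3897, 3109, 3213, 2449, 1977, 1658, 15038,
]

fragments = 1439360        # "Number of fragments" in section 6
amino_acids = 2459135421   # total amino acids from the introduction

non_fragments = sum(size_bins)
entries = non_fragments + fragments
average_length = amino_acids / entries

print(f"non-fragment sequences: {non_fragments}")
print(f"entries (bins + fragments): {entries}")
print(f"average length: {average_length:.0f} amino acids")
```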
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry
--------------------------------- -------- --------- ---------
References (RL) 9885975 1.31
Submitted to EMBL/GenBank/DDBJ 5435937 4559102 0.72
Journal 4342038 3784310 0.58
Thesis 7110 7054 <0.01
Submitted to other databases 4628 4620 <0.01
Book citation 4517 4471 <0.01
Other 91745 90595 0.01
Comments (CC) 5086365 0.67
SIMILARITY 1529419 1283560 0.20
CAUTION 1522553 1522553 0.20
FUNCTION 550977 487269 0.07
CATALYTIC ACTIVITY 520201 431091 0.07
SUBCELLULAR LOCATION 451790 420050 0.06
SUBUNIT 244362 219337 0.03
COFACTOR 159132 146980 0.02
PATHWAY 98881 95980 0.01
MISCELLANEOUS 5827 5827 <0.01
INTERACTION 2627 2627 <0.01
DOMAIN 596 596 <0.01
Features (FT) 2925361 0.39
NON_TER 2416053 1437324 0.32
CHAIN 314678 246799 0.04
SIGNAL 194054 194054 0.03
TRANSIT 576 576 <0.01
Cross-references (DR) 71009542 9.42
GO 13450712 4329612 1.78
InterPro 12179422 5288931 1.62
EMBL 8471780 7530279 1.12
Pfam 6773292 5019113 0.90
PROSITE 3705440 2401590 0.49
RefSeq 3705117 3565145 0.49
GeneID 3690642 3558116 0.49
KEGG 2964305 2877187 0.39
Gene3D 2221743 1897018 0.29
GenomeReviews 2190231 2128466 0.29
SMART 1291968 1012465 0.17
TIGRFAMs 1203718 1100600 0.16
PANTHER 1141465 1081325 0.15
PRINTS 1132757 986972 0.15
HOGENOM 1046657 1046653 0.14
NMPDR 941154 941143 0.12
BioCyc 833412 811420 0.11
ProDom 669141 638944 0.09
SMR 490641 490505 0.07
UniGene 360267 332031 0.05
PIRSF 337379 337379 0.04
HOVERGEN 309523 309327 0.04
HSSP 259517 259229 0.03
TIGR 197613 190359 0.03
FlyBase 194222 192697 0.03
IPI 193089 193089 0.03
PIR 179433 146432 0.02
Ensembl 150065 144403 0.02
ArrayExpress 95180 95144 0.01
Bgee 80715 80670 0.01
Gramene 69538 69538 0.01
PRIDE 60147 60147 0.01
euHCVdb 55083 55082 0.01
NextBio 53147 53147 0.01
MGI 39407 39130 0.01
VectorBase 28981 28654 <0.01
HGNC 27771 27735 <0.01
MEROPS 25649 24990 <0.01
ZFIN 19621 19615 <0.01
WormPep 18815 18712 <0.01
WormBase 18806 18712 <0.01
TAIR 18615 18566 <0.01
IntAct 12617 12617 <0.01
LinkHub 11554 11554 <0.01
Xenbase 10331 10045 <0.01
dictyBase 9048 9047 <0.01
CGD 6852 6852 <0.01
PDBsum 5675 3203 <0.01
PDB 5675 3203 <0.01
LegioList 5178 5150 <0.01
ListiList 4656 4639 <0.01
PseudoCAP 4369 4366 <0.01
PhotoList 3964 3840 <0.01
BuruList 3944 3910 <0.01
AGD 3904 3904 <0.01
RGD 3684 3678 <0.01
REBASE 3674 3650 <0.01
BRENDA 2972 2902 <0.01
TubercuList 2500 2494 <0.01
DIP 2229 2224 <0.01
PeroxiBase 2093 2088 <0.01
TCDB 1977 1958 <0.01
SagaList 1713 1619 <0.01
PhosphoSite 1250 1250 <0.01
Leproma 952 951 <0.01
MypuList 581 577 <0.01
ProMEX 473 473 <0.01
World-2DPAGE 412 412 <0.01
SGD 317 317 <0.01
GeneDB_Spombe 206 202 <0.01
PeptideAtlas 165 165 <0.01
PHCI-2DPAGE 102 102 <0.01
PharmGKB 89 89 <0.01
Reactome 68 64 <0.01
ANU-2DPAGE 58 58 <0.01
SWISS-2DPAGE 29 29 <0.01
Pathway_Interaction_DB 16 13 <0.01
CYGD 16 16 <0.01
REPRODUCTION-2DPAGE 13 13 <0.01
PMMA-2DPAGE 3 3 <0.01
Siena-2DPAGE 2 2 <0.01
COMPLUYEAST-2DPAGE 1 1 <0.01
Number of explicitly cross-referenced databases: 110
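The "Average per entry" column appears to be the total number of lines divided by the number of entries in the database, printed to two decimals, with values that would round to 0.00 shown as <0.01. A minimal sketch under that assumption; `N_ENTRIES` is taken here as the sum of the kingdom counts in section 3.3, and `average_per_entry` is a hypothetical helper:

```python
# Assumed denominator: sum of the kingdom counts in section 3.3.
N_ENTRIES = 137049 + 4165881 + 2492847 + 733065 + 8599


def average_per_entry(total_lines, n_entries=N_ENTRIES):
    """Format lines-per-entry the way the table does."""
    text = f"{total_lines / n_entries:.2f}"
    # Values too small to show at two decimals are reported as "<0.01".
    return "<0.01" if text == "0.00" else text


# A few rows from the table above: (line type, total number).
rows = [
    ("References (RL)", 9885975),
    ("GO", 13450712),
    ("MGI", 39407),
    ("VectorBase", 28981),
]
for name, total in rows:
    print(f"{name:<20}{total:>10}  {average_per_entry(total)}")
```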
6. MISCELLANEOUS STATISTICS
Total number of distinct authors cited in UniProtKB/TrEMBL: 271939
Total number of entries encoded on a Mitochondrion: 246125
Total number of entries encoded on a Plasmid: 121183
Total number of entries encoded on a Plastid: 7064
Total number of entries encoded on a Plastid; Apicoplast: 316
Total number of entries encoded on a Plastid; Chloroplast: 85864
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 419
Number of fragments: 1439360
| Submissions and Updates |
|---|
We welcome feedback from our users. We would especially appreciate being notified if you find that sequences belonging to your field of expertise are missing from the database. We would also like to be notified about annotations that need updating, for example when the function of a protein has been clarified or new information about post-translational modifications has become available.
Submit new sequence data, updates and corrections at http://www.uniprot.org/help/submissions.
For all queries regarding submissions to UniProtKB, please contact: datasubs@ebi.ac.uk
| Download information |
|---|
The latest data of the UniProt Knowledgebase is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/downloads. The data is further supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic
For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.
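Counts such as the number of entries and amino acids can be recomputed from a FASTA download. The following is a minimal sketch, not an official UniProt tool: `count_fasta` is a hypothetical helper, and the two records in `sample` are invented for illustration rather than taken from the database.

```python
import io


def count_fasta(handle):
    """Return (number of entries, number of residues) for a FASTA stream."""
    entries = 0
    residues = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            entries += 1  # each header line starts a new entry
        else:
            residues += len(line)  # a sequence may wrap over several lines
    return entries, residues


# Invented records, for illustration only.
sample = io.StringIO(
    ">tr|X00001|X00001_TEST Hypothetical protein\n"
    "MKTAYIAK\n"
    "QR\n"
    ">tr|X00002|X00002_TEST Hypothetical protein\n"
    "MSLW\n"
)
counts = count_fasta(sample)
print(counts)
```

For a real download, an open text file can be passed instead of the `io.StringIO` object; for a gzip-compressed file, `gzip.open(path, "rt")` from the Python standard library returns a suitable handle.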
| Citation |
|---|
If you want to cite UniProt in a publication, please use the following reference:
The UniProt Consortium
"The Universal Protein Resource (UniProt) 2009"
Nucleic Acids Res. 37:D169-D174(2009) doi:10.1093/nar/gkn664