Swiss-Prot release 43.0

Published March 29, 2004

Swiss-Prot Protein Knowledgebase
Release Notes

Release 43.0 of 29-Mar-2004

Content


Introduction
Status of the model organisms
Swiss-Prot protein knowledgebase release 43.0 statistics
We need your help


See also Recent changes and Forthcoming changes.

Introduction

Release 43.0 of 29-Mar-2004 of Swiss-Prot contains 146'720 sequence entries, comprising 54'093'154 amino acids abstracted from 113'719 references. 10'760 sequences have been added since release 42, the sequence data of 663 existing entries has been updated and the annotations of 44'948 entries have been revised. The number of entries has thus increased by 8% since release 42.

Release  Date   Number of entries  Number of amino acids
2.0      09/86              3'939                900'163
3.0      11/86              4'160                969'641
4.0      04/87              4'387              1'036'010
5.0      09/87              5'205              1'327'683
6.0      01/88              6'102              1'653'982
7.0      04/88              6'821              1'885'771
8.0      08/88              7'724              2'224'465
9.0      11/88              8'702              2'498'140
10.0     03/89             10'008              2'952'613
11.0     07/89             10'856              3'265'966
12.0     10/89             12'305              3'797'482
13.0     01/90             13'837              4'347'336
14.0     04/90             15'409              4'914'264
15.0     08/90             16'941              5'486'399
16.0     11/90             18'364              5'986'949
17.0     02/91             20'024              6'524'504
18.0     05/91             20'772              6'792'034
19.0     08/91             21'795              7'173'785
20.0     11/91             22'654              7'500'130
21.0     03/92             23'742              7'866'596
22.0     05/92             25'044              8'375'696
23.0     08/92             26'706              9'011'391
24.0     12/92             28'154              9'545'427
25.0     04/93             29'955             10'214'020
26.0     07/93             31'808             10'875'091
27.0     10/93             33'329             11'484'420
28.0     02/94             36'000             12'496'420
29.0     06/94             38'303             13'464'008
30.0     10/94             40'292             14'147'368
31.0     02/95             43'470             15'335'248
32.0     11/95             49'340             17'385'503
33.0     02/96             52'205             18'531'384
34.0     10/96             59'021             21'210'389
35.0     11/97             69'113             25'083'768
36.0     07/98             74'019             26'840'295
37.0     12/98             77'977             28'268'293
38.0     07/99             80'000             29'085'965
39.0     05/00             86'593             31'411'114
40.0     10/01            101'602             37'315'215
41.0     02/03            122'564             44'986'459
42.0     10/03            135'850             50'046'799
43.0     03/04            146'720             54'093'154
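
The growth between the last two releases can be checked directly from the table above. The following is a minimal sketch, not part of the original release notes; the variable names are illustrative. Note that the net difference in entry counts is not the same figure as the number of sequences added that is quoted in the introduction, but both round to 8%.

```python
# Entry counts copied from the release table above.
entries_release_42 = 135_850
entries_release_43 = 146_720

# Net change in the number of entries between the two releases.
net_growth = entries_release_43 - entries_release_42

# Relative growth, expressed as a percentage of release 42.
growth_percent = 100 * net_growth / entries_release_42

print(net_growth)                # 10870
print(round(growth_percent, 1))  # 8.0
```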

Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

  • be as complete as possible. All sequences available at a given time should be immediately included in Swiss-Prot. This also includes sequence corrections and updates;
  • provide a higher level of annotation;
  • provide cross-references to specialized database(s) that contain, among other data, some information about the genes that code for these proteins;
  • provide specific indexes and documents.
From our efforts to annotate human sequence entries as completely as possible arose the HPI project, and the bacterial model organisms became the focus of the HAMAP project. Here is the current status of the model organisms which are not covered by these two projects:

Organism        Database cross-references  Index file    Number of sequences
A.thaliana      None yet                   arath.txt                   2'591
C.albicans      None yet                   calbican.txt                  286
C.elegans       Wormpep                    celegans.txt                2'458
D.discoideum    DictyDB                    dicty.txt                     319
D.melanogaster  FlyBase                    fly.txt                     1'967
M.musculus      MGD                        mgdtosp.txt                 7'326
S.cerevisiae    SGD                        yeast.txt                   4'930
S.pombe         GeneDB_SPombe              pombe.txt                   2'386

Swiss-Prot protein knowledgebase release 43.0 statistics



1.  INTRODUCTION

Release 43.0 of 29-Mar-2004 of Swiss-Prot contains 146720 sequence entries,
comprising 54093154 amino acids abstracted from 113719 references. 

10760 sequences have been added since release 42, the sequence data of
663 existing entries has been updated and the annotations of
44948 entries have been revised. The number of entries has thus
increased by 8% since release 42.


2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database

   Ala (A) 7.79   Gln (Q) 3.92   Leu (L) 9.60   Ser (S) 6.89
   Arg (R) 5.28   Glu (E) 6.59   Lys (K) 5.93   Thr (T) 5.47
   Asn (N) 4.23   Gly (G) 6.93   Met (M) 2.37   Trp (W) 1.16
   Asp (D) 5.30   His (H) 2.27   Phe (F) 4.03   Tyr (Y) 3.09
   Cys (C) 1.56   Ile (I) 5.91   Pro (P) 4.85   Val (V) 6.70

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.01


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
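
   The ordering in 2.2 follows from sorting the percentages in 2.1 in
   descending order. A minimal sketch (the dictionary below is simply
   a transcription of table 2.1; the names are illustrative):

```python
# Percentages transcribed from section 2.1 (three-letter codes).
composition = {
    "Ala": 7.79, "Gln": 3.92, "Leu": 9.60, "Ser": 6.89,
    "Arg": 5.28, "Glu": 6.59, "Lys": 5.93, "Thr": 5.47,
    "Asn": 4.23, "Gly": 6.93, "Met": 2.37, "Trp": 1.16,
    "Asp": 5.30, "His": 2.27, "Phe": 4.03, "Tyr": 3.09,
    "Cys": 1.56, "Ile": 5.91, "Pro": 4.85, "Val": 6.70,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```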


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of Swiss-Prot: 8424

   The twenty most represented species account for 57715 sequences,
   i.e. 39.3% of the total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x: 4065
                            2x: 1283
                            3x:  661
                            4x:  431
                            5x:  269
                            6x:  262
                            7x:  197
                            8x:  149
                            9x:  127
                           10x:   88
                       11- 20x:  344
                       21- 50x:  250
                       51-100x:   93
                         >100x:  205
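
   As a consistency check, the classes of table 3.1 should add up to
   the total number of species given above. A minimal sketch (the
   dictionary keys are illustrative labels for the table rows):

```python
# Number of species per frequency class, transcribed from table 3.1.
species_by_frequency = {
    "1x": 4065, "2x": 1283, "3x": 661, "4x": 431, "5x": 269,
    "6x": 262, "7x": 197, "8x": 149, "9x": 127, "10x": 88,
    "11-20x": 344, "21-50x": 250, "51-100x": 93, ">100x": 205,
}

total_species = sum(species_by_frequency.values())

# Share of species that are represented by a single entry only.
single_entry_percent = 100 * species_by_frequency["1x"] / total_species

print(total_species)                   # 8424
print(round(single_entry_percent, 1))  # 48.3
```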


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      10691  Homo sapiens (Human)
       2       7326  Mus musculus (Mouse)
       3       4930  Saccharomyces cerevisiae (Baker's yeast)
       4       4835  Escherichia coli
       5       3726  Rattus norvegicus (Rat)
       6       2712  Bacillus subtilis
       7       2591  Arabidopsis thaliana (Mouse-ear cress)
       8       2458  Caenorhabditis elegans
       9       2386  Schizosaccharomyces pombe (Fission yeast)
      10       1967  Drosophila melanogaster (Fruit fly)
      11       1773  Haemophilus influenzae
      12       1772  Methanococcus jannaschii
      13       1647  Escherichia coli O157:H7
      14       1438  Bos taurus (Bovine)
      15       1406  Salmonella typhimurium
      16       1393  Mycobacterium tuberculosis
      17       1284  Escherichia coli O6
      18       1210  Shigella flexneri
      19       1090  Gallus gallus (Chicken)
      20       1080  Mycobacterium bovis
      21        980  Salmonella typhi
      22        962  Pseudomonas aeruginosa
      23        941  Synechocystis sp. (strain PCC 6803)
      24        937  Archaeoglobus fulgidus
      25        873  Xenopus laevis (African clawed frog)
      26        850  Sus scrofa (Pig)
      27        766  Rhizobium meliloti (Sinorhizobium meliloti)
      28        743  Vibrio cholerae
      29        738  Aquifex aeolicus
      30        725  Oryctolagus cuniculus (Rabbit)
      31        695  Yersinia pestis
      32        687  Mycoplasma pneumoniae
      33        647  Pasteurella multocida
      34        605  Mycobacterium leprae
      35        603  Treponema pallidum
      36        601  Streptomyces coelicolor
      37        586  Bacillus halodurans
      38        572  Buchnera aphidicola (subsp. Acyrthosiphon pisum) 
      39        570  Vibrio parahaemolyticus
      40        560  Buchnera aphidicola (subsp. Schizaphis graminum)
      41        557  Methanobacterium thermoautotrophicum
      42        557  Helicobacter pylori (Campylobacter pylori)
      43        543  Rickettsia prowazekii
      44        542  Anabaena sp. (strain PCC 7120)
      45        538  Helicobacter pylori J99 (Campylobacter pylori J99)
      46        518  Vibrio vulnificus
      47        504  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      48        503  Staphylococcus aureus (strain N315)
      49        499  Zea mays (Maize)
      50        495  Lactococcus lactis (subsp. lactis) (Streptococcus lactis)
      51        487  Staphylococcus aureus (strain MW2)
      52        486  Mycoplasma genitalium
      53        467  Ralstonia solanacearum (Pseudomonas solanacearum)
      54        464  Staphylococcus epidermidis
      55        463  Listeria monocytogenes
      56        459  Neisseria meningitidis (serogroup B)
      57        457  Listeria innocua
      58        457  Neisseria meningitidis (serogroup A)
      59        449  Pseudomonas putida (strain KT2440)
      60        448  Thermotoga maritima
      61        447  Rhizobium loti (Mesorhizobium loti)
      62        447  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      63        443  Xanthomonas campestris (pv. campestris)
      64        443  Clostridium acetobutylicum
      65        438  Pseudomonas syringae (pv. tomato)
      66        434  Caulobacter crescentus
      67        424  Oryza sativa (Rice)
      68        419  Deinococcus radiodurans
      69        417  Chlamydia trachomatis
      70        416  Streptococcus pneumoniae
      71        414  Borrelia burgdorferi (Lyme disease spirochete)
      72        412  Xylella fastidiosa
      73        411  Canis familiaris (Dog)
      74        407  Xanthomonas axonopodis (pv. citri)
      75        406  Pyrococcus horikoshii
      76        405  Chlamydia pneumoniae (Chlamydophila pneumoniae)
      77        403  Rhizobium sp. (strain NGR234)
      78        400  Buchnera aphidicola (subsp. Baizongia pistaciae)
      79        400  Pyrococcus abyssi
      80        400  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
      81        395  Chlamydia muridarum
      82        382  Clostridium perfringens
      83        377  Brucella melitensis
      84        375  Brucella suis
      85        374  Bradyrhizobium japonicum
      86        371  Corynebacterium glutamicum (Brevibacterium flavum)
      87        365  Halobacterium sp. (strain NRC-1 / ATCC 700922 / JCM 11081)
      88        362  Campylobacter jejuni
      89        361  Methanosarcina acetivorans
      90        356  Methanosarcina mazei (Methanosarcina frisia)
      91        355  Nicotiana tabacum (Common tobacco)
      92        355  Pyrococcus furiosus
      93        353  Sulfolobus solfataricus
      94        353  Thermoanaerobacter tengcongensis
      95        348  Streptococcus pyogenes
      96        343  Rickettsia conorii
      97        342  Ovis aries (Sheep)
      98        330  Lactobacillus plantarum
      99        330  Shewanella oneidensis
     100        321  Aeropyrum pernix
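
   The share of the twenty most represented species quoted at the
   start of this section can be recomputed from the first twenty rows
   of table 3.2. A minimal sketch (variable names are illustrative):

```python
# Frequencies of the first twenty species in table 3.2, in table order.
top_twenty = [
    10691, 7326, 4930, 4835, 3726, 2712, 2591, 2458, 2386, 1967,
    1773, 1772, 1647, 1438, 1406, 1393, 1284, 1210, 1090, 1080,
]

total_entries = 146_720  # number of entries in release 43.0

top_twenty_sequences = sum(top_twenty)
top_twenty_percent = 100 * top_twenty_sequences / total_entries

print(top_twenty_sequences)          # 57715
print(round(top_twenty_percent, 1))  # 39.3
```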


   
   3.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea            8393 (  6%)
    Bacteria          62334 ( 42%)
    Eukaryota         67392 ( 46%)
    Viruses            8601 (  6%)


   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  10691 ( 16%)           (  7%)
     Other Mammalia         18197 ( 27%)           ( 12%)
     Other Vertebrata        6287 (  9%)           (  4%)
     Viridiplantae          10743 ( 16%)           (  7%)
     Fungi                   9849 ( 15%)           (  7%)
     Insecta                 3685 (  5%)           (  3%)
     Nematoda                2692 (  4%)           (  2%)
     Other                   5248 (  8%)           (  4%)
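
   The percentages in 3.3 are the sequence counts divided by the
   relevant total and rounded to the nearest integer. A minimal sketch
   of that arithmetic (names are illustrative):

```python
# Sequence counts transcribed from section 3.3.
kingdoms = {"Archaea": 8393, "Bacteria": 62334, "Eukaryota": 67392, "Viruses": 8601}
eukaryota = {
    "Human": 10691, "Other Mammalia": 18197, "Other Vertebrata": 6287,
    "Viridiplantae": 10743, "Fungi": 9849, "Insecta": 3685,
    "Nematoda": 2692, "Other": 5248,
}

database_total = sum(kingdoms.values())    # all entries
eukaryota_total = sum(eukaryota.values())  # should equal kingdoms["Eukaryota"]

def percent(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)

for name, count in kingdoms.items():
    print(f"{name:<18}{count:>6} ({percent(count, database_total):>3}%)")

for name, count in eukaryota.items():
    print(f"{name:<18}{count:>6} "
          f"({percent(count, eukaryota_total):>3}%) "
          f"({percent(count, database_total):>3}%)")
```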


4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    2672             1001-1100     1294
                 51- 100    9908             1101-1200      921
                101- 150   14435             1201-1300      686
                151- 200   13501             1301-1400      496
                201- 250   14264             1401-1500      388
                251- 300   12453             1501-1600      250
                301- 350   12935             1601-1700      185
                351- 400   11893             1701-1800      135
                401- 450    9066             1801-1900      150
                451- 500    7801             1901-2000      120
                501- 550    5961             2001-2100       70
                551- 600    3965             2101-2200      108
                601- 650    3391             2201-2300      100
                651- 700    2385             2301-2400       59
                701- 750    2073             2401-2500       62
                751- 800    1741             >2500          386
                801- 850    1359
                851- 900    1418
                901- 950    1006
                951-1000     862


   The average sequence length in Swiss-Prot is 368 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is   SNE1_HUMAN (Q8NF91):  8797 amino acids.
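
   The average length is the total number of amino acids divided by
   the number of entries. A minimal sketch; that the published value
   is the integer part of the quotient is our reading of the figures,
   not something stated in the release notes:

```python
total_amino_acids = 54_093_154  # from the introduction
total_entries = 146_720         # from the introduction

mean_length = total_amino_acids / total_entries

print(round(mean_length, 2))               # 368.68
print(total_amino_acids // total_entries)  # 368 (integer part)
```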


5.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of Swiss-Prot: 1437


   5.1 Table of the frequency of journal citations

        Journals cited 1x:  529
                       2x:  181
                       3x:  103
                       4x:   66
                       5x:   56
                       6x:   35
                       7x:   32
                       8x:   26
                       9x:   24
                      10x:   15
                  11- 20x:  110
                  21- 50x:  110
                  51-100x:   46
                    >100x:  104


   5.2  List of the most cited journals in Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        10139   Journal of Biological Chemistry
    2         5380   Proceedings of the National Academy of Sciences of the U.S.A.
    3         3835   Journal of Bacteriology
    4         3693   Nucleic Acids Research
    5         3568   Gene
    6         2843   FEBS Letters
    7         2789   Biochemical and Biophysical Research Communications
    8         2573   Biochemistry
    9         2543   European Journal of Biochemistry
   10         2361   The EMBO Journal
   11         2233   Nature
   12         2157   Biochimica et Biophysica Acta
   13         1949   Journal of Molecular Biology
   14         1886   Genomics
   15         1728   Cell
   16         1700   Molecular and Cellular Biology
   17         1353   Biochemical Journal
   18         1293   Science
   19         1153   Plant Molecular Biology
   20         1147   Molecular Microbiology
   21         1141   Molecular and General Genetics
   22          887   Journal of Biochemistry
   23          868   Virology
   24          834   Human Molecular Genetics
   25          788   Journal of Cell Biology
   26          745   Nature Genetics
   27          682   Genes and Development
   28          657   Journal of Virology
   29          641   The American Journal of Human Genetics
   30          639   Plant Physiology
   31          626   Human Mutation
   32          621   Oncogene
   33          568   Infection and Immunity
   34          566   Journal of Immunology
   35          551   Yeast
   36          519   Journal of General Virology
   37          517   Structure
   38          505   Archives of Biochemistry and Biophysics
   39          488   Microbiology
   40          475   FEMS Microbiology Letters
   41          475   Development
   42          436   Nature Structural Biology
   43          423   Genetics
   44          416   Human Genetics
   45          399   Current Genetics
   46          383   Blood
   47          367   Molecular and Biochemical Parasitology
   48          345   Applied and Environmental Microbiology
   49          336   Journal of Clinical Investigation
   50          318   Mammalian Genome
   51          316   Molecular Endocrinology
   52          314   Developmental Biology
   53          310   Protein Science
   54          297   Immunogenetics
   55          297   DNA and Cell Biology
   56          293   Cancer Research
   57          291   Journal of Molecular Evolution
   58          279   Neuron
   59          274   The Journal of Experimental Medicine
   60          274   Molecular Biology of the Cell
   61          271   Biological Chemistry Hoppe-Seyler
   62          269   Mechanisms of Development
   63          265   Acta Crystallographica, Section D
   64          265   The Plant Cell
   65          250   Endocrinology
   66          246   Journal of Cell Science
   67          239   DNA Sequence
   68          234   The Plant Journal
   69          232   Journal of General Microbiology
   70          223   Journal of Neuroscience
   71          222   Molecular Biology and Evolution
   72          213   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   73          212   Journal of Neurochemistry
   74          208   Brain Research. Molecular Brain Research
   75          206   The Journal of Clinical Endocrinology and Metabolism
   76          194   Cytogenetics and Cell Genetics
   77          182   Toxicon
   78          177   Comparative Biochemistry and Physiology
   79          175   American Journal of Physiology
   80          174   Bioscience, Biotechnology, and Biochemistry
   81          167   Molecular Cell
   82          163   Molecular Pharmacology
   83          160   Antimicrobial Agents and Chemotherapy
   84          158   Current Biology
   85          156   DNA
   86          144   Journal of Investigative Dermatology
   87          142   Tissue Antigens
   88          141   DNA Research
   89          140   Proteins
   90          136   Molecular Plant-Microbe Interactions
   91          136   Biochimie
   92          132   Peptides
   93          132   Virus Research
   94          132   Journal of Medical Genetics
   95          129   Bioorganicheskaia Khimiia
   96          125   American Journal of Medical Genetics
   97          124   Genome Research
   98          120   Hemoglobin
   99          117   Molecular and Cellular Endocrinology
  100          114   Agricultural and Biological Chemistry
  101          108   Biology of Reproduction
  102          107   Plant and Cell Physiology
  103          105   European Journal of Immunology
  104          102   Archives of Microbiology


6.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                     283423              1.93
   Journal                          250489    136613    1.71
   Submitted to EMBL/GenBank/DDBJ    30227     25422    0.21
   Unpublished observations            536       532   <0.01
   Submitted to Swiss-Prot             527       525   <0.01
   Plant Gene Register                 487       476   <0.01
   Book citation                       465       453   <0.01
   Thesis                              263       261   <0.01
   Submitted to other databases        203       202   <0.01
   Unpublished results                 127       125   <0.01
   Patent                               97        96   <0.01
   Worm Breeder's Gazette                2         2   <0.01

Comments (CC)                       512278              3.49
   SIMILARITY                       147106    127783    1.00
   FUNCTION                          93944     92443    0.64
   SUBCELLULAR LOCATION              69668     69668    0.47
   CATALYTIC ACTIVITY                51230     48248    0.35
   SUBUNIT                           44297     44297    0.30
   PATHWAY                           24285     23209    0.17
   COFACTOR                          17058     17058    0.12
   TISSUE SPECIFICITY                16130     16130    0.11
   PTM                                9012      8155    0.06
   MISCELLANEOUS                      8738      8050    0.06
   ALTERNATIVE PRODUCTS               5272      5272    0.04
   DOMAIN                             4975      4457    0.03
   CAUTION                            4387      4105    0.03
   INDUCTION                          4067      4067    0.03
   DEVELOPMENTAL STAGE                3868      3868    0.03
   DISEASE                            2514      1876    0.02
   ENZYME REGULATION                  2012      2012    0.01
   DATABASE                           1294      1217    0.01
   MASS SPECTROMETRY                  1213      1078    0.01
   POLYMORPHISM                        454       444   <0.01
   ALLERGEN                            335       335   <0.01
   RNA EDITING                         295       295   <0.01
   BIOTECHNOLOGY                        77        77   <0.01
   PHARMACEUTICAL                       47        47   <0.01

Features (FT)                       831689              5.67
   DOMAIN                           116075     35669    0.79
   TRANSMEM                          91827     20000    0.63
   TURN                              62474      4662    0.43
   STRAND                            57252      4163    0.39
   CONFLICT                          55195     19373    0.38
   METAL                             54792     13543    0.37
   CARBOHYD                          50364     12429    0.34
   DISULFID                          46118     12311    0.31
   HELIX                             45117      4520    0.31
   REPEAT                            31810      4634    0.22
   ACT_SITE                          31418     19008    0.21
   VARIANT                           27420      5089    0.19
   CHAIN                             26007     21096    0.18
   NP_BIND                           18909     13304    0.13
   SIGNAL                            16306     16304    0.11
   MOD_RES                           14982      8452    0.10
   NON_TER                           10597      8092    0.07
   SITE                              10154      6238    0.07
   VARSPLIC                           9968      4490    0.07
   BINDING                            9770      7652    0.07
   ZN_FING                            9215      3288    0.06
   MUTAGEN                            6587      1880    0.04
   INIT_MET                           6135      6090    0.04
   PROPEP                             5196      4409    0.04
   DNA_BIND                           4618      4327    0.03
   LIPID                              4175      2790    0.03
   PEPTIDE                            2806      1101    0.02
   TRANSIT                            2791      2766    0.02
   CA_BIND                            1840       792    0.01
   NON_CONS                            835       428    0.01
   CROSSLNK                            458       360   <0.01
   UNSURE                              315       131   <0.01
   SE_CYS                              163       108   <0.01

Cross-references (DR)              1336204              9.11
   EMBL                             284537    140087    1.94
   InterPro                         264209    129846    1.80
   Pfam                             168429    124579    1.15
   PROSITE                          128678     81216    0.88
   PIR                               88842     81354    0.61
   PRINTS                            47994     42345    0.33
   GO                                47413     14722    0.32
   SMART                             42918     32422    0.29
   HAMAP                             40549     40436    0.28
   TIGRFAMs                          40318     37407    0.27
   HSSP                              38738     38738    0.26
   ProDom                            36531     35107    0.25
   PDB                               22244      6010    0.15
   TIGR                              14632     14556    0.10
   Genew                              9613      9565    0.07
   MIM                                9433      7904    0.06
   MGD                                6973      6952    0.05
   SGD                                4973      4919    0.03
   GermOnline                         4927      4876    0.03
   EcoGene                            4227      4225    0.03
   MEROPS                             3454      3339    0.02
   WormPep                            2730      2439    0.02
   SubtiList                          2667      2666    0.02
   TRANSFAC                           2648      2373    0.02
   FlyBase                            2520      2446    0.02
   GeneDB_SPombe                      2399      2369    0.02
   RGD                                2297      2297    0.02
   TubercuList                        1421      1385    0.01
   StyGene                            1362      1359    0.01
   PIRSF                              1168      1168    0.01
   SWISS-2DPAGE                       1075      1075    0.01
   ListiList                           921       860    0.01
   Leproma                             609       605   <0.01
   GK                                  594       594   <0.01
   Gramene                             556       552   <0.01
   MaizeDB                             411       406   <0.01
   HIV                                 370       354   <0.01
   REBASE                              361       356   <0.01
   ECO2DBASE                           351       299   <0.01
   DictyBase                           321       319   <0.01
   ZFIN                                260       260   <0.01
   GlycoSuiteDB                        259       259   <0.01
   PHCI-2DPAGE                         214       214   <0.01
   SagaList                            205       204   <0.01
   PhotoList                           175       175   <0.01
   MypuList                            159       159   <0.01
   Aarhus/Ghent-2DPAGE                 128        98   <0.01
   Siena-2DPAGE                        103       103   <0.01
   HSC-2DPAGE                           85        85   <0.01
   PhosSite                             53        53   <0.01
   COMPLUYEAST-2DPAGE                   50        50   <0.01
   PMMA-2DPAGE                          48        48   <0.01
   Maize-2DPAGE                         39        39   <0.01
   ANU-2DPAGE                           13        13   <0.01
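
The "Average per entry" column above is the total number of lines divided by the total number of entries in the release (146720), not by the number of entries carrying at least one such line. A minimal sketch of the calculation; the function name and the rule for printing "<0.01" are assumptions inferred from the table:

```python
TOTAL_ENTRIES = 146_720  # entries in release 43.0

def average_per_entry(total_lines: int, entries: int = TOTAL_ENTRIES) -> str:
    """Format the average number of lines per entry as in the table."""
    formatted = f"{total_lines / entries:.2f}"
    # Assumption: values that would print as 0.00 are shown as "<0.01".
    return "<0.01" if formatted == "0.00" else formatted

print(average_per_entry(283_423))    # References (RL): 1.93
print(average_per_entry(1_336_204))  # Cross-references (DR): 9.11
print(average_per_entry(609))        # Leproma: <0.01
```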


7.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in Swiss-Prot: 180569

Total number of entries encoded on a chloroplast: 3494
Total number of entries encoded on a mitochondrion: 2886
Total number of entries encoded on a cyanelle: 145
Total number of entries encoded on a plasmid: 2736

Number of fragments: 8221
Number of additional sequences encoded on splice variants: 7776

We need your help

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available. To facilitate this feedback we offer, on the ExPASy WWW server, a form that allows the submission of updates and/or corrections to Swiss-Prot:

It is also possible, from any entry in Swiss-Prot displayed by the ExPASy server, to submit updates and/or corrections for that particular entry. Finally, you can also send your comments by electronic mail to the address:

Note that all update requests are assigned a unique identifier of the form UR-Xnnnn (example: UR-A0123). This identifier is used internally by the Swiss-Prot staff at SIB and EBI to track requests and is also used in e-mail exchanges with the persons who have submitted a request.
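
For users who keep track of their requests, such an identifier can be recognised with a simple pattern. The sketch below is not an official Swiss-Prot tool; it assumes that "X" stands for a single uppercase letter and "nnnn" for exactly four digits, as the example UR-A0123 suggests. The function name is hypothetical.

```python
import re

# Assumption: "UR-", one uppercase ASCII letter, then exactly four digits.
UPDATE_REQUEST_ID = re.compile(r"UR-[A-Z][0-9]{4}")

def is_update_request_id(text: str) -> bool:
    """Return True if text has the form UR-Xnnnn."""
    return UPDATE_REQUEST_ID.fullmatch(text) is not None

print(is_update_request_id("UR-A0123"))  # True
print(is_update_request_id("UR-0123"))   # False
```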