Status: Reference proteome

Proteins: 39,328
  Number of protein entries associated with this proteome: UniProtKB entries for regular proteomes or UniParc entries for redundant proteomes.

Gene count: -
  The total number of unique genes found in the proteome set, algorithmically computed. For each gene, a single representative protein sequence is chosen from the proteome. Where possible, reviewed (Swiss-Prot) protein sequences are chosen as the representatives.

Proteome ID: UP000006548
  The proteome identifier (UPID) is the unique identifier assigned to the set of proteins that constitute the proteome. It consists of the characters 'UP' followed by 9 digits, is stable across releases and can therefore be used to cite a UniProt proteome.

Taxonomy: 3702 - Arabidopsis thaliana

Strain: cv. Columbia

Genome assembly and annotation: GCA_000001735.1 from EnsemblPlants full
  Identifier for the genome assembly.

BUSCO: C:100%[S:64.3%,D:35.7%],F:0%,M:0%,n:4596 brassicales_odb10
  The Benchmarking Universal Single-Copy Ortholog (BUSCO) assessment tool is used, for eukaryotic and bacterial proteomes, to provide quantitative measures of UniProt proteome data completeness in terms of expected gene content. BUSCO scores include percentages of complete (C) single-copy (S) genes, complete (C) duplicated (D) genes, fragmented (F) and missing (M) genes, as well as the total number of orthologous clusters (n) used in the BUSCO assessment, and the name of the taxonomic lineage dataset used.

Completeness: Outlier (high value)
  Complete Proteome Detector (CPD) is an algorithm that statistically evaluates the completeness and quality of proteomes in UniProt by looking at the sizes of taxonomically close proteomes. Possible values are 'Standard', 'Close to Standard' and 'Outlier'.
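
The identifier format and the BUSCO summary described above are both regular enough to check mechanically. The following is a minimal Python sketch, not UniProt code: the function names `is_valid_upid` and `parse_busco` are hypothetical, the UPID pattern follows only the rule stated on this page ('UP' followed by 9 digits), and the BUSCO pattern is an assumption based on the single summary string shown here.

```python
import re

# Rule stated on this page: the characters 'UP' followed by 9 digits.
UPID_PATTERN = re.compile(r"UP\d{9}")

# Assumed layout of the BUSCO summary, inferred from the one string on this page.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def is_valid_upid(text: str) -> bool:
    """Return True if text is 'UP' followed by exactly 9 digits."""
    return UPID_PATTERN.fullmatch(text) is not None


def parse_busco(summary: str) -> dict:
    """Split a BUSCO summary into percentages, cluster count and lineage name."""
    summary = summary.strip()
    match = BUSCO_PATTERN.match(summary)
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    # Percentages (C, S, D, F, M) become floats; n is the cluster count.
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(match.group("n"))
    # Whatever follows the scores is treated as the lineage dataset name.
    scores["lineage"] = summary[match.end():].strip()
    return scores


upid_ok = is_valid_upid("UP000006548")
busco = parse_busco("C:100%[S:64.3%,D:35.7%],F:0%,M:0%,n:4596 brassicales_odb10")
```

For this proteome the single-copy and duplicated percentages add up to the complete percentage, which is how the C, S and D fields relate in the description above.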

Arabidopsis thaliana (mouse-ear cress) is a flowering plant in the family Brassicaceae, which contains economically important brassica and mustard species. It was the first plant to have its genome sequenced.

Arabidopsis thaliana is not of economic value itself, but has risen to prominence because of its small size, short generation time and small genome, which make it an ideal plant to use for research.

The Arabidopsis thaliana genome has a haploid chromosome number of 5 and contains 135 Mb, with about 27,000 protein-coding genes encoding around 35,000 proteins. The reference proteome is derived from the genome sequence published in 2000 for the ecotype Columbia (

Components (genomic components encoding the proteome)

Component name                  Proteins
Chromosome 1                    10295
Chromosome 2                    6159
Chromosome 3                    7713
Chromosome 4                    6030
Chromosome 5                    8964
Mitochondrion (cv. C24)         115
Mitochondrion (cv. Columbia)    33
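
As a quick consistency check on the component list, the per-component protein counts can be totalled and compared with the 39,328 proteins reported in the header. This is a small Python sketch that assumes each row reads as a component name followed directly by its protein count (for example, Chromosome 1 with 10295 proteins); the variable names are illustrative only.

```python
# Protein counts per genomic component, read from the component list above
# (assumption: each row is a component name followed by its protein count).
component_proteins = {
    "Chromosome 1": 10295,
    "Chromosome 2": 6159,
    "Chromosome 3": 7713,
    "Chromosome 4": 6030,
    "Chromosome 5": 8964,
    "Mitochondrion (cv. C24)": 115,
    "Mitochondrion (cv. Columbia)": 33,
}

# Total over every listed component.
listed_total = sum(component_proteins.values())

# Total over the five nuclear chromosomes only.
nuclear_total = sum(
    count
    for name, count in component_proteins.items()
    if name.startswith("Chromosome")
)

# Proteome size reported in the header of this page.
reported_total = 39328

# Proteins in the reported total that the listed components do not cover.
unlisted = reported_total - listed_total

for name, count in component_proteins.items():
    print(f"{name:<30}{count:>7}{count / listed_total:>8.1%}")
print(f"{'Listed total':<30}{listed_total:>7}")
print(f"{'Not covered by listed rows':<30}{unlisted:>7}")
```

The listed components sum to slightly less than the reported proteome size. The page does not say where the remaining entries belong, so the sketch only reports the difference.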


  1. "Complete structure of the chloroplast genome of Arabidopsis thaliana."
    Sato S., Nakamura Y., Kaneko T., Asamizu E., Tabata S.
    DNA Res. 6:283-290(1999)
  2. "The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides."
    Unseld M., Marienfeld J.R., Brandt P., Brennicke A.
    Nat. Genet. 15:57-61(1997)
  3. "Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana."
    Mayer K.F.X., Schueller C., Wambutt R., Murphy G., Volckaert G., Pohl T., Duesterhoeft A., Stiekema W., Entian K.-D., Terryn N., Harris B., Ansorge W., Brandt P., Grivell L.A., Rieger M., Weichselgartner M., de Simone V., Obermaier B., [...], McCombie W.R.
    Nature 402:769-777(1999)
  4. "Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana."
    Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S., White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y., Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W., Chung M.K., [...], Davis R.W.
    Nature 408:816-820(2000)
  5. "Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana."
    Salanoubat M., Lemcke K., Rieger M., Ansorge W., Unseld M., Fartmann B., Valle G., Bloecker H., Perez-Alonso M., Obermaier B., Delseny M., Boutry M., Grivell L.A., Mache R., Puigdomenech P., De Simone V., Choisne N., Artiguenave F., [...], Tabata S.
    Nature 408:820-822(2000)
  6. "Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana."
    Tabata S., Kaneko T., Nakamura Y., Kotani H., Kato T., Asamizu E., Miyajima N., Sasamoto S., Kimura T., Hosouchi T., Kawashima K., Kohara M., Matsumoto M., Matsuno A., Muraki A., Nakayama S., Nakazaki N., Naruo K., [...], Fransz P.F.
    Nature 408:823-826(2000)
  7. "Correction of Persistent Errors in Arabidopsis Reference Mitochondrial Genomes."
    Sloan D.B., Wu Z., Sharbrough J.
    Plant Cell 30:525-527(2018)
  8. "Araport11: a complete reannotation of the Arabidopsis thaliana reference genome."
    Cheng C.Y., Krishnakumar V., Chan A.P., Thibaud-Nissen F., Schobel S., Town C.D.
    Plant J. 89:789-804(2017)
UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
