Status: Reference proteome
Proteins: 74,863. Number of protein entries associated with this proteome: UniProtKB entries for regular proteomes or UniParc entries for redundant proteomes.
Gene count: -. This is the total number of unique genes found in the proteome set, algorithmically computed. For each gene, a single representative protein sequence is chosen from the proteome. Where possible, reviewed (Swiss-Prot) protein sequences are chosen as the representatives.
Proteome ID: UP000008827. The proteome identifier (UPID) is the unique identifier assigned to the set of proteins that constitute the proteome. It consists of the characters 'UP' followed by 9 digits, is stable across releases and can therefore be used to cite a UniProt proteome.
Taxonomy: 3847 - Glycine max
Strain: cv. Williams 82
Last modified: February 26, 2021
Genome assembly and annotation: GCA_000004515.4 from EnsemblPlants full. Identifier for the genome assembly.
BUSCO: C:99.2%[S:25.6%,D:73.6%],F:0.2%,M:0.6%,n:5366 fabales_odb10. The Benchmarking Universal Single-Copy Ortholog (BUSCO) assessment tool is used, for eukaryotic and bacterial proteomes, to provide quantitative measures of UniProt proteome data completeness in terms of expected gene content. BUSCO scores include percentages of complete (C) genes, split into single-copy (S) and duplicated (D), fragmented (F) and missing (M) genes, as well as the total number of orthologous clusters (n) used in the BUSCO assessment.
Completeness: Outlier (high value). Complete Proteome Detector (CPD) is an algorithm that statistically evaluates the completeness and quality of proteomes in UniProt by comparing each proteome with the sizes of taxonomically close proteomes. Possible values are 'Standard', 'Close to Standard' and 'Outlier'.
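
The identifier format and the BUSCO summary described above are both regular enough to check mechanically. The following is a minimal sketch in Python, assuming the summary string always has the layout shown on this page; the function names `is_valid_upid` and `parse_busco` are illustrative and not part of any UniProt or BUSCO tool.

```python
import re

# 'UP' followed by exactly 9 digits, as described for the proteome identifier.
UPID_PATTERN = re.compile(r"UP\d{9}")

# Assumed layout: C:..%[S:..%,D:..%],F:..%,M:..%,n:<clusters>
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def is_valid_upid(identifier: str) -> bool:
    """Return True if the string matches the UPID format."""
    return UPID_PATTERN.fullmatch(identifier) is not None


def parse_busco(summary: str) -> dict:
    """Extract the BUSCO percentages and the cluster count from a summary string."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    scores = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


scores = parse_busco("C:99.2%[S:25.6%,D:73.6%],F:0.2%,M:0.6%,n:5366 fabales_odb10")

# Consistency checks: S and D are the two parts of C, and C, F, M cover all clusters.
complete_matches = abs(scores["S"] + scores["D"] - scores["C"]) < 0.05
total_is_100 = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.05
print(is_valid_upid("UP000008827"), complete_matches, total_is_100)
```

For this proteome the duplicated fraction (D) far exceeds the single-copy fraction (S), which is consistent with the ancient polyploidy described in the text that follows.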

Glycine max (soybean) is one of the most important crop plants for seed protein and oil content. As a member of the plant family Leguminosae, soybean also has the capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms.

The species originated in South East Asia, with the main areas of production today being in North America, South America and China. It is the world's most important legume crop and ranks sixth of all cultivated crops in terms of total harvest.

The reference proteome for Glycine max is derived from the genome published in 2010. Glycine max has a haploid chromosome number of 20 and is an ancient polyploid (palaeopolyploid) with over 50% more protein-coding genes than Arabidopsis, and 75% of the genes occurring as multiple copies. About 80% of the predicted genes are found in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. The soybean genome spans about 1 Gb, roughly eight times the size of the Arabidopsis genome, and contains about 64,000 protein-coding genes.

Components (genomic components encoding the proteome)

Component name    Proteins
Chromosome 1      3250
Chromosome 2      4241
Chromosome 3      3527
Chromosome 4      3504
Chromosome 5      3422
Chromosome 6      4336
Chromosome 7      3673
Chromosome 8      5025
Chromosome 9      3820
Chromosome 10     3982
Chromosome 11     3400
Chromosome 12     3213
Chromosome 13     4987
Chromosome 14     3016
Chromosome 15     3654
Chromosome 16     2957
Chromosome 17     3572
Chromosome 18     3862
Chromosome 19     3502
Chromosome 20     3416
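
As a quick sanity check, the per-chromosome protein counts listed above can be summed and compared with the proteome total of 74,863 entries. This is a small sketch in Python; the variable names are illustrative. The listing does not say where the remaining entries sit, so the difference is reported only as a number and not attributed to any particular component.

```python
# Protein counts per chromosome, copied from the component listing.
chromosome_proteins = {
    1: 3250, 2: 4241, 3: 3527, 4: 3504, 5: 3422,
    6: 4336, 7: 3673, 8: 5025, 9: 3820, 10: 3982,
    11: 3400, 12: 3213, 13: 4987, 14: 3016, 15: 3654,
    16: 2957, 17: 3572, 18: 3862, 19: 3502, 20: 3416,
}

# Total number of protein entries reported for the proteome.
TOTAL_PROTEINS = 74863

on_chromosomes = sum(chromosome_proteins.values())
not_listed = TOTAL_PROTEINS - on_chromosomes
largest = max(chromosome_proteins, key=chromosome_proteins.get)

print(f"entries on the 20 chromosomes: {on_chromosomes}")
print(f"entries not covered by the listing: {not_listed}")
print(f"chromosome with the most entries: {largest}")
```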
UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
