Protein

SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1-related

Gene

HMG20B

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Required for correct progression through G2 phase of the cell cycle and entry into mitosis. Required for RCOR1/CoREST mediated repression of neuronal specific gene promoters.

Regions

Feature key    Position(s)    Length    Description
DNA binding    70 – 138       69        HMG box (PROSITE-ProRule annotation)
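The Length column in the feature tables of this entry follows from the inclusive, 1-based positions: length = end - start + 1. A minimal Python sketch of that arithmetic, using coordinates quoted in this entry (the `FEATURES` dictionary and the `feature_length` helper are illustrative names, not part of any UniProt API):

```python
# Feature coordinates as listed in this entry (1-based, inclusive).
FEATURES = {
    "DNA binding (HMG box)": (70, 138),
    "Coiled coil": (190, 257),
    "Chain": (1, 317),
    "Missing in isoform 2": (1, 102),
    "Missing in isoform 3": (83, 106),
}


def feature_length(start: int, end: int) -> int:
    """Number of residues in an inclusive, 1-based range."""
    return end - start + 1


for name, (start, end) in FEATURES.items():
    print(f"{name}: {start}-{end} -> {feature_length(start, end)} residues")
```

The computed values can be compared with the Length column of each feature table in the entry.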

GO - Molecular function

  1. chromatin binding Source: GO_Central
  2. DNA binding Source: UniProtKB-KW

GO - Biological process

  1. blood coagulation Source: Reactome
  2. cell cycle Source: UniProtKB-KW
  3. chromatin organization Source: Reactome
  4. chromatin remodeling Source: GO_Central
  5. negative regulation of protein sumoylation Source: Ensembl
  6. positive regulation of neuron differentiation Source: Ensembl
  7. regulation of transcription, DNA-templated Source: GO_Central
  8. skeletal muscle cell differentiation Source: Ensembl
  9. transcription, DNA-templated Source: UniProtKB-KW

Keywords - Molecular function

Chromatin regulator

Keywords - Biological process

Cell cycle, Transcription, Transcription regulation

Keywords - Ligand

DNA-binding

Enzyme and pathway databases

Reactome: REACT_24970. Factors involved in megakaryocyte development and platelet production.
Reactome: REACT_263923. HDACs deacetylate histones.

Names & Taxonomy

Protein names
Recommended name: SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1-related
Short name: SMARCE1-related protein
Alternative name(s):
BRCA2-associated factor 35
HMG box-containing protein 20B
HMG domain-containing protein 2
HMG domain-containing protein HMGX2
Sox-like transcriptional factor
Structural DNA-binding protein BRAF35
Gene names
Name: HMG20B
Synonyms: BRAF35, HMGX2, HMGXB2, SMARCE1R
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes: UP000005640. Component: Chromosome 19

Organism-specific databases

HGNC: HGNC:5002. HMG20B.

Subcellular location

Nucleus. Chromosome
Note: Localized to condensed chromosomes in mitosis in conjunction with BRCA2.

GO - Cellular component

  1. chromosome Source: UniProtKB-SubCell
  2. nucleoplasm Source: HPA

Keywords - Cellular component

Chromosome, Nucleus

Pathology & Biotech

Mutagenesis

Feature key    Position(s)    Length    Description
Mutagenesis    116 – 116      1         K → I: Loss of DNA binding activity of the BHC histone deacetylase complex. (1 publication)

Organism-specific databases

PharmGKB: PA29332.

PTM / Processing

Molecule processing

Feature key    Position(s)    Length    Description    Feature identifier
Chain    1 – 317    317    SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1-related    PRO_0000048575

Proteomic databases

MaxQB: Q9P0W2.
PaxDb: Q9P0W2.
PRIDE: Q9P0W2.

PTM databases

PhosphoSite: Q9P0W2.

Expression

Tissue specificity

Ubiquitously expressed in adult tissues. (2 publications)

Gene expression databases

Bgee: Q9P0W2.
CleanEx: HS_HMG20B.
ExpressionAtlas: Q9P0W2. baseline and differential.
Genevestigator: Q9P0W2.

Interaction

Subunit structure

Component of a BHC histone deacetylase complex that contains HDAC1, HDAC2, HMG20B/BRAF35, KDM1A, RCOR1/CoREST and PHF21A/BHC80. The BHC complex may also contain ZMYM2, ZNF217, ZMYM3, GSE1 and GTF2I. Interacts with the BRCA2 tumor suppressor protein. (2 publications)

Binary interactions

With     Entry     #Exp.    IntAct
BRCA2    P51587    8        EBI-713401, EBI-79792

Protein-protein interaction databases

BioGrid: 115642. 32 interactions.
IntAct: Q9P0W2. 10 interactions.
MINT: MINT-1372120.
STRING: 9606.ENSP00000328269.

Structure

3D structure databases

ProteinModelPortal: Q9P0W2.
SMR: Q9P0W2. Positions 33-148.

Family & Domains

Coiled coil

Feature key    Position(s)    Length    Description
Coiled coil    190 – 257      68        (Sequence Analysis)

Sequence similarities

Contains 1 HMG box DNA-binding domain. (PROSITE-ProRule annotation)

Keywords - Domain

Coiled coil

Phylogenomic databases

eggNOG: COG5648.
GeneTree: ENSGT00730000110938.
HOVERGEN: HBG059870.
InParanoid: Q9P0W2.
OMA: AQEERQT.
PhylomeDB: Q9P0W2.
TreeFam: TF106440.

Family and domain databases

Gene3D: 1.10.30.10. 1 hit.
InterPro: IPR009071. HMG_box_dom.
Pfam: PF00505. HMG_box. 1 hit.
SMART: SM00398. HMG. 1 hit.
SUPFAM: SSF47095. SSF47095. 1 hit.
PROSITE: PS50118. HMG_BOX_2. 1 hit.

Sequences (3)

Sequence status: Complete.

This entry describes 3 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q9P0W2-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MSHGPKQPGA AAAPAGGKAP GQHGGFVVTV KQERGEGPRA GEKGSHEEEP
        60         70         80         90        100
VKKRGWPKGK KRKKILPNGP KAPVTGYVRF LNERREQIRT RHPDLPFPEI
       110        120        130        140        150
TKMLGAEWSK LQPTEKQRYL DEAEREKQQY MKELRAYQQS EAYKMCTEKI
       160        170        180        190        200
QEKKIKKEDS SSGLMNTLLN GHKGGDCDGF STFDVPIFTE EFLDQNKARE
       210        220        230        240        250
AELRRLRKMN VAFEEQNAVL QRHTQSMSSA RERLEQELAL EERRTLALQQ
       260        270        280        290        300
QLQAVRQALT ASFASLPVPG TGETPTLGTL DFYMARLHGA IERDPAQHEK
       310
LIVRIKEILA QVASEHL

Length: 317
Mass (Da): 35,813
Last modified: September 30, 2000 - v1
Checksum: ADFCF71C47F8CD2D

Isoform 2 (identifier: Q9P0W2-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-102: Missing.

Length: 215
Mass (Da): 24,713
Checksum: F4730ECC889B01B6

Isoform 3 (identifier: Q9P0W2-3)

The sequence of this isoform differs from the canonical sequence as follows:
     83-106: Missing.

Length: 293
Mass (Da): 32,939
Checksum: FD6967D1DDDD4463
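The isoform descriptions above define isoforms 2 and 3 as deletions from the canonical sequence. The following Python sketch derives both by removing the stated 1-based, inclusive ranges and prints the resulting lengths for comparison with the values listed in the entry. The helper name `remove_range` is illustrative only, and the sketch assumes "Missing" means a plain deletion with no replacement residues.

```python
# Canonical sequence (isoform 1) as printed in this entry; whitespace is stripped.
CANONICAL = "".join("""
MSHGPKQPGA AAAPAGGKAP GQHGGFVVTV KQERGEGPRA GEKGSHEEEP
VKKRGWPKGK KRKKILPNGP KAPVTGYVRF LNERREQIRT RHPDLPFPEI
TKMLGAEWSK LQPTEKQRYL DEAEREKQQY MKELRAYQQS EAYKMCTEKI
QEKKIKKEDS SSGLMNTLLN GHKGGDCDGF STFDVPIFTE EFLDQNKARE
AELRRLRKMN VAFEEQNAVL QRHTQSMSSA RERLEQELAL EERRTLALQQ
QLQAVRQALT ASFASLPVPG TGETPTLGTL DFYMARLHGA IERDPAQHEK
LIVRIKEILA QVASEHL
""".split())


def remove_range(seq: str, start: int, end: int) -> str:
    """Delete residues start..end (1-based, inclusive) from seq."""
    return seq[:start - 1] + seq[end:]


isoform_2 = remove_range(CANONICAL, 1, 102)   # 1-102: Missing
isoform_3 = remove_range(CANONICAL, 83, 106)  # 83-106: Missing

print(len(CANONICAL), len(isoform_2), len(isoform_3))
print(isoform_2[:10])
```

Under these assumptions the derived lengths should agree with the 317, 215 and 293 residues reported for isoforms 1, 2 and 3.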

Sequence caution

The sequence AAC26860.1 differs from that shown. Reason: Frameshift at position 223. (Curated)
The sequence AAC26860.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
The sequence AAH21585.1 differs from that shown. Reason: Erroneous initiation. (Curated)

Experimental Info

Feature key          Position(s)    Length    Description
Sequence conflict    11 – 12        2         AA → SS in AAG01174 (PubMed:11997092). Curated
Sequence conflict    174 – 174      1         G → D in CAG33035 (Ref. 10). Curated
Sequence conflict    223 – 223      1         H → Q in AAC26860 (Ref. 9). Curated
Sequence conflict    270 – 270      1         G → D in BAC03510 (PubMed:14702039). Curated
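Each sequence conflict names the canonical residue(s) at a given position, so the table can be cross-checked against the canonical sequence printed in this entry. The sketch below does that in Python for the four conflicts and, as an extra check, for the K → I mutagenesis site at position 116. The `CHECKS` list and the `residues` helper are illustrative names, not part of any UniProt tooling.

```python
# Canonical sequence (isoform 1) as printed in this entry; whitespace is stripped.
CANONICAL = "".join("""
MSHGPKQPGA AAAPAGGKAP GQHGGFVVTV KQERGEGPRA GEKGSHEEEP
VKKRGWPKGK KRKKILPNGP KAPVTGYVRF LNERREQIRT RHPDLPFPEI
TKMLGAEWSK LQPTEKQRYL DEAEREKQQY MKELRAYQQS EAYKMCTEKI
QEKKIKKEDS SSGLMNTLLN GHKGGDCDGF STFDVPIFTE EFLDQNKARE
AELRRLRKMN VAFEEQNAVL QRHTQSMSSA RERLEQELAL EERRTLALQQ
QLQAVRQALT ASFASLPVPG TGETPTLGTL DFYMARLHGA IERDPAQHEK
LIVRIKEILA QVASEHL
""".split())

# (start, end, expected canonical residues, variant, source) from the entry.
CHECKS = [
    (11, 12, "AA", "SS", "conflict in AAG01174"),
    (174, 174, "G", "D", "conflict in CAG33035"),
    (223, 223, "H", "Q", "conflict in AAC26860"),
    (270, 270, "G", "D", "conflict in BAC03510"),
    (116, 116, "K", "I", "mutagenesis"),
]


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


results = []
for start, end, expected, variant, source in CHECKS:
    observed = residues(CANONICAL, start, end)
    results.append(observed == expected)
    print(f"{start}-{end}: canonical {observed}, {expected} -> {variant} ({source})")
```

A mismatch in `results` would point to a transcription error in either the sequence block or the feature table.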

Alternative sequence

Feature key             Position(s)    Length    Description                              Feature identifier
Alternative sequence    1 – 102        102       Missing in isoform 2. (1 publication)    VSP_037131
Alternative sequence    83 – 106       24        Missing in isoform 3. (1 publication)    VSP_037132

Sequence databases

EMBL / GenBank / DDBJ:
AF146223 mRNA. Translation: AAF66707.1.
AL355709 mRNA. Translation: CAB90809.2.
AF331191 mRNA. Translation: AAG60060.1.
AF288679 mRNA. Translation: AAG01174.1.
AF072165 mRNA. Translation: AAF76253.1.
AK090733 mRNA. Translation: BAC03510.1.
AC005786 Genomic DNA. Translation: AAC62837.1.
CH471139 Genomic DNA. Translation: EAW69306.1.
CH471139 Genomic DNA. Translation: EAW69307.1.
CH471139 Genomic DNA. Translation: EAW69308.1.
BC002552 mRNA. Translation: AAH02552.1.
BC003505 mRNA. Translation: AAH03505.2.
BC004408 mRNA. Translation: AAH04408.2.
BC021585 mRNA. Translation: AAH21585.1. Different initiation.
AF072836 mRNA. Translation: AAC26860.1. Sequence problems.
CR456754 mRNA. Translation: CAG33035.1.
CCDS: CCDS45919.1. [Q9P0W2-1]
RefSeq: NP_006330.2. NM_006339.2. [Q9P0W2-1]
UniGene: Hs.406534.

Genome annotation databases

Ensembl: ENST00000333651; ENSP00000328269; ENSG00000064961. [Q9P0W2-1]
GeneID: 10362.
KEGG: hsa:10362.
UCSC: uc002lya.3. human. [Q9P0W2-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Protocols and materials databases

DNASU: 10362.

Organism-specific databases

CTD: 10362.
GeneCards: GC19P003572.
HGNC: HGNC:5002. HMG20B.
MIM: 605535. gene.
neXtProt: NX_Q9P0W2.
PharmGKB: PA29332.

Miscellaneous databases

ChiTaRS: HMG20B. human.
GeneWiki: HMG20B.
GenomeRNAi: 10362.
NextBio: 39281.
PRO: Q9P0W2.

Publications

  1. "HMG20A and HMG20B map to human chromosomes 15q24 and 19p13.3 and constitute a distinct class of HMG-box genes with ubiquitous expression."
    Sumoy L., Carim-Todd L., Escarceller M., Nadal M., Gratacos M., Pujana M.A., Estivill X., Peral B.
    Cytogenet. Cell Genet. 88:62-67(2000)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), TISSUE SPECIFICITY.
  2. "A human BRCA2 complex containing a structural DNA binding component influences cell cycle progression."
    Marmorstein L.Y., Kinev A.V., Chan G.K.T., Bochar D.A., Beniya H., Epstein J.A., Yen T.J., Shiekhattar R.
    Cell 104:247-257(2001)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), INTERACTION WITH BRCA2, SUBCELLULAR LOCATION.
  3. "Characterization of human SMARCE1r high-mobility-group protein."
    Lee Y.M., Shin H., Choi W., Ahn S., Kim W.
    Biochim. Biophys. Acta 1574:269-276(2002)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), SUBCELLULAR LOCATION, TISSUE SPECIFICITY.
  4. "Cloning a cDNA encoding an alternatively spliced protein of BRCA2-associated factor 35."
    Wang C., McCarty I.M., Balazs L., Li Y., Steiner M.S.
    Biochem. Biophys. Res. Commun. 295:129-135(2002)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 2).
  5. "Complete sequencing and characterization of 21,243 full-length human cDNAs."
    Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R., Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H., Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.
    , Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S., Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S., Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O., Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H., Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B., Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y., Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T., Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y., Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S., Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T., Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M., Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T., Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K., Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R., Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.
    Nat. Genet. 36:40-45(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 3).
    Tissue: Cerebellum.
  6. "The DNA sequence and biology of human chromosome 19."
    Grimwood J., Gordon L.A., Olsen A.S., Terry A., Schmutz J., Lamerdin J.E., Hellsten U., Goodstein D., Couronne O., Tran-Gyamfi M., Aerts A., Altherr M., Ashworth L., Bajorek E., Black S., Branscomb E., Caenepeel S., Carrano A.V.
    , Caoile C., Chan Y.M., Christensen M., Cleland C.A., Copeland A., Dalin E., Dehal P., Denys M., Detter J.C., Escobar J., Flowers D., Fotopulos D., Garcia C., Georgescu A.M., Glavina T., Gomez M., Gonzales E., Groza M., Hammon N., Hawkins T., Haydu L., Ho I., Huang W., Israni S., Jett J., Kadner K., Kimball H., Kobayashi A., Larionov V., Leem S.-H., Lopez F., Lou Y., Lowry S., Malfatti S., Martinez D., McCready P.M., Medina C., Morgan J., Nelson K., Nolan M., Ovcharenko I., Pitluck S., Pollard M., Popkie A.P., Predki P., Quan G., Ramirez L., Rash S., Retterer J., Rodriguez A., Rogers S., Salamov A., Salazar A., She X., Smith D., Slezak T., Solovyev V., Thayer N., Tice H., Tsai M., Ustaszewska A., Vo N., Wagner M., Wheeler J., Wu K., Xie G., Yang J., Dubchak I., Furey T.S., DeJong P., Dickson M., Gordon D., Eichler E.E., Pennacchio L.A., Richardson P., Stubbs L., Rokhsar D.S., Myers R.M., Rubin E.M., Lucas S.M.
    Nature 428:529-535(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  7. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  8. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Pancreas, Placenta, Skin and Uterus.
  9. Suzuki H., Schullery D., Shnyreva M.G., Ostrowski J., Denisenko O., Mochizuki S., Bomsztyk K.
    Submitted (MAY-1998) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 97-317 (ISOFORM 1).
    Tissue: Heart.
  10. "Cloning of human full open reading frames in Gateway(TM) system entry vector (pDONR201)."
    Ebert L., Schick M., Neubert P., Schatten R., Henze S., Korn B.
    Submitted (MAY-2004) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 103-317 (ISOFORM 1).
  11. "A core-BRAF35 complex containing histone deacetylase mediates repression of neuronal-specific genes."
    Hakimi M.-A., Bochar D.A., Chenoweth J., Lane W.S., Mandel G., Shiekhattar R.
    Proc. Natl. Acad. Sci. U.S.A. 99:7420-7425(2002)
    Cited for: IDENTIFICATION AS A COMPONENT OF A HISTONE DEACETYLASE COMPLEX, MUTAGENESIS OF LYS-116.
  12. "A candidate X-linked mental retardation gene is a component of a new family of histone deacetylase-containing complexes."
    Hakimi M.-A., Dong Y., Lane W.S., Speicher D.W., Shiekhattar R.
    J. Biol. Chem. 278:7234-7239(2003)
    Cited for: IDENTIFICATION IN THE BHC COMPLEX WITH HDAC1; HDAC2; RCOR1; KDM1A; PHF21A; ZMYM2; ZNF217; ZMYM3; KIAA0182 AND GTF2I.

Entry information

Entry name: HM20B_HUMAN
Accession: Primary (citable) accession number: Q9P0W2
Secondary accession number(s): A6NMS5, D6W616, Q6IBP8, Q8NBD5, Q9HD21, Q9Y491, Q9Y4A2
Entry history
Integrated into UniProtKB/Swiss-Prot: January 3, 2005
Last sequence update: September 30, 2000
Last modified: March 31, 2015
This is version 123 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 19
    Human chromosome 19: entries, gene names and cross-references to MIM
  2. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
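The UniRef90 and UniRef50 descriptions above reduce to two numeric thresholds per level: a minimum sequence identity and a minimum overlap with the longest (seed) sequence. The Python sketch below only encodes that membership test. It does not reproduce the real clustering procedure, and it assumes identity and overlap have already been computed elsewhere as fractions between 0 and 1. `THRESHOLDS` and `meets_threshold` are hypothetical names.

```python
# (minimum identity, minimum overlap with the seed sequence), as quoted above.
THRESHOLDS = {
    "UniRef90": (0.90, 0.80),
    "UniRef50": (0.50, 0.80),
}


def meets_threshold(level: str, identity: float, overlap: float) -> bool:
    """True if a sequence satisfies both cut-offs for the given UniRef level."""
    min_identity, min_overlap = THRESHOLDS[level]
    return identity >= min_identity and overlap >= min_overlap


print(meets_threshold("UniRef90", identity=0.93, overlap=0.85))
print(meets_threshold("UniRef90", identity=0.93, overlap=0.70))
print(meets_threshold("UniRef50", identity=0.55, overlap=0.85))
```

Both conditions must hold at once: a sequence with high identity but too little overlap with the seed fails the test.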