Protein

Methyl-CpG-binding domain protein 1

Gene

Mbd1

Organism
Mus musculus (Mouse)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Transcriptional repressor that binds CpG islands in promoters where the DNA is methylated at position 5 of cytosine within CpG dinucleotides. Binding is abolished by the presence of 7-mG, which is produced when methylmethanesulfonate (MMS) damages DNA. Plays a role in gene silencing by recruiting ATF7IP, which in turn recruits factors such as the histone methyltransferase SETDB1. Probably forms a complex with SETDB1 and ATF7IP that represses transcription and couples DNA methylation to histone 'Lys-9' trimethylation. Isoform 1 can also repress transcription from unmethylated promoters. (2 publications)

Regions

Feature key  Position(s)  Length  Description
Zinc finger  187 – 234    48      CXXC-type 1 (PROSITE-ProRule annotation)
Zinc finger  235 – 281    47      CXXC-type 2 (PROSITE-ProRule annotation)
Zinc finger  348 – 396    49      CXXC-type 3 (PROSITE-ProRule annotation)

Keywords - Biological process

Transcription, Transcription regulation

Keywords - Ligand

DNA-binding, Metal-binding, Zinc

Names & Taxonomy

Protein names
Recommended name: Methyl-CpG-binding domain protein 1
Alternative name(s): Methyl-CpG-binding protein MBD1
Gene names
Name: Mbd1
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
Proteomes
  • UP000000589 Component: Unplaced

Organism-specific databases

MGI: MGI:1333811. Mbd1.

Subcellular location

GO - Cellular component

  • chromatin Source: MGI
  • cytoplasm Source: MGI
  • heterochromatin Source: MGI
  • nuclear matrix Source: UniProtKB
  • nuclear speck Source: UniProtKB
  • nucleus Source: MGI

Keywords - Cellular component

Chromosome, Nucleus

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Feature identifier  Description
Chain        1 – 636      636     PRO_0000096259      Methyl-CpG-binding domain protein 1

Amino acid modifications

Feature key       Position(s)  Length  Description
Cross-link        293 – 293    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2). By similarity
Modified residue  409 – 409    1       Phosphoserine. By similarity
Cross-link        443 – 443    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2). By similarity
Cross-link        520 – 520    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO). By similarity
Cross-link        559 – 559    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO); alternate. By similarity
Cross-link        559 – 559    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2); alternate. By similarity

Post-translational modification

Sumoylated with SUMO1 by PIAS1 and PIAS3. Sumoylation affects transcriptional silencing by preventing the interaction with SETDB1. In contrast, sumoylation may increase the interaction with ATF7IP. (By similarity)

Keywords - PTM

Isopeptide bond, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q9Z2E2.
MaxQB: Q9Z2E2.
PeptideAtlas: Q9Z2E2.
PRIDE: Q9Z2E2.

PTM databases

iPTMnet: Q9Z2E2.
PhosphoSite: Q9Z2E2.

Expression

Tissue specificity

Highly expressed in kidney, liver and brain. Detected at lower levels in heart, lung, skeletal muscle, spleen and testis. (1 publication)

Gene expression databases

Bgee: Q9Z2E2.
CleanEx: MM_MBD1.

Interaction

Subunit structure

Interacts with OASL, ATF7IP, ATF7IP2 and BAHD1. Binds CHAF1A and the SUV39H1-CBX5 complex via the MBD domain. Binds MGP via the TRD domain. May be part of the MeCP1 complex. During DNA replication, it recruits SETDB1 to form an S phase-specific complex that facilitates methylation of H3 'Lys-9' during replication-coupled chromatin assembly; this complex is composed of at least the CAF-1 subunit CHAF1A, MBD1 and SETDB1 (By similarity). Isoform 2 interacts with the Ten-1 ICD form of TENM1. (By similarity; 1 publication)

Protein-protein interaction databases

BioGrid: 201330. 1 interaction.
MINT: MINT-1342412.

Structure

3D structure databases

ProteinModelPortal: Q9Z2E2.
SMR: Q9Z2E2. Positions 1-75, 185-284, 354-395.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       1 – 69       69      MBD (PROSITE-ProRule annotation)

Region

Feature key  Position(s)  Length  Description
Region       550 – 612    63      TRD

Motif

Feature key  Position(s)  Length  Description
Motif        84 – 88      5       Nuclear localization signal (Sequence analysis)
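
The positions in these feature tables are 1-based and inclusive, so a feature spanning start to end covers end - start + 1 residues. The following is a minimal Python sketch of that convention, assuming coordinates and the first 100 residues copied from this entry; the helper names `feature_length` and `extract` are illustrative and not part of any UniProt tool.

```python
# Feature coordinates copied from the tables in this entry:
# (name, start, end, listed length), 1-based and inclusive.
FEATURES = [
    ("MBD domain", 1, 69, 69),
    ("Nuclear localization signal", 84, 88, 5),
    ("CXXC-type 1 zinc finger", 187, 234, 48),
    ("CXXC-type 2 zinc finger", 235, 281, 47),
    ("CXXC-type 3 zinc finger", 348, 396, 49),
    ("TRD region", 550, 612, 63),
]

# First 100 residues of isoform 1, copied from the sequence listing.
# Splitting on whitespace removes the 10-residue block separators.
N_TERMINUS = "".join("""
MAESWQDCPA LGPGWKRRES FRKSGASFGR SDIYYQSPTG EKIRSKVELT
RYLGPACDLT LFDFRQGTLC HPIPKTHPLA VPSKKKKKPS KPAKTKKQQV
""".split())


def feature_length(start, end):
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def extract(sequence, start, end):
    """Slice a feature out of a sequence (hypothetical helper).

    Converts 1-based inclusive coordinates to a Python slice.
    """
    return sequence[start - 1:end]


# Every listed length should agree with its coordinates.
mismatches = [name for name, start, end, listed in FEATURES
              if feature_length(start, end) != listed]
print("mismatches:", mismatches)

# The nuclear localization signal at 84 – 88 is a run of lysines.
print(extract(N_TERMINUS, 84, 88))  # KKKKK
```

Only the N-terminal 100 residues are embedded, so `extract` can be applied here to the MBD domain and the nuclear localization signal but not to the zinc fingers or the TRD.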

Domain

The methyl-CpG-binding domain (MBD) functions both in binding to methylated DNA and in protein interactions. (By similarity)
The third CXXC-type zinc finger mediates binding to non-methylated CpG dinucleotides. (By similarity)
The transcriptional repression domain (TRD) is involved in transcription repression and in protein interactions. (By similarity)

Sequence similarities

Contains 3 CXXC-type zinc fingers. (PROSITE-ProRule annotation)
Contains 1 MBD (methyl-CpG-binding) domain. (PROSITE-ProRule annotation)

Keywords - Domain

Repeat, Zinc-finger

Phylogenomic databases

HOVERGEN: HBG052416.
InParanoid: Q9Z2E2.
KO: K11589.
OrthoDB: EOG7QNVMV.
PhylomeDB: Q9Z2E2.
TreeFam: TF350557.

Family and domain databases

Gene3D: 3.30.890.10. 1 hit.
InterPro: IPR016177. DNA-bd_dom.
    IPR001739. Methyl_CpG_DNA-bd.
    IPR002857. Znf_CXXC.
Pfam: PF01429. MBD. 1 hit.
    PF02008. zf-CXXC. 3 hits.
SMART: SM00391. MBD. 1 hit.
SUPFAM: SSF54171. SSF54171. 1 hit.
PROSITE: PS50982. MBD. 1 hit.
    PS51058. ZF_CXXC. 3 hits.

Sequences (5)

Sequence status: Complete.

This entry describes 5 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q9Z2E2-1)

Also known as: MBD1a

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MAESWQDCPA LGPGWKRRES FRKSGASFGR SDIYYQSPTG EKIRSKVELT
        60         70         80         90        100
RYLGPACDLT LFDFRQGTLC HPIPKTHPLA VPSKKKKKPS KPAKTKKQQV
       110        120        130        140        150
GLQRSEVRIE TPQGEYKAPT ATALASLSVS ASASSSASAS ASASSHAPVC
       160        170        180        190        200
CENCGIHFSW DGVKRQRLKT LCKDCRAQRI AFNREQRMFK RVGCGDCAAC
       210        220        230        240        250
LVKEDCGVCS TCRLQLPSDV ASGLYCKCER RRCLRIMEKS RGCGVCRGCQ
       260        270        280        290        300
TQEDCGHCCI CLRSPRPGLK RQWRCLQRRC FWGKRDSSKR GSKVASQRHS
       310        320        330        340        350
QAPPLPPHPA SQYTEPTELH ISDIAPTSPA EFIYYCVDED EDELQPYTNQ
       360        370        380        390        400
RQNRKCGACA ACLRRMDCGR CDFCCDKPKF GGGNQKRQKC RWRQCLQFAM
       410        420        430        440        450
KRLLPSAGSG SGEGAGLRPY QTHQTHQKRP ASARQLQLSS PLKAPWAVVT
       460        470        480        490        500
APPGPVRDSR KQQAGRGSVL PQPDTDFVFL QEGTSSAMQM PGTAAASTEV
       510        520        530        540        550
PVQAAQCSAP SWVVALPQVK QETADAPEEW TAVTTFLTSS TLQSGFPSKA
       560        570        580        590        600
ADPDLSPVKQ EPPGPEEDGE EKKDDVSETT PAEEIGGVGT PVITEIFSLG
       610        620        630
GTRLRDAEAW LPRLHKLLAV NEKEYFTELQ LKEEVL
Length: 636
Mass (Da): 70,023
Last modified: July 24, 2007 - v2
Checksum: 02D8B94EBD522F65

Isoform 2 (identifier: Q9Z2E2-2)

Also known as: MBD1b

The sequence of this isoform differs from the canonical sequence as follows:
     142-142: S → SSSASASAS
     345-400: Missing.

Length: 588
Mass (Da): 64,124
Checksum: 5C560DE1FDFA7D77

Isoform 3 (identifier: Q9Z2E2-3)

Also known as: MBD1d

The sequence of this isoform differs from the canonical sequence as follows:
     142-142: S → SSSASASAS
     345-400: Missing.
     614-625: LHKLLAVNEKEY → SKDLKNPEAKMQ
     626-636: Missing.

Length: 577
Mass (Da): 62,725
Checksum: 5D4C5CDDC6FD98BF

Isoform 4 (identifier: Q9Z2E2-4)

The sequence of this isoform differs from the canonical sequence as follows:
     345-400: Missing.

Length: 580
Mass (Da): 63,475
Checksum: 90686B0BDD2E34B2

Isoform 5 (identifier: Q9Z2E2-5)

The sequence of this isoform differs from the canonical sequence as follows:
     345-400: Missing.
     614-625: LHKLLAVNEKEY → SKDLKNPEAKMQ
     626-636: Missing.

Length: 569
Mass (Da): 62,076
Checksum: B450A440E447410B
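
Because every isoform is described as a set of differences from the canonical sequence, the isoform sequences can be rebuilt by applying those differences to isoform 1. The sketch below is one way to do that in Python, assuming the canonical sequence and edit coordinates are copied from this entry; `apply_edits` and the variable names are illustrative, not a UniProt API. Edits are applied from the C-terminal end first so that insertions and deletions do not shift the coordinates of edits still to come.

```python
# Canonical isoform 1 (Q9Z2E2-1), copied from the sequence listing.
# Splitting on whitespace removes the 10-residue block separators.
CANONICAL = "".join("""
MAESWQDCPA LGPGWKRRES FRKSGASFGR SDIYYQSPTG EKIRSKVELT
RYLGPACDLT LFDFRQGTLC HPIPKTHPLA VPSKKKKKPS KPAKTKKQQV
GLQRSEVRIE TPQGEYKAPT ATALASLSVS ASASSSASAS ASASSHAPVC
CENCGIHFSW DGVKRQRLKT LCKDCRAQRI AFNREQRMFK RVGCGDCAAC
LVKEDCGVCS TCRLQLPSDV ASGLYCKCER RRCLRIMEKS RGCGVCRGCQ
TQEDCGHCCI CLRSPRPGLK RQWRCLQRRC FWGKRDSSKR GSKVASQRHS
QAPPLPPHPA SQYTEPTELH ISDIAPTSPA EFIYYCVDED EDELQPYTNQ
RQNRKCGACA ACLRRMDCGR CDFCCDKPKF GGGNQKRQKC RWRQCLQFAM
KRLLPSAGSG SGEGAGLRPY QTHQTHQKRP ASARQLQLSS PLKAPWAVVT
APPGPVRDSR KQQAGRGSVL PQPDTDFVFL QEGTSSAMQM PGTAAASTEV
PVQAAQCSAP SWVVALPQVK QETADAPEEW TAVTTFLTSS TLQSGFPSKA
ADPDLSPVKQ EPPGPEEDGE EKKDDVSETT PAEEIGGVGT PVITEIFSLG
GTRLRDAEAW LPRLHKLLAV NEKEYFTELQ LKEEVL
""".split())

# Each edit is (start, end, replacement) in 1-based inclusive canonical
# coordinates; an empty replacement means the range is missing.
INSERT_142 = (142, 142, "SSSASASAS")
SKIP_345_400 = (345, 400, "")
SWAP_614_625 = (614, 625, "SKDLKNPEAKMQ")
SKIP_626_636 = (626, 636, "")

ISOFORM_EDITS = {
    "Q9Z2E2-2": [INSERT_142, SKIP_345_400],
    "Q9Z2E2-3": [INSERT_142, SKIP_345_400, SWAP_614_625, SKIP_626_636],
    "Q9Z2E2-4": [SKIP_345_400],
    "Q9Z2E2-5": [SKIP_345_400, SWAP_614_625, SKIP_626_636],
}


def apply_edits(canonical, edits):
    """Return the isoform sequence produced by non-overlapping edits."""
    sequence = canonical
    # Highest start first, so earlier coordinates remain valid.
    for start, end, replacement in sorted(edits, reverse=True):
        sequence = sequence[:start - 1] + replacement + sequence[end:]
    return sequence


isoforms = {name: apply_edits(CANONICAL, edits)
            for name, edits in ISOFORM_EDITS.items()}

for name, sequence in isoforms.items():
    print(name, len(sequence))
```

The printed lengths can be compared with the Length values listed for each isoform; the arithmetic is 636 + 8 - 56 = 588 for isoform 2 and 636 - 56 - 11 = 569 for isoform 5.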

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  109 – 109    1       I → R in BAB23419 (PubMed:16141072). Curated
Sequence conflict  109 – 109    1       I → R in AAH69837 (PubMed:15489334). Curated
Sequence conflict  146 – 146    1       H → R in AAH69837 (PubMed:15489334). Curated
Sequence conflict  457 – 457    1       R → Q in AAC68869 (PubMed:9774669). Curated
Sequence conflict  457 – 457    1       R → Q in AAD48908 (PubMed:10441743). Curated
Sequence conflict  457 – 457    1       R → Q in BAE33700 (PubMed:16141072). Curated
Sequence conflict  484 – 484    1       T → A in AAC68869 (PubMed:9774669). Curated
Sequence conflict  484 – 484    1       T → A in AAD48908 (PubMed:10441743). Curated
Sequence conflict  484 – 484    1       T → A in BAE33700 (PubMed:16141072). Curated
Sequence conflict  497 – 497    1       S → C in AAC68869 (PubMed:9774669). Curated
Sequence conflict  497 – 497    1       S → C in AAD48908 (PubMed:10441743). Curated
Sequence conflict  497 – 497    1       S → C in BAE33700 (PubMed:16141072). Curated
Sequence conflict  556 – 556    1       S → P in AAC68869 (PubMed:9774669). Curated
Sequence conflict  556 – 556    1       S → P in AAD48908 (PubMed:10441743). Curated
Sequence conflict  556 – 556    1       S → P in BAE33700 (PubMed:16141072). Curated
Sequence conflict  604 – 604    1       L → F in AAH69837 (PubMed:15489334). Curated

Isoform 2 (identifier: Q9Z2E2-2)

Feature key        Position(s)  Length  Description
Sequence conflict  144 – 144    1       S → F in AAH69837 (PubMed:15489334). Curated

Alternative sequence

Feature key           Position(s)  Length  Feature identifier  Description
Alternative sequence  142 – 142    1       VSP_011072          S → SSSASASAS in isoform 2 and isoform 3. (2 publications)
Alternative sequence  345 – 400    56      VSP_011073          Missing in isoform 2, isoform 3, isoform 4 and isoform 5. (2 publications)
Alternative sequence  614 – 625    12      VSP_011074          LHKLL…NEKEY → SKDLKNPEAKMQ in isoform 3 and isoform 5. (1 publication)
Alternative sequence  626 – 636    11      VSP_011075          Missing in isoform 3 and isoform 5. (1 publication)

Sequence databases

EMBL / GenBank / DDBJ:
AF072240 mRNA. Translation: AAC68869.1.
AF120978 Genomic DNA. Translation: AAD48908.1.
AK004624 mRNA. Translation: BAB23419.1.
AK032535 mRNA. Translation: BAC27914.1.
AK156401 mRNA. Translation: BAE33700.1.
AK166042 mRNA. Translation: BAE38538.1.
BC069837 mRNA. Translation: AAH69837.1.
CCDS: CCDS50321.1. [Q9Z2E2-2]
RefSeq: NP_038622.2. NM_013594.2.
UniGene: Mm.22522.

Genome annotation databases

GeneID: 17190.
KEGG: mmu:17190.
UCSC: uc008fpj.2. mouse. [Q9Z2E2-2]
    uc008fpk.1. mouse. [Q9Z2E2-5]
    uc012bex.1. mouse. [Q9Z2E2-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 4152.
MGI: MGI:1333811. Mbd1.

Miscellaneous databases

PRO: Q9Z2E2.

Publications

  1. "Identification and characterization of a family of mammalian methyl-CpG binding proteins."
    Hendrich B., Bird A.
    Mol. Cell. Biol. 18:6538-6547(1998)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, TISSUE SPECIFICITY.
    Strain: C57BL/6J.
    Tissue: Brain.
  2. "Genomic structure and chromosomal mapping of the murine and human mbd1, mbd2, mbd3, and mbd4 genes."
    Hendrich B., Abbott C., McQueen H., Chambers D., Cross S.H., Bird A.
    Mamm. Genome 10:906-912(1999)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA].
    Strain: 129.
  3. "The transcriptional landscape of the mammalian genome."
    Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N., Oyama R., Ravasi T., Lenhard B., Wells C., Kodzius R., Shimokawa K., Bajic V.B., Brenner S.E., Batalov S., Forrest A.R., Zavolan M., Davis M.J., Wilming L.G., Aidinis V., Allen J.E., Ambesi-Impiombato A., Apweiler R., Aturaliya R.N., Bailey T.L., Bansal M., Baxter L., Beisel K.W., Bersano T., Bono H., Chalk A.M., Chiu K.P., Choudhary V., Christoffels A., Clutterbuck D.R., Crowe M.L., Dalla E., Dalrymple B.P., de Bono B., Della Gatta G., di Bernardo D., Down T., Engstrom P., Fagiolini M., Faulkner G., Fletcher C.F., Fukushima T., Furuno M., Futaki S., Gariboldi M., Georgii-Hemming P., Gingeras T.R., Gojobori T., Green R.E., Gustincich S., Harbers M., Hayashi Y., Hensch T.K., Hirokawa N., Hill D., Huminiecki L., Iacono M., Ikeo K., Iwama A., Ishikawa T., Jakt M., Kanapin A., Katoh M., Kawasawa Y., Kelso J., Kitamura H., Kitano H., Kollias G., Krishnan S.P., Kruger A., Kummerfeld S.K., Kurochkin I.V., Lareau L.F., Lazarevic D., Lipovich L., Liu J., Liuni S., McWilliam S., Madan Babu M., Madera M., Marchionni L., Matsuda H., Matsuzawa S., Miki H., Mignone F., Miyake S., Morris K., Mottagui-Tabar S., Mulder N., Nakano N., Nakauchi H., Ng P., Nilsson R., Nishiguchi S., Nishikawa S., Nori F., Ohara O., Okazaki Y., Orlando V., Pang K.C., Pavan W.J., Pavesi G., Pesole G., Petrovsky N., Piazza S., Reed J., Reid J.F., Ring B.Z., Ringwald M., Rost B., Ruan Y., Salzberg S.L., Sandelin A., Schneider C., Schoenbach C., Sekiguchi K., Semple C.A., Seno S., Sessa L., Sheng Y., Shibata Y., Shimada H., Shimada K., Silva D., Sinclair B., Sperling S., Stupka E., Sugiura K., Sultana R., Takenaka Y., Taki K., Tammoja K., Tan S.L., Tang S., Taylor M.S., Tegner J., Teichmann S.A., Ueda H.R., van Nimwegen E., Verardo R., Wei C.L., Yagi K., Yamanishi H., Zabarovsky E., Zhu S., Zimmer A., Hide W., Bult C., Grimmond S.M., Teasdale R.D., Liu E.T., Brusic V., Quackenbush J., Wahlestedt C., Mattick J.S., Hume D.A., Kai C., Sasaki D., Tomaru Y., Fukuda S., Kanamori-Katayama M., Suzuki M., Aoki J., Arakawa T., Iida J., Imamura K., Itoh M., Kato T., Kawaji H., Kawagashira N., Kawashima T., Kojima M., Kondo S., Konno H., Nakano K., Ninomiya N., Nishio T., Okada M., Plessy C., Shibata K., Shiraki T., Suzuki S., Tagami M., Waki K., Watahiki A., Okamura-Oho Y., Suzuki H., Kawai J., Hayashizaki Y.
    Science 309:1559-1563(2005)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 2; 3; 4 AND 5).
    Strain: C57BL/6J and NOD.
    Tissue: Lung, Olfactory bulb and Spleen.
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
    Strain: C57BL/6J.
    Tissue: Embryo.
  5. "The methyl-CpG binding protein MBD1 interacts with the p150 subunit of chromatin assembly factor 1."
    Reese B.E., Bachman K.E., Baylin S.B., Rountree M.R.
    Mol. Cell. Biol. 23:3226-3236(2003)
    Cited for: SUBCELLULAR LOCATION.
  6. "Mbd1 is recruited to both methylated and nonmethylated CpGs via distinct DNA binding domains."
    Joergensen H.F., Ben-Porath I., Bird A.P.
    Mol. Cell. Biol. 24:3387-3395(2004)
    Cited for: FUNCTION, ALTERNATIVE SPLICING.
  7. "The intracellular domain of teneurin-1 interacts with MBD1 and CAP/ponsin resulting in subcellular codistribution and translocation to the nuclear matrix."
    Nunes S.M., Ferralli J., Choi K., Brown-Luedi M., Minet A.D., Chiquet-Ehrismann R.
    Exp. Cell Res. 305:122-132(2005)
    Cited for: INTERACTION WITH TENM1, SUBCELLULAR LOCATION.

Entry information

Entry name: MBD1_MOUSE
Accession: Primary (citable) accession number: Q9Z2E2
Secondary accession number(s): Q3TMA4, Q3U101, Q6NSW0, Q792D6, Q8CCL9, Q9DC19
Entry history
Integrated into UniProtKB/Swiss-Prot: July 19, 2004
Last sequence update: July 24, 2007
Last modified: July 6, 2016
This is version 130 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
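
The primary and secondary accessions above all follow the UniProtKB accession number format, and isoform identifiers such as Q9Z2E2-2 append a hyphen and an isoform number to the primary accession. The following Python sketch validates both forms. The regular expression is the accession pattern given in the UniProt help pages, quoted here from memory, so it is best checked against the current documentation; the helper names are illustrative.

```python
import re

# UniProtKB accession pattern (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Accessions listed in this entry.
PRIMARY = "Q9Z2E2"
SECONDARY = ["Q3TMA4", "Q3U101", "Q6NSW0", "Q792D6", "Q8CCL9", "Q9DC19"]


def is_accession(text):
    """True if the whole string is a well-formed accession."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_isoform_id(identifier):
    """Split 'Q9Z2E2-2' into ('Q9Z2E2', 2).

    A plain accession gives (accession, None). Hypothetical helper.
    """
    accession, separator, number = identifier.partition("-")
    if not is_accession(accession):
        raise ValueError(f"not a UniProtKB accession: {accession!r}")
    return accession, int(number) if separator else None


print(all(is_accession(acc) for acc in [PRIMARY] + SECONDARY))
print(split_isoform_id("Q9Z2E2-2"))
print(is_accession("MBD1_MOUSE"))  # an entry name, not an accession
```

Entry names such as MBD1_MOUSE use a different format and are deliberately rejected by `is_accession`.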

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
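
The UniRef90 and UniRef50 membership criteria described above reduce to two thresholds per level: a minimum identity to the seed sequence and a minimum overlap with it. The Python sketch below encodes only those stated thresholds. It assumes identity and overlap have already been computed as fractions by some alignment step not shown here, and the function names are illustrative; it is not the UniRef clustering procedure itself.

```python
# Thresholds as stated in the text: (minimum identity to the seed,
# minimum overlap with the seed), both expressed as fractions.
UNIREF_THRESHOLDS = {
    "UniRef90": (0.90, 0.80),
    "UniRef50": (0.50, 0.80),
}


def meets_threshold(level, identity, overlap):
    """True if a sequence could join a cluster at the given level."""
    min_identity, min_overlap = UNIREF_THRESHOLDS[level]
    return identity >= min_identity and overlap >= min_overlap


def tightest_level(identity, overlap):
    """Most stringent level whose thresholds are met, or None."""
    for level in ("UniRef90", "UniRef50"):
        if meets_threshold(level, identity, overlap):
            return level
    return None


print(tightest_level(0.93, 0.85))  # UniRef90
print(tightest_level(0.60, 0.90))  # UniRef50
print(tightest_level(0.93, 0.70))  # None, overlap is below 80%
```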