Protein

Methylcytosine dioxygenase TET3

Gene

TET3

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Dioxygenase that catalyzes the conversion of the modified genomic base 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC) and plays a key role in epigenetic chromatin reprogramming in the zygote following fertilization. Also mediates the subsequent conversion of 5hmC into 5-formylcytosine (5fC), and of 5fC into 5-carboxylcytosine (5caC). Conversion of 5mC into 5hmC, 5fC and 5caC probably constitutes the first step in cytosine demethylation. In zygotes, DNA demethylation occurs selectively in the paternal pronucleus before the first cell division, while the adjacent maternal pronucleus and certain paternally imprinted loci are protected from this process. Participates in DNA demethylation in the paternal pronucleus by mediating conversion of 5mC into 5hmC, 5fC and 5caC. Does not mediate DNA demethylation of the maternal pronucleus, because DPPA3/PGC7 on maternal chromatin prevents TET3 from binding to chromatin (By similarity). In addition to its role in DNA demethylation, also involved in the recruitment of the O-GlcNAc transferase OGT to CpG-rich transcription start sites of active genes, thereby promoting histone H2B GlcNAcylation by OGT (By similarity; 1 Publication).
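The oxidation series described above can be pictured as a linear chain in which each TET3-catalyzed step converts one cytosine form into the next. The Python sketch below is a toy model only. The names `OXIDATION_CHAIN`, `next_oxidation` and `oxidation_path` are hypothetical, and the sketch does not model kinetics, substrate preference, or the downstream excision steps.

```python
from typing import Optional

# Hypothetical toy model: the cytosine forms in the order TET3 oxidizes them.
OXIDATION_CHAIN = ["5mC", "5hmC", "5fC", "5caC"]


def next_oxidation(base: str) -> Optional[str]:
    """Return the product of one oxidation step, or None for the last form.

    Raises ValueError if `base` is not in the chain.
    """
    i = OXIDATION_CHAIN.index(base)
    return OXIDATION_CHAIN[i + 1] if i + 1 < len(OXIDATION_CHAIN) else None


def oxidation_path(start: str = "5mC") -> list:
    """List every form reached from `start` by repeated oxidation."""
    path = [start]
    nxt = next_oxidation(start)
    while nxt is not None:
        path.append(nxt)
        nxt = next_oxidation(nxt)
    return path


print(" -> ".join(oxidation_path()))  # 5mC -> 5hmC -> 5fC -> 5caC
```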

Catalytic activity

DNA 5-methylcytosine + 2-oxoglutarate + O2 = DNA 5-hydroxymethylcytosine + succinate + CO2 (By similarity).
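As a quick sanity check on the catalytic-activity equation, the sketch below counts atoms on each side. It makes two assumptions:

- Neutral free-base formulas stand in for the DNA-embedded residues. Only the C5 substituent changes, so the sugar-phosphate backbone cancels out.
- 2-oxoglutarate and succinate are written in their neutral acid forms. Both dianion forms carry the same 2- charge, so the balance is unaffected.

The helper names `atoms` and `side_atoms` are hypothetical.

```python
import re
from collections import Counter

# Molecular formulas (assumption: free bases and neutral acid forms).
FORMULAS = {
    "5mC": "C5H7N3O",
    "5hmC": "C5H7N3O2",
    "2-oxoglutarate": "C5H6O5",
    "O2": "O2",
    "succinate": "C4H6O4",
    "CO2": "CO2",
}


def atoms(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C5H7N3O' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_atoms(species) -> Counter:
    """Sum the atom counts of every species on one side of a reaction."""
    total = Counter()
    for name in species:
        total += atoms(FORMULAS[name])
    return total


left = ["5mC", "2-oxoglutarate", "O2"]
right = ["5hmC", "succinate", "CO2"]
print(side_atoms(left) == side_atoms(right))  # True
```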

Cofactor

Protein has several cofactor binding sites:
  • Fe2+ (By similarity). Note: Binds 1 Fe2+ ion per subunit (By similarity).
  • Zn2+ (By similarity). Note: Binds 3 zinc ions per subunit. The zinc ions have a structural role (By similarity).

Sites

Feature key     Position(s)  Length  Description
Metal binding   693          1       Zinc 1 (By similarity)
Metal binding   695          1       Zinc 1 (By similarity)
Metal binding   753          1       Zinc 2 (By similarity)
Metal binding   779          1       Zinc 1; via pros nitrogen (By similarity)
Metal binding   781          1       Zinc 1 (By similarity)
Binding site    821          1       2-oxoglutarate (By similarity)
Metal binding   831          1       Zinc 2 (By similarity)
Metal binding   833          1       Zinc 2 (By similarity)
Metal binding   849          1       Zinc 3 (By similarity)
Metal binding   858          1       Zinc 3 (By similarity)
Metal binding   918          1       Zinc 3 (By similarity)
Binding site    934          1       2-oxoglutarate (By similarity)
Metal binding   940          1       Zinc 2; via tele nitrogen (By similarity)
Metal binding   942          1       Iron; catalytic (By similarity)
Metal binding   944          1       Iron; catalytic (By similarity)
Binding site    947          1       Substrate (By similarity)
Binding site    976          1       2-oxoglutarate (By similarity)
Metal binding   1538         1       Iron; catalytic (By similarity)
Metal binding   1569         1       Zinc 3; via pros nitrogen (By similarity)
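Grouping the metal-binding rows of the Sites table by ligand is a simple way to cross-check them against the cofactor note: 3 structural zinc ions and 1 catalytic iron per subunit. The sketch below uses positions transcribed by hand from the table. `METAL_SITES` and `residues_by_ligand` are illustrative names, not part of any UniProt API.

```python
from collections import defaultdict

# (position, ligand) pairs from the metal-binding rows of the Sites table;
# the 2-oxoglutarate and substrate binding-site rows are omitted.
METAL_SITES = [
    (693, "Zinc 1"), (695, "Zinc 1"), (753, "Zinc 2"), (779, "Zinc 1"),
    (781, "Zinc 1"), (831, "Zinc 2"), (833, "Zinc 2"), (849, "Zinc 3"),
    (858, "Zinc 3"), (918, "Zinc 3"), (940, "Zinc 2"), (942, "Iron"),
    (944, "Iron"), (1538, "Iron"), (1569, "Zinc 3"),
]


def residues_by_ligand(sites):
    """Group residue positions by the metal ion they coordinate."""
    groups = defaultdict(list)
    for position, ligand in sites:
        groups[ligand].append(position)
    return {ligand: sorted(positions) for ligand, positions in groups.items()}


for ligand, positions in sorted(residues_by_ligand(METAL_SITES).items()):
    print(f"{ligand}: {len(positions)} residues at {positions}")
```

Run as written, it reports four ligand residues for each of the three zinc ions and three for the catalytic iron. This is consistent with the cofactor counts above.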

GO - Molecular function

GO - Biological process

  • DNA demethylation Source: UniProtKB
  • DNA demethylation of male pronucleus Source: UniProtKB
  • histone H3-K4 trimethylation Source: UniProtKB
  • positive regulation of transcription from RNA polymerase II promoter Source: UniProtKB
  • protein O-linked glycosylation Source: UniProtKB

Keywords - Molecular function

Chromatin regulator, Developmental protein, Dioxygenase, Oxidoreductase

Keywords - Ligand

DNA-binding, Iron, Metal-binding, Zinc

Enzyme and pathway databases

Reactome: R-HSA-5221030. TET1,2,3 and TDG demethylate DNA.
SIGNOR: O43151.

Names & Taxonomy

Protein names
Recommended name:
Methylcytosine dioxygenase TET3 (EC 1.14.11.n2; By similarity)
Gene names
Name: TET3
Synonyms: KIAA0401
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 2

Organism-specific databases

HGNC: HGNC:28313. TET3.

Subcellular location

  • Nucleus (By similarity)
  • Cytoplasm (By similarity)

  • Note: At the zygotic stage, localizes in the male pronucleus, while it localizes to the cytoplasm at other preimplantation stages (By similarity).

GO - Cellular component


Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Organism-specific databases

PharmGKB: PA162405645.

Polymorphism and mutation databases

BioMuta: TET3.

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                      Feature identifier
Chain        1-1660       1660    Methylcytosine dioxygenase TET3  PRO_0000050750

Amino acid modifications

Feature key  Position(s)  Length  Description
Cross-link   1262         1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2) (Combined sources)

Keywords - PTM

Isopeptide bond, Ubl conjugation

Proteomic databases

EPD: O43151.
MaxQB: O43151.
PaxDb: O43151.
PeptideAtlas: O43151.
PRIDE: O43151.

PTM databases

iPTMnet: O43151.
PhosphoSite: O43151.

Expression

Tissue specificity

Expressed in colon, muscle, adrenal gland and peripheral blood lymphocytes (1 Publication).

Developmental stage

Expressed in fetal brain but not adult brain (1 Publication).

Gene expression databases

Bgee: O43151.
CleanEx: HS_TET3.
ExpressionAtlas: O43151. baseline and differential.
Genevisible: O43151. HS.

Organism-specific databases

HPA: HPA050845.

Interaction

Subunit structure

Interacts with HCFC1 and OGT (2 Publications).

Binary interactions

With  Entry   #Exp.  IntAct
OGT   O15294  4      EBI-2831148,EBI-539828

Protein-protein interaction databases

BioGrid: 128327. 6 interactions.
IntAct: O43151. 27 interactions.
STRING: 9606.ENSP00000386869.

Structure

3D structure databases

ProteinModelPortal: O43151.
SMR: O43151. Positions 704-1023, 1501-1580.

Family & Domains

Region

Feature key  Position(s)  Length  Description
Region       850-863      14      Interaction with DNA (By similarity)
Region       1553-1555    3       2-oxoglutarate binding (By similarity)
Region       1559-1561    3       Substrate binding (By similarity)

Sequence similarities

Belongs to the TET family (Curated).

Phylogenomic databases

eggNOG: ENOG410IE22. Eukaryota.
ENOG410XPWW. LUCA.
HOGENOM: HOG000154550.
HOVERGEN: HBG079550.
InParanoid: O43151.
PhylomeDB: O43151.
TreeFam: TF342373.

Family and domain databases

InterPro: IPR024779. 2OGFeDO_noxygenase_dom.
Pfam: PF12851. Tet_JBP. 1 hit.
SMART: SM01333. Tet_JBP. 1 hit.

Sequences (3)

Sequence status: Complete.

This entry describes 3 isoforms produced by alternative splicing.

Isoform 1 (identifier: O43151-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MDSGPVYHGD SRQLSASGVP VNGAREPAGP SLLGTGGPWR VDQKPDWEAA
        60         70         80         90        100
PGPAHTARLE DAHDLVAFSA VAEAVSSYGA LSTRLYETFN REMSREAGNN
       110        120        130        140        150
SRGPRPGPEG CSAGSEDLDT LQTALALARH GMKPPNCNCD GPECPDYLEW
       160        170        180        190        200
LEGKIKSVVM EGGEERPRLP GPLPPGEAGL PAPSTRPLLS SEVPQISPQE
       210        220        230        240        250
GLPLSQSALS IAKEKNISLQ TAIAIEALTQ LSSALPQPSH STPQASCPLP
       260        270        280        290        300
EALSPPAPFR SPQSYLRAPS WPVVPPEEHS SFAPDSSAFP PATPRTEFPE
       310        320        330        340        350
AWGTDTPPAT PRSSWPMPRP SPDPMAELEQ LLGSASDYIQ SVFKRPEALP
       360        370        380        390        400
TKPKVKVEAP SSSPAPAPSP VLQREAPTPS SEPDTHQKAQ TALQQHLHHK
       410        420        430        440        450
RSLFLEQVHD TSFPAPSEPS APGWWPPPSS PVPRLPDRPP KEKKKKLPTP
       460        470        480        490        500
AGGPVGTEKA APGIKPSVRK PIQIKKSRPR EAQPLFPPVR QIVLEGLRSP
       510        520        530        540        550
ASQEVQAHPP APLPASQGSA VPLPPEPSLA LFAPSPSRDS LLPPTQEMRS
       560        570        580        590        600
PSPMTALQPG STGPLPPADD KLEELIRQFE AEFGDSFGLP GPPSVPIQDP
       610        620        630        640        650
ENQQTCLPAP ESPFATRSPK QIKIESSGAV TVLSTTCFHS EEGGQEATPT
       660        670        680        690        700
KAENPLTPTL SGFLESPLKY LDTPTKSLLD TPAKRAQAEF PTCDCVEQIV
       710        720        730        740        750
EKDEGPYYTH LGSGPTVASI RELMEERYGE KGKAIRIEKV IYTGKEGKSS
       760        770        780        790        800
RGCPIAKWVI RRHTLEEKLL CLVRHRAGHH CQNAVIVILI LAWEGIPRSL
       810        820        830        840        850
GDTLYQELTD TLRKYGNPTS RRCGLNDDRT CACQGKDPNT CGASFSFGCS
       860        870        880        890        900
WSMYFNGCKY ARSKTPRKFR LAGDNPKEEE VLRKSFQDLA TEVAPLYKRL
       910        920        930        940        950
APQAYQNQVT NEEIAIDCRL GLKEGRPFAG VTACMDFCAH AHKDQHNLYN
       960        970        980        990       1000
GCTVVCTLTK EDNRCVGKIP EDEQLHVLPL YKMANTDEFG SEENQNAKVG
      1010       1020       1030       1040       1050
SGAIQVLTAF PREVRRLPEP AKSCRQRQLE ARKAAAEKKK IQKEKLSTPE
      1060       1070       1080       1090       1100
KIKQEALELA GITSDPGLSL KGGLSQQGLK PSLKVEPQNH FSSFKYSGNA
      1110       1120       1130       1140       1150
VVESYSVLGN CRPSDPYSMN SVYSYHSYYA QPSLTSVNGF HSKYALPSFS
      1160       1170       1180       1190       1200
YYGFPSSNPV FPSQFLGPGA WGHSGSSGSF EKKPDLHALH NSLSPAYGGA
      1210       1220       1230       1240       1250
EFAELPSQAV PTDAHHPTPH HQQPAYPGPK EYLLPKAPLL HSVSRDPSPF
      1260       1270       1280       1290       1300
AQSSNCYNRS IKQEPVDPLT QAEPVPRDAG KMGKTPLSEV SQNGGPSHLW
      1310       1320       1330       1340       1350
GQYSGGPSMS PKRTNGVGGS WGVFSSGESP AIVPDKLSSF GASCLAPSHF
      1360       1370       1380       1390       1400
TDGQWGLFPG EGQQAASHSG GRLRGKPWSP CKFGNSTSAL AGPSLTEKPW
      1410       1420       1430       1440       1450
ALGAGDFNSA LKGSPGFQDK LWNPMKGEEG RIPAAGASQL DRAWQSFGLP
      1460       1470       1480       1490       1500
LGSSEKLFGA LKSEEKLWDP FSLEEGPAEE PPSKGAVKEE KGGGGAEEEE
      1510       1520       1530       1540       1550
EELWSDSEHN FLDENIGGVA VAPAHGSILI ECARRELHAT TPLKKPNRCH
      1560       1570       1580       1590       1600
PTRISLVFYQ HKNLNQPNHG LALWEAKMKQ LAERARARQE EAARLGLGQQ
      1610       1620       1630       1640       1650
EAKLYGKKRK WGGTVVAEPQ QKEKKGVVPT RQALAVPTDS AVTVSSYAYT
      1660
KVTGPYSRWI
Length: 1,660
Mass (Da): 179,350
Last modified: June 10, 2008 - v3
Checksum: 181024BFAC9B54D2

Isoform 2 (identifier: O43151-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1440-1555: Missing.

Note: No experimental confirmation available.
Length: 1,544
Mass (Da): 166,706
Checksum: DD049A7E3E8C6941

Isoform 3 (identifier: O43151-3)

The sequence of this isoform differs from the canonical sequence as follows:
     728-1660: Missing.

Note: No experimental confirmation available.
Length: 727
Mass (Da): 77,471
Checksum: 936458DE78E8F254
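Each non-canonical isoform above is described as a deletion from the canonical sequence, using 1-based inclusive positions. A minimal Python sketch of that convention follows. `delete_range` and `ISOFORM_DELETIONS` are hypothetical names. Because the full sequence is not embedded here, a placeholder string of the canonical length is used to check the resulting isoform lengths.

```python
def delete_range(sequence: str, start: int, end: int) -> str:
    """Drop residues start..end (1-based, inclusive), mirroring a
    'start-end: Missing.' note in an isoform description."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(sequence)}")
    return sequence[: start - 1] + sequence[end:]


CANONICAL_LENGTH = 1660
# Deleted segment (start, end) for each non-canonical isoform.
ISOFORM_DELETIONS = {
    "O43151-2": (1440, 1555),
    "O43151-3": (728, 1660),
}

# A placeholder string stands in for the real canonical sequence.
placeholder = "X" * CANONICAL_LENGTH
for isoform, (start, end) in ISOFORM_DELETIONS.items():
    print(isoform, len(delete_range(placeholder, start, end)))  # 1544, then 727
```

The printed lengths match the 1,544 and 727 residues stated for isoforms 2 and 3.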

Sequence caution

The sequence AAH22243.1 differs from that shown. Reason: Erroneous initiation (Curated).
The sequence AAX93057.1 differs from that shown. Reason: Erroneous gene model prediction (Curated).
The sequence BAA23697.1 differs from that shown. Reason: Erroneous initiation (Curated).

Natural variant

Feature key      Position(s)  Length  Description                                    Feature identifier
Natural variant  577          1       R → Q. Corresponds to variant rs57955681.      VAR_062235
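The natural variant uses the same 1-based numbering as the canonical sequence. The hypothetical helper below applies a single-residue substitution such as R577Q. It refuses to proceed if the reference residue does not match, which catches off-by-one numbering errors. The example works on the ten-residue block 571-580 from the sequence above rather than the full sequence, so an `offset` parameter (an assumption of this sketch) maps canonical numbering onto the fragment.

```python
def apply_substitution(sequence: str, position: int, ref: str, alt: str, offset: int = 0) -> str:
    """Apply a single-residue variant such as R577Q.

    `position` is 1-based in canonical numbering; `offset` is the number of
    residues preceding `sequence` when it is only a fragment.
    """
    index = position - offset - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the given fragment")
    if sequence[index] != ref:
        raise ValueError(f"expected {ref} at {position}, found {sequence[index]}")
    return sequence[:index] + alt + sequence[index + 1 :]


# Residues 571-580 of the canonical sequence; 570 residues precede this block.
fragment = "KLEELIRQFE"
print(apply_substitution(fragment, 577, "R", "Q", offset=570))  # KLEELIQQFE
```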

Alternative sequence

Feature key           Position(s)  Length  Description                             Feature identifier
Alternative sequence  728-1660     933     Missing in isoform 3 (1 Publication).   VSP_034192
Alternative sequence  1440-1555    116     Missing in isoform 2 (1 Publication).   VSP_021628

Sequence databases

AC073263 Genomic DNA. Translation: AAX93057.1. Sequence problems.
AC110801 Genomic DNA. No translation available.
AF466365 mRNA. Translation: AAO33386.1.
BC022243 mRNA. Translation: AAH22243.1. Different initiation.
AB007861 mRNA. Translation: BAA23697.1. Different initiation.
UniGene: Hs.516107.

Genome annotation databases

Ensembl: ENST00000409262; ENSP00000386869; ENSG00000187605.
UCSC: uc002skb.6. human. [O43151-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

DNASU: 200424.

Organism-specific databases

GeneCards: TET3.
H-InvDB: HIX0200247.
MIM: 613555. gene.
neXtProt: NX_O43151.

Miscellaneous databases

ChiTaRS: TET3. human.
PRO: O43151.

Publications

  1. "Generation and annotation of the DNA sequences of human chromosomes 2 and 4."
    Hillier L.W., Graves T.A., Fulton R.S., Fulton L.A., Pepin K.H., Minx P., Wagner-McPherson C., Layman D., Wylie K., Sekhon M., Becker M.C., Fewell G.A., Delehaunty K.D., Miner T.L., Nash W.E., Kremitzki C., Oddy L., Du H., Sun H., Bradshaw-Cordum H., Ali J., Carter J., Cordes M., Harris A., Isak A., van Brunt A., Nguyen C., Du F., Courtney L., Kalicki J., Ozersky P., Abbott S., Armstrong J., Belter E.A., Caruso L., Cedroni M., Cotton M., Davidson T., Desai A., Elliott G., Erb T., Fronick C., Gaige T., Haakenson W., Haglund K., Holmes A., Harkins R., Kim K., Kruchowski S.S., Strong C.M., Grewal N., Goyea E., Hou S., Levy A., Martinka S., Mead K., McLellan M.D., Meyer R., Randall-Maher J., Tomlinson C., Dauphin-Kohlberg S., Kozlowicz-Reilly A., Shah N., Swearengen-Shahid S., Snider J., Strong J.T., Thompson J., Yoakum M., Leonard S., Pearman C., Trani L., Radionenko M., Waligorski J.E., Wang C., Rock S.M., Tin-Wollam A.-M., Maupin R., Latreille P., Wendl M.C., Yang S.-P., Pohl C., Wallis J.W., Spieth J., Bieri T.A., Berkowicz N., Nelson J.O., Osborne J., Ding L., Meyer R., Sabo A., Shotland Y., Sinha P., Wohldmann P.E., Cook L.L., Hickenbotham M.T., Eldred J., Williams D., Jones T.A., She X., Ciccarelli F.D., Izaurralde E., Taylor J., Schmutz J., Myers R.M., Cox D.R., Huang X., McPherson J.D., Mardis E.R., Clifton S.W., Warren W.C., Chinwalla A.T., Eddy S.R., Marra M.A., Ovcharenko I., Furey T.S., Miller W., Eichler E.E., Bork P., Suyama M., Torrents D., Waterston R.H., Wilson R.K.
    Nature 434:724-731(2005)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  2. Kim N.-S., Shon H.-Y., Oh J.-H., Lee J.-Y., Kim J.-M., Hahn Y., Kim Y.
    Submitted (JAN-2002) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 280-1660 (ISOFORM 3).
  3. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 714-1660 (ISOFORM 1).
    Tissue: Duodenum.
  4. "Prediction of the coding sequences of unidentified human genes. VIII. 78 new cDNA clones from brain which code for large proteins in vitro."
    Ishikawa K., Nagase T., Nakajima D., Seki N., Ohira M., Miyajima N., Tanaka A., Kotani H., Nomura N., Ohara O.
    DNA Res. 4:307-313(1997)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 1201-1660 (ISOFORM 2).
    Tissue: Brain.
  5. "TET1, a member of a novel protein family, is fused to MLL in acute myeloid leukemia containing the t(10;11)(q22;q23)."
    Lorsbach R.B., Moore J., Mathew S., Raimondi S.C., Mukatira S.T., Downing J.R.
    Leukemia 17:637-641(2003)
    Cited for: IDENTIFICATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE.
  6. Cited for: FUNCTION, INTERACTION WITH HCFC1 AND OGT.
  7. "TET2 promotes histone O-GlcNAcylation during gene transcription."
    Chen Q., Chen Y., Bian C., Fujiki R., Yu X.
    Nature 493:561-564(2013)
    Cited for: INTERACTION WITH OGT.
  8. "Uncovering global SUMOylation signaling networks in a site-specific manner."
    Hendriks I.A., D'Souza R.C., Yang B., Verlaan-de Vries M., Mann M., Vertegaal A.C.
    Nat. Struct. Mol. Biol. 21:927-936(2014)
    Cited for: SUMOYLATION [LARGE SCALE ANALYSIS] AT LYS-1262, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].

Entry information

Entry name: TET3_HUMAN
Accession: Primary (citable) accession number: O43151
Secondary accession number(s): A6NEI3, Q86Z24, Q8TBM9
Entry history
Integrated into UniProtKB/Swiss-Prot: February 21, 2001
Last sequence update: June 10, 2008
Last modified: July 6, 2016
This is version 108 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Caution

Subsequent steps in cytosine demethylation are subject to discussion. According to a first model, cytosine demethylation occurs through deamination of 5hmC into 5-hydroxymethyluracil (5hmU), followed by replacement of 5hmU with unmethylated cytosine by the base excision repair system. According to another model, cytosine demethylation is instead mediated by conversion of 5hmC into 5fC and 5caC, followed by excision by TDG (Curated).

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 2
    Human chromosome 2: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
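The UniRef90 and UniRef50 descriptions amount to a greedy, seed-based clustering rule. The longest unclustered sequence becomes the seed, and every remaining sequence that meets the identity and overlap thresholds against that seed joins its cluster. The sketch below is not UniProt's pipeline and makes these assumptions:

- Pairwise identity and overlap have already been computed by some aligner.
- Overlap is the fraction of the seed covered by the alignment.

All names and numbers in the sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    identity: float  # fraction of identical aligned residues, 0..1
    overlap: float   # assumed: fraction of the seed covered by the alignment


def greedy_cluster(lengths, alignments, min_identity, min_overlap=0.8):
    """Seed-based greedy clustering in the spirit of UniRef90/UniRef50.

    lengths:    {sequence_id: length}
    alignments: {(seed_id, other_id): Alignment}, precomputed (hypothetical input)
    Returns {seed_id: [seed_id, member_id, ...]}.
    """
    # Longest sequences become seeds first; ties are broken by id.
    remaining = sorted(lengths, key=lambda sid: (-lengths[sid], sid))
    clusters = {}
    while remaining:
        seed = remaining.pop(0)
        members = [seed]
        for other in list(remaining):  # iterate over a copy while removing
            aln = alignments.get((seed, other))
            if aln and aln.identity >= min_identity and aln.overlap >= min_overlap:
                members.append(other)
                remaining.remove(other)
        clusters[seed] = members
    return clusters


# Hypothetical toy input: B aligns over most of seed A; C covers less than half of it.
lengths = {"A": 1000, "B": 950, "C": 400}
alignments = {
    ("A", "B"): Alignment(identity=0.95, overlap=0.95),
    ("A", "C"): Alignment(identity=0.99, overlap=0.40),
}
print(greedy_cluster(lengths, alignments, min_identity=0.9))  # {'A': ['A', 'B'], 'C': ['C']}
```

In this toy case, C fails the 80% overlap requirement despite near-identical residues, so it seeds its own cluster.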