Protein

WD repeat-containing protein 81

Gene

WDR81

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Seems to have a role in maintenance of normal mitochondrial structure and organization. Promotes Purkinje and photoreceptor cell survival (by similarity).


Enzyme and pathway databases

SignaLink: Q562E7.

Names & Taxonomy

Protein names
Recommended name: WD repeat-containing protein 81
Gene names
Name: WDR81
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 17

Organism-specific databases

HGNC: HGNC:26600. WDR81.

Subcellular location

  • Mitochondrion (by similarity)

Keywords - Cellular component

Mitochondrion

Pathology & Biotech

Involvement in disease

Cerebellar ataxia, mental retardation, and dysequilibrium syndrome 2 (CAMRQ2) (1 publication)
The disease is caused by mutations affecting the gene represented in this entry.
Disease description: A congenital cerebellar ataxia associated with cerebellar hypoplasia, mental retardation, and inability to walk bipedally, resulting in quadrupedal locomotion as a functional adaptation. Additional findings include generalized brain atrophy and mild hypoplasia of the corpus callosum.
See also OMIM:610185
Feature key | Position(s) | Description | Length
Natural variant (VAR_068220) | 856 | P → L in CAMRQ2. 1 publication. Corresponds to variant rs587776906 (dbSNP, Ensembl). | 1

Keywords - Disease

Disease mutation, Mental retardation

Organism-specific databases

DisGeNET: 124997.
MalaCards: WDR81.
MIM: 610185. phenotype.
OpenTargets: ENSG00000167716.
  ENSG00000276021.
Orphanet: 1766. Dysequilibrium syndrome.
PharmGKB: PA142670584.

Polymorphism and mutation databases

BioMuta: WDR81.
DMDM: 403314383.

PTM / Processing

Molecule processing

Feature key | Position(s) | Description
Chain (PRO_0000247244) | ? – 1941 | WD repeat-containing protein 81
Transit peptide | 1 – ? | Mitochondrion (curated)

Proteomic databases

EPD: Q562E7.
MaxQB: Q562E7.
PaxDb: Q562E7.
PeptideAtlas: Q562E7.
PRIDE: Q562E7.

PTM databases

iPTMnet: Q562E7.
PhosphoSitePlus: Q562E7.

Expression

Tissue specificity

Widely expressed. In the brain, highest levels in cerebellum and corpus callosum (1 publication).

Gene expression databases

Bgee: ENSG00000167716.
CleanEx: HS_WDR81.
ExpressionAtlas: Q562E7. baseline and differential.
Genevisible: Q562E7. HS.

Organism-specific databases

HPA: HPA023044.

Interaction

Protein-protein interaction databases

BioGrid: 125911. 13 interactors.
IntAct: Q562E7. 6 interactors.
STRING: 9606.ENSP00000386609.

Structure

3D structure databases

ProteinModelPortal: Q562E7.
ModBase: Search...
MobiDB: Search...

Family & Domains

Domains and Repeats

Feature key | Position(s) | Description | Length
Domain | 337 – 614 | BEACH (PROSITE-ProRule annotation) | 278
Repeat | 1639 – 1677 | WD 1 | 39
Repeat | 1686 – 1724 | WD 2 | 39
Repeat | 1729 – 1769 | WD 3 | 41
Repeat | 1777 – 1815 | WD 4 | 39
Repeat | 1819 – 1856 | WD 5 | 38
Repeat | 1860 – 1896 | WD 6 | 37
Repeat | 1902 – 1941 | WD 7 | 40
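
The Length column in the table above follows from the positions, which are 1-based and inclusive. A minimal Python sketch of that arithmetic is given here; the names `FEATURES` and `feature_length` are illustrative and are not part of any UniProt tool.

```python
# Feature coordinates copied from the Domains and Repeats table
# (1-based, inclusive residue positions on the canonical sequence).
FEATURES = [
    ("BEACH", 337, 614),
    ("WD 1", 1639, 1677),
    ("WD 2", 1686, 1724),
    ("WD 3", 1729, 1769),
    ("WD 4", 1777, 1815),
    ("WD 5", 1819, 1856),
    ("WD 6", 1860, 1896),
    ("WD 7", 1902, 1941),
]


def feature_length(start: int, end: int) -> int:
    """Return the number of residues in a 1-based inclusive range."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Map each feature name to its computed length.
lengths = {name: feature_length(start, end) for name, start, end in FEATURES}

for name, length in lengths.items():
    print(f"{name}: {length}")
```

The computed values can be compared against the Length column as a quick consistency check on a parsed entry.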

Compositional bias

Feature key | Position(s) | Description | Length
Compositional bias | 1150 – 1217 | Glu-rich | 68
Compositional bias | 1587 – 1596 | Poly-Gly | 10

Sequence similarities

Contains 1 BEACH domain (PROSITE-ProRule annotation).
Contains 7 WD repeats (PROSITE-ProRule annotation).

Keywords - Domain

Repeat, Transit peptide, WD repeat

Phylogenomic databases

eggNOG: KOG1786. Eukaryota.
  KOG4190. Eukaryota.
  ENOG410XR4I. LUCA.
GeneTree: ENSGT00390000003969.
HOGENOM: HOG000154851.
HOVERGEN: HBG083542.
InParanoid: Q562E7.
KO: K17601.
OMA: DFTYEMT.
OrthoDB: EOG091G00OI.
PhylomeDB: Q562E7.
TreeFam: TF323353.

Family and domain databases

CDD: cd06071. Beach. 1 hit.
Gene3D: 1.10.1540.10. 1 hit.
  2.130.10.10. 1 hit.
InterPro: IPR000409. BEACH_dom.
  IPR011009. Kinase-like_dom.
  IPR015943. WD40/YVTN_repeat-like_dom.
  IPR001680. WD40_repeat.
  IPR017986. WD40_repeat_dom.
Pfam: PF02138. Beach. 1 hit.
  PF00400. WD40. 1 hit.
SMART: SM01026. Beach. 1 hit.
  SM00320. WD40. 6 hits.
SUPFAM: SSF50978. SSF50978. 1 hit.
  SSF56112. SSF56112. 1 hit.
  SSF81837. SSF81837. 1 hit.
PROSITE: PS50197. BEACH. 1 hit.
  PS50082. WD_REPEATS_2. 1 hit.
  PS50294. WD_REPEATS_REGION. 1 hit.

Sequences (6)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 6 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q562E7-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MAQGSGGREG ALRTPAGGWH SPPSPDMQEL LRSVERDLSI DPRQLAPAPG
60 70 80 90 100
GTHVVALVPA RWLASLRDRR LPLGPCPRAE GLGEAEVRTL LQRSVQRLPA
110 120 130 140 150
GWTRVEVHGL RKRRLSYPLG GGLPFEDGSC GPETLTRFMQ EVAAQNYRNL
160 170 180 190 200
WRHAYHTYGQ PYSHSPAPSA VPALDSVRQA LQRVYGCSFL PVGETTQCPS
210 220 230 240 250
YAREGPCPPR GSPACPSLLR AEALLESPEM LYVVHPYVQF SLHDVVTFSP
260 270 280 290 300
AKLTNSQAKV LFILFRVLRA MDACHRQGLA CGALSLYHIA VDEKLCSELR
310 320 330 340 350
LDLSAYERPE EDENEEAPVA RDEAGIVSQE EQGGQPGQPT GQEELRSLVL
360 370 380 390 400
DWVHGRISNF HYLMQLNRLA GRRQGDPNYH PVLPWVVDFT TPHGRFRDLR
410 420 430 440 450
KSKFRLNKGD KQLDFTYEMT RQAFVAGGAG GGEPPHVPHH ISDVLSDITY
460 470 480 490 500
YVYKARRTPR SVLCGHVRAQ WEPHEYPASM ERMQNWTPDE CIPEFYTDPS
510 520 530 540 550
IFRSIHPDMP DLDVPAWCSS SQEFVAAHRA LLESREVSRD LHHWIDLTFG
560 570 580 590 600
YKLQGKEAVK EKNVCLHLVD AHTHLASYGV VQLFDQPHPQ RLAGAPALAP
610 620 630 640 650
EPPLIPKLLV QTIQETTGRE DFTENPGQLP NGVGRPVLEA TPCEASWTRD
660 670 680 690 700
RPVAGEDDLE QATEALDSIS LAGKAGDQLG SSSQASPGLL SFSVASASRP
710 720 730 740 750
GRRNKAAGAD PGEGEEGRIL LPEGFNPMQA LEELEKTGNF LAKGLGGLLE
760 770 780 790 800
VPEQPRVQPA VPLQCLLHRD MQALGVLLAE MVFATRVRTL QPDAPLWVRF
810 820 830 840 850
QAVRGLCTRH PKEVPVSLQP VLDTLLQMSG PEVPMGAERG KLDQLFEYRP
860 870 880 890 900
VSQGLPPPCP SQLLSPFSSV VPFPPYFPAL HRFILLYQAR RVEDEAQGRE
910 920 930 940 950
LVFALWQQLG AVLKDITPEG LEILLPFVLS LMSEEHTAVY TAWYLFEPVA
960 970 980 990 1000
KALGPKNANK YLLKPLIGAY ESPCQLHGRF YLYTDCFVAQ LMVRLGLQAF
1010 1020 1030 1040 1050
LTHLLPHVLQ VLAGAEASQE ESKDLAGAAE EEESGLPGAG PGSCAFGEEI
1060 1070 1080 1090 1100
PMDGEPPASS GLGLPDYTSG VSFHDQADLP ETEDFQAGLY VTESPQPQEA
1110 1120 1130 1140 1150
EAVSLGRLSD KSSTSETSLG EERAPDEGGA PVDKSSLRSG DSSQDLKQSE
1160 1170 1180 1190 1200
GSEEEEEEED SCVVLEEEEG EQEEVTGASE LTLSDTVLSM ETVVAGGSGG
1210 1220 1230 1240 1250
DGEEEEEALP EQSEGKEQKI LLDTACKMVR WLSAKLGPTV ASRHVARNLL
1260 1270 1280 1290 1300
RLLTSCYVGP TRQQFTVSSG ESPPLSAGNI YQKRPVLGDI VSGPVLSCLL
1310 1320 1330 1340 1350
HIARLYGEPV LTYQYLPYIS YLVAPGSASG PSRLNSRKEA GLLAAVTLTQ
1360 1370 1380 1390 1400
KIIVYLSDTT LMDILPRISH EVLLPVLSFL TSLVTGFPSG AQARTILCVK
1410 1420 1430 1440 1450
TISLIALICL RIGQEMVQQH LSEPVATFFQ VFSQLHELRQ QDLKLDPAGR
1460 1470 1480 1490 1500
GEGQLPQVVF SDGQQRPVDP ALLDELQKVF TLEMAYTIYV PFSCLLGDII
1510 1520 1530 1540 1550
RKIIPNHELV GELAALYLES ISPSSRNPAS VEPTMPGTGP EWDPHGGGCP
1560 1570 1580 1590 1600
QDDGHSGTFG SVLVGNRIQI PNDSRPENPG PLGPISGVGG GGLGSGSDDN
1610 1620 1630 1640 1650
ALKQELPRSV HGLSGNWLAY WQYEIGVSQQ DAHFHFHQIR LQSFPGHSGA
1660 1670 1680 1690 1700
VKCVAPLSSE DFFLSGSKDR TVRLWPLYNY GDGTSETAPR LVYTQHRKSV
1710 1720 1730 1740 1750
FFVGQLEAPQ HVVSCDGAVH VWDPFTGKTL RTVEPLDSRV PLTAVAVMPA
1760 1770 1780 1790 1800
PHTSITMASS DSTLRFVDCR KPGLQHEFRL GGGLNPGLVR ALAISPSGRS
1810 1820 1830 1840 1850
VVAGFSSGFM VLLDTRTGLV LRGWPAHEGD ILQIKAVEGS VLVSSSSDHS
1860 1870 1880 1890 1900
LTVWKELEQK PTHHYKSASD PIHTFDLYGS EVVTGTVSNK IGVCSLLEPP
1910 1920 1930 1940
SQATTKLSSE NFRGTLTSLA LLPTKRHLLL GSDNGVIRLL A
Length: 1,941
Mass (Da): 211,697
Last modified: September 5, 2012 - v2
Checksum: F6E62BC07FAAE8DB
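
The sequence display above interleaves position rulers with space-separated blocks of ten residues. The following Python sketch shows one way to recover a plain residue string from that layout; it is an illustration under the assumption that everything other than letters is layout, and `clean_sequence` is a hypothetical helper name. Only the first 100 residues are used here to keep the example short.

```python
import re


def clean_sequence(block: str) -> str:
    """Strip ruler digits and whitespace from a display-formatted sequence.

    Assumes every non-letter character is layout (rulers, spaces, newlines).
    """
    return re.sub(r"[^A-Za-z]", "", block).upper()


# First two rows of the canonical sequence as displayed above.
display = """
        10         20         30         40         50
MAQGSGGREG ALRTPAGGWH SPPSPDMQEL LRSVERDLSI DPRQLAPAPG
60 70 80 90 100
GTHVVALVPA RWLASLRDRR LPLGPCPRAE GLGEAEVRTL LQRSVQRLPA
"""

seq = clean_sequence(display)
print(len(seq))
print(seq[:10])
```

Applied to the full display, the resulting length could be compared with the stated length of 1,941 residues.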
Isoform 2 (identifier: Q562E7-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1051: Missing.
     1115-1312: Missing.

Note: No experimental confirmation available.
Length: 692
Mass (Da): 74,572
Checksum: 758799F2476008EE

Isoform 3 (identifier: Q562E7-3)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1051: Missing.

Note: No experimental confirmation available.
Length: 890
Mass (Da): 95,676
Checksum: 96E0102018E3D532

Isoform 4 (identifier: Q562E7-4)

The sequence of this isoform differs from the canonical sequence as follows:
     1776-1941: HEFRLGGGLN...SDNGVIRLLA → VRGVQFPEHS...KARMLFWGPS

Note: No experimental confirmation available.
Length: 1,811
Mass (Da): 197,997
Checksum: 94A1CFEE27D7F3B6

Isoform 5 (identifier: Q562E7-5)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1203: Missing.
     1204-1223: EEEEALPEQSEGKEQKILLD → MLVRVVLSLTPSFPEPSALY

Note: No experimental confirmation available.
Length: 738
Mass (Da): 79,777
Checksum: B8420D365D43F9AD

Isoform 6 (identifier: Q562E7-6)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1227: Missing.

Note: No experimental confirmation available.
Length: 714
Mass (Da): 77,172
Checksum: AA6104BD1F60847A

Sequence caution

The sequence BAG53978 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended (curated).

Experimental Info

Feature key | Position(s) | Description | Length
Sequence conflict | 1033 | Missing in BAB84937 (PubMed:14702039). Curated. | 1
Sequence conflict | 1051 | P → L in CAD39042 (PubMed:17974005). Curated. | 1
Sequence conflict | 1298 | C → Y in BAC03593 (PubMed:14702039). Curated. | 1
Sequence conflict | 1540 | P → H in BAB84937 (PubMed:14702039). Curated. | 1
Sequence conflict | 1573 | D → G in BAG53978 (PubMed:14702039). Curated. | 1
Sequence conflict | 1744 | A → V in BAG54603 (PubMed:14702039). Curated. | 1
Sequence conflict | 1760 | S → T in BAC03593 (PubMed:14702039). Curated. | 1
Sequence conflict | 1776 | H → Y in BAC03593 (PubMed:14702039). Curated. | 1

Natural variant

Feature key | Position(s) | Description | Length
Natural variant (VAR_068220) | 856 | P → L in CAMRQ2. 1 publication. Corresponds to variant rs587776906 (dbSNP, Ensembl). | 1
Natural variant (VAR_062107) | 1535 | M → V. 1 publication. Corresponds to variant rs3809870 (dbSNP, Ensembl). | 1
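
Both natural variants are single-residue substitutions at positions numbered on the canonical sequence. The Python sketch below shows how such a substitution could be applied to a window of that sequence, checking first that the reference residue matches. The function `apply_substitution` and the ten-residue windows are illustrative choices; the window contents are taken from the canonical sequence displayed in this entry.

```python
def apply_substitution(fragment: str, fragment_start: int,
                       position: int, ref: str, alt: str) -> str:
    """Apply a single-residue substitution to a sequence fragment.

    fragment_start and position are 1-based coordinates on the
    canonical sequence; the fragment must cover the position.
    """
    index = position - fragment_start
    if not 0 <= index < len(fragment):
        raise ValueError("position lies outside the fragment")
    if fragment[index] != ref:
        raise ValueError(
            f"expected {ref} at {position}, found {fragment[index]}")
    return fragment[:index] + alt + fragment[index + 1:]


# Residues 851-860 and 1531-1540 of the canonical sequence.
window_856 = "VSQGLPPPCP"
window_1535 = "VEPTMPGTGP"

p856l = apply_substitution(window_856, 851, 856, "P", "L")
m1535v = apply_substitution(window_1535, 1531, 1535, "M", "V")
print(p856l)
print(m1535v)
```

The reference-residue check is the useful part: it fails loudly if a variant is applied against the wrong isoform or with an off-by-one coordinate.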

Alternative sequence

Feature key | Position(s) | Description | Length
Alternative sequence (VSP_044063) | 1 – 1227 | Missing in isoform 6. Curated. | 1227
Alternative sequence (VSP_044064) | 1 – 1203 | Missing in isoform 5. 1 publication. | 1203
Alternative sequence (VSP_044065) | 1 – 1051 | Missing in isoform 2 and isoform 3. 2 publications. | 1051
Alternative sequence (VSP_019955) | 1115 – 1312 | Missing in isoform 2. 1 publication. | 198
Alternative sequence (VSP_044066) | 1204 – 1223 | EEEEA…KILLD → MLVRVVLSLTPSFPEPSALY in isoform 5. 1 publication. | 20
Alternative sequence (VSP_044067) | 1776 – 1941 | HEFRL…IRLLA → VRGVQFPEHSPGSLGTWQGGETPQKQKARMLFWGPS in isoform 4. 1 publication. | 166
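
The isoform lengths reported in this entry can be reproduced from the canonical length (1,941) and the alternative-sequence features listed above. The following Python sketch does that bookkeeping; `ISOFORM_EDITS` and `isoform_length` are illustrative names, and an empty replacement string stands for "Missing".

```python
CANONICAL_LENGTH = 1941

# (start, end, replacement) on the canonical sequence, 1-based inclusive.
# An empty replacement means the range is missing in that isoform.
ISOFORM_EDITS = {
    "Q562E7-2": [(1, 1051, ""), (1115, 1312, "")],
    "Q562E7-3": [(1, 1051, "")],
    "Q562E7-4": [(1776, 1941, "VRGVQFPEHSPGSLGTWQGGETPQKQKARMLFWGPS")],
    "Q562E7-5": [(1, 1203, ""), (1204, 1223, "MLVRVVLSLTPSFPEPSALY")],
    "Q562E7-6": [(1, 1227, "")],
}


def isoform_length(edits, canonical_length=CANONICAL_LENGTH):
    """Length after removing each range and adding its replacement."""
    length = canonical_length
    for start, end, replacement in edits:
        length += len(replacement) - (end - start + 1)
    return length


isoform_lengths = {
    isoform: isoform_length(edits) for isoform, edits in ISOFORM_EDITS.items()
}

for isoform, length in isoform_lengths.items():
    print(isoform, length)
```

Because the ranges within one isoform do not overlap, the length changes can simply be summed without rebuilding the sequences.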

Sequence databases

EMBL / GenBank / DDBJ:
  AK074111 mRNA. Translation: BAB84937.1.
  AK091136 mRNA. Translation: BAC03593.1.
  AK123896 mRNA. Translation: BAG53978.1. Different initiation.
  AK127946 mRNA. Translation: BAG54603.1.
  AK298567 mRNA. Translation: BAH12815.1.
  AC130343 Genomic DNA. No translation available.
  BC092513 mRNA. Translation: AAH92513.1.
  BC114568 mRNA. Translation: AAI14569.1.
  AL834379 mRNA. Translation: CAD39042.1.
CCDS: CCDS54061.1. [Q562E7-5]
  CCDS54062.1. [Q562E7-1]
  CCDS54063.1. [Q562E7-6]
RefSeq: NP_001157145.1. NM_001163673.1. [Q562E7-5]
  NP_001157281.1. NM_001163809.1. [Q562E7-1]
  NP_001157283.1. NM_001163811.1. [Q562E7-6]
  NP_689561.2. NM_152348.3. [Q562E7-3]
  XP_011521953.1. XM_011523651.2. [Q562E7-3]
UniGene: Hs.234572.

Genome annotation databases

Ensembl: ENST00000309182; ENSP00000312074; ENSG00000167716. [Q562E7-3]
  ENST00000409644; ENSP00000386609; ENSG00000167716. [Q562E7-1]
  ENST00000419248; ENSP00000407845; ENSG00000167716. [Q562E7-6]
  ENST00000437219; ENSP00000391074; ENSG00000167716. [Q562E7-5]
  ENST00000611758; ENSP00000480442; ENSG00000276021. [Q562E7-3]
  ENST00000613381; ENSP00000480101; ENSG00000276021. [Q562E7-6]
  ENST00000613616; ENSP00000477991; ENSG00000276021. [Q562E7-5]
GeneID: 124997.
KEGG: hsa:124997.
UCSC: uc002fth.3. human. [Q562E7-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

Structural Biology Knowledgebase: Search...

Organism-specific databases

CTD: 124997.
DisGeNET: 124997.
GeneCards: WDR81.
HGNC: HGNC:26600. WDR81.
HPA: HPA023044.
MalaCards: WDR81.
MIM: 610185. phenotype.
  614218. gene.
neXtProt: NX_Q562E7.
OpenTargets: ENSG00000167716.
  ENSG00000276021.
Orphanet: 1766. Dysequilibrium syndrome.
PharmGKB: PA142670584.
GenAtlas: Search...

Miscellaneous databases

GenomeRNAi: 124997.
PRO: Q562E7.
SOURCE: Search...

Family and domain databases

ProtoNet: Search...

Entry information

Entry name: WDR81_HUMAN
Accession: Primary (citable) accession number: Q562E7
Secondary accession number(s): B3KW16, B3KXU1, B7Z579, E9PHG7, Q24JP6, Q8N277, Q8N3F3, Q8TEL1
Entry history
Integrated into UniProtKB/Swiss-Prot: July 25, 2006
Last sequence update: September 5, 2012
Last modified: November 30, 2016
This is version 110 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 17
    Human chromosome 17: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
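
The UniRef90 and UniRef50 criteria above combine a minimum sequence identity with a minimum overlap relative to the seed sequence. The Python sketch below encodes only that membership test; it is not the clustering procedure UniRef actually runs, and it assumes identity and overlap have already been computed elsewhere as fractions between 0 and 1. The names `UNIREF_THRESHOLDS` and `meets_threshold` are hypothetical.

```python
# Minimum (identity, overlap) fractions stated in the descriptions above.
UNIREF_THRESHOLDS = {
    "UniRef90": (0.90, 0.80),
    "UniRef50": (0.50, 0.80),
}


def meets_threshold(level: str, identity: float, overlap: float) -> bool:
    """Check a precomputed identity/overlap pair against a UniRef level.

    identity and overlap are fractions in [0, 1], measured against the
    seed (longest) sequence of the cluster.
    """
    min_identity, min_overlap = UNIREF_THRESHOLDS[level]
    return identity >= min_identity and overlap >= min_overlap


# A hypothetical sequence with 93% identity and 85% overlap to a seed.
print(meets_threshold("UniRef90", 0.93, 0.85))
# The same identity with only 60% overlap fails the overlap criterion.
print(meets_threshold("UniRef90", 0.93, 0.60))
```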