Protein

Dedicator of cytokinesis protein 6

Gene

DOCK6

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Acts as a guanine nucleotide exchange factor (GEF) for the small GTPases CDC42 and RAC1. Through its activation of CDC42 and RAC1, it may regulate neurite outgrowth (by similarity; 1 publication).

Keywords - Molecular function

Guanine-nucleotide releasing factor

Enzyme and pathway databases

Reactome: R-HSA-983231. Factors involved in megakaryocyte development and platelet production.

Names & Taxonomy

Protein names
Recommended name: Dedicator of cytokinesis protein 6
Gene names
Name: DOCK6
Synonyms: KIAA1395
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 19

Organism-specific databases

HGNC: HGNC:19189. DOCK6.

Subcellular location

Keywords - Cellular component

Cytoplasm

Pathology & Biotech

Involvement in disease

Adams-Oliver syndrome 2 (AOS2) (1 publication)
The disease is caused by mutations affecting the gene represented in this entry.
Disease description: A disorder characterized by the congenital absence of skin (aplasia cutis congenita) in combination with transverse limb defects. Aplasia cutis congenita can be located anywhere on the body, but in the vast majority of cases it is present on the posterior parietal region, where it is often associated with an underlying defect of the parietal bones. Limb abnormalities are typically limb truncation defects affecting the distal phalanges or entire digits (true ectrodactyly). Only rarely are metatarsals/metacarpals or more proximal limb structures also affected. Apart from transverse limb defects, syndactyly, most commonly of the second and third toes, can also be observed. The clinical features are highly variable and can also include cardiovascular malformations, brain abnormalities and vascular defects such as cutis marmorata and dilated scalp veins.
See also OMIM:614219

Organism-specific databases

MalaCards: DOCK6.
MIM: 614219. phenotype.
Orphanet: 974. Adams-Oliver syndrome.
PharmGKB: PA134913824.

Polymorphism and mutation databases

BioMuta: DOCK6.
DMDM: 296439370.

PTM / Processing

Molecule processing

Feature key   Position(s)   Length   Description                          Feature identifier
Chain         1 – 2047      2047     Dedicator of cytokinesis protein 6   PRO_0000189993

Amino acid modifications

Feature key        Position(s)   Length   Description
Modified residue   178 – 178     1        Phosphoserine (by similarity)
Modified residue   872 – 872     1        Phosphoserine (by similarity)
Modified residue   880 – 880     1        Phosphoserine (by similarity)
Modified residue   884 – 884     1        Phosphoserine (by similarity)
Modified residue   1308 – 1308   1        Phosphoserine (combined sources)

Keywords - PTM

Phosphoprotein

Proteomic databases

EPD: Q96HP0.
MaxQB: Q96HP0.
PaxDb: Q96HP0.
PRIDE: Q96HP0.

PTM databases

iPTMnet: Q96HP0.
PhosphoSite: Q96HP0.

Expression

Tissue specificity

Widely expressed. Expressed at low levels in spleen, cerebellum, hippocampus and substantia nigra (1 publication).

Gene expression databases

Bgee: Q96HP0.
CleanEx: HS_DOCK6.
ExpressionAtlas: Q96HP0. baseline and differential.
Genevisible: Q96HP0. HS.

Organism-specific databases

HPA: HPA049423.
    HPA049424.

Interaction

Protein-protein interaction databases

BioGrid: 121625. 25 interactions.
IntAct: Q96HP0. 21 interactions.
STRING: 9606.ENSP00000294618.

Structure

3D structure databases

ProteinModelPortal: Q96HP0.
SMR: Q96HP0. Positions 1748-2020.

Family & Domains

Domains and Repeats

Feature key   Position(s)   Length   Description
Domain        548 – 714     167      DHR-1
Domain        1587 – 2023   437      DHR-2
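
Feature positions in this entry are 1-based and inclusive, so each Length value equals end minus start plus one. The short Python sketch below checks the two domain rows against that rule; `feature_length` is a hypothetical helper written for illustration, not part of any UniProt API.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive start and end positions."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


# Domain coordinates copied from the Domains and Repeats table of this entry.
domains = {"DHR-1": (548, 714), "DHR-2": (1587, 2023)}
lengths = {name: feature_length(s, e) for name, (s, e) in domains.items()}
print(lengths)
```

The same rule applies to the chain feature (1 – 2047, length 2047) and to single-residue features such as the modified residues, where start and end coincide.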

Domain

The DHR-2 domain may mediate some GEF activity (by similarity).

Sequence similarities

Belongs to the DOCK family (curated).
Contains 1 DHR-1 domain (curated).
Contains 1 DHR-2 domain (curated).

Phylogenomic databases

eggNOG: KOG1997. Eukaryota.
    ENOG410XNVY. LUCA.
GeneTree: ENSGT00760000119234.
HOGENOM: HOG000230910.
HOVERGEN: HBG051390.
InParanoid: Q96HP0.
OMA: PHTSYRN.
OrthoDB: EOG7P8P7G.
PhylomeDB: Q96HP0.
TreeFam: TF313629.

Family and domain databases

InterPro: IPR027007. DHR-1_domain.
    IPR027357. DHR-2.
    IPR026791. DOCK.
    IPR026798. DOCK6.
    IPR010703. DOCK_C.
    IPR021816. DOCK_C/D_N.
PANTHER: PTHR23317. PTHR23317. 3 hits.
    PTHR23317:SF65. PTHR23317:SF65. 3 hits.
Pfam: PF06920. DHR-2. 1 hit.
    PF14429. DOCK-C2. 1 hit.
    PF11878. DUF3398. 1 hit.
PROSITE: PS51650. DHR_1. 1 hit.
    PS51651. DHR_2. 1 hit.

Sequence

Sequence status: Complete.

Q96HP0-1 [UniParc]

        10         20         30         40         50
MAASERRAFA HKINRTVAAE VRKQVSRERS GSPHSSRRCS SSLGVPLTEV
60 70 80 90 100
VEPLDFEDVL LSRPPDAEPG PLRDLVEFPA DDLELLLQPR ECRTTEPGIP
110 120 130 140 150
KDEKLDAQVR AAVEMYIEDW VIVHRRYQYL SAAYSPVTTD TQRERQKGLP
160 170 180 190 200
RQVFEQDASG DERSGPEDSN DSRRGSGSPE DTPRSSGASS IFDLRNLAAD
210 220 230 240 250
SLLPSLLERA APEDVDRRNE TLRRQHRPPA LLTLYPAPDE DEAVERCSRP
260 270 280 290 300
EPPREHFGQR ILVKCLSLKF EIEIEPIFGI LALYDVREKK KISENFYFDL
310 320 330 340 350
NSDSMKGLLR AHGTHPAIST LARSAIFSVT YPSPDIFLVI KLEKVLQQGD
360 370 380 390 400
ISECCEPYMV LKEVDTAKNK EKLEKLRLAA EQFCTRLGRY RMPFAWTAVH
410 420 430 440 450
LANIVSSAGQ LDRDSDSEGE RRPAWTDRRR RGPQDRASSG DDACSFSGFR
460 470 480 490 500
PATLTVTNFF KQEAERLSDE DLFKFLADMR RPSSLLRRLR PVTAQLKIDI
510 520 530 540 550
SPAPENPHFC LSPELLHIKP YPDPRGRPTK EILEFPAREV YAPHTSYRNL
560 570 580 590 600
LYVYPHSLNF SSRQGSVRNL AVRVQYMTGE DPSQALPVIF GKSSCSEFTR
610 620 630 640 650
EAFTPVVYHN KSPEFYEEFK LHLPACVTEN HHLLFTFYHV SCQPRPGTAL
660 670 680 690 700
ETPVGFTWIP LLQHGRLRTG PFCLPVSVDQ PPPSYSVLTP DVALPGMRWV
710 720 730 740 750
DGHKGVFSVE LTAVSSVHPQ DPYLDKFFTL VHVLEEGAFP FRLKDTVLSE
760 770 780 790 800
GNVEQELRAS LAALRLASPE PLVAFSHHVL DKLVRLVIRP PIISGQIVNL
810 820 830 840 850
GRGAFEAMAH VVSLVHRSLE AAQDARGHCP QLAAYVHYAF RLPGTEPSLP
860 870 880 890 900
DGAPPVTVQA ATLARGSGRP ASLYLARSKS ISSSNPDLAV APGSVDDEVS
910 920 930 940 950
RILASKLLHE ELALQWVVSS SAVREAILQH AWFFFQLMVK SMALHLLLGQ
960 970 980 990 1000
RLDTPRKLRF PGRFLDDITA LVGSVGLEVI TRVHKDVELA EHLNASLAFF
1010 1020 1030 1040 1050
LSDLLSLVDR GFVFSLVRAH YKQVATRLQS SPNPAALLTL RMEFTRILCS
1060 1070 1080 1090 1100
HEHYVTLNLP CCPLSPPASP SPSVSSTTSQ SSTFSSQAPD PKVTSMFELS
1110 1120 1130 1140 1150
GPFRQQHFLA GLLLTELALA LEPEAEGAFL LHKKAISAVH SLLCGHDTDP
1160 1170 1180 1190 1200
RYAEATVKAR VAELYLPLLS IARDTLPRLH DFAEGPGQRS RLASMLDSDT
1210 1220 1230 1240 1250
EGEGDIAGTI NPSVAMAIAG GPLAPGSRAS ISQGPPTASR AGCALSAESS
1260 1270 1280 1290 1300
RTLLACVLWV LKNTEPALLQ RWATDLTLPQ LGRLLDLLYL CLAAFEYKGK
1310 1320 1330 1340 1350
KAFERINSLT FKKSLDMKAR LEEAILGTIG ARQEMVRRSR ERSPFGNPEN
1360 1370 1380 1390 1400
VRWRKSVTHW KQTSDRVDKT KDEMEHEALV EGNLATEASL VVLDTLEIIV
1410 1420 1430 1440 1450
QTVMLSEARE SVLGAVLKVV LYSLGSAQSA LFLQHGLATQ RALVSKFPEL
1460 1470 1480 1490 1500
LFEEDTELCA DLCLRLLRHC GSRISTIRTH ASASLYLLMR QNFEIGHNFA
1510 1520 1530 1540 1550
RVKMQVTMSL SSLVGTTQNF SEEHLRRSLK TILTYAEEDM GLRDSTFAEQ
1560 1570 1580 1590 1600
VQDLMFNLHM ILTDTVKMKE HQEDPEMLID LMYRIARGYQ GSPDLRLTWL
1610 1620 1630 1640 1650
QNMAGKHAEL GNHAEAAQCM VHAAALVAEY LALLEDHRHL PVGCVSFQNI
1660 1670 1680 1690 1700
SSNVLEESAI SDDILSPDEE GFCSGKHFTE LGLVGLLEQA AGYFTMGGLY
1710 1720 1730 1740 1750
EAVNEVYKNL IPILEAHRDY KKLAAVHGKL QEAFTKIMHQ SSGWERVFGT
1760 1770 1780 1790 1800
YFRVGFYGAH FGDLDEQEFV YKEPSITKLA EISHRLEEFY TERFGDDVVE
1810 1820 1830 1840 1850
IIKDSNPVDK SKLDSQKAYI QITYVEPYFD TYELKDRVTY FDRNYGLRTF
1860 1870 1880 1890 1900
LFCTPFTPDG RAHGELPEQH KRKTLLSTDH AFPYIKTRIR VCHREETVLT
1910 1920 1930 1940 1950
PVEVAIEDMQ KKTRELAFAT EQDPPDAKML QMVLQGSVGP TVNQGPLEVA
1960 1970 1980 1990 2000
QVFLAEIPED PKLFRHHNKL RLCFKDFCKK CEDALRKNKA LIGPDQKEYH
2010 2020 2030 2040
RELERNYCRL REALQPLLTQ RLPQLMAPTP PGLRNSLNRA SFRKADL
Length: 2,047
Mass (Da): 229,558
Last modified: May 18, 2010 - v3
Checksum: 6370F02FFF80D070
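
The listing above prints the sequence in groups of ten residues, with ruler lines of position numbers in between. A minimal Python sketch of turning that layout back into one plain string follows; `parse_ruler_sequence` is a hypothetical helper, and the sample holds only the first two sequence rows of this entry.

```python
def parse_ruler_sequence(text: str) -> str:
    """Join a sequence printed in 10-residue groups, skipping numeric ruler lines."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Ruler lines hold only position numbers (10, 20, ...); skip them and blanks.
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue
        residues.append("".join(tokens))
    return "".join(residues)


# First two sequence rows of the listing above, together with their rulers.
sample = """
        10         20         30         40         50
MAASERRAFA HKINRTVAAE VRKQVSRERS GSPHSSRRCS SSLGVPLTEV
60 70 80 90 100
VEPLDFEDVL LSRPPDAEPG PLRDLVEFPA DDLELLLQPR ECRTTEPGIP
"""
seq = parse_ruler_sequence(sample)
print(len(seq), seq[:10])
```

Applied to the full listing, the length of the returned string should match the Length field of the entry, which is a quick check that no row was lost in copying.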

Sequence caution

The sequence AAH08335.2 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
The sequence BAA92633.2 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)

Natural variant

Feature key       Position(s)   Length   Description                                                                    Feature identifier
Natural variant   250 – 250     1        P → L. Corresponds to variant rs12978266 [dbSNP | Ensembl].                    VAR_029830
Natural variant   555 – 555     1        P → L. Corresponds to variant rs12609039 [dbSNP | Ensembl].                    VAR_029831
Natural variant   665 – 665     1        G → R (2 publications). Corresponds to variant rs17001264 [dbSNP | Ensembl].   VAR_029832
Natural variant   826 – 826     1        R → C. Corresponds to variant rs35881692 [dbSNP | Ensembl].                    VAR_057522
Natural variant   1420 – 1420   1        V → L. Corresponds to variant rs8108071 [dbSNP | Ensembl].                     VAR_029833
Natural variant   1442 – 1442   1        A → T. Corresponds to variant rs34243815 [dbSNP | Ensembl].                    VAR_057523
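
Each natural variant is a single-residue substitution addressed by its 1-based position in the canonical sequence. As an illustration, the Python sketch below applies VAR_029830 (P → L at position 250) to residues 201-250 of the sequence in this entry, first checking that the reference residue matches. `apply_variant` is a hypothetical helper, not a UniProt tool.

```python
def apply_variant(segment: str, segment_start: int, position: int, ref: str, alt: str) -> str:
    """Substitute one residue, addressed by its 1-based position in the full protein."""
    index = position - segment_start  # 0-based index within the segment
    if not 0 <= index < len(segment):
        raise ValueError("position lies outside the segment")
    if segment[index] != ref:
        raise ValueError(f"expected {ref} at {position}, found {segment[index]}")
    return segment[:index] + alt + segment[index + 1:]


# Residues 201-250 of the sequence listed in this entry (group spaces removed).
segment_201_250 = "SLLPSLLERAAPEDVDRRNETLRRQHRPPALLTLYPAPDEDEAVERCSRP"
variant = apply_variant(segment_201_250, 201, 250, "P", "L")  # VAR_029830
print(variant[-4:])
```

Checking the reference residue before substituting catches off-by-one mistakes between 0-based string indices and the 1-based positions used in the table.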

Sequence databases

EMBL / GenBank / DDBJ:
AB037816 mRNA. Translation: BAA92633.2. Different initiation.
AC009000 Genomic DNA. No translation available.
AC011472 Genomic DNA. No translation available.
BC008335 mRNA. Translation: AAH08335.2. Different initiation.
BC051330 mRNA. Translation: AAH51330.1.
BC146786 mRNA. Translation: AAI46787.1.
CCDS: CCDS45975.1.
RefSeq: NP_065863.2. NM_020812.3.
UniGene: Hs.591002.

Genome annotation databases

Ensembl: ENST00000294618; ENSP00000294618; ENSG00000130158.
GeneID: 57572.
KEGG: hsa:57572.
UCSC: uc002mqs.6. human.

Keywords - Coding sequence diversity

Polymorphism

Cross-references

Organism-specific databases

CTD: 57572.
GeneCards: DOCK6.
H-InvDB: HIX0202696.
HGNC: HGNC:19189. DOCK6.
HPA: HPA049423.
    HPA049424.
MalaCards: DOCK6.
MIM: 614194. gene.
    614219. phenotype.
neXtProt: NX_Q96HP0.
Orphanet: 974. Adams-Oliver syndrome.
PharmGKB: PA134913824.

Miscellaneous databases

ChiTaRS: DOCK6. human.
GeneWiki: Dock6.
GenomeRNAi: 57572.
NextBio: 64101.
PRO: Q96HP0.

Publications

  1. "Prediction of the coding sequences of unidentified human genes. XVI. The complete sequences of 150 new cDNA clones from brain which code for large proteins in vitro."
    Nagase T., Kikuno R., Ishikawa K., Hirosawa M., Ohara O.
    DNA Res. 7:65-73(2000) [PubMed] [Europe PMC] [Abstract]
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA], TISSUE SPECIFICITY, VARIANT ARG-665.
    Tissue: Brain.
  2. Ohara O., Nagase T., Kikuno R.
    Submitted (JAN-2003) to the EMBL/GenBank/DDBJ databases
    Cited for: SEQUENCE REVISION.
  3. "The DNA sequence and biology of human chromosome 19."
    Grimwood J., Gordon L.A., Olsen A.S., Terry A., Schmutz J., Lamerdin J.E., Hellsten U., Goodstein D., Couronne O., Tran-Gyamfi M., Aerts A., Altherr M., Ashworth L., Bajorek E., Black S., Branscomb E., Caenepeel S., Carrano A.V., Caoile C., Chan Y.M., Christensen M., Cleland C.A., Copeland A., Dalin E., Dehal P., Denys M., Detter J.C., Escobar J., Flowers D., Fotopulos D., Garcia C., Georgescu A.M., Glavina T., Gomez M., Gonzales E., Groza M., Hammon N., Hawkins T., Haydu L., Ho I., Huang W., Israni S., Jett J., Kadner K., Kimball H., Kobayashi A., Larionov V., Leem S.-H., Lopez F., Lou Y., Lowry S., Malfatti S., Martinez D., McCready P.M., Medina C., Morgan J., Nelson K., Nolan M., Ovcharenko I., Pitluck S., Pollard M., Popkie A.P., Predki P., Quan G., Ramirez L., Rash S., Retterer J., Rodriguez A., Rogers S., Salamov A., Salazar A., She X., Smith D., Slezak T., Solovyev V., Thayer N., Tice H., Tsai M., Ustaszewska A., Vo N., Wagner M., Wheeler J., Wu K., Xie G., Yang J., Dubchak I., Furey T.S., DeJong P., Dickson M., Gordon D., Eichler E.E., Pennacchio L.A., Richardson P., Stubbs L., Rokhsar D.S., Myers R.M., Rubin E.M., Lucas S.M.
    Nature 428:529-535(2004) [PubMed] [Europe PMC] [Abstract]
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004) [PubMed] [Europe PMC] [Abstract]
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA], VARIANT ARG-665.
    Tissue: Kidney and Spleen.
  5. "Identification of an evolutionarily conserved superfamily of DOCK180-related proteins with guanine nucleotide exchange activity."
    Cote J.-F., Vuori K.
    J. Cell Sci. 115:4901-4913(2002) [PubMed] [Europe PMC] [Abstract]
    Cited for: NOMENCLATURE.
  6. "Dock6, a Dock-C subfamily guanine nucleotide exchanger, has the dual specificity for Rac1 and Cdc42 and regulates neurite outgrowth."
    Miyamoto Y., Yamauchi J., Sanbe A., Tanoue A.
    Exp. Cell Res. 313:791-804(2007) [PubMed] [Europe PMC] [Abstract]
    Cited for: FUNCTION, SUBCELLULAR LOCATION.
  7. "Combining protein-based IMAC, peptide-based IMAC, and MudPIT for efficient phosphoproteomic analysis."
    Cantin G.T., Yi W., Lu B., Park S.K., Xu T., Lee J.-D., Yates J.R. III
    J. Proteome Res. 7:1346-1351(2008) [PubMed] [Europe PMC] [Abstract]
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  8. Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  9. "Recessive mutations in DOCK6, encoding the guanidine nucleotide exchange factor DOCK6, lead to abnormal actin cytoskeleton organization and Adams-Oliver syndrome."
    Shaheen R., Faqeih E., Sunker A., Morsy H., Al-Sheddi T., Shamseldin H.E., Adly N., Hashem M., Alkuraya F.S.
    Am. J. Hum. Genet. 89:328-333(2011) [PubMed] [Europe PMC] [Abstract]
    Cited for: INVOLVEMENT IN AOS2.
  10. Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  11. "An enzyme assisted RP-RPLC approach for in-depth analysis of human liver phosphoproteome."
    Bian Y., Song C., Cheng K., Dong M., Wang F., Huang J., Sun D., Wang L., Ye M., Zou H.
    J. Proteomics 96:253-262(2014) [PubMed] [Europe PMC] [Abstract]
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-1308, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Liver.

Entry information

Entry name: DOCK6_HUMAN
Accession: Primary (citable) accession number: Q96HP0
Secondary accession number(s): A6H8X5, Q7Z7P4, Q9P2F2
Entry history
Integrated into UniProtKB/Swiss-Prot: July 3, 2003
Last sequence update: May 18, 2010
Last modified: May 11, 2016
This is version 122 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 19
    Human chromosome 19: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
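
The UniRef90 and UniRef50 rules quoted above each combine two thresholds: a minimum sequence identity and a minimum overlap with the longest (seed) sequence. The Python sketch below encodes only that membership test. It assumes identity and overlap have already been computed by some alignment step, expressed as fractions between 0 and 1. `joins_cluster` and the example values are hypothetical and do not reproduce the UniRef clustering pipeline itself.

```python
# Minimum identity per cluster level, taken from the descriptions above.
UNIREF_MIN_IDENTITY = {"UniRef90": 0.90, "UniRef50": 0.50}
MIN_OVERLAP = 0.80  # fraction of the seed (longest) sequence that must be covered


def joins_cluster(level: str, identity: float, overlap: float) -> bool:
    """True if a candidate meets both the identity and overlap thresholds."""
    return identity >= UNIREF_MIN_IDENTITY[level] and overlap >= MIN_OVERLAP


# Hypothetical alignment statistics for three candidate sequences.
print(joins_cluster("UniRef90", identity=0.93, overlap=0.85))
print(joins_cluster("UniRef90", identity=0.93, overlap=0.70))
print(joins_cluster("UniRef50", identity=0.62, overlap=0.81))
```

A candidate that is similar enough but covers too little of the seed is rejected, which is why the second call fails despite its 93% identity.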