Protein

Dyslexia-associated protein KIAA0319

Gene

KIAA0319

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Involved in neuronal migration during development of the cerebral neocortex. May function in a cell-autonomous and a non-cell-autonomous manner and play a role in appropriate adhesion between migrating neurons and radial glial fibers. May also regulate growth and differentiation of dendrites (1 publication).

GO - Biological process

  • negative regulation of dendrite development Source: UniProtKB
  • neuron migration Source: UniProtKB

Keywords - Molecular function

Developmental protein

Keywords - Biological process

Neurogenesis

Enzyme and pathway databases

Reactome: R-HSA-8856825. Cargo recognition for clathrin-mediated endocytosis.
  R-HSA-8856828. Clathrin-mediated endocytosis.

Names & Taxonomy

Protein names
Recommended name:
Dyslexia-associated protein KIAA0319
Gene names: KIAA0319
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 6

Organism-specific databases

HGNC: HGNC:21580. KIAA0319.

Subcellular location

Topology

Feature key         Position(s)  Description    Evidence           Length
Topological domain  21 – 955     Extracellular  Sequence analysis  935
Transmembrane       956 – 976    Helical        Sequence analysis  21
Topological domain  977 – 1072   Cytoplasmic    Sequence analysis  96
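
The coordinates in this table are 1-based and inclusive, so each length is end minus start plus one. The following is a minimal sketch that checks this arithmetic and confirms that the topology segments, together with the 1 – 20 signal peptide listed under Molecule processing in this entry, cover the 1,072-residue canonical sequence without gaps or overlaps. The names `feature_length` and `tiles_sequence` are hypothetical helpers introduced for illustration only.

```python
# Feature coordinates copied from this entry (1-based, inclusive).
SEQUENCE_LENGTH = 1072
FEATURES = [
    ("Signal peptide", 1, 20),
    ("Topological domain (extracellular)", 21, 955),
    ("Transmembrane (helical)", 956, 976),
    ("Topological domain (cytoplasmic)", 977, 1072),
]


def feature_length(start, end):
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def tiles_sequence(features, total):
    """Return True if the features are contiguous, non-overlapping and cover 1..total."""
    expected_start = 1
    for _, start, end in sorted(features, key=lambda f: f[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total + 1


for name, start, end in FEATURES:
    print(f"{name}: {start}-{end} ({feature_length(start, end)} aa)")
print("Covers full sequence:", tiles_sequence(FEATURES, SEQUENCE_LENGTH))
```

The check only validates the bookkeeping of the annotated positions; it says nothing about how the topology itself was predicted.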

GO - Cellular component

  • early endosome Source: UniProtKB
  • early endosome membrane Source: UniProtKB-SubCell
  • integral component of membrane Source: UniProtKB-KW
  • plasma membrane Source: UniProtKB

Keywords - Cellular component

Cell membrane, Endosome, Membrane

Pathology & Biotech

Involvement in disease

Dyslexia 2 (DYX2) (1 publication)
Disease susceptibility is associated with variations affecting the gene represented in this entry.
Disease description: A relatively common, complex cognitive disorder characterized by an impairment of reading performance despite adequate motivational, educational and intellectual opportunities. It is a multifactorial trait, with evidence for familial clustering and heritability.
See also OMIM:600202

Mutagenesis

Feature key  Position(s)  Description                                                                      Length
Mutagenesis  995          Y → A: Loss of interaction with AP2M1 and impaired endocytosis (1 publication).  1

Organism-specific databases

DisGeNET: 9856.
MalaCards: KIAA0319.
MIM: 600202. phenotype.
OpenTargets: ENSG00000137261.
PharmGKB: PA134936721.

Polymorphism and mutation databases

BioMuta: KIAA0319.
DMDM: 74747200.

PTM / Processing

Molecule processing

Feature key     Identifier      Position(s)  Description                           Evidence           Length
Signal peptide  -               1 – 20       -                                     Sequence analysis  20
Chain           PRO_0000042946  21 – 1072    Dyslexia-associated protein KIAA0319  -                  1052

Amino acid modifications

Feature key    Position(s)  Description           Evidence           Length
Glycosylation  196          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  219          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  262          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  394          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  421          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  498          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  513          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  536          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  551          N-linked (GlcNAc...)  Sequence analysis  1
Glycosylation  733          N-linked (GlcNAc...)  Sequence analysis  1
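
These sites are annotated by sequence analysis. A common basis for such predictions is the N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. The sketch below assumes that rule and scans residues 191 – 221 of the canonical sequence, copied from this entry; `sequon_positions` is a hypothetical helper, and this is an illustration rather than a reproduction of the annotation pipeline. A sequon match is a necessary pattern, not proof that the site is glycosylated.

```python
import re

# Residues 191-221 of the canonical sequence (isoform 1), copied from this entry.
FRAGMENT = "SEGAFNSSVGDSPAVPAETQQDPELHYLNES"
FRAGMENT_START = 191  # 1-based position of the first residue in FRAGMENT

# N-X-S/T with X != P. The zero-width lookahead reports overlapping sequons too.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(fragment, start):
    """Return 1-based positions of the Asn residue of each N-X-S/T sequon."""
    return [start + match.start() for match in SEQUON.finditer(fragment)]


print(sequon_positions(FRAGMENT, FRAGMENT_START))  # [196, 219]
```

Both positions found in this fragment match the first two rows of the table above.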

Post-translational modification

N-glycosylated (1 publication).
O-glycosylated (1 publication).
Shedding of the extracellular domain and intramembrane cleavage produce several proteolytic products. The intramembrane cleavage releases a soluble cytoplasmic polypeptide that translocates to the nucleolus (1 publication).

Keywords - PTM

Glycoprotein

Proteomic databases

MaxQB: Q5VV43.
PaxDb: Q5VV43.
PeptideAtlas: Q5VV43.
PRIDE: Q5VV43.

PTM databases

iPTMnet: Q5VV43.
PhosphoSitePlus: Q5VV43.

Expression

Tissue specificity

Detected in adult brain cortex and fetal frontal lobe (at protein level). Highly expressed in brain cortex, putamen, amygdala, hippocampus and cerebellum (2 publications).

Developmental stage

Expressed in the developing cerebral neocortex and ganglionic eminence in fetal brain at 57 days post-fertilization (1 publication).

Gene expression databases

Bgee: ENSG00000137261.
CleanEx: HS_KIAA0319.
ExpressionAtlas: Q5VV43. baseline and differential.
Genevisible: Q5VV43. HS.

Organism-specific databases

HPA: HPA015607.

Interaction

Subunit structure

Homodimer. Interacts with AP2M1; this interaction is required for clathrin-mediated endocytosis (2 publications).

Protein-protein interaction databases

BioGrid: 115190. 2 interactors.
IntAct: Q5VV43. 3 interactors.
STRING: 9606.ENSP00000367459.

Structure

Secondary structure

Feature key  Position(s)  Evidence          Length
Beta strand  336 – 338    Combined sources  3
Beta strand  343 – 346    Combined sources  4
Beta strand  352 – 355    Combined sources  4
Beta strand  357 – 360    Combined sources  4
Beta strand  374 – 376    Combined sources  3
Beta strand  389 – 395    Combined sources  7
Beta strand  400 – 405    Combined sources  6
Beta strand  408 – 411    Combined sources  4
Beta strand  414 – 417    Combined sources  4
Beta strand  420 – 425    Combined sources  6

3D structure databases

PDB entry  Method  Resolution (Å)  Chain  Positions
2E7M       NMR     -               A      329-428
ProteinModelPortal: Q5VV43.
SMR: Q5VV43.

Miscellaneous databases

EvolutionaryTrace: Q5VV43.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Description  Evidence                    Length
Domain       21 – 99      MANSC        PROSITE-ProRule annotation  79
Domain       341 – 427    PKD 1        PROSITE-ProRule annotation  87
Domain       435 – 524    PKD 2        PROSITE-ProRule annotation  90
Domain       530 – 620    PKD 3        PROSITE-ProRule annotation  91
Domain       621 – 714    PKD 4        PROSITE-ProRule annotation  94
Domain       720 – 811    PKD 5        PROSITE-ProRule annotation  92

Motif

Feature key  Position(s)  Description         Length
Motif        995 – 998    Endocytosis signal  4
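
Residues 995 – 998 of the canonical sequence read YTIL. This fits the tyrosine-based sorting pattern YXXΦ (Φ being a bulky hydrophobic residue), which is the kind of signal recognised by the AP-2 mu subunit (AP2M1). The entry itself only labels the motif "Endocytosis signal", so the YXXΦ reading is an assumption made here for illustration. The sketch below locates the pattern in residues 991 – 1000 and shows that the Y995A substitution listed under Mutagenesis removes it. `find_yxxphi` and `substitute` are hypothetical helpers.

```python
import re

# Residues 991-1000 of the canonical sequence (isoform 1), copied from this entry.
FRAGMENT = "KKTKYTILDN"
FRAGMENT_START = 991  # 1-based position of the first residue in FRAGMENT

# Y-X-X-Phi, with Phi assumed here to be one of L, I, M, F or V.
YXXPHI = re.compile(r"(?=(Y..[LIMFV]))")


def find_yxxphi(fragment, start):
    """Return (1-based position of Tyr, matched tetrapeptide) for each YXXPhi hit."""
    return [(start + m.start(), m.group(1)) for m in YXXPHI.finditer(fragment)]


def substitute(fragment, start, position, new_residue):
    """Return a copy of fragment with the residue at a 1-based position replaced."""
    index = position - start
    return fragment[:index] + new_residue + fragment[index + 1:]


wild_type_hits = find_yxxphi(FRAGMENT, FRAGMENT_START)
mutant_hits = find_yxxphi(substitute(FRAGMENT, FRAGMENT_START, 995, "A"), FRAGMENT_START)
print(wild_type_hits)  # [(995, 'YTIL')]
print(mutant_hits)     # []
```

Losing the pattern match agrees with the reported loss of AP2M1 interaction for Y995A, but the code only tests the sequence pattern, not binding.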

Sequence similarities

Contains 1 MANSC domain (PROSITE-ProRule annotation).
Contains 5 PKD domains (PROSITE-ProRule annotation).

Keywords - Domain

Repeat, Signal, Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: ENOG410IFQB. Eukaryota.
  ENOG410XQ5Y. LUCA.
GeneTree: ENSGT00860000133813.
HOGENOM: HOG000043880.
HOVERGEN: HBG057130.
InParanoid: Q5VV43.
OMA: EGRTYSN.
OrthoDB: EOG091G016C.
PhylomeDB: Q5VV43.
TreeFam: TF323356.

Family and domain databases

Gene3D: 2.60.40.670. 4 hits.
InterPro: IPR003961. FN3_dom.
  IPR013980. MANSC_dom.
  IPR011106. MANSC_N.
  IPR022409. PKD/Chitinase_dom.
  IPR002859. PKD/REJ-like.
  IPR000601. PKD_dom.
Pfam: PF02010. REJ. 1 hit.
SMART: SM00060. FN3. 4 hits.
  SM00765. MANEC. 1 hit.
  SM00089. PKD. 5 hits.
SUPFAM: SSF49299. SSF49299. 4 hits.
PROSITE: PS50986. MANSC. 1 hit.
  PS50093. PKD. 1 hit.

Sequences (4)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 4 isoforms produced by alternative splicing.

Note: Additional isoforms seem to exist.
Isoform 1 (identifier: Q5VV43-1)
Also known as: A

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MAPPTGVLSS LLLLVTIAGC ARKQCSEGRT YSNAVISPNL ETTRIMRVSH
60 70 80 90 100
TFPVVDCTAA CCDLSSCDLA WWFEGRCYLV SCPHKENCEP KKMGPIRSYL
110 120 130 140 150
TFVLRPVQRP AQLLDYGDMM LNRGSPSGIW GDSPEDIRKD LTFLGKDWGL
160 170 180 190 200
EEMSEYSDDY RELEKDLLQP SGKQEPRGSA EYTDWGLLPG SEGAFNSSVG
210 220 230 240 250
DSPAVPAETQ QDPELHYLNE SASTPAPKLP ERSVLLPLPT TPSSGEVLEK
260 270 280 290 300
EKASQLQEQS SNSSGKEVLM PSHSLPPASL ELSSVTVEKS PVLTVTPGST
310 320 330 340 350
EHSIPTPPTS AAPSESTPSE LPISPTTAPR TVKELTVSAG DNLIITLPDN
360 370 380 390 400
EVELKAFVAP APPVETTYNY EWNLISHPTD YQGEIKQGHK QTLNLSQLSV
410 420 430 440 450
GLYVFKVTVS SENAFGEGFV NVTVKPARRV NLPPVAVVSP QLQELTLPLT
460 470 480 490 500
SALIDGSQST DDTEIVSYHW EEINGPFIEE KTSVDSPVLR LSNLDPGNYS
510 520 530 540 550
FRLTVTDSDG ATNSTTAALI VNNAVDYPPV ANAGPNHTIT LPQNSITLNG
560 570 580 590 600
NQSSDDHQIV LYEWSLGPGS EGKHVVMQGV QTPYLHLSAM QEGDYTFQLK
610 620 630 640 650
VTDSSRQQST AVVTVIVQPE NNRPPVAVAG PDKELIFPVE SATLDGSSSS
660 670 680 690 700
DDHGIVFYHW EHVRGPSAVE MENIDKAIAT VTGLQVGTYH FRLTVKDQQG
710 720 730 740 750
LSSTSTLTVA VKKENNSPPR ARAGGRHVLV LPNNSITLDG SRSTDDQRIV
760 770 780 790 800
SYLWIRDGQS PAAGDVIDGS DHSVALQLTN LVEGVYTFHL RVTDSQGASD
810 820 830 840 850
TDTATVEVQP DPRKSGLVEL TLQVGVGQLT EQRKDTLVRQ LAVLLNVLDS
860 870 880 890 900
DIKVQKIRAH SDLSTVIVFY VQSRPPFKVL KAAEVARNLH MRLSKEKADF
910 920 930 940 950
LLFKVLRVDT AGCLLKCSGH GHCDPLTKRC ICSHLWMENL IQRYIWDGES
960 970 980 990 1000
NCEWSIFYVT VLAFTLIVLT GGFTWLCICC CKRQKRTKIR KKTKYTILDN
1010 1020 1030 1040 1050
MDEQERMELR PKYGIKHRST EHNSSLMVSE SEFDSDQDTI FSREKMERGN
1060 1070
PKVSMNGSIR NGASFSYCSK DR
Length: 1,072
Mass (Da): 117,763
Last modified: December 7, 2004 - v1
Checksum: 94F33B03E7FE8C0F

Isoform 2 (identifier: Q5VV43-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-19: MAPPTGVLSSLLLLVTIAG → MTRLGWPSPC

Length: 1,063
Mass (Da): 117,057
Checksum: 144EC867C3FD7CD5

Isoform 3 (identifier: Q5VV43-3)

The sequence of this isoform differs from the canonical sequence as follows:
     1-45: Missing.

Length: 1,027
Mass (Da): 113,047
Checksum: 27D81BD05C976D80

Isoform 4 (identifier: Q5VV43-4)

The sequence of this isoform differs from the canonical sequence as follows:
     953-1013: Missing.

Note: No experimental confirmation available.
Length: 1,011
Mass (Da): 110,370
Checksum: 6B875CD6ED56D253

Sequence caution

The sequence BAA20777 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)

Experimental Info

Feature key        Position(s)  Description                             Evidence  Length
Sequence conflict  97           R → S in BAG58068 (PubMed:14702039).   Curated   1
Sequence conflict  157          S → A in BAA20777 (PubMed:9205841).    Curated   1
Sequence conflict  157          S → A in AAI52461 (PubMed:15489334).   Curated   1
Sequence conflict  256          L → H in BAG59087 (PubMed:14702039).   Curated   1
Sequence conflict  926          L → I in AAI44629 (PubMed:15489334).   Curated   1

Natural variant

Feature key      Identifier  Position(s)  Description                                                                 dbSNP       Length
Natural variant  VAR_023837  142          T → P (3 publications).                                                     rs4576240   1
Natural variant  VAR_023838  311          A → T. May be associated with susceptibility to dyslexia (3 publications).  rs4504469   1
Natural variant  VAR_049505  567          G → S.                                                                      rs2744559   1
Natural variant  VAR_049506  773          S → G.                                                                      rs2744550   1
Natural variant  VAR_049507  774          V → A.                                                                      rs2817191   1
Natural variant  VAR_034032  919          G → A.                                                                      rs10946705  1
Natural variant  VAR_049508  1013         Y → C.                                                                      rs807534    1

Alternative sequence

Feature key           Identifier  Position(s)  Description                                                 Length
Alternative sequence  VSP_036234  1 – 45       Missing in isoform 3 (1 publication).                       45
Alternative sequence  VSP_036235  1 – 19       MAPPT…VTIAG → MTRLGWPSPC in isoform 2 (2 publications).     19
Alternative sequence  VSP_044971  953 – 1013   Missing in isoform 4 (Curated).                             61
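
The isoform lengths reported in this entry follow directly from these alternative-sequence features: each isoform length is the canonical length (1,072) minus the number of residues removed plus the number inserted. A minimal sketch of that arithmetic, using only values from this entry (`isoform_length` is a hypothetical helper):

```python
CANONICAL_LENGTH = 1072

# isoform -> (start, end, replacement); coordinates are 1-based and inclusive.
# An empty replacement means the region is missing in that isoform.
ALTERNATIVE_SEQUENCES = {
    "Isoform 2": (1, 19, "MTRLGWPSPC"),
    "Isoform 3": (1, 45, ""),
    "Isoform 4": (953, 1013, ""),
}


def isoform_length(canonical_length, start, end, replacement):
    """Length after replacing canonical residues start..end with `replacement`."""
    removed = end - start + 1
    return canonical_length - removed + len(replacement)


for name, (start, end, replacement) in ALTERNATIVE_SEQUENCES.items():
    print(name, isoform_length(CANONICAL_LENGTH, start, end, replacement))
```

This gives 1,063, 1,027 and 1,011 residues for isoforms 2, 3 and 4, matching the lengths listed for each isoform.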

Sequence databases

EMBL/GenBank/DDBJ:
AB002317 mRNA. Translation: BAA20777.2. Different initiation.
AK295008 mRNA. Translation: BAG58068.1.
AK296310 mRNA. Translation: BAG59008.1.
AK296426 mRNA. Translation: BAG59087.1.
AL512385, AL031230 Genomic DNA. Translation: CAH71730.1.
AL031230, AL512385 Genomic DNA. Translation: CAI22601.1.
BC140821 mRNA. Translation: AAI40822.1.
BC144628 mRNA. Translation: AAI44629.1.
BC152460 mRNA. Translation: AAI52461.1.
CCDS: CCDS34348.1. [Q5VV43-1]
  CCDS54969.1. [Q5VV43-3]
  CCDS54970.1. [Q5VV43-2]
  CCDS54971.1. [Q5VV43-4]
RefSeq: NP_001161846.1. NM_001168374.1. [Q5VV43-2]
  NP_001161847.1. NM_001168375.1. [Q5VV43-1]
  NP_001161848.1. NM_001168376.1. [Q5VV43-3]
  NP_001161849.1. NM_001168377.1. [Q5VV43-4]
  NP_055624.2. NM_014809.3. [Q5VV43-1]
  XP_011513327.1. XM_011515025.2. [Q5VV43-1]
  XP_011513328.1. XM_011515026.2. [Q5VV43-3]
  XP_016867030.1. XM_017011541.1. [Q5VV43-2]
  XP_016867034.1. XM_017011545.1. [Q5VV43-3]
UniGene: Hs.26441.

Genome annotation databases

Ensembl: ENST00000378214; ENSP00000367459; ENSG00000137261. [Q5VV43-1]
  ENST00000430948; ENSP00000401086; ENSG00000137261. [Q5VV43-3]
  ENST00000535378; ENSP00000442403; ENSG00000137261. [Q5VV43-2]
  ENST00000537886; ENSP00000439700; ENSG00000137261. [Q5VV43-4]
  ENST00000543707; ENSP00000437656; ENSG00000137261. [Q5VV43-1]
GeneID: 9856.
KEGG: hsa:9856.
UCSC: uc003neh.2. human. [Q5VV43-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Web resources

Protein Spotlight

The twisted way of things - Issue 125 of January 2011

Organism-specific databases

CTD: 9856.
DisGeNET: 9856.
GeneCards: KIAA0319.
HGNC: HGNC:21580. KIAA0319.
HPA: HPA015607.
MalaCards: KIAA0319.
MIM: 600202. phenotype.
  609269. gene.
neXtProt: NX_Q5VV43.
OpenTargets: ENSG00000137261.
PharmGKB: PA134936721.

Miscellaneous databases

ChiTaRS: KIAA0319. human.
EvolutionaryTrace: Q5VV43.
GeneWiki: KIAA0319.
GenomeRNAi: 9856.
PRO: Q5VV43.

Entry information

Entry name: K0319_HUMAN
Accession: Primary (citable) accession number: Q5VV43
Secondary accession number(s): A7MD37, B2RTU7, B4DHA7, B4DK75, B7ZML3, F5H123, Q9UJC8, Q9Y4G7
Entry history
Integrated into UniProtKB/Swiss-Prot: November 22, 2005
Last sequence update: December 7, 2004
Last modified: November 30, 2016
This is version 125 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Reference proteome

Documents

  1. Human chromosome 6
    Human chromosome 6: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  6. Protein Spotlight
    Protein Spotlight articles and cited UniProtKB/Swiss-Prot entries
  7. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.