Protein

Clathrin heavy chain 2

Gene

CLTCL1

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Clathrin is the major protein of the polyhedral coat of coated pits and vesicles. Two different adapter protein complexes link the clathrin lattice either to the plasma membrane or to the trans-Golgi network (By similarity).

GO - Molecular function

  • signal transducer activity Source: ProtInc
  • structural molecule activity Source: InterPro

GO - Biological process

  • anatomical structure morphogenesis Source: ProtInc
  • intracellular protein transport Source: InterPro
  • mitotic nuclear division Source: UniProtKB
  • positive regulation of glucose import Source: UniProtKB
  • receptor-mediated endocytosis Source: UniProtKB
  • retrograde transport, endosome to Golgi Source: UniProtKB

Enzyme and pathway databases

BioCyc: ZFISH:ENSG00000070371-MONOMER.
Reactome: R-HSA-190873. Gap junction degradation.
R-HSA-196025. Formation of annular gap junctions.
R-HSA-3928665. EPH-ephrin mediated repulsion of cells.
R-HSA-8856825. Cargo recognition for clathrin-mediated endocytosis.
R-HSA-8856828. Clathrin-mediated endocytosis.
SIGNOR: P53675.

Names & Taxonomy

Protein names
Recommended name:
Clathrin heavy chain 2
Alternative name(s):
Clathrin heavy chain on chromosome 22
Short name:
CLH-22
Gene names
Name: CLTCL1
Synonyms: CLH22, CLTCL, CLTD
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 22

Organism-specific databases

HGNC: HGNC:2093. CLTCL1.

Subcellular location

GO - Cellular component

  • clathrin-coated pit Source: UniProtKB
  • clathrin-coated vesicle Source: UniProtKB
  • clathrin coat of trans-Golgi network vesicle Source: InterPro
  • coated vesicle Source: UniProtKB
  • cytosol Source: GOC
  • extracellular exosome Source: UniProtKB
  • late endosome Source: UniProtKB
  • membrane Source: UniProtKB
  • sorting endosome Source: UniProtKB
  • spindle Source: UniProtKB
  • trans-Golgi network Source: UniProtKB

Keywords - Cellular component

Coated pit, Cytoplasmic vesicle, Membrane

Pathology & Biotech

Organism-specific databases

DisGeNET: 8218.
OpenTargets: ENSG00000070371.
PharmGKB: PA26619.

Polymorphism and mutation databases

BioMuta: CLTCL1.
DMDM: 2506298.

PTM / Processing

Molecule processing

Feature key | Position(s) | Description | Length
Initiator methionine | - | Removed (By similarity) | -
Chain PRO_0000205786 | 2 – 1640 | Clathrin heavy chain 2 | 1639

Amino acid modifications

Feature key | Position(s) | Description | Length
Modified residue | 2 | N-acetylalanine (By similarity) | 1
Modified residue | 67 | Phosphoserine (By similarity) | 1
Modified residue | 184 | Phosphotyrosine (By similarity) | 1
Modified residue | 394 | Phosphothreonine (By similarity) | 1
Modified residue | 634 | Phosphotyrosine (By similarity) | 1
Modified residue | 737 | N6-succinyllysine (By similarity) | 1
Modified residue | 856 | N6-acetyllysine (By similarity) | 1
Modified residue | 899 | Phosphotyrosine (By similarity) | 1
Modified residue | 1167 | Phosphoserine (By similarity) | 1
Modified residue | 1206 | Phosphotyrosine (By similarity) | 1
Modified residue | 1229 | Phosphoserine (By similarity) | 1
Modified residue | 1441 | N6-acetyllysine; alternate (By similarity) | 1
Modified residue | 1441 | N6-succinyllysine; alternate (By similarity) | 1
Modified residue | 1477 | Phosphotyrosine (By similarity) | 1
Modified residue | 1487 | Phosphotyrosine (By similarity) | 1
Modified residue | 1494 | Phosphoserine (By similarity) | 1
Modified residue | 1501 | N6-acetyllysine (By similarity) | 1
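
The modified-residue positions listed above are 1-based coordinates on the canonical sequence (isoform 1), so each annotation should fall on a compatible amino acid. The following is a minimal Python sketch of that sanity check. The helper name `check_ptm_sites` and the `EXPECTED_RESIDUE` mapping are illustrative assumptions, not part of any UniProt tool, and only the first 70 residues of the sequence displayed in this entry are included.

```python
# First 70 residues of the canonical sequence (isoform 1) as displayed in this entry.
N_TERM = (
    "MAQILPVRFQ" "EHFQLQNLGI" "NPANIGFSTL" "TMESDKFICI"
    "REKVGEQAQV" "TIIDMSDPMA" "PIRRPISAES"
)

# Hypothetical mapping from modification name to the one-letter residue it modifies.
EXPECTED_RESIDUE = {
    "N-acetylalanine": "A",
    "Phosphoserine": "S",
    "Phosphothreonine": "T",
    "Phosphotyrosine": "Y",
    "N6-acetyllysine": "K",
    "N6-succinyllysine": "K",
}


def check_ptm_sites(sequence, sites):
    """Return the (position, name) pairs whose residue does not match."""
    mismatches = []
    for position, name in sites:
        residue = sequence[position - 1]  # convert 1-based position to 0-based index
        if residue != EXPECTED_RESIDUE[name]:
            mismatches.append((position, name))
    return mismatches


# Two sites from the list above that fall inside the 70-residue fragment.
sites = [(2, "N-acetylalanine"), (67, "Phosphoserine")]
print(check_ptm_sites(N_TERM, sites))  # [] means every site matched
```

Checking the remaining sites would require the full 1,640-residue sequence in place of `N_TERM`.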

Keywords - PTM

Acetylation, Phosphoprotein

Proteomic databases

EPD: P53675.
MaxQB: P53675.
PaxDb: P53675.
PeptideAtlas: P53675.
PRIDE: P53675.

PTM databases

iPTMnet: P53675.
PhosphoSitePlus: P53675.
SwissPalm: P53675.

Expression

Tissue specificity

Maximal levels in skeletal muscle. High levels in heart and testis. Low expression detected in all other tissues.

Gene expression databases

Bgee: ENSG00000070371.
CleanEx: HS_CLTCL1.
ExpressionAtlas: P53675. baseline and differential.
Genevisible: P53675. HS.

Interaction

Subunit structure

Clathrin triskelions, composed of 3 heavy chains and 3 light chains, are the basic subunits of the clathrin coat. In the presence of light chains, hub assembly is influenced by both the pH and the concentration of calcium (By similarity). May interact with OCRL (By similarity).

Protein-protein interaction databases

BioGrid: 113854. 73 interactors.
IntAct: P53675. 27 interactors.
MINT: MINT-208273.
STRING: 9606.ENSP00000445677.

Structure

3D structure databases

ProteinModelPortal: P53675.

Family & Domains

Domains and Repeats

Feature key | Position(s) | Description | Length
Repeat | 537 – 683 | CHCR 1 | 147
Repeat | 686 – 828 | CHCR 2 | 143
Repeat | 833 – 972 | CHCR 3 | 140
Repeat | 979 – 1124 | CHCR 4 | 146
Repeat | 1128 – 1269 | CHCR 5 | 142
Repeat | 1274 – 1420 | CHCR 6 | 147
Repeat | 1423 – 1566 | CHCR 7 | 144

Region

Feature key | Position(s) | Description | Length
Region | 2 – 479 | Globular terminal domain | 478
Region | 24 – 67 | WD40-like repeat 1 | 44
Region | 68 – 107 | WD40-like repeat 2 | 40
Region | 108 – 149 | WD40-like repeat 3 | 42
Region | 150 – 195 | WD40-like repeat 4 | 46
Region | 196 – 257 | WD40-like repeat 5 | 62
Region | 258 – 301 | WD40-like repeat 6 | 44
Region | 302 – 330 | WD40-like repeat 7 | 29
Region | 449 – 465 | Binding site for the uncoating ATPase, involved in lattice disassembly (Sequence analysis) | 17
Region | 480 – 523 | Flexible linker | 44
Region | 524 – 1640 | Heavy chain arm | 1117
Region | 524 – 634 | Distal segment | 111
Region | 639 – 1640 | Proximal segment | 1002
Region | 1213 – 1522 | Involved in binding clathrin light chain (By similarity) | 310
Region | 1551 – 1640 | Trimerization (By similarity) | 90
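
The length column in the tables above follows from the 1-based, inclusive position ranges: length = end - start + 1. A short Python sketch of that arithmetic is given below. The function name `feature_length` is an illustrative assumption, and the tuples are copied from a few rows of the tables rather than fetched from any API.

```python
def feature_length(start, end):
    """Length of a feature given 1-based inclusive start and end positions."""
    return end - start + 1


# (description, start, end, listed length) copied from the tables above.
features = [
    ("CHCR 1", 537, 683, 147),
    ("CHCR 7", 1423, 1566, 144),
    ("Globular terminal domain", 2, 479, 478),
    ("Flexible linker", 480, 523, 44),
    ("Heavy chain arm", 524, 1640, 1117),
    ("Trimerization", 1551, 1640, 90),
]

for name, start, end, listed in features:
    assert feature_length(start, end) == listed, name

# The terminal domain, linker and arm tile the mature chain (positions 2 to 1640).
tiled = sum(feature_length(s, e) for _, s, e, _ in features[2:5])
print(tiled)  # 1639
```

The total of 1639 agrees with the chain length given under molecule processing, since the initiator methionine at position 1 is removed.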

Domain

The C-terminal third of the heavy chains forms the hub of the triskelion. This region contains the trimerization domain and the light-chain binding domain involved in the assembly of the clathrin lattice.
The N-terminal seven-bladed beta-propeller is formed by WD40-like repeats, and projects inward from the polyhedral outer clathrin coat. It constitutes a major protein-protein interaction node (By similarity).

Sequence similarities

Belongs to the clathrin heavy chain family. (Curated)
Contains 7 CHCR (clathrin heavy-chain) repeats. (PROSITE-ProRule annotation)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: KOG0985. Eukaryota.
ENOG410XPH1. LUCA.
GeneTree: ENSGT00400000022107.
HOGENOM: HOG000188877.
HOVERGEN: HBG005344.
InParanoid: P53675.
KO: K04646.
OMA: GVMKISP.
OrthoDB: EOG091G009O.
PhylomeDB: P53675.
TreeFam: TF300059.

Family and domain databases

Gene3D: 1.25.40.10. 4 hits.
2.130.10.110. 1 hit.
InterPro: IPR016024. ARM-type_fold.
IPR000547. Clathrin_H-chain/VPS_repeat.
IPR016025. Clathrin_H-chain_link/propller.
IPR015348. Clathrin_H-chain_linker_core.
IPR001473. Clathrin_H-chain_propeller_N.
IPR022365. Clathrin_H-chain_propeller_rpt.
IPR016341. Clathrin_heavy_chain.
IPR011990. TPR-like_helical_dom.
Pfam: PF00637. Clathrin. 7 hits.
PF09268. Clathrin-link. 1 hit.
PF01394. Clathrin_propel. 4 hits.
PIRSF: PIRSF002290. Clathrin_H_chain. 1 hit.
SMART: SM00299. CLH. 7 hits.
SUPFAM: SSF48371. SSF48371. 6 hits.
SSF50989. SSF50989. 1 hit.
PROSITE: PS50236. CHCR. 7 hits.

Sequences (2)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: P53675-1)
Also known as: Long, Brain

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MAQILPVRFQ EHFQLQNLGI NPANIGFSTL TMESDKFICI REKVGEQAQV
60 70 80 90 100
TIIDMSDPMA PIRRPISAES AIMNPASKVI ALKAGKTLQI FNIEMKSKMK
110 120 130 140 150
AHTMAEEVIF WKWVSVNTVA LVTETAVYHW SMEGDSQPMK MFDRHTSLVG
160 170 180 190 200
CQVIHYRTDE YQKWLLLVGI SAQQNRVVGA MQLYSVDRKV SQPIEGHAAA
210 220 230 240 250
FAEFKMEGNA KPATLFCFAV RNPTGGKLHI IEVGQPAAGN QPFVKKAVDV
260 270 280 290 300
FFPPEAQNDF PVAMQIGAKH GVIYLITKYG YLHLYDLESG VCICMNRISA
310 320 330 340 350
DTIFVTAPHK PTSGIIGVNK KGQVLSVCVE EDNIVNYATN VLQNPDLGLR
360 370 380 390 400
LAVRSNLAGA EKLFVRKFNT LFAQGSYAEA AKVAASAPKG ILRTRETVQK
410 420 430 440 450
FQSIPAQSGQ ASPLLQYFGI LLDQGQLNKL ESLELCHLVL QQGRKQLLEK
460 470 480 490 500
WLKEDKLECS EELGDLVKTT DPMLALSVYL RANVPSKVIQ CFAETGQFQK
510 520 530 540 550
IVLYAKKVGY TPDWIFLLRG VMKISPEQGL QFSRMLVQDE EPLANISQIV
560 570 580 590 600
DIFMENSLIQ QCTSFLLDAL KNNRPAEGLL QTWLLEMNLV HAPQVADAIL
610 620 630 640 650
GNKMFTHYDR AHIAQLCEKA GLLQQALEHY TDLYDIKRAV VHTHLLNPEW
660 670 680 690 700
LVNFFGSLSV EDSVECLHAM LSANIRQNLQ LCVQVASKYH EQLGTQALVE
710 720 730 740 750
LFESFKSYKG LFYFLGSIVN FSQDPDVHLK YIQAACKTGQ IKEVERICRE
760 770 780 790 800
SSCYNPERVK NFLKEAKLTD QLPLIIVCDR FGFVHDLVLY LYRNNLQRYI
810 820 830 840 850
EIYVQKVNPS RTPAVIGGLL DVDCSEEVIK HLIMAVRGQF STDELVAEVE
860 870 880 890 900
KRNRLKLLLP WLESQIQEGC EEPATHNALA KIYIDSNNSP ECFLRENAYY
910 920 930 940 950
DSSVVGRYCE KRDPHLACVA YERGQCDLEL IKVCNENSLF KSEARYLVCR
960 970 980 990 1000
KDPELWAHVL EETNPSRRQL IDQVVQTALS ETRDPEEISV TVKAFMTADL
1010 1020 1030 1040 1050
PNELIELLEK IVLDNSVFSE HRNLQNLLIL TAIKADRTRV MEYISRLDNY
1060 1070 1080 1090 1100
DALDIASIAV SSALYEEAFT VFHKFDMNAS AIQVLIEHIG NLDRAYEFAE
1110 1120 1130 1140 1150
RCNEPAVWSQ LAQAQLQKDL VKEAINSYIR GDDPSSYLEV VQSASRSNNW
1160 1170 1180 1190 1200
EDLVKFLQMA RKKGRESYIE TELIFALAKT SRVSELEDFI NGPNNAHIQQ
1210 1220 1230 1240 1250
VGDRCYEEGM YEAAKLLYSN VSNFARLAST LVHLGEYQAA VDNSRKASST
1260 1270 1280 1290 1300
RTWKEVCFAC MDGQEFRFAQ LCGLHIVIHA DELEELMCYY QDRGYFEELI
1310 1320 1330 1340 1350
LLLEAALGLE RAHMGMFTEL AILYSKFKPQ KMLEHLELFW SRVNIPKVLR
1360 1370 1380 1390 1400
AAEQAHLWAE LVFLYDKYEE YDNAVLTMMS HPTEAWKEGQ FKDIITKVAN
1410 1420 1430 1440 1450
VELCYRALQF YLDYKPLLIN DLLLVLSPRL DHTWTVSFFS KAGQLPLVKP
1460 1470 1480 1490 1500
YLRSVQSHNN KSVNEALNHL LTEEEDYQGL RASIDAYDNF DNISLAQQLE
1510 1520 1530 1540 1550
KHQLMEFRCI AAYLYKGNNW WAQSVELCKK DHLYKDAMQH AAESRDAELA
1560 1570 1580 1590 1600
QKLLQWFLEE GKRECFAACL FTCYDLLRPD MVLELAWRHN LVDLAMPYFI
1610 1620 1630 1640
QVMREYLSKV DKLDALESLR KQEEHVTEPA PLVFDFDGHE
Length: 1,640
Mass (Da): 187,030
Last modified: November 1, 1997 - v2
Checksum: C661E1AB989D8E7F

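
The sequence display above interleaves residue rows (ten-residue groups separated by spaces) with position rulers. To work with the sequence programmatically, the rulers and spaces have to be stripped. The following is a minimal Python sketch of that clean-up; the function name `clean_sequence_block` is an illustrative assumption, and only the first two rows of the display are used as input.

```python
def clean_sequence_block(lines):
    """Join residue lines, skipping ruler lines made only of digits and spaces."""
    residues = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.replace(" ", "").isdigit():
            continue  # blank line or position ruler
        residues.append(stripped.replace(" ", ""))
    return "".join(residues)


# First two rows of the display above, with their rulers.
block = [
    "        10         20         30         40         50",
    "MAQILPVRFQ EHFQLQNLGI NPANIGFSTL TMESDKFICI REKVGEQAQV",
    "60 70 80 90 100",
    "TIIDMSDPMA PIRRPISAES AIMNPASKVI ALKAGKTLQI FNIEMKSKMK",
]
seq = clean_sequence_block(block)
print(len(seq))  # 100
```

Run over all rows of the display, the same function should yield a string whose length matches the stated 1,640 residues.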
Isoform 2 (identifier: P53675-2)
Also known as: Short, Muscle

The sequence of this isoform differs from the canonical sequence as follows:
     1479-1535: Missing.

Length: 1,583
Mass (Da): 180,296
Checksum: 5DB1E3D4A77607D8
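
Isoform 2 is described only as a difference from the canonical sequence, so its length can be derived from the numbers stated in this entry. The Python sketch below shows the deletion on a toy string and then the length bookkeeping; `delete_range` is a hypothetical helper name, and positions are treated as 1-based and inclusive, matching the convention of this entry.

```python
def delete_range(sequence, start, end):
    """Remove residues start..end (1-based, inclusive) from sequence."""
    return sequence[: start - 1] + sequence[end:]


# Toy example: drop positions 3 to 5 of a 7-residue string.
print(delete_range("ABCDEFG", 3, 5))  # ABFG

# Length bookkeeping for isoform 2, using the numbers stated in this entry.
canonical_length = 1640
missing_start, missing_end = 1479, 1535
removed = missing_end - missing_start + 1
print(removed, canonical_length - removed)  # 57 1583
```

The result agrees with the 57-residue alternative sequence feature and the isoform 2 length of 1,583 given in the entry.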

Experimental Info

Feature key | Position(s) | Description | Length
Sequence conflict | 193 | P → H in AAC50494 (PubMed:8733128) (Curated) | 1
Sequence conflict | 215 | L → H in AAC50494 (PubMed:8733128) (Curated) | 1
Sequence conflict | 320 | K → T in CAA64752 (PubMed:8733129) (Curated) | 1
Sequence conflict | 530 | L → Q in AAC50494 (PubMed:8733128) (Curated) | 1
Sequence conflict | 1474 | E → K in AAB40908 (PubMed:8844170) (Curated) | 1
Sequence conflict | 1474 | E → K in AAB40909 (PubMed:8844170) (Curated) | 1
Sequence conflict | 1620 – 1640 | RKQEE…FDGHE → PPSKRSM in AAB40908 (PubMed:8844170) (Curated) | 21
Sequence conflict | 1620 – 1640 | RKQEE…FDGHE → PPSKRSM in AAB40909 (PubMed:8844170) (Curated) | 21

Natural variant

Feature key | Position(s) | Description | Length
Natural variant VAR_055653 | 61 | P → L. Corresponds to variant rs3747059 (dbSNP, Ensembl). | 1
Natural variant VAR_055654 | 205 | K → R. Corresponds to variant rs5746697 (dbSNP, Ensembl). | 1
Natural variant VAR_055655 | 279 | Y → C. Corresponds to variant rs807459 (dbSNP, Ensembl). | 1
Natural variant VAR_055656 | 691 | E → K (1 Publication). Corresponds to variant rs1060374 (dbSNP, Ensembl). | 1
Natural variant VAR_055657 | 941 | K → R. Corresponds to variant rs35398725 (dbSNP, Ensembl). | 1
Natural variant VAR_055658 | 945 | R → H. Corresponds to variant rs36077768 (dbSNP, Ensembl). | 1
Natural variant VAR_055659 | 1046 | R → C. Corresponds to variant rs712952 (dbSNP, Ensembl). | 1
Natural variant VAR_059214 | 1195 | N → S. Corresponds to variant rs807547 (dbSNP, Ensembl). | 1
Natural variant VAR_059215 | 1316 | M → V. Corresponds to variant rs1061325 (dbSNP, Ensembl). | 1
Natural variant VAR_059216 | 1394 | I → T. Corresponds to variant rs1633399 (dbSNP, Ensembl). | 1
Natural variant VAR_059217 | 1592 | V → M. Corresponds to variant rs2073738 (dbSNP, Ensembl). | 1
Natural variant VAR_059218 | 1620 | R → H. Corresponds to variant rs5748024 (dbSNP, Ensembl). | 1

Alternative sequence

Feature key | Position(s) | Description | Length
Alternative sequence VSP_001100 | 1479 – 1535 | Missing in isoform 2 (3 Publications). | 57

Sequence databases

EMBL / GenBank / DDBJ:
U41763 mRNA. Translation: AAC50494.1.
X95486 mRNA. Translation: CAA64752.1.
X95487 mRNA. Translation: CAA64753.1.
U60802 mRNA. Translation: AAB40908.1.
U60803 mRNA. Translation: AAB40909.1.
AK302506 mRNA. Translation: BAH13731.1.
CH471176 Genomic DNA. Translation: EAX03047.1.
CCDS: CCDS46662.2. [P53675-1]
CCDS54497.2. [P53675-2]
PIR: G02757.
T09522.
RefSeq: NP_001826.3. NM_001835.3. [P53675-2]
NP_009029.3. NM_007098.3. [P53675-1]
UniGene: Hs.368266.

Genome annotation databases

Ensembl: ENST00000427926; ENSP00000441158; ENSG00000070371. [P53675-1]
ENST00000621271; ENSP00000485020; ENSG00000070371. [P53675-2]
GeneID: 8218.
KEGG: hsa:8218.
UCSC: uc032qgb.2. human. [P53675-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Web resources

Atlas of Genetics and Cytogenetics in Oncology and Haematology

Protocols and materials databases

DNASU: 8218.

Organism-specific databases

CTD: 8218.
DisGeNET: 8218.
GeneCards: CLTCL1.
H-InvDB: HIX0016234.
HGNC: HGNC:2093. CLTCL1.
MIM: 601273. gene.
neXtProt: NX_P53675.
OpenTargets: ENSG00000070371.
PharmGKB: PA26619.

Miscellaneous databases

ChiTaRS: CLTCL1. human.
GenomeRNAi: 8218.
PRO: P53675.

Entry information

Entry name: CLH2_HUMAN
Accession: Primary (citable) accession number: P53675
Secondary accession number(s): B7Z7U5, Q14017, Q15808, Q15809
Entry history
Integrated into UniProtKB/Swiss-Prot: October 1, 1996
Last sequence update: November 1, 1997
Last modified: November 30, 2016
This is version 156 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 22
    Human chromosome 22: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
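
The UniRef90 and UniRef50 criteria above combine two thresholds, identity and overlap with the seed sequence. The Python sketch below illustrates only that two-part test and is not the UniRef clustering procedure itself. The function name `joins_cluster` is hypothetical, the identity value is assumed to come from a prior alignment, and defining overlap as aligned length divided by seed length is an assumption made for illustration. The input numbers are made up and are not actual UniRef data.

```python
def joins_cluster(identity, aligned_length, seed_length,
                  min_identity=0.90, min_overlap=0.80):
    """Return True if a sequence meets both thresholds against the seed.

    identity: fraction of identical positions in the alignment (assumed given).
    Overlap is taken here as aligned_length / seed_length (an assumption).
    """
    overlap = aligned_length / seed_length
    return identity >= min_identity and overlap >= min_overlap


# Illustrative inputs only.
print(joins_cluster(0.95, 1583, 1640))  # True: both thresholds met
print(joins_cluster(0.95, 1000, 1640))  # False: overlap below 80%
print(joins_cluster(0.85, 1640, 1640))  # False: identity below 90%
```

Passing `min_identity=0.50` gives the corresponding test for the UniRef50 thresholds.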