Protein

Centrosomal protein of 162 kDa

Gene

CEP162

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Required to promote assembly of the transition zone in primary cilia. Acts by specifically recognizing and binding axonemal microtubules. Localizes to the distal ends of centrioles before ciliogenesis and binds directly to axonemal microtubules, thereby promoting transition zone formation and restricting it to the cilia base. Required to mediate CEP290 association with microtubules. (1 publication)

GO - Biological process

  • cilium assembly Source: UniProtKB

Keywords - Biological process

Cilium biogenesis/degradation

Enzyme and pathway databases

Reactome: R-HSA-5620912. Anchoring of the basal body to the plasma membrane.

Names & Taxonomy

Protein names
Recommended name: Centrosomal protein of 162 kDa
Short name: Cep162
Alternative name(s): Protein QN1 homolog
Gene names
Name: CEP162
Synonyms: C6orf84, KIAA1009, QN1
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 6

Organism-specific databases

HGNC: HGNC:21107. CEP162.

Subcellular location

GO - Cellular component

  • axonemal microtubule Source: UniProtKB
  • centriole Source: UniProtKB
  • centrosome Source: UniProtKB
  • cytoplasm Source: HPA
  • cytosol Source: Reactome
  • nucleoplasm Source: HPA
  • spindle Source: UniProtKB-SubCell

Keywords - Cellular component

Cytoplasm, Cytoskeleton, Microtubule, Nucleus

Pathology & Biotech

Organism-specific databases

PharmGKB: PA134972331.

Polymorphism and mutation databases

BioMuta: KIAA1009.
DMDM: 156630849.

PTM / Processing

Molecule processing

Feature key | Position(s) | Length | Description | Feature identifier
Chain | 1 – 1403 | 1403 | Centrosomal protein of 162 kDa | PRO_0000295628

Amino acid modifications

Feature key | Position(s) | Length | Description
Modified residue | 157 – 157 | 1 | Phosphoserine (By similarity)
Modified residue | 160 – 160 | 1 | Phosphoserine (By similarity)
Modified residue | 474 – 474 | 1 | Phosphoserine (Combined sources)
Modified residue | 475 – 475 | 1 | Phosphoserine (Combined sources)
Cross-link | 908 – 908 | 1 | Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin) (1 publication)

Keywords - PTM

Isopeptide bond, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q5TB80.
MaxQB: Q5TB80.
PaxDb: Q5TB80.
PeptideAtlas: Q5TB80.
PRIDE: Q5TB80.

PTM databases

iPTMnet: Q5TB80.
PhosphoSite: Q5TB80.

Expression

Gene expression databases

Bgee: Q5TB80.
CleanEx: HS_KIAA1009.
ExpressionAtlas: Q5TB80. baseline and differential.
Genevisible: Q5TB80. HS.

Organism-specific databases

HPA: HPA030170.
    HPA030171.
    HPA030172.
    HPA030173.

Interaction

Subunit structure

Interacts with CEP290 (PubMed:23644468). Interacts with CPNE4 (By similarity). Interacts with alpha-tubulin (By similarity).

Binary interactions

With | Entry | #Exp. | IntAct
CEP120 | Q8N960 | 6 | EBI-1059012, EBI-2563015
CEP135 | Q66GS9 | 3 | EBI-1059012, EBI-1046993

Protein-protein interaction databases

BioGrid: 116506. 142 interactions.
DIP: DIP-50714N.
IntAct: Q5TB80. 143 interactions.
MINT: MINT-8417677.
STRING: 9606.ENSP00000385215.

Structure

3D structure databases

ProteinModelPortal: Q5TB80.
ModBase: Search...
MobiDB: Search...

Family & Domains

Coiled coil

Feature key | Position(s) | Length | Description
Coiled coil | 617 – 670 | 54 | Sequence analysis
Coiled coil | 698 – 1121 | 424 | Sequence analysis
Coiled coil | 1171 – 1206 | 36 | Sequence analysis
Coiled coil | 1235 – 1386 | 152 | Sequence analysis
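
The Length column in the coiled-coil table follows from the positions, which are 1-based and inclusive. A minimal sketch of that arithmetic, assuming the four segments and the canonical length of 1,403 residues listed in this entry (the names `COILED_COILS` and `feature_length` are illustrative, not part of any UniProt tool):

```python
# Coiled-coil segments as (start, end), copied from the table in this entry.
COILED_COILS = [(617, 670), (698, 1121), (1171, 1206), (1235, 1386)]
CANONICAL_LENGTH = 1403  # length of isoform 1, the canonical sequence


def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive positions."""
    return end - start + 1


lengths = [feature_length(start, end) for start, end in COILED_COILS]
total = sum(lengths)

print(lengths)  # [54, 424, 36, 152]
print(total)    # 666
print(f"{100 * total / CANONICAL_LENGTH:.0f}% of the canonical sequence")
```

By this count, predicted coiled coils span 666 residues, a little under half of the canonical sequence.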

Sequence similarities

Belongs to the CEP162 family. (Curated)

Keywords - Domain

Coiled coil

Phylogenomic databases

eggNOG: ENOG410IG8C. Eukaryota.
    ENOG410XQ44. LUCA.
GeneTree: ENSGT00390000009631.
HOGENOM: HOG000090261.
HOVERGEN: HBG108387.
InParanoid: Q5TB80.
KO: K16809.
OMA: QKMKIQY.
OrthoDB: EOG7MPRDZ.
PhylomeDB: Q5TB80.
TreeFam: TF330884.

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q5TB80-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MANCSQEELD EEFEQFMKEL SDDSFENSDK TARQSKKEMK KKDTVPWWIT
60 70 80 90 100
EDDFKDDGLL GTNVSYLKTK KTSQPVMEIE EESAEKIQFL KSSGTSLLST
110 120 130 140 150
DSLETNELVV SELNHSSLGV GLDTLEEQEE KEQFFARLEK GLTSSIDYSR
160 170 180 190 200
LNKELDSNDS THFKALHSNQ ANAELTDDEH ENESKHEELA ENYSDDFEDE
210 220 230 240 250
YVGAPLTTKD EEMPSKENSK SEKISVPKQE EEKTGMLANV VLLDSLDSVA
260 270 280 290 300
EVNLDEQDKI TPKPRCLPEM TENEMTGTGV SYGQSSSDVE ALHQAYCHIA
310 320 330 340 350
HSLGDEDKQK IESNTVEDIK SSVKGHPQEN EENSKNISTM ESDLPTVEEL
360 370 380 390 400
MKPIRIDSFG ISGFDLQPVS SEKVAERKET EFFSSLPLKM NPNILSQDSQ
410 420 430 440 450
HVNLFFDKND ENVILQKTTN ESMENSCPQV TEVTATEEHV DKMYLNILRK
460 470 480 490 500
KITVNSSSLS QDDKINKTYR SQLSSEEEGA VMGKQVPYKK ARSAPPLLKR
510 520 530 540 550
KPQSGLYASV RSSGYGKPSS PLKMFSTLEK KTSEDIIKSK NLRSISTSNQ
560 570 580 590 600
PRKKEILSGT KLIKPAALDK PAHKTESCLS TRKKSENPTE TDSCIQFQTD
610 620 630 640 650
SLGYCGENKE KKLLMFKRVQ EAEDKWRGAQ ALIEQIKATF SEKEKELENK
660 670 680 690 700
LEELKKQQEK ELFKLNQDNY ILQAKLSSFE ETNKKQRWLH FGEAADPVTG
710 720 730 740 750
EKLKQIQKEI QEQETLLQGY QQENERLYNQ VKDLQEQNKK NEERMFKENQ
760 770 780 790 800
SLFSEVASLK EQMHKSRFLS QVVEDSEPTR NQNFTDLLAE LRMAQKEKDS
810 820 830 840 850
LLEDIKRLKQ DKQALEVDFE KMKKERDQAK DQIAYVTGEK LYEIKILEET
860 870 880 890 900
HKQEISRLQK RLQWYAENQE LLDKDALRLR EANEEIEKLK LEIEKLKAES
910 920 930 940 950
GNPSIRQKIR LKDKAADAKK IQDLERQVKE MEGILKRRYP NSLPALILAA
960 970 980 990 1000
SAAGDTVDKN TVEFMEKRIK KLEADLEGKD EDAKKSLRTM EQQFQKMKIQ
1010 1020 1030 1040 1050
YEQRLEQQEQ LLACKLNQHD SPRIKALEKE LDDIKEAHQI TVRNLEAEID
1060 1070 1080 1090 1100
VLKHQNAELD VKKNDKDDED FQSIEFQVEQ AHAKAKLVRL NEELAAKKRE
1110 1120 1130 1140 1150
IQDLSKTVER LQKDRRMMLS NQNSKGREEM SAKRAKKDVL HSSKGNANSF
1160 1170 1180 1190 1200
PGTLDSKLYQ PHTFTDSHVS EVLQENYRLK NELEGLISEK NELKMKSEAV
1210 1220 1230 1240 1250
MNQFENSMRR VKEDTAAHIA SLKASHQREI EKLLCQNAVE NSSSKVAELN
1260 1270 1280 1290 1300
RKIATQEVLI RHFQSQVNEL QSKQESLVVS EVREEILQKE ITKLLEELRE
1310 1320 1330 1340 1350
AKENHTPEMK HFVGLEKKIK QMEMRHAQRE QELQQIIQQT HQVVETEQNK
1360 1370 1380 1390 1400
EVEKWKRLAQ LKNRELEKFR TELDSILDVL RELHRQGVVV PVAFADEMNA

PEY
Note: No experimental confirmation available.
Length: 1,403
Mass (Da): 161,943
Last modified: July 24, 2007 - v2
Checksum: 93638BE4BC3A4B29
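
The sequence above is displayed in blocks of ten residues with position rulers between the rows. To work with it programmatically, the rulers and spacing have to be stripped first. A minimal sketch, assuming the block has been copied as plain text (the helper name `strip_ruler` is hypothetical; only the first displayed row is used as sample input):

```python
import re


def strip_ruler(block: str) -> str:
    """Return the bare residue string from a ruler-annotated sequence block.

    Only runs of letters are kept, so ruler numbers and whitespace are dropped.
    """
    return "".join(re.findall(r"[A-Za-z]+", block)).upper()


# First displayed row of isoform 1, together with its ruler line.
example = """
        10         20         30         40         50
MANCSQEELD EEFEQFMKEL SDDSFENSDK TARQSKKEMK KKDTVPWWIT
"""

sequence = strip_ruler(example)
print(len(sequence))  # 50
print(sequence[:10])  # MANCSQEELD
```

Applied to the full block, the result should contain 1,403 residues, matching the Length field; a different count would suggest that the copy was truncated.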
Isoform 2 (identifier: Q5TB80-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-76: Missing.

Note: No experimental confirmation available.
Length: 1,327
Mass (Da): 153,060
Checksum: 6F094643A2DE802B
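
The isoform 2 description ("1-76: Missing") can be applied mechanically to the canonical sequence. The sketch below is an illustration under stated assumptions: `apply_missing` is a hypothetical helper, positions are treated as 1-based and inclusive, and only the first 100 canonical residues are spelled out, taken from the sequence listed for isoform 1.

```python
def apply_missing(canonical: str, start: int, end: int) -> str:
    """Delete residues start..end (1-based, inclusive) from a sequence."""
    if not 1 <= start <= end <= len(canonical):
        raise ValueError("range lies outside the sequence")
    return canonical[: start - 1] + canonical[end:]


# First 100 residues of the canonical isoform, as displayed in this entry.
FIRST_100 = (
    "MANCSQEELDEEFEQFMKELSDDSFENSDKTARQSKKEMKKKDTVPWWIT"
    "EDDFKDDGLLGTNVSYLKTKKTSQPVMEIEEESAEKIQFLKSSGTSLLST"
)

# Removing residues 1-76 leaves residues 77-100 of this fragment.
remainder = apply_missing(FIRST_100, 1, 76)
print(remainder)  # MEIEEESAEKIQFLKSSGTSLLST

# Length check on a placeholder of the canonical length (1403 residues).
isoform2_length = len(apply_missing("X" * 1403, 1, 76))
print(isoform2_length)  # 1327
```

The computed length of 1,327 agrees with the Length field for isoform 2, and the remaining sequence begins with the methionine at canonical position 77.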

Sequence caution

The sequence BAA76853.2 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)

Experimental Info

Feature key | Position(s) | Length | Description
Sequence conflict | 168 – 168 | 1 | S → I in BAA76853 (PubMed:10231032). (Curated)

Natural variant

Feature key | Position(s) | Length | Description | Feature identifier
Natural variant | 266 – 266 | 1 | C → S. Corresponds to variant rs17790493 [dbSNP, Ensembl]. | VAR_033301
Natural variant | 272 – 272 | 1 | E → Q. Corresponds to variant rs16874323 [dbSNP, Ensembl]. | VAR_033302
Natural variant | 342 – 342 | 1 | S → C. Corresponds to variant rs17790493 [dbSNP, Ensembl]. | VAR_051293
Natural variant | 348 – 348 | 1 | E → Q. Corresponds to variant rs16874323 [dbSNP, Ensembl]. | VAR_051294

Alternative sequence

Feature key | Position(s) | Length | Description | Feature identifier
Alternative sequence | 1 – 76 | 76 | Missing in isoform 2. (1 publication) | VSP_026953

Sequence databases

EMBL / GenBank / DDBJ: AB023226 mRNA. Translation: BAA76853.2. Different initiation.
    AL138742 Genomic DNA. Translation: CAI22698.2.
    AL138742 Genomic DNA. Translation: CAO03537.1.
CCDS: CCDS34494.2. [Q5TB80-1]
    CCDS69149.1. [Q5TB80-2]
RefSeq: NP_001273135.1. NM_001286206.1. [Q5TB80-2]
    NP_055710.2. NM_014895.3. [Q5TB80-1]
    XP_006715443.1. XM_006715380.2. [Q5TB80-2]
    XP_011533893.1. XM_011535591.1. [Q5TB80-2]
UniGene: Hs.485865.

Genome annotation databases

Ensembl: ENST00000257766; ENSP00000257766; ENSG00000135315. [Q5TB80-2]
    ENST00000403245; ENSP00000385215; ENSG00000135315. [Q5TB80-1]
    ENST00000617909; ENSP00000481760; ENSG00000135315. [Q5TB80-2]
GeneID: 22832.
KEGG: hsa:22832.
UCSC: uc003pkj.6. human. [Q5TB80-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

DNASU: 22832.
Structural Biology Knowledgebase: Search...

Organism-specific databases

CTD: 22832.
GeneCards: CEP162.
H-InvDB: HIX0006041.
HGNC: HGNC:21107. CEP162.
HPA: HPA030170.
    HPA030171.
    HPA030172.
    HPA030173.
MIM: 610201. gene.
neXtProt: NX_Q5TB80.
PharmGKB: PA134972331.
HUGE: Search...
GenAtlas: Search...

Miscellaneous databases

GenomeRNAi: 22832.
PRO: Q5TB80.
SOURCE: Search...

Family and domain databases

ProtoNet: Search...

Publications

  1. "Prediction of the coding sequences of unidentified human genes. XIII. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro."
    Nagase T., Ishikawa K., Suyama M., Kikuno R., Hirosawa M., Miyajima N., Tanaka A., Kotani H., Nomura N., Ohara O.
    DNA Res. 6:63-70(1999)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
    Tissue: Brain.
  2. Ohara O., Nagase T., Kikuno R.
    Submitted (JAN-2004) to the EMBL/GenBank/DDBJ databases
    Cited for: SEQUENCE REVISION.
  3. "The DNA sequence and analysis of human chromosome 6."
    Mungall A.J., Palmer S.A., Sims S.K., Edwards C.A., Ashurst J.L., Wilming L., Jones M.C., Horton R., Hunt S.E., Scott C.E., Gilbert J.G.R., Clamp M.E., Bethel G., Milne S., Ainscough R., Almeida J.P., Ambrose K.D., Andrews T.D., Ashwell R.I.S., Babbage A.K., Bagguley C.L., Bailey J., Banerjee R., Barker D.J., Barlow K.F., Bates K., Beare D.M., Beasley H., Beasley O., Bird C.P., Blakey S.E., Bray-Allen S., Brook J., Brown A.J., Brown J.Y., Burford D.C., Burrill W., Burton J., Carder C., Carter N.P., Chapman J.C., Clark S.Y., Clark G., Clee C.M., Clegg S., Cobley V., Collier R.E., Collins J.E., Colman L.K., Corby N.R., Coville G.J., Culley K.M., Dhami P., Davies J., Dunn M., Earthrowl M.E., Ellington A.E., Evans K.A., Faulkner L., Francis M.D., Frankish A., Frankland J., French L., Garner P., Garnett J., Ghori M.J., Gilby L.M., Gillson C.J., Glithero R.J., Grafham D.V., Grant M., Gribble S., Griffiths C., Griffiths M.N.D., Hall R., Halls K.S., Hammond S., Harley J.L., Hart E.A., Heath P.D., Heathcott R., Holmes S.J., Howden P.J., Howe K.L., Howell G.R., Huckle E., Humphray S.J., Humphries M.D., Hunt A.R., Johnson C.M., Joy A.A., Kay M., Keenan S.J., Kimberley A.M., King A., Laird G.K., Langford C., Lawlor S., Leongamornlert D.A., Leversha M., Lloyd C.R., Lloyd D.M., Loveland J.E., Lovell J., Martin S., Mashreghi-Mohammadi M., Maslen G.L., Matthews L., McCann O.T., McLaren S.J., McLay K., McMurray A., Moore M.J.F., Mullikin J.C., Niblett D., Nickerson T., Novik K.L., Oliver K., Overton-Larty E.K., Parker A., Patel R., Pearce A.V., Peck A.I., Phillimore B.J.C.T., Phillips S., Plumb R.W., Porter K.M., Ramsey Y., Ranby S.A., Rice C.M., Ross M.T., Searle S.M., Sehra H.K., Sheridan E., Skuce C.D., Smith S., Smith M., Spraggon L., Squares S.L., Steward C.A., Sycamore N., Tamlyn-Hall G., Tester J., Theaker A.J., Thomas D.W., Thorpe A., Tracey A., Tromans A., Tubby B., Wall M., Wallis J.M., West A.P., White S.S., Whitehead S.L., Whittaker H., Wild A., Willey D.J., Wilmer T.E., Wood J.M., Wray P.W., Wyatt J.C., Young L., Younger R.M., Bentley D.R., Coulson A., Durbin R.M., Hubbard T., Sulston J.E., Dunham I., Rogers J., Beck S.
    Nature 425:805-811(2003)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. "Expression, cellular distribution and protein binding of the glioma amplified sequence (GAS41), a highly conserved putative transcription factor."
    Munnia A., Schuetz N., Romeike B.F.M., Maldener E., Glass B., Maas R., Nastainczyk W., Feiden W., Fischer U., Meese E.
    Oncogene 20:4853-4863(2001)
    Cited for: SUBCELLULAR LOCATION.
  5. "QN1/KIAA1009: a new essential protein for chromosome segregation and mitotic spindle assembly."
    Leon A., Omri B., Gely A., Klein C., Crisanti P.
    Oncogene 25:1887-1895(2006)
    Cited for: SUBCELLULAR LOCATION.
  6. "Tryptic digestion of ubiquitin standards reveals an improved strategy for identifying ubiquitinated proteins by mass spectrometry."
    Denis N.J., Vasilescu J., Lambert J.-P., Smith J.C., Figeys D.
    Proteomics 7:868-874(2007)
    Cited for: UBIQUITINATION [LARGE SCALE ANALYSIS] AT LYS-908.
    Tissue: Mammary cancer.
  7. "Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions."
    Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K., Rodionov V., Han D.K.
    Sci. Signal. 2:RA46-RA46(2009)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-474 AND SER-475, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Leukemic T-cell.
  8. "Toward a comprehensive characterization of a human cancer cell phosphoproteome."
    Zhou H., Di Palma S., Preisinger C., Peng M., Polat A.N., Heck A.J., Mohammed S.
    J. Proteome Res. 12:260-271(2013)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-474, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Erythroleukemia.
  9. "CEP162 is an axoneme-recognition protein promoting ciliary transition zone assembly at the cilia base."
    Wang W.J., Tay H.G., Soni R., Perumal G.S., Goll M.G., Macaluso F.P., Asara J.M., Amack J.D., Bryan Tsou M.F.
    Nat. Cell Biol. 15:591-601(2013)
    Cited for: FUNCTION, SUBCELLULAR LOCATION, INTERACTION WITH CEP290.

Entry information

Entry name: CE162_HUMAN
Accession: Primary (citable) accession number: Q5TB80
Secondary accession number(s): A6PVL7, A6PVL8, Q6P475, Q9Y2L2
Entry history
Integrated into UniProtKB/Swiss-Prot: July 24, 2007
Last sequence update: July 24, 2007
Last modified: July 6, 2016
This is version 105 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Miscellaneous

Promotes ectopic assembly of transition zone components at cilia tips when targeted outside distal ends of centrioles, generating extra-long cilia with strikingly swollen tips. (1 publication)

Caution

Was initially thought to regulate chromosome segregation and mitotic spindle assembly (PubMed:16302001). However, it was later shown that its absence affects neither mitosis nor centriole duplication (PubMed:23644468). (2 publications)

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 6
    Human chromosome 6: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.