Protein

Transposon Ty3-G Gag-Pol polyprotein

Gene

TY3B-G

Organism
Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Capsid protein (CA) is the structural component of the virus-like particle (VLP), forming the shell that encapsulates the genomic RNA-nucleocapsid complex.
Nucleocapsid protein p11 (NC) forms the nucleocore that coats the retro-element's dimeric RNA. Binds these RNAs through its zinc fingers (By similarity). Promotes primer tRNA(i)-Met annealing to the multipartite primer-binding site (PBS), dimerization of Ty3 RNA and initiation of reverse transcription. (By similarity; 4 publications)
The aspartyl protease (PR) mediates the proteolytic cleavages of the Gag and Gag-Pol polyproteins after assembly of the VLP.
Reverse transcriptase/ribonuclease H (RT) is a multifunctional enzyme that catalyzes the conversion of the retro-element's RNA genome into dsDNA within the VLP. The enzyme displays a DNA polymerase activity that can copy either DNA or RNA templates, and a ribonuclease H (RNase H) activity that cleaves the RNA strand of RNA-DNA heteroduplexes during plus-strand synthesis and hydrolyzes RNA primers. The conversion leads to a linear dsDNA copy of the retrotransposon that includes long terminal repeats (LTRs) at both ends.
Integrase (IN) targets the VLP to the nucleus, where a subparticle preintegration complex (PIC) containing at least integrase and the newly synthesized dsDNA copy of the retrotransposon must transit the nuclear membrane. Once in the nucleus, integrase performs the integration of the dsDNA into the host genome.

Catalytic activity

Deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1). (PROSITE-ProRule annotation)
Endonucleolytic cleavage to 5'-phosphomonoester.

Sites

Feature key    Position(s)  Description                                               Evidence       Length
Active site    336          For protease activity; shared with dimeric partner        By similarity  1
Metal binding  686          Magnesium; catalytic; for reverse transcriptase activity  By similarity  1
Metal binding  748          Magnesium; catalytic; for reverse transcriptase activity  By similarity  1
Metal binding  749          Magnesium; catalytic; for reverse transcriptase activity  By similarity  1
Metal binding  893          Magnesium; catalytic; for RNase H activity                By similarity  1
Metal binding  936          Magnesium; catalytic; for RNase H activity                By similarity  1
Metal binding  961          Magnesium; catalytic; for RNase H activity                By similarity  1
Metal binding  1175         Magnesium; catalytic; for integrase activity              By similarity  1
Metal binding  1236         Magnesium; catalytic; for integrase activity              By similarity  1

Regions

Feature key  Position(s)  Description  Evidence                    Length
Zinc finger  265 – 282    CCHC-type    PROSITE-ProRule annotation  18

GO - Molecular function

  • aspartic-type endopeptidase activity Source: UniProtKB-KW
  • ATP binding Source: UniProtKB-KW
  • DNA binding Source: UniProtKB-KW
  • DNA-directed DNA polymerase activity Source: SGD
  • peptidase activity Source: SGD
  • ribonuclease activity Source: SGD
  • RNA binding Source: SGD
  • RNA-directed DNA polymerase activity Source: SGD
  • RNA-DNA hybrid ribonuclease activity Source: UniProtKB-EC
  • zinc ion binding Source: InterPro

GO - Biological process

  • DNA integration Source: UniProtKB-KW
  • DNA recombination Source: UniProtKB-KW
  • transposition, RNA-mediated Source: SGD

Keywords - Molecular function

Aspartyl protease, DNA-directed DNA polymerase, Endonuclease, Hydrolase, Nuclease, Nucleotidyltransferase, Protease, RNA-directed DNA polymerase, Transferase

Keywords - Biological process

DNA integration, DNA recombination, Virion maturation, Virus exit from host cell

Keywords - Ligand

ATP-binding, DNA-binding, Magnesium, Metal-binding, Nucleotide-binding, RNA-binding, Zinc

Protein family/group databases

MEROPS: A02.022.

Names & Taxonomy

Protein names
Recommended name:
Transposon Ty3-G Gag-Pol polyprotein
Alternative name(s):
Gag3-Pol3
Transposon Ty3-1 TYA-TYB polyprotein
Cleaved into the following 8 chains:
Capsid protein
Short name:
CA
Alternative name(s):
p24
Ty3 protease (EC:3.4.23.-)
Short name:
PR
Alternative name(s):
p16
Reverse transcriptase/ribonuclease H (EC:2.7.7.49, EC:2.7.7.7, EC:3.1.26.4)
Short name:
RT
Short name:
RT-RH
Alternative name(s):
p55
Integrase p61
Short name:
IN
Integrase p58
Short name:
IN
Gene names
Name: TY3B-G
Synonyms: YGRWTy3-1 POL
Ordered Locus Names: YGR109W-B
ORF Names: G5984
Organism: Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
Taxonomic identifier: 559292 [NCBI]
Taxonomic lineage: Eukaryota > Fungi > Dikarya > Ascomycota > Saccharomycotina > Saccharomycetes > Saccharomycetales > Saccharomycetaceae > Saccharomyces
Proteomes
  • UP000002311 Component: Chromosome VII

Organism-specific databases

EuPathDB: FungiDB:YGR109W-B.
SGD: S000007347. YGR109W-B.

Subcellular location

  • Cytoplasm (1 publication)
  • Nucleus (1 publication)

GO - Cellular component

  • cytoplasm Source: UniProtKB-SubCell
  • nucleus Source: SGD
  • retrotransposon nucleocapsid Source: SGD

Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Mutagenesis

Feature key  Position(s)  Description                                               Evidence       Length
Mutagenesis  267          C → S: Reduces level of VLP formation and maturation.     1 publication  1
Mutagenesis  275          H → Q: Reduces level of VLP formation and maturation.     1 publication  1
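
The two mutagenesis positions fall inside the CCHC-type zinc finger annotated at 265 – 282. As a minimal sketch of why that matters, the snippet below takes residues 265 – 282 from the canonical sequence in this entry and checks them against a C-x2-C-x4-H-x4-C spacing. The spacing is read off this particular segment and is an assumption of the sketch, not a pattern stated by the entry; `mutate` and `has_cchc` are hypothetical helper names.

```python
import re

# Residues 265-282 of the canonical sequence, copied from this entry.
ZINC_FINGER = "RLCFYCKKEGHRLNECRA"
ZF_START = 265  # 1-based position of the first residue above

# Assumed spacing for this finger: C-x2-C-x4-H-x4-C.
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")


def mutate(segment, start, position, old, new):
    """Return the segment with one residue replaced, checking the wild type."""
    index = position - start
    if segment[index] != old:
        raise ValueError(f"expected {old} at {position}, found {segment[index]}")
    return segment[:index] + new + segment[index + 1:]


def has_cchc(segment):
    """True if the segment still contains the assumed CCHC spacing."""
    return CCHC.search(segment) is not None


wild_type_ok = has_cchc(ZINC_FINGER)
c267s_ok = has_cchc(mutate(ZINC_FINGER, ZF_START, 267, "C", "S"))
h275q_ok = has_cchc(mutate(ZINC_FINGER, ZF_START, 275, "H", "Q"))
print(wild_type_ok, c267s_ok, h275q_ok)
```

Under this assumed spacing, both substitutions replace one of the four candidate zinc ligands (C267, C270, H275, C280). The check is purely textual and says nothing about the measured effect on VLP formation.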

PTM / Processing

Molecule processing

Initiator methionine: Removed (1 publication)

Feature key  Identifier      Position(s)  Description                             Evidence           Length
Chain        PRO_0000279356  2 – 1547     Transposon Ty3-G Gag-Pol polyprotein                       1546
Chain        PRO_0000279357  2 – 207      Capsid protein                                             206
Peptide      PRO_0000279358  208 – 233    Spacer peptide p3                                          26
Chain        PRO_0000279359  234 – 309    Nucleocapsid protein p11                                   76
Chain        PRO_0000279360  310 – 442    Ty3 protease                                               133
Peptide      PRO_0000279361  443 – 535    Spacer peptide J                        Sequence analysis  93
Chain        PRO_0000279362  536 – 1011   Reverse transcriptase/ribonuclease H                       476
Chain        PRO_0000279363  1012 – 1547  Integrase p61                                              536
Chain        PRO_0000279364  1038 – 1547  Integrase p58                                              510
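
The coordinates in the molecule processing table can be checked for internal consistency. The sketch below recomputes each product's length as end - start + 1 and confirms that the non-overlapping products tile residues 2 – 1547 exactly. The tuple layout and variable names are illustrative choices; the numbers are copied from the table.

```python
# (name, start, end, listed length), copied from the molecule processing table.
PRODUCTS = [
    ("Gag-Pol polyprotein", 2, 1547, 1546),
    ("Capsid protein", 2, 207, 206),
    ("Spacer peptide p3", 208, 233, 26),
    ("Nucleocapsid protein p11", 234, 309, 76),
    ("Ty3 protease", 310, 442, 133),
    ("Spacer peptide J", 443, 535, 93),
    ("Reverse transcriptase/ribonuclease H", 536, 1011, 476),
    ("Integrase p61", 1012, 1547, 536),
    ("Integrase p58", 1038, 1547, 510),
]


def length(start, end):
    """Inclusive residue count for a 1-based start/end pair."""
    return end - start + 1


# Products whose listed length disagrees with their coordinates.
mismatches = [name for name, s, e, listed in PRODUCTS if length(s, e) != listed]

# The full polyprotein and the shorter integrase form overlap other products,
# so they are left out of the tiling check.
OVERLAPPING = {"Gag-Pol polyprotein", "Integrase p58"}
tiling = [p for p in PRODUCTS if p[0] not in OVERLAPPING]
tiled_length = sum(length(s, e) for _, s, e, _ in tiling)

print(mismatches, tiled_length)
```

Integrase p58 is excluded from the tiling because it is a shorter form of the same C-terminal region as p61 (1038 – 1547 versus 1012 – 1547).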

Amino acid modifications

Feature key       Position(s)  Description     Evidence       Length
Modified residue  2            N-acetylserine  1 publication  1

Post-translational modification

Initially, virus-like particles (VLPs) are composed of the structural unprocessed proteins Gag and Gag-Pol, and also contain the host initiator methionine tRNA (tRNA(i)-Met), which serves as a primer for minus-strand DNA synthesis, and a dimer of genomic Ty RNA. Processing of the polyproteins occurs within the particle and proceeds by an ordered pathway, called maturation. First, the protease (PR) is released by autocatalytic cleavage of the Gag-Pol polyprotein, and this cleavage is a prerequisite for subsequent processing at the remaining sites to release the mature structural and catalytic proteins. Maturation takes place prior to the RT reaction and is required to produce transposition-competent VLPs. (2 publications)

Sites

Feature key  Position(s)  Description                         Evidence           Length
Site         207 – 208    Cleavage; by Ty3 protease                              2
Site         233 – 234    Cleavage; by Ty3 protease                              2
Site         309 – 310    Cleavage; by Ty3 protease                              2
Site         442 – 443    Cleavage; by Ty3 protease           Sequence analysis  2
Site         535 – 536    Cleavage; by Ty3 protease                              2
Site         1011 – 1012  Cleavage; by Ty3 protease                              2
Site         1037 – 1038  Cleavage; by Ty3 protease; partial                     2
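
Each cleavage site listed here spans the last residue of one product and the first residue of the next. As a rough sketch, the sites can be derived from the product boundaries given under Molecule processing. The function name `cleavage_sites` and the data layout are hypothetical; the coordinates are taken from this entry.

```python
# (start, end) of consecutive products in N- to C-terminal order,
# from capsid protein through integrase p61.
BOUNDARIES = [
    (2, 207), (208, 233), (234, 309), (310, 442),
    (443, 535), (536, 1011), (1012, 1547),
]
# Start of the shorter integrase form (p58), produced by the partial cleavage.
P58_START = 1038


def cleavage_sites(products, extra_starts=()):
    """Return sorted (P1, P1') residue pairs implied by the product boundaries."""
    sites = []
    for (_, end), (next_start, _) in zip(products, products[1:]):
        if next_start != end + 1:
            raise ValueError(f"gap or overlap after residue {end}")
        sites.append((end, next_start))
    # An internal start such as p58 adds a site just before it.
    sites.extend((start - 1, start) for start in extra_starts)
    return sorted(sites)


sites = cleavage_sites(BOUNDARIES, extra_starts=[P58_START])
for left, right in sites:
    print(f"{left} - {right}")
```

The seven pairs printed match the seven rows of the sites table, including the partial site at 1037 – 1038.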

Keywords - PTM

Acetylation

Proteomic databases

PRIDE: Q99315.

PTM databases

iPTMnet: Q99315.

Miscellaneous databases

PMAP-CutDB: Q99315.

Interaction

Subunit structure

The protease is a homodimer, whose active site consists of two apposed aspartic acid residues.

Protein-protein interaction databases

BioGrid: 33356. 1 interactor.

Structure

Secondary structure

Feature key  Position(s)  Evidence          Length
Turn         552 – 555    Combined sources  4
Helix        558 – 563    Combined sources  6
Turn         565 – 567    Combined sources  3
Helix        608 – 621    Combined sources  14
Beta strand  636 – 640    Combined sources  5
Beta strand  646 – 650    Combined sources  5
Helix        653 – 656    Combined sources  4
Helix        669 – 673    Combined sources  5
Beta strand  681 – 687    Combined sources  7
Helix        690 – 693    Combined sources  4
Beta strand  694 – 696    Combined sources  3
Helix        698 – 704    Combined sources  7
Beta strand  712 – 717    Combined sources  6
Helix        725 – 737    Combined sources  13
Beta strand  743 – 746    Combined sources  4
Beta strand  749 – 756    Combined sources  8
Helix        757 – 773    Combined sources  17
Helix        780 – 782    Combined sources  3
Beta strand  795 – 797    Combined sources  3
Beta strand  802 – 804    Combined sources  3
Helix        806 – 809    Combined sources  4
Turn         810 – 814    Combined sources  5
Helix        821 – 832    Combined sources  12
Turn         833 – 837    Combined sources  5
Helix        841 – 844    Combined sources  4
Helix        849 – 853    Combined sources  5
Helix        863 – 871    Combined sources  9
Turn         876 – 878    Combined sources  3
Beta strand  884 – 891    Combined sources  8
Beta strand  899 – 906    Combined sources  8
Beta strand  908 – 911    Combined sources  4
Beta strand  913 – 921    Combined sources  9
Helix        932 – 944    Combined sources  13
Helix        948 – 951    Combined sources  4
Beta strand  957 – 959    Combined sources  3
Helix        965 – 969    Combined sources  5
Beta strand  970 – 972    Combined sources  3
Helix        976 – 986    Combined sources  11
Beta strand  997 – 1000   Combined sources  4
Helix        1001 – 1008  Combined sources  8

3D structure databases

PDB entry  Method  Resolution (Å)  Chain    Positions
4OL8       X-ray   3.10            A/B/E/F  536-1011

ProteinModelPortal: Q99315.
SMR: Q99315.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Description             Evidence                    Length
Domain       620 – 797    Reverse transcriptase   PROSITE-ProRule annotation  178
Domain       893 – 1011   RNase H Ty3/gypsy-type                              119
Domain       1159 – 1324  Integrase catalytic     PROSITE-ProRule annotation  166

Region

Feature key  Position(s)  Description                      Length
Region       1106 – 1145  Integrase-type zinc finger-like  40

Domain

Integrase core domain contains the D-x(n)-D-x(35)-E motif, named for the phylogenetically conserved glutamic acid and aspartic acid residues and the invariant 35 amino acid spacing between the second and third acidic residues. Each acidic residue of the D,D(35)E motif is independently essential for the 3'-processing and strand transfer activities of purified integrase protein.
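
The fixed 35-residue spacing makes the last two residues of the motif easy to locate by pattern matching. The sketch below scans residues 1151 – 1300 of the canonical sequence in this entry for an aspartate followed exactly 36 positions later by a glutamate, then lists upstream aspartates as candidates for the first D. Reading the first D as "any earlier aspartate" is a simplification made here, and `find_d35e` and `upstream_aspartates` are hypothetical helper names.

```python
import re

# Residues 1151-1300 of the canonical sequence, copied in blocks of ten.
FRAGMENT = (
    "HRPRLHGLLQ PLPIAEGRWL DISMDFVTGL PPTSNNLNMI LVVVDRFSKR "
    "AHFIATRKTL DATQLIDLLF RYIFSYHGFP RTITSDRDVR MTADKYQELT "
    "KRLGIKSTMS SANHPQTDGQ SERTIQTLNR LLRAYASTNI QNWHVYLPQI"
).replace(" ", "")
FRAGMENT_START = 1151  # 1-based position of the first residue above

# A lookahead keeps matches zero-width after the D, so every candidate
# aspartate is reported even when candidate spans overlap.
D35E = re.compile(r"D(?=.{35}E)")


def find_d35e(fragment, start):
    """Return (D, E) position pairs separated by exactly 35 residues."""
    return [(start + m.start(), start + m.start() + 36)
            for m in D35E.finditer(fragment)]


def upstream_aspartates(fragment, start, before):
    """Return positions of every D that lies N-terminal to `before`."""
    return [start + i for i, aa in enumerate(fragment)
            if aa == "D" and start + i < before]


pairs = find_d35e(FRAGMENT, FRAGMENT_START)
first_d_candidates = upstream_aspartates(FRAGMENT, FRAGMENT_START, pairs[0][0])
print(pairs, first_d_candidates)
```

Within this fragment the only D-x(35)-E pair is D1236 and E1272, and D1175 is among the upstream candidates. Positions 1175 and 1236 are the integrase metal-binding residues listed under Sites; the glutamate at 1272 is simply what the pattern reports for the displayed sequence and is not annotated as a site in this entry.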

Sequence similarities

Contains 1 CCHC-type zinc finger. (PROSITE-ProRule annotation)
Contains 1 integrase catalytic domain. (PROSITE-ProRule annotation)
Contains 1 peptidase A2 domain. (Curated)
Contains 1 reverse transcriptase domain. (PROSITE-ProRule annotation)

Zinc finger

Feature key  Position(s)  Description  Evidence                    Length
Zinc finger  265 – 282    CCHC-type    PROSITE-ProRule annotation  18

Keywords - Domain

Zinc-finger

Phylogenomic databases

HOGENOM: HOG000172599.
InParanoid: Q99315.
KO: K07497.
OrthoDB: EOG092C0KYC.

Family and domain databases

Gene3D: 3.30.420.10. 1 hit.
InterPro: IPR001584. Integrase_cat-core.
          IPR024650. Peptidase_A2B.
          IPR021109. Peptidase_aspartic_dom.
          IPR012337. RNaseH-like_dom.
          IPR000477. RT_dom.
          IPR001878. Znf_CCHC.
Pfam: PF12384. Peptidase_A2B. 1 hit.
      PF00665. rve. 1 hit.
      PF00078. RVT_1. 1 hit.
SMART: SM00343. ZnF_C2HC. 1 hit.
SUPFAM: SSF50630. SSF50630. 1 hit.
        SSF53098. SSF53098. 1 hit.
        SSF57756. SSF57756. 1 hit.
PROSITE: PS50994. INTEGRASE. 1 hit.
         PS50878. RT_POL. 1 hit.
         PS50158. ZF_CCHC. 1 hit.

Sequences (2)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 2 isoforms produced by ribosomal frameshifting.

Note: The Gag-Pol polyprotein is generated by a +1 ribosomal frameshift. (1 publication)
Isoform Transposon Ty3-G Gag-Pol polyprotein (identifier: Q99315-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MSFMDQIPGG GNYPKLPVEC LPNFPIQPSL TFRGRNDSHK LKNFISEIML
60 70 80 90 100
NMSMISWPND ASRIVYCRRH LLNPAAQWAN DFVQEQGILE ITFDTFIQGL
110 120 130 140 150
YQHFYKPPDI NKIFNAITQL SEAKLGIERL NQRFRKIWDR MPPDFMTEKA
160 170 180 190 200
AIMTYTRLLT KETYNIVRMH KPETLKDAME EAYQTTALTE RFFPGFELDA
210 220 230 240 250
DGDTIIGATT HLQEEYDSDY DSEDNLTQNG YVHTVRTRRS YNKPMSNHRN
260 270 280 290 300
RRNNNPSREE CIKNRLCFYC KKEGHRLNEC RARKAVLTDL ELESKDQQTP
310 320 330 340 350
FIKTLPIVHY IAIPEMDNTA EKTIKIQNTK VKTLFDSGSP TSFIRRDIVE
360 370 380 390 400
LLKYEIYETP PLRFRGFVAT KSAVTSEAVT IDLKINDLHI TLAAYILDNM
410 420 430 440 450
DYQLLIGNPI LRRYPKILHT VLNTRESPDS LKPKTYRSET VNNVRTYSAG
460 470 480 490 500
NRGNPRNIKL SFAPTILEAT DPKSAGNRGD SRTKTLSLAT TTPAAIDPLT
510 520 530 540 550
TLDNPGSTQS TFAQFPIPEE ASILEEDGKY SNVVSTIQSV EPNATDHSNK
560 570 580 590 600
DTFCTLPVWL QQKYREIIRN DLPPRPADIN NIPVKHDIEI KPGARLPRLQ
610 620 630 640 650
PYHVTEKNEQ EINKIVQKLL DNKFIVPSKS PCSSPVVLVP KKDGTFRLCV
660 670 680 690 700
DYRTLNKATI SDPFPLPRID NLLSRIGNAQ IFTTLDLHSG YHQIPMEPKD
710 720 730 740 750
RYKTAFVTPS GKYEYTVMPF GLVNAPSTFA RYMADTFRDL RFVNVYLDDI
760 770 780 790 800
LIFSESPEEH WKHLDTVLER LKNENLIVKK KKCKFASEET EFLGYSIGIQ
810 820 830 840 850
KIAPLQHKCA AIRDFPTPKT VKQAQRFLGM INYYRRFIPN CSKIAQPIQL
860 870 880 890 900
FICDKSQWTE KQDKAIDKLK DALCNSPVLV PFNNKANYRL TTDASKDGIG
910 920 930 940 950
AVLEEVDNKN KLVGVVGYFS KSLESAQKNY PAGELELLGI IKALHHFRYM
960 970 980 990 1000
LHGKHFTLRT DHISLLSLQN KNEPARRVQR WLDDLATYDF TLEYLAGPKN
1010 1020 1030 1040 1050
VVADAISRAV YTITPETSRP IDTESWKSYY KSDPLCSAVL IHMKELTQHN
1060 1070 1080 1090 1100
VTPEDMSAFR SYQKKLELSE TFRKNYSLED EMIYYQDRLV VPIKQQNAVM
1110 1120 1130 1140 1150
RLYHDHTLFG GHFGVTVTLA KISPIYYWPK LQHSIIQYIR TCVQCQLIKS
1160 1170 1180 1190 1200
HRPRLHGLLQ PLPIAEGRWL DISMDFVTGL PPTSNNLNMI LVVVDRFSKR
1210 1220 1230 1240 1250
AHFIATRKTL DATQLIDLLF RYIFSYHGFP RTITSDRDVR MTADKYQELT
1260 1270 1280 1290 1300
KRLGIKSTMS SANHPQTDGQ SERTIQTLNR LLRAYASTNI QNWHVYLPQI
1310 1320 1330 1340 1350
EFVYNSTPTR TLGKSPFEID LGYLPNTPAI KSDDEVNARS FTAVELAKHL
1360 1370 1380 1390 1400
KALTIQTKEQ LEHAQIEMET NNNQRRKPLL LNIGDHVLVH RDAYFKKGAY
1410 1420 1430 1440 1450
MKVQQIYVGP FRVVKKINDN AYELDLNSHK KKHRVINVQF LKKFVYRPDA
1460 1470 1480 1490 1500
YPKNKPISST ERIKRAHEVT ALIGIDTTHK TYLCHMQDVD PTLSVEYSEA
1510 1520 1530 1540
EFCQIPERTR RSILANFRQL YETQDNPERE EDVVSQNEIC QYDNTSP
Note: Produced by +1 ribosomal frameshifting between codon Ala-285 and Val-286 of the YGR109W-A ORF.
Length: 1,547
Mass (Da): 178,307
Last modified: March 6, 2007 - v3
Checksum: 0E327D91E0575F78
Isoform Transposon Ty3-G Gag polyprotein (identifier: Q12173-1)
The sequence of this isoform can be found in the external entry Q12173.
Isoforms of the same protein are often annotated in two different entries if their sequences differ significantly.
Note: Produced by conventional translation.
Length: 290
Mass (Da): 34,027

Sequence caution

The sequence AAA98435 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAA97115 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAA97117 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence DAA08202 differs from that shown. Reason: Erroneous gene model prediction. (Curated)

Sequence databases

EMBL / GenBank / DDBJ:
M34549 Genomic DNA. Translation: AAA98435.1. Sequence problems.
Z72894 Genomic DNA. Translation: CAA97115.1. Sequence problems.
Z72895 Genomic DNA. Translation: CAA97117.1. Sequence problems.
M18353 Genomic DNA. Translation: AAA66936.1.
BK006941 Genomic DNA. Translation: DAA08202.1. Sequence problems.
PIR: S22875.
     S69842.
RefSeq: NP_011624.1. NM_001184381.2.

Genome annotation databases

GeneID: 853006.
KEGG: sce:YGR109W-B.

Keywords - Coding sequence diversity

Ribosomal frameshifting

Entry information

Entry name: YG31B_YEAST
Accession: Primary (citable) accession number: Q99315
Secondary accession number(s): D6VUP1, Q07096
Entry history
Integrated into UniProtKB/Swiss-Prot: March 6, 2007
Last sequence update: March 6, 2007
Last modified: November 30, 2016
This is version 124 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Fungal Protein Annotation Program

Miscellaneous

Retrotransposons are mobile genetic entities that are able to replicate via an RNA intermediate and a reverse transcription step. In contrast to retroviruses, retrotransposons are non-infectious, lack an envelope and remain intracellular. Ty3 retrotransposons belong to the gypsy-like elements (Metaviridae).

Keywords - Technical term

3D-structure, Complete proteome, Multifunctional enzyme, Reference proteome, Transposable element

Documents

  1. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  2. Peptidase families
    Classification of peptidase families and list of entries
  3. SIMILARITY comments
    Index of protein domains and families
  4. Yeast
    Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD
  5. Yeast chromosome VII
    Yeast (Saccharomyces cerevisiae) chromosome VII: entries and gene names

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster (the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.