Protein

RecBCD enzyme subunit RecB

Gene

recB

Organism
Escherichia coli (strain K12)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

A helicase/nuclease that prepares dsDNA breaks (DSB) for recombinational DNA repair. Binds to DSBs and unwinds DNA via a rapid (>1 kb/second) and highly processive (>30 kb) ATP-dependent bidirectional helicase. Unwinds dsDNA until it encounters a Chi (crossover hotspot instigator, 5'-GCTGGTGG-3') sequence from the 3' direction. Cuts ssDNA a few nucleotides 3' to the Chi site, by nicking one strand or switching the strand degraded (depending on the reaction conditions). The properties and activities of the enzyme are changed at Chi. The Chi-altered holoenzyme produces a long 3'-ssDNA overhang which facilitates RecA-binding to the ssDNA for homologous DNA recombination and repair. Holoenzyme degrades any linearized DNA that is unable to undergo homologous recombination (PubMed:4562392, PubMed:4552016, PubMed:123277). In the holoenzyme this subunit contributes ATPase, 3'-5' helicase and exonuclease activity, and loads RecA onto ssDNA. The RecBC complex requires the RecD subunit for nuclease activity, but can translocate along ssDNA in both directions. (22 Publications)
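
As a sequence-level illustration of what encountering a Chi site means, the following minimal sketch scans a DNA string for the Chi octamer 5'-GCTGGTGG-3' on both strands. The function names and the example sequence are hypothetical and not part of this entry, and the code models only motif matching, not the behaviour of the enzyme.

```python
CHI = "GCTGGTGG"  # Chi octamer as given in the entry, written 5'->3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every Chi octamer in seq.

    '+' means the octamer reads 5'-GCTGGTGG-3' on the given strand;
    '-' means it lies on the complementary strand, so the given strand
    shows the reverse complement (CCACCAGC).
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((CHI, "+"), (reverse_complement(CHI), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical test sequence, not taken from any genome.
example = "AAGCTGGTGGTTTTCCACCAGCAA"
print(find_chi_sites(example))
```

Reporting the strand matters because, as described above, Chi is recognised only when approached from one direction, so a caller would still need to decide which orientation is relevant for a given DNA end.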

Catalytic activity

Exonucleolytic cleavage (in the presence of ATP) in either 5'- to 3'- or 3'- to 5'-direction to yield 5'-phosphooligonucleotides. (UniRule annotation)

Cofactor

Mg2+ (UniRule annotation, Curated, 3 Publications)
Note: Magnesium is required for both helicase and nuclease activity; its relative concentration alters helicase speed and nuclease activity in a complicated fashion. (UniRule annotation, Curated, 3 Publications)

Enzyme regulation

After reacting with DNA bearing a Chi site the holoenzyme is disassembled and loses exonuclease activity, DNA unwinding and Chi-directed DNA cleavage; RecB remains complexed with ssDNA, which may prevent holoenzyme reassembly (PubMed:10197988). High levels of Mg2+ (13 mM MgCl2) or incubation with DNase allow holoenzyme reassembly, suggesting it is DNA bound to RecB that prevents reassembly (PubMed:10197988). (2 Publications)

Sites

Feature key | Position(s) | Description | Length
Metal binding | 956 | Magnesium; via tele nitrogen (Curated) | 1
Metal binding | 1067 | Magnesium (Curated) | 1
Active site | 1080 | For nuclease activity (2 Publications) | 1
Metal binding | 1080 | Magnesium (Curated) | 1
Metal binding | 1081 | Magnesium; via carbonyl oxygen (Curated) | 1
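
The site positions are 1-based indexes into the sequence listed later in this entry. As a rough sketch of how to check them, the code below looks up the annotated positions in two 50-residue rows copied from that sequence. The names `ROWS` and `residue_at` are hypothetical helpers introduced here for illustration only.

```python
# Two 50-residue rows copied from the sequence in this entry,
# keyed by the 1-based position of their first residue.
ROWS = {
    951: "PGTFLHSLFE DLDFTQPVDP NWVREKLELG GFESQWEPVL TEWITAVLQA",
    1051: "PPLEFMQVRG MLKGFIDLVF RHEGRYYLLD YKSNWLGEDS SAYTQQAMAA",
}


def residue_at(position: int) -> str:
    """Return the one-letter residue at a 1-based sequence position."""
    for start, row in ROWS.items():
        residues = row.replace(" ", "")
        if start <= position < start + len(residues):
            return residues[position - start]
    raise KeyError(f"position {position} is outside the rows included here")


for pos in (956, 1067, 1080, 1081):
    print(pos, residue_at(pos))
```

The lookup returns H at 956, D at 1067 and 1080, and Y at 1081. A histidine at 956 fits the "via tele nitrogen" description, and the aspartates at 1067 and 1080 match the wild-type residues named in the D → A mutagenesis records.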

Regions

Feature key | Position(s) | Description | Length
Nucleotide binding | 23 – 30 | ATP | 8
DNA binding | 252 – 254 | | 3
DNA binding | 511 – 512 | | 2
DNA binding | 560 – 561 | | 2
DNA binding | 761 | | 1

GO - Molecular function

  • ATP binding Source: EcoliWiki
  • ATP-dependent DNA helicase activity Source: EcoCyc
  • DNA binding Source: UniProtKB-HAMAP
  • endonuclease activity Source: EcoCyc
  • exodeoxyribonuclease V activity Source: EcoCyc
  • magnesium ion binding Source: UniProtKB-HAMAP

GO - Biological process

  • clearance of foreign intracellular DNA Source: UniProtKB
  • DNA recombination Source: EcoCyc
  • double-strand break repair Source: EcoCyc
  • double-strand break repair via homologous recombination Source: UniProtKB-HAMAP

Keywords - Molecular function

Exonuclease, Helicase, Hydrolase, Nuclease

Keywords - Biological process

DNA damage, DNA repair

Keywords - Ligand

ATP-binding, DNA-binding, Magnesium, Metal-binding, Nucleotide-binding

Enzyme and pathway databases

BioCyc: EcoCyc:EG10824-MONOMER.
    ECOL316407:JW2788-MONOMER.
    MetaCyc:EG10824-MONOMER.
BRENDA: 3.1.11.5. 2026.

Names & Taxonomy

Protein names
Recommended name:
RecBCD enzyme subunit RecB (UniRule annotation) (EC:3.1.11.5, UniRule annotation)
Alternative name(s):
Exodeoxyribonuclease V 135 kDa polypeptide
Exodeoxyribonuclease V beta chain
Exonuclease V subunit RecB (UniRule annotation)
Short name:
ExoV subunit RecB (UniRule annotation)
Helicase/nuclease RecBCD subunit RecB (UniRule annotation)
Gene names
Name: recB (UniRule annotation)
Synonyms: ior, rorA
Ordered Locus Names: b2820, JW2788
Organism: Escherichia coli (strain K12)
Taxonomic identifier: 83333 [NCBI]
Taxonomic lineage: Bacteria > Proteobacteria > Gammaproteobacteria > Enterobacterales > Enterobacteriaceae > Escherichia
Proteomes
  • UP000000318 Component: Chromosome
  • UP000000625 Component: Chromosome

Organism-specific databases

EcoGene: EG10824. recB.

Subcellular location

GO - Cellular component

  • exodeoxyribonuclease V complex Source: EcoCyc

Pathology & Biotech

Disruption phenotype

Decreased degradation of DNA with free ends that is unable to undergo homologous recombination, which can fortuitously lead to more efficient viral infection (PubMed:4562392, PubMed:123277). Cells are deficient in DNA recombination repair and have increased sensitivity to UV light. The cultures have many inviable cells. (3 Publications)

Mutagenesis

Feature key | Position(s) | Description | Length
Mutagenesis | 29 | K → Q: Subunit loses ATPase and 3'-5' helicase activity, holoenzyme has 3-5 fold less helicase activity, 20-fold less processivity. (3 Publications) | 1
Mutagenesis | 803 | Y → H: Large decrease in recombination, loss of Chi hotspot activity, decreased RecB helicase rate, retains nuclease activity but not Chi-sequence specificity, does not load RecA. (1 Publication) | 1
Mutagenesis | 804 | V → E: Large decrease in recombination, loss of Chi hotspot activity, decreased RecB helicase rate, retains nuclease activity but not Chi-sequence specificity, does not load RecA. (1 Publication) | 1
Mutagenesis | 807 | T → I in recB-2109; absence of nuclease modification at Chi sites. (1 Publication) | 1
Mutagenesis | 1067 | D → A: Subunit loses nuclease activity. (1 Publication) | 1
Mutagenesis | 1080 | D → A: Loss of holoenzyme nuclease activity, retains full helicase activity, does not act at Chi, no loading of RecA on ssDNA and no recombinational repair. (3 Publications) | 1
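
To make the substitution notation concrete, here is a minimal sketch that applies a change such as K → Q at position 29 to the first 50 residues of the sequence given in this entry. The short form `K29Q` and the helper `apply_substitution` are assumptions made for this example and are not notation or tooling used by the entry itself.

```python
import re

# First 50 residues of the sequence in this entry (positions 1-50).
N_TERM = "MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAF"


def apply_substitution(seq: str, change: str) -> str:
    """Apply a substitution written like 'K29Q' (wild type, 1-based position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"cannot parse substitution {change!r}")
    wild_type, position, mutant = match[1], int(match[2]), match[3]
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != wild_type:
        # Guards against off-by-one errors in the position.
        raise ValueError(f"expected {wild_type} at {position}, found {seq[index]}")
    return seq[:index] + mutant + seq[index + 1:]


mutant = apply_substitution(N_TERM, "K29Q")
# Residues 23-30, the stretch annotated as ATP-binding in this entry.
print(N_TERM[22:30], "->", mutant[22:30])
```

Checking the wild-type residue before substituting is a deliberate choice: it confirms that position 29 really is a lysine in the listed sequence, inside the 23 – 30 nucleotide-binding stretch.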

Chemistry databases

ChEMBL: CHEMBL2095232.

PTM / Processing

Molecule processing

Feature key | Position(s) | Description | Length
Initiator methionine | | Removed (1 Publication) |
Chain (PRO_0000102046) | 2 – 1180 | RecBCD enzyme subunit RecB | 1179

Proteomic databases

PaxDb: P08394.
PRIDE: P08394.

Interaction

Subunit structure

Heterotrimer of RecB, RecC and RecD. All subunits contribute to DNA-binding. The C-terminus interacts with RecA (PubMed:16483938). Interacts with YgbT (Cas1) (PubMed:21219465). (8 Publications)

Protein-protein interaction databases

BioGrid: 4262307. 585 interactors.
DIP: DIP-540N.
IntAct: P08394. 19 interactors.
MINT: MINT-1224378.
STRING: 511145.b2820.

Structure

Secondary structure

Feature key | Position(s) | Evidence | Length
Helix | 10 – 12 | Combined sources | 3
Beta strand | 19 – 22 | Combined sources | 4
Helix | 29 – 41 | Combined sources | 13
Beta strand | 45 – 49 | Combined sources | 5
Helix | 56 – 58 | Combined sources | 3
Beta strand | 59 – 64 | Combined sources | 6
Helix | 66 – 88 | Combined sources | 23
Helix | 95 – 103 | Combined sources | 9
Helix | 107 – 120 | Combined sources | 14
Helix | 121 – 123 | Combined sources | 3
Beta strand | 125 – 128 | Combined sources | 4
Helix | 129 – 139 | Combined sources | 11
Helix | 141 – 144 | Combined sources | 4
Helix | 157 – 172 | Combined sources | 16
Helix | 177 – 186 | Combined sources | 10
Helix | 190 – 197 | Combined sources | 8
Turn | 198 – 201 | Combined sources | 4
Beta strand | 202 – 204 | Combined sources | 3
Beta strand | 207 – 210 | Combined sources | 4
Helix | 218 – 234 | Combined sources | 17
Beta strand | 249 – 251 | Combined sources | 3
Helix | 255 – 263 | Combined sources | 9
Helix | 283 – 287 | Combined sources | 5
Helix | 312 – 316 | Combined sources | 5
Helix | 324 – 346 | Combined sources | 23
Helix | 351 – 363 | Combined sources | 13
Helix | 367 – 377 | Combined sources | 11
Beta strand | 379 – 383 | Combined sources | 5
Helix | 386 – 388 | Combined sources | 3
Helix | 391 – 401 | Combined sources | 11
Beta strand | 408 – 413 | Combined sources | 6
Helix | 415 – 417 | Combined sources | 3
Turn | 421 – 424 | Combined sources | 4
Helix | 427 – 436 | Combined sources | 10
Beta strand | 440 – 442 | Combined sources | 3
Helix | 451 – 462 | Combined sources | 12
Beta strand | 463 – 466 | Combined sources | 4
Helix | 482 – 484 | Combined sources | 3
Beta strand | 487 – 491 | Combined sources | 5
Beta strand | 494 – 496 | Combined sources | 3
Beta strand | 498 – 503 | Combined sources | 6
Helix | 513 – 534 | Combined sources | 22
Beta strand | 538 – 542 | Combined sources | 5
Beta strand | 545 – 548 | Combined sources | 4
Helix | 551 – 553 | Combined sources | 3
Beta strand | 554 – 560 | Combined sources | 7
Helix | 561 – 572 | Combined sources | 12
Turn | 573 – 575 | Combined sources | 3
Beta strand | 578 – 580 | Combined sources | 3
Helix | 587 – 589 | Combined sources | 3
Helix | 592 – 603 | Combined sources | 12
Helix | 609 – 617 | Combined sources | 9
Helix | 619 – 621 | Combined sources | 3
Helix | 625 – 633 | Combined sources | 9
Helix | 635 – 655 | Combined sources | 21
Helix | 657 – 667 | Combined sources | 11
Helix | 670 – 676 | Combined sources | 7
Beta strand | 677 – 679 | Combined sources | 3
Helix | 680 – 698 | Combined sources | 19
Helix | 704 – 716 | Combined sources | 13
Helix | 732 – 734 | Combined sources | 3
Beta strand | 735 – 740 | Combined sources | 6
Turn | 741 – 744 | Combined sources | 4
Beta strand | 749 – 754 | Combined sources | 6
Turn | 755 – 758 | Combined sources | 4
Beta strand | 767 – 769 | Combined sources | 3
Turn | 771 – 773 | Combined sources | 3
Beta strand | 776 – 781 | Combined sources | 6
Helix | 784 – 806 | Combined sources | 23
Beta strand | 809 – 817 | Combined sources | 9
Helix | 832 – 835 | Combined sources | 4
Helix | 837 – 842 | Combined sources | 6
Helix | 850 – 859 | Combined sources | 10
Beta strand | 865 – 869 | Combined sources | 5
Beta strand | 901 – 906 | Combined sources | 6
Beta strand | 908 – 910 | Combined sources | 3
Helix | 917 – 920 | Combined sources | 4
Helix | 942 – 944 | Combined sources | 3
Helix | 949 – 958 | Combined sources | 10
Beta strand | 964 – 966 | Combined sources | 3
Helix | 970 – 979 | Combined sources | 10
Helix | 984 – 986 | Combined sources | 3
Helix | 987 – 998 | Combined sources | 12
Beta strand | 1002 – 1006 | Combined sources | 5
Helix | 1009 – 1011 | Combined sources | 3
Helix | 1014 – 1016 | Combined sources | 3
Beta strand | 1017 – 1027 | Combined sources | 11
Helix | 1033 – 1043 | Combined sources | 11
Beta strand | 1059 – 1074 | Combined sources | 16
Beta strand | 1079 – 1082 | Combined sources | 4
Helix | 1090 – 1092 | Combined sources | 3
Helix | 1095 – 1104 | Combined sources | 10
Turn | 1105 – 1107 | Combined sources | 3
Helix | 1108 – 1125 | Combined sources | 18
Beta strand | 1126 – 1128 | Combined sources | 3
Helix | 1131 – 1134 | Combined sources | 4
Beta strand | 1139 – 1145 | Combined sources | 7
Helix | 1164 – 1173 | Combined sources | 10

3D structure databases

PDB entry | Method | Resolution (Å) | Chain | Positions
1W36 | X-ray | 3.10 | B/E | 1-1180
3K70 | X-ray | 3.59 | B/E | 1-1180
5LD2 | electron microscopy | 3.83 | B | 1-912
5LD2 | electron microscopy | 3.83 | B | 938-1180
ProteinModelPortal: P08394.
SMR: P08394.

Miscellaneous databases

EvolutionaryTrace: P08394.

Family & Domains

Domains and Repeats

Feature key | Position(s) | Description | Length
Domain | 2 – 450 | UvrD-like helicase ATP-binding (UniRule annotation) | 449
Domain | 480 – 746 | UvrD-like helicase C-terminal (UniRule annotation) | 267

Region

Feature key | Position(s) | Description | Length
Region | 2 – 927 | ATPase, DNA-binding and helicase activity, interacts with RecC | 926
Region | 928 – 1180 | Nuclease activity, interacts with RecD and RecA | 253
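
The lengths in these feature tables follow from treating the positions as inclusive 1-based ranges. A small sketch of that arithmetic, using a hypothetical helper `feature_length` that is not part of any UniProt tooling:

```python
def feature_length(positions: str) -> int:
    """Length of an inclusive range such as '2 – 927', or 1 for a single position."""
    # Accept the en dash used in the tables as well as a plain hyphen.
    parts = [int(p) for p in positions.replace("–", "-").split("-")]
    if len(parts) == 1:
        return 1
    start, end = parts
    return end - start + 1


for label, positions in [("N-terminal region", "2 – 927"),
                         ("C-terminal region", "928 – 1180")]:
    print(label, feature_length(positions))
```

The two regions add up to 926 + 253 = 1179 residues, which is the length of the mature chain (2 – 1180) after removal of the initiator methionine.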

Domain

The N-terminal DNA-binding domain (residues 1-929) is a ssDNA-dependent ATPase and has ATP-dependent 3'-5' helicase function; both are stimulated in the presence of RecC, suggesting this domain interacts with RecC. Holoenzyme reconstituted with this RecB N-terminal fragment has no nuclease activity (PubMed:9448271). The C-terminal domain (residues 928-1180) has weak ssDNA endonuclease activity as an isolated fragment (PubMed:9790841, PubMed:10518611). RecD probably interacts with this domain. The C-terminal domain interacts with RecA, facilitating its loading onto ssDNA. Interaction is decreased by ATP (PubMed:16483938). (4 Publications)
The holoenzyme may undergo conformational shifts upon DNA binding: the nuclease domain of RecB may swing away from the DNA exit tunnel in RecC. When Chi DNA binds to the RecC tunnel, the nuclease domain may then swing back to its original position (that seen in crystal structures), allowing it to nick the DNA 3' of the Chi site and then rotate to load RecA. At high Mg2+ the nuclease domain may swing back more frequently, explaining differences seen in assays performed at high Mg2+. (1 Publication)

Sequence similarities

Belongs to the helicase family. UvrD subfamily. (UniRule annotation)
Contains 1 UvrD-like helicase ATP-binding domain. (UniRule annotation)
Contains 1 UvrD-like helicase C-terminal domain. (UniRule annotation)

Phylogenomic databases

eggNOG: ENOG4107QKA. Bacteria.
    COG1074. LUCA.
HOGENOM: HOG000258330.
InParanoid: P08394.
KO: K03582.
OMA: IMIGDPK.
PhylomeDB: P08394.

Family and domain databases

Gene3D: 3.40.50.300. 5 hits.
    3.90.320.10. 1 hit.
HAMAP: MF_01485. RecB. 1 hit.
InterPro: IPR014017. DNA_helicase_UvrD-like_C.
    IPR000212. DNA_helicase_UvrD/REP.
    IPR011604. Exonuc_phg/RecB_C.
    IPR027417. P-loop_NTPase.
    IPR004586. RecB.
    IPR011335. Restrct_endonuc-II-like.
    IPR014016. UvrD-like_ATP-bd.
PANTHER: PTHR11070. PTHR11070. 4 hits.
Pfam: PF00580. UvrD-helicase. 1 hit.
    PF13361. UvrD_C. 1 hit.
SUPFAM: SSF52540. SSF52540. 3 hits.
    SSF52980. SSF52980. 1 hit.
TIGRFAMs: TIGR00609. recB. 1 hit.
PROSITE: PS51198. UVRD_HELICASE_ATP_BIND. 1 hit.
    PS51217. UVRD_HELICASE_CTER. 1 hit.

Sequence

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

P08394-1 [UniParc]

        10         20         30         40         50
MSDVAETLDP LRLPLQGERL IEASAGTGKT FTIAALYLRL LLGLGGSAAF
60 70 80 90 100
PRPLTVEELL VVTFTEAATA ELRGRIRSNI HELRIACLRE TTDNPLYERL
110 120 130 140 150
LEEIDDKAQA AQWLLLAERQ MDEAAVFTIH GFCQRMLNLN AFESGMLFEQ
160 170 180 190 200
QLIEDESLLR YQACADFWRR HCYPLPREIA QVVFETWKGP QALLRDINRY
210 220 230 240 250
LQGEAPVIKA PPPDDETLAS RHAQIVARID TVKQQWRDAV GELDALIESS
260 270 280 290 300
GIDRRKFNRS NQAKWIDKIS AWAEEETNSY QLPESLEKFS QRFLEDRTKA
310 320 330 340 350
GGETPRHPLF EAIDQLLAEP LSIRDLVITR ALAEIRETVA REKRRRGELG
360 370 380 390 400
FDDMLSRLDS ALRSESGEVL AAAIRTRFPV AMIDEFQDTD PQQYRIFRRI
410 420 430 440 450
WHHQPETALL LIGDPKQAIY AFRGADIFTY MKARSEVHAH YTLDTNWRSA
460 470 480 490 500
PGMVNSVNKL FSQTDDAFMF REIPFIPVKS AGKNQALRFV FKGETQPAMK
510 520 530 540 550
MWLMEGESCG VGDYQSTMAQ VCAAQIRDWL QAGQRGEALL MNGDDARPVR
560 570 580 590 600
ASDISVLVRS RQEAAQVRDA LTLLEIPSVY LSNRDSVFET LEAQEMLWLL
610 620 630 640 650
QAVMTPEREN TLRSALATSM MGLNALDIET LNNDEHAWDV VVEEFDGYRQ
660 670 680 690 700
IWRKRGVMPM LRALMSARNI AENLLATAGG ERRLTDILHI SELLQEAGTQ
710 720 730 740 750
LESEHALVRW LSQHILEPDS NASSQQMRLE SDKHLVQIVT IHKSKGLEYP
760 770 780 790 800
LVWLPFITNF RVQEQAFYHD RHSFEAVLDL NAAPESVDLA EAERLAEDLR
810 820 830 840 850
LLYVALTRSV WHCSLGVAPL VRRRGDKKGD TDVHQSALGR LLQKGEPQDA
860 870 880 890 900
AGLRTCIEAL CDDDIAWQTA QTGDNQPWQV NDVSTAELNA KTLQRLPGDN
910 920 930 940 950
WRVTSYSGLQ QRGHGIAQDL MPRLDVDAAG VASVVEEPTL TPHQFPRGAS
960 970 980 990 1000
PGTFLHSLFE DLDFTQPVDP NWVREKLELG GFESQWEPVL TEWITAVLQA
1010 1020 1030 1040 1050
PLNETGVSLS QLSARNKQVE MEFYLPISEP LIASQLDTLI RQFDPLSAGC
1060 1070 1080 1090 1100
PPLEFMQVRG MLKGFIDLVF RHEGRYYLLD YKSNWLGEDS SAYTQQAMAA
1110 1120 1130 1140 1150
AMQAHRYDLQ YQLYTLALHR YLRHRIADYD YEHHFGGVIY LFLRGVDKEH
1160 1170 1180
PQQGIYTTRP NAGLIALMDE MFAGMTLEEA
Length: 1,180
Mass (Da): 133,959
Last modified: August 1, 1988 - v1
Checksum: F9AC331808E8F281
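
The sequence block above mixes residue rows with position rulers, so it needs light cleaning before any computation. The following is a minimal sketch, assuming rulers are the lines that contain only numbers; `parse_sequence_block` is a hypothetical helper, and only the first two rows are included to keep the example short.

```python
def parse_sequence_block(text: str) -> str:
    """Join residue rows, skipping ruler lines that contain only numbers."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or all(token.isdigit() for token in tokens):
            continue  # blank line or position ruler
        residues.extend(tokens)
    return "".join(residues)


# First two rows of the block above, with their rulers.
block = """
        10         20         30         40         50
MSDVAETLDP LRLPLQGERL IEASAGTGKT FTIAALYLRL LLGLGGSAAF
60 70 80 90 100
PRPLTVEELL VVTFTEAATA ELRGRIRSNI HELRIACLRE TTDNPLYERL
"""
seq = parse_sequence_block(block)
print(len(seq), seq[:10])
```

Run on the complete block, the same function should yield a string whose length can be compared with the stated length of 1,180 as a basic integrity check.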

Sequence databases

EMBL / GenBank / DDBJ:
X04581 Genomic DNA. Translation: CAA28250.1.
AF179304 Genomic DNA. Translation: AAD56369.1.
U29581 Genomic DNA. Translation: AAB40467.1.
U00096 Genomic DNA. Translation: AAC75859.1.
AP009048 Genomic DNA. Translation: BAE76889.1.
X06227 Genomic DNA. Translation: CAA29577.1.
X04582 Genomic DNA. Translation: CAA28252.1.
PIR: A25532. NCECX5.
RefSeq: NP_417297.1. NC_000913.3.
    WP_001285993.1. NZ_LN832404.1.

Genome annotation databases

EnsemblBacteria: AAC75859; AAC75859; b2820.
    BAE76889; BAE76889; BAE76889.
GeneID: 947286.
KEGG: ecj:JW2788.
    eco:b2820.
PATRIC: 32121058. VBIEscCol129921_2918.

Cross-references

Organism-specific databases

EchoBASE: EB0817.
EcoGene: EG10824. recB.

Miscellaneous databases

EvolutionaryTrace: P08394.
PRO: P08394.

Entry information

Entry name: RECB_ECOLI
Accession: Primary (citable) accession number: P08394
Secondary accession number(s): Q2MA17
Entry history
Integrated into UniProtKB/Swiss-Prot: August 1, 1988
Last sequence update: August 1, 1988
Last modified: November 30, 2016
This is version 149 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Prokaryotic Protein Annotation Program

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Escherichia coli
    Escherichia coli (strain K12): entries and cross-references to EcoGene
  2. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.