Protein

Transcription termination/antitermination protein NusA

Gene

nusA

Organism
Escherichia coli (strain K12)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Participates in both transcription termination and antitermination. Involved in a variety of cellular and viral termination and antitermination processes, such as Rho-dependent transcriptional termination, intrinsic termination, and phage lambda N-mediated transcriptional antitermination. Also important for coordinating the cellular responses to DNA damage by coupling the processes of nucleotide excision repair and translesion synthesis to transcription. (UniRule annotation; 9 publications)

GO - Molecular function

GO - Biological process

  • DNA-templated transcription, termination (Source: UniProtKB-HAMAP)
  • transcription antitermination (Source: EcoliWiki)

Keywords - Biological process

Stress response, Transcription, Transcription antitermination, Transcription regulation, Transcription termination

Keywords - Ligand

RNA-binding

Enzyme and pathway databases

BioCyc: EcoCyc:EG10665-MONOMER.
ECOL316407:JW3138-MONOMER.

Names & Taxonomy

Protein names
Recommended name:
Transcription termination/antitermination protein NusA (UniRule annotation)
Alternative name(s):
N utilization substance protein A
Transcription termination/antitermination L factor
Gene names
Name: nusA (UniRule annotation)
Ordered Locus Names: b3169, JW3138
Organism: Escherichia coli (strain K12)
Taxonomic identifier: 83333 [NCBI]
Taxonomic lineage: Bacteria > Proteobacteria > Gammaproteobacteria > Enterobacteriales > Enterobacteriaceae > Escherichia
Proteomes
  • UP000000318 (Component: Chromosome)
  • UP000000625 (Component: Chromosome)

Organism-specific databases

EcoGene: EG10665. nusA.

Subcellular location

  • Cytoplasm (UniRule annotation; 1 publication)

  • Note: Colocalizes with nucleoids.

GO - Cellular component

  • cytosol (Source: EcoCyc)

Keywords - Cellular component

Cytoplasm

Pathology & Biotech

Disruption phenotype

Mutants are sensitive to DNA-damaging agents. (1 publication)

Mutagenesis

Feature key  Position  Length  Change  Description
Mutagenesis  104       1       R → H   In nusA10-1. (1 publication)
Mutagenesis  181       1       G → D   In nusA11; inability to terminate transcription normally at termination sites. (1 publication)
Mutagenesis  183       1       L → R   In nusA1; restricts lambda growth by preventing antitermination activity of lambda N protein. (1 publication)
Mutagenesis  212       1       E → K   In nusA10-2. (1 publication)
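
The substitutions listed above can be checked against the sequence of this entry and applied programmatically. The following is a minimal sketch, not part of the original record: the names `NUSA_1_250`, `MUTATIONS` and `apply_substitution` are hypothetical, and the sequence fragment is residues 1-250 copied from the Sequence section of this entry.

```python
# Residues 1-250 of P0AFF6, copied from the Sequence section of this entry.
NUSA_1_250 = (
    "MNKEILAVVEAVSNEKALPREKIFEALESALATATKKKYEQEIDVRVQID"
    "RKSGDFDTFRRWLVVDEVTQPTKEITLEAARYEDESLNLGDYVEDQIESV"
    "TFDRITTQTAKQVIVQKVREAERAMVVDQFREHEGEIITGVVKKVNRDNI"
    "SLDLGNNAEAVILREDMLPRENFRPGDRVRGVLYSVRPEARGAQLFVTRS"
    "KPEMLIELFRIEVPEIGEEVIEIKAAARDPGSRAKIAVKTNDKRIDPVGA"
)

# (allele, 1-based position, wild-type residue, mutant residue),
# taken from the Mutagenesis table above.
MUTATIONS = [
    ("nusA10-1", 104, "R", "H"),
    ("nusA11", 181, "G", "D"),
    ("nusA1", 183, "L", "R"),
    ("nusA10-2", 212, "E", "K"),
]


def apply_substitution(seq, position, wild_type, mutant):
    """Return seq with one residue replaced; position is 1-based."""
    index = position - 1
    if seq[index] != wild_type:
        # Guards against off-by-one errors or a mismatched sequence version.
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]


# One variant sequence per allele, each differing from wild type at one site.
variants = {
    allele: apply_substitution(NUSA_1_250, pos, wt, mut)
    for allele, pos, wt, mut in MUTATIONS
}
```

The wild-type check is the useful part of the sketch: feature tables use 1-based positions, so validating the expected residue before substituting catches indexing mistakes early.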

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                                             Feature identifier
Chain        1 – 495      495     Transcription termination/antitermination protein NusA  PRO_0000181965

Proteomic databases

EPD: P0AFF6.
PaxDb: P0AFF6.
PRIDE: P0AFF6.

2D gel databases

SWISS-2DPAGE: P0AFF6.

Expression

Induction

In response to low temperature. Negatively autoregulated. (2 publications)

Interaction

Subunit structure

Monomer. Binds directly to the core enzyme of the DNA-dependent RNA polymerase and to nascent RNA. Also interacts with the termination Rho factor and the phage lambda N protein. (UniRule annotation; 5 publications)

Binary interactions

With  Entry   #Exp.  IntAct
dinB  Q47155  3      EBI-551571, EBI-1037359

Protein-protein interaction databases

BioGrid: 4261878. 15 interactions.
DIP: DIP-47857N.
IntAct: P0AFF6. 42 interactions.
MINT: MINT-1220515.
STRING: 511145.b3169.

Structure

Secondary structure

All features below are annotated from combined sources.

Feature key  Position(s)  Length
Helix        3 – 16       14
Helix        20 – 39      20
Beta strand  45 – 50      6
Turn         51 – 54      4
Beta strand  55 – 65      11
Turn         71 – 73      3
Beta strand  74 – 76      3
Helix        77 – 84      8
Beta strand  92 – 96      5
Helix        102 – 104    3
Helix        107 – 123    17
Helix        354 – 363    10
Helix        367 – 375    9
Helix        381 – 386    6
Helix        389 – 392    4
Turn         395 – 397    3
Helix        400 – 417    18
Helix        432 – 435    4
Helix        442 – 449    8
Turn         450 – 452    3
Helix        456 – 460    5
Helix        464 – 468    5
Beta strand  470 – 472    3
Helix        475 – 489    15
Turn         491 – 493    3

3D structure databases

Entry  Method  Resolution (Å)  Chain  Positions
1U9L   X-ray   1.90            A/B    352-421
1WCL   NMR     -               A      351-426
1WCN   NMR     -               A      426-495
2JZB   NMR     -               B      424-495
2KWP   NMR     -               A      1-125
ProteinModelPortal: P0AFF6.
SMR: P0AFF6. Positions 1-495.

Miscellaneous databases

EvolutionaryTrace: P0AFF6.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       135 – 200    66      S1 motif (UniRule annotation)
Domain       230 – 293    64      KH 1 (UniRule annotation)
Domain       302 – 368    67      KH 2 (UniRule annotation)
Repeat       364 – 414    51      1
Repeat       439 – 489    51      2

Region

Feature key  Position(s)  Length  Description
Region       364 – 489    126     2 X 51 AA approximate repeats

Domain

The N-terminal region interacts with RNA polymerase. The central region is composed of 3 RNA binding domains, S1, KH 1 and KH 2. The C-terminal region contains 2 acidic repeats, AR1 and AR2, which bind to protein N from phage lambda during antitermination. (3 publications)
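
The domain layout described here can be represented as a small feature table with 1-based inclusive coordinates. The sketch below is illustrative only: the `Feature` class and the `features_at` helper are hypothetical names, the coordinates are those listed under Domains and Repeats in this entry, and treating "Repeat 1" and "Repeat 2" as AR1 and AR2 is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        # Inclusive coordinates: both end points count.
        return self.end - self.start + 1

    def overlaps(self, other: "Feature") -> bool:
        return self.start <= other.end and other.start <= self.end


# Coordinates as listed under "Domains and Repeats" in this entry.
FEATURES = [
    Feature("S1 motif", 135, 200),
    Feature("KH 1", 230, 293),
    Feature("KH 2", 302, 368),
    Feature("Repeat 1", 364, 414),  # assumed to correspond to AR1
    Feature("Repeat 2", 439, 489),  # assumed to correspond to AR2
]


def features_at(position: int) -> List[str]:
    """Names of all annotated features covering a 1-based position."""
    return [f.name for f in FEATURES if f.start <= position <= f.end]
```

For example, `features_at(183)` places the nusA1 substitution site inside the S1 motif, and `FEATURES[2].overlaps(FEATURES[3])` reflects that the annotated end of KH 2 (368) extends a few residues past the annotated start of the first repeat (364).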

Sequence similarities

Belongs to the NusA family. (UniRule annotation)
Contains 2 KH domains. (UniRule annotation)
Contains 1 S1 motif domain. (UniRule annotation)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: ENOG4105CHV. Bacteria.
COG0195. LUCA.
HOGENOM: HOG000006394.
InParanoid: P0AFF6.
KO: K02600.
OMA: RAMIVEQ.
OrthoDB: EOG6NSGHW.
PhylomeDB: P0AFF6.

Family and domain databases

Gene3D: 2.40.50.140. 1 hit.
3.30.1480.10. 1 hit.
3.30.300.20. 2 hits.
HAMAP: MF_00945_B. NusA_B.
InterPro: IPR010995. DNA_repair_Rad51/TF_NusA_a-hlx.
IPR015946. KH_dom-like_a/b.
IPR025249. KH_dom_NusA-like.
IPR004088. KH_dom_type_1.
IPR009019. KH_prok-type.
IPR012340. NA-bd_OB-fold.
IPR030842. NusA_bac.
IPR022967. S1_dom.
IPR003029. S1_domain.
IPR013735. TF_NusA_N.
IPR010214. Tscrpt_termin_fac_NusA_C_rpt.
IPR010213. Tscrpt_termination_fac_NusA.
Pfam: PF13184. KH_5. 1 hit.
PF08529. NusA_N. 1 hit.
PF00575. S1. 1 hit.
SMART: SM00316. S1. 1 hit.
SUPFAM: SSF47794. SSF47794. 2 hits.
SSF50249. SSF50249. 1 hit.
SSF54814. SSF54814. 2 hits.
SSF69705. SSF69705. 1 hit.
TIGRFAMs: TIGR01953. NusA. 1 hit.
TIGR01954. nusA_Cterm_rpt. 2 hits.
PROSITE: PS50084. KH_TYPE_1. 1 hit.
PS50126. S1. 1 hit.

Sequence

Sequence status: Complete.

P0AFF6-1 [UniParc]

        10         20         30         40         50
MNKEILAVVE AVSNEKALPR EKIFEALESA LATATKKKYE QEIDVRVQID
        60         70         80         90        100
RKSGDFDTFR RWLVVDEVTQ PTKEITLEAA RYEDESLNLG DYVEDQIESV
       110        120        130        140        150
TFDRITTQTA KQVIVQKVRE AERAMVVDQF REHEGEIITG VVKKVNRDNI
       160        170        180        190        200
SLDLGNNAEA VILREDMLPR ENFRPGDRVR GVLYSVRPEA RGAQLFVTRS
       210        220        230        240        250
KPEMLIELFR IEVPEIGEEV IEIKAAARDP GSRAKIAVKT NDKRIDPVGA
       260        270        280        290        300
CVGMRGARVQ AVSTELGGER IDIVLWDDNP AQFVINAMAP ADVASIVVDE
       310        320        330        340        350
DKHTMDIAVE AGNLAQAIGR NGQNVRLASQ LSGWELNVMT VDDLQAKHQA
       360        370        380        390        400
EAHAAIDTFT KYLDIDEDFA TVLVEEGFST LEELAYVPMK ELLEIEGLDE
       410        420        430        440        450
PTVEALRERA KNALATIAQA QEESLGDNKP ADDLLNLEGV DRDLAFKLAA
       460        470        480        490
RGVCTLEDLA EQGIDDLADI EGLTDEKAGA LIMAARNICW FGDEA
Length: 495
Mass (Da): 54,871
Last modified: December 20, 2005 - v1
Checksum: 7D4DD019172FBAD0
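
The length and mass reported here can be approximated directly from the sequence. The sketch below is an illustration rather than the method used to produce this entry: it sums standard average residue masses (values not taken from this record) and adds one water molecule for the free termini. The names `SEQUENCE`, `RESIDUE_MASS` and `average_mass` are hypothetical.

```python
# Full sequence of P0AFF6 as printed in this entry, spaces removed.
SEQUENCE = (
    "MNKEILAVVEAVSNEKALPREKIFEALESALATATKKKYEQEIDVRVQID"
    "RKSGDFDTFRRWLVVDEVTQPTKEITLEAARYEDESLNLGDYVEDQIESV"
    "TFDRITTQTAKQVIVQKVREAERAMVVDQFREHEGEIITGVVKKVNRDNI"
    "SLDLGNNAEAVILREDMLPRENFRPGDRVRGVLYSVRPEARGAQLFVTRS"
    "KPEMLIELFRIEVPEIGEEVIEIKAAARDPGSRAKIAVKTNDKRIDPVGA"
    "CVGMRGARVQAVSTELGGERIDIVLWDDNPAQFVINAMAPADVASIVVDE"
    "DKHTMDIAVEAGNLAQAIGRNGQNVRLASQLSGWELNVMTVDDLQAKHQA"
    "EAHAAIDTFTKYLDIDEDFATVLVEEGFSTLEELAYVPMKELLEIEGLDE"
    "PTVEALRERAKNALATIAQAQEESLGDNKPADDLLNLEGVDRDLAFKLAA"
    "RGVCTLEDLAEQGIDDLADIEGLTDEKAGALIMAARNICWFGDEA"
)

# Average residue masses in Da (standard textbook values, an assumption here).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N- and C-terminal groups


def average_mass(seq):
    """Approximate average molecular mass of an unmodified peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


length = len(SEQUENCE)
mass = average_mass(SEQUENCE)
```

Under these assumptions `length` is 495, matching the Length field, and `mass` should land close to the listed 54,871 Da; small differences can arise from rounding of the residue masses.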

Sequence databases

EMBL / GenBank / DDBJ:
X00513 Genomic DNA. Translation: CAA25200.1. Sequence problems.
U18997 Genomic DNA. Translation: AAA57972.1.
U00096 Genomic DNA. Translation: AAC76203.1.
AP009048 Genomic DNA. Translation: BAE77215.1.
PIR: E65107. FJEC.
RefSeq: NP_417638.1. NC_000913.3.
WP_001031057.1. NZ_LN832404.1.

Genome annotation databases

EnsemblBacteria: AAC76203; AAC76203; b3169.
BAE77215; BAE77215; BAE77215.
GeneID: 947682.
KEGG: ecj:JW3138.
eco:b3169.
PATRIC: 32121756. VBIEscCol129921_3264.

Cross-references

Organism-specific databases

EchoBASE: EB0659.
EcoGene: EG10665. nusA.

Miscellaneous databases

EvolutionaryTrace: P0AFF6.
PRO: P0AFF6.

Publications

  1. "The nucleotide sequence of the cloned nusA gene and its flanking region of Escherichia coli."
    Ishii S., Ihara M., Maekawa T., Nakamura Y., Uchida H., Imamoto F.
    Nucleic Acids Res. 12:3333-3342(1984)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA].
  2. "Revised sequence of the nusA gene of Escherichia coli and identification of nusA11 (ts) and nusA1 mutations which cause changes in a hydrophobic amino acid cluster."
    Saito M., Tsugawa A., Egawa K., Nakamura Y.
    Mol. Gen. Genet. 205:380-382(1986)
    Cited for: SEQUENCE REVISION.
  3. "Genetic interaction between the beta' subunit of RNA polymerase and the arginine-rich domain of Escherichia coli nusA protein."
    Ito K., Egawa K., Nakamura Y.
    J. Bacteriol. 173:1492-1501(1991)
    Cited for: SEQUENCE REVISION, PARTIAL PROTEIN SEQUENCE, INDUCTION, MUTAGENESIS OF ARG-104; GLY-181; LEU-183 AND GLU-212.
  4. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
    Strain: K12 / MG1655 / ATCC 47076.
  5. "Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110."
    Hayashi K., Morooka N., Yamamoto Y., Fujita K., Isono K., Choi S., Ohtsubo E., Baba T., Wanner B.L., Mori H., Horiuchi T.
    Mol. Syst. Biol. 2:E1-E5(2006)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
    Strain: K12 / W3110 / ATCC 27325 / DSM 5911.
  6. "Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12."
    Link A.J., Robison K., Church G.M.
    Electrophoresis 18:1259-1313(1997)
    Cited for: PROTEIN SEQUENCE OF 1-13.
    Strain: K12 / EMG2.
  7. "L factor that is required for beta-galactosidase synthesis is the nusA gene product involved in transcription termination."
    Greenblatt J., Li J., Adhya S., Friedman D.I., Baron L.S., Redfield B., Kung H.F., Weissbach H.
    Proc. Natl. Acad. Sci. U.S.A. 77:1991-1994(1980)
    Cited for: IDENTIFICATION AS L FACTOR, INTERACTION WITH N PROTEIN.
  8. "Interaction of the sigma factor and the nusA gene protein of E. coli with RNA polymerase in the initiation-termination cycle of transcription."
    Greenblatt J., Li J.
    Cell 24:421-428(1981)
    Cited for: FUNCTION, INTERACTION WITH RNA POLYMERASE.
  9. "Termination of transcription by nusA gene protein of Escherichia coli."
    Greenblatt J., McLimont M., Hanly S.
    Nature 292:215-220(1981)
    Cited for: FUNCTION.
  10. "Amplification and isolation of Escherichia coli nusA protein and studies of its effects on in vitro RNA chain elongation."
    Schmidt M.C., Chamberlin M.J.
    Biochemistry 23:197-203(1984)
    Cited for: FUNCTION.
    Strain: K12.
  11. "Binding of rho factor to Escherichia coli RNA polymerase mediated by nusA protein."
    Schmidt M.C., Chamberlin M.J.
    J. Biol. Chem. 259:15000-15002(1984)
    Cited for: INTERACTION WITH RHO.
  12. "Effect of NusA protein on expression of the nusA,infB operon in E. coli."
    Plumbridge J.A., Dondon J., Nakamura Y., Grunberg-Manago M.
    Nucleic Acids Res. 13:3371-3388(1985)
    Cited for: INDUCTION.
  13. "nusA protein of Escherichia coli is an efficient transcription termination factor for certain terminator sites."
    Schmidt M.C., Chamberlin M.J.
    J. Mol. Biol. 195:809-818(1987)
    Cited for: FUNCTION.
    Strain: K12.
  14. "Escherichia coli sigma 70 and NusA proteins. I. Binding interactions with core RNA polymerase in solution and within the transcription complex."
    Gill S.C., Weitzel S.E., von Hippel P.H.
    J. Mol. Biol. 220:307-324(1991)
    Cited for: SUBUNIT, INTERACTION WITH RNA POLYMERASE.
  15. "NusA contacts nascent RNA in Escherichia coli transcription complexes."
    Liu K., Hanna M.M.
    J. Mol. Biol. 247:547-558(1995)
    Cited for: FUNCTION, INTERACTION WITH NASCENT RNA.
  16. "Escherichia coli proteome analysis using the gene-protein database."
    VanBogelen R.A., Abshire K.Z., Moldover B., Olson E.R., Neidhardt F.C.
    Electrophoresis 18:1243-1251(1997)
    Cited for: IDENTIFICATION BY 2D-GEL.
  17. "NusA is required for ribosomal antitermination and for modulation of the transcription elongation rate of both antiterminated RNA and mRNA."
    Vogel U., Jensen K.F.
    J. Biol. Chem. 272:12265-12271(1997)
    Cited for: FUNCTION.
    Strain: K12 / MC4100 / ATCC 35695 / DSM 6574.
  18. "Control of intrinsic transcription termination by N and NusA: the basic mechanisms."
    Gusarov I., Nudler E.
    Cell 107:437-449(2001)
    Cited for: FUNCTION.
  19. "Visualizing the proteome of Escherichia coli: an efficient and versatile method for labeling chromosomal coding DNA sequences (CDSs) with fluorescent protein genes."
    Watt R.M., Wang J., Leong M., Kung H.F., Cheah K.S., Liu D., Danchin A., Huang J.D.
    Nucleic Acids Res. 35:E37-E37(2007)
    Cited for: SUBCELLULAR LOCATION.
  20. "Roles for the transcription elongation factor NusA in both DNA repair and damage tolerance pathways in Escherichia coli."
    Cohen S.E., Lewis C.A., Mooney R.A., Kohanski M.A., Collins J.J., Landick R., Walker G.C.
    Proc. Natl. Acad. Sci. U.S.A. 107:15517-15522(2010)
    Cited for: FUNCTION IN DNA REPAIR, DISRUPTION PHENOTYPE.
  21. "The role of E. coli Nus-factors in transcription regulation and transcription:translation coupling: From structure to mechanism."
    Burmann B.M., Rosch P.
    Transcription 2:130-134(2011)
    Cited for: FUNCTION, DOMAIN.
  22. "Structural basis for the interaction of Escherichia coli NusA with protein N of phage lambda."
    Bonin I., Muhlberger R., Bourenkov G.P., Huber R., Bacher A., Richter G., Wahl M.C.
    Proc. Natl. Acad. Sci. U.S.A. 101:13762-13767(2004)
    Cited for: X-RAY CRYSTALLOGRAPHY (1.90 ANGSTROMS) OF 352-421, DOMAIN.
  23. "The E. coli NusA carboxy-terminal domains are structurally similar and show specific RNAP- and lambdaN interaction."
    Eisenmann A., Schwarz S., Prasch S., Schweimer K., Rosch P.
    Protein Sci. 14:2018-2029(2005)
    Cited for: STRUCTURE BY NMR OF 351-426, DOMAIN.
  24. "Structural basis of transcription elongation control: the NusA-aCTD complex."
    Prasch S., Schweimer K., Roesch P.
    Submitted (JAN-2008) to the PDB data bank
    Cited for: STRUCTURE BY NMR OF 424-495.
  25. "Solution structure of the aminoterminal domain of E. coli NusA."
    Jurk M., Schweimer K., Roesch P.
    Submitted (APR-2010) to the PDB data bank
    Cited for: STRUCTURE BY NMR OF 1-125.

Entry information

Entry name: NUSA_ECOLI
Accession: Primary (citable) accession number: P0AFF6
Secondary accession number(s): P03003, Q2M941
Entry history
Integrated into UniProtKB/Swiss-Prot: July 21, 1986
Last sequence update: December 20, 2005
Last modified: June 8, 2016
This is version 101 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Prokaryotic Protein Annotation Program

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Escherichia coli
    Escherichia coli (strain K12): entries and cross-references to EcoGene
  2. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.