Protein

Runt-related transcription factor 1

Gene

Runx1

Organism
Mus musculus (Mouse)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

CBF binds to the core site, 5'-PYGPYGGT-3', of a number of enhancers and promoters, including murine leukemia virus, polyomavirus enhancer, T-cell receptor enhancers, LCK, IL-3 and GM-CSF promoters. Essential for the development of normal hematopoiesis. Isoform 4 shows higher binding activities for target genes and binds the TCR-beta-E2 and RAG-1 target sites with threefold higher affinity than other isoforms. It is less effective in the context of neutrophil terminal differentiation. Acts synergistically with ELF4 to transactivate the IL-3 promoter and with ELF2 to transactivate the BLK promoter. Inhibits KAT6B-dependent transcriptional activation (By similarity). Controls the anergy and suppressive function of regulatory T-cells (Treg) by associating with FOXP3. Activates the expression of IL2 and IFNG and down-regulates the expression of TNFRSF18, IL2RA and CTLA4 in conventional T-cells (PubMed:17377532).
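As a rough illustration of what the core-site consensus means in practice, the sketch below scans a DNA string for 5'-PYGPYGGT-3'. It assumes that "PY" stands for a pyrimidine (C or T), so the site becomes the six-base pattern [CT]G[CT]GGT. The function name and the example sequence are invented for illustration, and only the strand given is searched.

```python
import re

# Assumption: "PY" in 5'-PYGPYGGT-3' denotes a pyrimidine (C or T),
# so the core site is read as the six-base pattern [CT]G[CT]GGT.
# The lookahead lets overlapping occurrences be reported as well.
CORE_SITE = re.compile(r"(?=([CT]G[CT]GGT))")


def find_core_sites(dna: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched site) for each core site on the given strand."""
    dna = dna.upper()
    return [(m.start() + 1, m.group(1)) for m in CORE_SITE.finditer(dna)]


# Hypothetical test sequence, not taken from any real enhancer or promoter.
example = "AATGTGGTCCACGCGGTA"
sites = find_core_sites(example)
print(sites)
```

Searching the reverse strand would need the reverse complement of the input as a second pass; that step is left out here to keep the sketch short.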

Sites

Feature key   Position(s)  Description                     Evidence                    Length
Binding site  112          Chloride 1                      PROSITE-ProRule annotation  1
Binding site  116          Chloride 1; via amide nitrogen  PROSITE-ProRule annotation  1
Binding site  139          Chloride 2                      PROSITE-ProRule annotation  1
Binding site  170          Chloride 2; via amide nitrogen  PROSITE-ProRule annotation  1

GO - Molecular function

GO - Biological process

  • behavioral response to pain Source: MGI
  • cellular response to transforming growth factor beta stimulus Source: MGI
  • central nervous system development Source: MGI
  • chondrocyte differentiation Source: GO_Central
  • definitive hemopoiesis Source: MGI
  • embryonic hemopoiesis Source: MGI
  • hair follicle morphogenesis Source: MGI
  • hemopoiesis Source: UniProtKB
  • in utero embryonic development Source: MGI
  • liver development Source: MGI
  • myeloid progenitor cell differentiation Source: MGI
  • negative regulation of cell proliferation Source: MGI
  • negative regulation of transcription, DNA-templated Source: UniProtKB
  • neuron development Source: MGI
  • neuron differentiation Source: MGI
  • neuron fate commitment Source: MGI
  • ossification Source: GO_Central
  • positive regulation of angiogenesis Source: UniProtKB
  • positive regulation of cell maturation Source: MGI
  • positive regulation of granulocyte differentiation Source: UniProtKB
  • positive regulation of interferon-gamma production Source: UniProtKB
  • positive regulation of interleukin-2 production Source: UniProtKB
  • positive regulation of transcription, DNA-templated Source: UniProtKB
  • positive regulation of transcription from RNA polymerase II promoter Source: UniProtKB
  • regulation of hair follicle cell proliferation Source: MGI
  • regulation of signal transduction Source: MGI
  • regulation of T cell anergy Source: UniProtKB
  • regulation of transcription, DNA-templated Source: MGI
  • response to retinoic acid Source: BHF-UCL
  • skeletal system development Source: MGI

Keywords - Molecular function

Activator, Repressor

Keywords - Biological process

Transcription, Transcription regulation

Keywords - Ligand

DNA-binding

Names & Taxonomy

Protein names
Recommended name: Runt-related transcription factor 1
Alternative name(s):
  • Acute myeloid leukemia 1 protein
  • Core-binding factor subunit alpha-2 (short name: CBF-alpha-2)
  • Oncogene AML-1
  • Polyomavirus enhancer-binding protein 2 alpha B subunit (short names: PEA2-alpha B, PEBP2-alpha B)
  • SL3-3 enhancer factor 1 alpha B subunit
  • SL3/AKV core-binding factor alpha B subunit
Gene names
Name: Runx1
Synonyms: Aml1, Cbfa2, Pebp2ab
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
Proteomes
  • UP000000589 Component: Unplaced

Organism-specific databases

MGI: MGI:99852. Runx1.

Subcellular location

GO - Cellular component

  • basement membrane Source: MGI
  • cytoplasm Source: MGI
  • intracellular membrane-bounded organelle Source: MGI
  • nucleoplasm Source: MGI
  • nucleus Source: UniProtKB
  • protein complex Source: MGI

Keywords - Cellular component

Nucleus

Pathology & Biotech

Involvement in disease

Mice whose Runx1 lacks the DNA-binding region die between embryonic days 11.5 and 12.5 due to hemorrhaging in the central nervous system. This hemorrhaging is preceded by necrosis, and hematopoiesis is blocked (PubMed:8622955).

Mutagenesis

Feature key  Position(s)  Description                                  Evidence       Length
Mutagenesis  80           R → A: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  109          N → A: Interferes with heterodimerization.   1 Publication  1
Mutagenesis  113          Y → A: Interferes with heterodimerization.   1 Publication  1
Mutagenesis  142          R → A: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  144          K → M: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  149          T → A: Interferes with heterodimerization.   1 Publication  1
Mutagenesis  170          V → A: No effect.                            1 Publication  1
Mutagenesis  171          D → A: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  174          R → A: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  177          R → A: Interferes with DNA-binding.          1 Publication  1
Mutagenesis  249          S → A: Reduced phosphorylation.              1 Publication  1
Mutagenesis  276          S → A: Reduced phosphorylation.              1 Publication  1

PTM / Processing

Molecule processing

Feature key  ID              Position(s)  Description                          Length
Chain        PRO_0000174656  1 – 451      Runt-related transcription factor 1  451

Amino acid modifications

Feature key       Position(s)  Description                  Evidence       Length
Modified residue  14           Phosphothreonine             By similarity  1
Modified residue  21           Phosphoserine                By similarity  1
Modified residue  24           N6-acetyllysine              By similarity  1
Modified residue  43           N6-acetyllysine              By similarity  1
Modified residue  193          Phosphoserine                By similarity  1
Modified residue  212          Phosphoserine                By similarity  1
Modified residue  249          Phosphoserine; by HIPK2      1 Publication  1
Modified residue  266          Phosphoserine                By similarity  1
Modified residue  268          Phosphoserine                By similarity  1
Modified residue  273          Phosphothreonine; by HIPK2   By similarity  1
Modified residue  276          Phosphoserine; by HIPK2      1 Publication  1
Modified residue  296          Phosphothreonine             By similarity  1
Modified residue  434          Phosphoserine                By similarity  1

Post-translational modification

Phosphorylated in its C-terminus upon IL-6 treatment. Phosphorylation enhances interaction with KAT6A (By similarity).
Methylated (By similarity).
Phosphorylated at Ser-249, Thr-273 and Ser-276 by HIPK2 when associated with CBFB and DNA. This phosphorylation promotes subsequent EP300 phosphorylation.

Keywords - PTM

Acetylation, Methylation, Phosphoprotein

Proteomic databases

MaxQB: Q03347.
PaxDb: Q03347.
PeptideAtlas: Q03347.
PRIDE: Q03347.

PTM databases

iPTMnet: Q03347.
PhosphoSitePlus: Q03347.

Expression

Tissue specificity

Isoform 4 is expressed at high levels in thymus, spleen and T-cell lines and at lower levels in myeloid cell lines and nonhematopoietic cells. Isoform 5 is expressed ubiquitously in lumbar vertebrae, brain, kidney, heart, muscle, ovary and osteoblast-like cell line MC3T3-E1.

Developmental stage

Differentially expressed during hematopoietic differentiation. Isoform AML1-B is readily detectable in undifferentiated embryonic stem (ES) cells; expression peaks on day 6 of differentiation and then declines gradually. Isoform AML1-C is undetectable in undifferentiated ES cells but is gradually up-regulated during differentiation, reaching its highest levels on day 8; this expression is maintained through day 12. Isoform 5 is expressed at high levels at 6-8 dpc, and levels then decrease at 12 dpc. Isoform 4 expression is high throughout T-cell development, declines with natural killer cell maturation, and appears to be transiently reduced and then restored during B-cell development.

Gene expression databases

CleanEx: MM_RUNX1.

Interaction

Subunit structure

Heterodimer with CBFB. RUNX1 binds DNA as a monomer and through the Runt domain. DNA-binding is increased by heterodimerization. Interacts with TLE1 and ALYREF/THOC4. Interacts with HIPK2, ELF1, ELF2 and SPI1. Interacts via its Runt domain with the ELF4 N-terminal region. Interaction with ELF2 isoform 2 (NERF-1a) may act to repress RUNX1-mediated transactivation. Interacts with KAT6A and KAT6B. Interacts with SUV39H1, leading to abrogation of transactivating and DNA-binding properties of RUNX1 (By similarity). Interacts with YAP1. Interaction with CDK6 prevents myeloid differentiation, reducing its transcription transactivation activity (By similarity). Found in a complex with PRMT5, RUNX1 and CBFB. Interacts with FOXP3.

Binary interactions

With   Entry   #Exp.  IntAct
Foxp3  Q99JB6  5      EBI-3863873, EBI-10956246
Tbx21  Q9JKD8  3      EBI-3863873, EBI-3863870

GO - Molecular function

  • repressing transcription factor binding Source: MGI

Protein-protein interaction databases

DIP: DIP-40723N.
IntAct: Q03347. 4 interactors.
MINT: MINT-91306.
STRING: 10090.ENSMUSP00000023673.

Structure

Secondary structure

Feature key  Position(s)  Evidence          Length
Helix        51 – 57      Combined sources  7
Turn         59 – 61      Combined sources  3
Beta strand  62 – 64      Combined sources  3
Beta strand  70 – 72      Combined sources  3
Beta strand  77 – 80      Combined sources  4
Beta strand  90 – 95      Combined sources  6
Beta strand  102 – 109    Combined sources  8
Beta strand  112 – 115    Combined sources  4
Beta strand  117 – 119    Combined sources  3
Beta strand  121 – 125    Combined sources  5
Beta strand  128 – 130    Combined sources  3
Beta strand  146 – 152    Combined sources  7
Beta strand  154 – 156    Combined sources  3
Beta strand  158 – 161    Combined sources  4
Beta strand  166 – 171    Combined sources  6
Helix        194 – 203    Combined sources  10
Helix        205 – 210    Combined sources  6

3D structure databases

PDB entry  Method  Resolution (Å)  Chain  Positions
1EAN       X-ray   1.70            A      46-185
1EAO       X-ray   1.40            A/B    46-185
1EAQ       X-ray   1.25            A/B    46-185
1HJB       X-ray   3.00            C/F    60-182
1HJC       X-ray   2.65            A/D    60-182
1IO4       X-ray   3.00            C      60-182
2J6W       X-ray   2.60            A/B    46-185
3WTS       X-ray   2.35            A/F    60-263
3WTT       X-ray   2.35            A/F    60-263
3WTU       X-ray   2.70            A/F    60-263
3WTV       X-ray   2.70            A/F    60-263
3WTW       X-ray   2.90            A/F    60-263
3WTX       X-ray   2.80            A/F    60-263
3WTY       X-ray   2.70            A/F    60-263
3WU1       X-ray   2.40            A      55-177
4L0Y       X-ray   2.50            A      1-242
4L0Z       X-ray   2.70            A      1-242
4L18       X-ray   2.30            A/E    48-214
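When choosing a structure to work with, one common question is which PDB entries span the whole Runt domain (positions 50 to 178 in this entry). The sketch below answers it from the "Positions" column alone. The helper name `covers` is made up, and the listed range describes the mapped construct, so it does not guarantee that every residue is resolved in the model.

```python
# Runt domain boundaries as given in the Domains and Repeats table of this entry.
RUNT_DOMAIN = (50, 178)

# PDB entry -> (first, last) position, copied from the table above.
PDB_RANGES = {
    "1EAN": (46, 185), "1EAO": (46, 185), "1EAQ": (46, 185),
    "1HJB": (60, 182), "1HJC": (60, 182), "1IO4": (60, 182),
    "2J6W": (46, 185),
    "3WTS": (60, 263), "3WTT": (60, 263), "3WTU": (60, 263),
    "3WTV": (60, 263), "3WTW": (60, 263), "3WTX": (60, 263),
    "3WTY": (60, 263),
    "3WU1": (55, 177),
    "4L0Y": (1, 242), "4L0Z": (1, 242),
    "4L18": (48, 214),
}


def covers(span, target):
    """True if the inclusive range `span` fully contains the inclusive range `target`."""
    return span[0] <= target[0] and span[1] >= target[1]


full_runt = sorted(pdb for pdb, span in PDB_RANGES.items() if covers(span, RUNT_DOMAIN))
print(full_runt)
```

Entries that start at position 55 or 60 miss the first residues of the annotated domain, which is why they are filtered out by this strict containment test.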
ProteinModelPortal: Q03347.
SMR: Q03347.

Miscellaneous databases

EvolutionaryTrace: Q03347.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Description  Evidence                    Length
Domain       50 – 178     Runt         PROSITE-ProRule annotation  129

Region

Feature key  Position(s)  Description             Evidence       Length
Region       80 – 84      Interaction with DNA    By similarity  5
Region       135 – 143    Interaction with DNA    By similarity  9
Region       168 – 177    Interaction with DNA    By similarity  10
Region       291 – 370    Interaction with KAT6A  By similarity  80
Region       307 – 399    Interaction with KAT6B  By similarity  93
Region       361 – 401    Interaction with FOXP3  1 Publication  41

Compositional bias

Feature key         Position(s)  Description       Length
Compositional bias  187 – 451    Pro/Ser/Thr-rich  265

Domain

A proline/serine/threonine-rich region at the C-terminus is necessary for transcriptional activation of target genes.

Sequence similarities

Contains 1 Runt domain (PROSITE-ProRule annotation).

Phylogenomic databases

eggNOG: KOG3982. Eukaryota.
        ENOG4111J4Y. LUCA.
HOGENOM: HOG000045616.
HOVERGEN: HBG060268.
InParanoid: Q03347.
PhylomeDB: Q03347.

Family and domain databases

Gene3D: 2.60.40.720. 1 hit.
        4.10.770.10. 1 hit.
InterPro: IPR000040. AML1_Runt.
          IPR008967. p53-like_TF_DNA-bd.
          IPR012346. p53/RUNT-type_TF_DNA-bd.
          IPR013524. Runt_dom.
          IPR027384. Runx_central_dom.
          IPR013711. RunxI_C_dom.
          IPR016554. TF_Runt-rel_RUNX.
PANTHER: PTHR11950. PTHR11950. 1 hit.
Pfam: PF00853. Runt. 1 hit.
      PF08504. RunxI. 1 hit.
PIRSF: PIRSF009374. TF_Runt-rel_RUNX. 1 hit.
PRINTS: PR00967. ONCOGENEAML1.
SUPFAM: SSF49417. SSF49417. 1 hit.
PROSITE: PS51062. RUNT. 1 hit.

Sequences (5)

Sequence status: Complete.

This entry describes 5 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q03347-1)
Also known as: AML1-B, PEB2-alpha B1

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRIPVDASTS RRFTPPSTAL SPGKMSEALP LGAPDGGPAL ASKLRSGDRS
        60         70         80         90        100
MVEVLADHPG ELVRTDSPNF LCSVLPTHWR CNKTLPIAFK VVALGDVPDG
       110        120        130        140        150
TLVTVMAGND ENYSAELRNA TAAMKNQVAR FNDLRFVGRS GRGKSFTLTI
       160        170        180        190        200
TVFTNPPQVA TYHRAIKITV DGPREPRRHR QKLDDQTKPG SLSFSERLSE
       210        220        230        240        250
LEQLRRTAMR VSPHHPAPTP NPRASLNHST AFNPQPQSQM QDARQIQPSP
       260        270        280        290        300
PWSYDQSYQY LGSITSSSVH PATPISPGRA SGMTSLSAEL SSRLSTAPDL
       310        320        330        340        350
TAFGDPRQFP TLPSISDPRM HYPGAFTYSP PVTSGIGIGM SAMSSASRYH
       360        370        380        390        400
TYLPPPYPGS SQAQAGPFQT GSPSYHLYYG ASAGSYQFSM VGGERSPPRI
       410        420        430        440        450
LPPCTNASTG AALLNPSLPS QSDVVETEGS HSNSPTNMPP ARLEEAVWRP
Y
Length: 451
Mass (Da): 48,610
Last modified: November 1, 1996 - v1
Checksum: 06B9E9BA01A6469C
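Because all positional information in the entry refers to the canonical sequence, it can be useful to check a feature table against it. The sketch below strips the block spacing from the isoform 1 sequence, looks residues up by their 1-based position, and compares them with the wild-type residues listed in the Mutagenesis table. The names `parse_sequence` and `residue` are illustrative only, not part of any UniProt tool.

```python
# Canonical isoform 1 sequence, copied from this entry in its 10-residue blocks.
CANONICAL_BLOCKS = """
MRIPVDASTS RRFTPPSTAL SPGKMSEALP LGAPDGGPAL ASKLRSGDRS
MVEVLADHPG ELVRTDSPNF LCSVLPTHWR CNKTLPIAFK VVALGDVPDG
TLVTVMAGND ENYSAELRNA TAAMKNQVAR FNDLRFVGRS GRGKSFTLTI
TVFTNPPQVA TYHRAIKITV DGPREPRRHR QKLDDQTKPG SLSFSERLSE
LEQLRRTAMR VSPHHPAPTP NPRASLNHST AFNPQPQSQM QDARQIQPSP
PWSYDQSYQY LGSITSSSVH PATPISPGRA SGMTSLSAEL SSRLSTAPDL
TAFGDPRQFP TLPSISDPRM HYPGAFTYSP PVTSGIGIGM SAMSSASRYH
TYLPPPYPGS SQAQAGPFQT GSPSYHLYYG ASAGSYQFSM VGGERSPPRI
LPPCTNASTG AALLNPSLPS QSDVVETEGS HSNSPTNMPP ARLEEAVWRP
Y
"""


def parse_sequence(text):
    """Keep only letters, dropping spaces, newlines and any ruler digits."""
    return "".join(ch for ch in text if ch.isalpha())


SEQ = parse_sequence(CANONICAL_BLOCKS)


def residue(pos):
    """Return the residue at a 1-based position of the canonical sequence."""
    return SEQ[pos - 1]


# Position -> wild-type residue, as listed in the Mutagenesis table of this entry.
MUTAGENESIS_SITES = {
    80: "R", 109: "N", 113: "Y", 142: "R", 144: "K", 149: "T",
    170: "V", 171: "D", 174: "R", 177: "R", 249: "S", 276: "S",
}

# Any position whose residue disagrees with the table would show up here.
mismatches = {pos: residue(pos) for pos, aa in MUTAGENESIS_SITES.items()
              if residue(pos) != aa}
print(len(SEQ), mismatches)
```

The same lookup works for the Modified residue table, for example to confirm that positions 249 and 276 are serines.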
Isoform 2 (identifier: Q03347-2)
Also known as: PEB2-alpha B2

The sequence of this isoform differs from the canonical sequence as follows:
     178-178: R → N
     179-242: Missing.

Length: 387
Mass (Da): 41,299
Checksum: CB505325F9D4ED81

Isoform 3 (identifier: Q03347-3)

The sequence of this isoform differs from the canonical sequence as follows:
     92-182: VALGDVPDGT...PREPRRHRQK → LLPEEGGGRS...ASEVSKREFF
     183-451: Missing.

Length: 163
Mass (Da): 17,571
Checksum: AE48F53C5FF9A096

Isoform 4 (identifier: Q03347-4)
Also known as: AML1-C

The sequence of this isoform differs from the canonical sequence as follows:
     1-5: MRIPV → MASDSIFESFPSYPQCFMR

Length: 465
Mass (Da): 50,238
Checksum: 8C7E357BAA92E2E5

Isoform 5 (identifier: Q03347-5)

The sequence of this isoform differs from the canonical sequence as follows:
     242-303: DARQIQPSPP...LSTAPDLTAF → KNPTEPTTLC...EYLYSEKCGC
     304-451: Missing.

Length: 303
Mass (Da): 33,485
Checksum: ED89962B59A0EC94

Experimental Info

Feature key        Position(s)  Description                      Evidence  Length
Sequence conflict  37 – 38      GP → A in CAA65976 (Ref. 3)      Curated   2

Alternative sequence

Feature key           ID          Position(s)  Description  Evidence  Length
Alternative sequence  VSP_005930  1 – 5        MRIPV → MASDSIFESFPSYPQCFMR in isoform 4.  Curated  5
Alternative sequence  VSP_005931  92 – 182     VALGD…RHRQK → LLPEEGGGRSRWRSADGQSE PRGQRLRRLLKGAACSRSLW SFSLSLGWGGDAALPWRPSG GSASEVSKREFF in isoform 3.  1 Publication  91
Alternative sequence  VSP_005932  178          R → N in isoform 2.  1 Publication  1
Alternative sequence  VSP_005933  179 – 242    Missing in isoform 2.  1 Publication  64
Alternative sequence  VSP_005934  183 – 451    Missing in isoform 3.  1 Publication  269
Alternative sequence  VSP_005935  242 – 303    DARQI…DLTAF → KNPTEPTTLCLCWSPRRRKH RGCQAFLGALRELLKPRSIS WEPNEENAVPSAEYLYSEKC GC in isoform 5.  Curated  62
Alternative sequence  VSP_005936  304 – 451    Missing in isoform 5.  Curated  148
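The isoform lengths reported in this entry follow directly from the alternative-sequence features: each feature removes a range of the 451-residue canonical sequence and optionally puts a replacement in its place. The sketch below redoes that arithmetic. The dictionary layout and the function name `isoform_length` are assumptions made for illustration; the ranges and replacement strings are copied from the table above.

```python
CANONICAL_LENGTH = 451  # length of isoform 1 as stated in this entry

# Each variation is (first, last, replacement); "" means the range is missing.
# Adjacent string literals are concatenated by Python, mirroring the
# 20-residue chunks shown in the table.
ISOFORM_VARIATIONS = {
    "Isoform 2": [(178, 178, "N"), (179, 242, "")],
    "Isoform 3": [
        (92, 182,
         "LLPEEGGGRSRWRSADGQSE" "PRGQRLRRLLKGAACSRSLW"
         "SFSLSLGWGGDAALPWRPSG" "GSASEVSKREFF"),
        (183, 451, ""),
    ],
    "Isoform 4": [(1, 5, "MASDSIFESFPSYPQCFMR")],
    "Isoform 5": [
        (242, 303,
         "KNPTEPTTLCLCWSPRRRKH" "RGCQAFLGALRELLKPRSIS"
         "WEPNEENAVPSAEYLYSEKC" "GC"),
        (304, 451, ""),
    ],
}


def isoform_length(variations, canonical_length=CANONICAL_LENGTH):
    """Canonical length minus every removed range plus every replacement."""
    length = canonical_length
    for first, last, replacement in variations:
        removed = last - first + 1  # positions are 1-based and inclusive
        length += len(replacement) - removed
    return length


lengths = {name: isoform_length(v) for name, v in ISOFORM_VARIATIONS.items()}
for name, value in lengths.items():
    print(name, value)
```

The computed values can be compared with the "Length" lines given for isoforms 2 to 5; a mismatch would point to a transcription error in a range or a replacement string.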

Sequence databases

EMBL/GenBank/DDBJ:
D26532 mRNA. Translation: BAA05535.1.
D13802 mRNA. Translation: BAA02960.1.
X97306 mRNA. Translation: CAA65976.1.
AB046930 mRNA. Translation: BAB08105.1.
AF345649 mRNA. Translation: AAK29784.1.
AF193030 Genomic DNA. Translation: AAG32957.1.
CCDS: CCDS28339.1. [Q03347-2]
      CCDS49916.1. [Q03347-1]
      CCDS49917.1. [Q03347-4]
PIR: A56017.
UniGene: Mm.4081.
         Mm.470227.

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Miscellaneous databases

PRO: Q03347.

Entry information

Entry name: RUNX1_MOUSE
Accession
Primary (citable) accession number: Q03347
Secondary accession number(s): O08598, Q62049, Q9ESB9, Q9ET65
Entry history
Integrated into UniProtKB/Swiss-Prot: November 1, 1997
Last sequence update: November 1, 1996
Last modified: November 30, 2016
This is version 169 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
  • 100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
  • 90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
  • 50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.