Protein

CCAAT/enhancer-binding protein beta

Gene

Cebpb

Organism
Rattus norvegicus (Rat)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Important transcription factor regulating the expression of genes involved in immune and inflammatory responses (PubMed:8336793). Also plays a significant role in adipogenesis, as well as in the gluconeogenic pathway, liver regeneration, and hematopoiesis (PubMed:10635333). The consensus recognition site is 5'-T[TG]NNGNAA[TG]-3'. Its functional capacity is governed by protein interactions and post-translational protein modifications. During early embryogenesis, plays essential and redundant functions with CEBPA (By similarity). Has a promitotic effect on many cell types such as hepatocytes and adipocytes but has an antiproliferative effect on T-cells by repressing MYC expression, facilitating differentiation along the T-helper 2 lineage (PubMed:10635333). Binds to regulatory regions of several acute-phase and cytokine genes and plays a role in the regulation of acute-phase reaction and inflammation. Also plays a role in intracellular bacteria killing (By similarity). During adipogenesis, is rapidly expressed and, after activation by phosphorylation, induces CEBPA and PPARG, which turn on the series of adipocyte genes that give rise to the adipocyte phenotype. The delayed transactivation of the CEBPA and PPARG genes by CEBPB appears necessary to allow mitotic clonal expansion and thereby progression of terminal differentiation (By similarity). Essential for female reproduction because of a critical role in ovarian follicle development (By similarity). Restricts osteoclastogenesis (By similarity).
Isoform 2: Essential for gene expression induction in activated macrophages. Plays a major role in immune responses such as CD4+ T-cell response, granuloma formation and endotoxin shock. Not essential for intracellular bacteria killing (By similarity).
Isoform 3: Acts as a dominant negative through heterodimerization with isoform 2 (PubMed:1934061). Promotes osteoblast differentiation and osteoclastogenesis (By similarity).
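
The consensus recognition site quoted in the Function text can be turned into a simple pattern search. The following is a minimal sketch and is not part of the UniProt entry: the function names and the test sequence are hypothetical, and N is assumed to stand for any of A, C, G or T. It scans both strands and reports 0-based start positions relative to the strand that was scanned.

```python
import re

# Consensus site as quoted in the Function text.
CONSENSUS = "T[TG]NNGNAA[TG]"

def consensus_to_regex(consensus):
    """Expand N to any base; a lookahead lets overlapping sites be reported."""
    return re.compile("(?=(" + consensus.replace("N", "[ACGT]") + "))")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(dna):
    """Return (strand, start, site) tuples for every consensus match.

    Start positions are 0-based and given in the coordinates of the
    strand that was scanned (the '-' strand is the reverse complement).
    """
    dna = dna.upper()
    pattern = consensus_to_regex(CONSENSUS)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits

# Hypothetical test sequence containing a palindromic site.
print(find_sites("GGATTGCGCAATCC"))
```

Because the test sequence is its own reverse complement, the same site TTGCGCAAT is reported once per strand, at position 3 in each case.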

GO - Molecular function

  • chromatin binding Source: Ensembl
  • DNA binding Source: UniProtKB
  • glucocorticoid receptor binding Source: RGD
  • protein heterodimerization activity Source: UniProtKB
  • protein homodimerization activity Source: UniProtKB
  • RNA polymerase II core promoter proximal region sequence-specific DNA binding Source: NTNU_SB
  • sequence-specific DNA binding Source: RGD
  • transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding Source: NTNU_SB
  • transcription factor activity, RNA polymerase II distal enhancer sequence-specific binding Source: UniProtKB
  • transcription factor binding Source: ParkinsonsUK-UCL

Keywords - Molecular function

Activator

Keywords - Biological process

Differentiation, Transcription, Transcription regulation

Keywords - Ligand

DNA-binding

Enzyme and pathway databases

Reactome: R-RNO-2559582. Senescence-Associated Secretory Phenotype (SASP).

Names & Taxonomy

Protein names
Recommended name:
CCAAT/enhancer-binding protein beta
Short name:
C/EBP beta
Alternative name(s):
C/EBP-related protein 2
Interleukin-6-dependent-binding protein
Short name:
IL-6DBP
Liver-enriched inhibitory protein
Short name:
LIP
Liver-enriched transcriptional activator
Short name:
LAP
Silencer factor B
Short name:
SF-B
Gene names
Name: Cebpb
Synonyms: Crp2, Nf-il6, Sfb
Organism: Rattus norvegicus (Rat)
Taxonomic identifier: 10116 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Rattus
Proteomes
  • UP000002494 Component: Chromosome 3

Organism-specific databases

RGD: 2327. Cebpb.

Subcellular location

  • Nucleus (By similarity)
  • Cytoplasm (By similarity)

  • Note: Translocates to the nucleus when phosphorylated at Ser-288. In T-cells, the sumoylated form is drawn to pericentric heterochromatin, thereby allowing proliferation (By similarity).

GO - Cellular component

  • CHOP-C/EBP complex Source: ParkinsonsUK-UCL
  • condensed chromosome, centromeric region Source: Ensembl
  • cytoplasm Source: UniProtKB-SubCell
  • nuclear chromatin Source: Ensembl
  • nuclear matrix Source: RGD
  • nucleoplasm Source: Reactome
  • nucleus Source: ParkinsonsUK-UCL

Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Mutagenesis

Feature key       Position(s)  Length  Description
Mutagenesis       105          1       S → A: No effect on DNA-binding. Loss of transactivation activity. Loss of hepatocyte proliferation induction by TGFA. (2 publications)
Mutagenesis       105          1       S → D: No effect on DNA-binding. Increases transactivation activity. (1 publication)

PTM / Processing

Molecule processing

Feature key       Position(s)  Length  Description                          Feature identifier
Chain             1 – 297      297     CCAAT/enhancer-binding protein beta  PRO_0000076619

Amino acid modifications

Feature key       Position(s)  Length  Description
Modified residue  3            1       Omega-N-methylated arginine; by CARM1 (By similarity)
Modified residue  39           1       N6-acetyllysine (By similarity)
Modified residue  99           1       N6-acetyllysine; by KAT2A and KAT2B (By similarity)
Modified residue  102          1       N6-acetyllysine; by KAT2A and KAT2B (By similarity)
Modified residue  103          1       N6-acetyllysine; by KAT2A and KAT2B; alternate (By similarity)
Cross-link        103          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2); alternate (By similarity)
Modified residue  105          1       Phosphoserine; by RPS6KA1 and PKC/PRKCA (2 publications)
Cross-link        134          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO); alternate (By similarity)
Cross-link        134          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2); alternate (By similarity)
Modified residue  180          1       Phosphothreonine; by GSK3-beta (By similarity)
Glycosylation     181          1       O-linked (GlcNAc) (By similarity)
Glycosylation     182          1       O-linked (GlcNAc) (By similarity)
Modified residue  185          1       Phosphoserine; by GSK3-beta (By similarity)
Modified residue  189          1       Phosphothreonine; by RPS6KA1, CDK2 and MAPK (By similarity)
Cross-link        212          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2) (By similarity)
Modified residue  240          1       Phosphoserine; by PKC/PRKCA (By similarity)
Modified residue  277          1       Phosphoserine; by CaMK2 (By similarity)
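
Because every position in this table refers to the canonical isoform 1 sequence, the annotated sites can be sanity-checked against that sequence. The sketch below is an illustration only: the helper name `mismatches` and the `ANNOTATED` mapping are hypothetical, the residue letters are inferred from the descriptions (arginine R, lysine K, serine S, threonine T), and the residue type at 181 and 182 is taken from the PTM text, which names Ser-181 and Ser-182.

```python
# Canonical isoform 1 sequence (P21272-1), copied from the Sequences section.
CANONICAL = "".join("""
MHRLLAWDAA CLPPPPAAFR PMEVANFYYE PDCLAYGAKA ARAAPRAPAA
EPAIGEHERA IDFSPYLEPL APAAADFAAP APAHHDFLSD LFADDYGAKP
SKKPSDYGYV SLGRAGAKAA PPACFPPPPP AALKAEPGFE PADCKRADDA
PAMAAGFPFA LRAYLGYQAT PSGSSGSLST SSSSSPPGTP SPADAKAAPA
ACFAGPPAAP AKAKAKKAVD KLSDEYKMRR ERNNIAVRKS RDKAKMRNLE
TQHKVLELTA ENERLQKKVE QLSRELSTLR NLFKQLPEPL LASAGHC
""".split())

# 1-based position -> one-letter residue implied by the modification table.
ANNOTATED = {
    3: "R", 39: "K", 99: "K", 102: "K", 103: "K", 105: "S", 134: "K",
    180: "T", 181: "S", 182: "S", 185: "S", 189: "T", 212: "K",
    240: "S", 277: "S",
}

def mismatches(sequence, annotated):
    """Return {position: (expected, found)} for sites that disagree."""
    return {
        pos: (aa, sequence[pos - 1])  # convert 1-based position to 0-based index
        for pos, aa in annotated.items()
        if sequence[pos - 1] != aa
    }

print(len(CANONICAL))                    # 297
print(mismatches(CANONICAL, ANNOTATED))  # {}
```

An empty result means every annotated position carries the residue type its description implies.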

Post-translational modification

Phosphorylation at Thr-189 by MAPK and CDK2 primes phosphorylation at Thr-180 and Ser-185 by GSK3B, through which the protein acquires the DNA-binding and transactivation activities required to induce adipogenesis. MAPK and CDK2 act sequentially to maintain Thr-189 in the primed phosphorylated state during mitotic clonal expansion and thereby progression of terminal differentiation (By similarity). Phosphorylation at Ser-105 enhances transactivation activity (PubMed:8336793). Phosphorylation at Ser-277 in response to calcium increases transactivation activity. Phosphorylated at Thr-189 by RPS6KA1 (By similarity).
Methylated. Methylation at Arg-3 by CARM1 and at Lys-39 by EHMT2 inhibits transactivation activity. Methylation is probably inhibited by phosphorylation at Thr-189 (By similarity).
Sumoylated by polymeric chains of SUMO2 or SUMO3 (By similarity). Sumoylation at Lys-134 is required for inhibition of T-cell proliferation. In adipocytes, sumoylation at Lys-134 by PIAS1 leads to ubiquitination and subsequent proteasomal degradation. Desumoylated by SENP2, which abolishes ubiquitination and stabilizes protein levels (By similarity).
Ubiquitinated, leading to proteasomal degradation (By similarity).
O-glycosylated. Glycosylation at Ser-181 and Ser-182 prevents phosphorylation at Thr-189, Ser-185 and Thr-180 as well as DNA-binding activity, which delays the adipocyte differentiation program (By similarity).
Acetylated. Acetylation at Lys-39 is an important and dynamic regulatory event that contributes to its ability to transactivate target genes, including those associated with adipogenesis and adipocyte function. Deacetylation by HDAC1 represses its transactivation activity. Acetylated by KAT2A and KAT2B within a cluster of lysine residues between amino acids 99-103; this acetylation is strongly induced by glucocorticoid treatment and enhances transactivation activity (By similarity).

Keywords - PTM

Acetylation, Glycoprotein, Isopeptide bond, Methylation, Phosphoprotein, Ubl conjugation

Proteomic databases

PaxDb: P21272.
PRIDE: P21272.

PTM databases

iPTMnet: P21272.
PhosphoSite: P21272.

Expression

Tissue specificity

Liver and lung.

Gene expression databases

Genevisible: P21272. RN.

Interaction

Subunit structure

Binds DNA as a homodimer and as a heterodimer (PubMed:1934061). Interacts with MYB; within the complex, MYB and CEBPB bind to different promoter regions. Interacts with ATF4. Binds DNA as a heterodimer with ATF4 (By similarity). Can form stable heterodimers with CEBPA, CEBPD, CEBPE and CEBPG (PubMed:1377818, PubMed:1884998). Isoform 2 and isoform 3 also form heterodimers (PubMed:1934061). Interacts with TRIM28 and PTGES2. Interacts with PRDM16. Interacts with CCDC85B. Forms a complex with THOC5. Interacts with ZNF638; this interaction increases transcriptional activation. Interacts with CIDEA and CIDEC; these interactions increase transcriptional activation of a subset of CEBPB downstream target genes. Interacts with DDIT3/CHOP. Interacts with EP300; recruits EP300 to chromatin. Interacts with RORA; the interaction disrupts interaction with EP300. Interacts (not methylated) with MED23, MED26, SMARCA2, SMARCB1 and SMARCC1 (By similarity). Interacts with KAT2A and KAT2B (By similarity). Interacts with ATF5; EP300 is required for ATF5 and CEBPB interaction and DNA binding (By similarity).

GO - Molecular function

  • glucocorticoid receptor binding Source: RGD
  • protein heterodimerization activity Source: UniProtKB
  • protein homodimerization activity Source: UniProtKB
  • transcription factor binding Source: ParkinsonsUK-UCL

Protein-protein interaction databases

BioGrid: 246438. 53 interactions.
DIP: DIP-28139N.
IntAct: P21272. 2 interactions.
MINT: MINT-146630.
STRING: 10116.ENSRNOP00000065222.

Structure

3D structure databases

ProteinModelPortal: P21272.
SMR: P21272. Positions 240-286.

Family & Domains

Domains and Repeats

Feature key       Position(s)  Length  Description
Domain            223 – 286    64      bZIP (PROSITE-ProRule annotation)

Region

Feature key       Position(s)  Length  Description
Region            1 – 22       22      Required for Lys-134 sumoylation (By similarity)
Region            22 – 105     84      Required for MYC transcriptional repression (By similarity)
Region            227 – 247    21      Basic motif (PROSITE-ProRule annotation)
Region            249 – 256    8       Leucine-zipper (PROSITE-ProRule annotation)

Compositional bias

Feature key         Position(s)  Length  Description
Compositional bias  121 – 130    10      Pro-rich
Compositional bias  171 – 192    22      Pro/Ser-rich

Sequence similarities

Belongs to the bZIP family. C/EBP subfamily. (Curated)
Contains 1 bZIP (basic-leucine zipper) domain. (PROSITE-ProRule annotation)

Phylogenomic databases

eggNOG: KOG3119. Eukaryota.
ENOG410YJ8G. LUCA.
GeneTree: ENSGT00530000063192.
HOGENOM: HOG000013112.
HOVERGEN: HBG050879.
InParanoid: P21272.
KO: K10048.
PhylomeDB: P21272.
TreeFam: TF105008.

Family and domain databases

InterPro: IPR004827. bZIP.
IPR031106. C/EBP.
IPR016468. C/EBP_chordates.
PANTHER: PTHR23334. PTHR23334. 1 hit.
Pfam: PF07716. bZIP_2. 1 hit.
PIRSF: PIRSF005879. CCAAT/enhancer-binding. 1 hit.
SMART: SM00338. BRLZ. 1 hit.
PROSITE: PS50217. BZIP. 1 hit.

Sequences (3)

Sequence status: Complete.

This entry describes 3 isoforms produced by alternative initiation.

Isoform 1 (identifier: P21272-1)

Also known as: FL

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MHRLLAWDAA CLPPPPAAFR PMEVANFYYE PDCLAYGAKA ARAAPRAPAA
        60         70         80         90        100
EPAIGEHERA IDFSPYLEPL APAAADFAAP APAHHDFLSD LFADDYGAKP
       110        120        130        140        150
SKKPSDYGYV SLGRAGAKAA PPACFPPPPP AALKAEPGFE PADCKRADDA
       160        170        180        190        200
PAMAAGFPFA LRAYLGYQAT PSGSSGSLST SSSSSPPGTP SPADAKAAPA
       210        220        230        240        250
ACFAGPPAAP AKAKAKKAVD KLSDEYKMRR ERNNIAVRKS RDKAKMRNLE
       260        270        280        290
TQHKVLELTA ENERLQKKVE QLSRELSTLR NLFKQLPEPL LASAGHC
Note: Not detected in rat liver.
Length: 297
Mass (Da): 31,503
Last modified: May 1, 1991 - v1
Checksum: C2511FDB65527789

Isoform 2 (identifier: P21272-2)

Also known as: LAP

The sequence of this isoform differs from the canonical sequence as follows:
     1-21: Missing.

Note: Major form in.
Length: 276
Mass (Da): 29,190
Checksum: 2A621B294E8E3652

Isoform 3 (identifier: P21272-3)

Also known as: LIP

The sequence of this isoform differs from the canonical sequence as follows:
     1-152: Missing.

Length: 145
Mass (Da): 15,567
Checksum: DD1E45FE483F5968

Alternative sequence

Feature key           Position(s)  Length  Description                     Feature identifier
Alternative sequence  1 – 152      152     Missing in isoform 3 (Curated)  VSP_053315
Alternative sequence  1 – 21       21      Missing in isoform 2 (Curated)  VSP_053316
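
The "Missing" ranges can be applied to the canonical sequence to derive the two shorter isoforms. This is a minimal sketch under stated assumptions: `apply_missing` and `ISOFORM_MISSING` are hypothetical names, and positions are treated as 1-based and inclusive, as in the feature tables of this entry.

```python
# Canonical isoform 1 sequence (P21272-1), copied from this entry.
CANONICAL = "".join("""
MHRLLAWDAA CLPPPPAAFR PMEVANFYYE PDCLAYGAKA ARAAPRAPAA
EPAIGEHERA IDFSPYLEPL APAAADFAAP APAHHDFLSD LFADDYGAKP
SKKPSDYGYV SLGRAGAKAA PPACFPPPPP AALKAEPGFE PADCKRADDA
PAMAAGFPFA LRAYLGYQAT PSGSSGSLST SSSSSPPGTP SPADAKAAPA
ACFAGPPAAP AKAKAKKAVD KLSDEYKMRR ERNNIAVRKS RDKAKMRNLE
TQHKVLELTA ENERLQKKVE QLSRELSTLR NLFKQLPEPL LASAGHC
""".split())

# Residue ranges listed as "Missing" for each isoform (1-based, inclusive).
ISOFORM_MISSING = {"P21272-2": (1, 21), "P21272-3": (1, 152)}

def apply_missing(sequence, start, end):
    """Delete residues start..end (1-based, inclusive) from the sequence."""
    return sequence[:start - 1] + sequence[end:]

isoforms = {
    name: apply_missing(CANONICAL, start, end)
    for name, (start, end) in ISOFORM_MISSING.items()
}

for name, seq in isoforms.items():
    # Print identifier, length and first residue of each derived isoform.
    print(name, len(seq), seq[0])
```

The derived lengths are 276 and 145, matching the Length fields above, and both derived sequences begin with methionine, which is consistent with isoforms produced by alternative initiation.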

Sequence databases

EMBL/GenBank/DDBJ:
M57235 mRNA. Translation: AAA19669.1.
X54626 Genomic DNA. Translation: CAA38443.1.
BC129071 mRNA. Translation: AAI29072.1.
X60769 mRNA. Translation: CAA43179.1.
AY056052 Genomic DNA. Translation: AAA40972.1.
PIR: A35914.
RefSeq: NP_001288644.1. NM_001301715.1. [P21272-2]
NP_001288649.1. NM_001301720.1. [P21272-3]
NP_077039.3. NM_024125.5. [P21272-1]
UniGene: Rn.6479.

Genome annotation databases

Ensembl: ENSRNOT00000083876; ENSRNOP00000071427; ENSRNOG00000057347. [P21272-1]
GeneID: 24253.
KEGG: rno:24253.
UCSC: RGD:2327. rat. [P21272-1]

Keywords - Coding sequence diversity

Alternative initiation

Cross-references

Organism-specific databases

CTD: 1051.
RGD: 2327. Cebpb.

Miscellaneous databases

NextBio: 602773.
PRO: P21272.

Publications

  1. "IL-6DBP, a nuclear protein involved in interleukin-6 signal transduction, defines a new family of leucine zipper proteins related to C/EBP."
    Poli V., Mancini F.P., Cortese R.
    Cell 63:643-653(1990)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1).
  2. "LAP, a novel member of the C/EBP gene family, encodes a liver-enriched transcriptional activator protein."
    Descombes P., Chojkier M., Lichtsteiner S., Falvey E., Schibler U.
    Genes Dev. 4:1541-1551(1990)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA].
    Strain: Lewis.
    Tissue: Liver.
  3. "Molecular cloning of two C/EBP-related proteins that bind to the promoter and the enhancer of the alpha 1-fetoprotein gene. Further analysis of C/EBP beta and C/EBP gamma."
    Thomassin H., Hamel D., Bernier D., Guertin M., Belanger L.
    Nucleic Acids Res. 20:3091-3098(1992)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), SUBUNIT.
    Strain: Sprague-Dawley.
    Tissue: Liver.
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Placenta.
  5. "SF-B (Silencer Factor B) that binds to a negative element in glutathione transferase P gene is most likely identical to an inducible trans-activator LAP/IL6-DBP."
    Imagawa M., Osada S., Koyama Y., Suzuki T., Hirom P.C., Diccianni M.B., Morimura S., Muramatsu M.
    Submitted (JUL-1991) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 77-297 (ISOFORM 1).
    Tissue: Liver.
  6. "A family of C/EBP-related proteins capable of forming covalently linked leucine zipper dimers in vitro."
    Williams S.C., Cantwell C.A., Johnson P.F.
    Genes Dev. 5:1553-1567(1991)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 22-297, SUBUNIT, DNA-BINDING.
    Strain: Sprague-Dawley.
    Tissue: Adipose tissue, Liver and Lung.
  7. "A liver-enriched transcriptional activator protein, LAP, and a transcriptional inhibitory protein, LIP, are translated from the same mRNA."
    Descombes P., Schibler U.
    Cell 67:569-579(1991)
    Cited for: FUNCTION, ALTERNATIVE INITIATION (ISOFORMS 1; 2 AND 3), DNA-BINDING, DIMERIZATION, SUBUNIT.
  8. "Transactivation by NF-IL6/LAP is enhanced by phosphorylation of its activation domain."
    Trautwein C., Caelles C., van der Geer P., Hunter T., Karin M., Chojkier M.
    Nature 364:544-547(1993)
    Cited for: FUNCTION, PHOSPHORYLATION AT SER-105, MUTAGENESIS OF SER-105.
  9. "Phosphorylation of rat serine 105 or mouse threonine 217 in C/EBP beta is required for hepatocyte proliferation induced by TGF alpha."
    Buck M., Poli V., van der Geer P., Chojkier M., Hunter T.
    Mol. Cell 4:1087-1092(1999)
    Cited for: FUNCTION, PHOSPHORYLATION AT SER-105, MUTAGENESIS OF SER-105, TISSUE SPECIFICITY.
  10. "Quantitative maps of protein phosphorylation sites across 14 different rat organs and tissues."
    Lundby A., Secher A., Lage K., Nordsborg N.B., Dmytriyev A., Lundby C., Olsen J.V.
    Nat. Commun. 3:876-876(2012)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].

Entry information

Entry name: CEBPB_RAT
Accession: Primary (citable) accession number: P21272
Secondary accession number(s): A2VD03
Entry history
Integrated into UniProtKB/Swiss-Prot: May 1, 1991
Last sequence update: May 1, 1991
Last modified: February 17, 2016
This is version 144 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.