Protein

Splicing factor, proline- and glutamine-rich

Gene

Sfpq

Organism
Mus musculus (Mouse)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

DNA- and RNA-binding protein involved in several nuclear processes. Essential pre-mRNA splicing factor required early in spliceosome formation and for splicing catalytic step II, probably as a heteromer with NONO. Binds to pre-mRNA in spliceosome C complex, and specifically binds to intronic polypyrimidine tracts. Involved in regulation of signal-induced alternative splicing. During splicing of PTPRC/CD45, a phosphorylated form is sequestered by THRAP3 from the pre-mRNA in resting T-cells; T-cell activation and subsequent reduced phosphorylation is proposed to lead to release from THRAP3, allowing binding to pre-mRNA splicing regulatory elements, which represses exon inclusion. Interacts with U5 snRNA, probably by binding to a purine-rich sequence located on the 3' side of U5 snRNA stem 1b. May be involved in a pre-mRNA coupled splicing and polyadenylation process as component of a snRNP-free complex with SNRPA/U1A. The SFPQ-NONO heteromer associated with MATR3 may play a role in nuclear retention of defective RNAs.

SFPQ may be involved in homologous DNA pairing; in vitro, promotes the invasion of ssDNA into a duplex DNA, producing a D-loop. The SFPQ-NONO heteromer may be involved in DNA unwinding by modulating the function of topoisomerase I/TOP1; in vitro, stimulates dissociation of TOP1 from DNA after cleavage and enhances its jumping between separate DNA helices. The SFPQ-NONO heteromer may be involved in DNA non-homologous end joining (NHEJ) required for double-strand break repair and V(D)J recombination and may stabilize paired DNA ends; in vitro, the complex strongly stimulates DNA end joining, binds directly to the DNA substrates and cooperates with the Ku70/G22P1-Ku80/XRCC5 (Ku) dimer to establish a functional preligation complex.

SFPQ is involved in transcriptional regulation. Transcriptional repression is mediated by an interaction of SFPQ with SIN3A and subsequent recruitment of histone deacetylases (HDACs). The SFPQ-NONO-NR5A1 complex binds to the CYP17 promoter and regulates basal and cAMP-dependent transcriptional activity. SFPQ isoform Long binds to the DNA binding domains (DBD) of nuclear hormone receptors, like RXRA and probably THRA, and acts as transcriptional corepressor in absence of hormone ligands. Binds the DNA sequence 5'-CTGAGTC-3' in the insulin-like growth factor response element (IGFRE) and inhibits IGF-I-stimulated transcriptional activity. Regulates the circadian clock by repressing the transcriptional activator activity of the CLOCK-ARNTL/BMAL1 heterodimer. Required for the transcriptional repression of circadian target genes, such as PER1, mediated by the large PER complex through histone deacetylation. (2 Publications)

GO - Molecular function

  • chromatin binding Source: BHF-UCL
  • core promoter binding Source: UniProtKB
  • histone deacetylase binding Source: MGI
  • nucleotide binding Source: InterPro
  • poly(A) RNA binding Source: MGI
  • RNA polymerase II distal enhancer sequence-specific DNA binding Source: BHF-UCL
  • transcription regulatory region DNA binding Source: MGI
  • transcription regulatory region sequence-specific DNA binding Source: UniProtKB

GO - Biological process

  • alternative mRNA splicing, via spliceosome Source: MGI
  • cellular response to DNA damage stimulus Source: MGI
  • chromosome organization Source: MGI
  • double-strand break repair via homologous recombination Source: MGI
  • histone H3 deacetylation Source: UniProtKB
  • negative regulation of circadian rhythm Source: UniProtKB
  • negative regulation of transcription, DNA-templated Source: UniProtKB
  • negative regulation of transcription from RNA polymerase II promoter Source: MGI
  • positive regulation of oxidative stress-induced intrinsic apoptotic signaling pathway Source: MGI
  • positive regulation of sister chromatid cohesion Source: MGI
  • regulation of cell cycle Source: MGI
  • regulation of circadian rhythm Source: UniProtKB
  • rhythmic process Source: UniProtKB-KW
  • transcription, DNA-templated Source: UniProtKB-KW

Keywords - Molecular function

Activator, Repressor

Keywords - Biological process

Biological rhythms, DNA damage, DNA repair, mRNA processing, mRNA splicing, Transcription, Transcription regulation

Keywords - Ligand

DNA-binding, RNA-binding

Names & Taxonomy

Protein names
Recommended name:
Splicing factor, proline- and glutamine-rich
Alternative name(s):
DNA-binding p52/p100 complex, 100 kDa subunit
Polypyrimidine tract-binding protein-associated-splicing factor
Short name:
PSF
Short name:
PTB-associated-splicing factor
Gene names
Name: Sfpq
Synonyms: Psf
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
Proteomes
  • UP000000589 Component: Chromosome 4

Organism-specific databases

MGI: MGI:1918764. Sfpq.

Subcellular location

  • Nucleus matrix (1 Publication)
  • Cytoplasm (By similarity)

  • Note: Predominantly in nuclear matrix. (By similarity)

GO - Cellular component

  • chromatin Source: MGI
  • cytoplasm Source: UniProtKB-SubCell
  • nuclear matrix Source: MGI
  • nucleoplasm Source: MGI
  • nucleus Source: MGI
  • paraspeckles Source: MGI
  • RNA polymerase II transcription factor complex Source: BHF-UCL

Keywords - Cellular component

Cytoplasm, Nucleus

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                                   Feature identifier
Chain        1 – 699      699     Splicing factor, proline- and glutamine-rich  PRO_0000081910

Amino acid modifications

Feature key       Position(s)  Length  Description
Modified residue  7 – 7        1       Omega-N-methylated arginine (By similarity)
Modified residue  8 – 8        1       Phosphoserine; by MKNK2 (By similarity)
Modified residue  9 – 9        1       Omega-N-methylated arginine (By similarity)
Modified residue  19 – 19      1       Omega-N-methylated arginine (By similarity)
Modified residue  25 – 25      1       Omega-N-methylated arginine (By similarity)
Modified residue  33 – 33      1       Phosphoserine (By similarity)
Modified residue  200 – 200    1       N6-acetyllysine (Combined sources)
Modified residue  265 – 265    1       Phosphoserine (By similarity)
Modified residue  275 – 275    1       Phosphoserine; by MKNK2 (By similarity)
Modified residue  285 – 285    1       Phosphotyrosine; by ALK (By similarity)
Modified residue  306 – 306    1       N6,N6-dimethyllysine (By similarity)
Modified residue  311 – 311    1       N6-acetyllysine (By similarity)
Modified residue  330 – 330    1       N6-acetyllysine; alternate (By similarity)
Cross-link        330 – 330            Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2); alternate (By similarity)
Modified residue  371 – 371    1       Phosphoserine (By similarity)
Modified residue  413 – 413    1       N6-acetyllysine (By similarity)
Modified residue  464 – 464    1       N6-acetyllysine (By similarity)
Modified residue  563 – 563    1       Dimethylated arginine (By similarity)
Modified residue  618 – 618    1       Phosphoserine (By similarity)
Modified residue  673 – 673    1       Omega-N-methylarginine (By similarity)
Modified residue  679 – 679    1       Phosphothreonine (Combined sources)
Modified residue  685 – 685    1       Dimethylated arginine (By similarity)

Post-translational modification

Phosphorylated on multiple serine and threonine residues during apoptosis (By similarity). Phosphorylation of C-terminal tyrosines promotes its cytoplasmic localization, impairs its binding to polypyrimidine RNA and leads to cell cycle arrest (By similarity). In resting T-cells, it is phosphorylated at Thr-679 by GSK3B, which is proposed to promote association with THRAP3 and to prevent binding to PTPRC/CD45 pre-mRNA; T-cell activation leads to reduced phosphorylation at Thr-679 (By similarity).

Keywords - PTM

Acetylation, Isopeptide bond, Methylation, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q8VIJ6.
MaxQB: Q8VIJ6.
PaxDb: Q8VIJ6.
PRIDE: Q8VIJ6.

2D gel databases

REPRODUCTION-2DPAGE: Q8VIJ6.

PTM databases

iPTMnet: Q8VIJ6.
PhosphoSite: Q8VIJ6.
SwissPalm: Q8VIJ6.

Expression

Gene expression databases

Bgee: Q8VIJ6.
Genevisible: Q8VIJ6. MM.

Interaction

Subunit structure

Monomer and component of the SFPQ-NONO complex, which is probably a heterotetramer of two 52 kDa (NONO) and two 100 kDa (SFPQ) subunits. SFPQ is a component of spliceosome and U5.4/6 snRNP complexes. Interacts with SNRPA/U1A. Component of a snRNP-free complex with SNRPA/U1A. Part of a complex consisting of SFPQ, NONO and MATR3. Interacts with polypyrimidine tract-binding protein 1/PTB. Part of a complex consisting of SFPQ, NONO and NR5A1. Interacts with RXRA, probably THRA, and SIN3A. Interacts with TOP1. Part of a complex consisting of SFPQ, NONO and TOP1. Interacts with SNRNP70 in apoptotic cells. Interacts with PSPC1. Interacts with RNF43. Interacts with PITX3 and NR4A2/NURR1. Interacts with PTK6. Interacts with THRAP3; the interaction is dependent on SFPQ phosphorylation at 'Thr-679' and inhibits binding of SFPQ to an ESS1 exonic splicing silencer element-containing RNA. The large PER complex involved in histone deacetylation is composed of at least HDAC1, PER2, SFPQ and SIN3A. Interacts with PER1 and PER2. (4 Publications)

Protein-protein interaction databases

BioGrid: 214752. 13 interactions.
IntAct: Q8VIJ6. 2 interactions.
MINT: MINT-4120373.
STRING: 10090.ENSMUSP00000030623.

Structure

3D structure databases

ProteinModelPortal: Q8VIJ6.
SMR: Q8VIJ6. Positions 268-584.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Repeat       9 – 11       3       1
Repeat       19 – 21      3       2
Repeat       25 – 27      3       3
Domain       289 – 361    73      RRM 1 (PROSITE-ProRule annotation)
Domain       363 – 444    82      RRM 2 (PROSITE-ProRule annotation)

Region

Feature key  Position(s)  Length  Description
Region       9 – 27       19      3 X 3 AA repeats of R-G-G
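
The three R-G-G repeats listed in this section can be checked against the N-terminus of the canonical sequence given later in this entry. The following is a minimal Python sketch: the function name `find_motif` is hypothetical, and the 30-residue string is copied from the first three groups of ten in the sequence display.

```python
import re

# First 30 residues of Q8VIJ6-1, copied from the sequence display in this
# entry and kept in the same groups of ten to make transcription easy to check.
N_TERMINUS = "MSRDRFRSRG" + "GGGGGFHRRG" + "GGGGRGGLHD"


def find_motif(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each exact motif match."""
    return [(m.start() + 1, m.end()) for m in re.finditer(re.escape(motif), sequence)]


repeats = find_motif(N_TERMINUS, "RGG")
print(repeats)
```

The positions returned can be compared with the Repeat rows (9 – 11, 19 – 21, 25 – 27) in the Domains and Repeats table.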

Compositional bias

Feature key         Position(s)  Length  Description
Compositional bias  10 – 258     249     Gln/Glu/Pro-rich
Compositional bias  10 – 15      6       Poly-Gly
Compositional bias  20 – 27      8       Poly-Gly
Compositional bias  54 – 63      10      Poly-Pro
Compositional bias  65 – 69      5       Poly-Gln
Compositional bias  96 – 100     5       Poly-Pro
Compositional bias  158 – 161    4       Poly-Pro
Compositional bias  178 – 182    5       Poly-Pro
Compositional bias  563 – 566    4       Poly-Arg
Compositional bias  605 – 608    4       Poly-Gly
Compositional bias  627 – 633    7       Poly-Gly

Sequence similarities

Contains 2 RRM (RNA recognition motif) domains. (PROSITE-ProRule annotation)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: KOG0115. Eukaryota.
        ENOG410XQA0. LUCA.
GeneTree: ENSGT00390000005004.
HOGENOM: HOG000231095.
HOVERGEN: HBG009801.
InParanoid: Q8VIJ6.
KO: K13219.
OMA: GMGLSQN.
OrthoDB: EOG7327P0.
PhylomeDB: Q8VIJ6.
TreeFam: TF315795.

Family and domain databases

Gene3D: 3.30.70.330. 2 hits.
InterPro: IPR012975. NOPS.
          IPR012677. Nucleotide-bd_a/b_plait.
          IPR000504. RRM_dom.
Pfam: PF08075. NOPS. 1 hit.
      PF00076. RRM_1. 2 hits.
SMART: SM00360. RRM. 2 hits.
SUPFAM: SSF54928. SSF54928. 1 hit.
PROSITE: PS50102. RRM. 2 hits.

Sequence

Sequence status: Complete.

Q8VIJ6-1

        10         20         30         40         50
MSRDRFRSRG GGGGGFHRRG GGGGRGGLHD FRSPPPGMGL NQNRGPMGPG
        60         70         80         90        100
PGGPKPPLPP PPPHQQQQQP PPQQPPPQQP PPHQQPPPHQ PPHQQPPPPP
       110        120        130        140        150
QESKPVVPQG PGSAPGVSSA PPPAVSAPPA NPPTTGAPPG PGPTPTPPPA
       160        170        180        190        200
VPSTAPGPPP PSTPSSGVST TPPQTGGPPP PPAGGAGPGP KPGPGPGGPK
       210        220        230        240        250
GGKMPGGPKP GGGPGMGAPG GHPKPPHRGG GEPRGGRQHH APYHQQHHQG
       260        270        280        290        300
PPPGGPGPRT EEKISDSEGF KANLSLLRRP GEKTYTQRCR LFVGNLPADI
       310        320        330        340        350
TEDEFKRLFA KYGEPGEVFI NKGKGFGFIK LESRALAEIA KAELDDTPMR
       360        370        380        390        400
GRQLRVRFAT HAAALSVRNL SPYVSNELLE EAFSQFGPIE RAVVIVDDRG
       410        420        430        440        450
RSTGKGIVEF ASKPAARKAF ERCSEGVFLL TTTPRPVIVE PLEQLDDEDG
       460        470        480        490        500
LPEKLAQKNP MYQKERETPP RFAQHGTFEY EYSQRWKSLD EMEKQQREQV
       510        520        530        540        550
EKNMKDAKDK LESEMEDAYH EHQANLLRQD LMRRQEELRR MEELHSQEMQ
       560        570        580        590        600
KRKEMQLRQE EERRRREEEM MIRQREMEEQ MRRQREESYS RMGYMDPRER
       610        620        630        640        650
DMRMGGGGTM NMGDPYGSGG QKFPPLGGGG GIGYEANPGV PPATMSGSMM
       660        670        680        690
GSDMRTERFG QGGAGPVGGQ GPRGMGPGTP AGYGRGREEY EGPNKKPRF
Length: 699
Mass (Da): 75,442
Last modified: March 1, 2002 - v1
Checksum: 714F786264C63AA0
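
The sequence display above interleaves position rulers with residues in space-separated groups of ten. A minimal Python sketch of how such a block could be flattened into one contiguous string, for example to check a length or look up a residue by position, is given here. The function name `flatten_sequence_block` is hypothetical, and the sample covers only the first two rows (positions 1-100) of this entry.

```python
def flatten_sequence_block(block: str) -> str:
    """Join a ruler-annotated sequence display into one contiguous string.

    Lines made up only of digits and whitespace are treated as position
    rulers and skipped; spaces between residue groups are removed.
    """
    residues = []
    for line in block.splitlines():
        compact = "".join(line.split())
        if not compact or compact.isdigit():
            continue  # blank line or position ruler
        residues.append(compact)
    return "".join(residues)


# First two rows of the display (positions 1-100), copied from this entry.
SAMPLE = """\
        10         20         30         40         50
MSRDRFRSRG GGGGGFHRRG GGGGRGGLHD FRSPPPGMGL NQNRGPMGPG
        60         70         80         90        100
PGGPKPPLPP PPPHQQQQQP PPQQPPPQQP PPHQQPPPHQ PPHQQPPPPP
"""

sequence = flatten_sequence_block(SAMPLE)
print(len(sequence))  # number of residues in the sample
print(sequence[46])   # residue at 1-based position 47
```

Position 47 is one of the two sites listed as a sequence conflict in this entry, so indexing the flattened string is a quick way to confirm which residue the canonical sequence carries there.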

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  47 – 47      1       M → Q AA sequence (PubMed:11008015). (Curated)
Sequence conflict  546 – 546    1       S → N in AAG17365 (PubMed:11008015). (Curated)

Sequence databases

EMBL, GenBank, DDBJ:
AY034062 mRNA. Translation: AAK60397.1.
AL606985 Genomic DNA. Translation: CAM15587.1.
BC089305 mRNA. Translation: AAH89305.1.
AF272847 mRNA. Translation: AAG17365.1.
CCDS: CCDS18662.1.
RefSeq: NP_076092.1. NM_023603.3.
UniGene: Mm.257276.
         Mm.482296.

Genome annotation databases

Ensembl: ENSMUST00000030623; ENSMUSP00000030623; ENSMUSG00000028820.
GeneID: 71514.
KEGG: mmu:71514.
UCSC: uc008utz.2. mouse.

Cross-references

Organism-specific databases

CTD: 6421.

Miscellaneous databases

ChiTaRS: Sfpq. mouse.
NextBio: 333919.
PRO: Q8VIJ6.


Publications

  1. "Nuclear relocalization of the pre-mRNA splicing factor PSF during apoptosis involves hyperphosphorylation, masking of antigenic epitopes, and changes in protein interactions."
    Shav-Tal Y., Cohen M., Lapter S., Dye B., Patton J.G., Vandekerckhove J., Zipori D.
    Mol. Biol. Cell 12:2328-2340(2001)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA].
    Tissue: Bone marrow.
  2. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
    Strain: C57BL/6J.
  3. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
    Strain: C57BL/6J.
    Tissue: Brain.
  4. "Enhanced proteolysis of pre-mRNA splicing factors in myeloid cells."
    Shav-Tal Y., Lee B., Bar-Haim S., Vandekerckhove J., Zipori D.
    Exp. Hematol. 28:1029-1038(2000)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 198-580, PROTEIN SEQUENCE OF 20-30; 47-55 AND 210-238.
    Tissue: Bone marrow.
  5. Lubec G., Sunyer B., Chen W.-Q.
    Submitted (JAN-2009) to UniProtKB
    Cited for: PROTEIN SEQUENCE OF 291-306; 312-322; 358-368 AND 472-485, IDENTIFICATION BY MASS SPECTROMETRY.
    Strain: OF1.
    Tissue: Hippocampus.
  6. "Expression and functional significance of mouse paraspeckle protein 1 on spermatogenesis."
    Myojin R., Kuwahara S., Yasaki T., Matsunaga T., Sakurai T., Kimura M., Uesugi S., Kurihara Y.
    Biol. Reprod. 71:926-932(2004)
    Cited for: INTERACTION WITH PSPC1.
  7. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT THR-679, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Liver.
  8. "Pitx3 potentiates Nurr1 in dopamine neuron terminal differentiation through release of SMRT-mediated repression."
    Jacobs F.M., van Erp S., van der Linden A.J., von Oerthel L., Burbach J.P., Smidt M.P.
    Development 136:531-540(2009)
    Cited for: INTERACTION WITH PITX3 AND NR4A2.
  9. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT THR-679, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Brain, Brown adipose tissue, Heart, Kidney, Liver, Lung, Pancreas, Spleen and Testis.
  10. "A molecular mechanism for circadian clock negative feedback."
    Duong H.A., Robles M.S., Knutti D., Weitz C.J.
    Science 332:1436-1439(2011)
    Cited for: FUNCTION IN CIRCADIAN RHYTHMS, IDENTIFICATION IN A LARGE PER COMPLEX, SUBCELLULAR LOCATION.
  11. "Distinct roles of DBHS family members in the circadian transcriptional feedback loop."
    Kowalska E., Ripperger J.A., Muheim C., Maier B., Kurihara Y., Fox A.H., Kramer A., Brown S.A.
    Mol. Cell. Biol. 32:4585-4594(2012)
    Cited for: FUNCTION, INTERACTION WITH PER1 AND PER2.
  12. "SIRT5-mediated lysine desuccinylation impacts diverse metabolic pathways."
    Park J., Chen Y., Tishkoff D.X., Peng C., Tan M., Dai L., Xie Z., Zhang Y., Zwaans B.M., Skinner M.E., Lombard D.B., Zhao Y.
    Mol. Cell 50:919-930(2013)
    Cited for: ACETYLATION [LARGE SCALE ANALYSIS] AT LYS-200, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Embryonic fibroblast.

Entry information

Entry name: SFPQ_MOUSE
Accession: Primary (citable) accession number: Q8VIJ6
Secondary accession number(s): A2A7U6, Q9ERW2
Entry history
Integrated into UniProtKB/Swiss-Prot: July 5, 2005
Last sequence update: March 1, 2002
Last modified: April 13, 2016
This is version 140 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.