Protein

Polyribonucleotide 5'-hydroxyl-kinase Clp1

Gene

CLP1

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Polynucleotide kinase that can phosphorylate the 5'-hydroxyl groups of double-stranded RNA (dsRNA), single-stranded RNA (ssRNA), double-stranded DNA (dsDNA) and double-stranded DNA:RNA hybrids. dsRNA is phosphorylated more efficiently than dsDNA, and the RNA component of a DNA:RNA hybrid is phosphorylated more efficiently than the DNA component. Plays a key role in both tRNA splicing and mRNA 3'-end formation. Component of the tRNA splicing endonuclease complex: phosphorylates the 5'-terminus of the tRNA 3'-exon during tRNA splicing; this phosphorylation event is a prerequisite for the subsequent ligation of the two exon halves and the production of a mature tRNA (PubMed:24766809, PubMed:24766810). Its role in tRNA splicing and maturation is required for cerebellar development (PubMed:24766809, PubMed:24766810). Component of the pre-mRNA cleavage complex II (CF-II), which seems to be required for mRNA 3'-end formation. Also phosphorylates the 5'-terminus of exogenously introduced short interfering RNAs (siRNAs), which is a necessary prerequisite for their incorporation into the RNA-induced silencing complex (RISC). However, endogenous siRNAs and microRNAs (miRNAs) that are produced by the cleavage of dsRNA precursors by DICER1 already contain a 5'-phosphate group, so this protein may be dispensable for normal RNA-mediated gene silencing. (4 publications)

Catalytic activity

ATP + 5'-dephospho-DNA = ADP + 5'-phospho-DNA. (UniRule annotation)
ATP + 5'-dephospho-RNA = ADP + 5'-phospho-RNA. (UniRule annotation)

Cofactor

Mg2+ (UniRule annotation, 1 publication), Mn2+ (UniRule annotation, 1 publication), Ni2+ (UniRule annotation, 1 publication)

Sites

Feature key | Position(s) | Length | Description
Binding site | 22 – 22 | 1 | ATP (UniRule annotation)
Binding site | 62 – 62 | 1 | ATP; via carbonyl oxygen (UniRule annotation)

Regions

Feature key | Position(s) | Length | Description
Nucleotide binding | 124 – 129 | 6 | ATP (UniRule annotation)
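
The nucleotide-binding region at 124 – 129 lies in a glycine-rich stretch of the kind found in P-loop NTPases (the family and domain cross-references of this entry include P-loop_NTPase). As a rough illustration only, the sketch below scans residues 111 – 140 of the canonical sequence (copied from isoform 1 in this entry) for the classic Walker A consensus [AG]-x(4)-G-K-[ST]. The choice of pattern, the window, and the helper name `find_walker_a` are assumptions made for this example and are not part of the UniProt annotation.

```python
import re

# Residues 111-140 of the canonical isoform (Q92989-1), copied from this entry.
WINDOW = "EERGPRVMVVGPTDVGKSTVCRLLLNYAVR"
WINDOW_START = 111  # 1-based position of WINDOW[0] in the full sequence

# Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST]; assumed here for illustration.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def find_walker_a(window, start):
    """Return (first, last, motif) in 1-based sequence coordinates, or None."""
    match = WALKER_A.search(window)
    if match is None:
        return None
    first = start + match.start()
    last = start + match.end() - 1
    return first, last, match.group()


hit = find_walker_a(WINDOW, WINDOW_START)
print(hit)
```

Under these assumptions the match spans residues 121 – 128 (GPTDVGKS), which overlaps the annotated nucleotide-binding region and ends at Lys-127 and Ser-128.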

GO - Molecular function

  1. ATP binding Source: UniProtKB
  2. ATP-dependent polydeoxyribonucleotide 5'-hydroxyl-kinase activity Source: UniProtKB-EC
  3. ATP-dependent polyribonucleotide 5'-hydroxyl-kinase activity Source: UniProtKB
  4. polydeoxyribonucleotide kinase activity Source: UniProtKB

GO - Biological process

  1. cerebellar cortex development Source: UniProtKB
  2. gene expression Source: Reactome
  3. mRNA 3'-end processing Source: UniProtKB
  4. mRNA cleavage Source: GO_Central
  5. mRNA polyadenylation Source: GO_Central
  6. mRNA splicing, via spliceosome Source: Reactome
  7. RNA splicing Source: Reactome
  8. siRNA loading onto RISC involved in RNA interference Source: UniProtKB
  9. targeting of mRNA for destruction involved in RNA interference Source: UniProtKB
  10. termination of RNA polymerase II transcription Source: Reactome
  11. transcription from RNA polymerase II promoter Source: Reactome
  12. tRNA splicing, via endonucleolytic cleavage and ligation Source: UniProtKB

Keywords - Molecular function

Kinase, Transferase

Keywords - Biological process

mRNA processing, tRNA processing

Keywords - Ligand

ATP-binding, Magnesium, Manganese, Nickel, Nucleotide-binding

Enzyme and pathway databases

Reactome: REACT_1096. Processing of Intronless Pre-mRNAs.
REACT_1849. mRNA 3'-end processing.
REACT_387. Cleavage of Growing Transcript in the Termination Region.
REACT_467. mRNA Splicing - Major Pathway.

Names & Taxonomy

Protein names
Recommended name:
Polyribonucleotide 5'-hydroxyl-kinase Clp1 (EC:2.7.1.78) (UniRule annotation)
Alternative name(s):
Polyadenylation factor Clp1 (UniRule annotation)
Polynucleotide kinase Clp1 (UniRule annotation)
Pre-mRNA cleavage complex II protein Clp1 (UniRule annotation)
Gene names
Name: CLP1 (UniRule annotation)
Synonyms: HEAB
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes: UP000005640. Component: Chromosome 11

Organism-specific databases

HGNC: HGNC:16999. CLP1.

Subcellular location

  1. Nucleus (UniRule annotation, 2 publications)

GO - Cellular component

  1. cytoplasm Source: HPA
  2. mRNA cleavage factor complex Source: UniProtKB-HAMAP
  3. nucleoplasm Source: HPA
  4. nucleus Source: UniProtKB
  5. tRNA-intron endonuclease complex Source: UniProtKB

Keywords - Cellular component

Nucleus

Pathology & Biotech

Involvement in disease

Pontocerebellar hypoplasia 10 (PCH10) (2 publications)

The disease is caused by mutations affecting the gene represented in this entry. Neurodegeneration is due to defects in tRNA splicing (PubMed:24766809, PubMed:24766810).

Disease description: A form of pontocerebellar hypoplasia, a disorder characterized by structural defects of the pons and cerebellum, evident upon brain imaging. PCH10 features include cortical dysgenesis marked by a simplified gyral pattern, cortical atrophy, mild or focal cerebellar vermian volume loss, delayed myelination, progressive microcephaly, global growth and developmental delays, severe intellectual disabilities, and seizures refractory to treatment.

See also OMIM:615803
Feature key | Position(s) | Length | Description | Feature identifier
Natural variant | 140 – 140 | 1 | R → H in PCH10; decreases kinase activity, impairs formation of the tRNA splicing endonuclease complex and impairs ability to mediate tRNA splicing and maturation. (2 publications) | VAR_070952

Mutagenesis

Feature key | Position(s) | Length | Description
Mutagenesis | 127 – 128 | 2 | KS → AA: Abrogates RNA kinase activity. Abrogates complementation of tRNA splicing activity in yeast; when associated with A-128. (2 publications)
Mutagenesis | 127 – 127 | 1 | K → A: Abrogates RNA kinase activity and tRNA splicing activity. (1 publication)
Mutagenesis | 151 – 151 | 1 | D → A: Abrogates complementation of tRNA splicing activity in yeast. (1 publication)
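
Substitutions such as K → A at 127, R → H at 140 and D → A at 151 are stated in 1-based coordinates of the canonical sequence. The sketch below shows one way to apply such a substitution to a sequence window while checking that the reference residue really is at the stated position. The window (residues 121 – 160, copied from isoform 1 in this entry) and the helper name `substitute` are assumptions for illustration only.

```python
# Residues 121-160 of the canonical isoform (Q92989-1), copied from this entry.
WINDOW = "GPTDVGKSTVCRLLLNYAVRLGRRPTYVELDVGQGSVSIP"
WINDOW_START = 121  # 1-based position of WINDOW[0] in the full sequence


def substitute(window, start, position, ref, alt):
    """Replace the residue at a 1-based position, verifying the reference residue."""
    index = position - start
    if not 0 <= index < len(window):
        raise ValueError(f"position {position} is outside the window")
    if window[index] != ref:
        raise ValueError(
            f"expected {ref} at {position}, found {window[index]}"
        )
    return window[:index] + alt + window[index + 1:]


k127a = substitute(WINDOW, WINDOW_START, 127, "K", "A")  # mutagenesis
r140h = substitute(WINDOW, WINDOW_START, 140, "R", "H")  # PCH10 variant
d151a = substitute(WINDOW, WINDOW_START, 151, "D", "A")  # mutagenesis
print(k127a)
print(r140h)
print(d151a)
```

The reference check is the useful part of the design: a mismatch usually means the coordinates refer to a different isoform or sequence version.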

Keywords - Disease

Disease mutation, Neurodegeneration

Organism-specific databases

MIM: 615803. phenotype.
PharmGKB: PA162382477.

Polymorphism and mutation databases

BioMuta: CLP1.
DMDM: 13431366.

PTM / Processing

Molecule processing

Feature key | Position(s) | Length | Description | Feature identifier
Chain | 1 – 425 | 425 | Polyribonucleotide 5'-hydroxyl-kinase Clp1 | PRO_0000089863

Proteomic databases

MaxQB: Q92989.
PaxDb: Q92989.
PRIDE: Q92989.

PTM databases

PhosphoSite: Q92989.

Expression

Gene expression databases

Bgee: Q92989.
CleanEx: HS_CLP1.
ExpressionAtlas: Q92989. baseline and differential.
Genevestigator: Q92989.

Organism-specific databases

HPA: HPA057770.

Interaction

Subunit structure

Component of the tRNA splicing endonuclease complex, composed of CLP1, TSEN2, TSEN15, TSEN34 and TSEN54 (PubMed:24766809). Component of pre-mRNA cleavage complex II (CF-II). Also associates with numerous components of the pre-mRNA cleavage complex I (CF-I/CFIm), including NUDT21, CPSF2, CPSF3, CPSF6 and CPSF7. Interacts with CSTF2 and SYMPK. (3 publications)

Protein-protein interaction databases

BioGrid: 116174. 16 interactions.
IntAct: Q92989. 2 interactions.
MINT: MINT-3049820.
STRING: 9606.ENSP00000304704.

Structure

3D structure databases

SMR: Q92989. Positions 14-423.

Family & Domains

Sequence similarities

Belongs to the Clp1 family. Clp1 subfamily. (UniRule annotation)

Phylogenomic databases

eggNOG: COG5623.
GeneTree: ENSGT00390000000344.
HOGENOM: HOG000231935.
HOVERGEN: HBG000921.
KO: K14399.
PhylomeDB: Q92989.
TreeFam: TF105795.

Family and domain databases

Gene3D: 3.40.50.300. 2 hits.
HAMAP: MF_03035. Clp1.
InterPro: IPR028606. Clp1.
IPR029007. MobB-typ_P-loop.
IPR027417. P-loop_NTPase.
IPR010655. Pre-mRNA_cleavage_cplxII_Clp1.
Pfam: PF06807. Clp1. 1 hit.
PF03205. MobB. 1 hit.
SUPFAM: SSF52540. SSF52540. 3 hits.

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q92989-1) [UniParc]

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MGEEANDDKK PTTKFELERE TELRFEVEAS QSVQLELLTG MAEIFGTELT
        60         70         80         90        100
RNKKFTFDAG AKVAVFTWHG CSVQLSGRTE VAYVSKDTPM LLYLNTHTAL
       110        120        130        140        150
EQMRRQAEKE EERGPRVMVV GPTDVGKSTV CRLLLNYAVR LGRRPTYVEL
       160        170        180        190        200
DVGQGSVSIP GTMGALYIER PADVEEGFSI QAPLVYHFGS TTPGTNIKLY
       210        220        230        240        250
NKITSRLADV FNQRCEVNRR ASVSGCVINT CGWVKGSGYQ ALVHAASAFE
       260        270        280        290        300
VDVVVVLDQE RLYNELKRDL PHFVRTVLLP KSGGVVERSK DFRRECRDER
       310        320        330        340        350
IREYFYGFRG CFYPHAFNVK FSDVKIYKVG APTIPDSCLP LGMSQEDNQL
       360        370        380        390        400
KLVPVTPGRD MVHHLLSVST AEGTEENLSE TSVAGFIVVT SVDLEHQVFT
       410        420
VLSPAPRPLP KNFLLIMDIR FMDLK
Length:425
Mass (Da):47,646
Last modified:February 1, 1997 - v1
Checksum: 640AF4768994CFE6

Isoform 2 (identifier: Q92989-2) [UniParc]

The sequence of this isoform differs from the canonical sequence as follows:
     139-202: Missing.

Length:361
Mass (Da):40,696
Checksum: 661F193326BCB711
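
The relationship between the two isoforms can be checked directly: removing residues 139-202 (64 residues) from the 425-residue canonical sequence should leave the 361 residues reported for isoform 2. The sketch below is a minimal illustration of that arithmetic; the sequence is copied from isoform 1 in this entry, and the helper name `delete_range` is an assumption for this example.

```python
# Canonical sequence (isoform 1, Q92989-1) copied from this entry;
# whitespace between the 10-residue blocks is stripped below.
CANONICAL = "".join("""
MGEEANDDKK PTTKFELERE TELRFEVEAS QSVQLELLTG MAEIFGTELT
RNKKFTFDAG AKVAVFTWHG CSVQLSGRTE VAYVSKDTPM LLYLNTHTAL
EQMRRQAEKE EERGPRVMVV GPTDVGKSTV CRLLLNYAVR LGRRPTYVEL
DVGQGSVSIP GTMGALYIER PADVEEGFSI QAPLVYHFGS TTPGTNIKLY
NKITSRLADV FNQRCEVNRR ASVSGCVINT CGWVKGSGYQ ALVHAASAFE
VDVVVVLDQE RLYNELKRDL PHFVRTVLLP KSGGVVERSK DFRRECRDER
IREYFYGFRG CFYPHAFNVK FSDVKIYKVG APTIPDSCLP LGMSQEDNQL
KLVPVTPGRD MVHHLLSVST AEGTEENLSE TSVAGFIVVT SVDLEHQVFT
VLSPAPRPLP KNFLLIMDIR FMDLK
""".split())


def delete_range(sequence, first, last):
    """Remove residues first..last (1-based, inclusive) from a sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("range is outside the sequence")
    return sequence[:first - 1] + sequence[last:]


# Isoform 2 is described as the canonical sequence with 139-202 missing.
isoform2 = delete_range(CANONICAL, 139, 202)
print(len(CANONICAL), len(isoform2))
```

Note that the deleted range includes Arg-140 and Asp-151, so positional annotations such as the R → H variant at 140 cannot be mapped onto isoform 2 by a simple offset.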

Alternative sequence

Feature key | Position(s) | Length | Description | Feature identifier
Alternative sequence | 139 – 202 | 64 | Missing in isoform 2. (1 publication) | VSP_041164

Sequence databases

EMBL/GenBank/DDBJ: U73524 mRNA. Translation: AAC50780.1.
AK300232 mRNA. Translation: BAG62000.1.
AK313007 mRNA. Translation: BAG35843.1.
CH471076 Genomic DNA. Translation: EAW73770.1.
BC000446 mRNA. Translation: AAH00446.1.
CCDS: CCDS44600.1. [Q92989-2]
CCDS7964.1. [Q92989-1]
RefSeq: NP_001136069.1. NM_001142597.1. [Q92989-2]
NP_006822.1. NM_006831.2. [Q92989-1]
UniGene: Hs.523687.

Genome annotation databases

Ensembl: ENST00000302731; ENSP00000304704; ENSG00000172409. [Q92989-2]
ENST00000525602; ENSP00000436066; ENSG00000172409. [Q92989-1]
ENST00000533682; ENSP00000434995; ENSG00000172409. [Q92989-1]
GeneID: 10978.
KEGG: hsa:10978.
UCSC: uc001nkw.3. human. [Q92989-1]
uc010rjw.2. human. [Q92989-2]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Protocols and materials databases

DNASU: 10978.

Organism-specific databases

CTD: 10978.
GeneCards: GC11P057425.
HGNC: HGNC:16999. CLP1.
HPA: HPA057770.
MIM: 608757. gene.
615803. phenotype.
neXtProt: NX_Q92989.
PharmGKB: PA162382477.

Miscellaneous databases

GenomeRNAi: 10978.
NextBio: 41710.
PRO: Q92989.

Publications

  1. "AF10 is split by MLL and HEAB, a human homolog to a putative Caenorhabditis elegans ATP/GTP-binding protein in an invins(10;11)(p12;q23q12)."
    Tanabe S., Bohlander S.K., Vignon C.V., Espinosa R. III, Zhao N., Strissel P.L., Zeleznik-Le N.J., Rowley J.D.
    Blood 88:3535-3545(1996)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1).
  2. "Complete sequencing and characterization of 21,243 full-length human cDNAs."
    Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R., Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H., Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S., Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S., Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S., Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O., Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H., Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B., Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y., Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T., Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y., Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S., Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T., Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M., Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T., Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K., Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R., Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.
    Nat. Genet. 36:40-45(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 2).
    Tissue: Cerebellum and Placenta.
  3. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Lung.
  5. "Human pre-mRNA cleavage factor II(m) contains homologs of yeast proteins and bridges two other cleavage factors."
    de Vries H., Rueegsegger U., Huebner W., Friedlein A., Langen H., Keller W.
    EMBO J. 19:5895-5904(2000)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY, IDENTIFICATION AS A COMPONENT OF THE PRE-MRNA CLEAVAGE COMPLEX II, ASSOCIATION WITH THE PRE-MRNA CLEAVAGE COMPLEX I, SUBCELLULAR LOCATION.
  6. "Identification of a human endonuclease complex reveals a link between tRNA splicing and pre-mRNA 3' end formation."
    Paushkin S.V., Patel M., Furia B.S., Peltz S.W., Trotta C.R.
    Cell 117:311-321(2004)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY, IDENTIFICATION IN A COMPLEX WITH TSEN2; TSEN15; TSEN34 AND TSEN54, INTERACTION WITH CSTF2 AND SYMPK.
  7. "The human RNA kinase hClp1 is active on 3' transfer RNA exons and short interfering RNAs."
    Weitzer S., Martinez J.
    Nature 447:222-226(2007)
    Cited for: IDENTIFICATION BY MASS SPECTROMETRY, FUNCTION, CATALYTIC ACTIVITY, COFACTOR, IDENTIFICATION IN A COMPLEX WITH TSEN2; TSEN15; TSEN34 AND TSEN54, MUTAGENESIS OF 127-LYS-SER-128.
  8. "Human RNA 5'-kinase (hClp1) can function as a tRNA splicing enzyme in vivo."
    Ramirez A., Shuman S., Schwer B.
    RNA 14:1737-1745(2008)
    Cited for: FUNCTION, CATALYTIC ACTIVITY, MUTAGENESIS OF 127-LYS-SER-128 AND ASP-151.
  9. Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  10. Cited for: VARIANT PCH10 HIS-140, FUNCTION, IDENTIFICATION IN THE TRNA SPLICING ENDONUCLEASE COMPLEX, MUTAGENESIS OF LYS-127, CHARACTERIZATION OF VARIANT PCH10 HIS-140.
  11. Cited for: VARIANT PCH10 HIS-140, FUNCTION, SUBCELLULAR LOCATION.

Entry information

Entry name: CLP1_HUMAN
Accession: Primary (citable) accession number: Q92989
Secondary accession number(s): B2R7J6, B4DTI8
Entry history
Integrated into UniProtKB/Swiss-Prot: April 27, 2001
Last sequence update: February 1, 1997
Last modified: April 29, 2015
This is version 122 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.
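
The entry name and primary accession listed under Entry information are the same identifiers that appear in the header line of a UniProtKB FASTA record, in the form `>db|accession|entry_name description`. The sketch below parses such a header. The example header is assembled by hand from fields in this entry rather than downloaded, and the function name `parse_uniprot_header` is hypothetical; treat both as assumptions.

```python
def parse_uniprot_header(header):
    """Split a UniProtKB-style FASTA header into its identifier fields."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    identifier, _, description = header[1:].partition(" ")
    parts = identifier.split("|")
    if len(parts) != 3:
        raise ValueError("expected db|accession|entry_name")
    database, accession, entry_name = parts
    return {
        "database": database,      # 'sp' denotes reviewed (Swiss-Prot) entries
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Hand-assembled from this entry's fields (an assumption, not a download).
EXAMPLE_HEADER = (
    ">sp|Q92989|CLP1_HUMAN Polyribonucleotide 5'-hydroxyl-kinase Clp1 "
    "OS=Homo sapiens OX=9606 GN=CLP1 PE=1 SV=1"
)

record = parse_uniprot_header(EXAMPLE_HEADER)
print(record["accession"], record["entry_name"])
```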

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 11
    Human chromosome 11: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.