Protein

Glycerophosphocholine phosphodiesterase GPCPD1

Gene

Gpcpd1

Organism
Mus musculus (Mouse)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

May be involved in the negative regulation of skeletal muscle differentiation, independently of its glycerophosphocholine phosphodiesterase activity (1 publication).

Catalytic activity

sn-glycero-3-phosphocholine + H2O = choline + sn-glycerol 3-phosphate (1 publication).

Kinetics

No significant reaction is observed when glycerophosphoglycerol, glycerophosphoinositol or glycerophosphoserine is used as substrate.

    1. Vmax = 2.0 µmol/min/mg enzyme with glycerophosphocholine as substrate (1 publication)
    2. Vmax = 0.34 µmol/min/mg enzyme with glycerophosphoethanolamine as substrate (1 publication)
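To put the two Vmax values on a common scale, the following minimal sketch expresses each one relative to the glycerophosphocholine value. The dictionary and function names are illustrative assumptions, not part of the entry. Because the entry lists no Km values, this compares maximal rates only and says nothing about catalytic efficiency.

```python
# Vmax values as listed in the entry, in µmol/min/mg enzyme.
VMAX = {
    "glycerophosphocholine": 2.0,
    "glycerophosphoethanolamine": 0.34,
}


def relative_vmax(vmax, reference):
    """Return each Vmax divided by the Vmax of the reference substrate."""
    ref = vmax[reference]
    return {substrate: value / ref for substrate, value in vmax.items()}


# Hypothetical helper usage: glycerophosphocholine is taken as the reference.
relative = relative_vmax(VMAX, "glycerophosphocholine")
for substrate, ratio in relative.items():
    print(f"{substrate}: {ratio:.0%} of the reference Vmax")
```

On these numbers, the maximal rate with glycerophosphoethanolamine is about one sixth of that with glycerophosphocholine.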

    Sites

    Feature key    Position(s)   Length   Description
    Binding site   70 – 70       1        Substrate (Sequence analysis)

    GO - Molecular function

    GO - Biological process

    Keywords - Molecular function

    Hydrolase

    Enzyme and pathway databases

    Reactome: R-MMU-1483115. Hydrolysis of LPC.
              R-MMU-1483152. Hydrolysis of LPE.

    Protein family/group databases

    CAZy: CBM20. Carbohydrate-Binding Module Family 20.

    Names & Taxonomy

    Protein names
    Recommended name:
    Glycerophosphocholine phosphodiesterase GPCPD1 (EC:3.1.4.2)
    Alternative name(s):
    Glycerophosphodiester phosphodiesterase 5
    Preimplantation protein 4
    Gene names
    Name: Gpcpd1
    Synonyms: Gde5, Kiaa1434, Prei4
    Organism: Mus musculus (Mouse)
    Taxonomic identifier: 10090 (NCBI)
    Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
    Proteomes
    • UP000000589 Component: Chromosome 2

    Organism-specific databases

    MGI: MGI:104898. Gpcpd1.

    Subcellular location

    GO - Cellular component

    • cytoplasm Source: MGI
    • cytosol Source: Reactome

    Keywords - Cellular component

    Cytoplasm

    PTM / Processing

    Molecule processing

    Feature key   Position(s)   Length   Description                                      Feature identifier
    Chain         1 – 675       675      Glycerophosphocholine phosphodiesterase GPCPD1   PRO_0000251947

    Amino acid modifications

    Feature key        Position(s)   Length   Description
    Modified residue   178 – 178     1        Phosphoserine (By similarity)
    Modified residue   427 – 427     1        Phosphoserine (Combined sources)
    Modified residue   611 – 611     1        Phosphotyrosine (Combined sources)

    Keywords - PTM

    Phosphoprotein

    Proteomic databases

    EPD: Q8C0L9.
    MaxQB: Q8C0L9.
    PaxDb: Q8C0L9.
    PeptideAtlas: Q8C0L9.
    PRIDE: Q8C0L9.

    PTM databases

    iPTMnet: Q8C0L9.
    PhosphoSite: Q8C0L9.

    Expression

    Tissue specificity

    Widely expressed, with highest levels in skeletal muscle and heart (1 publication).

    Developmental stage

    Down-regulated in skeletal muscle atrophy, including atrophy linked to aging and denervation (1 publication).

    Gene expression databases

    Bgee: Q8C0L9.
    CleanEx: MM_PREI4.
    ExpressionAtlas: Q8C0L9. baseline and differential.
    Genevisible: Q8C0L9. MM.

    Interaction

    Protein-protein interaction databases

    STRING: 10090.ENSMUSP00000062221.

    Structure

    3D structure databases

    ProteinModelPortal: Q8C0L9.
    SMR: Q8C0L9. Positions 3-115.

    Family & Domains

    Domains and Repeats

    Feature key   Position(s)   Length   Description
    Domain        1 – 115       115      CBM20 (PROSITE-ProRule annotation)
    Domain        321 – 621     301      GP-PDE

    Region

    Feature key   Position(s)   Length   Description
    Region        88 – 89       2        Substrate binding (Sequence analysis)

    Compositional bias

    Feature key          Position(s)   Length   Description
    Compositional bias   169 – 175     7        Poly-Asp

    Sequence similarities

    Contains 1 CBM20 (carbohydrate binding type-20) domain. (PROSITE-ProRule annotation)
    Contains 1 GP-PDE domain. (Curated)

    Phylogenomic databases

    eggNOG: KOG2421. Eukaryota.
            COG0584. LUCA.
    GeneTree: ENSGT00440000033970.
    HOVERGEN: HBG080384.
    InParanoid: Q8C0L9.
    KO: K18695.
    OMA: CGEPDIH.
    OrthoDB: EOG7RNJZM.
    PhylomeDB: Q8C0L9.
    TreeFam: TF314722.

    Family and domain databases

    Gene3D: 2.60.40.10. 1 hit.
            3.20.20.190. 1 hit.
    InterPro: IPR013784. Carb-bd-like_fold.
              IPR002044. CBM_fam20.
              IPR030395. GP_PDE_dom.
              IPR013783. Ig-like_fold.
              IPR017946. PLC-like_Pdiesterase_TIM-brl.
    Pfam: PF00686. CBM_20. 1 hit.
          PF03009. GDPD. 1 hit.
    SMART: SM01065. CBM_2. 1 hit.
    SUPFAM: SSF49452. SSF49452. 1 hit.
            SSF51695. SSF51695. 1 hit.
    PROSITE: PS51166. CBM20. 1 hit.
             PS51704. GP_PDE. 1 hit.

    Sequences (3)

    Sequence status: Complete.

    This entry describes 3 isoforms produced by alternative splicing.

    Isoform 1 (identifier: Q8C0L9-1)

    This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

            10         20         30         40         50
    MTPSQVTFEI RGTLLPGEVF AICGSCDALG NWNPQNAVAL INENETGDSV
            60         70         80         90        100
    LWKAVIALNR GVSVKYRYFR GCFLEPKTIG GPCQVIVHKW ETHLQPRSIT
           110        120        130        140        150
    PLESEIIIDD GQFGIHNGVE TLDSGWLTCQ TEIRLRLHFS EKPPVSISKK
           160        170        180        190        200
    KFKKSRFRVK LTLEGLEEDE DDDDDKVSPT VLHKMSNSLE ISLISDNEFK
           210        220        230        240        250
    CRHSQPECGY GLQPDRWTEY SIQTMEPDNL ELIFDFFEED LSEHVVQGDV
           260        270        280        290        300
    LPGHVGTACL LSSTIAESGR SAGILTLPIM SRNSRKTIGK VRVDFIIIKP
           310        320        330        340        350
    LPGYSCSMQS SFSKYWKPRI PLDVGHRGAG NSTTTAKLAK VQENTIASLR
           360        370        380        390        400
    NAASHGAAFV EFDVHLSKDF VPVVYHDLTC CLTMKRKYEA DPVELFEIPV
           410        420        430        440        450
    KELTFDQLQL LKLSHVTALK TKDRKQSLYE EENFFSENQP FPSLKMVLES
           460        470        480        490        500
    LPENVGFNIE IKWICQHRDG VWDGNLSTYF DMNVFLDIIL KTVLENSGKR
           510        520        530        540        550
    RIVFSSFDAD ICTMVRQKQN KYPILFLTQG KSDIYPELMD LRSRTTPIAM
           560        570        580        590        600
    SFAQFENILG INAHTEDLLR NPSYVQEAKA KGLVIFCWGD DTNDPENRRK
           610        620        630        640        650
    LKEFGVNGLI YDRIYDWMPE QPNIFQVEQL ERLKQELPEL KNCLCPTVSH
           660        670
    FIPSSFCVEP DIHVDANGID SVENA
    Length: 675
    Mass (Da): 76,579
    Last modified: March 1, 2003 - v1
    Checksum: 7ABF9EE48CF39B47

    Isoform 2 (identifier: Q8C0L9-2)

    The sequence of this isoform differs from the canonical sequence as follows:
         1-184: Missing.

    Length: 491
    Mass (Da): 55,919
    Checksum: 3B087FDB9B0CD315

    Isoform 3 (identifier: Q8C0L9-3)

    The sequence of this isoform differs from the canonical sequence as follows:
         1-445: Missing.

    Length: 230
    Mass (Da): 26,541
    Checksum: 15F29E5FEE611B3A
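Because all positional information in the entry refers to the canonical isoform, it can help to translate positions into isoform 2 or isoform 3 coordinates. The sketch below is illustrative only: the names `MISSING`, `isoform_length` and `to_isoform_position` are assumptions, and it handles only the simple deletions listed for this entry, not substitutions or insertions.

```python
CANONICAL_LENGTH = 675  # length of isoform 1 (Q8C0L9-1)

# Residues missing from each isoform relative to the canonical sequence
# (1-based, inclusive), as listed in the entry. None means nothing is missing.
MISSING = {
    "Q8C0L9-1": None,
    "Q8C0L9-2": (1, 184),
    "Q8C0L9-3": (1, 445),
}


def isoform_length(isoform):
    """Length of an isoform after removing its missing range."""
    missing = MISSING[isoform]
    if missing is None:
        return CANONICAL_LENGTH
    start, end = missing
    return CANONICAL_LENGTH - (end - start + 1)


def to_isoform_position(canonical_pos, isoform):
    """Map a canonical position to isoform coordinates.

    Returns None when the residue lies inside the missing range.
    """
    missing = MISSING[isoform]
    if missing is None:
        return canonical_pos
    start, end = missing
    if start <= canonical_pos <= end:
        return None
    if canonical_pos > end:
        return canonical_pos - (end - start + 1)
    return canonical_pos


for isoform in MISSING:
    print(isoform, isoform_length(isoform),
          to_isoform_position(427, isoform),   # phosphoserine site
          to_isoform_position(611, isoform))   # phosphotyrosine site
```

The computed lengths agree with the 491 and 230 residues listed for isoforms 2 and 3. Under this mapping the Ser-427 site falls inside the region missing from isoform 3, while Tyr-611 is retained in all three isoforms.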

    Sequence caution

    The sequence BAB26361.2 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)
    The sequence BAC33775.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
    The sequence BAC34739.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
    The sequence BAC65792.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)

    Experimental Info

    Feature key         Position(s)   Length   Description
    Sequence conflict   289 – 289     1        G → D in BAE38686 (PubMed:16141072). (Curated)
    Sequence conflict   460 – 460     1        E → Q in BAC33775 (PubMed:16141072). (Curated)
    Sequence conflict   494 – 495     2        LE → SQ in BAC33775 (PubMed:16141072). (Curated)
    Sequence conflict   499 – 499     1        K → N in BAC33775 (PubMed:16141072). (Curated)
    Sequence conflict   559 – 559     1        L → S in BAB26361 (PubMed:16141072). (Curated)
    Sequence conflict   602 – 602     1        K → E in BAB26361 (PubMed:16141072). (Curated)

    Alternative sequence

    Feature key            Position(s)   Length   Description                             Feature identifier
    Alternative sequence   1 – 445       445      Missing in isoform 3 (1 publication).   VSP_020819
    Alternative sequence   1 – 184       184      Missing in isoform 2 (1 publication).   VSP_020820

    Sequence databases

    EMBL / GenBank / DDBJ:
    AK122510 mRNA. Translation: BAC65792.1. Different initiation.
    AK009563 mRNA. Translation: BAB26361.2. Different initiation.
    AK030645 mRNA. Translation: BAC27063.1.
    AK049491 mRNA. Translation: BAC33775.1. Different initiation.
    AK051728 mRNA. Translation: BAC34739.1. Different initiation.
    AK166293 mRNA. Translation: BAE38686.1.
    AL807386 Genomic DNA. Translation: CAM17016.1.
    BC033408 mRNA. Translation: AAH33408.1.
    CCDS: CCDS38246.1. [Q8C0L9-1]
          CCDS71151.1. [Q8C0L9-2]
    RefSeq: NP_001277979.1. NM_001291050.1. [Q8C0L9-2]
            NP_001277980.1. NM_001291051.1. [Q8C0L9-2]
            NP_001277981.1. NM_001291052.1. [Q8C0L9-2]
            NP_081372.1. NM_027096.2. [Q8C0L9-2]
            NP_083078.3. NM_028802.3. [Q8C0L9-1]
            XP_011238111.1. XM_011239809.1. [Q8C0L9-1]
    UniGene: Mm.211211.
             Mm.448334.

    Genome annotation databases

    Ensembl: ENSMUST00000060955; ENSMUSP00000062221; ENSMUSG00000027346. [Q8C0L9-1]
             ENSMUST00000110136; ENSMUSP00000105763; ENSMUSG00000027346. [Q8C0L9-2]
             ENSMUST00000110142; ENSMUSP00000105769; ENSMUSG00000027346. [Q8C0L9-1]
    GeneID: 74182.
    KEGG: mmu:74182.
    UCSC: uc008mmv.2. mouse. [Q8C0L9-1]

    Keywords - Coding sequence diversity

    Alternative splicing

    Cross-references

    Organism-specific databases

    CTD: 56261.
    MGI: MGI:104898. Gpcpd1.

    Miscellaneous databases

    PRO: Q8C0L9.

    Publications

    1. "Prediction of the coding sequences of mouse homologues of KIAA gene: II. The complete nucleotide sequences of 400 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries."
      Okazaki N., Kikuno R., Ohara R., Inamoto S., Aizawa H., Yuasa S., Nakajima D., Nagase T., Ohara O., Koga H.
      DNA Res. 10:35-48(2003)
      Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 3).
      Tissue: Brain.
    2. "The transcriptional landscape of the mammalian genome."
      Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N., Oyama R., Ravasi T., Lenhard B., Wells C., Kodzius R., Shimokawa K., Bajic V.B., Brenner S.E., Batalov S., Forrest A.R., Zavolan M., Davis M.J., Wilming L.G., Aidinis V., Allen J.E., Ambesi-Impiombato A., Apweiler R., Aturaliya R.N., Bailey T.L., Bansal M., Baxter L., Beisel K.W., Bersano T., Bono H., Chalk A.M., Chiu K.P., Choudhary V., Christoffels A., Clutterbuck D.R., Crowe M.L., Dalla E., Dalrymple B.P., de Bono B., Della Gatta G., di Bernardo D., Down T., Engstrom P., Fagiolini M., Faulkner G., Fletcher C.F., Fukushima T., Furuno M., Futaki S., Gariboldi M., Georgii-Hemming P., Gingeras T.R., Gojobori T., Green R.E., Gustincich S., Harbers M., Hayashi Y., Hensch T.K., Hirokawa N., Hill D., Huminiecki L., Iacono M., Ikeo K., Iwama A., Ishikawa T., Jakt M., Kanapin A., Katoh M., Kawasawa Y., Kelso J., Kitamura H., Kitano H., Kollias G., Krishnan S.P., Kruger A., Kummerfeld S.K., Kurochkin I.V., Lareau L.F., Lazarevic D., Lipovich L., Liu J., Liuni S., McWilliam S., Madan Babu M., Madera M., Marchionni L., Matsuda H., Matsuzawa S., Miki H., Mignone F., Miyake S., Morris K., Mottagui-Tabar S., Mulder N., Nakano N., Nakauchi H., Ng P., Nilsson R., Nishiguchi S., Nishikawa S., Nori F., Ohara O., Okazaki Y., Orlando V., Pang K.C., Pavan W.J., Pavesi G., Pesole G., Petrovsky N., Piazza S., Reed J., Reid J.F., Ring B.Z., Ringwald M., Rost B., Ruan Y., Salzberg S.L., Sandelin A., Schneider C., Schoenbach C., Sekiguchi K., Semple C.A., Seno S., Sessa L., Sheng Y., Shibata Y., Shimada H., Shimada K., Silva D., Sinclair B., Sperling S., Stupka E., Sugiura K., Sultana R., Takenaka Y., Taki K., Tammoja K., Tan S.L., Tang S., Taylor M.S., Tegner J., Teichmann S.A., Ueda H.R., van Nimwegen E., Verardo R., Wei C.L., Yagi K., Yamanishi H., Zabarovsky E., Zhu S., Zimmer A., Hide W., Bult C., Grimmond S.M., Teasdale R.D., Liu E.T., Brusic V., Quackenbush J., Wahlestedt C., Mattick J.S., Hume D.A., Kai C., Sasaki D., Tomaru Y., Fukuda S., Kanamori-Katayama M., Suzuki M., Aoki J., Arakawa T., Iida J., Imamura K., Itoh M., Kato T., Kawaji H., Kawagashira N., Kawashima T., Kojima M., Kondo S., Konno H., Nakano K., Ninomiya N., Nishio T., Okada M., Plessy C., Shibata K., Shiraki T., Suzuki S., Tagami M., Waki K., Watahiki A., Okamura-Oho Y., Suzuki H., Kawai J., Hayashizaki Y.
      Science 309:1559-1563(2005)
      Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
      Strain: C57BL/6J.
      Tissue: Embryo, Embryonic spinal ganglion, Head, Mammary gland and Tongue.
    3. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
      Strain: C57BL/6J.
    4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
      The MGC Project Team
      Genome Res. 14:2121-2127(2004)
      Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
      Tissue: Eye.
    5. "Large-scale identification and evolution indexing of tyrosine phosphorylation sites from murine brain."
      Ballif B.A., Carey G.R., Sunyaev S.R., Gygi S.P.
      J. Proteome Res. 7:311-318(2008)
      Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT TYR-611, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
      Tissue: Brain.
    6. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-427, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
      Tissue: Brain, Brown adipose tissue, Heart, Kidney, Liver, Lung, Spleen and Testis.
    7. "A novel glycerophosphodiester phosphodiesterase, GDE5, controls skeletal muscle development via a non-enzymatic mechanism."
      Okazaki Y., Ohshima N., Yoshizawa I., Kamei Y., Mariggio S., Okamoto K., Maeda M., Nogusa Y., Fujioka Y., Izumi T., Ogawa Y., Shiro Y., Wada M., Kato N., Corda D., Yanaka N.
      J. Biol. Chem. 285:27652-27663(2010)
      Cited for: FUNCTION, CATALYTIC ACTIVITY, BIOPHYSICOCHEMICAL PROPERTIES, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE.

    Entry information

    Entry name: GPCP1_MOUSE
    Accession: Primary (citable) accession number: Q8C0L9
    Secondary accession number(s): A2AMD5, Q3TLV6, Q80TD5, Q8BKJ7, Q8BKW7, Q8CFW2, Q9D759
    Entry history
    Integrated into UniProtKB/Swiss-Prot: October 3, 2006
    Last sequence update: March 1, 2003
    Last modified: July 6, 2016
    This is version 115 of the entry and version 1 of the sequence.
    Entry status: Reviewed (UniProtKB/Swiss-Prot)
    Annotation program: Chordata Protein Annotation Program

    Miscellaneous

    Keywords - Technical term

    Complete proteome, Reference proteome

    Documents

    1. MGD cross-references
      Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
    2. SIMILARITY comments
      Index of protein domains and families

    Similar proteins

    Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
    100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
    90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
    50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.