Protein

Polypeptide N-acetylgalactosaminyltransferase 35A

Gene

Pgant35A

Organism
Drosophila melanogaster (Fruit fly)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Essential glycosyltransferase that catalyzes the initial reaction in O-linked oligosaccharide biosynthesis: the transfer of an N-acetyl-D-galactosamine (GalNAc) residue to a serine or threonine residue on the protein acceptor. It can act both as a peptide transferase, which transfers GalNAc onto unmodified peptide substrates, and as a glycopeptide transferase, which requires a GalNAc already present on the peptide before it adds further GalNAc moieties.

Catalytic activity

UDP-N-acetyl-alpha-D-galactosamine + polypeptide = UDP + N-acetyl-alpha-D-galactosaminyl-polypeptide. (2 publications)

Cofactor

Mn2+ (by similarity)

Kinetics

  1. KM = 8.5 µM for UDP-GalNAc (1 publication)
  2. KM = 0.35 mM for EA2 acceptor peptide (1 publication)
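
The two KM values can be read through the Michaelis-Menten rate law, v/Vmax = [S] / (KM + [S]). The sketch below is illustrative only: it assumes simple Michaelis-Menten behavior with each substrate varied on its own (the entry states neither the kinetic mechanism nor any Vmax), and the function name and example concentrations are hypothetical.

```python
# Illustrative sketch: fractional velocity v/Vmax under the Michaelis-Menten
# rate law. Assumption: each substrate is varied alone while the other is
# saturating. The entry itself reports only the two KM values.

KM_UDP_GALNAC_UM = 8.5   # KM for UDP-GalNAc, micromolar (from the entry)
KM_EA2_UM = 350.0        # KM for EA2 acceptor peptide: 0.35 mM = 350 micromolar


def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax for a substrate concentration and KM in the same units."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and KM must be > 0")
    return substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Hypothetical substrate concentrations, in micromolar.
    for conc in (8.5, 85.0, 350.0):
        print(conc,
              round(fractional_velocity(conc, KM_UDP_GALNAC_UM), 3),
              round(fractional_velocity(conc, KM_EA2_UM), 3))
```

By definition the rate is half-maximal when the substrate concentration equals KM, so under these assumptions the enzyme would be half-saturated at 8.5 µM UDP-GalNAc but would need about 350 µM of the EA2 peptide to reach the same point.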

    Pathway: protein glycosylation

    This protein is involved in the pathway protein glycosylation, which is part of Protein modification.

    Sites

    Feature key    Position  Length  Description  Evidence
    Binding site   188       1       Substrate    By similarity
    Binding site   220       1       Substrate    By similarity
    Metal binding  243       1       Manganese    By similarity
    Binding site   244       1       Substrate    By similarity
    Metal binding  245       1       Manganese    By similarity
    Binding site   348       1       Substrate    By similarity
    Metal binding  376       1       Manganese    By similarity
    Binding site   379       1       Substrate    By similarity
    Binding site   384       1       Substrate    By similarity

    GO - Molecular function

    GO - Biological process

    • oligosaccharide biosynthetic process Source: UniProtKB
    • open tracheal system development Source: FlyBase
    • protein glycosylation Source: UniProtKB-UniPathway

    Keywords - Molecular function

    Glycosyltransferase, Transferase

    Keywords - Ligand

    Lectin, Manganese, Metal-binding

    Enzyme and pathway databases

    BRENDA: 2.4.1.41. 1994.
    Reactome: REACT_349991. O-linked glycosylation of mucins.
    SABIO-RK: Q8MVS5.
    UniPathway: UPA00378.

    Protein family/group databases

    CAZy: CBM13. Carbohydrate-Binding Module Family 13.
          GT27. Glycosyltransferase Family 27.

    Names & Taxonomy

    Protein names
    Recommended name:
    Polypeptide N-acetylgalactosaminyltransferase 35A (EC:2.4.1.41)
    Alternative name(s):
    Protein l(2)35Aa
    Protein-UDP acetylgalactosaminyltransferase 35A
    UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase 35A
    Short name:
    pp-GaNTase 35A
    dGalNAc-T1
    Gene names
    Name: Pgant35A
    ORF names: CG7480
    Organism: Drosophila melanogaster (Fruit fly)
    Taxonomic identifier: 7227 [NCBI]
    Taxonomic lineage: Eukaryota > Metazoa > Ecdysozoa > Arthropoda > Hexapoda > Insecta > Pterygota > Neoptera > Endopterygota > Diptera > Brachycera > Muscomorpha > Ephydroidea > Drosophilidae > Drosophila > Sophophora
    Proteomes: UP000000803. Component: Chromosome 2L

    Organism-specific databases

    FlyBase: FBgn0001970. Pgant35A.

    Subcellular location

    Topology

    Feature key         Position(s)  Length  Description                                          Evidence
    Topological domain  1 – 6        6       Cytoplasmic                                          Sequence analysis
    Transmembrane       7 – 29       23      Helical; Signal-anchor for type II membrane protein  Sequence analysis
    Topological domain  30 – 632     603     Lumenal                                              Sequence analysis
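
The three topology features should tile the full 632-residue chain without gaps or overlaps, and each length should equal end - start + 1. A minimal Python sketch of that consistency check follows; the `Segment` type and the `tiles_chain` helper are hypothetical names introduced here for illustration and are not part of any UniProt tooling.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One topology feature with 1-based, inclusive coordinates."""
    kind: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates as listed in the Topology table of this entry.
TOPOLOGY = [
    Segment("Cytoplasmic", 1, 6),
    Segment("Transmembrane (signal-anchor)", 7, 29),
    Segment("Lumenal", 30, 632),
]
CHAIN_LENGTH = 632


def tiles_chain(segments: list[Segment], chain_length: int) -> bool:
    """True if the segments cover 1..chain_length with no gaps or overlaps."""
    expected_start = 1
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.start != expected_start or seg.end < seg.start:
            return False
        expected_start = seg.end + 1
    return expected_start == chain_length + 1


if __name__ == "__main__":
    for seg in TOPOLOGY:
        print(seg.kind, seg.start, seg.end, seg.length)
    print("tiles chain:", tiles_chain(TOPOLOGY, CHAIN_LENGTH))
```

The short cytoplasmic tail followed by a single helix and a large lumenal domain is the layout described in the table as a type II signal-anchor membrane protein.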

    GO - Cellular component

    • Golgi membrane Source: UniProtKB-SubCell
    • Golgi stack Source: UniProtKB
    • integral component of membrane Source: UniProtKB-KW

    Keywords - Cellular component

    Golgi apparatus, Membrane

    Pathology & Biotech

    Mutagenesis

    Feature key  Position  Length  Description
    Mutagenesis  227       1       R → W in SF32; induces lethality. (2 publications)
    Mutagenesis  243       1       D → N: Abolishes glycosyltransferase activity. Not able to rescue lethality caused by SF32 mutation. (1 publication)

    PTM / Processing

    Molecule processing

    Feature key  Position(s)  Length  Description                                        Feature identifier
    Chain        1 – 632      632     Polypeptide N-acetylgalactosaminyltransferase 35A  PRO_0000059168

    Amino acid modifications

    Feature key     Position(s)  Length  Description           Evidence
    Glycosylation   49           1       N-linked (GlcNAc...)  1 publication
    Glycosylation   69           1       N-linked (GlcNAc...)  Sequence analysis
    Disulfide bond  136 ↔ 371                                  PROSITE-ProRule annotation
    Glycosylation   264          1       N-linked (GlcNAc...)  Sequence analysis
    Disulfide bond  362 ↔ 439                                  PROSITE-ProRule annotation
    Disulfide bond  493 ↔ 516                                  PROSITE-ProRule annotation
    Disulfide bond  539 ↔ 553                                  PROSITE-ProRule annotation
    Disulfide bond  580 ↔ 597                                  PROSITE-ProRule annotation
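
The three N-linked sites listed here (Asn-49, Asn-69 and Asn-264) each sit in the canonical N-glycosylation sequon N-X-S/T, where X is any residue except proline. The sketch below scans the first 100 residues of the sequence given in this entry for that motif. The sequon rule is general biochemistry rather than something stated in the entry, a matching sequon does not show that a site is actually occupied, and the function name is hypothetical.

```python
# First 100 residues of Q8MVS5, copied from the Sequence section of this entry.
N_TERMINAL_100 = (
    "MMQIKRLLCK" "SCGLGTLLVA" "VVWLLALLFY" "SHSLRSSIRS" "AGWRIDEGNA"
    "TPRAELSYQA" "RVTVGCTPNA" "SITTGESPAA" "PKPPSDPEQL" "ELLGVVRNKQ"
)


def find_sequons(sequence: str, offset: int = 1) -> list[int]:
    """Return 1-based positions of Asn in N-X-S/T motifs (X is not Pro).

    `offset` is the sequence position of the first character of `sequence`.
    """
    hits = []
    for i in range(len(sequence) - 2):
        if (sequence[i] == "N"
                and sequence[i + 1] != "P"
                and sequence[i + 2] in "ST"):
            hits.append(i + offset)
    return hits


if __name__ == "__main__":
    print(find_sequons(N_TERMINAL_100))
```

On this 100-residue window the scan returns positions 49 and 69, the two annotated sites that fall inside it. Asn-98 is skipped because it is followed by Lys-Gln rather than X-Ser/Thr.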

    Keywords - PTM

    Disulfide bond, Glycoprotein

    Proteomic databases

    PaxDb: Q8MVS5.

    Expression

    Tissue specificity

    Expressed at high levels in ovaries and at low levels in testis. Expressed at higher levels in adult females than in males. During oogenesis, it is detected in germ cells and follicle epithelia at all developmental stages. Expression begins during early oogenesis in region I and reaches high levels in regions IIa and IIb of the germarium. Highly expressed in stage 2 egg chambers, and remains highly expressed during later stages of oogenesis. During embryonic stages 9-11, expressed in the primordium of the foregut, midgut and hindgut. Expressed in salivary glands from embryonic stage 12 onwards. During embryonic stages 12-13, expressed in the posterior midgut and hindgut. During embryonic stages 14-15, expression continues in the hindgut. During embryonic stages 16-17, expressed in the dorsal longitudinal trachea and posterior spiracles. In third instar larvae, ubiquitously expressed in wing, eye-antennal, leg and haltere imaginal disks. (2 publications)

    Developmental stage

    Expressed both maternally and zygotically. Expressed throughout embryonic, larval, pupal and adult stages, with increasing levels during larval development. (3 publications)

    Gene expression databases

    Bgee: Q8MVS5.
    Genevisible: Q8MVS5. DM.

    Interaction

    Protein-protein interaction databases

    STRING: 7227.FBpp0080202.

    Structure

    3D structure databases

    ProteinModelPortal: Q8MVS5.
    SMR: Q8MVS5. Positions 109-607.

    Family & Domains

    Domains and Repeats

    Feature key  Position(s)  Length  Description          Evidence
    Domain       526 – 632    107     Ricin B-type lectin  PROSITE-ProRule annotation

    Region

    Feature key  Position(s)  Length  Description
    Region       147 – 259    113     Catalytic subdomain A
    Region       317 – 379    63      Catalytic subdomain B

    Domain

    There are two conserved domains in the glycosyltransferase region: the N-terminal domain (domain A, also called GT1 motif), which is probably involved in manganese coordination and substrate binding, and the C-terminal domain (domain B, also called Gal/GalNAc-T motif), which is probably involved in the catalytic reaction and UDP-Gal binding. (By similarity)
    The ricin B-type lectin domain binds to GalNAc and contributes to the glycopeptide specificity. (By similarity)

    Sequence similarities

    Contains 1 ricin B-type lectin domain. (PROSITE-ProRule annotation)

    Keywords - Domain

    Signal-anchor, Transmembrane, Transmembrane helix

    Phylogenomic databases

    eggNOG: NOG239675.
    GeneTree: ENSGT00760000118828.
    InParanoid: Q8MVS5.
    KO: K00710.
    OMA: MCFYNEH.
    OrthoDB: EOG7J9VP2.
    PhylomeDB: Q8MVS5.

    Family and domain databases

    Gene3D: 3.90.550.10. 1 hit.
    InterPro: IPR001173. Glyco_trans_2-like.
              IPR029044. Nucleotide-diphossugar_trans.
              IPR000772. Ricin_B_lectin.
    Pfam: PF00535. Glycos_transf_2. 1 hit.
          PF00652. Ricin_B_lectin. 1 hit.
    SMART: SM00458. RICIN. 1 hit.
    SUPFAM: SSF50370. SSF50370. 1 hit.
            SSF53448. SSF53448. 1 hit.
    PROSITE: PS50231. RICIN_B_LECTIN. 1 hit.

    Sequence

    Sequence status: Complete.

    Q8MVS5-1 [UniParc]

            10         20         30         40         50
    MMQIKRLLCK SCGLGTLLVA VVWLLALLFY SHSLRSSIRS AGWRIDEGNA
            60         70         80         90        100
    TPRAELSYQA RVTVGCTPNA SITTGESPAA PKPPSDPEQL ELLGVVRNKQ
           110        120        130        140        150
    DKYIRDIGYK HHAFNALVSN NIGLFRAIPD TRHKVCDRQE TTEAENLPQA
           160        170        180        190        200
    SIVMCFYNEH KMTLMRSIKT VLERTPSYLL REIILVDDHS DLPELEFHLH
           210        220        230        240        250
    GDLRARLKYD NLRYIKNEQR EGLIRSRVIG AREAVGDVLV FLDSHIEVNQ
           260        270        280        290        300
    QWLEPLLRLI KSENATLAVP VIDLINADTF EYTPSPLVRG GFNWGLHFRW
           310        320        330        340        350
    ENLPEGTLKV PEDFRGPFRS PTMAGGLFAV NRKYFQHLGE YDMAMDIWGG
           360        370        380        390        400
    ENIEISFRAW QCGGAIKIVP CSRVGHIFRK RRPYTSPDGA NTMLKNSLRL
           410        420        430        440        450
    AHVWMDQYKD YYLKHEKVPK TYDYGDISDR LKLRERLQCR DFAWYLKNVY
           460        470        480        490        500
    PELHVPGEES KKSAAAPIFQ PWHSRKRNYV DTFQLRLTGT ELCAAVVAPK
           510        520        530        540        550
    VKGFWKKGSS LQLQTCRRTP NQLWYETEKA EIVLDKLLCL EASGDAQVTV
           560        570        580        590        600
    NKCHEMLGDQ QWRHTRNANS PVYNMAKGTC LRAAAPTTGA LISLDLCSKS
           610        620        630
    NGAGGSWDIV QLKKPTEAEG RAKEARNSDK AL
    Length: 632
    Mass (Da): 71,828
    Last modified: August 16, 2004 - v2
    Checksum: E726B9F32481E4E9

    Sequence caution

    The sequence AAK66862.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)

    Experimental Info

    Feature key        Position  Length  Description
    Sequence conflict  72        1       I → T in AAM62405 (PubMed:11925446). (Curated)
    Sequence conflict  628       1       S → T in AAL49213 (PubMed:12537569). (Curated)

    Sequence databases

    AF478697 mRNA. Translation: AAM62405.1.
    AF478698 Genomic DNA. Translation: AAM62406.1.
    AF478699 Genomic DNA. Translation: AAM62407.1.
    AF478700 Genomic DNA. Translation: AAM62408.1.
    AF158747 mRNA. Translation: AAK66862.1. Different initiation.
    AE014134 Genomic DNA. Translation: AAF53391.1.
    AY071591 mRNA. Translation: AAL49213.1.
    RefSeq: NP_652069.2. NM_143812.4.
    UniGene: Dm.1528.

    Genome annotation databases

    EnsemblMetazoa: FBtr0080629; FBpp0080202; FBgn0001970.
    GeneID: 48775.
    KEGG: dme:Dmel_CG7480.

    Cross-references

    Organism-specific databases

    CTD: 48775.
    FlyBase: FBgn0001970. Pgant35A.

    Miscellaneous databases

    GenomeRNAi: 48775.
    NextBio: 839543.
    PRO: Q8MVS5.

    Publications

    1. "A UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase is essential for viability in Drosophila melanogaster."
      Ten Hagen K.G., Tran D.T.
      J. Biol. Chem. 277:22616-22622(2002)
      Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA / MRNA], ENZYME ACTIVITY, BIOPHYSICOCHEMICAL PROPERTIES, DEVELOPMENTAL STAGE, MUTAGENESIS OF ARG-227.
      Strain: Canton-S.
      Tissue: Embryo.
    2. "Functional conservation of subfamilies of putative UDP-N-acetylgalactosamine:polypeptide N-acetylgalactosaminyltransferases in Drosophila, Caenorhabditis elegans, and mammals. One subfamily composed of l(2)35Aa is essential in Drosophila."
      Schwientek T., Bennett E.P., Flores C., Thacker J., Hollmann M., Reis C.A., Behrens J., Mandel U., Keck B., Schaefer M.A., Haselmann K., Zubarev R., Roepstorff P., Burchell J.M., Taylor-Papadimitriou J., Hollingsworth M.A., Clausen H.
      J. Biol. Chem. 277:22623-22638(2002)
      Cited for: NUCLEOTIDE SEQUENCE [MRNA], ENZYME ACTIVITY, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, MUTAGENESIS OF ARG-227.
    3. "An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region."
      Ashburner M., Misra S., Roote J., Lewis S.E., Blazej R.G., Davis T., Doyle C., Galle R.F., George R.A., Harris N.L., Hartzell G., Harvey D.A., Hong L., Houston K.A., Hoskins R.A., Johnson G., Martin C., Moshrefi A.R., Palazzolo M., Reese M.G., Spradling A.C., Tsang G., Wan K.H., Whitelaw K., Celniker S.E., Rubin G.M.
      Genetics 153:179-219(1999)
      Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
      Strain: Berkeley.
    4. "The genome sequence of Drosophila melanogaster."
      Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D., Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F., George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N., Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X., Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D., Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G., Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D., Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M., Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S., Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P., Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I., Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P., de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M., Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P., Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W., Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K., Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M., Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J., Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C., Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A., Kimmel B.E., Kodira C.D., Kraft C.L., Kravitz S., Kulp D., Lai Z., Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X., Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D., Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A., Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L., Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M., Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G., Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H., Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.J., Spier E., Spradling A.C., Stapleton M., Strong R., Sun E., Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X., Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J., Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A., Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L., Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S.C., Zhu X., Smith H.O., Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.
      Science 287:2185-2195(2000)
      Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
      Strain: Berkeley.
    5. Cited for: GENOME REANNOTATION.
      Strain: Berkeley.
    6. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
      Strain: Berkeley.
      Tissue: Embryo.
    7. "Expression of the UDP-GalNAc: polypeptide N-acetylgalactosaminyltransferase family is spatially and temporally regulated during Drosophila development."
      Tian E., Ten Hagen K.G.
      Glycobiology 16:83-95(2006)
      Cited for: TISSUE SPECIFICITY, DEVELOPMENTAL STAGE.
    8. "Identification of N-glycosylated proteins from the central nervous system of Drosophila melanogaster."
      Koles K., Lim J.-M., Aoki K., Porterfield M., Tiemeyer M., Wells L., Panin V.
      Glycobiology 17:1388-1403(2007)
      Cited for: GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-49, IDENTIFICATION BY MASS SPECTROMETRY.
      Strain: Oregon-R.
      Tissue: Head.
    9. "Rescue of Drosophila Melanogaster l(2)35Aa lethality is only mediated by polypeptide GalNAc-transferase pgant35A, but not by the evolutionary conserved human ortholog GalNAc-transferase-T11."
      Bennett E.P., Chen Y.W., Schwientek T., Mandel U., Schjoldager K.T., Cohen S.M., Clausen H.
      Glycoconj. J. 27:435-444(2010)
      Cited for: MUTAGENESIS OF ASP-243.

    Entry information

    Entry name: GLT35_DROME
    Accession: Primary (citable) accession number: Q8MVS5
    Secondary accession number(s): Q8MVS2, Q8MVS3, Q8MVS4, Q8SYF1, Q965E4, Q9V3C9
    Entry history
    Integrated into UniProtKB/Swiss-Prot: August 16, 2004
    Last sequence update: August 16, 2004
    Last modified: June 24, 2015
    This is version 101 of the entry and version 2 of the sequence.
    Entry status: Reviewed (UniProtKB/Swiss-Prot)
    Annotation program: Drosophila annotation project

    Miscellaneous

    The human ortholog GALNT11 (AC Q8NCW6) is not able to rescue lethality caused by the SF32 mutation. (1 publication)

    Keywords - Technical term

    Complete proteome, Reference proteome

    Documents

    1. Drosophila
      Drosophila: entries, gene names and cross-references to FlyBase
    2. PATHWAY comments
      Index of metabolic and biosynthesis pathways
    3. SIMILARITY comments
      Index of protein domains and families

    Similar proteins

    Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
    100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry.
    90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
    50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.