Protein

Polypeptide N-acetylgalactosaminyltransferase 2

Gene

GALNT2

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Catalyzes the initial reaction in O-linked oligosaccharide biosynthesis, the transfer of an N-acetyl-D-galactosamine residue to a serine or threonine residue on the protein receptor. Acts on a broad spectrum of peptide substrates, such as EA2, Muc5AC, Muc1a and Muc1b. Probably involved in O-linked glycosylation of the immunoglobulin A1 (IgA1) hinge region. (4 publications)

Catalytic activity

UDP-N-acetyl-alpha-D-galactosamine + polypeptide = UDP + N-acetyl-alpha-D-galactosaminyl-polypeptide. (2 publications)

Cofactor

Mn2+ (2 publications)

Pathway: protein glycosylation

This protein is involved in the pathway protein glycosylation, which is part of Protein modification.

Sites

Feature key    Position(s)  Description                Length
Binding site   143          Substrate                  1
Binding site   176          Substrate                  1
Binding site   201          Substrate                  1
Metal binding  224          Manganese (1 publication)  1
Binding site   225          Substrate                  1
Metal binding  226          Manganese (1 publication)  1
Binding site   331          Substrate                  1
Metal binding  359          Manganese (1 publication)  1
Binding site   362          Substrate                  1
Binding site   365          Substrate                  1
Binding site   367          Substrate                  1
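
The positions in the Sites table are 1-based and refer to the canonical sequence (isoform 1), which this entry prints in blocks of ten residues. As a minimal sketch, assuming the two ten-residue blocks that start at positions 221 and 351 are copied correctly from that sequence listing, the three manganese-binding positions can be mapped to residues. The names `BLOCKS`, `METAL_BINDING` and `residue_at` are illustrative only and are not part of any UniProt tooling.

```python
# Ten-residue blocks copied from the canonical sequence listing in this entry,
# keyed by the 1-based position of their first residue.
BLOCKS = {
    221: "TFLDSHCECN",  # residues 221-230
    351: "IIPCSRVGHV",  # residues 351-360
}

# Metal-binding (manganese) positions from the Sites table.
METAL_BINDING = [224, 226, 359]


def residue_at(position, blocks=BLOCKS):
    """Return the one-letter residue at a 1-based sequence position."""
    # First position of the ten-residue block that contains `position`.
    block_start = (position - 1) // 10 * 10 + 1
    block = blocks[block_start]  # KeyError if that block was not copied above
    return block[position - block_start]


metal_residues = {pos: residue_at(pos) for pos in METAL_BINDING}
print(metal_residues)
```

Under these assumptions the lookup gives Asp-224, His-226 and His-359 for the three manganese-binding positions.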

GO - Molecular function

  • carbohydrate binding Source: UniProtKB-KW
  • manganese ion binding Source: UniProtKB
  • polypeptide N-acetylgalactosaminyltransferase activity Source: UniProtKB

GO - Biological process

  • immunoglobulin biosynthetic process Source: BHF-UCL
  • O-glycan processing Source: GO_Central
  • protein O-linked glycosylation Source: UniProtKB
  • protein O-linked glycosylation via serine Source: BHF-UCL
  • protein O-linked glycosylation via threonine Source: BHF-UCL

Keywords - Molecular function

Glycosyltransferase, Transferase

Keywords - Ligand

Lectin, Manganese, Metal-binding

Enzyme and pathway databases

BioCyc: ZFISH:HS07092-MONOMER.
BRENDA: 2.4.1.41. 2681.
Reactome: R-HSA-6811436. COPI-independent Golgi-to-ER retrograde traffic.
          R-HSA-913709. O-linked glycosylation of mucins.
UniPathway: UPA00378.

Protein family/group databases

CAZy: CBM13. Carbohydrate-Binding Module Family 13.
      GT27. Glycosyltransferase Family 27.

Names & Taxonomy

Protein names
Recommended name:
Polypeptide N-acetylgalactosaminyltransferase 2 (EC 2.4.1.41)
Alternative name(s):
Polypeptide GalNAc transferase 2
Short name: GalNAc-T2
Short name: pp-GaNTase 2
Protein-UDP acetylgalactosaminyltransferase 2
UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase 2
Cleaved into the following chain:
Polypeptide N-acetylgalactosaminyltransferase 2 soluble form
Gene names
Name: GALNT2
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 1

Organism-specific databases

HGNC: HGNC:4124. GALNT2.

Subcellular location

Topology

Feature key         Position(s)  Description                                                              Length
Topological domain  1 – 6        Cytoplasmic (sequence analysis)                                          6
Transmembrane       7 – 24       Helical; Signal-anchor for type II membrane protein (sequence analysis)  18
Topological domain  25 – 571     Lumenal (sequence analysis)                                              547
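
Feature positions in this entry are 1-based and inclusive at both ends, so a segment's length is `end - start + 1`. The following minimal sketch re-derives the Length column of the Topology table and looks up which topological region contains a given position. `TOPOLOGY`, `segment_length` and `region_at` are hypothetical names chosen for illustration, and the short segment labels are abbreviations of the table descriptions.

```python
# Topology segments as (description, start, end); positions are 1-based and
# inclusive, copied from the Topology table in this entry.
TOPOLOGY = [
    ("Cytoplasmic", 1, 6),
    ("Helical signal-anchor", 7, 24),
    ("Lumenal", 25, 571),
]


def segment_length(start, end):
    """Length of a 1-based, inclusive position range."""
    return end - start + 1


def region_at(position, segments=TOPOLOGY):
    """Return the description of the segment containing `position`."""
    for description, start, end in segments:
        if start <= position <= end:
            return description
    raise ValueError(f"position {position} is outside the annotated range")


lengths = {desc: segment_length(start, end) for desc, start, end in TOPOLOGY}
print(lengths)
print(region_at(224))  # position of one of the manganese-binding sites
```

The three lengths add up to 571, the length of the canonical sequence, which is a quick consistency check on the table.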

GO - Cellular component

  • endoplasmic reticulum membrane Source: Reactome
  • extracellular exosome Source: UniProtKB
  • Golgi apparatus Source: BHF-UCL
  • Golgi cisterna membrane Source: UniProtKB-SubCell
  • Golgi membrane Source: Reactome
  • Golgi stack Source: BHF-UCL
  • integral component of Golgi membrane Source: BHF-UCL
  • membrane Source: UniProtKB
  • perinuclear region of cytoplasm Source: BHF-UCL

Keywords - Cellular component

Golgi apparatus, Membrane, Secreted

Pathology & Biotech

Organism-specific databases

DisGeNET: 2590.
OpenTargets: ENSG00000143641.
PharmGKB: PA28537.

Polymorphism and mutation databases

BioMuta: GALNT2.
DMDM: 51315838.

PTM / Processing

Molecule processing

Feature key  Identifier      Position(s)  Description                                                   Length
Chain        PRO_0000223391  1 – 571      Polypeptide N-acetylgalactosaminyltransferase 2               571
Chain        PRO_0000012265  52 – 571     Polypeptide N-acetylgalactosaminyltransferase 2 soluble form  520

Amino acid modifications

Feature key       Position(s)  Description                                    Length
Disulfide bond    126 ↔ 354    PROSITE-ProRule annotation; 1 publication
Disulfide bond    345 ↔ 423    PROSITE-ProRule annotation; 1 publication
Disulfide bond    456 ↔ 473    PROSITE-ProRule annotation; 1 publication
Disulfide bond    496 ↔ 513    PROSITE-ProRule annotation; 1 publication
Modified residue  536          Phosphoserine (combined sources)               1
Disulfide bond    539 ↔ 555    PROSITE-ProRule annotation; 1 publication

Sites

Feature key  Position(s)  Description       Length
Site         516          Not glycosylated  1

Keywords - PTM

Disulfide bond, Phosphoprotein

Proteomic databases

EPD: Q10471.
MaxQB: Q10471.
PaxDb: Q10471.
PeptideAtlas: Q10471.
PRIDE: Q10471.

PTM databases

iPTMnet: Q10471.
PhosphoSitePlus: Q10471.
SwissPalm: Q10471.

Expression

Tissue specificity

Widely expressed. (1 publication)

Gene expression databases

Bgee: ENSG00000143641.
CleanEx: HS_GALNT2.
Genevisible: Q10471. HS.

Organism-specific databases

HPA: HPA011222.

Interaction

Binary interactions

With     Entry   #Exp.  IntAct
CCDC155  Q8N6L0  3      EBI-10226985,EBI-749265
NOD2     Q9HC29  2      EBI-10226985,EBI-7445625

Protein-protein interaction databases

BioGrid: 108862. 26 interactors.
IntAct: Q10471. 17 interactors.
MINT: MINT-3026412.
STRING: 9606.ENSP00000355632.

Structure

Secondary structure

Evidence for all rows: combined sources.

Feature key  Position(s)  Length
Helix        78 – 80      3
Helix        83 – 87      5
Helix        88 – 90      3
Turn         98 – 102     5
Helix        106 – 111    6
Helix        124 – 128    5
Beta strand  138 – 146    9
Helix        149 – 162    14
Helix        165 – 167    3
Beta strand  168 – 175    8
Helix        182 – 185    4
Helix        186 – 189    4
Beta strand  193 – 197    5
Helix        204 – 214    11
Beta strand  217 – 223    7
Beta strand  225 – 229    5
Helix        235 – 243    9
Beta strand  247 – 256    10
Turn         258 – 260    3
Beta strand  270 – 274    5
Beta strand  280 – 284    5
Helix        287 – 295    9
Beta strand  308 – 314    7
Helix        315 – 320    6
Beta strand  330 – 332    3
Helix        336 – 344    9
Beta strand  348 – 360    13
Beta strand  363 – 365    3
Turn         366 – 368    3
Beta strand  369 – 371    3
Helix        373 – 376    4
Helix        378 – 388    11
Helix        390 – 392    3
Helix        393 – 399    7
Helix        401 – 403    3
Helix        412 – 420    9
Helix        426 – 432    7
Beta strand  445 – 452    8
Beta strand  455 – 458    4
Turn         463 – 465    3
Beta strand  469 – 472    4
Helix        478 – 480    3
Beta strand  482 – 484    3
Beta strand  490 – 492    3
Beta strand  495 – 498    4
Beta strand  509 – 512    4
Helix        518 – 520    3
Beta strand  522 – 525    4
Turn         526 – 529    4
Beta strand  530 – 533    4
Beta strand  536 – 541    6
Helix        545 – 547    3
Beta strand  551 – 554    4
Helix        559 – 561    3
Beta strand  564 – 568    5

3D structure databases

PDB entry  Method  Resolution (Å)  Chain        Positions
2FFU       X-ray   1.64            A            75-571
2FFV       X-ray   2.75            A/B          75-571
4D0T       X-ray   2.45            A/B/C/D/E/F  1-571
4D0Z       X-ray   2.20            A/B/C/D/E/F  1-571
4D11       X-ray   2.85            A/B/C/D/E/F  1-571
5AJN       X-ray   1.67            A            1-571
5AJO       X-ray   1.48            A            1-571
5AJP       X-ray   1.65            A            1-571
5FV9       X-ray   2.07            A/B/C/D/E/F  1-571
ProteinModelPortal: Q10471.
SMR: Q10471.
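
When choosing among the PDB entries listed above, a common first filter is resolution, where a lower value in Å means finer detail. The sketch below is only an illustration under that assumption: it copies the table into a list of tuples and picks the entry with the lowest listed resolution and the entries whose listed positions span 1-571. `PDB_ENTRIES`, `best` and `full_range` are hypothetical names, and the listed positions are taken at face value from the table rather than from the deposited coordinates.

```python
# (PDB entry, resolution in angstroms, chains, (first, last) listed positions),
# copied from the 3D structure table in this entry.
PDB_ENTRIES = [
    ("2FFU", 1.64, "A", (75, 571)),
    ("2FFV", 2.75, "A/B", (75, 571)),
    ("4D0T", 2.45, "A/B/C/D/E/F", (1, 571)),
    ("4D0Z", 2.20, "A/B/C/D/E/F", (1, 571)),
    ("4D11", 2.85, "A/B/C/D/E/F", (1, 571)),
    ("5AJN", 1.67, "A", (1, 571)),
    ("5AJO", 1.48, "A", (1, 571)),
    ("5AJP", 1.65, "A", (1, 571)),
    ("5FV9", 2.07, "A/B/C/D/E/F", (1, 571)),
]

# Entry with the lowest (best) listed resolution.
best = min(PDB_ENTRIES, key=lambda entry: entry[1])

# Entries whose listed positions cover the whole canonical sequence.
full_range = [entry[0] for entry in PDB_ENTRIES if entry[3] == (1, 571)]

print(best[0], best[1])
print(full_range)
```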

Miscellaneous databases

EvolutionaryTrace: Q10471.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Description                                       Length
Domain       443 – 566    Ricin B-type lectin (PROSITE-ProRule annotation)  124

Region

Feature key  Position(s)  Description            Length
Region       135 – 240    Catalytic subdomain A  106
Region       300 – 362    Catalytic subdomain B  63

Domain

There are two conserved domains in the glycosyltransferase region: the N-terminal domain (domain A, also called GT1 motif), which is probably involved in manganese coordination and substrate binding, and the C-terminal domain (domain B, also called Gal/GalNAc-T motif), which is probably involved in the catalytic reaction and UDP-Gal binding (by similarity).
The ricin B-type lectin domain binds to GalNAc and contributes to the glycopeptide specificity (by similarity).

Sequence similarities

Contains 1 ricin B-type lectin domain (PROSITE-ProRule annotation).

Keywords - Domain

Signal-anchor, Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: KOG3738. Eukaryota.
        ENOG410XPRX. LUCA.
GeneTree: ENSGT00760000118828.
HOGENOM: HOG000038227.
HOVERGEN: HBG051699.
InParanoid: Q10471.
KO: K00710.
OMA: GKVRWPD.
OrthoDB: EOG091G085O.
PhylomeDB: Q10471.
TreeFam: TF313267.

Family and domain databases

Gene3D: 3.90.550.10. 1 hit.
InterPro: IPR001173. Glyco_trans_2-like.
          IPR029044. Nucleotide-diphossugar_trans.
          IPR000772. Ricin_B_lectin.
Pfam: PF00535. Glycos_transf_2. 1 hit.
      PF00652. Ricin_B_lectin. 1 hit.
SMART: SM00458. RICIN. 1 hit.
SUPFAM: SSF50370. SSF50370. 1 hit.
        SSF53448. SSF53448. 1 hit.
PROSITE: PS50231. RICIN_B_LECTIN. 1 hit.

Sequences (2)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q10471-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRRRSRMLLC FAFLWVLGIA YYMYSGGGSA LAGGAGGGAG RKEDWNEIDP
        60         70         80         90        100
IKKKDLHHSN GEEKAQSMET LPPGKVRWPD FNQEAYVGGT MVRSGQDPYA
       110        120        130        140        150
RNKFNQVESD KLRMDRAIPD TRHDQCQRKQ WRVDLPATSV VITFHNEARS
       160        170        180        190        200
ALLRTVVSVL KKSPPHLIKE IILVDDYSND PEDGALLGKI EKVRVLRNDR
       210        220        230        240        250
REGLMRSRVR GADAAQAKVL TFLDSHCECN EHWLEPLLER VAEDRTRVVS
       260        270        280        290        300
PIIDVINMDN FQYVGASADL KGGFDWNLVF KWDYMTPEQR RSRQGNPVAP
       310        320        330        340        350
IKTPMIAGGL FVMDKFYFEE LGKYDMMMDV WGGENLEISF RVWQCGGSLE
       360        370        380        390        400
IIPCSRVGHV FRKQHPYTFP GGSGTVFARN TRRAAEVWMD EYKNFYYAAV
       410        420        430        440        450
PSARNVPYGN IQSRLELRKK LSCKPFKWYL ENVYPELRVP DHQDIAFGAL
       460        470        480        490        500
QQGTNCLDTL GHFADGVVGV YECHNAGGNQ EWALTKEKSV KHMDLCLTVV
       510        520        530        540        550
DRAPGSLIKL QGCRENDSRQ KWEQIEGNSK LRHVGSNLCL DSRTAKSGGL
       560        570
SVEVCGPALS QQWKFTLNLQ Q
Length: 571
Mass (Da): 64,733
Last modified: November 1, 1996 - v1
Checksum: D9A0F5D17C55BAF2
Isoform 2 (identifier: Q10471-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-90: Missing.
     303-355: TPMIAGGLFV...GGSLEIIPCS → DLVPRVAVWW...HPPGSRGLDG
     356-571: Missing.

Note: No experimental confirmation available.
Length: 265
Mass (Da): 30,256
Checksum: 02CD3D8D4652ED95
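
The three differences listed for isoform 2 can be applied mechanically to the canonical sequence. The sketch below is one hedged way to do that, assuming the canonical sequence is copied correctly from the listing for isoform 1 and that the full replacement segment is the 53-residue string given for VSP_056492 in this entry's Alternative sequence table. `apply_edits`, `EDITS` and the other names are hypothetical helpers, not UniProt tooling.

```python
# Canonical sequence (isoform 1), copied in blocks of ten from this entry.
CANONICAL_BLOCKS = """
MRRRSRMLLC FAFLWVLGIA YYMYSGGGSA LAGGAGGGAG RKEDWNEIDP
IKKKDLHHSN GEEKAQSMET LPPGKVRWPD FNQEAYVGGT MVRSGQDPYA
RNKFNQVESD KLRMDRAIPD TRHDQCQRKQ WRVDLPATSV VITFHNEARS
ALLRTVVSVL KKSPPHLIKE IILVDDYSND PEDGALLGKI EKVRVLRNDR
REGLMRSRVR GADAAQAKVL TFLDSHCECN EHWLEPLLER VAEDRTRVVS
PIIDVINMDN FQYVGASADL KGGFDWNLVF KWDYMTPEQR RSRQGNPVAP
IKTPMIAGGL FVMDKFYFEE LGKYDMMMDV WGGENLEISF RVWQCGGSLE
IIPCSRVGHV FRKQHPYTFP GGSGTVFARN TRRAAEVWMD EYKNFYYAAV
PSARNVPYGN IQSRLELRKK LSCKPFKWYL ENVYPELRVP DHQDIAFGAL
QQGTNCLDTL GHFADGVVGV YECHNAGGNQ EWALTKEKSV KHMDLCLTVV
DRAPGSLIKL QGCRENDSRQ KWEQIEGNSK LRHVGSNLCL DSRTAKSGGL
SVEVCGPALS QQWKFTLNLQ Q
"""
CANONICAL = "".join(CANONICAL_BLOCKS.split())  # drop spaces and newlines

# Replacement for positions 303-355, as listed for VSP_056492.
REPLACEMENT = "".join(
    "DLVPRVAVWWQPGDHPVQPC GTRVPEAAPLHVPGWQWHCL CPKHPPGSRGLDG".split()
)

# (start, end, replacement); 1-based inclusive. "" means "Missing".
EDITS = [(1, 90, ""), (303, 355, REPLACEMENT), (356, 571, "")]


def apply_edits(sequence, edits):
    """Apply non-overlapping 1-based inclusive replacements to a sequence."""
    pieces = []
    cursor = 1  # next canonical position not yet copied
    for start, end, replacement in sorted(edits):
        pieces.append(sequence[cursor - 1:start - 1])  # unchanged stretch
        pieces.append(replacement)
        cursor = end + 1
    pieces.append(sequence[cursor - 1:])  # tail after the last edit
    return "".join(pieces)


isoform2 = apply_edits(CANONICAL, EDITS)
print(len(CANONICAL), len(isoform2))
```

With these inputs the result consists of canonical residues 91-302 followed by the replacement segment, which matches the length of 265 stated for isoform 2.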

Experimental Info

Feature key        Position(s)  Description                                      Length
Sequence conflict  70           T → G AA sequence (PubMed:7592619). Curated      1
Sequence conflict  78           W → D AA sequence (PubMed:7592619). Curated      1
Sequence conflict  93           R → G AA sequence (PubMed:7592619). Curated      1
Sequence conflict  210          R → W AA sequence (PubMed:7592619). Curated      1
Sequence conflict  290 – 291    RR → SC AA sequence (PubMed:7592619). Curated    2
Sequence conflict  293          R → Q AA sequence (PubMed:7592619). Curated      1
Sequence conflict  300          P → H AA sequence (PubMed:7592619). Curated      1
Sequence conflict  522          W → A AA sequence (PubMed:7592619). Curated      1
Sequence conflict  533          H → M AA sequence (PubMed:7592619). Curated      1

Natural variant

Feature key      Identifier  Position(s)  Description                                                        Length
Natural variant  VAR_049240  245          R → H. Corresponds to variant rs1923950 (dbSNP, Ensembl).          1
Natural variant  VAR_019575  554          V → M. 1 publication. Corresponds to variant rs2273970 (dbSNP, Ensembl).  1

Alternative sequence

Feature key           Identifier  Position(s)  Description                                                                                             Length
Alternative sequence  VSP_056491  1 – 90       Missing in isoform 2. 1 publication                                                                     90
Alternative sequence  VSP_056492  303 – 355    TPMIA…IIPCS → DLVPRVAVWWQPGDHPVQPC GTRVPEAAPLHVPGWQWHCL CPKHPPGSRGLDG in isoform 2. 1 publication  53
Alternative sequence  VSP_056493  356 – 571    Missing in isoform 2. 1 publication                                                                     216

Sequence databases

EMBL / GenBank / DDBJ:
X85019 mRNA. Translation: CAA59381.1.
AK290048 mRNA. Translation: BAF82737.1.
AK304029 mRNA. Translation: BAH14094.1.
AL592228 Genomic DNA. No translation available.
AL078646, AL117349, AL136988 Genomic DNA. Translation: CAC00585.2.
AL117349, AL078646, AL136988 Genomic DNA. Translation: CAI22902.1.
AL136988, AL078646, AL117349 Genomic DNA. Translation: CAI23447.1.
FJ515852 Genomic DNA. Translation: ACS13744.1.
CH471098 Genomic DNA. Translation: EAW69911.1.
BC041120 mRNA. Translation: AAH41120.1.
CCDS: CCDS1582.1. [Q10471-1]
PIR: I37405.
RefSeq: NP_001278795.1. NM_001291866.1.
        NP_004472.1. NM_004481.4. [Q10471-1]
UniGene: Hs.743964.

Genome annotation databases

Ensembl: ENST00000366672; ENSP00000355632; ENSG00000143641. [Q10471-1]
GeneID: 2590.
KEGG: hsa:2590.
UCSC: uc010pwa.2. human. [Q10471-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Web resources

Functional Glycomics Gateway - GTase

Polypeptide N-acetylgalactosaminyltransferase 2

Protocols and materials databases

DNASU: 2590.

Organism-specific databases

CTD: 2590.
DisGeNET: 2590.
GeneCards: GALNT2.
H-InvDB: HIX0001682.
HGNC: HGNC:4124. GALNT2.
HPA: HPA011222.
MIM: 602274. gene.
neXtProt: NX_Q10471.
OpenTargets: ENSG00000143641.
PharmGKB: PA28537.

Miscellaneous databases

ChiTaRS: GALNT2. human.
EvolutionaryTrace: Q10471.
GeneWiki: GALNT2.
GenomeRNAi: 2590.
PRO: Q10471.

Entry information

Entry name: GALT2_HUMAN
Accession: Primary (citable) accession number: Q10471
Secondary accession number(s): A8K1Y3, B7Z8V8, C5HU00, Q9NPY4
Entry history:
Integrated into UniProtKB/Swiss-Prot: August 16, 2004
Last sequence update: November 1, 1996
Last modified: November 2, 2016
This is version 159 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Human chromosome 1
    Human chromosome 1: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PATHWAY comments
    Index of metabolic and biosynthesis pathways
  6. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  7. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
  • 100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
  • 90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
  • 50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.