Protein

GPI ethanolamine phosphate transferase 2

Gene

PIGG

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Ethanolamine phosphate transferase involved in glycosylphosphatidylinositol-anchor biosynthesis. Transfers ethanolamine phosphate to the second mannose of GPI. (1 publication)

Pathway: glycosylphosphatidylinositol-anchor biosynthesis

This protein is involved in the pathway glycosylphosphatidylinositol-anchor biosynthesis, which is part of Glycolipid biosynthesis.

GO - Molecular function

GO - Biological process

  • GPI anchor biosynthetic process Source: MGI
  • preassembly of GPI anchor in ER membrane Source: Reactome

Keywords - Molecular function

Transferase

Keywords - Biological process

GPI-anchor biosynthesis

Enzyme and pathway databases

Reactome: R-HSA-162710. Synthesis of glycosylphosphatidylinositol (GPI).
UniPathway: UPA00196.

Names & Taxonomy

Protein names
Recommended name:
GPI ethanolamine phosphate transferase 2 (EC:2.-.-.-)
Alternative name(s):
GPI7 homolog
Short name:
hGPI7
Phosphatidylinositol-glycan biosynthesis class G protein
Short name:
PIG-G
Gene names
Name: PIGG
Synonyms: GPI7
ORF Names: UNQ1930/PRO4405
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 4

Organism-specific databases

HGNC: HGNC:25985. PIGG.

Subcellular location

Topology

Feature key          Position(s)   Length   Description
Topological domain   1 – 431       431      Lumenal (Sequence analysis)
Transmembrane        432 – 452     21       Helical (Sequence analysis)
Transmembrane        471 – 491     21       Helical (Sequence analysis)
Transmembrane        506 – 526     21       Helical (Sequence analysis)
Transmembrane        552 – 572     21       Helical (Sequence analysis)
Transmembrane        699 – 719     21       Helical (Sequence analysis)
Transmembrane        721 – 741     21       Helical (Sequence analysis)
Transmembrane        752 – 772     21       Helical (Sequence analysis)
Transmembrane        789 – 809     21       Helical (Sequence analysis)
Transmembrane        812 – 832     21       Helical (Sequence analysis)
Transmembrane        879 – 899     21       Helical (Sequence analysis)
Transmembrane        919 – 939     21       Helical (Sequence analysis)
Transmembrane        955 – 975     21       Helical (Sequence analysis)
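
The Length column of the topology table follows from the 1-based, inclusive positions. The minimal sketch below recomputes each helix length and the spacing between consecutive helices. The names `TM_HELICES`, `segment_length` and `loops_between` are illustrative and not part of the entry; the coordinates are copied from the table.

```python
# Transmembrane helix boundaries (1-based, inclusive), copied from the topology table.
TM_HELICES = [
    (432, 452), (471, 491), (506, 526), (552, 572),
    (699, 719), (721, 741), (752, 772), (789, 809),
    (812, 832), (879, 899), (919, 939), (955, 975),
]


def segment_length(start, end):
    """Length of a feature given inclusive 1-based start and end positions."""
    return end - start + 1


def loops_between(segments):
    """Number of residues separating each pair of consecutive segments."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(segments, segments[1:])]


tm_lengths = [segment_length(start, end) for start, end in TM_HELICES]
loop_lengths = loops_between(TM_HELICES)

print("helices:", len(tm_lengths), "lengths:", sorted(set(tm_lengths)))
print("loop lengths:", loop_lengths)
```

Every predicted helix comes out at 21 residues, and the longest gap is the 126 residues between the fourth and fifth helices (573 to 698).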

GO - Cellular component

  • endoplasmic reticulum Source: MGI
  • endoplasmic reticulum membrane Source: Reactome
  • integral component of endoplasmic reticulum membrane Source: GO_Central
  • membrane Source: UniProtKB

Keywords - Cellular component

Endoplasmic reticulum, Membrane

Pathology & Biotech

Involvement in disease

Mental retardation, autosomal recessive 53 (MRT53)
The disease is caused by mutations affecting the gene represented in this entry. Cells from patients carrying PIGG disease-causing mutations show abnormal accumulation of the GPI precursors H7 and H7' and absence of mature GPI precursor H8, consistent with a loss of function. However, GPI-anchored proteins, including CD59, CD55, CD24 and CD16, are normally expressed at the cell surface of lymphocytes and granulocytes, and CD59 exhibits sensitivity to bacterial phosphatidylinositol-specific phospholipase C, suggesting a normal structure. The role of PIGG in MRT53 etiology is not clear. (1 publication)
Disease description: A form of mental retardation, a disorder characterized by significantly below average general intellectual functioning associated with impairments in adaptive behavior and manifested during the developmental period. Most MRT53 patients manifest severely delayed psychomotor development, hypotonia, and early-onset seizures. Additional features, such as cerebellar hypoplasia and ataxia, have been observed in some patients.
See also OMIM:616917
Feature key       Position(s)   Length   Feature ID   Description
Natural variant   669           1        VAR_076775   R → C in MRT53; found in a compound heterozygote that also carries a microdeletion encompassing PIGG; almost complete loss of ethanolamine phosphate transferase activity, as evidenced by abnormal accumulation of the GPI precursors H7 and H7' and absence of mature GPI precursor H8 in patient lymphocytes; does not affect protein expression levels in transfected HEK293 cells. (1 publication) Corresponds to variant rs372392424 (dbSNP, Ensembl).

Keywords - Disease

Disease mutation, Mental retardation

Organism-specific databases

DisGeNET: 54872.
MIM: 616917. phenotype.
OpenTargets: ENSG00000174227.
PharmGKB: PA143485575.

Polymorphism and mutation databases

BioMuta: PIGG.
DMDM: 74707851.

PTM / Processing

Molecule processing

Feature key   Position(s)   Length   Feature ID       Description
Chain         1 – 983       983      PRO_0000246185   GPI ethanolamine phosphate transferase 2

Amino acid modifications

Feature key     Position(s)   Length   Description
Glycosylation   194           1        N-linked (GlcNAc...) (Sequence analysis)
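
The glycosylation site above is annotated by sequence analysis only. The entry does not say which method was used, so the sketch below is an assumption: it scans for the widely used N-linked consensus sequon N-X-[S/T], where X is any residue except proline. The fragment is residues 181 to 200 of the canonical sequence, copied from the sequence listing in this entry; `FRAGMENT`, `SEQUON` and `sequon_positions` are illustrative names.

```python
import re

# Residues 181-200 of the canonical sequence (isoform 1), copied from this entry.
FRAGMENT_START = 181
FRAGMENT = "TSFFVSDYTEVDNNVTRHLD"

# N-X-[ST] with X != P. The lookahead keeps matches from consuming residues,
# so overlapping candidate sites would all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(fragment, start):
    """Return 1-based positions of Asn residues that begin an N-X-[ST] sequon."""
    return [start + match.start() for match in SEQUON.finditer(fragment)]


positions = sequon_positions(FRAGMENT, FRAGMENT_START)
print(positions)
```

Within this fragment the only match is Asn-194 (sequon N-V-T), which agrees with the annotated position. A sequon match is a necessary pattern, not evidence that the site is actually glycosylated.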

Keywords - PTM

Glycoprotein

Proteomic databases

MaxQB: Q5H8A4.
PaxDb: Q5H8A4.
PeptideAtlas: Q5H8A4.
PRIDE: Q5H8A4.

PTM databases

iPTMnet: Q5H8A4.
PhosphoSitePlus: Q5H8A4.

Expression

Gene expression databases

Bgee: ENSG00000174227.
CleanEx: HS_PIGG.
ExpressionAtlas: Q5H8A4. baseline and differential.
Genevisible: Q5H8A4. HS.

Organism-specific databases

HPA: HPA015997.

Interaction

Subunit structure

Forms a complex with PIGF. PIGF is required to stabilize it. Competes with PIGO for the binding of PIGF.

Protein-protein interaction databases

BioGrid: 120220. 8 interactors.
IntAct: Q5H8A4. 1 interactor.
MINT: MINT-3042035.
STRING: 9606.ENSP00000415203.

Structure

3D structure databases

ProteinModelPortal: Q5H8A4.

Family & Domains

Sequence similarities

Belongs to the PIGG/PIGN/PIGO family. PIGG subfamily. (Curated)

Keywords - Domain

Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: KOG2125. Eukaryota.
eggNOG: COG1524. LUCA.
GeneTree: ENSGT00840000129928.
HOGENOM: HOG000171439.
HOVERGEN: HBG075245.
InParanoid: Q5H8A4.
KO: K05310.
OMA: WYYLGNT.
OrthoDB: EOG091G04HZ.
PhylomeDB: Q5H8A4.
TreeFam: TF300609.

Family and domain databases

Gene3D: 3.40.720.10. 1 hit.
InterPro: IPR017849. Alkaline_Pase-like_a/b/a.
InterPro: IPR017850. Alkaline_phosphatase_core.
InterPro: IPR002591. Phosphodiest/P_Trfase.
Pfam: PF01663. Phosphodiest. 1 hit.
SUPFAM: SSF53649. SSF53649. 1 hit.

Sequences (6)

Sequence status: Complete.

This entry describes 6 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q5H8A4-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRLGSGTFAT CCVAIEVLGI AVFLRGFFPA PVRSSARAEH GAEPPAPEPS
60 70 80 90 100
AGASSNWTTL PPPLFSKVVI VLIDALRDDF VFGSKGVKFM PYTTYLVEKG
110 120 130 140 150
ASHSFVAEAK PPTVTMPRIK ALMTGSLPGF VDVIRNLNSP ALLEDSVIRQ
160 170 180 190 200
AKAAGKRIVF YGDETWVKLF PKHFVEYDGT TSFFVSDYTE VDNNVTRHLD
210 220 230 240 250
KVLKRGDWDI LILHYLGLDH IGHISGPNSP LIGQKLSEMD SVLMKIHTSL
260 270 280 290 300
QSKERETPLP NLLVLCGDHG MSETGSHGAS STEEVNTPLI LISSAFERKP
310 320 330 340 350
GDIRHPKHVQ QTDVAATLAI ALGLPIPKDS VGSLLFPVVE GRPMREQLRF
360 370 380 390 400
LHLNTVQLSK LLQENVPSYE KDPGFEQFKM SERLHGNWIR LYLEEKHSEV
410 420 430 440 450
LFNLGSKVLR QYLDALKTLS LSLSAQVAQY DIYSMMVGTV VVLEVLTLLL
460 470 480 490 500
LSVPQALRRK AELEVPLSSP GFSLLFYLVI LVLSAVHVIV CTSAESSCYF
510 520 530 540 550
CGLSWLAAGG VMVLASALLC VIVSVLTNVL VGGNTPRKNP MHPSSRWSEL
560 570 580 590 600
DLLILLGTAG HVLSLGASSF VEEEHQTWYF LVNTLCLALS QETYRNYFLG
610 620 630 640 650
DDGEPPCGLC VEQGHDGATA AWQDGPGCDV LERDKGHGSP STSEVLRGRE
660 670 680 690 700
KWMVLASPWL ILACCRLLRS LNQTGVQWAH RPDLGHWLTS SDHKAELSVL
710 720 730 740 750
AALSLLVVFV LVQRGCSPVS KAALALGLLG VYCYRAAIGS VRFPWRPDSK
760 770 780 790 800
DISKGIIEAR FVYVFVLGIL FTGTKDLLKS QVIAADFKLK TVGLWEIYSG
810 820 830 840 850
LVLLAALLFR PHNLPVLAFS LLIQTLMTKF IWKPLRHDAA EITVMHYWFG
860 870 880 890 900
QAFFYFQGNS NNIATVDISA GFVGLDTYVE IPAVLLTAFG TYAGPVLWAS
910 920 930 940 950
HLVHFLSSET RSGSALSHAC FCYALICSIP VFTYIVLVTS LRYHLFIWSV
960 970 980
FSPKLLYEGM HLLITAAVCV FFTAMDQTRL TQS
Length: 983
Mass (Da): 108,173
Last modified: March 1, 2005 - v1
Checksum: 18D5DF737B000D2D

Isoform 2 (identifier: Q5H8A4-2)

The sequence of this isoform differs from the canonical sequence as follows:
     437-444: Missing.

Note: No experimental confirmation available.
Length: 975
Mass (Da): 107,376
Checksum: 1D597136E243863E

Isoform 3 (identifier: Q5H8A4-3)

The sequence of this isoform differs from the canonical sequence as follows:
     120-252: Missing.

Note: No experimental confirmation available.
Length: 850
Mass (Da): 93,251
Checksum: 731AF8C6140D36C0

Isoform 4 (identifier: Q5H8A4-4)

The sequence of this isoform differs from the canonical sequence as follows:
     430-463: YDIYSMMVGTVVVLEVLTLLLLSVPQALRRKAEL → FSPCSCSASHRHCTERLSWKSHCHLLGFLCSFIW
     464-983: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay.
Length: 463
Mass (Da): 51,267
Checksum: 02EDEEBFE96AC294

Isoform 5 (identifier: Q5H8A4-5)

The sequence of this isoform differs from the canonical sequence as follows:
     1-89: Missing.
     372-388: DPGFEQFKMSERLHGNW → GSHPAPAQRPTGTAQKG
     389-983: Missing.

Note: No experimental confirmation available.
Length: 299
Mass (Da): 32,913
Checksum: 73D96DB53EA11006

Isoform 6 (identifier: Q5H8A4-6)

The sequence of this isoform differs from the canonical sequence as follows:
     1-122: Missing.
     430-463: YDIYSMMVGTVVVLEVLTLLLLSVPQALRRKAEL → FSPCSCSASHRHCAERLSWKSHCHLLGFLCSFIW
     464-983: Missing.

Note: No experimental confirmation available.
Length: 341
Mass (Da): 38,290
Checksum: 9DC3B4F0B1EC6633

Sequence caution

The sequence AAQ88902 differs from that shown. Reason: Erroneous termination at position 311. Translated as Gln. (Curated)
The sequence BAA91046 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
The sequence BAB55130 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)

Experimental Info

Feature key         Position(s)   Length   Description
Sequence conflict   624           1        D → G in BAC11227 (PubMed:16303743). (Curated)
Sequence conflict   889           1        F → L in BAC11227 (PubMed:16303743). (Curated)

Natural variant

Feature key       Position(s)   Length   Feature ID   Description
Natural variant   55            1        VAR_057680   S → Y. Corresponds to variant rs34120878 (dbSNP, Ensembl).
Natural variant   458           1        VAR_027022   R → H. (1 publication) Corresponds to variant rs13115344 (dbSNP, Ensembl).
Natural variant   610           1        VAR_027023   C → R. (1 publication) Corresponds to variant rs7666425 (dbSNP, Ensembl).
Natural variant   669           1        VAR_076775   R → C in MRT53; found in a compound heterozygote that also carries a microdeletion encompassing PIGG; almost complete loss of ethanolamine phosphate transferase activity, as evidenced by abnormal accumulation of the GPI precursors H7 and H7' and absence of mature GPI precursor H8 in patient lymphocytes; does not affect protein expression levels in transfected HEK293 cells. (1 publication) Corresponds to variant rs372392424 (dbSNP, Ensembl).
Natural variant   699           1        VAR_027024   V → I. (2 publications) Corresponds to variant rs13114026 (dbSNP, Ensembl).
Natural variant   731           1        VAR_060086   V → I. Corresponds to variant rs34916638 (dbSNP, Ensembl).
Natural variant   881           1        VAR_060087   I → T. Corresponds to variant rs34623004 (dbSNP, Ensembl).
Natural variant   932           1        VAR_027025   F → S. Corresponds to variant rs1127410 (dbSNP, Ensembl).

Alternative sequence

Feature key            Position(s)   Length   Feature ID   Description
Alternative sequence   1 – 122       122      VSP_054387   Missing in isoform 6. (1 publication)
Alternative sequence   1 – 89        89       VSP_019827   Missing in isoform 5. (1 publication)
Alternative sequence   120 – 252     133      VSP_019828   Missing in isoform 3. (1 publication)
Alternative sequence   372 – 388     17       VSP_019829   DPGFE…LHGNW → GSHPAPAQRPTGTAQKG in isoform 5. (1 publication)
Alternative sequence   389 – 983     595      VSP_019830   Missing in isoform 5. (1 publication)
Alternative sequence   430 – 463     34       VSP_019831   YDIYS…RKAEL → FSPCSCSASHRHCTERLSWKSHCHLLGFLCSFIW in isoform 4. (1 publication)
Alternative sequence   430 – 463     34       VSP_054388   YDIYS…RKAEL → FSPCSCSASHRHCAERLSWKSHCHLLGFLCSFIW in isoform 6. (1 publication)
Alternative sequence   437 – 444     8        VSP_019832   Missing in isoform 2. (1 publication)
Alternative sequence   464 – 983     520      VSP_019833   Missing in isoform 4 and isoform 6. (2 publications)
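
The isoform lengths reported in this entry can be cross-checked against the alternative sequence features: start from the canonical length of 983, subtract every span that is missing or replaced, and add back the length of any replacement. The sketch below does only that arithmetic. `ISOFORM_EVENTS` and `isoform_length` are illustrative names, the coordinates and replacement strings are copied from the entry, and the code assumes that the events listed for one isoform do not overlap.

```python
CANONICAL_LENGTH = 983  # length of isoform 1 (Q5H8A4-1)

# Per isoform: (start, end, replacement), 1-based inclusive coordinates on the
# canonical sequence. An empty replacement means the span is "Missing".
ISOFORM_EVENTS = {
    2: [(437, 444, "")],
    3: [(120, 252, "")],
    4: [(430, 463, "FSPCSCSASHRHCTERLSWKSHCHLLGFLCSFIW"), (464, 983, "")],
    5: [(1, 89, ""), (372, 388, "GSHPAPAQRPTGTAQKG"), (389, 983, "")],
    6: [(1, 122, ""), (430, 463, "FSPCSCSASHRHCAERLSWKSHCHLLGFLCSFIW"), (464, 983, "")],
}


def isoform_length(canonical_length, events):
    """Canonical length minus removed spans plus inserted replacement residues."""
    length = canonical_length
    for start, end, replacement in events:
        length -= end - start + 1   # residues removed from the canonical sequence
        length += len(replacement)  # residues inserted in their place
    return length


isoform_lengths = {
    number: isoform_length(CANONICAL_LENGTH, events)
    for number, events in ISOFORM_EVENTS.items()
}
print(isoform_lengths)
```

The computed values (975, 850, 463, 299 and 341 for isoforms 2 to 6) match the Length fields listed for each isoform above.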

Sequence databases

AB162713 mRNA. Translation: BAD89023.1.
AY358538 mRNA. Translation: AAQ88902.1. Sequence problems.
AK074815 mRNA. Translation: BAC11227.1.
AK000272 mRNA. Translation: BAA91046.1. Different initiation.
AK027465 mRNA. Translation: BAB55130.1. Different initiation.
AK097244 mRNA. Translation: BAC04984.1.
AK296507 mRNA. Translation: BAG59139.1.
AC092574 Genomic DNA. No translation available.
AC116565 Genomic DNA. No translation available.
BC000937 mRNA. Translation: AAH00937.2.
BC001249 mRNA. Translation: AAH01249.2.
BC110878 mRNA. Translation: AAI10879.1.
CCDS: CCDS3336.1. [Q5H8A4-2]
CCDS: CCDS46992.1. [Q5H8A4-1]
CCDS: CCDS75080.1. [Q5H8A4-3]
CCDS: CCDS75083.1. [Q5H8A4-5]
RefSeq: NP_001120650.1. NM_001127178.2. [Q5H8A4-1]
RefSeq: NP_001275980.1. NM_001289051.1.
RefSeq: NP_001275981.1. NM_001289052.1. [Q5H8A4-3]
RefSeq: NP_001275984.1. NM_001289055.1. [Q5H8A4-6]
RefSeq: NP_001275986.1. NM_001289057.1. [Q5H8A4-5]
RefSeq: NP_060203.3. NM_017733.4. [Q5H8A4-2]
UniGene: Hs.7099.

Genome annotation databases

Ensembl: ENST00000310340; ENSP00000311750; ENSG00000174227. [Q5H8A4-2]
Ensembl: ENST00000383028; ENSP00000372494; ENSG00000174227. [Q5H8A4-3]
Ensembl: ENST00000453061; ENSP00000415203; ENSG00000174227. [Q5H8A4-1]
Ensembl: ENST00000503111; ENSP00000426002; ENSG00000174227. [Q5H8A4-5]
GeneID: 54872.
KEGG: hsa:54872.
UCSC: uc003gaj.6. human. [Q5H8A4-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Organism-specific databases

CTD: 54872.
DisGeNET: 54872.
GeneCards: PIGG.
HGNC: HGNC:25985. PIGG.
HPA: HPA015997.
MIM: 616917. phenotype.
MIM: 616918. gene.
neXtProt: NX_Q5H8A4.
OpenTargets: ENSG00000174227.
PharmGKB: PA143485575.

Miscellaneous databases

ChiTaRS: PIGG. human.
GenomeRNAi: 54872.
PRO: Q5H8A4.

Entry information

Entry name: PIGG_HUMAN
Accession: Primary (citable) accession number: Q5H8A4
Secondary accession number(s): B4DKC7, Q2TAK5, Q6UX31, Q7L5Y4, Q8N866, Q8NCC9, Q96SY9, Q9BVT7, Q9NXG5
Entry history
Integrated into UniProtKB/Swiss-Prot: July 25, 2006
Last sequence update: March 1, 2005
Last modified: November 2, 2016
This is version 102 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 4
    Human chromosome 4: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PATHWAY comments
    Index of metabolic and biosynthesis pathways
  6. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.