Protein

GTPase-activating protein and VPS9 domain-containing protein 1

Gene

Gapvd1

Organism
Mus musculus (Mouse)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Acts both as a GTPase-activating protein (GAP) and as a guanine nucleotide exchange factor (GEF), and participates in various processes such as endocytosis, insulin receptor internalization and SLC2A4/GLUT4 trafficking. Acts as a GEF for the Ras-related protein RAB31 by exchanging bound GDP for free GTP, thereby regulating SLC2A4/GLUT4 trafficking. In the absence of insulin, it maintains RAB31 in an active state and promotes a futile cycle between SLC2A4/GLUT4 storage vesicles and early endosomes, retaining SLC2A4/GLUT4 inside the cells. Upon insulin stimulation, it is translocated to the plasma membrane, releasing SLC2A4/GLUT4 from intracellular storage vesicles. Also involved in EGFR trafficking and degradation, possibly by promoting EGFR ubiquitination and subsequent degradation by the proteasome. Has GEF activity for Rab5 and GAP activity for Ras. (3 publications)
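The GEF and GAP activities described in the Function text act on opposite halves of the small-GTPase switch: a GEF exchanges bound GDP for GTP (inactive to active), and a GAP stimulates hydrolysis of bound GTP back to GDP (active to inactive). A purely qualitative sketch of that switch follows; the `SmallGTPase` class and the `gef_exchange` and `gap_hydrolysis` functions are hypothetical illustrations, not a kinetic model and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class SmallGTPase:
    """Toy two-state model of a small GTPase (hypothetical, qualitative only)."""
    name: str
    bound: str = "GDP"  # "GDP" = inactive, "GTP" = active

    @property
    def active(self) -> bool:
        return self.bound == "GTP"


def gef_exchange(gtpase: SmallGTPase) -> None:
    """GEF activity: release bound GDP and load free GTP (inactive -> active)."""
    if gtpase.bound == "GDP":
        gtpase.bound = "GTP"


def gap_hydrolysis(gtpase: SmallGTPase) -> None:
    """GAP activity: stimulate hydrolysis of bound GTP to GDP (active -> inactive)."""
    if gtpase.bound == "GTP":
        gtpase.bound = "GDP"


# Per the Function text: GEF for RAB31 (and Rab5), GAP for Ras.
rab31 = SmallGTPase("RAB31")
ras = SmallGTPase("Ras", bound="GTP")
gef_exchange(rab31)
gap_hydrolysis(ras)
print(rab31.active, ras.active)  # True False
```

The model deliberately ignores rates, effectors and membrane localization; it only encodes which nucleotide state each activity moves the GTPase toward.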

GO - Molecular function

  • cadherin binding involved in cell-cell adhesion Source: MGI
  • GTPase activating protein binding Source: UniProtKB
  • GTPase activator activity Source: UniProtKB-KW
  • guanyl-nucleotide exchange factor activity Source: UniProtKB

GO - Biological process

  • endocytosis Source: UniProtKB-KW
  • regulation of protein transport Source: UniProtKB
  • signal transduction Source: InterPro

Keywords - Molecular function

GTPase activation, Guanine-nucleotide releasing factor

Keywords - Biological process

Endocytosis

Enzyme and pathway databases

Reactome: R-MMU-8856828. Clathrin-mediated endocytosis.
R-MMU-8876198. RAB GEFs exchange GTP for GDP on RABs.

Names & Taxonomy

Protein names
Recommended name:
GTPase-activating protein and VPS9 domain-containing protein 1
Alternative name(s):
GAPex-5
Rab5-activating protein 6
Gene names
Name: Gapvd1
Synonyms: Gapex5, Kiaa1521
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus
Proteomes
  • UP000000589, Component: Chromosome 2

Organism-specific databases

MGI: MGI:1913941. Gapvd1.

Subcellular location

GO - Cellular component

  • cell-cell adherens junction Source: MGI
  • cytosol Source: UniProtKB
  • endosome Source: UniProtKB-SubCell
  • membrane Source: UniProtKB-SubCell

Keywords - Cellular component

Endosome, Membrane

PTM / Processing

Molecule processing

Feature key | Position(s) | Description | Length
Chain (PRO_0000324772) | 1 – 1458 | GTPase-activating protein and VPS9 domain-containing protein 1 | 1458

Amino acid modifications

Feature key | Position(s) | Description | Evidence | Length
Modified residue | 227 | Phosphoserine | By similarity | 1
Modified residue | 390 | Phosphothreonine | By similarity | 1
Modified residue | 458 | Phosphothreonine | By similarity | 1
Modified residue | 466 | Phosphoserine | By similarity | 1
Modified residue | 470 | Phosphothreonine | By similarity | 1
Modified residue | 566 | Phosphoserine | Combined sources | 1
Modified residue | 569 | Phosphoserine | Combined sources | 1
Modified residue | 742 | Phosphoserine | Combined sources | 1
Modified residue | 746 | Phosphoserine | By similarity | 1
Modified residue | 757 | Phosphoserine | Combined sources | 1
Modified residue | 762 | Phosphothreonine | By similarity | 1
Modified residue | 766 | Phosphoserine | By similarity | 1
Modified residue | 876 | Phosphoserine | By similarity | 1
Modified residue | 902 | Phosphoserine | Combined sources | 1
Modified residue | 903 | Phosphoserine | By similarity | 1
Modified residue | 908 | Phosphoserine | Combined sources | 1
Modified residue | 964 | Phosphoserine | By similarity | 1
Modified residue | 1017 | Phosphoserine | By similarity | 1
Modified residue | 1044 | Phosphoserine | Combined sources | 1
Modified residue | 1076 | Phosphoserine | By similarity | 1
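Each modified-residue position refers to the canonical isoform (Q6PAR5-1), so the residue found at that position should match the modification type: S for phosphoserine, T for phosphothreonine. A minimal sketch of that consistency check, assuming 1-based positions as used throughout the entry; `check_sites` is a hypothetical helper, and to stay short the demo uses only canonical residues 451 to 470 (KSSSLDMTPY STPQMSPATT) rather than the full sequence.

```python
from __future__ import annotations

# Expected one-letter residue for each modification type in the table above.
EXPECTED = {"Phosphoserine": "S", "Phosphothreonine": "T"}


def check_sites(fragment: str, start: int, sites: dict[int, str]) -> dict[int, bool]:
    """Return, for each 1-based position, whether the residue matches its modification.

    `fragment` is a stretch of the canonical sequence whose first residue sits at
    1-based position `start`; positions outside the fragment raise ValueError.
    """
    results = {}
    for pos, mod in sites.items():
        idx = pos - start  # 1-based sequence position -> 0-based fragment index
        if not 0 <= idx < len(fragment):
            raise ValueError(f"position {pos} is outside the fragment")
        results[pos] = fragment[idx] == EXPECTED[mod]
    return results


# Canonical residues 451-470 and the three sites annotated in that window.
window = "KSSSLDMTPY" "STPQMSPATT"
print(check_sites(window, 451, {458: "Phosphothreonine",
                                466: "Phosphoserine",
                                470: "Phosphothreonine"}))
# {458: True, 466: True, 470: True}
```

Passing the whole canonical sequence with `start=1` would check every row of the table in one call.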

Keywords - PTM

Phosphoprotein

Proteomic databases

PaxDb: Q6PAR5.
PeptideAtlas: Q6PAR5.
PRIDE: Q6PAR5.

PTM databases

iPTMnet: Q6PAR5.
PhosphoSitePlus: Q6PAR5.

Expression

Tissue specificity

Present in adipocytes and fibroblasts (at protein level). Ubiquitously expressed. (1 publication)

Gene expression databases

Bgee: ENSMUSG00000026867.
CleanEx: MM_GAPVD1.
ExpressionAtlas: Q6PAR5. Baseline and differential.
Genevisible: Q6PAR5. MM.

Interaction

Subunit structure

Interacts with RAB5A (by similarity). Interacts with TRIP10/CIP4 (by similarity; 1 publication).

GO - Molecular function

  • cadherin binding involved in cell-cell adhesion Source: MGI
  • GTPase activating protein binding Source: UniProtKB

Protein-protein interaction databases

BioGrid: 211649. 1 interactor.
IntAct: Q6PAR5. 3 interactors.
MINT: MINT-4125168.
STRING: 10090.ENSMUSP00000028224.

Structure

3D structure databases

ProteinModelPortal: Q6PAR5.
SMR: Q6PAR5.

Family & Domains

Domains and Repeats

Feature key | Position(s) | Description | Evidence | Length
Domain | 131 – 353 | Ras-GAP | PROSITE-ProRule annotation | 223
Domain | 1318 – 1458 | VPS9 | PROSITE-ProRule annotation | 141
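The Length column of a positional feature is the inclusive span end − start + 1, so the Ras-GAP domain (131 to 353) covers 223 residues and the VPS9 domain (1318 to 1458) covers 141. A minimal sketch of that arithmetic plus a simple range-containment test; `feature_length` and `contained` are hypothetical helpers. The containment check shows that the VPS9 domain lies entirely within residues 380 to 1458, the region listed as missing in isoform 3 (VSP_032364), while the Ras-GAP domain does not.

```python
from __future__ import annotations


def feature_length(start: int, end: int) -> int:
    """Inclusive length of a feature spanning 1-based positions start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def contained(inner: tuple[int, int], outer: tuple[int, int]) -> bool:
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


RAS_GAP = (131, 353)
VPS9 = (1318, 1458)
ISOFORM3_DELETION = (380, 1458)  # VSP_032364, missing in isoform 3

print(feature_length(*RAS_GAP), feature_length(*VPS9))  # 223 141
print(contained(VPS9, ISOFORM3_DELETION),
      contained(RAS_GAP, ISOFORM3_DELETION))  # True False
```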

Sequence similarities

Belongs to the GAPVD1 family. (Curated)
Contains 1 Ras-GAP domain. (PROSITE-ProRule annotation)
Contains 1 VPS9 domain. (PROSITE-ProRule annotation)

Phylogenomic databases

eggNOG: ENOG410IQ83. Eukaryota.
ENOG410XRXX. LUCA.
GeneTree: ENSGT00530000063341.
HOVERGEN: HBG107936.
InParanoid: Q6PAR5.
OMA: CIINCIS.
OrthoDB: EOG091G00G1.
PhylomeDB: Q6PAR5.
TreeFam: TF105908.

Family and domain databases

Gene3D: 1.10.506.10. 1 hit.
InterPro: IPR001936. RasGAP_dom.
IPR008936. Rho_GTPase_activation_prot.
IPR003123. VPS9.
Pfam: PF00616. RasGAP. 1 hit.
PF02204. VPS9. 1 hit.
SMART: SM00167. VPS9. 1 hit.
SUPFAM: SSF48350. SSF48350. 1 hit.
PROSITE: PS50018. RAS_GTPASE_ACTIV_2. 1 hit.
PS51205. VPS9. 1 hit.

Sequences (6)

Sequence status: Complete.

This entry describes 6 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q6PAR5-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

   1 MVKLDIHTLA HHLKQERLYV SSEKQLIQRL NADVLKTAEK LYRTAWIAKQ
  51 QRINLDRLII TSAEASPAEC CQHAKILEDT QFVDGYKQLG FQETAYGEFL
 101 SRLRENPRLI ASSLVAGEKL NQENTQSVIY TVFTSLYGNC IMQEDESYLL
 151 QVLRYLIEFE LKESDNPRRL LRRGTCAFSI LFKLFSEGLF SAKLFLTATL
 201 HEPIMQLLVE DEDHLETDPN KLIERFSPAQ QEKLFGEKGS DRFRQKVQEM
 251 VDSNEAKLVA LVNKFIGYLK QNTYCFPHSL RWIVSQMYKT LSCVDRLEVG
 301 EVRAMCTDLL LACFICPAVV NPEQYGIISD APINEVARFN LMQVGRLLQQ
 351 LAMTGTEEGD PRTKNSLGKF DKGCVAAFLD VVIGGRAVET PPMSSVNLLE
 401 GLSRTVVYIS YSQLITLVNF MKSVMSGDQL KEDRMALDNL LANLPQAKPG
 451 KSSSLDMTPY STPQMSPATT PANKKNRLPI ATRSRSRSNM LMDLHMDHEG
 501 SSQETIQEVQ PEEVLVISLG TGPQLTPGMM SENEVLNMQL SDGGQGDVPV
 551 DENKLHGKPD KTLRFSLCSD NLEGISEGPS NRSNSVSSLD LEGESVSELG
 601 AGPSGSNGVE ALQLLEHEQA TTQDNLDDKL RKFEIRDMMG LTDDRDISET
 651 VSETWSTDVL GSDFDPNVDE DRLQEIAGAA AENVLGSLLC LPGSGSVLLD
 701 PCTGSTISET TSEAWSVEVL PSDSEAPDLK QEERLQELES CSGLGSTSDD
 751 TDVREVSSRP STPGLSVVSG ISATSEDIPN KIEDLRSECS SDFGGKDSVT
 801 SPDMDDIAHG AHQLTSPPSQ SESLLAMFDP LSSHEGASAV VRPKVHYARP
 851 SHPPPDPPIL EGAVGGNEAR LPNFGSHVLT AAEMEAFKQR HSYPERLVRS
 901 RSSDIVSSVR RPMSDPSWNR RPGNEELPPA AATGATSLVA APHSSSSSPS
 951 KDSSRGETEE RKDSDDERSD RSRPWWRKRF VSAMPKAPIP FRKKEKQEKD
1001 KDDLGPDRFS TLTDEPSPRL SAQAQVAEDI LDKYRNAIKR TSPSEGAMAN
1051 DESAEVMGDG ESAHDSPREE ALQNISADDL PDSASQAAHP QDSAFSYRDV
1101 KKKLRLALCS ADSVAFPVLT HSTRNGLPDH TDPEDNEIVC FLKVQIAEAI
1151 NLQDKSLMAQ LQETMRCVCR FDNRTCRKLL ASIAEDYRKR APYIAYLTRC
1201 RQGLQTTQAH LERLLQRVLR DKEVANRYFT TVCVRLLLES KEKKIREFIQ
1251 DFQKLTAADD KTAQVEDFLQ FLYGVMAQDV IWQNASEEQL QDAQLAIERS
1301 VMNRIFKLAF YPNQDGDILR DQVLHEHIQR LSKVVTANHR ALQIPEVYLR
1351 EAPWPSAQSE IRTISAYKTP RDKVQCILRM CSTIMNLLSL ANEDSVPGAD
1401 DFVPVLVFVL IKANPPCLLS TVQYISSFYA SCLSGEESYW WMQFTAAVEF
1451 IKTIDDRK
Length: 1,458
Mass (Da): 162,402
Last modified: March 18, 2008 - v2
Checksum: CA3D2EAF22051FD7

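The canonical sequence above is shown in blocks of 10 residues, 50 residues per line. A minimal sketch that produces such a layout, numbering each line by the 1-based position of its first residue; that numbering convention, the right-aligned padding and the `format_sequence` helper are assumptions of this sketch, not a UniProt specification.

```python
from __future__ import annotations


def format_sequence(seq: str, block: int = 10, per_line: int = 50) -> list[str]:
    """Split a sequence into numbered lines of space-separated residue blocks."""
    if per_line % block:
        raise ValueError("per_line must be a multiple of block")
    width = len(str(len(seq)))  # pad positions to the width of the largest one
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{start + 1:>{width}} " + " ".join(blocks))
    return lines


# First 60 residues of the canonical sequence (Q6PAR5-1).
head = ("MVKLDIHTLA" "HHLKQERLYV" "SSEKQLIQRL"
        "NADVLKTAEK" "LYRTAWIAKQ" "QRINLDRLII")
for line in format_sequence(head):
    print(line)
#  1 MVKLDIHTLA HHLKQERLYV SSEKQLIQRL NADVLKTAEK LYRTAWIAKQ
# 51 QRINLDRLII
```

For the full 1,458-residue sequence the positions are padded to four characters, and the last line starts at residue 1451.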
Isoform 2 (identifier: Q6PAR5-2)

The sequence of this isoform differs from the canonical sequence as follows:
     557-577: Missing.

Length: 1,437
Mass (Da): 160,111
Checksum: 99CB3966DC97D605

Isoform 3 (identifier: Q6PAR5-3)

The sequence of this isoform differs from the canonical sequence as follows:
     373-379: GCVAAFL → VGMSVVS
     380-1458: Missing.

Length: 379
Mass (Da): 43,381
Checksum: 7123D700EE30972C

Isoform 4 (identifier: Q6PAR5-4)

The sequence of this isoform differs from the canonical sequence as follows:
     480-480: I → IGQQLAAITAWDSSATNLTAHIPLVTPF
     557-577: Missing.

Length: 1,464
Mass (Da): 162,917
Checksum: C34EEFDA11081BA7

Isoform 5 (identifier: Q6PAR5-5)

The sequence of this isoform differs from the canonical sequence as follows:
     557-577: Missing.
     1202-1202: Missing.

Length: 1,436
Mass (Da): 159,982
Checksum: 4367B45EA4C2D537

Isoform 6 (identifier: Q6PAR5-6)

The sequence of this isoform differs from the canonical sequence as follows:
     1055-1055: Missing.

Length: 1,457
Mass (Da): 162,273
Checksum: 67DC7B5E4E46334D
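Each isoform above lists a 64-bit Checksum of its sequence, which UniProt computes as a CRC64. The sketch below assumes the CRC-64 variant used by the SWISS-PROT tools and by Biopython's `Bio.SeqUtils.CheckSum.crc64`: the ISO 3309 polynomial in bit-reflected form (0xD800000000000000), initial value 0 and no final XOR. `crc64_hex` is a hypothetical name; compare its output with the listed Checksum values before relying on it.

```python
from __future__ import annotations

POLY = 0xD800000000000000  # ISO 3309 CRC-64 polynomial, bit-reflected


def _make_table() -> list[int]:
    """Precompute the 256-entry lookup table for a reflected (LSB-first) CRC-64."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # Shift right; XOR in the polynomial when a 1 bit is shifted out.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc64_hex(seq: str) -> str:
    """CRC64 of an amino-acid sequence as 16 upper-case hex digits (init 0, no final XOR)."""
    crc = 0
    for byte in seq.encode("ascii"):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


print(crc64_hex("MVKLDIHTLA"))  # 16 hex digits for this 10-residue fragment
```

Running `crc64_hex` on the full canonical sequence, with no whitespace or position numbers, is the comparison to make against CA3D2EAF22051FD7.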

Sequence caution

The sequence AAH31478 differs from that shown. Reason: Contaminating sequence. Potential poly-A sequence. (Curated)
The sequence AAH43715 differs from that shown. Reason: Erroneous initiation. (Curated)
The sequence AAH48847 differs from that shown. Reason: Contaminating sequence. Potential poly-A sequence. (Curated)
The sequence AAH57164 differs from that shown. Reason: Erroneous initiation. (Curated)
The sequence BAB29377 differs from that shown. Reason: Erroneous initiation. (Curated)
The sequence BAC98191 differs from that shown. Reason: Erroneous initiation. (Curated)
The sequence BAE22277 differs from that shown. Reason: Erroneous initiation. (Curated)
The sequence BAE29251 differs from that shown. Reason: Frameshift at position 811. (Curated)
The sequence CAM15445 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAM15446 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAM15455 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAM15456 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAM24604 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence CAM24605 differs from that shown. Reason: Erroneous gene model prediction. (Curated)

Experimental Info

Feature key | Position(s) | Description | Evidence | Length
Sequence conflict | 581 | N → D in BAE29251 (PubMed:16141072) | Curated | 1
Sequence conflict | 661 | G → E in AAH31478 (PubMed:15489334) | Curated | 1
Sequence conflict | 893 | Y → H in BAE22277 (PubMed:16141072) | Curated | 1
Sequence conflict | 1257 | A → V in BAC98191 (PubMed:14621295) | Curated | 1

Alternative sequence

Feature key | Position(s) | Description | Evidence | Length
Alternative sequence (VSP_032363) | 373 – 379 | GCVAAFL → VGMSVVS in isoform 3 | 1 publication | 7
Alternative sequence (VSP_032364) | 380 – 1458 | Missing in isoform 3 | 1 publication | 1079
Alternative sequence (VSP_032365) | 480 | I → IGQQLAAITAWDSSATNLTAHIPLVTPF in isoform 4 | 1 publication | 1
Alternative sequence (VSP_032366) | 557 – 577 | Missing in isoform 2, isoform 4 and isoform 5 | 4 publications | 21
Alternative sequence (VSP_032367) | 1055 | Missing in isoform 6 | 1 publication | 1
Alternative sequence (VSP_032368) | 1202 | Missing in isoform 5 | 1 publication | 1
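The isoform lengths listed earlier follow from applying these alternative-sequence features to the 1,458-residue canonical sequence: each feature removes the residues in its range and inserts its replacement, which is empty for "Missing". A minimal sketch of that arithmetic; the `VSP` and `ISOFORMS` dictionaries and the `isoform_length` helper are hypothetical, and only lengths, not actual sequences, are modelled.

```python
from __future__ import annotations

CANONICAL_LENGTH = 1458

# (start, end, replacement length) per alternative-sequence feature;
# "Missing" means a replacement length of 0.
VSP = {
    "VSP_032363": (373, 379, len("VGMSVVS")),
    "VSP_032364": (380, 1458, 0),
    "VSP_032365": (480, 480, len("IGQQLAAITAWDSSATNLTAHIPLVTPF")),
    "VSP_032366": (557, 577, 0),
    "VSP_032367": (1055, 1055, 0),
    "VSP_032368": (1202, 1202, 0),
}

# Which features each non-canonical isoform carries (from the Description column).
ISOFORMS = {
    "Q6PAR5-2": ["VSP_032366"],
    "Q6PAR5-3": ["VSP_032363", "VSP_032364"],
    "Q6PAR5-4": ["VSP_032365", "VSP_032366"],
    "Q6PAR5-5": ["VSP_032366", "VSP_032368"],
    "Q6PAR5-6": ["VSP_032367"],
}


def isoform_length(vsp_ids: list[str]) -> int:
    """Canonical length plus the net change (inserted - removed) of each feature."""
    length = CANONICAL_LENGTH
    for vsp in vsp_ids:
        start, end, new_len = VSP[vsp]
        length += new_len - (end - start + 1)
    return length


for iso, ids in ISOFORMS.items():
    print(iso, isoform_length(ids))
# Q6PAR5-2 1437
# Q6PAR5-3 379
# Q6PAR5-4 1464
# Q6PAR5-5 1436
# Q6PAR5-6 1457
```

Because each feature's net change depends only on its own range, the order in which features are applied does not affect the total length.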

Sequence databases

EMBL / GenBank / DDBJ
EF155419 mRNA. Translation: ABM68541.1.
AK129381 Transcribed RNA. Translation: BAC98191.1. Different initiation.
AK014474 mRNA. Translation: BAB29377.2. Different initiation.
AK088851 mRNA. Translation: BAC40613.1.
AK003521 mRNA. Translation: BAB22834.1.
AK134776 mRNA. Translation: BAE22277.1. Different initiation.
AK150026 mRNA. Translation: BAE29251.1. Frameshift.
AK165047 mRNA. Translation: BAE38017.1.
AL845262, AL929106 Genomic DNA. Translation: CAM15445.1. Sequence problems.
AL845262, AL929106 Genomic DNA. Translation: CAM15446.1. Sequence problems.
AL845262, AL929106 Genomic DNA. Translation: CAM15447.1.
AL845262 Genomic DNA. Translation: CAM15455.1. Sequence problems.
AL845262 Genomic DNA. Translation: CAM15456.1. Sequence problems.
AL929106, AL845262 Genomic DNA. Translation: CAM24604.1. Sequence problems.
AL929106, AL845262 Genomic DNA. Translation: CAM24605.1. Sequence problems.
AL929106, AL845262 Genomic DNA. Translation: CAM24606.1.
BC031478 mRNA. Translation: AAH31478.1. Sequence problems.
BC043715 mRNA. Translation: AAH43715.1. Different initiation.
BC048847 mRNA. Translation: AAH48847.1. Sequence problems.
BC057164 mRNA. Translation: AAH57164.1. Different initiation.
BC060123 mRNA. Translation: AAH60123.1.
CCDS: CCDS15949.1. [Q6PAR5-2]
RefSeq: NP_079985.2. NM_025709.2. [Q6PAR5-2]
XP_011237449.1. XM_011239147.1. [Q6PAR5-4]
XP_011237450.1. XM_011239148.2. [Q6PAR5-1]
UniGene: Mm.156452.
Mm.393397.

Genome annotation databases

Ensembl: ENSMUST00000028224; ENSMUSP00000028224; ENSMUSG00000026867. [Q6PAR5-2]
ENSMUST00000102800; ENSMUSP00000099864; ENSMUSG00000026867. [Q6PAR5-2]
ENSMUST00000113099; ENSMUSP00000108723; ENSMUSG00000026867. [Q6PAR5-1]
GeneID: 66691.
KEGG: mmu:66691.
UCSC: uc008jiq.2. mouse. [Q6PAR5-2]
uc008jir.1. mouse. [Q6PAR5-3]
uc012bug.1. mouse. [Q6PAR5-4]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 26130.
MGI: MGI:1913941. Gapvd1.

Miscellaneous databases

PRO: Q6PAR5.


Entry information

Entry name: GAPD1_MOUSE
Accession
Primary (citable) accession number: Q6PAR5
Secondary accession number(s): A0PJI8, A2AR09, A2AR10, A2AR17, A2AR18, Q3TNS1, Q3UDL0, Q3UYD5, Q6ZPP0, Q80V37, Q80ZK4, Q8BTS5, Q9CRS2, Q9CTI1
Entry history
Integrated into UniProtKB/Swiss-Prot: March 18, 2008
Last sequence update: March 18, 2008
Last modified: November 30, 2016
This is version 107 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
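The UniRef90 and UniRef50 criteria combine two thresholds, both measured against the cluster's seed (longest) sequence: a minimum sequence identity and at least 80% overlap. A simplified sketch of that membership test, assuming identity and overlap have already been computed from an alignment and that "overlap" means the fraction of the seed covered by the alignment; `uniref_member` is a hypothetical helper, and the real UniRef pipeline (alignment method, tie-breaking between clusters) is not modelled.

```python
THRESHOLDS = {"UniRef90": 0.90, "UniRef50": 0.50}  # minimum identity to the seed
MIN_OVERLAP = 0.80  # minimum overlap with the seed, at both levels


def uniref_member(level: str, identity: float, overlap: float) -> bool:
    """Whether a sequence meets a seed's cluster thresholds at the given level.

    identity: fraction of identical aligned residues with the seed (0 to 1).
    overlap: fraction of the seed covered by the alignment (0 to 1); this
             interpretation of "overlap" is an assumption of the sketch.
    """
    return identity >= THRESHOLDS[level] and overlap >= MIN_OVERLAP


print(uniref_member("UniRef90", 0.93, 0.85))  # True
print(uniref_member("UniRef90", 0.93, 0.70))  # False: overlap too low
print(uniref_member("UniRef50", 0.62, 0.90))  # True
```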