Protein

Rho guanine nucleotide exchange factor 1

Gene

ARHGEF1

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Seems to play a role in the regulation of RhoA GTPase by guanine nucleotide-binding alpha-12 (GNA12) and alpha-13 (GNA13) subunits. Acts as a GTPase-activating protein (GAP) for GNA12 and GNA13, and as a guanine nucleotide exchange factor (GEF) for RhoA GTPase. Activated G alpha 13/GNA13 stimulates the RhoGEF activity through interaction with the RGS-like domain. This GEF activity is inhibited by binding to activated GNA12. Mediates angiotensin-2-induced RhoA activation. (4 publications)

GO - Molecular function

GO - Biological process

  • cell proliferation Source: ProtInc
  • negative regulation of axonogenesis Source: Reactome
  • regulation of Rho protein signal transduction Source: InterPro
  • Rho protein signal transduction Source: ProtInc

Keywords - Molecular function

GTPase activation, Guanine-nucleotide releasing factor

Enzyme and pathway databases

Reactome: R-HSA-193634. Axonal growth inhibition (RHOA activation).
Reactome: R-HSA-193648. NRAGE signals death through JNK.
Reactome: R-HSA-194840. Rho GTPase cycle.
Reactome: R-HSA-416482. G alpha (12/13) signalling events.
SIGNOR: Q92888.

Names & Taxonomy

Protein names
Recommended name: Rho guanine nucleotide exchange factor 1
Alternative name(s):
115 kDa guanine nucleotide exchange factor
Short name: p115-RhoGEF
Short name: p115RhoGEF
Sub1.5
Gene names
Name: ARHGEF1
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 19

Organism-specific databases

HGNC: HGNC:681. ARHGEF1.

Subcellular location

  • Cytoplasm (1 publication)
  • Membrane (1 publication)

Note: Translocated to the membrane by activated GNA13 or LPA stimulation.

GO - Cellular component

Keywords - Cellular component

Cytoplasm, Membrane

Pathology & Biotech

Mutagenesis

Feature key   Position(s)   Length   Description
Mutagenesis   487           1        Y → F: No effect. (1 publication)
Mutagenesis   738           1        Y → F: Lowers the exchange activity. (1 publication)

Organism-specific databases

DisGeNET: 9138.
OpenTargets: ENSG00000076928.
PharmGKB: PA24966.

Polymorphism and mutation databases

BioMuta: ARHGEF1.
DMDM: 34395524.

PTM / Processing

Molecule processing

Feature key   Identifier       Position(s)   Length   Description
Chain         PRO_0000080906   1 – 912       912      Rho guanine nucleotide exchange factor 1

Amino acid modifications

Feature key        Position(s)   Length   Description
Modified residue   374           1        Phosphoserine (combined sources)
Modified residue   409           1        Phosphoserine (combined sources)
Modified residue   695           1        Phosphothreonine (combined sources)
Modified residue   738           1        Phosphotyrosine; by JAK2 (1 publication)
Modified residue   863           1        Phosphoserine (combined sources)

Post-translational modification

Phosphorylated by PKCA. Angiotensin-2-induced Tyr-738 phosphorylation is mediated by JAK2. (2 publications)

Keywords - PTM

Phosphoprotein

Proteomic databases

EPD: Q92888.
MaxQB: Q92888.
PaxDb: Q92888.
PeptideAtlas: Q92888.
PRIDE: Q92888.

PTM databases

iPTMnet: Q92888.
PhosphoSitePlus: Q92888.

Miscellaneous databases

PMAP-CutDB: Q92888.

Expression

Tissue specificity

Ubiquitously expressed. (2 publications)

Gene expression databases

Bgee: ENSG00000076928.
CleanEx: HS_ARHGEF1.
ExpressionAtlas: Q92888. baseline and differential.
Genevisible: Q92888. HS.

Organism-specific databases

HPA: CAB009502.
HPA: HPA012924.
HPA: HPA060784.

Interaction

Subunit structure

Interacts with RHOA, GNA12 and GNA13. Homooligomerizes through the coiled coil region. May interact with CCPG1 (By similarity). Interacts with CTNNAL1. (By similarity; 3 publications)

Binary interactions

With    Entry    #Exp.   IntAct                    Notes
GNA13   Q14344   3       EBI-465400,EBI-465387
TCF4    P15884   3       EBI-465400,EBI-533224
Z5115   Q7DB74   7       EBI-465400,EBI-7864788    From a different organism.

Protein-protein interaction databases

BioGrid: 114585. 30 interactors.
IntAct: Q92888. 13 interactors.
MINT: MINT-2813461.
STRING: 9606.ENSP00000337261.

Structure

Secondary structure

All features in this table are annotated from combined sources.

Feature key   Position(s)   Length
Beta strand   18 – 20       3
Turn          26 – 33       8
Helix         49 – 52       4
Helix         56 – 69       14
Helix         73 – 84       12
Helix         89 – 103      15
Helix         116 – 122     7
Turn          127 – 129     3
Helix         132 – 144     13
Helix         147 – 162     16
Helix         169 – 176     8
Helix         183 – 203     21
Helix         205 – 207     3
Helix         212 – 228     17
Turn          399 – 401     3
Helix         404 – 407     4
Helix         413 – 441     29
Helix         443 – 449     7
Helix         454 – 460     7
Helix         464 – 484     21
Helix         493 – 500     8
Helix         502 – 517     16
Helix         519 – 532     14
Helix         534 – 544     11
Helix         547 – 549     3
Helix         554 – 557     4
Helix         560 – 577     18
Helix         582 – 621     40
Helix         625 – 629     5
Helix         633 – 635     3
Beta strand   636 – 638     3
Beta strand   644 – 646     3
Beta strand   648 – 661     14
Beta strand   663 – 681     19
Beta strand   684 – 686     3
Beta strand   694 – 697     4
Beta strand   706 – 709     4
Helix         710 – 712     3
Beta strand   713 – 717     5
Beta strand   719 – 721     3
Beta strand   724 – 729     6
Beta strand   737 – 741     5
Helix         745 – 760     16

3D structure databases

PDB entry   Method   Resolution (Å)   Chain   Positions
1IAP        X-ray    1.90             A       42-252
1SHZ        X-ray    2.85             C/F     7-239
3AB3        X-ray    2.40             B/D     1-233
3ODO        X-ray    2.90             A/B     395-766
3ODW        X-ray    3.20             A/B     240-766
3ODX        X-ray    3.20             A/B     353-766
3P6A        X-ray    2.50             A/B     395-766

ProteinModelPortal: Q92888.
SMR: Q92888.
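
The position ranges in the PDB table can be combined to estimate how much of the 912-residue canonical sequence has a crystal structure. The following is a minimal sketch, not part of the UniProt entry: the names `PDB_RANGES`, `merge_ranges` and `entries_covering` are hypothetical, and it assumes the "Positions" column gives the first and last residue of each crystallized construct (residues that are disordered within a construct are not accounted for).

```python
# PDB entry -> (first, last) residue on the canonical sequence,
# copied from the "Positions" column of the 3D structure table.
PDB_RANGES = {
    "1IAP": (42, 252),
    "1SHZ": (7, 239),
    "3AB3": (1, 233),
    "3ODO": (395, 766),
    "3ODW": (240, 766),
    "3ODX": (353, 766),
    "3P6A": (395, 766),
}
SEQ_LENGTH = 912  # length of the canonical isoform


def merge_ranges(ranges):
    """Merge overlapping or directly adjacent inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def entries_covering(position):
    """Return the PDB entries whose construct includes the given residue."""
    return sorted(
        pdb for pdb, (start, end) in PDB_RANGES.items() if start <= position <= end
    )


covered = merge_ranges(PDB_RANGES.values())
n_covered = sum(end - start + 1 for start, end in covered)

print(covered)                # merged construct ranges
print(n_covered, SEQ_LENGTH)  # residues within any construct vs. total
print(entries_covering(738))  # structures containing Tyr-738
```

Under these assumptions the seven constructs merge into a single span from residue 1 to 766, so the C-terminal region, including the coiled coil, lies outside every listed structure.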

Miscellaneous databases

EvolutionaryTrace: Q92888.

Family & Domains

Domains and Repeats

Feature key   Position(s)   Length   Description
Domain        41 – 232      192      RGSL
Domain        416 – 605     190      DH (PROSITE-ProRule annotation)
Domain        647 – 760     114      PH (PROSITE-ProRule annotation)

Coiled coil

Feature key   Position(s)   Length   Description
Coiled coil   865 – 896     32       Sequence analysis
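
Because every positional annotation in this entry refers to the canonical sequence, the domain boundaries can be used to place other features, such as the mutagenesis sites and modified residues, within the domain architecture. A minimal sketch follows; `REGIONS` and `region_of` are illustrative names rather than anything provided by UniProt, and the boundaries are copied from the Domains and Coiled coil tables.

```python
# Inclusive (start, end) boundaries on the canonical sequence,
# copied from the Domains and Coiled coil tables of this entry.
REGIONS = {
    "RGSL": (41, 232),
    "DH": (416, 605),
    "PH": (647, 760),
    "Coiled coil": (865, 896),
}

# Positions of interest taken from the Mutagenesis and
# Amino acid modifications tables.
SITES = {
    "Mutagenesis Y487F": 487,
    "Phosphoserine": 374,
    "Phosphothreonine": 695,
    "Phosphotyrosine (by JAK2)": 738,
    "Phosphoserine (C-terminal)": 863,
}


def region_of(position):
    """Return the name of the annotated region containing position, or None."""
    for name, (start, end) in REGIONS.items():
        if start <= position <= end:
            return name
    return None


# Region lengths follow from the inclusive boundaries: end - start + 1.
lengths = {name: end - start + 1 for name, (start, end) in REGIONS.items()}

for label, pos in SITES.items():
    print(f"{label} at {pos}: {region_of(pos) or 'outside annotated regions'}")
```

The computed lengths agree with the Length column (192, 190, 114 and 32), and the lookup places Tyr-738 in the PH domain and the neutral Y487F substitution in the DH domain.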

Domain

The RGSL domain, also known as rgRGS domain, is necessary but not sufficient for GAP activity.
The DH domain is involved in interaction with CCPG1. (By similarity)

Sequence similarities

Contains 1 DH (DBL-homology) domain. (PROSITE-ProRule annotation)
Contains 1 PH domain. (PROSITE-ProRule annotation)
Contains 1 RGSL (RGS-like) domain. (Curated)

Keywords - Domain

Coiled coil

Phylogenomic databases

eggNOG: KOG3520. Eukaryota.
eggNOG: COG5422. LUCA.
GeneTree: ENSGT00760000119193.
HOGENOM: HOG000034043.
HOVERGEN: HBG050565.
InParanoid: Q92888.
KO: K12330.
PhylomeDB: Q92888.
TreeFam: TF106495.

Family and domain databases

Gene3D: 1.20.900.10. 1 hit.
Gene3D: 2.30.29.30. 1 hit.
InterPro: IPR000219. DH-domain.
InterPro: IPR011993. PH_dom-like.
InterPro: IPR001849. PH_domain.
InterPro: IPR016137. RGS.
InterPro: IPR015212. RGS-like_dom.
Pfam: PF09128. RGS-like. 1 hit.
Pfam: PF00621. RhoGEF. 1 hit.
SMART: SM00233. PH. 1 hit.
SMART: SM00325. RhoGEF. 1 hit.
SUPFAM: SSF48065. SSF48065. 1 hit.
SUPFAM: SSF48097. SSF48097. 1 hit.
SUPFAM: SSF50729. SSF50729. 1 hit.
PROSITE: PS50010. DH_2. 1 hit.
PROSITE: PS50003. PH_DOMAIN. 1 hit.

Sequences (4)

Sequence status: Complete.

This entry describes 4 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q92888-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MEDFARGAAS PGPSRPGLVP VSIIGAEDED FENELETNSE EQNSQFQSLE
        60         70         80         90        100
QVKRRPAHLM ALLQHVALQF EPGPLLCCLH ADMLGSLGPK EAKKAFLDFY
       110        120        130        140        150
HSFLEKTAVL RVPVPPNVAF ELDRTRADLI SEDVQRRFVQ EVVQSQQVAV
       160        170        180        190        200
GRQLEDFRSK RLMGMTPWEQ ELAQLEAWVG RDRASYEARE RHVAERLLMH
       210        220        230        240        250
LEEMQHTIST DEEKSAAVVN AIGLYMRHLG VRTKSGDKKS GRNFFRKKVM
       260        270        280        290        300
GNRRSDEPAK TKKGLSSILD AARWNRGEPQ VPDFRHLKAE VDAEKPGATD
       310        320        330        340        350
RKGGVGMPSR DRNIGAPGQD TPGVSLHPLS LDSPDREPGA DAPLELGDSS
       360        370        380        390        400
PQGPMSLESL APPESTDEGA ETESPEPGDE GEPGRSGLEL EPEEPPGWRE
       410        420        430        440        450
LVPPDTLHSL PKSQVKRQEV ISELLVTEAA HVRMLRVLHD LFFQPMAECL
       460        470        480        490        500
FFPLEELQNI FPSLDELIEV HSLFLDRLMK RRQESGYLIE EIGDVLLARF
       510        520        530        540        550
DGAEGSWFQK ISSRFCSRQS FALEQLKAKQ RKDPRFCAFV QEAESRPRCR
       560        570        580        590        600
RLQLKDMIPT EMQRLTKYPL LLQSIGQNTE EPTEREKVEL AAECCREILH
       610        620        630        640        650
HVNQAVRDME DLLRLKDYQR RLDLSHLRQS SDPMLSEFKN LDITKKKLVH
       660        670        680        690        700
EGPLTWRVTK DKAVEVHVLL LDDLLLLLQR QDERLLLKSH SRTLTPTPDG
       710        720        730        740        750
KTMLRPVLRL TSAMTREVAT DHKAFYVLFT WDQEAQIYEL VAQTVSERKN
       760        770        780        790        800
WCALITETAG SLKVPAPASR PKPRPSPSST REPLLSSSEN GNGGRETSPA
       810        820        830        840        850
DARTERILSD LLPFCRPGPE GQLAATALRK VLSLKQLLFP AEEDNGAGPP
       860        870        880        890        900
RDGDGVPGGG PLSPARTQEI QENLLSLEET MKQLEELEEE FCRLRPLLSQ
       910
LGGNSVPQPG CT
Length: 912
Mass (Da): 102,435
Last modified: August 29, 2003 - v2
Checksum: 1E773D041652190D
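
Since all feature positions are 1-based indices into the canonical sequence, a position can be checked against the sequence display by stripping the block spacing and offsetting by the first residue of the row. The sketch below does this for three rows copied from the canonical sequence of this entry; the `ROWS` mapping and the helper `residue_at` are hypothetical names introduced for illustration.

```python
# Three rows of the canonical sequence display, keyed by the position of
# their first residue. Each row holds 50 residues in blocks of 10, so the
# row ending at ruler mark 200 starts at position 151, and so on.
ROWS = {
    151: "GRQLEDFRSK RLMGMTPWEQ ELAQLEAWVG RDRASYEARE RHVAERLLMH",
    451: "FFPLEELQNI FPSLDELIEV HSLFLDRLMK RRQESGYLIE EIGDVLLARF",
    701: "KTMLRPVLRL TSAMTREVAT DHKAFYVLFT WDQEAQIYEL VAQTVSERKN",
}


def residue_at(position):
    """Return the one-letter residue at a 1-based canonical position."""
    for row_start, row in ROWS.items():
        residues = row.replace(" ", "")  # drop the spacing between blocks
        if row_start <= position < row_start + len(residues):
            return residues[position - row_start]
    raise KeyError(f"position {position} is not in the excerpted rows")


# Positions annotated elsewhere in this entry.
print(residue_at(165))  # natural variant M -> V
print(residue_at(487))  # mutagenesis Y -> F
print(residue_at(738))  # phosphotyrosine; mutagenesis Y -> F
```

The lookups return M, Y and Y, matching the wild-type residues named in the Natural variant and Mutagenesis tables.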
Isoform 2 (identifier: Q92888-2)

The sequence of this isoform differs from the canonical sequence as follows:
     76-108: Missing.

Note: No experimental confirmation available.
Length: 879
Mass (Da): 98,768
Checksum: 1D0863A5D1A57C9B
Isoform 3 (identifier: Q92888-3)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1: M → MASLSTWSSPAEPREM

Note: No experimental confirmation available.
Length: 927
Mass (Da): 104,066
Checksum: B2E593D5B2DCE417
Isoform 4 (identifier: Q92888-4)

The sequence of this isoform differs from the canonical sequence as follows:
     1-1: M → MASLSTWSSPAEPREM
     76-108: Missing.
     831-912: VLSLKQLLFP...GNSVPQPGCT → GVGGGILPPE...PKCLRSVFIP

Note: No experimental confirmation available.
Length: 948
Mass (Da): 105,854
Checksum: D4F76D1F6EADD661

Sequence caution

The sequence CAA70356 differs from that shown. Contaminating sequence. Sequence of unknown origin in the N-terminal part. (Curated)
The sequence CAA70356 differs from that shown. Reason: Frameshift at position 904. (Curated)

Experimental Info

Feature key         Position(s)   Length   Description
Sequence conflict   257 – 259     3        EPA → DPP in AAB17896 (PubMed:8810315). (Curated)
Sequence conflict   305 – 308     4        VGMP → GGDA in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   339           1        Missing in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   346 – 352     7        LGDSSPQ → PGGLIPA in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   549           1        C → S in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   752           1        C → S in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   776           1        S → R in AAB17896 (PubMed:8810315). (Curated)
Sequence conflict   862           1        L → R in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   876           1        S → R in CAA70356 (PubMed:9135076). (Curated)
Sequence conflict   883           1        Q → T in CAA70356 (PubMed:9135076). (Curated)

Natural variant

Feature key       Identifier   Position(s)   Length   Description
Natural variant   VAR_035969   165           1        M → V in a colorectal cancer sample; somatic mutation. (1 publication)
Natural variant   VAR_033521   375           1        P → L. Corresponds to variant rs2303797 (dbSNP, Ensembl).

Alternative sequence

Feature key            Identifier   Position(s)   Length   Description
Alternative sequence   VSP_037766   1             1        M → MASLSTWSSPAEPREM in isoform 3 and isoform 4. (1 publication)
Alternative sequence   VSP_008125   76 – 108      33       Missing in isoform 2 and isoform 4. (1 publication)
Alternative sequence   VSP_057289   831 – 912     82       VLSLK…QPGCT → GVGGGILPPETPPVSAWGEL CPPAWLHLRFPPRKAFCKKE RNGGEDVRDHPHPHSCRSIS HPEGLRRGSCGPRLGGAQLG LLAPHEPRPSLPPALCLGDS GLHSGGHHGDPGHLSIACGG HPSTPTPKCLRSVFIP in isoform 4. (1 publication)
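
The isoform lengths listed under Sequences follow from the canonical length and the alternative sequence features: each feature replaces an inclusive range of the canonical sequence with a new stretch, which may be empty. A minimal sketch of that arithmetic is given below. The dictionary layout and the function `isoform_length` are illustrative assumptions; the positions and replacement sequences are copied from this entry.

```python
CANONICAL_LENGTH = 912

# Replacement sequence of VSP_057289, copied from the Alternative sequence
# table with the display spacing removed.
VSP_057289_INSERT = (
    "GVGGGILPPETPPVSAWGEL" "CPPAWLHLRFPPRKAFCKKE" "RNGGEDVRDHPHPHSCRSIS"
    "HPEGLRRGSCGPRLGGAQLG" "LLAPHEPRPSLPPALCLGDS" "GLHSGGHHGDPGHLSIACGG"
    "HPSTPTPKCLRSVFIP"
)

# Feature id -> (start, end, replacement); start and end are 1-based and
# inclusive on the canonical sequence. "Missing" is an empty replacement.
FEATURES = {
    "VSP_037766": (1, 1, "MASLSTWSSPAEPREM"),
    "VSP_008125": (76, 108, ""),
    "VSP_057289": (831, 912, VSP_057289_INSERT),
}

ISOFORMS = {
    "Q92888-1": [],
    "Q92888-2": ["VSP_008125"],
    "Q92888-3": ["VSP_037766"],
    "Q92888-4": ["VSP_037766", "VSP_008125", "VSP_057289"],
}


def isoform_length(feature_ids):
    """Canonical length adjusted by the net size change of each feature."""
    length = CANONICAL_LENGTH
    for fid in feature_ids:
        start, end, replacement = FEATURES[fid]
        length += len(replacement) - (end - start + 1)
    return length


for isoform, feature_ids in ISOFORMS.items():
    print(isoform, isoform_length(feature_ids))
```

This reproduces the listed lengths of 912, 879, 927 and 948 residues. Because the three features cover non-overlapping canonical ranges, the length changes can simply be summed; building the actual isoform sequences would require applying the replacements from the C-terminus backwards so that earlier coordinates stay valid.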

Sequence databases

EMBL / GenBank / DDBJ:
U64105 mRNA. Translation: AAB17896.1.
AC010616 Genomic DNA. No translation available.
AC243967 Genomic DNA. No translation available.
CH471126 Genomic DNA. Translation: EAW57084.1.
BC005155 mRNA. Translation: AAH05155.2.
BC011726 mRNA. Translation: AAH11726.1.
BC015652 mRNA. No translation available.
BC034013 mRNA. Translation: AAH34013.2.
BC067262 mRNA. Translation: AAH67262.1.
Y09160 mRNA. Translation: CAA70356.1. Sequence problems.
BT007421 mRNA. Translation: AAP36089.1.
CCDS: CCDS12590.1. [Q92888-3]
CCDS: CCDS12591.1. [Q92888-1]
CCDS: CCDS12592.1. [Q92888-2]
RefSeq: NP_004697.2. NM_004706.3. [Q92888-1]
RefSeq: NP_945328.1. NM_198977.1. [Q92888-2]
RefSeq: NP_945353.1. NM_199002.1. [Q92888-3]
UniGene: Hs.631550.

Genome annotation databases

Ensembl: ENST00000337665; ENSP00000337261; ENSG00000076928. [Q92888-3]
Ensembl: ENST00000347545; ENSP00000344429; ENSG00000076928. [Q92888-2]
Ensembl: ENST00000354532; ENSP00000346532; ENSG00000076928. [Q92888-1]
Ensembl: ENST00000378152; ENSP00000367394; ENSG00000076928. [Q92888-4]
GeneID: 9138.
KEGG: hsa:9138.
UCSC: uc002orx.4. human. [Q92888-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

DNASU: 9138.

Organism-specific databases

CTD: 9138.
DisGeNET: 9138.
GeneCards: ARHGEF1.
HGNC: HGNC:681. ARHGEF1.
HPA: CAB009502.
HPA: HPA012924.
HPA: HPA060784.
MIM: 601855. gene.
neXtProt: NX_Q92888.
OpenTargets: ENSG00000076928.
PharmGKB: PA24966.

Miscellaneous databases

ChiTaRS: ARHGEF1. human.
EvolutionaryTrace: Q92888.
GeneWiki: ARHGEF1.
GenomeRNAi: 9138.
PMAP-CutDB: Q92888.
PRO: Q92888.

Entry information

Entry name: ARHG1_HUMAN
Accession: Primary (citable) accession number: Q92888
Secondary accession number(s): O00513, Q6NX52, Q8N4J4, Q96BF4, Q96F17, Q9BSB1
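
When the accession numbers of this entry are used programmatically, it helps to validate them and to separate the isoform suffix from identifiers such as Q92888-4. The sketch below uses the regular expression that, to my knowledge, UniProt documents for accession numbers; the helper names `is_accession` and `split_isoform` are hypothetical and not part of any UniProt tool.

```python
import re

# Pattern for UniProtKB accession numbers as given in the UniProt help
# pages (assumed here to be current).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text):
    """True if text is a syntactically valid UniProtKB accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_isoform(identifier):
    """Split 'Q92888-4' into ('Q92888', 4); a bare accession gives (acc, None)."""
    accession, sep, suffix = identifier.partition("-")
    if not is_accession(accession):
        raise ValueError(f"not a UniProtKB accession: {accession!r}")
    return accession, int(suffix) if sep else None


PRIMARY = "Q92888"
SECONDARY = ["O00513", "Q6NX52", "Q8N4J4", "Q96BF4", "Q96F17", "Q9BSB1"]

print(all(is_accession(acc) for acc in [PRIMARY, *SECONDARY]))
print(is_accession("ARHG1_HUMAN"))  # an entry name is not an accession
print(split_isoform("Q92888-4"))
```

The check is purely syntactic: it confirms that a string has the shape of an accession number, not that the accession exists or that it still points to this entry.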
Entry history
Integrated into UniProtKB/Swiss-Prot: August 29, 2003
Last sequence update: August 29, 2003
Last modified: November 30, 2016
This is version 161 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Reference proteome

Documents

  1. Human chromosome 19
    Human chromosome 19: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  6. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.