Protein

Carbonic anhydrase 4

Gene

CA4

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Reversible hydration of carbon dioxide. May stimulate the sodium/bicarbonate transporter activity of SLC4A4 that acts in pH homeostasis. It is essential for acid overload removal from the retina and retinal epithelium, and for acid release in the choriocapillaris in the choroid. (1 publication)

Catalytic activity

H2CO3 = CO2 + H2O.

Cofactor

Zn2+ (1 publication)

Enzyme regulation

Activated by histamine, L-adrenaline, D-phenylalanine, L- and D-histidine. Inhibited by coumarins, saccharin, sulfonamide derivatives such as acetazolamide, and foscarnet (phosphonoformate trisodium salt). (8 publications)

Kinetics

KM = 21.5 mM for CO2 (1 publication)
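To put the KM value in context, the following is a minimal sketch of how a Michaelis constant translates into a fraction of the maximal rate. It assumes simple Michaelis-Menten behaviour, which the entry itself does not state; the function name `fractional_rate` and the example CO2 concentrations are illustrative choices, not values from the entry.

```python
KM_CO2_MM = 21.5  # KM for CO2 reported in the entry, in mM


def fractional_rate(substrate_mm: float, km_mm: float = KM_CO2_MM) -> float:
    """Return v/Vmax for a substrate concentration (mM).

    Assumes the simple Michaelis-Menten rate law v = Vmax * S / (KM + S).
    """
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return substrate_mm / (km_mm + substrate_mm)


# Illustrative concentrations only; they are not taken from the entry.
for s in (1.0, 21.5, 100.0):
    print(f"[CO2] = {s:6.1f} mM -> v/Vmax = {fractional_rate(s):.3f}")
```

By definition the rate is half-maximal when the CO2 concentration equals KM, so the 21.5 mM line reports 0.500.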

    Sites

    Feature key | Position(s) | Length | Description
    Active site | 88 | 1 | Proton acceptor (by similarity)
    Metal binding | 115 | 1 | Zinc; catalytic (1 publication)
    Metal binding | 117 | 1 | Zinc; catalytic (1 publication)
    Metal binding | 140 | 1 | Zinc; catalytic (1 publication)

    GO - Molecular function

    GO - Biological process

    • bicarbonate transport Source: DFLAT
    • one-carbon metabolic process Source: InterPro

    Keywords - Molecular function

    Lyase

    Keywords - Ligand

    Metal-binding, Zinc

    Enzyme and pathway databases

    BRENDA: 4.2.1.1. 2681.
    Reactome: R-HSA-1237044. Erythrocytes take up carbon dioxide and release oxygen.
    R-HSA-1247673. Erythrocytes take up oxygen and release carbon dioxide.
    R-HSA-1475029. Reversible hydration of carbon dioxide.
    SABIO-RK: P22748.

    Names & Taxonomy

    Protein names
    Recommended name:
    Carbonic anhydrase 4 (EC:4.2.1.1)
    Alternative name(s):
    Carbonate dehydratase IV
    Carbonic anhydrase IV
    Short name:
    CA-IV
    Gene names
    Name: CA4
    Organism: Homo sapiens (Human)
    Taxonomic identifier: 9606 [NCBI]
    Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
    Proteomes
    • UP000005640 Component: Chromosome 17

    Organism-specific databases

    HGNC: HGNC:1375. CA4.

    Subcellular location

    GO - Cellular component

    • anchored component of external side of plasma membrane Source: DFLAT
    • apical plasma membrane Source: DFLAT
    • brush border membrane Source: DFLAT
    • cell surface Source: DFLAT
    • endoplasmic reticulum-Golgi intermediate compartment Source: DFLAT
    • extracellular exosome Source: UniProtKB
    • Golgi apparatus Source: DFLAT
    • membrane Source: ProtInc
    • perinuclear region of cytoplasm Source: DFLAT
    • plasma membrane Source: UniProtKB
    • rough endoplasmic reticulum Source: DFLAT
    • secretory granule membrane Source: DFLAT
    • trans-Golgi network Source: DFLAT
    • transport vesicle membrane Source: DFLAT

    Keywords - Cellular component

    Cell membrane, Membrane

    Pathology & Biotech

    Involvement in disease

    Retinitis pigmentosa 17 (RP17) (3 publications)
    The disease is caused by mutations affecting the gene represented in this entry. Defective acid overload removal from retina and retinal epithelium, due to mutant CA4, is responsible for photoreceptor degeneration, indicating that impaired pH homeostasis is the most likely cause underlying the RP17 phenotype.
    Disease description: A retinal dystrophy belonging to the group of pigmentary retinopathies. Retinitis pigmentosa is characterized by retinal pigment deposits visible on fundus examination and primary loss of rod photoreceptor cells followed by secondary loss of cone photoreceptors. Patients typically have night vision blindness and loss of midperipheral visual field. As their condition progresses, they lose their far peripheral visual field and eventually central vision as well.
    See also OMIM:600852
    Feature key | Position(s) | Length | Description | dbSNP | Feature identifier
    Natural variant | 12 | 1 | A → T in RP17. (1 publication) | - | VAR_071430
    Natural variant | 14 | 1 | R → W in RP17; abolishes interaction with SLC4A4; impaired SLC4A4 cotransporter activity stimulation. (1 publication) | rs104894559 | VAR_024749
    Natural variant | 69 | 1 | R → H in RP17; has no effect on catalytic activity; loss of interaction with SLC4A4. (1 publication) | rs121434552 | VAR_071431
    Natural variant | 219 | 1 | R → S in RP17; no catalytic activity; impaired SLC4A4 cotransporter activity stimulation. (1 publication) | rs121434551 | VAR_024750

    Mutagenesis

    Feature key | Position(s) | Length | Description
    Mutagenesis | 284 | 1 | S → F: Loss of C-terminal domain removal and inactivation. (1 publication)

    Keywords - Disease

    Disease mutation, Retinitis pigmentosa

    Organism-specific databases

    MalaCards: CA4.
    MIM: 600852. phenotype.
    Orphanet: 791. Retinitis pigmentosa.
    PharmGKB: PA25991.

    Chemistry

    ChEMBL: CHEMBL2095180.
    DrugBank: DB00819. Acetazolamide.
    DB00436. Bendroflumethiazide.
    DB00562. Benzthiazide.
    DB01194. Brinzolamide.
    DB00880. Chlorothiazide.
    DB00606. Cyclothiazide.
    DB01144. Diclofenamide.
    DB00869. Dorzolamide.
    DB00999. Hydrochlorothiazide.
    DB00774. Hydroflumethiazide.
    DB00703. Methazolamide.
    DB00232. Methyclothiazide.
    DB00273. Topiramate.
    DB01021. Trichlormethiazide.
    DB00909. Zonisamide.
    GuidetoPHARMACOLOGY: 2599.

    Polymorphism and mutation databases

    BioMuta: CA4.
    DMDM: 115465.

    PTM / Processing

    Molecule processing

    Feature key | Position(s) | Length | Description | Feature identifier
    Signal peptide | 1 – 18 | 18 | (1 publication) | -
    Chain | 19 – 284 | 266 | Carbonic anhydrase 4 | PRO_0000004226
    Propeptide | 285 – 312 | 28 | Removed in mature form | PRO_0000004227
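The processing coordinates above are 1-based and inclusive, which is easy to get wrong when slicing a string. As a minimal sketch, the snippet below cuts the canonical sequence (copied from the Sequences section of this entry) into signal peptide, mature chain, and propeptide. The helper name `region` is a hypothetical convenience, not part of any UniProt tooling.

```python
# Canonical isoform 1 sequence of P22748, copied from this entry (312 residues).
CANONICAL = (
    "MRMLLALLALSAARPSASAESHWCYEVQAESSNYPCLVPVKWGGNCQKDR"
    "QSPINIVTTKAKVDKKLGRFFFSGYDKKQTWTVQNNGHSVMMLLENKASI"
    "SGGGLPAPYQAKQLHLHWSDLPYKGSEHSLDGEHFAMEMHIVHEKEKGTS"
    "RNVKEAQDPEDEIAVLAFLVEAGTQVNEGFQPLVEALSNIPKPEMSTTMA"
    "ESSLLDLLPKEEKLRHYFRYLGSLTTPTCDEKVVWTVFREPIQLHREQIL"
    "AFSQKLYYDKEQTVSMKDNVRPLQQLGQRTVIKSGAPGRPLPWALPALLG"
    "PMLACLLAGFLR"
)


def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


signal_peptide = region(CANONICAL, 1, 18)
mature_chain = region(CANONICAL, 19, 284)
propeptide = region(CANONICAL, 285, 312)

# The three regions should tile the precursor without gaps or overlaps.
print(len(signal_peptide), len(mature_chain), len(propeptide))
print("C-terminal residue of the mature chain:", mature_chain[-1])
```

The last residue of the mature chain is the serine at position 284, the site listed in this entry as the GPI-anchor amidated serine.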

    Amino acid modifications

    Feature key | Position(s) | Length | Description
    Disulfide bond | 24 ↔ 36 | - | (1 publication)
    Disulfide bond | 46 ↔ 229 | - | (1 publication)
    Lipidation | 284 | 1 | GPI-anchor amidated serine (1 publication)

    Keywords - PTM

    Disulfide bond, Glycoprotein, GPI-anchor, Lipoprotein

    Proteomic databases

    EPD: P22748.
    PaxDb: P22748.
    PeptideAtlas: P22748.
    PRIDE: P22748.

    PTM databases

    iPTMnet: P22748.
    PhosphoSite: P22748.

    Expression

    Tissue specificity

    Expressed in the endothelium of the choriocapillaris in eyes (at protein level). Not expressed in the retinal epithelium at detectable levels. (1 publication)

    Gene expression databases

    Bgee: ENSG00000167434.
    CleanEx: HS_CA4.
    ExpressionAtlas: P22748. baseline and differential.
    Genevisible: P22748. HS.

    Organism-specific databases

    HPA: HPA011089.
    HPA017258.

    Interaction

    Subunit structure

    Interacts with SLC4A4. (4 publications)

    Protein-protein interaction databases

    BioGrid: 107217. 5 interactions.
    IntAct: P22748. 4 interactions.
    STRING: 9606.ENSP00000300900.

    Chemistry

    BindingDB: P22748.

    Structure

    Secondary structure

    Evidence for all elements: combined sources.
    Feature key | Position(s) | Length
    Helix | 26 – 29 | 4
    Helix | 39 – 41 | 3
    Turn | 44 – 47 | 4
    Beta strand | 48 – 50 | 3
    Helix | 58 – 60 | 3
    Beta strand | 61 – 63 | 3
    Beta strand | 70 – 77 | 8
    Beta strand | 82 – 85 | 4
    Beta strand | 87 – 93 | 7
    Beta strand | 99 – 102 | 4
    Beta strand | 109 – 118 | 10
    Beta strand | 127 – 130 | 4
    Beta strand | 136 – 145 | 10
    Beta strand | 162 – 175 | 14
    Helix | 178 – 180 | 3
    Helix | 181 – 186 | 6
    Helix | 187 – 189 | 3
    Beta strand | 196 – 198 | 3
    Helix | 205 – 207 | 3
    Helix | 211 – 214 | 4
    Beta strand | 217 – 222 | 6
    Beta strand | 233 – 240 | 8
    Beta strand | 242 – 245 | 4
    Helix | 246 – 255 | 10
    Beta strand | 257 – 259 | 3
    Beta strand | 264 – 266 | 3
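The element list above can be tallied to get a rough picture of how much of the protein falls into each secondary-structure class. This is a minimal sketch: the tuples are transcribed from the table, and expressing coverage relative to the full 312-residue precursor (rather than the crystallized 19-284 range) is an arbitrary choice made here for illustration.

```python
from collections import Counter

# (class, start, end) transcribed from the secondary structure table; 1-based, inclusive.
ELEMENTS = [
    ("Helix", 26, 29), ("Helix", 39, 41), ("Turn", 44, 47),
    ("Beta strand", 48, 50), ("Helix", 58, 60), ("Beta strand", 61, 63),
    ("Beta strand", 70, 77), ("Beta strand", 82, 85), ("Beta strand", 87, 93),
    ("Beta strand", 99, 102), ("Beta strand", 109, 118), ("Beta strand", 127, 130),
    ("Beta strand", 136, 145), ("Beta strand", 162, 175), ("Helix", 178, 180),
    ("Helix", 181, 186), ("Helix", 187, 189), ("Beta strand", 196, 198),
    ("Helix", 205, 207), ("Helix", 211, 214), ("Beta strand", 217, 222),
    ("Beta strand", 233, 240), ("Beta strand", 242, 245), ("Helix", 246, 255),
    ("Beta strand", 257, 259), ("Beta strand", 264, 266),
]
SEQUENCE_LENGTH = 312  # length of the canonical precursor sequence


def residues_per_class(elements):
    """Sum the number of residues assigned to each secondary-structure class."""
    totals = Counter()
    for kind, start, end in elements:
        totals[kind] += end - start + 1
    return totals


totals = residues_per_class(ELEMENTS)
for kind, count in sorted(totals.items()):
    print(f"{kind}: {count} residues ({count / SEQUENCE_LENGTH:.1%} of {SEQUENCE_LENGTH})")
```

The tally shows beta strands accounting for well over twice as many residues as helices in the annotated elements.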

    3D structure databases

    Entry | Method | Resolution (Å) | Chain | Positions
    1ZNC | X-ray | 2.80 | A/B | 19-284
    3F7B | X-ray | 2.05 | A/B | 19-284
    3F7U | X-ray | 2.00 | A/B/C/D | 19-284
    3FW3 | X-ray | 1.72 | A/B | 19-284
    ProteinModelPortal: P22748.
    SMR: P22748. Positions 23-284.

    Miscellaneous databases

    EvolutionaryTrace: P22748.

    Family & Domains

    Domains and Repeats

    Feature key | Position(s) | Length | Description
    Domain | 21 – 285 | 265 | Alpha-carbonic anhydrase (PROSITE-ProRule annotation)

    Region

    Feature key | Position(s) | Length | Description
    Region | 225 – 226 | 2 | Substrate binding (by similarity)

    Sequence similarities

    Belongs to the alpha-carbonic anhydrase family. (Curated)
    Contains 1 alpha-carbonic anhydrase domain. (PROSITE-ProRule annotation)

    Keywords - Domain

    Signal

    Phylogenomic databases

    eggNOG: KOG0382. Eukaryota.
    COG3338. LUCA.
    GeneTree: ENSGT00760000118915.
    HOGENOM: HOG000112637.
    HOVERGEN: HBG002837.
    InParanoid: P22748.
    KO: K18246.
    OMA: KQTWTVQ.
    OrthoDB: EOG091G0XFM.
    PhylomeDB: P22748.
    TreeFam: TF316425.

    Family and domain databases

    Gene3D: 3.10.200.10. 1 hit.
    InterPro: IPR001148. Carbonic_anhydrase_a.
    IPR023561. Carbonic_anhydrase_a-class.
    IPR018338. Carbonic_anhydrase_a-class_CS.
    IPR018343. Carbonic_anhydrase_CA4.
    PANTHER: PTHR18952. PTHR18952. 1 hit.
    PTHR18952:SF95. PTHR18952:SF95. 1 hit.
    Pfam: PF00194. Carb_anhydrase. 1 hit.
    SMART: SM01057. Carb_anhydrase. 1 hit.
    SUPFAM: SSF51069. SSF51069. 1 hit.
    PROSITE: PS00162. ALPHA_CA_1. 1 hit.
    PS51144. ALPHA_CA_2. 1 hit.

    Sequences (2)

    Sequence status: Complete.

    Sequence processing: The displayed sequence is further processed into a mature form.

    This entry describes 2 isoforms produced by alternative splicing.

    Isoform 1 (identifier: P22748-1)

    This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

            10         20         30         40         50
    MRMLLALLAL SAARPSASAE SHWCYEVQAE SSNYPCLVPV KWGGNCQKDR
            60         70         80         90        100
    QSPINIVTTK AKVDKKLGRF FFSGYDKKQT WTVQNNGHSV MMLLENKASI
           110        120        130        140        150
    SGGGLPAPYQ AKQLHLHWSD LPYKGSEHSL DGEHFAMEMH IVHEKEKGTS
           160        170        180        190        200
    RNVKEAQDPE DEIAVLAFLV EAGTQVNEGF QPLVEALSNI PKPEMSTTMA
           210        220        230        240        250
    ESSLLDLLPK EEKLRHYFRY LGSLTTPTCD EKVVWTVFRE PIQLHREQIL
           260        270        280        290        300
    AFSQKLYYDK EQTVSMKDNV RPLQQLGQRT VIKSGAPGRP LPWALPALLG
           310
    PMLACLLAGF LR
    Length: 312
    Mass (Da): 35,032
    Last modified: August 1, 1992 - v2
    Checksum: EF5F182474ABE9B0

    Isoform 2 (identifier: P22748-2)

    The sequence of this isoform differs from the canonical sequence as follows:
         90-106: VMMLLENKASISGGGLP → GWNPGERGLPATGGGTV
         107-312: Missing.

    Note: No experimental confirmation available.
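The two differences listed for isoform 2 (a 17-residue replacement at 90-106 and deletion of 107-312) can be applied mechanically to the canonical sequence. The sketch below does that and checks that the original residues at 90-106 match what the entry says before replacing them. The function name `build_isoform2` is hypothetical, and the result is only a reconstruction from the stated differences, not an experimentally confirmed sequence.

```python
# Canonical isoform 1 sequence of P22748, copied from this entry (312 residues).
CANONICAL = (
    "MRMLLALLALSAARPSASAESHWCYEVQAESSNYPCLVPVKWGGNCQKDR"
    "QSPINIVTTKAKVDKKLGRFFFSGYDKKQTWTVQNNGHSVMMLLENKASI"
    "SGGGLPAPYQAKQLHLHWSDLPYKGSEHSLDGEHFAMEMHIVHEKEKGTS"
    "RNVKEAQDPEDEIAVLAFLVEAGTQVNEGFQPLVEALSNIPKPEMSTTMA"
    "ESSLLDLLPKEEKLRHYFRYLGSLTTPTCDEKVVWTVFREPIQLHREQIL"
    "AFSQKLYYDKEQTVSMKDNVRPLQQLGQRTVIKSGAPGRPLPWALPALLG"
    "PMLACLLAGFLR"
)

ORIGINAL_90_106 = "VMMLLENKASISGGGLP"
REPLACEMENT_90_106 = "GWNPGERGLPATGGGTV"


def build_isoform2(canonical: str) -> str:
    """Apply the isoform 2 differences: replace 90-106, drop 107-312."""
    start, end = 90, 106  # 1-based, inclusive
    found = canonical[start - 1:end]
    if found != ORIGINAL_90_106:
        raise ValueError(f"unexpected residues at {start}-{end}: {found}")
    # Keep residues 1-89, append the replacement, and omit everything after 106.
    return canonical[:start - 1] + REPLACEMENT_90_106


isoform2 = build_isoform2(CANONICAL)
print(len(isoform2))
print(isoform2)
```

The reconstructed length agrees with the 106 residues reported for isoform 2.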
    Length: 106
    Mass (Da): 11,599
    Checksum: 0149FBF9A98FFF2B

    Experimental Info

    Feature key | Position(s) | Length | Description
    Sequence conflict | 24 | 1 | C → E; AA sequence (PubMed:2111324). (Curated)

    Natural variant

    Feature key | Position(s) | Length | Description | dbSNP | Feature identifier
    Natural variant | 12 | 1 | A → T in RP17. (1 publication) | - | VAR_071430
    Natural variant | 14 | 1 | R → W in RP17; abolishes interaction with SLC4A4; impaired SLC4A4 cotransporter activity stimulation. (1 publication) | rs104894559 | VAR_024749
    Natural variant | 69 | 1 | R → H in RP17; has no effect on catalytic activity; loss of interaction with SLC4A4. (1 publication) | rs121434552 | VAR_071431
    Natural variant | 177 | 1 | N → K. (1 publication) | rs185942554 | VAR_071432
    Natural variant | 219 | 1 | R → S in RP17; no catalytic activity; impaired SLC4A4 cotransporter activity stimulation. (1 publication) | rs121434551 | VAR_024750
    Natural variant | 237 | 1 | V → L. | rs2229178 | VAR_048680
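Because all variant positions refer to the canonical sequence, each substitution can be sanity-checked by confirming that the reference residue is actually present at the stated position before swapping it. The following is a minimal sketch of that check; `apply_variant` is a hypothetical helper, and the variant tuples are transcribed from the table above.

```python
# Canonical isoform 1 sequence of P22748, copied from this entry (312 residues).
CANONICAL = (
    "MRMLLALLALSAARPSASAESHWCYEVQAESSNYPCLVPVKWGGNCQKDR"
    "QSPINIVTTKAKVDKKLGRFFFSGYDKKQTWTVQNNGHSVMMLLENKASI"
    "SGGGLPAPYQAKQLHLHWSDLPYKGSEHSLDGEHFAMEMHIVHEKEKGTS"
    "RNVKEAQDPEDEIAVLAFLVEAGTQVNEGFQPLVEALSNIPKPEMSTTMA"
    "ESSLLDLLPKEEKLRHYFRYLGSLTTPTCDEKVVWTVFREPIQLHREQIL"
    "AFSQKLYYDKEQTVSMKDNVRPLQQLGQRTVIKSGAPGRPLPWALPALLG"
    "PMLACLLAGFLR"
)

# (position, reference residue, variant residue); positions are 1-based.
VARIANTS = [
    (12, "A", "T"),
    (14, "R", "W"),
    (69, "R", "H"),
    (177, "N", "K"),
    (219, "R", "S"),
    (237, "V", "L"),
]


def apply_variant(seq: str, position: int, ref: str, alt: str) -> str:
    """Return a copy of seq with a single substitution at a 1-based position."""
    found = seq[position - 1]
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return seq[:position - 1] + alt + seq[position:]


for position, ref, alt in VARIANTS:
    mutated = apply_variant(CANONICAL, position, ref, alt)
    print(f"{ref}{position}{alt}: residue is now {mutated[position - 1]}")
```

A mismatch between the table and the sequence would raise a `ValueError`, which makes transcription errors in either one easy to spot.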

    Alternative sequence

    Feature key | Position(s) | Length | Description | Feature identifier
    Alternative sequence | 90 – 106 | 17 | VMMLL…GGGLP → GWNPGERGLPATGGGTV in isoform 2. (1 publication) | VSP_055973
    Alternative sequence | 107 – 312 | 206 | Missing in isoform 2. (1 publication) | VSP_055974

    Sequence databases

    EMBL / GenBank / DDBJ:
    M83670 mRNA. Translation: AAA35630.1.
    L10955, L10951, L10953, L10954 Genomic DNA. Translation: AAA35625.1. Sequence problems.
    L10955, L10954 Genomic DNA. Translation: AAA35626.1. Sequence problems.
    AK289715 mRNA. Translation: BAF82404.1.
    AK298710 mRNA. Translation: BAG60866.1.
    CR541766 mRNA. Translation: CAG46565.1.
    AC025048 Genomic DNA. No translation available.
    CH471109 Genomic DNA. Translation: EAW94362.1.
    BC057792 mRNA. Translation: AAH57792.1.
    BC069649 mRNA. Translation: AAH69649.1.
    BC074768 mRNA. Translation: AAH74768.1.
    CCDS: CCDS11624.1. [P22748-1]
    PIR: A45745. CRHU4.
    RefSeq: NP_000708.1. NM_000717.3. [P22748-1]
    UniGene: Hs.89485.

    Genome annotation databases

    Ensembl: ENST00000300900; ENSP00000300900; ENSG00000167434. [P22748-1]
    ENST00000586876; ENSP00000467465; ENSG00000167434. [P22748-2]
    GeneID: 762.
    KEGG: hsa:762.
    UCSC: uc002iym.5. human. [P22748-1]

    Keywords - Coding sequence diversity

    Alternative splicing, Polymorphism

    Cross-references

    Protocols and materials databases

    DNASU: 762.

    Organism-specific databases

    CTD: 762.
    GeneCards: CA4.
    GeneReviews: CA4.
    HGNC: HGNC:1375. CA4.
    HPA: HPA011089.
    HPA017258.
    MalaCards: CA4.
    MIM: 114760. gene.
    600852. phenotype.
    neXtProt: NX_P22748.
    Orphanet: 791. Retinitis pigmentosa.
    PharmGKB: PA25991.

    Miscellaneous databases

    EvolutionaryTrace: P22748.
    GeneWiki: Carbonic_anhydrase_4.
    GenomeRNAi: 762.
    PRO: P22748.

    Entry information

    Entry name: CAH4_HUMAN
    Accession: Primary (citable) accession number: P22748
    Secondary accession number(s): B4DQA4, Q6FHI7
    Entry history
    Integrated into UniProtKB/Swiss-Prot: August 1, 1991
    Last sequence update: August 1, 1992
    Last modified: September 7, 2016
    This is version 177 of the entry and version 2 of the sequence.
    Entry status: Reviewed (UniProtKB/Swiss-Prot)
    Annotation program: Chordata Protein Annotation Program
    Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

    Miscellaneous

    Keywords - Technical term

    3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

    Documents

    1. Human chromosome 17
      Human chromosome 17: entries, gene names and cross-references to MIM
    2. Human entries with polymorphisms or disease mutations
      List of human entries with polymorphisms or disease mutations
    3. Human polymorphisms and disease mutations
      Index of human polymorphisms and disease mutations
    4. MIM cross-references
      Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
    5. PDB cross-references
      Index of Protein Data Bank (PDB) cross-references
    6. SIMILARITY comments
      Index of protein domains and families

    Similar proteins

    Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
    100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
    90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
    50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.