Protein

Mannan-binding lectin serine protease 1

Gene

Masp1

Organism
Rattus norvegicus (Rat)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Functions in the lectin pathway of complement, which plays a key role in innate immunity by recognizing pathogens through patterns of sugar moieties and neutralizing them. The lectin pathway is triggered when mannan-binding lectin (MBL) and ficolins bind to sugar moieties, which leads to activation of the associated proteases MASP1 and MASP2. Functions as an endopeptidase and may activate MASP2 or C2, or directly activate C3, the key component of the complement reaction. Isoform 2 may have an inhibitory effect on the activation of the lectin pathway of complement or may cleave IGFBP5 (By similarity).

Enzyme regulation

Inhibited by SERPING1 and A2M (By similarity).

Kinetics

  1. KM = 13 µM for C2 (at 37 degrees Celsius) (1 Publication)
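The KM value can be read through the Michaelis-Menten equation, v / Vmax = [S] / (KM + [S]): at a C2 concentration equal to KM (13 µM) the enzyme runs at half its maximal rate. The Python sketch below assumes simple Michaelis-Menten behavior (no inhibition, no substrate depletion); the function name and the example concentrations are illustrative only, and the entry reports no Vmax, so only the fractional rate is computed.

```python
def fraction_of_vmax(substrate_um: float, km_um: float = 13.0) -> float:
    """Michaelis-Menten saturation v / Vmax = [S] / (KM + [S]).

    km_um defaults to the KM reported for C2 (13 uM at 37 degrees Celsius).
    Assumes ideal Michaelis-Menten kinetics; this is an illustrative helper.
    """
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("[S] must be >= 0 and KM must be > 0")
    return substrate_um / (km_um + substrate_um)


# Fractional rate at 0.1x, 1x, 3x and 10x KM.
for c2_um in (1.3, 13.0, 39.0, 130.0):
    print(f"[C2] = {c2_um:6.1f} uM -> v/Vmax = {fraction_of_vmax(c2_um):.3f}")
```

Working with the ratio rather than an absolute rate keeps the sketch limited to what the entry actually reports.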

    Sites

    Feature key    Position(s)  Length  Evidence       Description
    Metal binding  73           1       By similarity  Calcium 1
    Metal binding  81           1       By similarity  Calcium 1
    Metal binding  126          1       By similarity  Calcium 1
    Metal binding  128          1       By similarity  Calcium 1; via carbonyl oxygen
    Metal binding  144          1       By similarity  Calcium 2
    Metal binding  145          1       By similarity  Calcium 2; via carbonyl oxygen
    Metal binding  147          1       By similarity  Calcium 2
    Metal binding  164          1       By similarity  Calcium 2
    Metal binding  165          1       By similarity  Calcium 2; via carbonyl oxygen
    Metal binding  168          1       By similarity  Calcium 2; via carbonyl oxygen
    Metal binding  240          1       By similarity  Calcium 3
    Metal binding  250          1       By similarity  Calcium 3
    Metal binding  287          1       By similarity  Calcium 3
    Metal binding  289          1       By similarity  Calcium 3; via carbonyl oxygen
    Active site    495          1       By similarity  Charge relay system
    Active site    557          1       By similarity  Charge relay system
    Active site    651          1       By similarity  Charge relay system

    GO - Molecular function

    • calcium-dependent protein binding Source: UniProtKB
    • calcium ion binding Source: UniProtKB
    • protein homodimerization activity Source: UniProtKB
    • serine-type endopeptidase activity Source: UniProtKB

    GO - Biological process

    • complement activation Source: RGD
    • complement activation, lectin pathway Source: UniProtKB

    Keywords - Molecular function

    Hydrolase, Protease, Serine protease

    Keywords - Biological process

    Complement activation lectin pathway, Immunity, Innate immunity

    Keywords - Ligand

    Calcium, Metal-binding

    Enzyme and pathway databases

    Reactome: R-RNO-166662. Lectin pathway of complement activation.
    R-RNO-166663. Initial triggering of complement.
    R-RNO-2855086. Ficolins bind to repetitive carbohydrate structures on the target cell surface.
    R-RNO-3000480. Scavenging by Class A Receptors.

    Protein family/group databases

    MEROPS: S01.198.

    Names & Taxonomy

    Protein names
    Recommended name:
    Mannan-binding lectin serine protease 1 (EC:3.4.21.-)
    Alternative name(s):
    Complement factor MASP-3
    Complement-activating component of Ra-reactive factor
    Mannose-binding lectin-associated serine protease 1
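    Short name: MASP-1
    Mannose-binding protein-associated serine protease
    Ra-reactive factor serine protease p100
    Short name: RaRF
    Serine protease 5
    Cleaved into the following 2 chains:
    Mannan-binding lectin serine protease 1 heavy chain
    Mannan-binding lectin serine protease 1 light chain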
    Gene names
    Name: Masp1
    Synonyms: Crarf, Masp3
    Organism: Rattus norvegicus (Rat)
    Taxonomic identifier: 10116 [NCBI]
    Taxonomic lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Rattus
    Proteomes
    • UP000002494 Component: Chromosome 11

    Organism-specific databases

    RGD: 620213. Masp1.

    Subcellular location

    GO - Cellular component

    • extracellular space Source: UniProtKB

    Keywords - Cellular component

    Secreted

    Pathology & Biotech

    Mutagenesis

    Feature key  Position(s)  Length  Evidence       Description
    Mutagenesis  651          1       1 Publication  S → A: Prevents protease self-activation through proteolytic cleavage into heavy and light chains.

    PTM / Processing

    Molecule processing

    Feature key     Identifier      Position(s)  Length  Evidence       Description
    Signal peptide  -               1 – 24       24      By similarity
    Chain           PRO_0000369242  25 – 704     680     -              Mannan-binding lectin serine protease 1
    Chain           PRO_0000369243  25 – 453     429     -              Mannan-binding lectin serine protease 1 heavy chain
    Chain           PRO_0000369244  454 – 704    251     -              Mannan-binding lectin serine protease 1 light chain
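The lengths in the molecule-processing table follow from 1-based, inclusive coordinates (length = end - start + 1), and the features tile the precursor exactly: the 24-residue signal peptide plus the 680-residue mature chain give 704 residues, and autolysis between residues 453 and 454 splits the mature chain into a 429-residue heavy chain and a 251-residue light chain. This Python sketch re-derives those numbers; the dictionary keys and the `span_length` helper are illustrative names, not UniProt identifiers.

```python
# Coordinates copied from the "Molecule processing" table (1-based, inclusive).
FEATURES = {
    "signal peptide": (1, 24),
    "mature chain": (25, 704),
    "heavy chain": (25, 453),
    "light chain": (454, 704),
}
CANONICAL_LENGTH = 704


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive feature span."""
    if start < 1 or end < start:
        raise ValueError(f"invalid span {start}-{end}")
    return end - start + 1


lengths = {name: span_length(*span) for name, span in FEATURES.items()}

# Signal peptide and mature chain are contiguous and cover the whole precursor.
assert FEATURES["signal peptide"][1] + 1 == FEATURES["mature chain"][0]
assert lengths["signal peptide"] + lengths["mature chain"] == CANONICAL_LENGTH
# Cleavage between 453 and 454 splits the mature chain into heavy + light chains.
assert FEATURES["heavy chain"][1] + 1 == FEATURES["light chain"][0]
assert lengths["heavy chain"] + lengths["light chain"] == lengths["mature chain"]

for name, n in lengths.items():
    print(f"{name:15s} {n:4d} residues")
```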

    Amino acid modifications

    Feature key       Position(s)  Length  Evidence                    Description
    Glycosylation     54           1       Sequence analysis           N-linked (GlcNAc...)
    Disulfide bond    78 ↔ 96      -       By similarity
    Disulfide bond    148 ↔ 162    -       By similarity
    Disulfide bond    158 ↔ 171    -       By similarity
    Modified residue  164          1       Sequence analysis           (3R)-3-hydroxyasparagine
    Disulfide bond    173 ↔ 186    -       By similarity
    Glycosylation     183          1       Sequence analysis           N-linked (GlcNAc...)
    Disulfide bond    190 ↔ 217    -       By similarity
    Disulfide bond    247 ↔ 265    -       By similarity
    Disulfide bond    306 ↔ 354    -       By similarity
    Disulfide bond    334 ↔ 367    -       By similarity
    Disulfide bond    372 ↔ 419    -       By similarity
    Glycosylation     390          1       Sequence analysis           N-linked (GlcNAc...)
    Disulfide bond    402 ↔ 437    -       By similarity
    Glycosylation     412          1       Sequence analysis           N-linked (GlcNAc...)
    Disulfide bond    441 ↔ 577    -       PROSITE-ProRule annotation  Interchain (between heavy and light chains)
    Disulfide bond    480 ↔ 496    -       By similarity
    Disulfide bond    619 ↔ 636    -       By similarity
    Disulfide bond    647 ↔ 677    -       By similarity

    Post-translational modification

    The iron- and 2-oxoglutarate-dependent 3-hydroxylation of aspartate and asparagine is (R)-stereospecific within EGF domains (By similarity).
    N-glycosylated. Some N-linked glycans are of the complex type (1 Publication).
    Autoproteolytic processing of the proenzyme produces the active enzyme, composed of the heavy and light chains held together by a disulfide bond. Isoform 1, but not isoform 2, is activated through autoproteolytic processing (By similarity).
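The four N-glycosylation positions annotated by sequence analysis (Asn-54, Asn-183, Asn-390, Asn-412) each start an N-X-S/T motif with X not proline, the consensus sequon for N-linked glycosylation. This Python sketch scans short fragments copied from the canonical sequence, with their starting positions, and reports sequon asparagines in full-sequence numbering. The `find_sequons` helper and the fragment boundaries are illustrative choices, and a sequon only marks a potential site: occupancy has to be shown experimentally.

```python
import re

# N-X-S/T sequon with X != proline; the zero-width lookahead also reports
# overlapping sequons (for example both asparagines in "NNTT").
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(fragment: str, first_position: int) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-S/T sequon.

    first_position is the sequence position of fragment[0] in the full protein.
    """
    return [first_position + m.start() for m in SEQUON.finditer(fragment)]


# Fragments copied from the canonical sequence, with their starting positions.
FRAGMENTS = [
    ("VTWNITVPEG", 51),
    ("TDNRTCRVEC", 181),
    ("GLVTFSTRNNLTTYKSEIRY", 381),
    ("HNTTGVYTCS", 411),
]

for fragment, start in FRAGMENTS:
    print(start, fragment, find_sequons(fragment, start))
```

In the third fragment, Asn-389 is skipped because it is followed by N-L rather than N-X-S/T, while Asn-390 (N-L-T) is reported.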

    Sites

    Feature key  Position(s)  Length  Evidence       Description
    Site         453 – 454    2       By similarity  Cleavage; by autolysis

    Keywords - PTM

    Autocatalytic cleavage, Disulfide bond, Glycoprotein, Hydroxylation

    Proteomic databases

    PRIDE: Q8CHN8.

    Expression

    Tissue specificity

    Plasma protein, primarily expressed by the liver (1 Publication).

    Gene expression databases

    Bgee: ENSRNOG00000001827.
    ExpressionAtlas: Q8CHN8. Baseline and differential.
    Genevisible: Q8CHN8. RN.

    Interaction

    Subunit structure

    Homodimer. Interacts with the oligomeric lectins MBL1, MBL2, FCN2 and FCN3; triggers the lectin pathway of complement through activation of C3. Interacts with SERPING1 (By similarity).

    GO - Molecular function

    • calcium-dependent protein binding Source: UniProtKB
    • protein homodimerization activity Source: UniProtKB

    Structure

    Secondary structure

    Feature key  Position(s)  Length  Evidence
    Beta strand  190 – 195    6       Combined sources
    Beta strand  197 – 203    7       Combined sources
    Turn         205 – 208    4       Combined sources
    Beta strand  216 – 222    7       Combined sources
    Beta strand  230 – 233    4       Combined sources
    Beta strand  243 – 247    5       Combined sources
    Beta strand  251 – 256    6       Combined sources
    Beta strand  259 – 264    6       Combined sources
    Beta strand  266 – 268    3       Combined sources
    Beta strand  278 – 285    8       Combined sources
    Beta strand  296 – 300    5       Combined sources

    3D structure databases

    PDB entry  Method  Resolution (Å)  Chain   Positions
    3POB       X-ray   1.80            A       188-301
    3POE       X-ray   1.50            A       188-301
    3POF       X-ray   1.50            A/B     188-301
    3POG       X-ray   2.75            A/B/C   188-301
    3POI       X-ray   1.70            A/B     188-301
    3POJ       X-ray   1.45            A/B     188-301
    SMR: Q8CHN8.

    Miscellaneous databases

    EvolutionaryTrace: Q8CHN8.

    Family & Domains

    Domains and Repeats

    Feature key  Position(s)  Length  Evidence                    Description
    Domain       25 – 143     119     PROSITE-ProRule annotation  CUB 1
    Domain       144 – 187    44      By similarity               EGF-like; calcium-binding
    Domain       190 – 302    113     PROSITE-ProRule annotation  CUB 2
    Domain       304 – 369    66      PROSITE-ProRule annotation  Sushi 1
    Domain       370 – 439    70      PROSITE-ProRule annotation  Sushi 2
    Domain       454 – 701    248     PROSITE-ProRule annotation  Peptidase S1

    Region

    Feature key  Position(s)  Length  Evidence       Description
    Region       25 – 305     281     -              Interaction with MBL1
    Region       25 – 283     259     By similarity  Interaction with FCN2
    Region       25 – 189     165     -              Homodimerization
    Region       25 – 189     165     By similarity  Interaction with MBL2

    Sequence similarities

    Belongs to the peptidase S1 family (PROSITE-ProRule annotation).
    Contains 2 CUB domains (PROSITE-ProRule annotation).
    Contains 1 EGF-like domain (Curated).
    Contains 1 peptidase S1 domain (PROSITE-ProRule annotation).
    Contains 2 Sushi (CCP/SCR) domains (PROSITE-ProRule annotation).

    Keywords - Domain

    EGF-like domain, Repeat, Signal, Sushi

    Phylogenomic databases

    GeneTree: ENSGT00760000118890.
    HOGENOM: HOG000237311.
    InParanoid: Q8CHN8.
    KO: K03992.
    OrthoDB: EOG091G02DS.
    PhylomeDB: Q8CHN8.
    TreeFam: TF330373.

    Family and domain databases

    CDD: cd00033. CCP. 2 hits.
    cd00041. CUB. 2 hits.
    cd00190. Tryp_SPc. 1 hit.
    Gene3D: 2.60.120.290. 2 hits.
    InterPro: IPR000859. CUB_dom.
    IPR001881. EGF-like_Ca-bd_dom.
    IPR013032. EGF-like_CS.
    IPR018097. EGF_Ca-bd_CS.
    IPR024175. Pept_S1A_C1r/C1S/mannan-bd.
    IPR009003. Peptidase_S1_PA.
    IPR001314. Peptidase_S1A.
    IPR000436. Sushi_SCR_CCP_dom.
    IPR001254. Trypsin_dom.
    IPR018114. TRYPSIN_HIS.
    IPR033116. TRYPSIN_SER.
    Pfam: PF00431. CUB. 2 hits.
    PF07645. EGF_CA. 1 hit.
    PF00084. Sushi. 2 hits.
    PF00089. Trypsin. 1 hit.
    PIRSF: PIRSF001155. C1r_C1s_MASP. 1 hit.
    PRINTS: PR00722. CHYMOTRYPSIN.
    SMART: SM00032. CCP. 2 hits.
    SM00042. CUB. 2 hits.
    SM00179. EGF_CA. 1 hit.
    SM00020. Tryp_SPc. 1 hit.
    SUPFAM: SSF49854. SSF49854. 2 hits.
    SSF50494. SSF50494. 1 hit.
    SSF57535. SSF57535. 1 hit.
    PROSITE: PS00010. ASX_HYDROXYL. 1 hit.
    PS01180. CUB. 2 hits.
    PS01186. EGF_2. 1 hit.
    PS01187. EGF_CA. 1 hit.
    PS50923. SUSHI. 2 hits.
    PS50240. TRYPSIN_DOM. 1 hit.
    PS00134. TRYPSIN_HIS. 1 hit.
    PS00135. TRYPSIN_SER. 1 hit.

    Sequences (3)

    Sequence status: Complete.

    Sequence processing: The displayed sequence is further processed into a mature form.

    This entry describes 3 isoforms produced by alternative splicing.

    Isoform 1 (identifier: Q8CHN8-1)
    Also known as: MASP-1

    This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

            10         20         30         40         50
    MRFLSFRRLL LYHVLCLTLT EVSAHTVELN EMFGQIQSPG YPDSYPSDSE
            60         70         80         90        100
    VTWNITVPEG FRVQLYFMHF NLESSYLCEY DYVKVETEDQ VLATFCGRET
           110        120        130        140        150
    TDTEQTPGQE VVLSPGSFMS VTFRSDFSNE ERFTGFDAHY MAVDVDECKE
           160        170        180        190        200
    REDEELSCDH YCHNYIGGYY CSCRFGYILH TDNRTCRVEC SGNLFTQRTG
           210        220        230        240        250
    TITSPDYPNP YPKSSECSYT IDLEEGFMVT LQFEDIFDIE DHPEVPCPYD
           260        270        280        290        300
    YIKIKAGSKV WGPFCGEKSP EPISTQSHSI QILFRSDNSG ENRGWRLSYR
           310        320        330        340        350
    AAGNECPKLQ PPVYGKIEPS QAVYSFKDQV LISCDTGYKV LKDNEVMDTF
           360        370        380        390        400
    QIECLKDGAW SNKIPTCKIV DCGVPAVLKH GLVTFSTRNN LTTYKSEIRY
           410        420        430        440        450
    SCQQPYYKML HNTTGVYTCS AHGTWTNEVL KRSLPTCLPV CGLPKFSRKH
           460        470        480        490        500
    ISRIFNGRPA QKGTTPWIAM LSQLNGQPFC GGSLLGSNWV LTAAHCLHHP
           510        520        530        540        550
    LDPEEPILHN SHLLSPSDFK IIMGKHWRRR SDEDEQHLHV KHIMLHPLYN
           560        570        580        590        600
    PSTFENDLGL VELSESPRLN DFVMPVCLPE HPSTEGTMVI VSGWGKQFLQ
           610        620        630        640        650
    RLPENLMEIE IPIVNYHTCQ EAYTPLGKKV TQDMICAGEK EGGKDACAGD
           660        670        680        690        700
    SGGPMVTKDA ERDQWYLVGV VSWGEDCGKK DRYGVYSYIY PNKDWIQRVT
    GVRN
    Length: 704
    Mass (Da): 80,097
    Last modified: April 14, 2009 - v2
    Checksum: 3CB61ED661967127
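Because every positional feature in this entry refers to the canonical isoform 1 sequence, the annotations can be cross-checked mechanically. This Python sketch embeds the isoform 1 sequence and verifies that the charge-relay positions carry His-495, Asp-557 and Ser-651, that every disulfide-bonded position is a cysteine, and it prints the residues on either side of the autolysis site (453-454). The constant names and the `check_annotations` helper are illustrative; the check confirms internal consistency only and predicts nothing new.

```python
# Canonical isoform 1 (Q8CHN8-1) sequence, copied from the entry; whitespace is ignored.
CANONICAL = "".join("""
MRFLSFRRLL LYHVLCLTLT EVSAHTVELN EMFGQIQSPG YPDSYPSDSE
VTWNITVPEG FRVQLYFMHF NLESSYLCEY DYVKVETEDQ VLATFCGRET
TDTEQTPGQE VVLSPGSFMS VTFRSDFSNE ERFTGFDAHY MAVDVDECKE
REDEELSCDH YCHNYIGGYY CSCRFGYILH TDNRTCRVEC SGNLFTQRTG
TITSPDYPNP YPKSSECSYT IDLEEGFMVT LQFEDIFDIE DHPEVPCPYD
YIKIKAGSKV WGPFCGEKSP EPISTQSHSI QILFRSDNSG ENRGWRLSYR
AAGNECPKLQ PPVYGKIEPS QAVYSFKDQV LISCDTGYKV LKDNEVMDTF
QIECLKDGAW SNKIPTCKIV DCGVPAVLKH GLVTFSTRNN LTTYKSEIRY
SCQQPYYKML HNTTGVYTCS AHGTWTNEVL KRSLPTCLPV CGLPKFSRKH
ISRIFNGRPA QKGTTPWIAM LSQLNGQPFC GGSLLGSNWV LTAAHCLHHP
LDPEEPILHN SHLLSPSDFK IIMGKHWRRR SDEDEQHLHV KHIMLHPLYN
PSTFENDLGL VELSESPRLN DFVMPVCLPE HPSTEGTMVI VSGWGKQFLQ
RLPENLMEIE IPIVNYHTCQ EAYTPLGKKV TQDMICAGEK EGGKDACAGD
SGGPMVTKDA ERDQWYLVGV VSWGEDCGKK DRYGVYSYIY PNKDWIQRVT
GVRN
""".split())

# Positions taken from the entry's feature tables (1-based).
CHARGE_RELAY = {495: "H", 557: "D", 651: "S"}
DISULFIDES = [(78, 96), (148, 162), (158, 171), (173, 186), (190, 217),
              (247, 265), (306, 354), (334, 367), (372, 419), (402, 437),
              (441, 577), (480, 496), (619, 636), (647, 677)]
AUTOLYSIS_SITE = (453, 454)
SIGNAL_PEPTIDE_END = 24


def residue(position: int) -> str:
    """Return the residue at a 1-based position of the canonical sequence."""
    if not 1 <= position <= len(CANONICAL):
        raise IndexError(f"position {position} outside 1..{len(CANONICAL)}")
    return CANONICAL[position - 1]


def check_annotations() -> list[str]:
    """List mismatches between the annotations and the sequence (empty if consistent)."""
    problems = []
    if len(CANONICAL) != 704:
        problems.append(f"length is {len(CANONICAL)}, expected 704")
    for pos, expected in CHARGE_RELAY.items():
        if residue(pos) != expected:
            problems.append(f"charge relay {pos}: {residue(pos)} != {expected}")
    for a, b in DISULFIDES:
        if residue(a) != "C" or residue(b) != "C":
            problems.append(f"disulfide {a}-{b}: {residue(a)}/{residue(b)}")
    return problems


print("signal peptide:", CANONICAL[:SIGNAL_PEPTIDE_END])
print("autolysis site:", "-".join(residue(p) for p in AUTOLYSIS_SITE))
print("problems:", check_annotations() or "none")
```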

    Isoform 2 (identifier: Q8CHN8-2)
    Also known as: MASP-3

    The sequence of this isoform differs from the canonical sequence as follows:
         443-443: L → QPSRALPNLV...RGVRELQVER
         444-704: Missing.

    Note: Glycosylated on Asn-538 and Asn-604 (By similarity).
    Length: 733
    Mass (Da): 82,481
    Checksum: 8C71515F216A4365

    Isoform 3 (identifier: Q8CHN8-3)

    The sequence of this isoform differs from the canonical sequence as follows:
         369-385: IVDCGVPAVLKHGLVTF → KSEIDLEEELESEQVAE
         386-704: Missing.

    Length: 385
    Mass (Da): 44,198
    Checksum: 921238D779AF1F05

    Experimental Info

    Feature key        Position(s)  Length  Evidence  Description
    Sequence conflict  60           1       Curated   Missing in CAD29746 (PubMed:12847554).
    Sequence conflict  232          1       Curated   Q → H in CAD29746 (PubMed:12847554).

    Alternative sequence

    Feature key           Identifier  Position(s)  Length  Evidence       Description
    Alternative sequence  VSP_036816  369 – 385    17      1 Publication  IVDCG…GLVTF → KSEIDLEEELESEQVAE in isoform 3.
    Alternative sequence  VSP_036817  386 – 704    319     1 Publication  Missing in isoform 3.
    Alternative sequence  VSP_036818  443          1       1 Publication  L → QPSRALPNLV...RGVRELQVER in isoform 2 (full replacement below).
    Alternative sequence  VSP_036819  444 – 704    261     1 Publication  Missing in isoform 2.

    VSP_036818 replacement sequence (isoform 2, replaces Leu-443):
    QPSRALPNLVKRIIGGRNAE LGLFPWQALIVVEDTSRIPN DKWFGSGALLSESWILTAAH
    VLRSQRRDNTVIPVSKDHVT VYLGLHDVRDKSGAVNSSAA RVVLHPDFNIQNYNHDIALV
    QLQEPVPLGAHVMPICLPRP EPEGPAPHMLGLVAGWGISN PNVTVDEIIISGTRTLSDVL
    QYVKLPVVSHAECKASYESR SGNYSVTENMFCAGYYEGGK DTCLGDSGGAFVIFDEMSQR
    WVAQGLVSWGGPEECGSKQV YGVYTKVSNYVDWLLEEMNS PRGVRELQVER
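The alternative-sequence features are edits expressed in canonical coordinates, so each isoform length can be recomputed from them: isoform 3 keeps residues 1-368 and appends the 17-residue replacement (368 + 17 = 385), and isoform 2 keeps residues 1-442 and appends the 291-residue replacement for Leu-443 (442 + 291 = 733). This Python sketch applies the edits from the C-terminal end first so that the remaining canonical coordinates stay valid. To keep it short it uses a placeholder canonical sequence of 704 "X" characters, which is enough to check lengths and the inserted segments; `apply_variants` and the variable names are illustrative.

```python
# Replacement for VSP_036818 (isoform 2), copied from the entry; whitespace is ignored.
VSP_036818 = "".join("""
QPSRALPNLVKRIIGGRNAE LGLFPWQALIVVEDTSRIPN DKWFGSGALLSESWILTAAH
VLRSQRRDNTVIPVSKDHVT VYLGLHDVRDKSGAVNSSAA RVVLHPDFNIQNYNHDIALV
QLQEPVPLGAHVMPICLPRP EPEGPAPHMLGLVAGWGISN PNVTVDEIIISGTRTLSDVL
QYVKLPVVSHAECKASYESR SGNYSVTENMFCAGYYEGGK DTCLGDSGGAFVIFDEMSQR
WVAQGLVSWGGPEECGSKQV YGVYTKVSNYVDWLLEEMNS PRGVRELQVER
""".split())

# (start, end, replacement) in 1-based, inclusive canonical coordinates;
# an empty replacement encodes "Missing".
ISOFORM_2 = [(443, 443, VSP_036818), (444, 704, "")]
ISOFORM_3 = [(369, 385, "KSEIDLEEELESEQVAE"), (386, 704, "")]


def apply_variants(sequence: str, variants: list[tuple[int, int, str]]) -> str:
    """Build an isoform by applying non-overlapping edits to a canonical sequence.

    Edits are applied from the C-terminal end first, so the canonical
    coordinates of the edits still to be applied remain valid.
    """
    spans = sorted(variants)
    for (_, prev_end, _), (start, _, _) in zip(spans, spans[1:]):
        if start <= prev_end:
            raise ValueError("variant spans overlap")
    result = sequence
    for start, end, replacement in reversed(spans):
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"span {start}-{end} outside the sequence")
        result = result[: start - 1] + replacement + result[end:]
    return result


# Placeholder canonical sequence: only lengths and inserted segments are meaningful.
canonical = "X" * 704
isoform_2 = apply_variants(canonical, ISOFORM_2)
isoform_3 = apply_variants(canonical, ISOFORM_3)
print(len(VSP_036818), len(isoform_2), len(isoform_3))
# Asn-538 and Asn-604 (isoform 2 numbering) fall inside the inserted segment.
print(isoform_2[538 - 1], isoform_2[604 - 1])
```

With the real canonical sequence substituted for the placeholder, the same function would yield the full isoform sequences.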

    Sequence databases

    AJ457084 mRNA. Translation: CAD29746.1.
    AJ487624 mRNA. Translation: CAD32173.1.
    BC085685 mRNA. Translation: AAH85685.1.
    AJ277423 mRNA. Translation: CAB89695.1.
    AF004661 mRNA. Translation: AAB65832.1.
    RefSeq: NP_071593.1. NM_022257.1.
    XP_006248588.1. XM_006248526.3. [Q8CHN8-2]
    XP_008767026.1. XM_008768804.2. [Q8CHN8-3]
    UniGene: Rn.203236.

    Genome annotation databases

    Ensembl: ENSRNOT00000047678; ENSRNOP00000044812; ENSRNOG00000001827. [Q8CHN8-3]
    GeneID: 64023.
    KEGG: rno:64023.
    UCSC: RGD:620213. rat. [Q8CHN8-1]

    Keywords - Coding sequence diversity

    Alternative splicing

    Cross-references

    Organism-specific databases

    CTD: 5648.

    Miscellaneous databases

    PRO: Q8CHN8.

    Entry information

    Entry name: MASP1_RAT
    Accession: Primary (citable) accession number: Q8CHN8
    Secondary accession number(s): O09020, Q5U365, Q8CG41, Q9JJS9
    Entry history
    Integrated into UniProtKB/Swiss-Prot: April 14, 2009
    Last sequence update: April 14, 2009
    Last modified: November 30, 2016
    This is version 106 of the entry and version 2 of the sequence.
    Entry status: Reviewed (UniProtKB/Swiss-Prot)
    Annotation program: Chordata Protein Annotation Program

    Miscellaneous

    Keywords - Technical term

    3D-structure, Complete proteome, Reference proteome

    Documents

    1. PDB cross-references
      Index of Protein Data Bank (PDB) cross-references
    2. Peptidase families
      Classification of peptidase families and list of entries
    3. SIMILARITY comments
      Index of protein domains and families

    Similar proteins

    Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
    100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
    90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
    50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
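The UniRef90 and UniRef50 rules combine two thresholds: a minimum sequence identity to the seed and at least 80% overlap with it. This Python sketch applies both cut-offs to a pairwise alignment supplied from elsewhere. The `PairwiseAlignment` type is hypothetical, and the exact definitions used here (identity counted over aligned, ungapped columns; overlap as the fraction of seed residues covered) are assumptions for illustration, not UniProt's published clustering implementation. The example seed is the first 20 residues of MASP1; the member sequence is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseAlignment:
    """Hypothetical container: a member sequence aligned to a cluster seed.

    Both strings have the same length; '-' marks a gap.
    """
    member: str
    seed: str


def identity_and_overlap(aln: PairwiseAlignment) -> tuple[float, float]:
    """Return (identity, overlap) under the assumed definitions."""
    if len(aln.member) != len(aln.seed):
        raise ValueError("aligned strings must have equal length")
    pairs = [(m, s) for m, s in zip(aln.member, aln.seed) if m != "-" and s != "-"]
    seed_length = sum(c != "-" for c in aln.seed)
    if not pairs or seed_length == 0:
        return 0.0, 0.0
    identical = sum(m == s for m, s in pairs)
    # Assumption: identity over aligned columns; overlap = fraction of seed covered.
    return identical / len(pairs), len(pairs) / seed_length


def meets_threshold(aln: PairwiseAlignment, min_identity: float,
                    min_overlap: float = 0.80) -> bool:
    """True if the member passes a UniRef-style identity and overlap cut-off."""
    identity, overlap = identity_and_overlap(aln)
    return identity >= min_identity and overlap >= min_overlap


# Invented member: one mismatch at column 18, last two seed residues uncovered.
aln = PairwiseAlignment(member="MRFLSFRRLLLYHVLCLA--", seed="MRFLSFRRLLLYHVLCLTLT")
print(identity_and_overlap(aln))                # identity 17/18, overlap 18/20
print(meets_threshold(aln, min_identity=0.90))  # UniRef90-style cut-off
print(meets_threshold(aln, min_identity=0.50))  # UniRef50-style cut-off
```

Passing `min_identity` as a parameter lets the same check express both cluster levels, since only the identity threshold differs between UniRef90 and UniRef50.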