Protein

Calpain-10

Gene

CAPN10

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Calcium-regulated non-lysosomal thiol-protease that catalyzes limited proteolysis of substrates involved in cytoskeletal remodeling and signal transduction. May play a role in insulin-stimulated glucose uptake. (1 publication)

Catalytic activity

Broad endopeptidase specificity.

Sites

Feature key    Position(s)    Description      Length
Active site    73             By similarity    1
Active site    238            By similarity    1
Active site    263            By similarity    1
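
The active-site positions above are 1-based coordinates on the canonical isoform (Q9HC96-1), whose sequence is listed further down in this entry. As a minimal sketch, the residues at those positions can be looked up directly. The helper name `residue_at` is hypothetical, and only the first 300 residues of the canonical sequence are copied here because all three sites fall within that range. Calpains are cysteine proteases with a Cys/His/Asn catalytic triad, which is what the lookup returns for these positions.

```python
# First 300 residues of the canonical isoform (Q9HC96-1), copied from the
# sequence listing in this entry. All three active sites fall in this range.
CANONICAL_PREFIX = (
    "MRAGRGATPARELFRDAAFPAADSSLFCDLSTPLAQFREDITWRRPQEIC"
    "ATPRLFPDDPREGQVKQGLLGDCWFLCACAALQKSRHLLDQVIPPGQPSW"
    "ADQEYRGSFTCRIWQFGRWVEVTTDDRLPCLAGRLCFSRCQREDVFWLPL"
    "LEKVYAKVHGSYEHLWAGQVADALVDLTGGLAERWNLKGVAGSGGQQDRP"
    "GRWEHRTCRQLLHLKDQCLISCCVLSPRAGARELGEFHAFIVSDLRELQG"
    "QAGQCILLLRIQNPWGRRCWQGLWREGGEGWSQVDAAVASELLSQLQEGE"
)

# 1-based positions taken from the Sites table.
ACTIVE_SITES = (73, 238, 263)


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position (hypothetical helper)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


active_site_residues = {pos: residue_at(CANONICAL_PREFIX, pos) for pos in ACTIVE_SITES}
print(active_site_residues)
```

The same 1-based lookup applies to any positional feature in this entry, provided the full canonical sequence is used for positions beyond 300.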

GO - Molecular function

  • calcium-dependent cysteine-type endopeptidase activity Source: BHF-UCL
  • cytoskeletal protein binding Source: BHF-UCL
  • SNARE binding Source: BHF-UCL

GO - Biological process

  • actin cytoskeleton reorganization Source: BHF-UCL
  • cellular component disassembly involved in execution phase of apoptosis Source: BHF-UCL
  • cellular response to insulin stimulus Source: BHF-UCL
  • positive regulation of glucose import Source: BHF-UCL
  • positive regulation of insulin secretion Source: BHF-UCL
  • positive regulation of intracellular transport Source: BHF-UCL
  • positive regulation of type B pancreatic cell apoptotic process Source: Ensembl
  • proteolysis Source: BHF-UCL
  • type B pancreatic cell apoptotic process Source: BHF-UCL

Keywords - Molecular function

Hydrolase, Protease, Thiol protease

Enzyme and pathway databases

BioCyc: ZFISH:ENSG00000142330-MONOMER.
BRENDA: 3.4.22.B30. 2681.
Reactome: R-HSA-1474228. Degradation of the extracellular matrix.

Protein family/group databases

MEROPS: C02.018.

Names & Taxonomy

Protein names
Recommended name:
Calpain-10 (EC:3.4.22.-)
Alternative name(s):
Calcium-activated neutral proteinase 10
Short name:
CANP 10
Gene names
Name: CAPN10
Synonyms: KIAA1845
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 2

Organism-specific databases

HGNC: HGNC:1477. CAPN10.

Subcellular location

GO - Cellular component

  • cell Source: BHF-UCL
  • cytosol Source: BHF-UCL
  • mitochondrion Source: BHF-UCL
  • plasma membrane Source: BHF-UCL

Pathology & Biotech

Involvement in disease

Diabetes mellitus, non-insulin-dependent, 1 (NIDDM1) (2 publications)
Disease susceptibility is associated with variations affecting the gene represented in this entry.
Disease description: A multifactorial disorder of glucose homeostasis caused by a lack of sensitivity to the body's own insulin. Affected individuals usually have an obese body habitus and manifestations of a metabolic syndrome characterized by diabetes, insulin resistance, hypertension and hypertriglyceridemia. The disease results in long-term complications that affect the eyes, kidneys, nerves, and blood vessels.
See also OMIM:601283

Keywords - Disease

Diabetes mellitus

Organism-specific databases

DisGeNET: 11132.
MalaCards: CAPN10.
MIM: 601283. phenotype.
OpenTargets: ENSG00000142330.
PharmGKB: PA26058.

Polymorphism and mutation databases

BioMuta: CAPN10.
DMDM: 317373329.

PTM / Processing

Molecule processing

Feature key    Identifier        Position(s)    Description    Length
Chain          PRO_0000207725    1 – 672        Calpain-10     672

Proteomic databases

PaxDb: Q9HC96.
PRIDE: Q9HC96.

PTM databases

iPTMnet: Q9HC96.
PhosphoSitePlus: Q9HC96.

Expression

Tissue specificity

Detected in primary skeletal muscle cells (at protein level). Ubiquitous. (1 publication)

Gene expression databases

Bgee: ENSG00000142330.
ExpressionAtlas: Q9HC96. baseline and differential.
Genevisible: Q9HC96. HS.

Organism-specific databases

HPA: HPA004170.
HPA056098.

Interaction

Protein-protein interaction databases

BioGrid: 116305. 5 interactors.
IntAct: Q9HC96. 5 interactors.
MINT: MINT-104261.
STRING: 9606.ENSP00000375844.

Structure

3D structure databases

ProteinModelPortal: Q9HC96.

Family & Domains

Domains and Repeats

Feature key    Position(s)    Description                                       Length
Domain         13 – 321       Calpain catalytic (PROSITE-ProRule annotation)    309

Region

Feature key    Position(s)    Description     Length
Region         322 – 494      Domain III 1    173
Region         513 – 654      Domain III 2    142

Sequence similarities

Belongs to the peptidase C2 family. (Curated)
Contains 1 calpain catalytic domain. (PROSITE-ProRule annotation)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: KOG0045. Eukaryota.
ENOG410XP0B. LUCA.
GeneTree: ENSGT00760000118971.
HOVERGEN: HBG050787.
InParanoid: Q9HC96.
KO: K08579.
OMA: RYAQEVS.
OrthoDB: EOG091G02O4.
PhylomeDB: Q9HC96.
TreeFam: TF314748.

Family and domain databases

CDD: cd00214. Calpain_III. 2 hits.
cd00044. CysPc. 1 hit.
InterPro: IPR033883. C2_III.
IPR022684. Calpain_cysteine_protease.
IPR022682. Calpain_domain_III.
IPR022683. Calpain_III.
IPR028791. CAPN10.
IPR000169. Pept_cys_AS.
IPR001300. Peptidase_C2_calpain_cat.
PANTHER: PTHR10183:SF30. PTHR10183:SF30. 2 hits.
Pfam: PF01067. Calpain_III. 2 hits.
PF00648. Peptidase_C2. 1 hit.
PRINTS: PR00704. CALPAIN.
SMART: SM00720. calpain_III. 2 hits.
SM00230. CysPc. 1 hit.
SUPFAM: SSF49758. SSF49758. 2 hits.
PROSITE: PS50203. CALPAIN_CAT. 1 hit.
PS00139. THIOL_PROTEASE_CYS. 1 hit.

Sequences (8)

Sequence status: Complete.

This entry describes 8 isoforms produced by alternative splicing.

Isoform A (identifier: Q9HC96-1)
Also known as: CAPN10a

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRAGRGATPA RELFRDAAFP AADSSLFCDL STPLAQFRED ITWRRPQEIC
        60         70         80         90        100
ATPRLFPDDP REGQVKQGLL GDCWFLCACA ALQKSRHLLD QVIPPGQPSW
       110        120        130        140        150
ADQEYRGSFT CRIWQFGRWV EVTTDDRLPC LAGRLCFSRC QREDVFWLPL
       160        170        180        190        200
LEKVYAKVHG SYEHLWAGQV ADALVDLTGG LAERWNLKGV AGSGGQQDRP
       210        220        230        240        250
GRWEHRTCRQ LLHLKDQCLI SCCVLSPRAG ARELGEFHAF IVSDLRELQG
       260        270        280        290        300
QAGQCILLLR IQNPWGRRCW QGLWREGGEG WSQVDAAVAS ELLSQLQEGE
       310        320        330        340        350
FWVEEEEFLR EFDELTVGYP VTEAGHLQSL YTERLLCHTR ALPGAWVKGQ
       360        370        380        390        400
SAGGCRNNSG FPSNPKFWLR VSEPSEVYIA VLQRSRLHAA DWAGRARALV
       410        420        430        440        450
GDSHTSWSPA SIPGKHYQAV GLHLWKVEKR RVNLPRVLSM PPVAGTACHA
       460        470        480        490        500
YDREVHLRCE LSPGYYLAVP STFLKDAPGE FLLRVFSTGR VSLSAIRAVA
       510        520        530        540        550
KNTTPGAALP AGEWGTVQLR GSWRVGQTAG GSRNFASYPT NPCFPFSVPE
       560        570        580        590        600
GPGPRCVRIT LHQHCRPSDT EFHPIGFHIF QVPEGGRSQD APPLLLQEPL
       610        620        630        640        650
LSCVPHRYAQ EVSRLCLLPA GTYKVVPSTY LPDTEGAFTV TIATRIDRPS
       660        670
IHSQEMLGQF LQEVSIMAVM KT
Length: 672
Mass (Da): 74,952
Last modified: January 11, 2011 - v2
Checksum: 74A48D879E896C71

Isoform B (identifier: Q9HC96-2)
Also known as: CAPN10b

The sequence of this isoform differs from the canonical sequence as follows:
     494-544: SAIRAVAKNT...FASYPTNPCF → RALAPAASAS...HPHCCCRSRC
     545-672: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay.
Length: 544
Mass (Da): 60,657
Checksum: 17CE7B881A20855E

Isoform C (identifier: Q9HC96-3)
Also known as: CAPN10c

The sequence of this isoform differs from the canonical sequence as follows:
     428-582: Missing.

Length: 517
Mass (Da): 57,999
Checksum: 8D81A0FA44993180

Isoform D (identifier: Q9HC96-4)
Also known as: CAPN10d

The sequence of this isoform differs from the canonical sequence as follows:
     494-513: SAIRAVAKNTTPGAALPAGE → RSQRVEGARTHPHCCCRSRC
     514-672: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay.
Length: 513
Mass (Da): 57,816
Checksum: C66DC853F87AEC9C

Isoform E (identifier: Q9HC96-5)
Also known as: CAPN10e

The sequence of this isoform differs from the canonical sequence as follows:
     427-444: VEKRRVNLPRVLSMPPVA → GVTLGTTLFPVPSWMWPT
     445-672: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay.
Length: 444
Mass (Da): 49,994
Checksum: 32E4706B7D06D58E

Isoform F (identifier: Q9HC96-6)
Also known as: CAPN10f

The sequence of this isoform differs from the canonical sequence as follows:
     154-274: VYAKVHGSYE...WGRRCWQGLW → GPWVLRAPVG...VLAGALERGG
     275-672: Missing.

Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay.
Length: 274
Mass (Da): 29,453
Checksum: 570C2A6CB9B6B903

Isoform G (identifier: Q9HC96-7)
Also known as: CAPN10g

The sequence of this isoform differs from the canonical sequence as follows:
     93-139: IPPGQPSWAD...CLAGRLCFSR → SCPVQLPADW...PDSATWGSWK
     140-672: Missing.

Length: 139
Mass (Da): 15,583
Checksum: C934A0FD16963EC6

Isoform H (identifier: Q9HC96-8)
Also known as: CAPN10h

The sequence of this isoform differs from the canonical sequence as follows:
     48-581: Missing.

Length: 138
Mass (Da): 15,275
Checksum: 28589100F645B66B

Sequence caution

The sequence BAB47474 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)

Experimental Info

Feature key          Position(s)    Length    Description
Sequence conflict    195            1         G → S in BAC11220 (PubMed:14702039). Curated
Sequence conflict    373            1         E → K in AAG17969 (PubMed:11017071). Curated
Sequence conflict    373            1         E → K in AAG17971 (PubMed:11017071). Curated

Natural variant

Feature key        Identifier    Position(s)    Length    Description
Natural variant    VAR_014437    200            1         P → T. 1 publication. Corresponds to variant rs3792268 (dbSNP, Ensembl).
Natural variant    VAR_014438    202            1         R → H. 1 publication. Corresponds to variant rs768407925 (dbSNP, Ensembl).
Natural variant    VAR_036049    276            1         E → G in a colorectal cancer sample; somatic mutation. 1 publication.
Natural variant    VAR_014439    341            1         A → V. 1 publication. Corresponds to variant rs776848131 (dbSNP, Ensembl).
Natural variant    VAR_014440    504            1         T → A. 2 publications. Corresponds to variant rs7607759 (dbSNP, Ensembl).
Natural variant    VAR_014441    529            1         A → S. 1 publication.
Natural variant    VAR_014442    613            1         S → N. 1 publication. Corresponds to variant rs146148004 (dbSNP, Ensembl).
Natural variant    VAR_014443    666            1         I → V. 4 publications. Corresponds to variant rs2975766 (dbSNP, Ensembl).

Alternative sequence

Feature key             Identifier    Position(s)    Length    Description
Alternative sequence    VSP_005243    48 – 581       534       Missing in isoform H. Curated
Alternative sequence    VSP_005241    93 – 139       47        IPPGQ…LCFSR → SCPVQLPADWTCKVQPVWLEFPCLPISCRLRVSSDTSPDSATWGSWK in isoform G. Curated
Alternative sequence    VSP_005242    140 – 672      533       Missing in isoform G. Curated
Alternative sequence    VSP_005239    154 – 274      121       VYAKV…WQGLW → GPWVLRAPVGRAGGGCPGGPDRRPGRKMEPEGRSRKRRPAGQARPLGAQDLSAAAPPEGPVSDQLLRAQPQSRCPGAGGVPCLHCLGPAGAPGSGGPVHPAAADPEPLGPAVLAGALERGG in isoform F. Curated
Alternative sequence    VSP_005240    275 – 672      398       Missing in isoform F. Curated
Alternative sequence    VSP_005237    427 – 444      18        VEKRR…MPPVA → GVTLGTTLFPVPSWMWPT in isoform E. Curated
Alternative sequence    VSP_005234    428 – 582      155       Missing in isoform C. Curated
Alternative sequence    VSP_005238    445 – 672      228       Missing in isoform E. Curated
Alternative sequence    VSP_005232    494 – 544      51        SAIRA…TNPCF → RALAPAASASLCISTAGPVTPSSTPSASISSRSQRVEGARTHPHCCCRSRC in isoform B. Curated
Alternative sequence    VSP_005235    494 – 513      20        SAIRA…LPAGE → RSQRVEGARTHPHCCCRSRC in isoform D. Curated
Alternative sequence    VSP_005236    514 – 672      159       Missing in isoform D. Curated
Alternative sequence    VSP_005233    545 – 672      128       Missing in isoform B. Curated
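
Each isoform in this entry is described as a set of edits (a replacement or a "Missing" range) against the canonical sequence. A minimal sketch of how such edits could be applied follows. The function name `apply_edits` and the tuple layout are assumptions made for illustration, and a placeholder string of the canonical length (672) stands in for the real sequence, since only the resulting lengths are checked against the values listed for isoforms C, D, E and H.

```python
from typing import Dict, List, Optional, Tuple

# (start, end, replacement): 1-based inclusive coordinates on the canonical
# sequence; replacement=None stands for a "Missing" range.
Edit = Tuple[int, int, Optional[str]]


def apply_edits(canonical: str, edits: List[Edit]) -> str:
    """Apply alternative-sequence edits to a canonical sequence (illustrative)."""
    result = canonical
    # Apply edits from the C-terminal end backwards so that the coordinates
    # of edits closer to the N-terminus are not shifted by earlier changes.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        result = result[: start - 1] + (replacement or "") + result[end:]
    return result


# Placeholder with the canonical length; substitute the real sequence to
# obtain actual isoform sequences rather than just their lengths.
CANONICAL_PLACEHOLDER = "X" * 672

ISOFORM_EDITS: Dict[str, List[Edit]] = {
    "C": [(428, 582, None)],
    "D": [(494, 513, "RSQRVEGARTHPHCCCRSRC"), (514, 672, None)],
    "E": [(427, 444, "GVTLGTTLFPVPSWMWPT"), (445, 672, None)],
    "H": [(48, 581, None)],
}

isoform_lengths = {
    name: len(apply_edits(CANONICAL_PLACEHOLDER, edits))
    for name, edits in ISOFORM_EDITS.items()
}
print(isoform_lengths)
```

The computed lengths agree with the Length values given for these isoforms in the Sequences section (517, 513, 444 and 138).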

Sequence databases

EMBL / GenBank / DDBJ:
AF089088 mRNA. Translation: AAG17966.1.
AF089090 mRNA. Translation: AAG17968.1.
AF089091 mRNA. Translation: AAG17969.1.
AF089092 mRNA. Translation: AAG17970.1.
AF089093 mRNA. Translation: AAG17971.1.
AF089094 mRNA. Translation: AAG17972.1.
AF089095 mRNA. Translation: AAG17973.1.
AF089096 mRNA. Translation: AAG17974.1.
AB058748 mRNA. Translation: BAB47474.1. Different initiation.
AK074807 mRNA. Translation: BAC11220.1.
AC124862 Genomic DNA. Translation: AAX88944.1.
BC004260 mRNA. Translation: AAH04260.1.
BC007553 mRNA. Translation: AAH07553.2.
CCDS: CCDS33420.1. [Q9HC96-3]
CCDS42838.1. [Q9HC96-1]
RefSeq: NP_075571.1. NM_023083.3.
NP_075573.2. NM_023085.3.
UniGene: Hs.728234.

Genome annotation databases

Ensembl: ENST00000270361; ENSP00000270361; ENSG00000142330. [Q9HC96-6]
ENST00000270364; ENSP00000270364; ENSG00000142330. [Q9HC96-7]
ENST00000352879; ENSP00000289381; ENSG00000142330. [Q9HC96-8]
ENST00000354082; ENSP00000270362; ENSG00000142330. [Q9HC96-3]
ENST00000357048; ENSP00000349556; ENSG00000142330. [Q9HC96-4]
ENST00000391983; ENSP00000375843; ENSG00000142330. [Q9HC96-2]
ENST00000391984; ENSP00000375844; ENSG00000142330. [Q9HC96-1]
ENST00000416591; ENSP00000400144; ENSG00000142330. [Q9HC96-5]
GeneID: 11132.
KEGG: hsa:11132.
UCSC: uc002vzk.2. human. [Q9HC96-1]
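
Each Ensembl cross-reference above ties one transcript, protein and gene identifier to a specific isoform of this entry. The following is a rough sketch of how those lines could be turned into a lookup table. The function name `isoform_to_ensembl` and the regular expression are assumptions based on the layout of the lines shown here, not an official parser, and only three of the eight lines are copied for brevity.

```python
import re
from typing import Dict

# Three of the Ensembl cross-reference lines from this entry.
ENSEMBL_LINES = """\
ENST00000270361; ENSP00000270361; ENSG00000142330. [Q9HC96-6]
ENST00000354082; ENSP00000270362; ENSG00000142330. [Q9HC96-3]
ENST00000391984; ENSP00000375844; ENSG00000142330. [Q9HC96-1]
"""

# Assumed layout: transcript; protein; gene. [isoform identifier]
ENSEMBL_PATTERN = re.compile(
    r"(ENST\d{11}); (ENSP\d{11}); (ENSG\d{11})\. \[([A-Z0-9]+-\d+)\]"
)


def isoform_to_ensembl(text: str) -> Dict[str, Dict[str, str]]:
    """Map each isoform identifier to its Ensembl transcript, protein and gene."""
    mapping: Dict[str, Dict[str, str]] = {}
    for transcript, protein, gene, isoform in ENSEMBL_PATTERN.findall(text):
        mapping[isoform] = {"transcript": transcript, "protein": protein, "gene": gene}
    return mapping


ensembl_by_isoform = isoform_to_ensembl(ENSEMBL_LINES)
print(ensembl_by_isoform["Q9HC96-1"]["protein"])
```

Because the pattern is searched for anywhere in a line, it tolerates a database label in front of the first identifier.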

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

DNASU: 11132.

Organism-specific databases

CTD: 11132.
DisGeNET: 11132.
GeneCards: CAPN10.
HGNC: HGNC:1477. CAPN10.
HPA: HPA004170.
HPA056098.
MalaCards: CAPN10.
MIM: 601283. phenotype.
605286. gene.
neXtProt: NX_Q9HC96.
OpenTargets: ENSG00000142330.
PharmGKB: PA26058.

Miscellaneous databases

ChiTaRS: CAPN10. human.
GeneWiki: CAPN10.
GenomeRNAi: 11132.
PRO: Q9HC96.

Entry information

Entry name: CAN10_HUMAN
Accession: Primary (citable) accession number: Q9HC96
Secondary accession number(s): A8MVS7, Q4ZFV1, Q8NCD4, Q96IG4, Q96JI2, Q9HC89, Q9HC90, Q9HC91, Q9HC92, Q9HC93, Q9HC94, Q9HC95
Entry history
Integrated into UniProtKB/Swiss-Prot: May 27, 2002
Last sequence update: January 11, 2011
Last modified: November 30, 2016
This is version 148 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.
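
The primary and secondary accession numbers listed in this entry follow the UniProtKB accession format, and isoform identifiers append a hyphen and a number to the primary accession. As a small sketch, the check below uses the accession pattern published in the UniProt help pages; the helper names `is_accession` and `split_isoform_id` are hypothetical.

```python
import re
from typing import Tuple

# Accession pattern as documented in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Primary accession followed by the secondary accessions from this entry.
ACCESSIONS = [
    "Q9HC96", "A8MVS7", "Q4ZFV1", "Q8NCD4", "Q96IG4", "Q96JI2", "Q9HC89",
    "Q9HC90", "Q9HC91", "Q9HC92", "Q9HC93", "Q9HC94", "Q9HC95",
]


def is_accession(text: str) -> bool:
    """Return True if the whole string matches the accession pattern."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_isoform_id(identifier: str) -> Tuple[str, int]:
    """Split an isoform identifier such as 'Q9HC96-3' into (accession, number)."""
    accession, _, number = identifier.partition("-")
    if not is_accession(accession) or not number.isdigit():
        raise ValueError(f"not an isoform identifier: {identifier!r}")
    return accession, int(number)


print(all(is_accession(acc) for acc in ACCESSIONS))
print(split_isoform_id("Q9HC96-3"))
```

Note that the gene symbol `CAPN10` and the entry name `CAN10_HUMAN` are different kinds of identifier and do not match this pattern.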

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 2
    Human chromosome 2: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. Peptidase families
    Classification of peptidase families and list of entries
  6. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
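
The UniRef90 and UniRef50 membership criteria described above reduce to two thresholds: a minimum sequence identity to the seed and a minimum overlap with it. The sketch below encodes only that rule. The function name `meets_uniref_criteria` is hypothetical, identity and overlap are assumed to be precomputed fractions between 0 and 1, and UniRef100 is left out because it is defined by identical sequences and sub-fragments rather than by thresholds. It is not the actual UniRef clustering procedure.

```python
from typing import Dict, Tuple

# (minimum identity to the seed, minimum overlap with the seed), as stated
# in the cluster descriptions in this entry.
UNIREF_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "UniRef90": (0.90, 0.80),
    "UniRef50": (0.50, 0.80),
}


def meets_uniref_criteria(level: str, identity: float, overlap: float) -> bool:
    """Check whether a sequence satisfies the stated thresholds for a level.

    identity and overlap are fractions in [0, 1], measured against the
    seed (longest) sequence of the cluster; how they are computed is
    outside the scope of this sketch.
    """
    min_identity, min_overlap = UNIREF_THRESHOLDS[level]
    return identity >= min_identity and overlap >= min_overlap


print(meets_uniref_criteria("UniRef90", identity=0.93, overlap=0.85))
print(meets_uniref_criteria("UniRef90", identity=0.93, overlap=0.60))
print(meets_uniref_criteria("UniRef50", identity=0.55, overlap=0.80))
```

A sequence with high identity but a short aligned region fails the overlap test, which is why a short fragment does not automatically join the cluster of a longer, near-identical sequence at these levels.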