Protein

Serine protease HTRA1

Gene

HTRA1

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Serine protease with a variety of targets, including extracellular matrix proteins such as fibronectin. HTRA1-generated fibronectin fragments further induce synovial cells to up-regulate MMP1 and MMP3 production. May also degrade proteoglycans, such as aggrecan, decorin and fibromodulin. Through cleavage of proteoglycans, may release soluble FGF-glycosaminoglycan complexes that promote the range and intensity of FGF signals in the extracellular space. Regulates the availability of insulin-like growth factors (IGFs) by cleaving IGF-binding proteins. Inhibits signaling mediated by TGF-beta family members. This activity requires the integrity of the catalytic site, although it is unclear whether TGF-beta proteins are themselves degraded. By acting on TGF-beta signaling, may regulate many physiological processes, including retinal angiogenesis and neuronal survival and maturation during development. Intracellularly, degrades TSC2, leading to the activation of TSC2 downstream targets. (3 publications)

Sites

Feature key  Position(s)  Length  Description
Site         169          1       Involved in trimer stabilization (1 publication)
Site         171          1       Involved in trimer stabilization (1 publication)
Active site  220          1       Charge relay system (1 publication)
Active site  250          1       Charge relay system (1 publication)
Site         278          1       Involved in trimer stabilization (1 publication)
Active site  328          1       Charge relay system (1 publication)


Keywords - Molecular function

Hydrolase, Protease, Serine protease

Keywords - Ligand

Growth factor binding

Enzyme and pathway databases

BioCyc: ZFISH:ENSG00000166033-MONOMER.
BRENDA: 3.4.21.108. 2681.
Reactome: R-HSA-1474228. Degradation of the extracellular matrix.

Protein family/group databases

MEROPS: S01.277.

Names & Taxonomy

Protein names
Recommended name:
Serine protease HTRA1 (EC:3.4.21.-)
Alternative name(s):
High-temperature requirement A serine peptidase 1
L56
Serine protease 11
Gene names
Name: HTRA1
Synonyms: HTRA, PRSS11
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 10

Organism-specific databases

HGNC: HGNC:9476. HTRA1.

Subcellular location

GO - Cellular component

  • cytosol Source: UniProtKB-SubCell
  • extracellular exosome Source: UniProtKB
  • extracellular matrix Source: BHF-UCL
  • extracellular region Source: Reactome
  • extracellular space Source: ProtInc
  • plasma membrane Source: HPA

Keywords - Cellular component

Cell membrane, Cytoplasm, Membrane, Secreted

Pathology & Biotech

Involvement in disease

Macular degeneration, age-related, 7 (ARMD7) (2 publications)
Disease susceptibility is associated with variations affecting the gene represented in this entry.
Disease description: A form of age-related macular degeneration, a multifactorial eye disease and the most common cause of irreversible vision loss in the developed world. In most patients, the disease is manifest as ophthalmoscopically visible yellowish accumulations of protein and lipid that lie beneath the retinal pigment epithelium and within an elastin-containing structure known as Bruch membrane.
See also OMIM:610149

Cerebral arteriopathy, autosomal recessive, with subcortical infarcts and leukoencephalopathy (CARASIL) (1 publication)
The disease is caused by mutations affecting the gene represented in this entry.
Disease description: A cerebrovascular disease characterized by non-hypertensive arteriopathy of cerebral small vessels with subcortical infarcts, alopecia, and spondylosis. Small cerebral arteries show arteriosclerotic changes, fibrous intimal proliferation, and hyaline degeneration with splitting of the intima and/or the internal elastic membrane. Neurologic features include progressive dementia, gait disturbances, extrapyramidal and pyramidal signs, and demyelination of the cerebral white matter with sparing of U fibers.
See also OMIM:600142
Feature key      Position(s)  Length  Identifier  Description
Natural variant  252          1       VAR_063148  A → T in CARASIL; has 21 to 50% normal protease activity; is unable to suppress TGF-beta activity. (1 publication; dbSNP rs113993968)
Natural variant  297          1       VAR_063149  V → M in CARASIL; has 21 to 50% normal protease activity; is unable to suppress TGF-beta activity. (1 publication; dbSNP rs113993969)

Cerebral arteriopathy, autosomal dominant, with subcortical infarcts and leukoencephalopathy, 2 (CADASIL2) (1 publication)
The disease is caused by mutations affecting the gene represented in this entry.
Disease description: A cerebrovascular disease characterized by multiple subcortical infarcts, pseudobulbar palsy, dementia, and the presence of granular deposits in small cerebral arteries producing ischemic stroke.
See also OMIM:616779
Feature key      Position(s)  Length  Identifier  Description
Natural variant  121          1       VAR_076373  S → R in CADASIL2. (1 publication)
Natural variant  123          1       VAR_076374  A → S in CADASIL2. (1 publication)
Natural variant  133          1       VAR_076375  R → G in CADASIL2. (1 publication)
Natural variant  166          1       VAR_076376  R → L in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  173          1       VAR_076377  A → P in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  284          1       VAR_076378  S → G in CADASIL2; partial loss of proteolytic activity. (1 publication)
Natural variant  284          1       VAR_076379  S → R in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  285          1       VAR_076380  P → Q in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  286          1       VAR_076381  F → V in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  450          1       VAR_076382  D → H in CADASIL2; unknown pathological significance; small decrease, if any, in proteolytic activity. (1 publication)

Mutagenesis

Feature key  Position(s)  Length  Description
Mutagenesis  328          1       S → A: Loss of activity. (2 publications)

Keywords - Disease

Age-related macular degeneration, Disease mutation

Organism-specific databases

DisGeNET: 5654.
MalaCards: HTRA1.
MIM: 600142. phenotype.
  610149. phenotype.
  616779. phenotype.
OpenTargets: ENSG00000166033.
Orphanet: 279. Age-related macular degeneration.
  199354. CARASIL.
PharmGKB: PA33829.

Polymorphism and mutation databases

BioMuta: HTRA1.
DMDM: 18202620.

PTM / Processing

Molecule processing

Feature key     Position(s)  Length  Description
Signal peptide  1 – 22       22      (evidence: sequence analysis)
Chain           23 – 480     458     Serine protease HTRA1 (PRO_0000026943)

Proteomic databases

EPD: Q92743.
PaxDb: Q92743.
PeptideAtlas: Q92743.
PRIDE: Q92743.

PTM databases

iPTMnet: Q92743.
PhosphoSitePlus: Q92743.

Expression

Tissue specificity

Widely expressed, with strongest expression in placenta (at protein level). Secreted by synovial fibroblasts. Up-regulated in osteoarthritis and rheumatoid arthritis synovial fluids and cartilage as compared with non-arthritic samples (at protein level). (3 publications)

Developmental stage

In the placenta, in the first trimester of gestation, low expression in the cells surrounding villi both in the inner layer of the cytotrophoblast and in the outer layer of the syncytiotrophoblast (at protein level). In the third trimester of gestation, very strong expression in the outer layer forming the syncytiotrophoblast and lower in the cytotrophoblast (at protein level). (1 publication)

Gene expression databases

Bgee: ENSG00000166033.
CleanEx: HS_HTRA1.
ExpressionAtlas: Q92743. baseline and differential.
Genevisible: Q92743. HS.

Organism-specific databases

HPA: HPA036655.

Interaction

Subunit structure

Forms homotrimers. In the presence of substrate, may form higher-order multimers in a PDZ-independent manner. Interacts with TGF-beta family members, including BMP4, TGFB1, TGFB2, activin A and GDF5 (by similarity).

Protein-protein interaction databases

BioGrid: 111635. 17 interactors.
DIP: DIP-33195N.
IntAct: Q92743. 7 interactors.
MINT: MINT-1198897.
STRING: 9606.ENSP00000357980.

Structure

Secondary structure

Feature key  Position(s)  Length  Evidence
Helix        43 – 45      3       Combined sources
Beta strand  57 – 59      3       Combined sources
Beta strand  65 – 68      4       Combined sources
Beta strand  74 – 77      4       Combined sources
Beta strand  87 – 90      4       Combined sources
Beta strand  108 – 113    6       Combined sources
Beta strand  118 – 123    6       Combined sources
Beta strand  125 – 128    4       Combined sources
Helix        129 – 141    13      Combined sources
Helix        165 – 168    4       Combined sources
Helix        171 – 179    9       Combined sources
Helix        180 – 182    3       Combined sources
Beta strand  183 – 191    9       Combined sources
Beta strand  193 – 196    4       Combined sources
Beta strand  198 – 208    11      Combined sources
Turn         211 – 213    3       Combined sources
Beta strand  214 – 218    5       Combined sources
Turn         219 – 221    3       Combined sources
Beta strand  224 – 231    8       Combined sources
Beta strand  233 – 235    3       Combined sources
Beta strand  237 – 246    10      Combined sources
Turn         247 – 250    4       Combined sources
Beta strand  251 – 255    5       Combined sources
Helix        270 – 272    3       Combined sources
Beta strand  278 – 286    9       Combined sources
Beta strand  289 – 299    11      Combined sources
Beta strand  301 – 303    3       Combined sources
Helix        304 – 306    3       Combined sources
Beta strand  317 – 321    5       Combined sources
Turn         325 – 329    5       Combined sources
Beta strand  330 – 333    4       Combined sources
Beta strand  339 – 348    10      Combined sources
Beta strand  351 – 356    6       Combined sources
Helix        357 – 368    12      Combined sources
Beta strand  380 – 382    3       Combined sources
Beta strand  384 – 389    6       Combined sources
Helix        392 – 401    10      Combined sources
Beta strand  411 – 417    7       Combined sources
Beta strand  419 – 421    3       Combined sources
Helix        422 – 426    5       Combined sources
Beta strand  433 – 437    5       Combined sources
Helix        445 – 454    10      Combined sources
Beta strand  456 – 464    9       Combined sources
Beta strand  467 – 473    7       Combined sources
Beta strand  476 – 478    3       Combined sources

3D structure databases

PDBe / RCSB PDB / PDBj:
PDB entry  Method  Resolution (Å)  Chain  Positions
2JOA       NMR     -               A      380-480
2YTW       NMR     -               A      370-480
3NUM       X-ray   2.75            A      158-480
3NWU       X-ray   3.20            A/B/C  158-375
3NZI       X-ray   2.75            A      158-480
3TJN       X-ray   3.00            A/B/D  161-367
3TJO       X-ray   2.30            A/B/D  161-370
3TJQ       X-ray   2.00            A      35-156
ProteinModelPortal: Q92743.
SMR: Q92743.
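
To see which parts of the 480-residue sequence fall inside at least one listed PDB entry, the position ranges in the table above can be merged into non-overlapping intervals. The following is a minimal Python sketch under stated assumptions: the function names `merge_ranges` and `uncovered` are illustrative, not part of any UniProt or PDB tooling, and the ranges are the positions reported in this entry, which need not equal the residues actually resolved in the deposited coordinates.

```python
from typing import List, Tuple

SEQUENCE_LENGTH = 480  # length of Q92743 as given in this entry

# (PDB entry, first residue, last residue), copied from the table above.
PDB_RANGES = [
    ("2JOA", 380, 480),
    ("2YTW", 370, 480),
    ("3NUM", 158, 480),
    ("3NWU", 158, 375),
    ("3NZI", 158, 480),
    ("3TJN", 161, 367),
    ("3TJO", 161, 370),
    ("3TJQ", 35, 156),
]


def merge_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent inclusive ranges."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(merged: List[Tuple[int, int]], length: int) -> List[Tuple[int, int]]:
    """Return the inclusive ranges of 1..length not covered by `merged`."""
    gaps: List[Tuple[int, int]] = []
    next_pos = 1
    for start, end in merged:
        if start > next_pos:
            gaps.append((next_pos, start - 1))
        next_pos = end + 1
    if next_pos <= length:
        gaps.append((next_pos, length))
    return gaps


covered = merge_ranges([(start, end) for _, start, end in PDB_RANGES])
gaps = uncovered(covered, SEQUENCE_LENGTH)

if __name__ == "__main__":
    print("covered:", covered)
    print("not covered:", gaps)
```

With the ranges as listed, the merged intervals are 35-156 and 158-480, so the positions without a listed structure are 1-34 and the single residue 157.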

Miscellaneous databases

EvolutionaryTrace: Q92743.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       33 – 100     68      IGFBP N-terminal (PROSITE-ProRule annotation)
Domain       98 – 157     60      Kazal-like (PROSITE-ProRule annotation)
Domain       365 – 467    103     PDZ (PROSITE-ProRule annotation)

Region

Feature key  Position(s)  Length  Description
Region       204 – 364    161     Serine protease
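
The domain and region boundaries above make it straightforward to ask which annotated feature a given sequence position falls in, for example when placing the disease variants listed in this entry. The following is a minimal Python sketch, not an official UniProt tool: the names `FEATURES` and `features_at` are hypothetical, and the signal peptide range is taken from the Molecule processing table of this entry.

```python
from typing import List, Tuple

# Feature ranges copied from this entry (1-based, inclusive).
FEATURES: List[Tuple[str, int, int]] = [
    ("Signal peptide", 1, 22),
    ("IGFBP N-terminal domain", 33, 100),
    ("Kazal-like domain", 98, 157),
    ("Serine protease region", 204, 364),
    ("PDZ domain", 365, 467),
]


def features_at(position: int) -> List[str]:
    """Return the names of all listed features that contain `position`."""
    return [name for name, start, end in FEATURES if start <= position <= end]


if __name__ == "__main__":
    # Positions taken from the variant and active-site tables of this entry.
    for pos in (99, 121, 166, 328, 450):
        hits = features_at(pos)
        print(pos, ", ".join(hits) if hits else "(no listed feature)")
```

Because the IGFBP N-terminal and Kazal-like domains overlap at positions 98 to 100, the lookup returns a list rather than a single name. Position 166 lies between the Kazal-like domain and the serine protease region, so no listed feature is returned for it.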

Domain

The IGFBP N-terminal domain mediates interaction with TSC2 substrate.

Sequence similarities

Belongs to the peptidase S1C family. (Curated)
Contains 1 IGFBP N-terminal domain. (PROSITE-ProRule annotation)
Contains 1 Kazal-like domain. (PROSITE-ProRule annotation)
Contains 1 PDZ (DHR) domain. (PROSITE-ProRule annotation)

Keywords - Domain

Signal

Phylogenomic databases

eggNOG: KOG1320. Eukaryota.
  COG0265. LUCA.
GeneTree: ENSGT00510000046315.
HOGENOM: HOG000223641.
HOVERGEN: HBG052044.
InParanoid: Q92743.
KO: K08784.
OMA: GLCVCAS.
OrthoDB: EOG091G0LXR.
PhylomeDB: Q92743.
TreeFam: TF323480.

Family and domain databases

Gene3D: 2.30.42.10. 1 hit.
InterPro: IPR009030. Growth_fac_rcpt_.
  IPR000867. IGFBP-like.
  IPR002350. Kazal_dom.
  IPR001478. PDZ.
  IPR009003. Peptidase_S1_PA.
  IPR001940. Peptidase_S1C.
Pfam: PF00219. IGFBP. 1 hit.
  PF07648. Kazal_2. 1 hit.
  PF00595. PDZ. 1 hit.
PRINTS: PR00834. PROTEASES2C.
SMART: SM00121. IB. 1 hit.
  SM00280. KAZAL. 1 hit.
  SM00228. PDZ. 1 hit.
SUPFAM: SSF50156. SSF50156. 1 hit.
  SSF50494. SSF50494. 1 hit.
  SSF57184. SSF57184. 1 hit.
PROSITE: PS51323. IGFBP_N_2. 1 hit.
  PS51465. KAZAL_2. 1 hit.
  PS50106. PDZ. 1 hit.

Sequence

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

Q92743-1

        10         20         30         40         50
MQIPRAALLP LLLLLLAAPA SAQLSRAGRS APLAAGCPDR CEPARCPPQP
        60         70         80         90        100
EHCEGGRARD ACGCCEVCGA PEGAACGLQE GPCGEGLQCV VPFGVPASAT
       110        120        130        140        150
VRRRAQAGLC VCASSEPVCG SDANTYANLC QLRAASRRSE RLHRPPVIVL
       160        170        180        190        200
QRGACGQGQE DPNSLRHKYN FIADVVEKIA PAVVHIELFR KLPFSKREVP
       210        220        230        240        250
VASGSGFIVS EDGLIVTNAH VVTNKHRVKV ELKNGATYEA KIKDVDEKAD
       260        270        280        290        300
IALIKIDHQG KLPVLLLGRS SELRPGEFVV AIGSPFSLQN TVTTGIVSTT
       310        320        330        340        350
QRGGKELGLR NSDMDYIQTD AIINYGNSGG PLVNLDGEVI GINTLKVTAG
       360        370        380        390        400
ISFAIPSDKI KKFLTESHDR QAKGKAITKK KYIGIRMMSL TSSKAKELKD
       410        420        430        440        450
RHRDFPDVIS GAYIIEVIPD TPAEAGGLKE NDVIISINGQ SVVSANDVSD
       460        470        480
VIKRESTLNM VVRRGNEDIM ITVIPEEIDP
Length: 480
Mass (Da): 51,287
Last modified: February 1, 1997 - v1
Checksum: CA20A99480FB2330
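
The annotated positions in this entry can be cross-checked against the displayed sequence. The following is a minimal Python sketch, assuming the sequence block above is transcribed correctly: it strips the grouping spaces, confirms the stated length, reads the residues at the three active-site positions (220, 250, 328), cuts off the 22-residue signal peptide, and applies a single-residue substitution such as the S → A mutagenesis at position 328. The helper names `residue` and `apply_variant` are illustrative only.

```python
import re

# Sequence of Q92743-1 as displayed in this entry (grouping spaces kept).
RAW_SEQUENCE = """
MQIPRAALLP LLLLLLAAPA SAQLSRAGRS APLAAGCPDR CEPARCPPQP
EHCEGGRARD ACGCCEVCGA PEGAACGLQE GPCGEGLQCV VPFGVPASAT
VRRRAQAGLC VCASSEPVCG SDANTYANLC QLRAASRRSE RLHRPPVIVL
QRGACGQGQE DPNSLRHKYN FIADVVEKIA PAVVHIELFR KLPFSKREVP
VASGSGFIVS EDGLIVTNAH VVTNKHRVKV ELKNGATYEA KIKDVDEKAD
IALIKIDHQG KLPVLLLGRS SELRPGEFVV AIGSPFSLQN TVTTGIVSTT
QRGGKELGLR NSDMDYIQTD AIINYGNSGG PLVNLDGEVI GINTLKVTAG
ISFAIPSDKI KKFLTESHDR QAKGKAITKK KYIGIRMMSL TSSKAKELKD
RHRDFPDVIS GAYIIEVIPD TPAEAGGLKE NDVIISINGQ SVVSANDVSD
VIKRESTLNM VVRRGNEDIM ITVIPEEIDP
"""

# Remove all whitespace so the sequence is one contiguous string.
SEQ = re.sub(r"\s+", "", RAW_SEQUENCE)

ACTIVE_SITE_POSITIONS = (220, 250, 328)  # "Charge relay system" in this entry
SIGNAL_PEPTIDE_END = 22                  # signal peptide spans 1 to 22


def residue(seq: str, position: int) -> str:
    """Return the residue at a 1-based sequence position."""
    return seq[position - 1]


def apply_variant(seq: str, position: int, ref: str, alt: str) -> str:
    """Substitute `ref` with `alt` at a 1-based position, checking `ref` first."""
    found = residue(seq, position)
    if found != ref:
        raise ValueError(f"expected {ref} at {position}, found {found}")
    return seq[: position - 1] + alt + seq[position:]


# Mature chain (positions 23 to 480) after removal of the signal peptide.
MATURE_CHAIN = SEQ[SIGNAL_PEPTIDE_END:]

if __name__ == "__main__":
    print("length:", len(SEQ))
    print("active site:", {p: residue(SEQ, p) for p in ACTIVE_SITE_POSITIONS})
    print("mature chain length:", len(MATURE_CHAIN))
    s328a = apply_variant(SEQ, 328, "S", "A")  # mutagenesis: loss of activity
    print("S328A residue:", residue(s328a, 328))
```

Checking the reference residue before substituting catches off-by-one errors in position numbering, which are easy to make when mixing 1-based annotation positions with 0-based string indices.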

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  323          1       I → T in AAC97211 (PubMed:9852107). (Curated)

Natural variant

Feature key      Position(s)  Length  Identifier  Description
Natural variant  20           1       VAR_076371  A → V. (1 publication; dbSNP rs369149111)
Natural variant  51           1       VAR_076372  E → G. (1 publication)
Natural variant  121          1       VAR_076373  S → R in CADASIL2. (1 publication)
Natural variant  123          1       VAR_076374  A → S in CADASIL2. (1 publication)
Natural variant  133          1       VAR_076375  R → G in CADASIL2. (1 publication)
Natural variant  166          1       VAR_076376  R → L in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  173          1       VAR_076377  A → P in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  252          1       VAR_063148  A → T in CARASIL; has 21 to 50% normal protease activity; is unable to suppress TGF-beta activity. (1 publication; dbSNP rs113993968)
Natural variant  284          1       VAR_076378  S → G in CADASIL2; partial loss of proteolytic activity. (1 publication)
Natural variant  284          1       VAR_076379  S → R in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  285          1       VAR_076380  P → Q in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  286          1       VAR_076381  F → V in CADASIL2; loss of proteolytic activity. (1 publication)
Natural variant  297          1       VAR_063149  V → M in CARASIL; has 21 to 50% normal protease activity; is unable to suppress TGF-beta activity. (1 publication; dbSNP rs113993969)
Natural variant  450          1       VAR_076382  D → H in CADASIL2; unknown pathological significance; small decrease, if any, in proteolytic activity. (1 publication)

Sequence databases

EMBL / GenBank / DDBJ:
  Y07921 mRNA. Translation: CAA69226.1.
  AF157623 Genomic DNA. Translation: AAD41525.1.
  CH471066 Genomic DNA. Translation: EAW49312.1.
  CH471066 Genomic DNA. Translation: EAW49313.1.
  AF097709 mRNA. Translation: AAC97211.1.
CCDS: CCDS7630.1.
RefSeq: NP_002766.1. NM_002775.4.
UniGene: Hs.501280.

Genome annotation databases

Ensembl: ENST00000368984; ENSP00000357980; ENSG00000166033.
GeneID: 5654.
KEGG: hsa:5654.
UCSC: uc001lgj.2. human.

Keywords - Coding sequence diversity

Polymorphism

Cross-references

Web resources

Atlas of Genetics and Cytogenetics in Oncology and Haematology

Protocols and materials databases

DNASU: 5654.

Organism-specific databases

CTD: 5654.
DisGeNET: 5654.
GeneCards: HTRA1.
GeneReviews: HTRA1.
HGNC: HGNC:9476. HTRA1.
HPA: HPA036655.
MalaCards: HTRA1.
MIM: 600142. phenotype.
  602194. gene.
  610149. phenotype.
  616779. phenotype.
neXtProt: NX_Q92743.
OpenTargets: ENSG00000166033.
Orphanet: 279. Age-related macular degeneration.
  199354. CARASIL.
PharmGKB: PA33829.

Miscellaneous databases

ChiTaRS: HTRA1. human.
EvolutionaryTrace: Q92743.
GeneWiki: HTRA1.
GenomeRNAi: 5654.
PRO: Q92743.

Entry information

Entry name: HTRA1_HUMAN
Accession: Primary (citable) accession number: Q92743
Secondary accession number(s): D3DRE4, Q9UNS5
Entry history
Integrated into UniProtKB/Swiss-Prot: September 26, 2001
Last sequence update: February 1, 1997
Last modified: November 2, 2016
This is version 158 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Human chromosome 10
    Human chromosome 10: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  6. Peptidase families
    Classification of peptidase families and list of entries
  7. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.