Protein

Papilin

Gene

PAPLN

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 4 out of 5. Experimental evidence at transcript level.

Function

Keywords - Molecular function

Protease inhibitor, Serine protease inhibitor

Enzyme and pathway databases

BioCyc: ZFISH:ENSG00000100767-MONOMER.

Protein family/group databases

MEROPS: I02.972.

Names & Taxonomy

Protein names
Recommended name:
Papilin
Gene names
Name: PAPLN
ORF Names: UNQ2420/PRO4977
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 14

Organism-specific databases

HGNC: HGNC:19262. PAPLN.

Subcellular location

Keywords - Cellular component

Secreted

Pathology & Biotech

Organism-specific databases

DisGeNET: 89932.
OpenTargets: ENSG00000100767.
PharmGKB: PA134914395.

Polymorphism and mutation databases

BioMuta: PAPLN.

PTM / Processing

Molecule processing

Feature key | Feature ID | Position(s) | Description | Length
Signal peptide | - | 1–18 | Sequence analysis | 18
Chain | PRO_0000324550 | 19–1278 | Papilin | 1260
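
The two rows are related by interval arithmetic on 1-based, inclusive coordinates: a feature spanning start–end covers end - start + 1 residues, so removing the 18-residue signal peptide from the 1,278-residue precursor leaves a 1,260-residue chain that begins at position 19. A minimal Python sketch of that bookkeeping, assuming those coordinate conventions; the `Feature` class is a hypothetical helper for illustration, not part of any UniProt API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A positional feature in 1-based, inclusive coordinates (hypothetical helper)."""
    key: str
    start: int
    end: int

    def length(self) -> int:
        # Inclusive interval: both endpoints are residues of the feature.
        return self.end - self.start + 1


PRECURSOR_LENGTH = 1278  # canonical isoform O95428-1

signal = Feature("Signal peptide", 1, 18)
chain = Feature("Chain", 19, 1278)

# The mature chain starts right after the signal peptide ...
assert chain.start == signal.end + 1
# ... and runs to the C-terminus of the precursor.
assert chain.end == PRECURSOR_LENGTH

print(signal.length(), chain.length())
```

The same `end - start + 1` rule explains the Length column in every feature table of this entry.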

Amino acid modifications

Feature key | Position(s) | Description
Disulfide bond | 38 ↔ 74 | By similarity
Disulfide bond | 42 ↔ 79 | By similarity
Disulfide bond | 53 ↔ 64 | By similarity
Disulfide bond | 316 ↔ 355 | By similarity
Disulfide bond | 320 ↔ 360 | By similarity
Disulfide bond | 331 ↔ 343 | By similarity
Disulfide bond | 754 ↔ 804 | By similarity
Disulfide bond | 763 ↔ 787 | By similarity
Disulfide bond | 779 ↔ 800 | By similarity
Disulfide bond | 931 ↔ 978 | By similarity
Disulfide bond | 1065 ↔ 1112 | By similarity
Disulfide bond | 1154 ↔ 1202 | By similarity
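
These bonds are annotated by similarity rather than observed directly, so a quick internal-consistency check is useful: every bonded position should lie inside the mature chain (19–1278), and no cysteine should appear in more than one bond, since a cysteine thiol forms at most one disulfide. The sketch below assumes that rule; `check_disulfides` is a hypothetical helper, and the check says nothing about whether the bonds actually form.

```python
# Disulfide bonds of the canonical isoform, copied from the table (1-based positions).
DISULFIDES = [
    (38, 74), (42, 79), (53, 64),
    (316, 355), (320, 360), (331, 343),
    (754, 804), (763, 787), (779, 800),
    (931, 978), (1065, 1112), (1154, 1202),
]

CHAIN = (19, 1278)  # mature chain after signal-peptide removal


def check_disulfides(bonds, chain):
    """Return the set of bonded positions; raise ValueError on an inconsistency."""
    seen = set()
    for a, b in bonds:
        for pos in (a, b):
            if not chain[0] <= pos <= chain[1]:
                raise ValueError(f"cysteine {pos} lies outside the mature chain")
            if pos in seen:
                raise ValueError(f"cysteine {pos} appears in more than one bond")
            seen.add(pos)
    return seen


cysteines = check_disulfides(DISULFIDES, CHAIN)
print(len(DISULFIDES), "bonds,", len(cysteines), "bonded cysteines")
```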

Keywords - PTM

Disulfide bond

Proteomic databases

PaxDb: O95428.
PeptideAtlas: O95428.
PRIDE: O95428.

PTM databases

iPTMnet: O95428.
PhosphoSitePlus: O95428.

Expression

Gene expression databases

Bgee: ENSG00000100767.
CleanEx: HS_PAPLN.
ExpressionAtlas: O95428. baseline and differential.
Genevisible: O95428. HS.

Organism-specific databases

HPA: HPA048682.
HPA053453.

Interaction

Protein-protein interaction databases

BioGrid: 124645. 8 interactors.
IntAct: O95428. 1 interactor.
STRING: 9606.ENSP00000345395.

Structure

3D structure databases

ProteinModelPortal: O95428.
ModBase (search link)
MobiDB (search link)

Family & Domains

Domains and Repeats

Feature key | Position(s) | Description | Length
Domain | 26–80 | TSP type-1 1 (PROSITE-ProRule annotation) | 55
Domain | 304–361 | TSP type-1 2 (PROSITE-ProRule annotation) | 58
Domain | 362–421 | TSP type-1 3 (PROSITE-ProRule annotation) | 60
Domain | 423–481 | TSP type-1 4 (PROSITE-ProRule annotation) | 59
Domain | 484–539 | TSP type-1 5 (PROSITE-ProRule annotation) | 56
Domain | 754–804 | BPTI/Kunitz inhibitor (PROSITE-ProRule annotation) | 51
Domain | 900–995 | Ig-like C2-type 1 | 96
Domain | 1033–1128 | Ig-like C2-type 2 | 96
Domain | 1133–1218 | Ig-like C2-type 3 | 86
Domain | 1231–1270 | PLAC (PROSITE-ProRule annotation) | 40
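
Read together with the disulfide table, the domain boundaries show where the annotated bonds sit. The sketch below copies coordinates from both tables, confirms that each listed domain length equals end - start + 1, and assigns each bond to the domain that contains both of its cysteines. `domain_of_bond` is a hypothetical helper for illustration.

```python
from collections import Counter

# (name, start, end, listed_length) from the domain table; 1-based, inclusive.
DOMAINS = [
    ("TSP type-1 1", 26, 80, 55),
    ("TSP type-1 2", 304, 361, 58),
    ("TSP type-1 3", 362, 421, 60),
    ("TSP type-1 4", 423, 481, 59),
    ("TSP type-1 5", 484, 539, 56),
    ("BPTI/Kunitz inhibitor", 754, 804, 51),
    ("Ig-like C2-type 1", 900, 995, 96),
    ("Ig-like C2-type 2", 1033, 1128, 96),
    ("Ig-like C2-type 3", 1133, 1218, 86),
    ("PLAC", 1231, 1270, 40),
]

# Disulfide bonds from the amino acid modifications table.
DISULFIDES = [
    (38, 74), (42, 79), (53, 64),
    (316, 355), (320, 360), (331, 343),
    (754, 804), (763, 787), (779, 800),
    (931, 978), (1065, 1112), (1154, 1202),
]


def domain_of_bond(a, b, domains):
    """Return the name of the domain containing both cysteines, or None."""
    for name, start, end, _length in domains:
        if start <= a <= end and start <= b <= end:
            return name
    return None


# Listed lengths should equal end - start + 1 for every domain.
for name, start, end, listed in DOMAINS:
    assert end - start + 1 == listed, name

bonds_per_domain = Counter(domain_of_bond(a, b, DOMAINS) for a, b in DISULFIDES)
for name, count in bonds_per_domain.items():
    print(f"{name}: {count}")
```

With this data, TSP type-1 domains 1 and 2 and the Kunitz domain each receive three bonds, each Ig-like C2-type domain receives one, and TSP type-1 domains 3 to 5 and the PLAC domain have no disulfides annotated in this entry.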

Sequence similarities

Belongs to the papilin family. (Curated)
Contains 1 BPTI/Kunitz inhibitor domain. (PROSITE-ProRule annotation)
Contains 1 PLAC domain. (PROSITE-ProRule annotation)
Contains 5 TSP type-1 domains. (PROSITE-ProRule annotation)
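
The "Contains" statements can be cross-checked by tallying domain types in the domain table, treating a trailing number (such as the 3 in "TSP type-1 3") as a repeat ordinal. This is a sketch under that naming assumption, with a hypothetical `domain_type` helper. The tally also shows three Ig-like C2-type domains, which these similarity statements do not mention.

```python
import re
from collections import Counter

# Domain descriptions from the domain table, in sequence order.
DOMAIN_NAMES = [
    "TSP type-1 1", "TSP type-1 2", "TSP type-1 3", "TSP type-1 4", "TSP type-1 5",
    "BPTI/Kunitz inhibitor",
    "Ig-like C2-type 1", "Ig-like C2-type 2", "Ig-like C2-type 3",
    "PLAC",
]

# Counts stated in the similarity comments.
STATED_COUNTS = {"BPTI/Kunitz inhibitor": 1, "PLAC": 1, "TSP type-1": 5}


def domain_type(name: str) -> str:
    """Strip a trailing repeat number, e.g. 'TSP type-1 3' -> 'TSP type-1'."""
    return re.sub(r"\s+\d+$", "", name)


tally = Counter(domain_type(n) for n in DOMAIN_NAMES)
for dtype, stated in STATED_COUNTS.items():
    assert tally[dtype] == stated, (dtype, tally[dtype], stated)
print(dict(tally))
```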

Keywords - Domain

Immunoglobulin domain, Repeat, Signal

Phylogenomic databases

eggNOG: KOG3510. Eukaryota.
KOG4597. Eukaryota.
ENOG410XQNP. LUCA.
GeneTree: ENSGT00760000118885.
HOVERGEN: HBG108281.
InParanoid: O95428.
OMA: SVNIRWS.
OrthoDB: EOG091G14M8.
PhylomeDB: O95428.
TreeFam: TF316874.

Family and domain databases

Gene3D: 2.60.40.10. 3 hits.
4.10.410.10. 1 hit.
InterPro: IPR010294. ADAM_spacer1.
IPR007110. Ig-like_dom.
IPR013783. Ig-like_fold.
IPR013098. Ig_I-set.
IPR003599. Ig_sub.
IPR003598. Ig_sub2.
IPR013106. Ig_V-set.
IPR002223. Kunitz_BPTI.
IPR013273. Peptidase_M12B_ADAM-TS.
IPR010909. PLAC.
IPR020901. Prtase_inh_Kunz-CS.
IPR000884. TSP1_rpt.
Pfam: PF05986. ADAM_spacer1. 1 hit.
PF07679. I-set. 2 hits.
PF00014. Kunitz_BPTI. 1 hit.
PF08686. PLAC. 1 hit.
PF00090. TSP_1. 5 hits.
PRINTS: PR01857. ADAMTSFAMILY.
PR00759. BASICPTASE.
SMART: SM00409. IG. 3 hits.
SM00408. IGc2. 3 hits.
SM00406. IGv. 3 hits.
SM00131. KU. 1 hit.
SM00209. TSP1. 5 hits.
SUPFAM: SSF48726. SSF48726. 3 hits.
SSF57362. SSF57362. 1 hit.
SSF82895. SSF82895. 5 hits.
PROSITE: PS00280. BPTI_KUNITZ_1. 1 hit.
PS50279. BPTI_KUNITZ_2. 1 hit.
PS50835. IG_LIKE. 3 hits.
PS50900. PLAC. 1 hit.
PS50092. TSP1. 5 hits.

Sequences (6)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 6 isoforms produced by alternative splicing.

Isoform 1 (identifier: O95428-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRLLLLVPLL LAPAPGSSAP KVRRQSDTWG PWSQWSPCSR TCGGGVSFRE
        60         70         80         90        100
RPCYSQRRDG GSSCVGPARS HRSCRTESCP DGARDFRAEQ CAEFDGAEFQ
       110        120        130        140        150
GRRYRWLPYY SAPNKCELNC IPKGENFYYK HREAVVDGTP CEPGKRDVCV
       160        170        180        190        200
DGSCRVVGCD HELDSSKQED KCLRCGGDGT TCYPVAGTFD ANDLSRGYNQ
       210        220        230        240        250
ILIVPMGATS ILIDEAAASR NFLAVKNVRG EYYLNGHWTI EAARALPAAS
       260        270        280        290        300
TILHYERGAE GDLAPERLHA RGPTSEPLVI ELISQEPNPG VHYEYHLPLR
       310        320        330        340        350
RPSPGFSWSH GSWSDCSAEC GGGHQSRLVF CTIDHEAYPD HMCQRQPRPA
       360        370        380        390        400
DRRSCNLHPC PETKRWKAGP WAPCSASCGG GSQSRSVYCI SSDGAGIQEA
       410        420        430        440        450
VEEAECAGLP GKPPAIQACN LQRCAAWSPE PWGECSVSCG VGVRKRSVTC
       460        470        480        490        500
RGERGSLLHT AACSLEDRPP LTEPCVHEDC PLLSDQAWHV GTWGLCSKSC
       510        520        530        540        550
SSGTRRRQVI CAIGPPSHCG SLQHSKPVDV EPCNTQPCHL PQEVPSMQDV
       560        570        580        590        600
HTPASNPWMP LGPQESPASD SRGQWWAAQE HPSARGDHRG ERGDPRGDQG
       610        620        630        640        650
THLSALGPAP SLQQPPYQQP LRSGSGPHDC RHSPHGCCPD GHTASLGPQW
       660        670        680        690        700
QGCPGAPCQQ SRYGCCPDRV SVAEGPHHAG CTKSYGGDST GGMPRSRAVA
       710        720        730        740        750
STVHNTHQPQ AQQNEPSECR GSQFGCCYDN VATAAGPLGE GCVGQPSHAY
       760        770        780        790        800
PVRCLLPSAH GSCADWAARW YFVASVGQCN RFWYGGCHGN ANNFASEQEC
       810        820        830        840        850
MSSCQGSLHG PRRPQPGASG RSTHTDGGGS SPAGEQEPSQ HRTGAAVQRK
       860        870        880        890        900
PWPSGGLWRQ DQQPGPGEAP HTQAFGEWPW GQELGSRAPG LGGDAGSPAP
       910        920        930        940        950
PFHSSSYRIS LAGVEPSLVQ AALGQLVRLS CSDDTAPESQ AAWQKDGQPI
       960        970        980        990       1000
SSDRHRLQFD GSLIIHPLQA EDAGTYSCGS TRPGRDSQKI QLRIIGGDMA
      1010       1020       1030       1040       1050
VLSEAELSRF PQPRDPAQDF GQAGAAGPLG AIPSSHPQPA NRLRLDQNQP
      1060       1070       1080       1090       1100
RVVDASPGQR IRMTCRAEGF PPPAIEWQRD GQPVSSPRHQ LQPDGSLVIS
      1110       1120       1130       1140       1150
RVAVEDGGFY TCVAFNGQDR DQRWVQLRVL GELTISGLPP TVTVPEGDTA
      1160       1170       1180       1190       1200
RLLCVVAGES VNIRWSRNGL PVQADGHRVH QSPDGTLLIY NLRARDEGSY
      1210       1220       1230       1240       1250
TCSAYQGSQA VSRSTEVKVV SPAPTAQPRD PGRDCVDQPE LANCDLILQA
      1260       1270
QLCGNEYYSS FCCASCSRFQ PHAQPIWQ
Length: 1,278
Mass (Da): 137,700
Last modified: July 7, 2009 - v4
Checksum: A8DED3CADB0D70C1

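
The sequence display groups residues in blocks of ten, with a position ruler above each line marking the last residue of each block. A minimal sketch for turning that layout back into a plain string, assuming ruler lines contain only digit tokens; only residues 1–100 are reproduced to keep it short, and `parse_display` and `residue` are hypothetical helpers. Positions are 1-based, so residue n sits at string index n - 1.

```python
def parse_display(text: str) -> str:
    """Join the 10-residue blocks of a sequence display, skipping ruler numbers."""
    blocks = [tok for tok in text.split() if not tok.isdigit()]
    return "".join(blocks)


def residue(seq: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return seq[position - 1]


# Residues 1-100 of the canonical isoform (O95428-1), copied from the display.
EXCERPT = """
        10         20         30         40         50
MRLLLLVPLL LAPAPGSSAP KVRRQSDTWG PWSQWSPCSR TCGGGVSFRE
        60         70         80         90        100
RPCYSQRRDG GSSCVGPARS HRSCRTESCP DGARDFRAEQ CAEFDGAEFQ
"""

seq = parse_display(EXCERPT)
signal_peptide = seq[:18]  # positions 1-18
# Both ends of the first three disulfide bonds (38-74, 42-79, 53-64).
tsp1_cysteines = [38, 74, 42, 79, 53, 64]
print(signal_peptide, [residue(seq, p) for p in tsp1_cysteines])
```

Checking that every annotated disulfide position holds a cysteine is a cheap way to confirm that coordinates refer to the canonical isoform.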
Isoform 2 (identifier: O95428-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1-801: Missing.
     802-907: SSCQGSLHGP...PAPPFHSSSY → MGPVVPSLGL...PPTDLTSHLS

Note: No experimental confirmation available.
Length: 477
Mass (Da): 51,082
Checksum: 1979A1F2428DE816

Isoform 5 (identifier: O95428-5)

The sequence of this isoform differs from the canonical sequence as follows:
     749-764: Missing.

Length: 1,262
Mass (Da): 136,073
Checksum: 3D334E4C30CDE94F

Isoform 6 (identifier: O95428-6)

The sequence of this isoform differs from the canonical sequence as follows:
     197-223: Missing.

Note: No experimental confirmation available.
Length: 1,251
Mass (Da): 134,840
Checksum: 149C41975845C4E9

Isoform 3 (identifier: O95428-3)

The sequence of this isoform differs from the canonical sequence as follows:
     124-129: GENFYY → VLGLQA
     130-1278: Missing.

Note: No experimental confirmation available.
Length: 129
Mass (Da): 14,368
Checksum: D51144A494C12B3B

Isoform 4 (identifier: O95428-4)

The sequence of this isoform differs from the canonical sequence as follows:
     1088-1109: RHQLQPDGSLVISRVAVEDGGF → STHRPAQGPWQGLRRPARAGQL
     1110-1278: Missing.

Note: No experimental confirmation available.
Length: 1,109
Mass (Da): 119,209
Checksum: FB23C426C3F47E83
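
Each isoform description is a list of edits on canonical coordinates, so an isoform's length follows from the canonical length: every edited segment of end - start + 1 residues is replaced by a segment of some other length (0 for "Missing"; 106 for the isoform 2 replacement at 802–907). The sketch below, using hypothetical helper names, reproduces the listed isoform lengths. `apply_edits` shows how the same edits would be applied to an actual sequence string; it works from the C-terminal end backwards so that earlier coordinates are not shifted, and it assumes edits do not overlap.

```python
CANONICAL_LENGTH = 1278

# (start, end, replacement_length) on canonical 1-based inclusive coordinates.
# "Missing" segments have replacement length 0.
ISOFORM_EDITS = {
    "O95428-2": [(1, 801, 0), (802, 907, 106)],
    "O95428-5": [(749, 764, 0)],
    "O95428-6": [(197, 223, 0)],
    "O95428-3": [(124, 129, 6), (130, 1278, 0)],
    "O95428-4": [(1088, 1109, 22), (1110, 1278, 0)],
}


def isoform_length(canonical_length, edits):
    """Length after replacing each edited segment with one of the given length."""
    length = canonical_length
    for start, end, replacement_length in edits:
        length += replacement_length - (end - start + 1)
    return length


def apply_edits(sequence, edits):
    """Apply non-overlapping (start, end, replacement) edits, right to left."""
    for start, end, replacement in sorted(edits, reverse=True):
        sequence = sequence[:start - 1] + replacement + sequence[end:]
    return sequence


for isoform, edits in ISOFORM_EDITS.items():
    print(isoform, isoform_length(CANONICAL_LENGTH, edits))
```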

Sequence caution

The sequence AAC97963 differs from that shown. Reason: Erroneous gene model prediction. (Curated)
The sequence BAC86235 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)
The sequence BAG57189 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)
The sequence BAG57189 differs from that shown. Reason: Erroneous termination at position 1216. Translated as Glu. (Curated)
The sequence BAG57757 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened. (Curated)
The sequence CAH56406 differs from that shown. Partially unspliced mRNA. (Curated)

Experimental Info

Feature key | Position(s) | Description | Length
Sequence conflict | 326 | S → P in BAG57757 (PubMed:14702039). (Curated) | 1
Sequence conflict | 693 | M → R in BAG57757 (PubMed:14702039). (Curated) | 1
Sequence conflict | 693 | M → R in CAD97826 (PubMed:17974005). (Curated) | 1
Sequence conflict | 693 | M → R in BAC85123 (Ref. 6). (Curated) | 1
Sequence conflict | 1162 | N → D in BAG57189 (PubMed:14702039). (Curated) | 1
Sequence conflict | 1229 | R → K in CAD97826 (PubMed:17974005). (Curated) | 1

Natural variant

Feature key | Feature ID | Position(s) | Description | Length
Natural variant | VAR_039815 | 33 | S → G. Corresponds to variant rs2280792 (dbSNP, Ensembl). | 1
Natural variant | VAR_039816 | 191 | A → T. Corresponds to variant rs741842 (dbSNP, Ensembl). | 1
Natural variant | VAR_039817 | 356 | N → H. Corresponds to variant rs17126331 (dbSNP, Ensembl). | 1
Natural variant | VAR_039818 | 443 | V → I. Corresponds to variant rs17126352 (dbSNP, Ensembl). | 1
Natural variant | VAR_039819 | 461 | A → V. Corresponds to variant rs17126354 (dbSNP, Ensembl). | 1
Natural variant | VAR_039820 | 628 | H → R. Corresponds to variant rs17182244 (dbSNP, Ensembl). | 1
Natural variant | VAR_039821 | 723 | Q → H. Corresponds to variant rs2242616 (dbSNP, Ensembl). | 1
Natural variant | VAR_039822 | 896 | G → R. 1 Publication. Corresponds to variant rs177386 (dbSNP, Ensembl). | 1
Natural variant | VAR_039823 | 1192 | L → V. Corresponds to variant rs2107731 (dbSNP, Ensembl). | 1
Natural variant | VAR_039824 | 1201 | T → M. 1 Publication. Corresponds to variant rs4903104 (dbSNP, Ensembl). | 1
Natural variant | VAR_039825 | 1260 | S → T. Corresponds to variant rs11626824 (dbSNP, Ensembl). | 1
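
Each natural-variant row is a single-residue substitution at a canonical position. As an illustration, the sketch below converts the rows into HGVS-style protein notation (for example p.Ser33Gly), which is commonly used when comparing against variant databases; that output format and the helpers `parse_substitution` and `hgvs_protein` are assumptions for illustration, not something this entry provides.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# (feature ID, position, description, dbSNP ID) from the natural variant table.
VARIANTS = [
    ("VAR_039815", 33, "S → G", "rs2280792"),
    ("VAR_039816", 191, "A → T", "rs741842"),
    ("VAR_039817", 356, "N → H", "rs17126331"),
    ("VAR_039818", 443, "V → I", "rs17126352"),
    ("VAR_039819", 461, "A → V", "rs17126354"),
    ("VAR_039820", 628, "H → R", "rs17182244"),
    ("VAR_039821", 723, "Q → H", "rs2242616"),
    ("VAR_039822", 896, "G → R", "rs177386"),
    ("VAR_039823", 1192, "L → V", "rs2107731"),
    ("VAR_039824", 1201, "T → M", "rs4903104"),
    ("VAR_039825", 1260, "S → T", "rs11626824"),
]


def parse_substitution(position, description):
    """Parse a single-residue 'X → Y' description into (ref, position, alt)."""
    m = re.fullmatch(r"([A-Z]) → ([A-Z])", description.strip())
    if m is None:
        raise ValueError(f"not a single-residue substitution: {description!r}")
    return m.group(1), position, m.group(2)


def hgvs_protein(ref, position, alt):
    """Format as HGVS-style three-letter protein notation, e.g. p.Ser33Gly."""
    return f"p.{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"


for var_id, pos, desc, rsid in VARIANTS:
    print(var_id, rsid, hgvs_protein(*parse_substitution(pos, desc)))
```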

Alternative sequence

Feature key | Feature ID | Position(s) | Description | Length
Alternative sequence | VSP_032269 | 1–801 | Missing in isoform 2. 1 Publication | 801
Alternative sequence | VSP_032270 | 124–129 | GENFYY → VLGLQA in isoform 3. 1 Publication | 6
Alternative sequence | VSP_032271 | 130–1278 | Missing in isoform 3. 1 Publication | 1149
Alternative sequence | VSP_037595 | 197–223 | Missing in isoform 6. 1 Publication | 27
Alternative sequence | VSP_037596 | 749–764 | Missing in isoform 5. 1 Publication | 16
Alternative sequence | VSP_032272 | 802–907 | SSCQG…HSSSY → MGPVVPSLGLLEGAPTRMVA AAVLQASRNPASTGQGPRCR ESPGLLVVSGGKTNSLGQGR PPTPRPLENGHGGRSLGPGP LDWVEMPDHQRHPSTAPPTD LTSHLS in isoform 2. 1 Publication | 106
Alternative sequence | VSP_032273 | 1088–1109 | RHQLQ…EDGGF → STHRPAQGPWQGLRRPARAG QL in isoform 4. 1 Publication | 22
Alternative sequence | VSP_032274 | 1110–1278 | Missing in isoform 4. 1 Publication | 169

Sequence databases

EMBL / GenBank / DDBJ:
AY358330 mRNA. Translation: AAQ88696.1.
AK125658 mRNA. Translation: BAC86235.1. Different initiation.
AK294560 mRNA. Translation: BAG57757.1. Different initiation.
AK293773 mRNA. Translation: BAG57189.1. Sequence problems.
AC004846 Genomic DNA. No translation available.
AF109907 Genomic DNA. Translation: AAC97963.1. Sequence problems.
BC042057 mRNA. Translation: AAH42057.1.
AL110280 mRNA. Translation: CAH56406.1. Sequence problems.
BX470414 mRNA. No translation available.
BX537757 mRNA. Translation: CAD97826.1.
AK131073 mRNA. Translation: BAC85123.1.
CCDS: CCDS32114.1. [O95428-6]
RefSeq: NP_775733.3. NM_173462.3. [O95428-6]
XP_011535592.1. XM_011537290.2. [O95428-1]
XP_011535593.1. XM_011537291.2. [O95428-1]
XP_011535594.1. XM_011537292.2. [O95428-1]
UniGene: Hs.509909.

Genome annotation databases

Ensembl: ENST00000340738; ENSP00000345395; ENSG00000100767. [O95428-6]
ENST00000554301; ENSP00000451803; ENSG00000100767. [O95428-1]
ENST00000555445; ENSP00000451729; ENSG00000100767. [O95428-5]
GeneID: 89932.
KEGG: hsa:89932.
UCSC: uc001xnw.5. human. [O95428-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Protocols and materials databases

Structural Biology Knowledgebase (search link)

Organism-specific databases

CTD: 89932.
DisGeNET: 89932.
GeneCards: PAPLN.
H-InvDB: HIX0019543.
HIX0037927.
HGNC: HGNC:19262. PAPLN.
HPA: HPA048682.
HPA053453.
neXtProt: NX_O95428.
OpenTargets: ENSG00000100767.
PharmGKB: PA134914395.
GenAtlas (search link)

Miscellaneous databases

ChiTaRS: PAPLN. human.
GeneWiki: PAPLN.
GenomeRNAi: 89932.
PRO: O95428.

Entry information

Entry name: PPN_HUMAN
Accession
Primary (citable) accession number: O95428
Secondary accession number(s): B4DES8, B4DGE6, Q659F2, Q6UXJ4, Q6ZNM1, Q6ZUJ0, Q7Z681, Q8IVU0
Entry history
Integrated into UniProtKB/Swiss-Prot: March 18, 2008
Last sequence update: July 7, 2009
Last modified: November 2, 2016
This is version 121 of the entry and version 4 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 14
    Human chromosome 14: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
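
The UniRef membership rules above can be written as simple predicates. The sketch assumes sequence identity and overlap have already been computed as fractions relative to the seed (longest) sequence; how UniRef computes them is not described here, and the function names are hypothetical. The seed string is the O95428-1 signal peptide, used only as toy data.

```python
MIN_FRAGMENT_LENGTH = 11  # UniRef100: sub-fragments need 11 or more residues
MIN_OVERLAP = 0.80        # UniRef90/50: at least 80% overlap with the seed


def joins_uniref100(sequence: str, seed: str) -> bool:
    """Identical to the seed, or a sub-fragment of it with at least 11 residues."""
    if sequence == seed:
        return True
    return len(sequence) >= MIN_FRAGMENT_LENGTH and sequence in seed


def joins_cluster(identity: float, overlap: float, min_identity: float) -> bool:
    """UniRef90 uses min_identity=0.90; UniRef50 uses min_identity=0.50."""
    return identity >= min_identity and overlap >= MIN_OVERLAP


seed = "MRLLLLVPLLLAPAPGSS"  # toy data: residues 1-18 of O95428-1
print(joins_uniref100("MRLLLLVPLLL", seed), joins_uniref100("MRLLLLVPLL", seed))
print(joins_cluster(0.92, 0.85, 0.90), joins_cluster(0.92, 0.75, 0.90))
```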