Protein

Carboxypeptidase D

Gene

svr

Organism
Drosophila melanogaster (Fruit fly)
Status
Reviewed; Annotation score: 5 out of 5; Experimental evidence at protein level

Function

Required for proper melanization and sclerotization of the cuticle (3 Publications).

Catalytic activity

Releases C-terminal Arg and Lys from polypeptides.
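The stated specificity can be illustrated with a toy Python sketch that strips a C-terminal Arg (R) or Lys (K) from a peptide string. This is an illustration under simplifying assumptions and does not model the enzyme. The function name `trim_c_terminal_basic` and the example peptide are hypothetical. Kinetics, the pH optima given for each domain, and any influence of neighbouring residues are ignored. The sketch also assumes, for illustration only, that trimming continues while the new C-terminus is still basic.

```python
# Toy model (assumption): only the stated specificity is modelled, i.e. release
# of a C-terminal Arg (R) or Lys (K). Kinetics, pH optima and the influence of
# neighbouring residues are ignored.
BASIC_C_TERMINAL = frozenset("RK")


def trim_c_terminal_basic(peptide: str) -> tuple[str, list[str]]:
    """Release C-terminal Arg/Lys one at a time while the C-terminus is basic.

    Returns the remaining peptide and the released residues in release order.
    Continuing past the first release is an illustrative assumption.
    """
    released: list[str] = []
    while peptide and peptide[-1] in BASIC_C_TERMINAL:
        released.append(peptide[-1])
        peptide = peptide[:-1]
    return peptide, released


if __name__ == "__main__":
    # Hypothetical peptide ending in a Lys-Arg pair.
    print(trim_c_terminal_basic("GLFEKR"))  # ('GLFE', ['R', 'K'])
```

A peptide that does not end in R or K comes back unchanged, with an empty release list.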

Cofactor

Zn2+ (By similarity)

pH dependence

Optimum pH is 7.0 for carboxypeptidase domain 1 and 5.0-6.0 for carboxypeptidase domain 2.

Sites

Feature key    Position(s)  Description                               Length
Metal binding  101          Zinc 1; catalytic (By similarity)         1
Metal binding  104          Zinc 1; catalytic (By similarity)         1
Metal binding  217          Zinc 1; catalytic (By similarity)         1
Active site    305          Proton donor/acceptor 1 (By similarity)   1
Metal binding  517          Zinc 2; catalytic (By similarity)         1
Metal binding  520          Zinc 2; catalytic (By similarity)         1
Metal binding  626          Zinc 2; catalytic (By similarity)         1
Active site    730          Proton donor/acceptor 2 (By similarity)   1
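The site positions are 1-based and refer to the canonical sequence (isoform P42787-1). One quick sanity check on such annotations is to look the positions up in the sequence string. In the canonical sequence shown under Sequences, the zinc-binding positions 101, 104 and 217 (domain 1) and 517, 520 and 626 (domain 2) hold His, Glu and His. The active-site positions 305 and 730 hold Glu. The sketch below assumes you already have the sequence, or a fragment of it, as a Python string. The names `SITES`, `site_mismatches` and `start` are hypothetical.

```python
# Annotated sites from the table above: 1-based position -> residue found there in
# the canonical sequence (His/Glu/His zinc ligands, Glu proton donor/acceptor).
SITES = {
    101: "H", 104: "E", 217: "H", 305: "E",  # carboxypeptidase domain 1
    517: "H", 520: "E", 626: "H", 730: "E",  # carboxypeptidase domain 2
}


def site_mismatches(
    sequence: str, expected: dict[int, str], start: int = 1
) -> list[tuple[int, str, str]]:
    """Return (position, expected, found) for each annotated site that disagrees.

    `start` is the canonical position of sequence[0], so a fragment can be checked;
    sites outside the fragment are skipped, not reported.
    """
    mismatches = []
    for pos, residue in sorted(expected.items()):
        idx = pos - start  # 1-based canonical position -> 0-based string index
        if 0 <= idx < len(sequence) and sequence[idx] != residue:
            mismatches.append((pos, residue, sequence[idx]))
    return mismatches


if __name__ == "__main__":
    # Residues 101-110 as printed in the canonical sequence display.
    print(site_mismatches("HGDETVGRQL", SITES, start=101))  # []
    # Same fragment with a hypothetical Glu104 -> Gln substitution.
    print(site_mismatches("HGDQTVGRQL", SITES, start=101))  # [(104, 'E', 'Q')]
```

Sites that fall outside the supplied fragment are skipped silently, so an empty result means "no disagreement within the fragment", not "all sites confirmed".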

GO - Molecular function

  • carboxypeptidase activity Source: FlyBase
  • metallocarboxypeptidase activity Source: FlyBase
  • serine-type carboxypeptidase activity Source: GO_Central
  • structural constituent of cuticle Source: UniProtKB-KW
  • zinc ion binding Source: InterPro

GO - Biological process

  • imaginal disc-derived wing morphogenesis Source: FlyBase
  • long-term memory Source: FlyBase
  • peptide metabolic process Source: GO_Central
  • phagocytosis Source: FlyBase
  • protein processing Source: GO_Central

Keywords - Molecular function

Carboxypeptidase, Hydrolase, Metalloprotease, Protease

Keywords - Ligand

Metal-binding, Zinc

Enzyme and pathway databases

BRENDA: 3.4.17.22. 1994.

Protein family/group databases

MEROPS: M14.037.

Names & Taxonomy

Protein names
Recommended name:
Carboxypeptidase D (EC:3.4.17.22)
Alternative name(s):
Metallocarboxypeptidase D
Protein silver
Gene names
Name: svr
Synonyms: CPD, CpepE
ORF Names: CG4122
Organism: Drosophila melanogaster (Fruit fly)
Taxonomic identifier: 7227 [NCBI]
Taxonomic lineage: Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora
Proteomes
  • UP000000803 Component: Chromosome X

Organism-specific databases

FlyBase: FBgn0004648. svr.

Subcellular location

Topology

Feature key         Position(s)  Description                        Length
Topological domain  26-1312      Extracellular (Sequence analysis)  1287
Transmembrane       1313-1333    Helical (Sequence analysis)        21
Topological domain  1334-1406    Cytoplasmic (Sequence analysis)    73
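The topology intervals are inclusive and 1-based, so each length is end - start + 1. For example, 1312 - 26 + 1 = 1287. The hypothetical helper `topology_at` below encodes the three intervals and reports which region a residue falls in. Positions 1-25 (the signal peptide, removed during processing) are left unassigned on purpose.

```python
# Inclusive, 1-based intervals copied from the topology table above.
TOPOLOGY = [
    (26, 1312, "Extracellular"),
    (1313, 1333, "Transmembrane"),
    (1334, 1406, "Cytoplasmic"),
]


def topology_at(position: int) -> str:
    """Return the topological region containing a canonical residue position."""
    for start, end, label in TOPOLOGY:
        if start <= position <= end:
            return label
    return "unassigned"  # signal peptide (1-25) or outside the canonical sequence


if __name__ == "__main__":
    # Glycosylation and phosphorylation positions from the PTM tables of this entry.
    glyco = [133, 269, 458, 549, 612, 652, 787, 808, 981, 1152, 1251]
    print({topology_at(p) for p in glyco})  # {'Extracellular'}
    print(topology_at(1380))  # Cytoplasmic
```

Run against the PTM annotations, every N-glycosylation site lands in the extracellular region and the phosphoserine at 1380 lands in the cytoplasmic tail. This is the placement expected given that N-linked glycans are added on the luminal/extracellular side and protein phosphorylation occurs in the cytosol.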

GO - Cellular component

  • endomembrane system Source: FlyBase
  • extracellular region Source: FlyBase
  • extracellular space Source: GO_Central
  • integral component of membrane Source: UniProtKB-KW
  • perinuclear region of cytoplasm Source: FlyBase

Keywords - Cellular component

Cuticle, Membrane

PTM / Processing

Molecule processing

Feature key     Position(s)  Description                          Length
Signal peptide  1-25         (Sequence analysis)                  25
Chain           26-1406      Carboxypeptidase D (PRO_0000004405)  1381

Amino acid modifications

Feature key       Position(s)  Description                               Length
Glycosylation     133          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     269          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     458          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     549          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     612          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     652          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     787          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     808          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     981          N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     1152         N-linked (GlcNAc...) (Sequence analysis)  1
Glycosylation     1251         N-linked (GlcNAc...) (1 Publication)      1
Modified residue  1380         Phosphoserine (1 Publication)             1

Keywords - PTM

Glycoprotein, Phosphoprotein

Proteomic databases

PaxDb: P42787.
PRIDE: P42787.

PTM databases

iPTMnet: P42787.

Expression

Developmental stage

Expressed at embryonic and adult stages (1 Publication).

Gene expression databases

Bgee: FBgn0004648.
ExpressionAtlas: P42787. baseline.
Genevisible: P42787. DM.

Interaction

Protein-protein interaction databases

BioGrid: 57568. 4 interactors.
IntAct: P42787. 2 interactors.
MINT: MINT-860328.
STRING: 7227.FBpp0089126.

Structure

3D structure databases

ProteinModelPortal: P42787.
SMR: P42787.

Family & Domains

Region

Feature key  Position(s)  Description  Length
Region       26-446       Domain 1     421
Region       447-865      Domain 2     419
Region       866-1313     Domain 3     448

Motif

Feature key  Position(s)  Description                               Length
Motif        1343-1345    Cell attachment site (Sequence analysis)  3
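The "cell attachment site" annotation corresponds to an Arg-Gly-Asp (RGD) tripeptide. In the canonical sequence, residues 1343-1345 read RGD. Locating an exact short motif like this needs only substring search. The sketch below uses a hypothetical helper, `find_motif`, that returns every 1-based inclusive (start, end) span of a motif. Like the other sketches here, it accepts a fragment together with its canonical start position.

```python
def find_motif(sequence: str, motif: str, start: int = 1) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of every exact match of `motif`.

    `start` is the canonical position of sequence[0]; overlapping matches are kept.
    """
    spans = []
    i = sequence.find(motif)
    while i != -1:
        spans.append((i + start, i + start + len(motif) - 1))
        i = sequence.find(motif, i + 1)  # advance by one so overlapping hits are found
    return spans


if __name__ == "__main__":
    # Residues 1341-1350 as printed in the canonical sequence display.
    print(find_motif("RHRGDKPYYN", "RGD", start=1341))  # [(1343, 1345)]
```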

Domain

There are three carboxypeptidase-like domains; only the first two appear to have retained catalytic activity.

Sequence similarities

Belongs to the peptidase M14 family (Curated).

Keywords - Domain

Repeat, Signal, Transmembrane, Transmembrane helix

Phylogenomic databases

eggNOG: KOG2649. Eukaryota.
ENOG410XX0H. LUCA.
GeneTree: ENSGT00760000119124.
InParanoid: P42787.
KO: K07752.
OMA: LTPPVKY.
OrthoDB: EOG091G06A9.
PhylomeDB: P42787.

Family and domain databases

Gene3D: 2.60.40.1120. 4 hits.
InterPro: IPR008969. CarboxyPept-like_regulatory.
IPR014766. CarboxyPept_regulatory_dom.
IPR000834. Peptidase_M14.
Pfam: PF00246. Peptidase_M14. 2 hits.
PRINTS: PR00765. CRBOXYPTASEA.
SMART: SM00631. Zn_pept. 2 hits.
SUPFAM: SSF49464. SSF49464. 4 hits.
PROSITE: PS00132. CARBOXYPEPT_ZN_1. 2 hits.
PS00133. CARBOXYPEPT_ZN_2. 2 hits.

Sequences (8)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 8 isoforms produced by alternative splicing.

Isoform 1 (identifier: P42787-1)
Also known as: B, 1B long tail-1

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MPTLGLLFAS IGIAVLAMGV PHCRGYTIKE DESFLQQPHY ASQEQLEDLF
60 70 80 90 100
AGLEKAYPNQ AKVHFLGRSL EGRNLLALQI SRNTRSRNLL TPPVKYIANM
110 120 130 140 150
HGDETVGRQL LVYMAQYLLG NHERISDLGQ LVNSTDIYLV PTMNPDGYAL
160 170 180 190 200
SQEGNCESLP NYVGRGNAAN IDLNRDFPDR LEQSHVHQLR AQSRQPETAA
210 220 230 240 250
LVNWIVSKPF VLSANFHGGA VVASYPYDNS LAHNECCEES LTPDDRVFKQ
260 270 280 290 300
LAHTYSDNHP IMRKGNNCND SFSGGITNGA HWYELSGGMQ DFNYAFSNCF
310 320 330 340 350
ELTIELSCCK YPAASTLPQE WQRNKASLLQ LLRQAHIGIK GLVTDASGFP
360 370 380 390 400
IADANVYVAG LEEKPMRTSK RGEYWRLLTP GLYSVHASAF GYQTSAPQQV
410 420 430 440 450
RVTNDNQEAL RLDFKLAPVE TNFDGNFRKV KVERSEPPQK LKKQFNGFLT
460 470 480 490 500
PTKYEHHNFT AMESYLRAIS SSYPSLTRLY SIGKSVQGRD LWVLEIFATP
510 520 530 540 550
GSHVPGVPEF KYVANMHGNE VVGKELLLIL TKYMLERYGN DDRITKLVNG
560 570 580 590 600
TRMHFLYSMN PDGYEISIEG DRTGGVGRAN AHGIDLNRNF PDQYGTDRFN
610 620 630 640 650
KVTEPEVAAV MNWTLSLPFV LSANLHGGSL VANYPFDDNE NDFNDPFMRL
660 670 680 690 700
RNSSINGRKP NPTEDNALFK HLAGIYSNAH PTMYLGQPCE LFQNEFFPDG
710 720 730 740 750
ITNGAQWYSV TGGMQDWNYV RAGCLELTIE MGCDKFPKAA ELSRYWEDHR
760 770 780 790 800
EPLLQFIEQV HCGIHGFVHS TIGTPIAGAV VRLDGANHST YSQVFGDYWK
810 820 830 840 850
LALPGRHNLT VLGDNYAPLR MEVEVPDVHP FEMRMDITLM PDDPQHWASA
860 870 880 890 900
NDFRIIENVV NTRYHTNPQV RARLAELENQ NGQIASFGYA DSEFGTIFNY
910 920 930 940 950
LKMTSDIGEP EEHKYKLLVV SSLYDTTAPL GREILLNLIR HLVEGFKLQD
960 970 980 990 1000
TSVVELLKRS VIYFLPQTSK FQNVFDMYNS NTSICDPVLG DELAERILGP
1010 1020 1030 1040 1050
ETDQAKDVFL QFLRSERFDL MLTFGAGNSD LNYPKGDSVL VKFAHRMQRT
1060 1070 1080 1090 1100
EFNYSPLQCP PSATRQLHRE TTERLTNMMY RIYNLPVYTL GISCCRMPHQ
1110 1120 1130 1140 1150
KKIASVWRKN IDKIKNFLAL VKTGVSGLVQ NDKGQPLREA YVRLLEHDRI
1160 1170 1180 1190 1200
INVTKNVARF QLMLPHGLYG LEVTAPNYES QMIKVDVEDG RVTELGIIRM
1210 1220 1230 1240 1250
HPFTLIRGVV LELPNNDNRA TTSIAGVVLD ESNHPVRNAK VSVVGQTQLR
1260 1270 1280 1290 1300
NFTGSMGQYR ISAVPLGTIT LKVEAPRHLE ATRQMHLIQG GLATENVVFH
1310 1320 1330 1340 1350
LKVNEHVFGL PRFLFILCAS VLIIVGVIVC VLCAQFWFYR RHRGDKPYYN
1360 1370 1380 1390 1400
FSLLPQRGKE QFGLEDDDGG DDGETELFRS PIKRELSQRA HLVNNQTNYS

FIIQAA
Length: 1,406
Mass (Da): 158,789
Last modified: February 21, 2001 - v2
Checksum: E7CF31AEC21363BD

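The canonical sequence above is printed in blocks of ten residues, with a ruler line of position numbers above each row of fifty. To reuse it programmatically, the ruler lines and the spaces must be dropped. The sketch below assumes the text is pasted exactly in this layout, and the function name `parse_numbered_sequence` is hypothetical. Downloading the FASTA version of the entry avoids this step altogether.

```python
# Two rows of the canonical display, copied as printed (first 100 residues).
SAMPLE = """\
        10         20         30         40         50
MPTLGLLFAS IGIAVLAMGV PHCRGYTIKE DESFLQQPHY ASQEQLEDLF
60 70 80 90 100
AGLEKAYPNQ AKVHFLGRSL EGRNLLALQI SRNTRSRNLL TPPVKYIANM
"""


def parse_numbered_sequence(text: str) -> str:
    """Join the residue rows of a numbered sequence display into one string.

    Ruler lines (only digits and spaces) and blank lines are skipped; spaces
    between the 10-residue blocks are removed.
    """
    rows = []
    for line in text.splitlines():
        compact = line.replace(" ", "")
        if not compact or compact.isdigit():
            continue  # blank line or a ruler line such as "60 70 80 90 100"
        if not compact.isalpha():
            raise ValueError(f"unexpected characters in sequence row: {line!r}")
        rows.append(compact.upper())
    return "".join(rows)


if __name__ == "__main__":
    seq = parse_numbered_sequence(SAMPLE)
    print(len(seq), seq[:10], seq[99])  # 100 MPTLGLLFAS M
```

Pasting the full display, including the final short row "FIIQAA", should yield 1,406 residues, which matches the stated length.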
Isoform 2 (identifier: P42787-2)
Also known as: C

The sequence of this isoform differs from the canonical sequence as follows:
     2-152: PTLGLLFASI...MNPDGYALSQ → NTCL

Length: 1,259
Mass (Da): 142,410
Checksum: AD4C718C35869632

Isoform 3 (identifier: P42787-3)
Also known as: 1A long tail-1, D

The sequence of this isoform differs from the canonical sequence as follows:
     1-152: MPTLGLLFAS...MNPDGYALSQ → MLFFCLALII...CNPDGFAKAK

Length: 1,404
Mass (Da): 158,519
Checksum: 8A22898B7FBCDCB4

Isoform 4 (identifier: P42787-4)
Also known as: H

The sequence of this isoform differs from the canonical sequence as follows:
     1-152: MPTLGLLFAS...MNPDGYALSQ → MLFFCLALII...CNPDGFAKAK
     1385-1406: ELSQRAHLVNNQTNYSFIIQAA → GMTIQPYFDE...LHNNGNKRRH

Note: No experimental confirmation available.
Length: 1,437
Mass (Da): 162,436
Checksum: 0F38E501B4E326D4

Isoform 5 (identifier: P42787-5)
Also known as: 1B long tail-2, G

The sequence of this isoform differs from the canonical sequence as follows:
     1385-1406: ELSQRAHLVNNQTNYSFIIQAA → GMTIQPYFDE...LHNNGNKRRH

Length: 1,439
Mass (Da): 162,706
Checksum: EB058FE1253F2DF7

Isoform 6 (identifier: P42787-6)
Also known as: 1A short, E

The sequence of this isoform differs from the canonical sequence as follows:
     1-152: MPTLGLLFAS...MNPDGYALSQ → MLFFCLALII...CNPDGFAKAK
     426-435: NFRKVKVERS → ISSFYSPYYF
     436-1406: Missing.

Length: 433
Mass (Da): 48,291
Checksum: F4E23AA4E1FFE4FA

Isoform 7 (identifier: P42787-7)
Also known as: 1B short, F

The sequence of this isoform differs from the canonical sequence as follows:
     426-435: NFRKVKVERS → ISSFYSPYYF
     436-1406: Missing.

Length: 435
Mass (Da): 48,562
Checksum: BA5776BCECFD85C2

Isoform 8 (identifier: P42787-8)
Also known as: G, I

The sequence of this isoform differs from the canonical sequence as follows:
     2-152: PTLGLLFASI...MNPDGYALSQ → NTCL
     1385-1406: ELSQRAHLVNNQTNYSFIIQAA → GMTIQPYFDE...LHNNGNKRRH

Note: No experimental confirmation available.
Length: 1,292
Mass (Da): 146,328
Checksum: FAC618E7B2A5CCD3

Sequence caution

The sequence AAA91650 differs from that shown. Reason: Frameshift at position 1106 (Curated).
The sequence AAC46486 differs from that shown. Reason: Intron retention (Curated).

Experimental Info

Feature key        Position(s)  Description                                   Length
Sequence conflict  733          C → Y in AAA91650 (PubMed:7568156) (Curated)  1
Sequence conflict  1041         V → E in ABM92809 (Ref. 5) (Curated)          1
Sequence conflict  1041         V → E in AAA91650 (PubMed:7568156) (Curated)  1

Alternative sequence

Feature key           ID          Position(s)  Description  Length
Alternative sequence  VSP_000773  1-152        MPTLG…YALSQ → MLFFCLALIIGCAVGEYSEV RVIQEEDNFLESPHYLKNEE IGDLFSQLAKDYPDLAQTYT IGKSLEDRPIYALALSAPTG ESKNGDLLRPMVKLVANIQG DEAVGRQMVLYMAEYLATHY DGDPKVQALLNLTEIHFLPT CNPDGFAKAK in isoform 3, isoform 4 and isoform 6 (1 Publication)  152
Alternative sequence  VSP_037495  2-152        PTLGL…YALSQ → NTCL in isoform 2 and isoform 8 (1 Publication)  151
Alternative sequence  VSP_000775  426-435      NFRKVKVERS → ISSFYSPYYF in isoform 6 and isoform 7 (1 Publication)  10
Alternative sequence  VSP_000776  436-1406     Missing in isoform 6 and isoform 7 (1 Publication)  971
Alternative sequence  VSP_000779  1385-1406    ELSQR…IIQAA → GMTIQPYFDEEQLERILHTD DDDDDGPHMEPELDVADDSE DDIVMLHNNGNKRRH in isoform 4, isoform 5 and isoform 8 (2 Publications)  22
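Each isoform is the canonical sequence with a subset of the VSP_ edits above applied. An edit can change the sequence length, so a simple way to apply several edits is to work from the C-terminal end backwards. That way, the canonical coordinates of the edits not yet applied stay valid. The sketch below encodes the table using hypothetical names (`VSP`, `ISOFORM_VSPS`, `apply_vsps`). It assumes that the edits within one isoform never overlap, which holds for this entry. Applied to a placeholder sequence of 1,406 residues, it reproduces the lengths listed for all eight isoforms.

```python
CANONICAL_LENGTH = 1406

# (start, end, replacement) per VSP_ feature, in 1-based inclusive canonical
# coordinates; an empty replacement stands for "Missing".
VSP = {
    "VSP_000773": (1, 152, "MLFFCLALIIGCAVGEYSEV" "RVIQEEDNFLESPHYLKNEE"
                           "IGDLFSQLAKDYPDLAQTYT" "IGKSLEDRPIYALALSAPTG"
                           "ESKNGDLLRPMVKLVANIQG" "DEAVGRQMVLYMAEYLATHY"
                           "DGDPKVQALLNLTEIHFLPT" "CNPDGFAKAK"),
    "VSP_037495": (2, 152, "NTCL"),
    "VSP_000775": (426, 435, "ISSFYSPYYF"),
    "VSP_000776": (436, 1406, ""),
    "VSP_000779": (1385, 1406, "GMTIQPYFDEEQLERILHTD" "DDDDDGPHMEPELDVADDSE"
                              "DDIVMLHNNGNKRRH"),
}

# Edits carried by each isoform, read off the "in isoform ..." notes.
ISOFORM_VSPS = {
    "P42787-1": [],
    "P42787-2": ["VSP_037495"],
    "P42787-3": ["VSP_000773"],
    "P42787-4": ["VSP_000773", "VSP_000779"],
    "P42787-5": ["VSP_000779"],
    "P42787-6": ["VSP_000773", "VSP_000775", "VSP_000776"],
    "P42787-7": ["VSP_000775", "VSP_000776"],
    "P42787-8": ["VSP_037495", "VSP_000779"],
}


def apply_vsps(canonical: str, vsp_ids: list[str]) -> str:
    """Apply alternative-sequence edits to the canonical sequence.

    Edits are applied from the highest start position downwards so the canonical
    coordinates of edits still to be applied remain valid (assumes no overlaps).
    """
    seq = canonical
    for vsp_id in sorted(vsp_ids, key=lambda v: VSP[v][0], reverse=True):
        start, end, replacement = VSP[vsp_id]
        seq = seq[: start - 1] + replacement + seq[end:]
    return seq


if __name__ == "__main__":
    placeholder = "X" * CANONICAL_LENGTH  # stands in for the real canonical sequence
    for isoform, vsp_ids in ISOFORM_VSPS.items():
        print(isoform, len(apply_vsps(placeholder, vsp_ids)))
```

Passing the real canonical sequence instead of the placeholder yields the full isoform sequences rather than just their lengths.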

Sequence databases

EMBL / GenBank / DDBJ
AF545816 mRNA. Translation: AAN73045.1.
AF545817 mRNA. Translation: AAN73046.1.
AF545818 mRNA. Translation: AAN73047.1.
AF545819 mRNA. Translation: AAN73048.1.
AF545820 mRNA. Translation: AAN73049.1.
AE014298 Genomic DNA. Translation: AAF45514.2.
AE014298 Genomic DNA. Translation: AAF45515.4.
AE014298 Genomic DNA. Translation: AAO41630.2.
AE014298 Genomic DNA. Translation: AAS65237.1.
AE014298 Genomic DNA. Translation: AAS65238.1.
AE014298 Genomic DNA. Translation: AAS65239.1.
AE014298 Genomic DNA. Translation: AAS65240.1.
AE014298 Genomic DNA. Translation: ACL82874.1.
AL009147 Genomic DNA. Translation: CAA15634.1.
AL009147 Genomic DNA. Translation: CAA15635.1.
BT029935 mRNA. Translation: ABM92809.1.
BT099720 mRNA. Translation: ACV53084.1.
BT100310 mRNA. Translation: ACZ52622.1.
U29591 mRNA. Translation: AAA91650.1. Frameshift.
U29592 mRNA. Translation: AAA91651.1.
U03883 Genomic DNA. Translation: AAC46486.1. Sequence problems.
PIR: T13284.
T13420.
T13421.
RefSeq: NP_001138141.1. NM_001144669.2. [P42787-8]
NP_001284742.1. NM_001297813.1. [P42787-6]
NP_001284743.1. NM_001297814.1. [P42787-7]
NP_001284744.1. NM_001297815.1. [P42787-5]
NP_525032.2. NM_080293.4. [P42787-1]
NP_726675.3. NM_166846.5. [P42787-4]
NP_788852.2. NM_176679.3. [P42787-1]
NP_996319.1. NM_206596.2. [P42787-5]
NP_996320.1. NM_206597.2. [P42787-7]
NP_996321.1. NM_206598.3. [P42787-6]
NP_996322.1. NM_206599.2. [P42787-3]
UniGene: Dm.42.

Genome annotation databases

EnsemblMetazoa: FBtr0070081; FBpp0070080; FBgn0004648. [P42787-1]
FBtr0344738; FBpp0311070; FBgn0004648. [P42787-1]
GeneID: 30998.
KEGG: dme:Dmel_CG4122.

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 30998.

Miscellaneous databases

ChiTaRS: svr. fly.
GenomeRNAi: 30998.
PRO: P42787.

Entry information

Entry name: CBPD_DROME
Primary (citable) accession number: P42787
Secondary accession number(s): A2RVE4, B7Z112, C7LAH0, D0Z763, O46058, Q24094, Q24095, Q9W5F3, Q9W5F4, Q9W5F5
Entry history
Integrated into UniProtKB/Swiss-Prot: November 1, 1995
Last sequence update: February 21, 2001
Last modified: November 30, 2016
This is version 162 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Drosophila annotation project

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Drosophila
    Drosophila: entries, gene names and cross-references to FlyBase
  2. Peptidase families
    Classification of peptidase families and list of entries
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Similar proteins are grouped in the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.