Protein: Carboxypeptidase A2
Gene: CPA2
Organism: Homo sapiens (Human)
Status: Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Catalytic activity

Similar to that of carboxypeptidase A (EC 3.4.17.1), but with a preference for bulkier C-terminal residues.

Cofactor

Zn2+. Note: Binds 1 zinc ion per subunit.

Sites

Feature key    Position(s)  Length  Description
Metal binding  179          1       Zinc; catalytic
Metal binding  182          1       Zinc; catalytic
Binding site   237          1       Substrate [By similarity]
Metal binding  306          1       Zinc; catalytic
Active site    380          1       Nucleophile
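
Positions in the Sites table are 1-based and count from the first residue of the precursor (P48052-1), so they include the signal peptide and activation peptide. A minimal Python sketch of a consistency check, assuming the sequence is copied from the Sequence section of this entry: it looks up the residue at each annotated position. The helper name residue_at is hypothetical and not part of any UniProt API.

```python
# Precursor sequence P48052-1, pasted from the Sequence section of this entry.
# Whitespace is removed, so the 10-residue display groups can be copied as-is.
PRECURSOR = "".join("""
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
EETWLGLKAI MEHVRDHPY
""".split())

# (1-based position, feature key, description) taken from the Sites table.
SITES = [
    (179, "Metal binding", "Zinc; catalytic"),
    (182, "Metal binding", "Zinc; catalytic"),
    (237, "Binding site", "Substrate"),
    (306, "Metal binding", "Zinc; catalytic"),
    (380, "Active site", "Nucleophile"),
]


def residue_at(seq, pos):
    """Return the residue at a 1-based UniProt position (hypothetical helper)."""
    if not 1 <= pos <= len(seq):
        raise IndexError(f"position {pos} is outside 1..{len(seq)}")
    return seq[pos - 1]


for pos, key, desc in SITES:
    print(f"{residue_at(PRECURSOR, pos)}{pos}  {key}: {desc}")
```

On this sequence the three zinc ligands come out as His179, Glu182 and His306 (the two-histidine, one-glutamate zinc site of M14 carboxypeptidases), Arg237 is the substrate-binding residue, and the active-site residue 380 is a glutamate.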

GO - Molecular function

  • carboxypeptidase activity (source: UniProtKB)
  • metallocarboxypeptidase activity (source: UniProtKB)
  • zinc ion binding (source: UniProtKB)

GO - Biological process

  • protein catabolic process in the vacuole (source: ProtInc)

Keywords - Molecular function

Carboxypeptidase, Hydrolase, Metalloprotease, Protease

Keywords - Ligand

Metal-binding, Zinc

Enzyme and pathway databases

SABIO-RK: P48052.

Protein family/group databases

MEROPS: M14.002.

Names & Taxonomy

Protein names
Recommended name: Carboxypeptidase A2 (EC 3.4.17.15)
Gene names
Name: CPA2
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Proteomes: UP000005640; Component: Chromosome 7

Organism-specific databases

HGNC: HGNC:2297. CPA2.

Subcellular location

GO - Cellular component

  • extracellular region (source: UniProtKB)

Keywords - Cellular component

Secreted

Pathology & Biotech

Organism-specific databases

PharmGKB: PA26817.

Polymorphism and mutation databases

BioMuta: CPA2.
DMDM: 294862522.

PTM / Processing

Molecule processing

Feature key     Position(s)  Length  Description                         Feature identifier
Signal peptide  1-18         18      Sequence analysis
Propeptide      19-114       96      Activation peptide [1 publication]  PRO_0000004353
Chain           115-419      305     Carboxypeptidase A2                 PRO_0000004354
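
The processing ranges are 1-based and inclusive, so in a 0-based language the slice for positions start..end is [start - 1, end). A minimal Python sketch, assuming the precursor sequence from the Sequence section, that cuts the precursor into the three annotated segments; extract is a hypothetical helper name.

```python
# Precursor sequence P48052-1, pasted from the Sequence section (whitespace removed).
PRECURSOR = "".join("""
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
EETWLGLKAI MEHVRDHPY
""".split())

# Molecule-processing features from the table: 1-based, inclusive ranges.
SEGMENTS = [
    ("Signal peptide", 1, 18),
    ("Propeptide (activation peptide)", 19, 114),
    ("Chain (Carboxypeptidase A2)", 115, 419),
]


def extract(seq, start, end):
    """Slice a 1-based, inclusive UniProt range (hypothetical helper)."""
    return seq[start - 1:end]


parts = {name: extract(PRECURSOR, start, end) for name, start, end in SEGMENTS}

for name, frag in parts.items():
    print(f"{name}: {len(frag)} aa, {frag[:5]}...{frag[-5:]}")

# The three segments tile the precursor with no gaps or overlaps.
assert "".join(parts.values()) == PRECURSOR
```

The lengths come out as 18 + 96 + 305 = 419, matching the Length column, and the propeptide ends with Arg114, immediately before the first residue of the mature chain (Ser115).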

Amino acid modifications

Feature key     Position(s)
Disulfide bond  248 ↔ 271
Disulfide bond  320 ↔ 354
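
Both ends of each disulfide bond must be cysteines, and any cysteine not listed is free or lies outside the mature chain. A minimal Python sketch, assuming the precursor sequence from the Sequence section, that checks the two annotated bonds and lists the remaining cysteines.

```python
# Precursor sequence P48052-1, pasted from the Sequence section (whitespace removed).
PRECURSOR = "".join("""
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
EETWLGLKAI MEHVRDHPY
""".split())

# Disulfide bonds from the table (1-based positions in the precursor).
DISULFIDES = [(248, 271), (320, 354)]
SIGNAL_PEPTIDE = (1, 18)  # from the Molecule processing table

# Every position involved in a bond must be a cysteine.
for a, b in DISULFIDES:
    assert PRECURSOR[a - 1] == PRECURSOR[b - 1] == "C", (a, b)

cysteines = [i for i, aa in enumerate(PRECURSOR, start=1) if aa == "C"]
bonded = {pos for pair in DISULFIDES for pos in pair}
unpaired = [pos for pos in cysteines if pos not in bonded]

print("Cysteines:", cysteines)
print("Not in an annotated disulfide:", unpaired)
print("All unpaired Cys inside the signal peptide:",
      all(SIGNAL_PEPTIDE[0] <= p <= SIGNAL_PEPTIDE[1] for p in unpaired))
```

On this sequence the only cysteine outside the two bonds is Cys18, which lies in the signal peptide (1-18), so all four cysteines of the mature chain are accounted for by the two disulfides.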

Keywords - PTM

Disulfide bond, Zymogen

Proteomic databases

PaxDb: P48052.
PRIDE: P48052.

PTM databases

PhosphoSite: P48052.

Expression

Gene expression databases

Bgee: P48052.
CleanEx: HS_CPA2.
ExpressionAtlas: P48052. Baseline and differential.
Genevestigator: P48052.

Organism-specific databases

HPA: HPA020342.
     HPA021317.

Interaction

Protein-protein interaction databases

MINT: MINT-125621.
STRING: 9606.ENSP00000222481.

Structure

Secondary structure

Feature key  Position(s)  Length  Description
Beta strand  26-30        5       Combined sources
Helix        35-46        12      Combined sources
Helix        48-50        3       Combined sources
Beta strand  53-56        4       Combined sources
Beta strand  65-69        5       Combined sources
Helix        71-73        3       Combined sources
Helix        74-83        10      Combined sources
Beta strand  88-93        6       Combined sources
Helix        95-112       18      Combined sources
Turn         113-115      3       Combined sources
Beta strand  120-122      3       Combined sources
Helix        126-139      14      Combined sources
Turn         141-143      3       Combined sources
Beta strand  144-151      8       Combined sources
Beta strand  157-163      7       Combined sources
Beta strand  166-168      3       Combined sources
Beta strand  171-175      5       Combined sources
Helix        183-199      17      Combined sources
Turn         200-202      3       Combined sources
Helix        204-212      9       Combined sources
Beta strand  214-219      6       Combined sources
Helix        223-231      9       Combined sources
Helix        253-255      3       Combined sources
Beta strand  257-260      4       Combined sources
Beta strand  263-267      5       Combined sources
Helix        284-296      13      Combined sources
Beta strand  299-306      8       Combined sources
Beta strand  311-315      5       Combined sources
Beta strand  317-319      3       Combined sources
Helix        326-341      16      Combined sources
Turn         342-344      3       Combined sources
Beta strand  349-352      4       Combined sources
Helix        353-356      4       Combined sources
Helix        364-371      8       Combined sources
Beta strand  375-380      6       Combined sources
Beta strand  384-387      4       Combined sources
Helix        393-395      3       Combined sources
Helix        396-416      21      Combined sources
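
Each row of the secondary-structure table is a 1-based, inclusive range, so its length is end - start + 1. A minimal Python sketch that totals residues per feature key and checks that no two elements overlap; it assumes percentages are taken against the full 419-residue precursor, even though the structures themselves cover residues 19-419 only.

```python
from collections import Counter

# (feature key, start, end) rows from the secondary-structure table; 1-based, inclusive.
ELEMENTS = [
    ("Beta strand", 26, 30), ("Helix", 35, 46), ("Helix", 48, 50),
    ("Beta strand", 53, 56), ("Beta strand", 65, 69), ("Helix", 71, 73),
    ("Helix", 74, 83), ("Beta strand", 88, 93), ("Helix", 95, 112),
    ("Turn", 113, 115), ("Beta strand", 120, 122), ("Helix", 126, 139),
    ("Turn", 141, 143), ("Beta strand", 144, 151), ("Beta strand", 157, 163),
    ("Beta strand", 166, 168), ("Beta strand", 171, 175), ("Helix", 183, 199),
    ("Turn", 200, 202), ("Helix", 204, 212), ("Beta strand", 214, 219),
    ("Helix", 223, 231), ("Helix", 253, 255), ("Beta strand", 257, 260),
    ("Beta strand", 263, 267), ("Helix", 284, 296), ("Beta strand", 299, 306),
    ("Beta strand", 311, 315), ("Beta strand", 317, 319), ("Helix", 326, 341),
    ("Turn", 342, 344), ("Beta strand", 349, 352), ("Helix", 353, 356),
    ("Helix", 364, 371), ("Beta strand", 375, 380), ("Beta strand", 384, 387),
    ("Helix", 393, 395), ("Helix", 396, 416),
]
SEQUENCE_LENGTH = 419  # precursor length from the Sequence section

# Sanity check: rows are sorted by position and no two elements overlap.
for (_, _, prev_end), (_, start, _) in zip(ELEMENTS, ELEMENTS[1:]):
    assert start > prev_end, "overlapping secondary-structure elements"

totals = Counter()
for key, start, end in ELEMENTS:
    totals[key] += end - start + 1  # inclusive range length

for key in ("Helix", "Beta strand", "Turn"):
    share = 100 * totals[key] / SEQUENCE_LENGTH
    print(f"{key}: {totals[key]} residues ({share:.1f}% of the precursor)")
```

For this table the totals are 163 helix, 91 beta-strand and 12 turn residues across 38 elements.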

3D structure databases

Entry  Method  Resolution (Å)  Chain  Positions
1AYE   X-ray   1.80            A      19-419
1DTD   X-ray   1.65            A      118-419
1O6X   NMR     -               A      19-96
ProteinModelPortal: P48052.
SMR: P48052. Positions 19-419.

Miscellaneous databases

EvolutionaryTrace: P48052.

Family & Domains

Region

Feature key  Position(s)  Length  Description
Region       179-182      4       Substrate binding [By similarity]
Region       254-255      2       Substrate binding [By similarity]
Region       307-308      2       Substrate binding [By similarity]

Sequence similarities

Belongs to the peptidase M14 family. [Curated]

Keywords - Domain

Signal

Phylogenomic databases

eggNOG: COG2866.
GeneTree: ENSGT00760000119103.
HOVERGEN: HBG050815.
InParanoid: P48052.
KO: K01298.
OMA: KAIMEHV.
OrthoDB: EOG7RZ5Q9.
PhylomeDB: P48052.
TreeFam: TF317197.

Family and domain databases

Gene3D: 3.30.70.340. 1 hit.
InterPro: IPR000834. Peptidase_M14.
          IPR003146. Prot_inh_M14A.
          IPR009020. Prot_inh_propept.
Pfam: PF00246. Peptidase_M14. 1 hit.
      PF02244. Propep_M14. 1 hit.
PRINTS: PR00765. CRBOXYPTASEA.
SMART: SM00631. Zn_pept. 1 hit.
SUPFAM: SSF54897. SSF54897. 1 hit.
PROSITE: PS00132. CARBOXYPEPT_ZN_1. 1 hit.
         PS00133. CARBOXYPEPT_ZN_2. 1 hit.

Sequence

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

P48052-1

        10         20         30         40         50
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
        60         70         80         90        100
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
       110        120        130        140        150
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
       160        170        180        190        200
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
       210        220        230        240        250
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
       260        270        280        290        300
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
       310        320        330        340        350
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
       360        370        380        390        400
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
       410
EETWLGLKAI MEHVRDHPY
Length: 419
Mass (Da): 47,030
Last modified: April 20, 2010 - v3
Checksum: 00269F2AE50CA38D
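
The Checksum is UniProt's 64-bit cyclic redundancy check (CRC64) of the sequence string. A minimal Python sketch, assuming the SWISS-PROT CRC64 convention (reflected, polynomial 0xD800000000000000, initial value 0, no final XOR; the same convention implemented by Biopython's Bio.SeqUtils.CheckSum.crc64): it strips the position rulers from the displayed sequence and computes the length and checksum. The names build_table and crc64 here are illustrative.

```python
# Sequence display copied from the Sequence section, position rulers included.
DISPLAY = """
        10         20         30         40         50
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
        60         70         80         90        100
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
       110        120        130        140        150
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
       160        170        180        190        200
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
       210        220        230        240        250
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
       260        270        280        290        300
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
       310        320        330        340        350
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
       360        370        380        390        400
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
       410
EETWLGLKAI MEHVRDHPY
"""

# Keep letters only: this drops the rulers, spaces and line breaks.
PRECURSOR = "".join(ch for ch in DISPLAY if ch.isalpha())

# Assumed convention: SWISS-PROT CRC64 (reflected, polynomial 0xD800000000000000,
# initial value 0, no final XOR).
POLY = 0xD800000000000000


def build_table():
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 bit falls off.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = build_table()


def crc64(sequence):
    """Return the CRC64 of a sequence as 16 upper-case hex digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


print("Length:", len(PRECURSOR))
print("Checksum:", crc64(PRECURSOR))
```

If the assumed convention matches UniProt's, the printed checksum should reproduce the Checksum field above; a mismatch would point to a copying error in the sequence or a different CRC variant.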

Sequence caution

The sequence AAA74425.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. [Curated]
The sequence AAH07009.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. [Curated]
The sequence AAH14571.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. [Curated]
The sequence AAH15140.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. [Curated]
The sequence EAL24092.1 differs from that shown. Reason: Erroneous gene model prediction. [Curated]

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  19           1       L → S in AA sequence (PubMed:2920728). [Curated]
Sequence conflict  39           1       K → N in AA sequence (PubMed:2920728). [Curated]
Sequence conflict  304          1       T → I in AAA74425 (PubMed:7896805). [Curated]

Natural variant

Feature key      Position(s)  Length  Description                                                 Feature identifier
Natural variant  82           1       E → G [2 publications]; corresponds to variant rs17850135  VAR_031204
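
Variant and conflict descriptions have the form reference → alternative at a 1-based precursor position. A minimal Python sketch, assuming the precursor sequence from the Sequence section, that applies such a substitution only after checking the reference residue, and reports which processed segment the natural variant falls in. The helper name substitute is hypothetical.

```python
# Precursor sequence P48052-1, pasted from the Sequence section (whitespace removed).
PRECURSOR = "".join("""
MAMRLILFFG ALFGHIYCLE TFVGDQVLEI VPSNEEQIKN LLQLEAQEHL
QLDFWKSPTT PGETAHVRVP FVNVQAVKVF LESQGIAYSI MIEDVQVLLD
KENEEMLFNR RRERSGNFNF GAYHTLEEIS QEMDNLVAEH PGLVSKVNIG
SSFENRPMNV LKFSTGGDKP AIWLDAGIHA REWVTQATAL WTANKIVSDY
GKDPSITSIL DALDIFLLPV TNPDGYVFSQ TKNRMWRKTR SKVSGSLCVG
VDPNRNWDAG FGGPGASSNP CSDSYHGPSA NSEVEVKSIV DFIKSHGKVK
AFITLHSYSQ LLMFPYGYKC TKLDDFDELS EVAQKAAQSL RSLHGTKYKV
GPICSVIYQA SGGSIDWSYD YGIKYSFAFE LRDTGRYGFL LPARQILPTA
EETWLGLKAI MEHVRDHPY
""".split())


def substitute(seq, pos, ref, alt):
    """Apply a single-residue substitution at a 1-based position (hypothetical helper).

    The reference residue is checked first, so a wrong coordinate fails loudly.
    """
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref}{pos}, found {seq[pos - 1]}{pos}")
    return seq[:pos - 1] + alt + seq[pos:]


# Natural variant VAR_031204 (rs17850135): E82G.
variant = substitute(PRECURSOR, 82, "E", "G")

# Sequence conflicts from the Experimental Info table; raises if a reference differs.
CONFLICTS = [(19, "L", "S"), (39, "K", "N"), (304, "T", "I")]
for pos, ref, alt in CONFLICTS:
    substitute(PRECURSOR, pos, ref, alt)

# Segment ranges from the Molecule processing table (1-based, inclusive).
SEGMENTS = {"signal peptide": (1, 18), "propeptide": (19, 114), "chain": (115, 419)}
where = next(name for name, (s, e) in SEGMENTS.items() if s <= 82 <= e)
print(f"E82G changes {PRECURSOR[81]} -> {variant[81]} in the {where}")
```

Position 82 lies in the activation peptide (19-114), so on the entry's processing annotation the substituted residue would not be present in the mature chain (115-419).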

Sequence databases

EMBL / GenBank / DDBJ:
AC024085 Genomic DNA. No translation available.
CH236950 Genomic DNA. Translation: EAL24092.1. Sequence problems.
BC007009 mRNA. Translation: AAH07009.1. Different initiation.
BC014571 mRNA. Translation: AAH14571.1. Different initiation.
BC015140 mRNA. Translation: AAH15140.1. Different initiation.
U19977 mRNA. Translation: AAA74425.1. Different initiation.
BT007403 mRNA. Translation: AAP36067.1.
CCDS: CCDS5817.2.
PIR: A56171.
RefSeq: NP_001860.2. NM_001869.2.
UniGene: Hs.490038.

Genome annotation databases

Ensembl: ENST00000222481; ENSP00000222481; ENSG00000158516.
GeneID: 1358.
KEGG: hsa:1358.
UCSC: uc003vpq.3. human.

Keywords - Coding sequence diversity

Polymorphism

Cross-references

Protocols and materials databases

DNASU: 1358.

Organism-specific databases

CTD: 1358.
GeneCards: GC07P129906.
HGNC: HGNC:2297. CPA2.
HPA: HPA020342.
     HPA021317.
MIM: 600688. gene.
neXtProt: NX_P48052.
PharmGKB: PA26817.

Miscellaneous databases

EvolutionaryTrace: P48052.
GeneWiki: Carboxypeptidase_A2.
GenomeRNAi: 1358.
NextBio: 5501.
PRO: P48052.

Publications

  1. "The DNA sequence of human chromosome 7."
    Hillier L.W., Fulton R.S., Fulton L.A., Graves T.A., Pepin K.H., Wagner-McPherson C., Layman D., Maas J., Jaeger S., Walker R., Wylie K., Sekhon M., Becker M.C., O'Laughlin M.D., Schaller M.E., Fewell G.A., Delehaunty K.D., Miner T.L., Nash W.E., Cordes M., Du H., Sun H., Edwards J., Bradshaw-Cordum H., Ali J., Andrews S., Isak A., Vanbrunt A., Nguyen C., Du F., Lamar B., Courtney L., Kalicki J., Ozersky P., Bielicki L., Scott K., Holmes A., Harkins R., Harris A., Strong C.M., Hou S., Tomlinson C., Dauphin-Kohlberg S., Kozlowicz-Reilly A., Leonard S., Rohlfing T., Rock S.M., Tin-Wollam A.-M., Abbott A., Minx P., Maupin R., Strowmatt C., Latreille P., Miller N., Johnson D., Murray J., Woessner J.P., Wendl M.C., Yang S.-P., Schultz B.R., Wallis J.W., Spieth J., Bieri T.A., Nelson J.O., Berkowicz N., Wohldmann P.E., Cook L.L., Hickenbotham M.T., Eldred J., Williams D., Bedell J.A., Mardis E.R., Clifton S.W., Chissoe S.L., Marra M.A., Raymond C., Haugen E., Gillett W., Zhou Y., James R., Phelps K., Iadanoto S., Bubb K., Simms E., Levy R., Clendenning J., Kaul R., Kent W.J., Furey T.S., Baertsch R.A., Brent M.R., Keibler E., Flicek P., Bork P., Suyama M., Bailey J.A., Portnoy M.E., Torrents D., Chinwalla A.T., Gish W.R., Eddy S.R., McPherson J.D., Olson M.V., Eichler E.E., Green E.D., Waterston R.H., Wilson R.K.
    Nature 424:157-164(2003)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  2. "Human chromosome 7: DNA sequence and biology."
    Scherer S.W., Cheung J., MacDonald J.R., Osborne L.R., Nakabayashi K., Herbrick J.-A., Carson A.R., Parker-Katiraee L., Skaug J., Khaja R., Zhang J., Hudek A.K., Li M., Haddad M., Duggan G.E., Fernandez B.A., Kanematsu E., Gentles S., Christopoulos C.C., Choufani S., Kwasnicka D., Zheng X.H., Lai Z., Nusskern D.R., Zhang Q., Gu Z., Lu F., Zeesman S., Nowaczyk M.J., Teshima I., Chitayat D., Shuman C., Weksberg R., Zackai E.H., Grebe T.A., Cox S.R., Kirkpatrick S.J., Rahman N., Friedman J.M., Heng H.H.Q., Pelicci P.G., Lo-Coco F., Belloni E., Shaffer L.G., Pober B., Morton C.C., Gusella J.F., Bruns G.A.P., Korf B.R., Quade B.J., Ligon A.H., Ferguson H., Higgins A.W., Leach N.T., Herrick S.R., Lemyre E., Farra C.G., Kim H.-G., Summers A.M., Gripp K.W., Roberts W., Szatmari P., Winsor E.J.T., Grzeschik K.-H., Teebi A., Minassian B.A., Kere J., Armengol L., Pujana M.A., Estivill X., Wilson M.D., Koop B.F., Tosi S., Moore G.E., Boright A.P., Zlotorynski E., Kerem B., Kroisel P.M., Petek E., Oscier D.G., Mould S.J., Doehner H., Doehner K., Rommens J.M., Vincent J.B., Venter J.C., Li P.W., Mural R.J., Adams M.D., Tsui L.-C.
    Science 300:767-772(2003)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  3. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA], VARIANT GLY-82.
    Tissue: Brain and Pancreas.
  4. "The sequence and conformation of human pancreatic procarboxypeptidase A2. cDNA cloning, sequence analysis, and three-dimensional model."
    Catasus L., Vendrell J., Aviles F.X., Carreira S., Puigserver A., Billeter M.
    J. Biol. Chem. 270:6651-6657(1995)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 2-419, 3D-STRUCTURE MODELING.
    Tissue: Pancreas.
  5. "Expression and characterization of human pancreatic preprocarboxypeptidase A1 and preprocarboxypeptidase A2."
    Laethem R.M., Blumenkopf T.A., Cory M., Elwell L., Moxham C.P., Ray P.H., Walton L.M., Smith G.K.
    Arch. Biochem. Biophys. 332:8-18(1996)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] OF 3-419, CHARACTERIZATION.
  6. "Cloning of human full-length CDSs in BD Creator(TM) system donor vector."
    Kalnine N., Chen X., Rolfs A., Halleck A., Hines L., Eisenstein S., Koundinya M., Raphael J., Moreira D., Kelley T., LaBaer J., Lin Y., Phelan M., Farmer A.
    Submitted (MAY-2003) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 3-419, VARIANT GLY-82.
  7. "Purification and properties of five different forms of human procarboxypeptidases."
    Pascual R., Burgos F.J., Salva M., Soriano F., Mendez E., Aviles F.X.
    Eur. J. Biochem. 179:609-616(1989)
    Cited for: PROTEIN SEQUENCE OF 19-45, CHARACTERIZATION.
  8. "Separation of human pancreatic carboxypeptidase A isoenzymes by high performance liquid chromatography."
    Linder D., Linder M., Schade H., Sziegoleit A.
    Biomed. Chromatogr. 7:143-145(1993)
    Cited for: PROTEIN SEQUENCE OF 115-144.
    Tissue: Pancreas.
  9. "The three-dimensional structure of human procarboxypeptidase A2. Deciphering the basis of the inhibition, activation and intrinsic activity of the zymogen."
    Garcia-Saez I., Reverter D., Vendrell J., Aviles F.X., Coll M.
    EMBO J. 16:6906-6913(1997)
    Cited for: X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS).
  10. "Characterisation and preliminary X-ray diffraction analysis of human pancreatic procarboxypeptidase A2."
    Reverter D., Garcia-Saez I., Catasus L., Vendrell J., Coll M., Aviles F.X.
    FEBS Lett. 420:7-10(1997)
    Cited for: X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS) OF 19-419.
  11. "Structure of a novel leech carboxypeptidase inhibitor determined free in solution and in complex with human carboxypeptidase A2."
    Reverter D., Fernandez-Catalan C., Baumgartner R., Pfander R., Huber R., Bode W., Vendrell J., Holak T.A., Aviles F.X.
    Nat. Struct. Biol. 7:322-328(2000)
    Cited for: X-RAY CRYSTALLOGRAPHY (1.65 ANGSTROMS) OF 118-419.
  12. "A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins."
    Dantas G., Kuhlman B., Callender D., Wong M., Baker D.
    J. Mol. Biol. 332:449-460(2003)
    Cited for: X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS).
  13. "NMR solution structure of the activation domain of human procarboxypeptidase A2."
    Jimenez M.A., Villegas V., Santoro J., Serrano L., Vendrell J., Aviles F.X., Rico M.
    Protein Sci. 12:296-305(2003)
    Cited for: STRUCTURE BY NMR OF 19-96.

Entry information

Entry name: CBPA2_HUMAN
Accession
Primary (citable) accession number: P48052
Secondary accession number(s): A4D1M4, C9JIK1, Q53XS1, Q96A12, Q96QN3, Q9UCF1
Entry history
Integrated into UniProtKB/Swiss-Prot: February 1, 1996
Last sequence update: April 20, 2010
Last modified: April 29, 2015
This is version 154 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Human chromosome 7
    Human chromosome 7: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  6. Peptidase families
    Classification of peptidase families and list of entries
  7. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.