Protein

Zinc finger MYM-type protein 2

Gene

ZMYM2

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

May function as a transcription factor.

Regions

Feature key    Position(s)    Length    Description
Zinc finger    352 – 386      35        MYM-type 1
Zinc finger    398 – 436      39        MYM-type 2
Zinc finger    445 – 480      36        MYM-type 3
Zinc finger    491 – 549      59        MYM-type 4
Zinc finger    559 – 597      39        MYM-type 5
Zinc finger    605 – 652      48        MYM-type 6
Zinc finger    660 – 694      35        MYM-type 7
Zinc finger    701 – 740      40        MYM-type 8
Zinc finger    747 – 781      35        MYM-type 9
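
The Length column follows from the coordinates, which are 1-based and inclusive, so length = end - start + 1. A minimal Python sketch of that check is shown here; the names `ZINC_FINGERS` and `feature_length` are illustrative and are not part of the entry.

```python
# MYM-type zinc fingers as (start, end, description), copied from the Regions table.
ZINC_FINGERS = [
    (352, 386, "MYM-type 1"),
    (398, 436, "MYM-type 2"),
    (445, 480, "MYM-type 3"),
    (491, 549, "MYM-type 4"),
    (559, 597, "MYM-type 5"),
    (605, 652, "MYM-type 6"),
    (660, 694, "MYM-type 7"),
    (701, 740, "MYM-type 8"),
    (747, 781, "MYM-type 9"),
]


def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive coordinates."""
    return end - start + 1


lengths = [feature_length(start, end) for start, end, _ in ZINC_FINGERS]
for (start, end, name), length in zip(ZINC_FINGERS, lengths):
    print(f"{name}: {start}-{end} ({length} residues)")
```

The computed values agree with the Length column for all nine zinc fingers.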

GO - Molecular function

GO - Biological process

Keywords - Biological process

Transcription, Transcription regulation

Keywords - Ligand

Metal-binding, Zinc

Enzyme and pathway databases

Reactome: R-HSA-1839117. Signaling by cytosolic FGFR1 fusion mutants.
          R-HSA-5655302. Signaling by FGFR1 in disease.

Names & Taxonomy

Protein names
Recommended name:
Zinc finger MYM-type protein 2
Alternative name(s):
Fused in myeloproliferative disorders protein
Rearranged in atypical myeloproliferative disorder protein
Zinc finger protein 198
Gene names
Name: ZMYM2
Synonyms: FIM, RAMP, ZNF198
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 13

Organism-specific databases

HGNC: HGNC:12989. ZMYM2.

Subcellular location

GO - Cellular component

  • cytoplasm Source: GO_Central
  • cytosol Source: Reactome
  • PML body Source: UniProtKB

Keywords - Cellular component

Nucleus

Pathology & Biotech

Involvement in disease

A chromosomal aberration involving ZMYM2, the translocation t(8;13)(p11;q12) with FGFR1, may be a cause of stem cell leukemia lymphoma syndrome (SCLL). SCLL usually presents as lymphoblastic lymphoma in association with a myeloproliferative disorder, often accompanied by pronounced peripheral eosinophilia and/or prominent eosinophilic infiltrates in the affected bone marrow.

Sites

Feature key    Position(s)    Length    Description
Site           913 – 914      2         Breakpoint for translocation to form ZMYM2-FGFR1

Organism-specific databases

PharmGKB: PA37569.

Polymorphism and mutation databases

BioMuta: ZMYM2.
DMDM: 17369677.

PTM / Processing

Molecule processing

Feature key    Position(s)    Length    Feature identifier    Description
Chain          1 – 1377       1377      PRO_0000191382        Zinc finger MYM-type protein 2

Amino acid modifications

Feature key         Position(s)    Length    Evidence            Description
Cross-link          88 – 88                  Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          98 – 98                  Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          104 – 104                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Modified residue    159 – 159      1         Combined sources    Phosphoserine
Cross-link          253 – 253                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          297 – 297                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Modified residue    305 – 305      1         Combined sources    Phosphoserine
Cross-link          325 – 325                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          441 – 441                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          513 – 513                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          529 – 529                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          532 – 532                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          576 – 576                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          649 – 649                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Cross-link          700 – 700                Combined sources    Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)
Modified residue    838 – 838      1         Combined sources    Phosphoserine
Modified residue    958 – 958      1         Combined sources    Phosphoserine
Modified residue    1064 – 1064    1         By similarity       Phosphoserine
Modified residue    1376 – 1376    1         Combined sources    Phosphothreonine
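
Because all positions refer to the canonical sequence, each annotated site can be checked against the residue found there: a Lys-Gly isopeptide cross-link should sit on a lysine (K) and a phosphoserine on a serine (S). The sketch below does this for the four most N-terminal sites, using the first 200 residues of isoform 1 as listed in this entry. The names `N_TERMINAL_200`, `EXPECTED` and `residue_at` are illustrative.

```python
# First 200 residues of the canonical sequence (isoform 1), copied from this
# entry in its grouped layout; the spaces are removed before indexing.
N_TERMINAL_200 = (
    "MDTSSVGGLE LTDQTPVLLG STAMATSLTN VGNSFSGPAN PLVSRSNKFQ"
    "NSSVEDDDDV VFIEPVQPPP PSVPVVADQR TITFTSSKNE ELQGNDSKIT"
    "PSSKELASQK GSVSETIVID DEEDMETNQG QEKNSSNFIE RRPPETKNRT"
    "NDVDFSTSSF SRSKVNAGMG NSGITTEPDS EIQIANVTTL ETGVSSVNDG"
).replace(" ", "")

# Annotated position -> residue expected from the modification type.
EXPECTED = {
    88: "K",   # SUMO2 cross-link (glycyl lysine isopeptide)
    98: "K",   # SUMO2 cross-link
    104: "K",  # SUMO2 cross-link
    159: "S",  # phosphoserine
}


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


mismatches = {
    position: residue_at(N_TERMINAL_200, position)
    for position, expected in EXPECTED.items()
    if residue_at(N_TERMINAL_200, position) != expected
}
print(mismatches or "all annotated residues match")
```

The same check extends to the remaining sites once the full 1,377-residue sequence is loaded.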

Keywords - PTM

Isopeptide bond, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q9UBW7.
MaxQB: Q9UBW7.
PaxDb: Q9UBW7.
PeptideAtlas: Q9UBW7.
PRIDE: Q9UBW7.

PTM databases

iPTMnet: Q9UBW7.
PhosphoSite: Q9UBW7.

Expression

Gene expression databases

Bgee: ENSG00000121741.
ExpressionAtlas: Q9UBW7. baseline and differential.
Genevisible: Q9UBW7. HS.

Organism-specific databases

HPA: HPA031765.

Interaction

Subunit structure

May be a component of a BHC histone deacetylase complex that contains HDAC1, HDAC2, HMG20B/BRAF35, KDM1A, RCOR1/CoREST, PHF21A/BHC80, ZMYM2, ZNF217, ZMYM3, GSE1 and GTF2I. 1 Publication

GO - Molecular function

  • ubiquitin conjugating enzyme binding Source: UniProtKB

Protein-protein interaction databases

BioGrid: 113534. 61 interactions.
IntAct: Q9UBW7. 39 interactions.
MINT: MINT-267007.
STRING: 9606.ENSP00000372322.

Structure

3D structure databases

ProteinModelPortal: Q9UBW7.
SMR: Q9UBW7. Positions 316-360.
ModBase: Search...
MobiDB: Search...

Family & Domains

Sequence similarities

Contains 9 MYM-type zinc fingers. Curated

Keywords - Domain

Repeat, Zinc-finger

Phylogenomic databases

eggNOG: ENOG410IE8I. Eukaryota.
        ENOG410XQR6. LUCA.
GeneTree: ENSGT00550000074408.
HOVERGEN: HBG058385.
InParanoid: Q9UBW7.
OMA: AYGVNAW.
OrthoDB: EOG091G00KQ.
PhylomeDB: Q9UBW7.
TreeFam: TF336988.

Family and domain databases

InterPro: IPR021893. DUF3504.
          IPR011017. TRASH_dom.
          IPR010507. Znf_MYM.
Pfam: PF12012. DUF3504. 1 hit.
      PF06467. zf-FCS. 8 hits.
SMART: SM00746. TRASH. 9 hits.

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q9UBW7-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MDTSSVGGLE LTDQTPVLLG STAMATSLTN VGNSFSGPAN PLVSRSNKFQ
        60         70         80         90        100
NSSVEDDDDV VFIEPVQPPP PSVPVVADQR TITFTSSKNE ELQGNDSKIT
       110        120        130        140        150
PSSKELASQK GSVSETIVID DEEDMETNQG QEKNSSNFIE RRPPETKNRT
       160        170        180        190        200
NDVDFSTSSF SRSKVNAGMG NSGITTEPDS EIQIANVTTL ETGVSSVNDG
       210        220        230        240        250
QLENTDGRDM NLMITHVTSL QNTNLGDVSN GLQSSNFGVN IQTYTPSLTS
       260        270        280        290        300
QTKTGVGPFN PGRMNVAGDV FQNGESATHH NPDSWISQSA SFPRNQKQPG
       310        320        330        340        350
VDSLSPVASL PKQIFQPSVQ QQPTKPVKVT CANCKKPLQK GQTAYQRKGS
       360        370        380        390        400
AHLFCSTTCL SSFSHKPAPK KLCVMCKKDI TTMKGTIVAQ VDSSESFQEF
       410        420        430        440        450
CSTSCLSLYE DKQNPTKGAL NKSRCTICGK LTEIRHEVSF KNMTHKLCSD
       460        470        480        490        500
HCFNRYRMAN GLIMNCCEQC GEYLPSKGAG NNVLVIDGQQ KRFCCQSCVS
       510        520        530        540        550
EYKQVGSHPS FLKEVRDHMQ DSFLMQPEKY GKLTTCTGCR TQCRFFDMTQ
       560        570        580        590        600
CIGPNGYMEP YCSTACMNSH KTKYAKSQSL GIICHFCKRN SLPQYQATMP
       610        620        630        640        650
DGKLYNFCNS SCVAKFQALS MQSSPNGQFV APSDIQLKCN YCKNSFCSKP
       660        670        680        690        700
EILEWENKVH QFCSKTCSDD YKKLHCIVTY CEYCQEEKTL HETVNFSGVK
       710        720        730        740        750
RPFCSEGCKL LYKQDFARRL GLRCVTCNYC SQLCKKGATK ELDGVVRDFC
       760        770        780        790        800
SEDCCKKFQD WYYKAARCDC CKSQGTLKER VQWRGEMKHF CDQHCLLRFY
       810        820        830        840        850
CQQNEPNMTT QKGPENLHYD QGCQTSRTKM TGSAPPPSPT PNKEMKNKAV
       860        870        880        890        900
LCKPLTMTKA TYCKPHMQTK SCQTDDTWRT EYVPVPIPVP VYIPVPMHMY
       910        920        930        940        950
SQNIPVPTTV PVPVPVPVFL PAPLDSSEKI PAAIEELKSK VSSDALDTEL
       960        970        980        990       1000
LTMTDMMSED EGKTETTNIN SVIIETDIIG SDLLKNSDPE TQSSMPDVPY
      1010       1020       1030       1040       1050
EPDLDIEIDF PRAAEELDME NEFLLPPVFG EEYEEQPRPR SKKKGAKRKA
      1060       1070       1080       1090       1100
VSGYQSHDDS SDNSECSFPF KYTYGVNAWK HWVKTRQLDE DLLVLDELKS
      1110       1120       1130       1140       1150
SKSVKLKEDL LSHTTAELNY GLAHFVNEIR RPNGENYAPD SIYYLCLGIQ
      1160       1170       1180       1190       1200
EYLCGSNRKD NIFIDPGYQT FEQELNKILR SWQPSILPDG SIFSRVEEDY
      1210       1220       1230       1240       1250
LWRIKQLGSH SPVALLNTLF YFNTKYFGLK TVEQHLRLSF GTVFRHWKKN
      1260       1270       1280       1290       1300
PLTMENKACL RYQVSSLCGT DNEDKITTGK RKHEDDEPVF EQIENTANPS
      1310       1320       1330       1340       1350
RCPVKMFECY LSKSPQNLNQ RMDVFYLQPE CSSSTDSPVW YTSTSLDRNT
      1360       1370
LENMLVRVLL VKDIYDKDNY ELDEDTD
Length: 1,377
Mass (Da): 154,911
Last modified: May 1, 2000 - v1
Checksum: 2652D4C766492FF9
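
The sequence listing interleaves position rulers with rows of ten-residue groups, so it has to be flattened before residues can be indexed. A minimal Python sketch, assuming ruler lines contain only numbers and sequence lines contain only letters; the helper name `parse_sequence_block` is illustrative, and only the first two rows of isoform 1 are used here to keep the example short.

```python
def parse_sequence_block(block: str) -> str:
    """Join a ruler-annotated sequence listing into one contiguous string."""
    residues = []
    for line in block.splitlines():
        tokens = line.split()
        # Skip blank lines and ruler lines, which hold only position numbers.
        if not tokens or all(token.isdigit() for token in tokens):
            continue
        residues.extend(tokens)
    return "".join(residues)


# First two rows of the isoform 1 listing, copied from this entry.
BLOCK = """\
        10         20         30         40         50
MDTSSVGGLE LTDQTPVLLG STAMATSLTN VGNSFSGPAN PLVSRSNKFQ
        60         70         80         90        100
NSSVEDDDDV VFIEPVQPPP PSVPVVADQR TITFTSSKNE ELQGNDSKIT
"""

sequence = parse_sequence_block(BLOCK)
print(len(sequence))  # 100
```

Applied to the full listing, the result should have the stated length of 1,377 residues.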
Isoform 2 (identifier: Q9UBW7-2)

The sequence of this isoform differs from the canonical sequence as follows:
     165-251: Missing.
     529-549: KYGKLTTCTGCRTQCRFFDMT → VSRNVNGVQGLNIFEHCYYCH
     550-1377: Missing.
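
The three differences above can be applied mechanically to the canonical sequence to obtain isoform 2. The sketch below is one way to do it in Python: edits are applied from the C-terminal end backwards so that the canonical coordinates of earlier ranges stay valid. The names `ISOFORM_2_VARIANTS` and `apply_variants` are illustrative, and a placeholder string stands in for the real sequence because only the coordinates matter for the length check.

```python
CANONICAL_LENGTH = 1377

# (start, end, replacement) in 1-based inclusive canonical coordinates;
# an empty replacement means the range is missing in isoform 2.
ISOFORM_2_VARIANTS = [
    (165, 251, ""),
    (529, 549, "VSRNVNGVQGLNIFEHCYYCH"),
    (550, 1377, ""),
]


def apply_variants(canonical: str, variants) -> str:
    """Apply non-overlapping edits, last range first."""
    result = canonical
    for start, end, replacement in sorted(variants, reverse=True):
        result = result[: start - 1] + replacement + result[end:]
    return result


# Placeholder for the canonical sequence: only its length is used here.
placeholder = "X" * CANONICAL_LENGTH
isoform_2 = apply_variants(placeholder, ISOFORM_2_VARIANTS)
print(len(isoform_2))  # 462
```

The arithmetic is 1377 - 87 - 828 = 462, since the 21-residue substitution does not change the length. The substituted segment ends up as the C-terminal 21 residues of isoform 2.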

Length: 462
Mass (Da): 50,809
Checksum: E906383363CB8D65

Sequence caution

The sequence AAB88464 differs from that shown. Reason: Frameshift at positions 330, 966, 1009 and 1017. Curated
The sequence AAC23591 differs from that shown. Reason: Frameshift at position 330. Curated
The sequence CAA73875 differs from that shown. Reason: Frameshift at positions 388, 403, 406, 409 and 418. Curated

Experimental Info

Feature key          Position(s)    Length    Description
Sequence conflict    102 – 102      1         S → P in CAH56193 (PubMed:11230166). Curated
Sequence conflict    110 – 110      1         K → E in AAH36372 (Ref. 7). Curated
Sequence conflict    304 – 304      1         L → V in CAB66556 (PubMed:11230166). Curated
Sequence conflict    330 – 330      1         T → S in CAB66556 (PubMed:11230166). Curated
Sequence conflict    411 – 412      2         DK → EQ in CAA73875 (PubMed:9576949). Curated
Sequence conflict    424 – 424      1         R → G in CAB66556 (PubMed:11230166). Curated
Sequence conflict    657 – 659      3         NKV → ASL in CAH70133 (PubMed:17974005). Curated
Sequence conflict    657 – 659      3         NKV → ASL in CAH71822 (PubMed:17974005). Curated
Sequence conflict    736 – 736      1         K → G in CAA73875 (PubMed:9576949). Curated
Sequence conflict    766 – 767      2         Missing in CAH70133 (PubMed:17974005). Curated
Sequence conflict    766 – 767      2         Missing in CAH71822 (PubMed:17974005). Curated
Sequence conflict    856 – 856      1         T → I in AAH36372 (Ref. 7). Curated
Sequence conflict    967 – 967      1         Missing in AAB88464 (PubMed:9694738). Curated
Sequence conflict    1009 – 1010    2         DF → IS in AAB88464 (PubMed:9694738). Curated
Sequence conflict    1016 – 1016    1         Missing in AAB88464 (PubMed:9694738). Curated
Sequence conflict    1259 – 1259    1         C → R in AAH36372 (Ref. 7). Curated

Alternative sequence

Feature key             Position(s)    Length    Feature identifier    Description
Alternative sequence    165 – 251      87        VSP_039065            Missing in isoform 2. 1 Publication
Alternative sequence    529 – 549      21        VSP_039066            KYGKL…FFDMT → VSRNVNGVQGLNIFEHCYYCH in isoform 2. 1 Publication
Alternative sequence    550 – 1377     828       VSP_039067            Missing in isoform 2. 1 Publication

Sequence databases

EMBL, GenBank, DDBJ:
Y13472 mRNA. Translation: CAA73875.1. Frameshift.
AJ224901 mRNA. Translation: CAA12204.1.
AJ007676, AJ007677, AJ007678, AJ007679, AJ007680, AJ007681, AJ007682, AJ007683, AJ007684, AJ007685, AJ007686, AJ007687, AJ007688, AJ007689, AJ007690, AJ007691, AJ007692, AJ007693, AJ007694, AJ007695, AJ007696 Genomic DNA. Translation: CAA07604.1.
AL136621 mRNA. Translation: CAB66556.2.
BX647944 mRNA. Translation: CAH56193.1.
AL137119, AL138688 Genomic DNA. Translation: CAH70133.1.
AL137119 Genomic DNA. Translation: CAH70134.1.
AL138688, AL137119 Genomic DNA. Translation: CAH71822.1.
CH471075 Genomic DNA. Translation: EAX08244.1.
CH471075 Genomic DNA. Translation: EAX08247.1.
BC036372 mRNA. Translation: AAH36372.1.
AF060181 mRNA. Translation: AAC23591.1. Frameshift.
AF035374 mRNA. Translation: AAB88464.1. Frameshift.
AF012126 mRNA. Translation: AAC01561.1.
CCDS: CCDS45016.1. [Q9UBW7-1]
PIR: T45119.
RefSeq: NP_001177893.1. NM_001190964.2. [Q9UBW7-1]
        NP_001177894.1. NM_001190965.2. [Q9UBW7-1]
        NP_003444.1. NM_003453.4. [Q9UBW7-1]
        NP_932072.1. NM_197968.3. [Q9UBW7-1]
        XP_005266577.1. XM_005266520.3. [Q9UBW7-1]
        XP_011533526.1. XM_011535224.2.
UniGene: Hs.507433.

Genome annotation databases

Ensembl: ENST00000382871; ENSP00000372324; ENSG00000121741. [Q9UBW7-1]
         ENST00000382874; ENSP00000372327; ENSG00000121741. [Q9UBW7-1]
         ENST00000610343; ENSP00000479904; ENSG00000121741. [Q9UBW7-1]
GeneID: 7750.
KEGG: hsa:7750.
UCSC: uc031zxt.2. human. [Q9UBW7-1]

Keywords - Coding sequence diversity

Alternative splicing, Chromosomal rearrangement

Cross-references

Web resources

Atlas of Genetics and Cytogenetics in Oncology and Haematology

Protocols and materials databases

DNASU: 7750.
Structural Biology Knowledgebase: Search...

Organism-specific databases

CTD: 7750.
GeneCards: ZMYM2.
HGNC: HGNC:12989. ZMYM2.
HPA: HPA031765.
MIM: 602221. gene.
neXtProt: NX_Q9UBW7.
PharmGKB: PA37569.
GenAtlas: Search...

Miscellaneous databases

ChiTaRS: ZMYM2. human.
GeneWiki: ZMYM2.
GenomeRNAi: 7750.
PRO: Q9UBW7.
SOURCE: Search...

Family and domain databases

ProtoNet: Search...

Entry information

Entry name: ZMYM2_HUMAN
Accession: Primary (citable) accession number: Q9UBW7
Secondary accession number(s): A6NDG0, A6NI02, O43212, O43434, O60898, Q5W0Q4, Q5W0T3, Q63HP0, Q8NE39, Q9H0V5, Q9H538, Q9UEU2
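
When accession numbers like these are collected from text, a format check helps catch truncated or fused identifiers. The sketch below validates the primary and secondary accessions of this entry against the accession pattern given in the UniProt documentation; the pattern is quoted here on the assumption that it is still current, and the names `ACCESSION_RE` and `is_valid_accession` are illustrative.

```python
import re

# Accession format as described in the UniProt documentation (assumed current):
# six characters, or ten for the extended format.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Primary accession followed by the secondary accessions listed in this entry.
ACCESSIONS = [
    "Q9UBW7",
    "A6NDG0", "A6NI02", "O43212", "O43434", "O60898", "Q5W0Q4",
    "Q5W0T3", "Q63HP0", "Q8NE39", "Q9H0V5", "Q9H538", "Q9UEU2",
]


def is_valid_accession(accession: str) -> bool:
    """True if the whole string matches the accession pattern."""
    return ACCESSION_RE.fullmatch(accession) is not None


invalid = [acc for acc in ACCESSIONS if not is_valid_accession(acc)]
print(invalid)  # []
```

An entry name such as ZMYM2_HUMAN is a different kind of identifier and does not match this pattern.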
Entry history
Integrated into UniProtKB/Swiss-Prot: November 16, 2001
Last sequence update: May 1, 2000
Last modified: September 7, 2016
This is version 150 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 13
    Human chromosome 13: entries, gene names and cross-references to MIM
  2. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  3. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
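
The UniRef90 and UniRef50 criteria can be read as a pair of thresholds on identity and overlap relative to the seed sequence. The sketch below encodes only those two stated criteria; it is not the UniRef clustering procedure itself, it leaves out the UniRef100 rule for identical sequences and sub-fragments, and the names `IDENTITY_THRESHOLD`, `MIN_OVERLAP` and `can_join_cluster` are illustrative.

```python
# Minimum sequence identity to the seed for each cluster level, as stated above.
IDENTITY_THRESHOLD = {"UniRef90": 0.90, "UniRef50": 0.50}

# Minimum overlap with the longest (seed) sequence, the same for both levels.
MIN_OVERLAP = 0.80


def can_join_cluster(level: str, identity: float, overlap: float) -> bool:
    """True if identity and overlap (as fractions) meet the criteria for a level."""
    return identity >= IDENTITY_THRESHOLD[level] and overlap >= MIN_OVERLAP


# A hypothetical sequence that is 93% identical to a seed over 85% of its length.
print(can_join_cluster("UniRef90", identity=0.93, overlap=0.85))
```

A sequence with high identity but less than 80% overlap with the seed would not qualify at either level under these criteria.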