Protein

Zinc finger SWIM domain-containing protein 8

Gene

Zswim8

Organism
Mus musculus (Mouse)
Status
Reviewed. Annotation score: 3 out of 5. Experimental evidence at protein level.

Function

Regions

Feature key   Position(s)   Length   Description   Evidence
Zinc finger   172 – 208     37       SWIM-type     PROSITE-ProRule annotation

Keywords - Ligand

Metal-binding, Zinc

Names & Taxonomy

Protein names
Recommended name:
Zinc finger SWIM domain-containing protein 8
Gene names
Name: Zswim8
Synonyms: Kiaa0913
Organism: Mus musculus (Mouse)
Taxonomic identifier: 10090 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Glires > Rodentia > Sciurognathi > Muroidea > Muridae > Murinae > Mus > Mus
Proteomes
  • UP000000589 Component: Chromosome 14

Organism-specific databases

MGI: MGI:1919156. Zswim8.

PTM / Processing

Molecule processing

Feature key   Position(s)   Length   Feature identifier   Description
Chain         1 – 1832      1832     PRO_0000311803       Zinc finger SWIM domain-containing protein 8

Amino acid modifications

Feature key        Position(s)   Length   Description        Evidence
Modified residue   36 – 36       1        Phosphoserine      By similarity
Modified residue   48 – 48       1        Phosphoserine      Combined sources
Modified residue   53 – 53       1        Phosphoserine      Combined sources
Modified residue   564 – 564     1        Phosphoserine      By similarity
Modified residue   1141 – 1141   1        Phosphothreonine   By similarity
Modified residue   1155 – 1155   1        Phosphoserine      Combined sources
Modified residue   1158 – 1158   1        Phosphoserine      Combined sources
Modified residue   1162 – 1162   1        Phosphoserine      By similarity
Modified residue   1831 – 1831   1        Phosphoserine      By similarity
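
Because all positions refer to the canonical sequence and are 1-based, a listed modification site can be sanity-checked by looking up the residue at that position. The following is a minimal sketch, assuming only the first 60 residues of isoform 1 as printed in this entry; the names `N_TERMINUS`, `PHOSPHOSERINES`, and `residue_at` are illustrative and not part of UniProt.

```python
# First 60 residues of the canonical sequence (isoform 1), copied from this entry.
N_TERMINUS = (
    "MELMFAEWEDGERFSFEDSDRFEEDSLCSFISEAESLCQNWRGWRKQSAG"
    "PNSPTGGGGG"
)

# Phosphoserine positions from the table that fall inside the first 60 residues.
PHOSPHOSERINES = [36, 48, 53]


def residue_at(sequence, position):
    """Return the residue at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


residues = {pos: residue_at(N_TERMINUS, pos) for pos in PHOSPHOSERINES}
for pos, aa in residues.items():
    print(pos, aa)
```

Each annotated phosphoserine in this range should map to an `S`; a mismatch would usually point to an off-by-one error or to the wrong isoform being used.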

Keywords - PTM

Phosphoprotein

Proteomic databases

EPD: Q3UHH1.
MaxQB: Q3UHH1.
PaxDb: Q3UHH1.
PRIDE: Q3UHH1.

PTM databases

iPTMnet: Q3UHH1.
PhosphoSite: Q3UHH1.

Expression

Gene expression databases

Bgee: Q3UHH1.
CleanEx: MM_2310021P13RIK.
Genevisible: Q3UHH1. MM.

Interaction

Protein-protein interaction databases

BioGrid: 234542. 3 interactions.
IntAct: Q3UHH1. 5 interactions.
MINT: MINT-4115901.
STRING: 10090.ENSMUSP00000022358.

Structure

3D structure databases

ProteinModelPortal: Q3UHH1.

Family & Domains

Compositional bias

Feature key          Position(s)   Length   Description
Compositional bias   56 – 65       10       Poly-Gly
Compositional bias   1145 – 1206   62       Ser-rich
Compositional bias   1474 – 1477   4        Poly-Ala
Compositional bias   1496 – 1653   158      Pro-rich
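
One way to see what a compositional-bias label means is to count residues inside the annotated range. The sketch below does this for the Poly-Gly region (56 – 65), using residues 51 to 70 of the canonical sequence as printed in this entry. The helper `region_composition` is hypothetical, and the entry does not state what threshold, if any, was used to assign these labels.

```python
from collections import Counter

# Residues 51-70 of the canonical sequence, copied from this entry.
SEGMENT_START = 51  # 1-based position of the first residue in SEGMENT
SEGMENT = "PNSPTGGGGGGGSGGTRTRD"


def region_composition(segment, segment_start, begin, end):
    """Return residue fractions for positions begin..end (1-based, inclusive)."""
    region = segment[begin - segment_start : end - segment_start + 1]
    counts = Counter(region)
    return {aa: n / len(region) for aa, n in counts.items()}


poly_gly = region_composition(SEGMENT, SEGMENT_START, 56, 65)
print(poly_gly)
```

For this region the count is nine glycines and one serine, so the label describes a strongly enriched stretch rather than an uninterrupted run.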

Sequence similarities

Contains 1 SWIM-type zinc finger. (PROSITE-ProRule annotation)

Zinc finger

Feature key   Position(s)   Length   Description   Evidence
Zinc finger   172 – 208     37       SWIM-type     PROSITE-ProRule annotation
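
Feature positions in this entry are 1-based and inclusive, so the length of a feature is end - begin + 1 (208 - 172 + 1 = 37 for the zinc finger). The sketch below converts such a range into a Python slice and extracts the SWIM-type zinc finger from residues 151 to 250 of the canonical sequence as printed in this entry. The function name `feature_slice` and the segment constants are illustrative assumptions.

```python
# Residues 151-250 of the canonical sequence, copied from this entry.
SEGMENT_START = 151  # 1-based position of the first residue in SEGMENT
SEGMENT = (
    "QIGFHLSATVVPPQMVPPKGAYNVAVMFDRCRVTSCSCTCGAGAKWCTHV"
    "VALCLFRIHNASAVCLRAPVSESLSRLQRDQLQKFAQYLISELPQQILPT"
)


def feature_slice(segment, segment_start, begin, end):
    """Return residues begin..end (1-based, inclusive) from a partial sequence."""
    lo = begin - segment_start          # 0-based index of the first residue
    hi = end - segment_start + 1        # exclusive upper bound for slicing
    if lo < 0 or end < begin or hi > len(segment):
        raise ValueError("requested range is not covered by the segment")
    return segment[lo:hi]


zinc_finger = feature_slice(SEGMENT, SEGMENT_START, 172, 208)
print(zinc_finger, len(zinc_finger))
```

The same conversion applies to any other feature range in the entry, provided the positions are taken from the canonical isoform.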

Keywords - Domain

Zinc-finger

Phylogenomic databases

eggNOG: KOG3615. Eukaryota.
    ENOG410XPG2. LUCA.
GeneTree: ENSGT00390000012572.
HOVERGEN: HBG092246.
InParanoid: Q3UHH1.
OMA: QQMYIQC.
OrthoDB: EOG7TJ3H1.
PhylomeDB: Q3UHH1.
TreeFam: TF324881.

Family and domain databases

InterPro: IPR007527. Znf_SWIM.
PROSITE: PS50966. ZF_SWIM. 1 hit.

Sequences (5)

Sequence status: Complete.

This entry describes 5 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q3UHH1-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MELMFAEWED GERFSFEDSD RFEEDSLCSF ISEAESLCQN WRGWRKQSAG
60 70 80 90 100
PNSPTGGGGG GGSGGTRTRD GLVIPLVELS AKQVAFHIPF EVVEKVYPPV
110 120 130 140 150
PEQLQLRIAF WSFPENEEDI RLYSCLANGS ADEFQRGDQL FRMRAVKDPL
160 170 180 190 200
QIGFHLSATV VPPQMVPPKG AYNVAVMFDR CRVTSCSCTC GAGAKWCTHV
210 220 230 240 250
VALCLFRIHN ASAVCLRAPV SESLSRLQRD QLQKFAQYLI SELPQQILPT
260 270 280 290 300
AQRLLDELLS SQSTAINTVC GAPDPTAGPS ASDQSTWYLD ESTLTDNIKK
310 320 330 340 350
TLHKFCGPSP VVFSDVNSMY LSSTEPPAAA EWACLLRPLR GREPEGVWNL
360 370 380 390 400
LSIVREMFKR RDSNAAPLLE ILTDQCLTYE QITGWWYSVR TSASHSSASG
410 420 430 440 450
HTGRSNGQSE VAAHACASMC DEMVTLWRLA VLDPALSPQR RRELCAQLRQ
460 470 480 490 500
WQLKVIENVK RGQHKKTLER LFPGFRPAVE ACYFNWEEAY PLPGVTYSGT
510 520 530 540 550
DRKLALCWAR ALPARPGASR SGGLEESRPR PLPTEPAVRP KEPGAKRKGL
560 570 580 590 600
GEGISSQRGP RRLSAEGGDK ALHKMGPSGG KAKVLGGTGS GGKSSAGSGS
610 620 630 640 650
KRRLSSEDSS LEPDLAEMSL DDSSLALGAE ASTFGGFPES PPPCPSSVGS
660 670 680 690 700
RGPSTFLPEP PDTYEEDAGV YFSEGPEPPT ASADHPGLLP GEVCTRDDLP
710 720 730 740 750
STDDSGSGLH KTKEAAPAVG EEDDDYQAYY LNAQDGAGGE EEKAEGGTGE
760 770 780 790 800
EHDLFAGLKP LEQESRMEVL FACAEALHAH GYSNEASRLT VELAQDLLAN
810 820 830 840 850
PPDLKVEPPP AKGKKNKVST SRQTWVATNT LTKAAFLLTV LSERPEHHSL
860 870 880 890 900
AFRVGMFALE LQRPPASTKA LEVKLAYQES EVAALLKKIP RGPSEMSTIR
910 920 930 940 950
CRAEELREGT LCDYRPVLPL MLASFIFDVL CAPVVSLTGS RPPSRNWTNE
960 970 980 990 1000
MPGDEELGFE AAVAALGMKT TVSEAEHPLL CEGTRREKGD LALALMITYK
1010 1020 1030 1040 1050
DDQAKLKKIL DKLLDRESQT HKPQTLSSFY SSSRPATANQ RSPSKHGAPS
1060 1070 1080 1090 1100
APGALQPLTS SSAGPAQPGN VAGAGPGPTE GFTEKNVPES SPHSPCEGLP
1110 1120 1130 1140 1150
PEAALTPRPE GKVPSRLALG SRGGYNGRGW GSPGRPKKKH TGMASIDSSA
1160 1170 1180 1190 1200
PETTSDSSPT LSRRPLRGGW APTSWGRGQD SDSISSSSSD SLGSSSSSGS
1210 1220 1230 1240 1250
RRASASGGAR AKTVDVGRCY KGRRPESHAP HVPNQPSEAA AHFYFELAKT
1260 1270 1280 1290 1300
VLIKAGGNSS TSIFTHPSSS GGHQGPHRNL HLCAFEIGLY ALGLHNFVSP
1310 1320 1330 1340 1350
NWLSRTYSSH VSWITGQAME IGSAALTILV ECWDGHLTPP EVASLADRAS
1360 1370 1380 1390 1400
RARDSNMVRA AAELALSCLP HAHALNPNEI QRALVQCKEQ DNLMLEKACM
1410 1420 1430 1440 1450
AVEEAAKGGG VYPEVLFEVA HQWFWLYEET AGGSSTAREG ATSCSGSGMR
1460 1470 1480 1490 1500
AAGEAGRGLP EGRGAPGTEP VTVAAAAVTA AATVVPVISV GSSLYPGPGL
1510 1520 1530 1540 1550
GHGHSPGLHP YTALQPHLPC SPQYLTHPAH PAHPMPHMPR PAVFPVPSSA
1560 1570 1580 1590 1600
YPQGVHPAFL GAQYPYSVTP PSLAATAVSF PVPSMAPITV HPYHTEPGLP
1610 1620 1630 1640 1650
LPTSVALSSV HPASTFPAIQ GASLPALTTQ PSPLVSGGFP PPEEETHSQP
1660 1670 1680 1690 1700
VNPHSLHHLH AAYRVGMLAL EMLGRRAHND HPNNFSRSPP YTDDVKWLLG
1710 1720 1730 1740 1750
LAAKLGVNYV HQFCVGAAKG VLSPFVLQEI VMETLQRLNP IHAHNHLRAP
1760 1770 1780 1790 1800
AFHQLVQRCQ QAYMQYIHHR LIHLTPADYD DFVNAIRSAR SAFCLTPMGM
1810 1820 1830
MQFNDILQNL KRSKQTKELW QRVSLEITTF SP
Length: 1,832
Mass (Da): 197,061
Last modified: October 11, 2005 - v1
Checksum: 30BD7C1D3157F93B
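
The sequence above is displayed in blocks of ten residues with position rulers between the rows. A minimal sketch of turning that display back into a plain string is shown here, assuming that ruler lines contain only numbers and that residue lines contain only letters and spaces; `parse_sequence_block` is a hypothetical helper, and only the first two rows of the entry are used as input.

```python
def parse_sequence_block(text):
    """Strip ruler lines and spacing from a UniProt-style sequence display."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and ruler lines, which consist solely of numbers.
        if not tokens or all(token.isdigit() for token in tokens):
            continue
        residues.append("".join(tokens))
    return "".join(residues)


# First two rows of the canonical sequence as displayed in this entry.
BLOCK = """\
        10         20         30         40         50
MELMFAEWED GERFSFEDSD RFEEDSLCSF ISEAESLCQN WRGWRKQSAG
60 70 80 90 100
PNSPTGGGGG GGSGGTRTRD GLVIPLVELS AKQVAFHIPF EVVEKVYPPV
"""

sequence = parse_sequence_block(BLOCK)
print(len(sequence), sequence[:10])
```

Applied to the full display, the result should have the stated length of 1,832 residues, which is a quick check that no row was lost in copying.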
Isoform 2 (identifier: Q3UHH1-2)

The sequence of this isoform differs from the canonical sequence as follows:
     669-702: Missing.

Note: No experimental confirmation available.
Length: 1,798
Mass (Da): 193,566
Checksum: BA7CFB3475442057

Isoform 3 (identifier: Q3UHH1-3)

The sequence of this isoform differs from the canonical sequence as follows:
     806-812: Missing.

Note: No experimental confirmation available.
Length: 1,825
Mass (Da): 196,342
Checksum: 05981C64AFA4B499

Isoform 4 (identifier: Q3UHH1-4)

The sequence of this isoform differs from the canonical sequence as follows:
     1-895: Missing.
     1707-1734: VNYVHQFCVGAAKGVLSPFVLQEIVMET → NTSPPQDHCPPVSLPFLSQTSFLALTQS
     1735-1832: Missing.

Note: No experimental confirmation available.
Length: 839
Mass (Da): 88,255
Checksum: 6C3D6B636108071C

Isoform 5 (identifier: Q3UHH1-5)

The sequence of this isoform differs from the canonical sequence as follows:
     1-121: Missing.
     122-126: LYSCL → MKRTF
     770-771: LF → RG
     772-1832: Missing.

Note: No experimental confirmation available.
Length: 650
Mass (Da): 70,155
Checksum: 7EFE88D4F56742D3
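
The isoform lengths follow from the canonical length and the listed differences: each deletion removes end - begin + 1 residues, and each substitution changes the length by the difference between the replacement and the replaced range. The sketch below reproduces that arithmetic and also shows one way to apply such edits to a sequence string. The data structure `ISOFORM_EDITS` and both function names are illustrative assumptions, and `apply_edits` is demonstrated on a toy string rather than the real sequence.

```python
CANONICAL_LENGTH = 1832

# (begin, end, replacement) in canonical 1-based inclusive coordinates.
# An empty replacement means the range is missing in that isoform.
ISOFORM_EDITS = {
    "Q3UHH1-2": [(669, 702, "")],
    "Q3UHH1-3": [(806, 812, "")],
    "Q3UHH1-4": [
        (1, 895, ""),
        (1707, 1734, "NTSPPQDHCPPVSLPFLSQTSFLALTQS"),
        (1735, 1832, ""),
    ],
    "Q3UHH1-5": [
        (1, 121, ""),
        (122, 126, "MKRTF"),
        (770, 771, "RG"),
        (772, 1832, ""),
    ],
}


def isoform_length(canonical_length, edits):
    """Length of an isoform after applying deletions and substitutions."""
    length = canonical_length
    for begin, end, replacement in edits:
        length += len(replacement) - (end - begin + 1)
    return length


def apply_edits(sequence, edits):
    """Apply edits from the C-terminal end so earlier coordinates stay valid."""
    for begin, end, replacement in sorted(edits, reverse=True):
        sequence = sequence[: begin - 1] + replacement + sequence[end:]
    return sequence


lengths = {name: isoform_length(CANONICAL_LENGTH, edits)
           for name, edits in ISOFORM_EDITS.items()}
print(lengths)

# Toy example only: delete positions 1-2 and replace positions 4-5 with "xy".
toy = apply_edits("ABCDEFGHIJ", [(1, 2, ""), (4, 5, "xy")])
print(toy)
```

The computed lengths agree with the values listed for isoforms 2 to 5 (1,798, 1,825, 839, and 650).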

Sequence caution

The sequence BAB26298.1 differs from that shown. Reason: Erroneous initiation. Translation N-terminally extended. (Curated)

Experimental Info

Feature key         Position(s)   Length   Description
Sequence conflict   599 – 599     1        G → E in BAC26361 (PubMed:16141072). Curated
Sequence conflict   1317 – 1318   2        QA → KS in BAC41457 (PubMed:12465718). Curated
Sequence conflict   1664 – 1664   1        R → L in AAH59058 (PubMed:15489334). Curated

Alternative sequence

Feature key            Position(s)   Length   Feature identifier   Description
Alternative sequence   1 – 895       895      VSP_029592           Missing in isoform 4. 1 Publication
Alternative sequence   1 – 121       121      VSP_029593           Missing in isoform 5. 1 Publication
Alternative sequence   122 – 126     5        VSP_029594           LYSCL → MKRTF in isoform 5. 1 Publication
Alternative sequence   669 – 702     34       VSP_029595           Missing in isoform 2. 1 Publication
Alternative sequence   770 – 771     2        VSP_029596           LF → RG in isoform 5. 1 Publication
Alternative sequence   772 – 1832    1061     VSP_029597           Missing in isoform 5. 1 Publication
Alternative sequence   806 – 812     7        VSP_029598           Missing in isoform 3. 1 Publication
Alternative sequence   1707 – 1734   28       VSP_029599           VNYVH…IVMET → NTSPPQDHCPPVSLPFLSQTSFLALTQS in isoform 4. 1 Publication
Alternative sequence   1735 – 1832   98       VSP_029600           Missing in isoform 4. 1 Publication

Sequence databases

EMBL / GenBank / DDBJ:
    AK009454 mRNA. Translation: BAB26298.1. Different initiation.
    AK029263 mRNA. Translation: BAC26361.1.
    AK147398 mRNA. Translation: BAE27886.1.
    BC059058 mRNA. Translation: AAH59058.1.
    BC085161 mRNA. Translation: AAH85161.1.
    BC049362 mRNA. Translation: AAH49362.1.
    BC151046 mRNA. Translation: AAI51047.1.
    BC151056 mRNA. Translation: AAI51057.1.
    BC151173 mRNA. Translation: AAI51174.1.
    AB093273 mRNA. Translation: BAC41457.3.
CCDS: CCDS36820.1. [Q3UHH1-1]
RefSeq: NP_001239010.1. NM_001252081.1. [Q3UHH1-3]
    NP_001239011.1. NM_001252082.1. [Q3UHH1-2]
    NP_082272.1. NM_027996.3. [Q3UHH1-1]
UniGene: Mm.275082.

Genome annotation databases

Ensembl: ENSMUST00000022358; ENSMUSP00000022358; ENSMUSG00000021819. [Q3UHH1-1]
GeneID: 268721.
KEGG: mmu:268721.
UCSC: uc007sko.2. mouse. [Q3UHH1-1]
    uc007skp.2. mouse. [Q3UHH1-2]
    uc007skq.1. mouse. [Q3UHH1-5]
    uc007skr.3. mouse. [Q3UHH1-3]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 23053.
MGI: MGI:1919156. Zswim8.

Miscellaneous databases

NextBio: 392447.
PRO: Q3UHH1.

Publications

  1. "The transcriptional landscape of the mammalian genome."
    Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N., Oyama R., Ravasi T., Lenhard B., Wells C., Kodzius R., Shimokawa K., Bajic V.B., Brenner S.E., Batalov S., Forrest A.R., Zavolan M., Davis M.J., Wilming L.G., Aidinis V., Allen J.E., Ambesi-Impiombato A., Apweiler R., Aturaliya R.N., Bailey T.L., Bansal M., Baxter L., Beisel K.W., Bersano T., Bono H., Chalk A.M., Chiu K.P., Choudhary V., Christoffels A., Clutterbuck D.R., Crowe M.L., Dalla E., Dalrymple B.P., de Bono B., Della Gatta G., di Bernardo D., Down T., Engstrom P., Fagiolini M., Faulkner G., Fletcher C.F., Fukushima T., Furuno M., Futaki S., Gariboldi M., Georgii-Hemming P., Gingeras T.R., Gojobori T., Green R.E., Gustincich S., Harbers M., Hayashi Y., Hensch T.K., Hirokawa N., Hill D., Huminiecki L., Iacono M., Ikeo K., Iwama A., Ishikawa T., Jakt M., Kanapin A., Katoh M., Kawasawa Y., Kelso J., Kitamura H., Kitano H., Kollias G., Krishnan S.P., Kruger A., Kummerfeld S.K., Kurochkin I.V., Lareau L.F., Lazarevic D., Lipovich L., Liu J., Liuni S., McWilliam S., Madan Babu M., Madera M., Marchionni L., Matsuda H., Matsuzawa S., Miki H., Mignone F., Miyake S., Morris K., Mottagui-Tabar S., Mulder N., Nakano N., Nakauchi H., Ng P., Nilsson R., Nishiguchi S., Nishikawa S., Nori F., Ohara O., Okazaki Y., Orlando V., Pang K.C., Pavan W.J., Pavesi G., Pesole G., Petrovsky N., Piazza S., Reed J., Reid J.F., Ring B.Z., Ringwald M., Rost B., Ruan Y., Salzberg S.L., Sandelin A., Schneider C., Schoenbach C., Sekiguchi K., Semple C.A., Seno S., Sessa L., Sheng Y., Shibata Y., Shimada H., Shimada K., Silva D., Sinclair B., Sperling S., Stupka E., Sugiura K., Sultana R., Takenaka Y., Taki K., Tammoja K., Tan S.L., Tang S., Taylor M.S., Tegner J., Teichmann S.A., Ueda H.R., van Nimwegen E., Verardo R., Wei C.L., Yagi K., Yamanishi H., Zabarovsky E., Zhu S., Zimmer A., Hide W., Bult C., Grimmond S.M., Teasdale R.D., Liu E.T., Brusic V., Quackenbush J., Wahlestedt C., Mattick J.S., Hume D.A., Kai C., Sasaki D., Tomaru Y., Fukuda S., Kanamori-Katayama M., Suzuki M., Aoki J., Arakawa T., Iida J., Imamura K., Itoh M., Kato T., Kawaji H., Kawagashira N., Kawashima T., Kojima M., Kondo S., Konno H., Nakano K., Ninomiya N., Nishio T., Okada M., Plessy C., Shibata K., Shiraki T., Suzuki S., Tagami M., Waki K., Watahiki A., Okamura-Oho Y., Suzuki H., Kawai J., Hayashizaki Y.
    Science 309:1559-1563(2005)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 5).
    Strain: C57BL/6J.
    Tissue: Head and Tongue.
  2. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 4), NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 64-1832 (ISOFORM 2), NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 622-1832 (ISOFORM 3).
    Strain: C57BL/6J.
    Tissue: Brain and Eye.
  3. "Prediction of the coding sequences of mouse homologues of KIAA gene: I. The complete nucleotide sequences of 100 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries."
    Okazaki N., Kikuno R., Ohara R., Inamoto S., Hara Y., Nagase T., Ohara O., Koga H.
    DNA Res. 9:179-188(2002)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 769-1559 (ISOFORMS 1/2).
    Tissue: Brain.
  4. Okazaki N., Kikuno R., Nagase T., Ohara O., Koga H.
    Submitted (JUN-2003) to the EMBL/GenBank/DDBJ databases
    Cited for: SEQUENCE REVISION.
  5. Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Liver.
  6. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-48; SER-53; SER-1155 AND SER-1158, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Brain, Kidney, Liver, Pancreas, Spleen and Testis.

Entry information

Entry name: ZSWM8_MOUSE
Accession: Primary (citable) accession number: Q3UHH1
Secondary accession number(s): B2RX90, Q5U4B9, Q6PCY6, Q80Y41, Q8CE12, Q8CHC3, Q9D789
Entry history
Integrated into UniProtKB/Swiss-Prot: December 4, 2007
Last sequence update: October 11, 2005
Last modified: April 13, 2016
This is version 78 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. MGD cross-references
    Mouse Genome Database (MGD) cross-references in UniProtKB/Swiss-Prot
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
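
The UniRef90 and UniRef50 criteria described above can be read as a pair of thresholds applied to a comparison against the cluster seed. The sketch below encodes only that reading. It does not implement the alignment or clustering itself, `SeedComparison` and `meets_threshold` are hypothetical names, and how identity and overlap are actually measured by UniRef is not stated in this entry.

```python
from dataclasses import dataclass

# (minimum identity, minimum overlap with the seed), as fractions.
# Values are taken from the UniRef90 and UniRef50 descriptions in this entry.
THRESHOLDS = {
    "UniRef90": (0.90, 0.80),
    "UniRef50": (0.50, 0.80),
}


@dataclass(frozen=True)
class SeedComparison:
    """Hypothetical summary of one sequence compared against a cluster seed."""
    identity: float  # fraction of identical residues, 0.0 to 1.0
    overlap: float   # fraction of the seed sequence covered, 0.0 to 1.0


def meets_threshold(comparison, level):
    """Return True if the comparison satisfies both thresholds for a level."""
    min_identity, min_overlap = THRESHOLDS[level]
    return comparison.identity >= min_identity and comparison.overlap >= min_overlap


candidate = SeedComparison(identity=0.93, overlap=0.85)
print(meets_threshold(candidate, "UniRef90"))
```

A sequence that is 60% identical to a seed with 80% overlap would pass the UniRef50 check but not the UniRef90 one under this reading.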