Protein

Ubiquitin carboxyl-terminal hydrolase 38

Gene

USP38

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 4 out of 5. Experimental evidence at protein level.

Function

Deubiquitinating enzyme with a preference for 'Lys-63'-linked ubiquitin chains. [1 publication]

Catalytic activity

Thiol-dependent hydrolysis of ester, thioester, amide, peptide and isopeptide bonds formed by the C-terminal Gly of ubiquitin (a 76-residue protein attached to proteins as an intracellular targeting signal). [1 publication]

Sites

Feature key  Position(s)  Length  Description
Active site  454          1       Nucleophile (PROSITE-ProRule annotation)
Active site  857          1       Proton acceptor (PROSITE-ProRule annotation)

GO - Molecular function

  • cysteine-type endopeptidase activity (source: GO_Central)
  • ubiquitin-specific protease activity (source: FlyBase)

Keywords - Molecular function

Hydrolase, Protease, Thiol protease

Keywords - Biological process

Ubl conjugation pathway

Protein family/group databases

MEROPS: C19.056

Names & Taxonomy

Protein names
Recommended name:
Ubiquitin carboxyl-terminal hydrolase 38 (EC:3.4.19.12)
Alternative name(s):
Deubiquitinating enzyme 38
HP43.8KD
Ubiquitin thioesterase 38
Ubiquitin-specific-processing protease 38
Gene names
Name: USP38
Synonyms: KIAA1891
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Proteomes: UP000005640 (component: Chromosome 4)

Organism-specific databases

HGNC: HGNC:20067; USP38

Pathology & Biotech

Organism-specific databases

PharmGKB: PA134880611

Polymorphism and mutation databases

BioMuta: USP38

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Feature identifier  Description
Chain        1–1042       1042    PRO_0000080668      Ubiquitin carboxyl-terminal hydrolase 38

Proteomic databases

MaxQB: Q8NB14
PaxDb: Q8NB14
PRIDE: Q8NB14

PTM databases

PhosphoSite: Q8NB14

Expression

Tissue specificity

Highly expressed in skeletal muscle. Expressed in adrenal gland. [2 publications]

Gene expression databases

Bgee: Q8NB14
CleanEx: HS_USP38
ExpressionAtlas: Q8NB14 (baseline and differential)
Genevisible: Q8NB14 (HS)

Organism-specific databases

HPA: HPA047948

Interaction

Protein-protein interaction databases

BioGrid: 124165 (12 interactions)
IntAct: Q8NB14 (7 interactions)
STRING: 9606.ENSP00000303434

Structure

Secondary structure

Feature key  Position(s)  Length  Description
Helix        2–9          8       Combined sources
Helix        15–28        14      Combined sources
Helix        35–51        17      Combined sources
Helix        55–71        17      Combined sources
Helix        73–79        7       Combined sources
Helix        82–91        10      Combined sources
Helix        101–114      14      Combined sources
Helix        120–136      17      Combined sources
Helix        141–153      13      Combined sources
Helix        155–157      3       Combined sources
Helix        162–175      14      Combined sources
Helix        185–208      24      Combined sources
Helix        210–212      3       Combined sources
Helix        213–224      12      Combined sources
Helix        235–240      6       Combined sources
Helix        246–258      13      Combined sources
Helix        264–276      13      Combined sources
Helix        277–279      3       Combined sources
Helix        286–299      14      Combined sources
Helix        303–319      17      Combined sources
Helix        320–322      3       Combined sources
Turn         324–326      3       Combined sources
Helix        327–340      14      Combined sources
Helix        346–351      6       Combined sources
Helix        352–354      3       Combined sources
Helix        355–362      8       Combined sources
Helix        368–387      20      Combined sources
Helix        392–402      11      Combined sources
Helix        410–416      7       Combined sources
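Positions in these feature tables are 1-based and inclusive, so the Length column equals end - start + 1 (for example, 2–9 gives 8). As a small sketch, the helix rows above can be transcribed into Python and checked against that rule; the names HELICES, feature_length and helix_residue_count are illustrative only and are not part of any UniProt tool.

```python
# Helix rows transcribed from the secondary-structure table:
# (start, end, reported_length). Positions are 1-based and inclusive.
HELICES = [
    (2, 9, 8), (15, 28, 14), (35, 51, 17), (55, 71, 17), (73, 79, 7),
    (82, 91, 10), (101, 114, 14), (120, 136, 17), (141, 153, 13),
    (155, 157, 3), (162, 175, 14), (185, 208, 24), (210, 212, 3),
    (213, 224, 12), (235, 240, 6), (246, 258, 13), (264, 276, 13),
    (277, 279, 3), (286, 299, 14), (303, 319, 17), (320, 322, 3),
    (327, 340, 14), (346, 351, 6), (352, 354, 3), (355, 362, 8),
    (368, 387, 20), (392, 402, 11), (410, 416, 7),
]


def feature_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based range, as in the Length column."""
    return end - start + 1


def helix_residue_count(helices) -> int:
    """Check each row against the inclusive-range rule, then sum the lengths."""
    total = 0
    for start, end, reported in helices:
        if feature_length(start, end) != reported:
            raise ValueError(f"row {start}-{end}: length {reported} does not match range")
        total += reported
    return total


if __name__ == "__main__":
    total = helix_residue_count(HELICES)
    print(f"{len(HELICES)} helices, {total} helical residues")
```

Run as a script, it prints `28 helices, 311 helical residues`; a row whose reported length disagreed with its range would raise ValueError instead.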

3D structure databases

Entry  Method  Resolution (Å)  Chain  Positions
4RXX   X-ray   2.06            A      1–424
ProteinModelPortal: Q8NB14
SMR: Q8NB14 (positions 1-425)

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       445–949      505     USP

Sequence similarities

Belongs to the peptidase C19 family. [Curated]
Contains 1 USP domain. [Curated]

Phylogenomic databases

eggNOG: COG5560
GeneTree: ENSGT00650000093027
HOGENOM: HOG000007432
HOVERGEN: HBG060424
InParanoid: Q8NB14
KO: K11854
OMA: HYYSYAR
OrthoDB: EOG7327ND
PhylomeDB: Q8NB14
TreeFam: TF324529

Family and domain databases

InterPro: IPR001394 (Peptidase_C19_UCH)
          IPR018200 (USP_CS)
          IPR028889 (USP_dom)
Pfam: PF00443 (UCH), 1 hit
PROSITE: PS00972 (USP_1), 1 hit
         PS00973 (USP_2), 1 hit
         PS50235 (USP_3), 1 hit

Sequences (2)

Sequence status: Complete.

This entry describes 2 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q8NB14-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MDKILEGLVS SSHPLPLKRV IVRKVVESAE HWLDEAQCEA MFDLTTRLIL
        60         70         80         90        100
EGQDPFQRQV GHQVLEAYAR YHRPEFESFF NKTFVLGLLH QGYHSLDRKD
       110        120        130        140        150
VAILDYIHNG LKLIMSCPSV LDLFSLLQVE VLRMVCERPE PQLCARLSDL
       160        170        180        190        200
LTDFVQCIPK GKLSITFCQQ LVRTIGHFQC VSTQERELRE YVSQVTKVSN
       210        220        230        240        250
LLQNIWKAEP ATLLPSLQEV FASISSTDAS FEPSVALASL VQHIPLQMIT
       260        270        280        290        300
VLIRSLTTDP NVKDASMTQA LCRMIDWLSW PLAQHVDTWV IALLKGLAAV
       310        320        330        340        350
QKFTILIDVT LLKIELVFNR LWFPLVRPGA LAVLSHMLLS FQHSPEAFHL
       360        370        380        390        400
IVPHVVNLVH SFKNDGLPSS TAFLVQLTEL IHCMMYHYSG FPDLYEPILE
       410        420        430        440        450
AIKDFPKPSE EKIKLILNQS AWTSQSNSLA SCLSRLSGKS ETGKTGLINL
       460        470        480        490        500
GNTCYMNSVI QALFMATDFR RQVLSLNLNG CNSLMKKLQH LFAFLAHTQR
       510        520        530        540        550
EAYAPRIFFE ASRPPWFTPR SQQDCSEYLR FLLDRLHEEE KILKVQASHK
       560        570        580        590        600
PSEILECSET SLQEVASKAA VLTETPRTSD GEKTLIEKMF GGKLRTHIRC
       610        620        630        640        650
LNCRSTSQKV EAFTDLSLAF CPSSSLENMS VQDPASSPSI QDGGLMQASV
       660        670        680        690        700
PGPSEEPVVY NPTTAAFICD SLVNEKTIGS PPNEFYCSEN TSVPNESNKI
       710        720        730        740        750
LVNKDVPQKP GGETTPSVTD LLNYFLAPEI LTGDNQYYCE NCASLQNAEK
       760        770        780        790        800
TMQITEEPEY LILTLLRFSY DQKYHVRRKI LDNVSLPLVL ELPVKRITSF
       810        820        830        840        850
SSLSESWSVD VDFTDLSENL AKKLKPSGTD EASCTKLVPY LLSSVVVHSG
       860        870        880        890        900
ISSESGHYYS YARNITSTDS SYQMYHQSEA LALASSQSHL LGRDSPSAVF
       910        920        930        940        950
EQDLENKEMS KEWFLFNDSR VTFTSFQSVQ KITSRFPKDT AYVLLYKKQH
       960        970        980        990       1000
STNGLSGNNP TSGLWINGDP PLQKELMDAI TKDNKLYLQE QELNARARAL
      1010       1020       1030       1040
QAASASCSFR PNGFDDNDPP GSCGPTGGGG GGGFNTVGRL VF
Length: 1,042
Mass (Da): 116,546
Last modified: October 10, 2003 - v2
Checksum: D888D4F44C6E7251
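Positions quoted elsewhere in this entry (the two active sites, the two sequence conflicts and the Tyr-840 to Ser-871 segment missing from BAC03730) can be checked against the canonical sequence above. The Python sketch below copies that sequence and uses 1-based indexing, as UniProt feature positions do; residue_at and ANNOTATED are illustrative names. The entry annotates positions 454 and 857 only as nucleophile and proton acceptor; the expected Cys and His are read off the sequence itself, consistent with the thiol-dependent activity described above.

```python
# Canonical sequence of isoform Q8NB14-1, copied from the block above.
# Whitespace is removed, so Python index 0 holds residue 1.
CANONICAL = "".join("""
MDKILEGLVS SSHPLPLKRV IVRKVVESAE HWLDEAQCEA MFDLTTRLIL
EGQDPFQRQV GHQVLEAYAR YHRPEFESFF NKTFVLGLLH QGYHSLDRKD
VAILDYIHNG LKLIMSCPSV LDLFSLLQVE VLRMVCERPE PQLCARLSDL
LTDFVQCIPK GKLSITFCQQ LVRTIGHFQC VSTQERELRE YVSQVTKVSN
LLQNIWKAEP ATLLPSLQEV FASISSTDAS FEPSVALASL VQHIPLQMIT
VLIRSLTTDP NVKDASMTQA LCRMIDWLSW PLAQHVDTWV IALLKGLAAV
QKFTILIDVT LLKIELVFNR LWFPLVRPGA LAVLSHMLLS FQHSPEAFHL
IVPHVVNLVH SFKNDGLPSS TAFLVQLTEL IHCMMYHYSG FPDLYEPILE
AIKDFPKPSE EKIKLILNQS AWTSQSNSLA SCLSRLSGKS ETGKTGLINL
GNTCYMNSVI QALFMATDFR RQVLSLNLNG CNSLMKKLQH LFAFLAHTQR
EAYAPRIFFE ASRPPWFTPR SQQDCSEYLR FLLDRLHEEE KILKVQASHK
PSEILECSET SLQEVASKAA VLTETPRTSD GEKTLIEKMF GGKLRTHIRC
LNCRSTSQKV EAFTDLSLAF CPSSSLENMS VQDPASSPSI QDGGLMQASV
PGPSEEPVVY NPTTAAFICD SLVNEKTIGS PPNEFYCSEN TSVPNESNKI
LVNKDVPQKP GGETTPSVTD LLNYFLAPEI LTGDNQYYCE NCASLQNAEK
TMQITEEPEY LILTLLRFSY DQKYHVRRKI LDNVSLPLVL ELPVKRITSF
SSLSESWSVD VDFTDLSENL AKKLKPSGTD EASCTKLVPY LLSSVVVHSG
ISSESGHYYS YARNITSTDS SYQMYHQSEA LALASSQSHL LGRDSPSAVF
EQDLENKEMS KEWFLFNDSR VTFTSFQSVQ KITSRFPKDT AYVLLYKKQH
STNGLSGNNP TSGLWINGDP PLQKELMDAI TKDNKLYLQE QELNARARAL
QAASASCSFR PNGFDDNDPP GSCGPTGGGG GGGFNTVGRL VF
""".split())


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise IndexError(f"position {position} outside 1..{len(seq)}")
    return seq[position - 1]


# Positions cited in this entry and the residue expected there.
ANNOTATED = {
    454: "C",  # active site, nucleophile
    857: "H",  # active site, proton acceptor
    646: "M",  # sequence conflict M -> V in BAC03730
    919: "S",  # sequence conflict S -> G in BAB71627
    840: "Y",  # Tyr-840, first residue absent from BAC03730
    871: "S",  # Ser-871, last residue absent from BAC03730
}

if __name__ == "__main__":
    print(len(CANONICAL))
    for pos, expected in sorted(ANNOTATED.items()):
        found = residue_at(CANONICAL, pos)
        print(pos, found, "ok" if found == expected else f"expected {expected}")
```

Run as a script, it prints the length 1042 and then one line per annotated position, each ending in `ok`.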

Isoform 2 (identifier: Q8NB14-2)

The sequence of this isoform differs from the canonical sequence as follows:
     990-1042: EQELNARARALQAASASCSFRPNGFDDNDPPGSCGPTGGGGGGGFNTVGRLVF → VSWKYKLYLLKILNN

Note: No experimental confirmation available.
Length: 1,004
Mass (Da): 113,157
Checksum: 89960B24E70287A4
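Isoform 2 follows from the canonical sequence by applying its single alternative-sequence feature (VSP_054486): residues 990-1042 (53 residues) are replaced by a 15-residue segment, so its length is 1042 - 53 + 15 = 1004. A minimal Python sketch of that substitution follows; to stay short it copies only the canonical C-terminal tail (residues 981-1042), and apply_substitution and isoform_length are hypothetical helper names.

```python
# Canonical residues 981-1042, copied from the isoform 1 sequence block.
TAIL_START = 981
TAIL = "".join("TKDNKLYLQE QELNARARAL QAASASCSFR PNGFDDNDPP GSCGPTGGGG GGGFNTVGRL VF".split())
CANONICAL_LENGTH = 1042

ORIGINAL = "EQELNARARALQAASASCSFRPNGFDDNDPPGSCGPTGGGGGGGFNTVGRLVF"  # 990-1042
REPLACEMENT = "VSWKYKLYLLKILNN"


def apply_substitution(segment: str, segment_start: int, start: int, end: int,
                       expected: str, replacement: str) -> str:
    """Replace the inclusive 1-based range start..end inside `segment`.

    `segment_start` is the 1-based position of segment[0] in the full protein.
    The residues being replaced are checked against `expected` first.
    """
    lo = start - segment_start        # 0-based index of `start`
    hi = end - segment_start + 1      # 0-based index just past `end`
    if segment[lo:hi] != expected:
        raise ValueError(f"residues {start}-{end} do not match the annotation")
    return segment[:lo] + replacement + segment[hi:]


def isoform_length(canonical_length: int, start: int, end: int, replacement: str) -> int:
    """Length after replacing an inclusive range with `replacement`."""
    return canonical_length - (end - start + 1) + len(replacement)


if __name__ == "__main__":
    print(apply_substitution(TAIL, TAIL_START, 990, 1042, ORIGINAL, REPLACEMENT))
    print(isoform_length(CANONICAL_LENGTH, 990, 1042, REPLACEMENT))
```

Run as a script, it prints the isoform 2 C-terminus `TKDNKLYLQVSWKYKLYLLKILNN` followed by `1004`, matching the length reported for Q8NB14-2.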

Sequence caution

The sequence AAK26248.1 differs from that shown. Reason: Erroneous initiation. [Curated]
The sequence BAB71627.1 differs from that shown. Reason: Erroneous initiation. [Curated]
The sequence BAC03730.1 differs from that shown. The absence of residues Tyr-840 to Ser-871 is not the result of alternative splicing. [Curated]

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  646          1       M → V in BAC03730 (PubMed:14702039). [Curated]
Sequence conflict  919          1       S → G in BAB71627 (PubMed:14702039). [Curated]

Alternative sequence

Feature key           Position(s)  Length  Feature identifier  Description
Alternative sequence  990–1042     53      VSP_054486          EQELN…GRLVF → VSWKYKLYLLKILNN in isoform 2. [1 publication]

Sequence databases

EMBL / GenBank / DDBJ:
AK057992 mRNA. Translation: BAB71627.1. Different initiation.
AK091712 mRNA. Translation: BAC03730.1. Sequence problems.
AK126943 mRNA. Translation: BAG54405.1.
AL833976 mRNA. Translation: CAD38820.1.
AC099549 Genomic DNA. No translation available.
AC116175 Genomic DNA. No translation available.
CH471056 Genomic DNA. Translation: EAX05075.1.
CH471056 Genomic DNA. Translation: EAX05076.1.
BC039115 mRNA. Translation: AAH39115.1.
BC068975 mRNA. Translation: AAH68975.1.
AB067478 mRNA. Translation: BAB67784.1.
AF211481 mRNA. Translation: AAK26248.1. Different initiation.
CCDS: CCDS3758.1 [Q8NB14-1]
RefSeq: NP_001277254.1; NM_001290325.1 [Q8NB14-2]
        NP_001277255.1; NM_001290326.1
        NP_115946.2; NM_032557.6 [Q8NB14-1]
UniGene: Hs.480848

Genome annotation databases

Ensembl: ENST00000307017; ENSP00000303434; ENSG00000170185 [Q8NB14-1]
         ENST00000510377; ENSP00000427647; ENSG00000170185 [Q8NB14-2]
GeneID: 84640
KEGG: hsa:84640
UCSC: uc003ija.4 (human)
      uc003ijb.3 (human) [Q8NB14-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Protocols and materials databases

DNASU: 84640

Organism-specific databases

CTD: 84640
GeneCards: GC04P144106
HGNC: HGNC:20067; USP38
HPA: HPA047948
neXtProt: NX_Q8NB14
PharmGKB: PA134880611

Miscellaneous databases

ChiTaRS: USP38 (human)
GenomeRNAi: 84640
NextBio: 74575
PRO: Q8NB14

Publications

  1. "Complete sequencing and characterization of 21,243 full-length human cDNAs."
    Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R., Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H., Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S., Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S., Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S., Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O., Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H., Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B., Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y., Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T., Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y., Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S., Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T., Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M., Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T., Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K., Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R., Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.
    Nat. Genet. 36:40-45(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Brain, Chondrocyte and Gastric mucosa.
  2. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Tissue: Testis.
  3. "Generation and annotation of the DNA sequences of human chromosomes 2 and 4."
    Hillier L.W., Graves T.A., Fulton R.S., Fulton L.A., Pepin K.H., Minx P., Wagner-McPherson C., Layman D., Wylie K., Sekhon M., Becker M.C., Fewell G.A., Delehaunty K.D., Miner T.L., Nash W.E., Kremitzki C., Oddy L., Du H., Sun H., Bradshaw-Cordum H., Ali J., Carter J., Cordes M., Harris A., Isak A., van Brunt A., Nguyen C., Du F., Courtney L., Kalicki J., Ozersky P., Abbott S., Armstrong J., Belter E.A., Caruso L., Cedroni M., Cotton M., Davidson T., Desai A., Elliott G., Erb T., Fronick C., Gaige T., Haakenson W., Haglund K., Holmes A., Harkins R., Kim K., Kruchowski S.S., Strong C.M., Grewal N., Goyea E., Hou S., Levy A., Martinka S., Mead K., McLellan M.D., Meyer R., Randall-Maher J., Tomlinson C., Dauphin-Kohlberg S., Kozlowicz-Reilly A., Shah N., Swearengen-Shahid S., Snider J., Strong J.T., Thompson J., Yoakum M., Leonard S., Pearman C., Trani L., Radionenko M., Waligorski J.E., Wang C., Rock S.M., Tin-Wollam A.-M., Maupin R., Latreille P., Wendl M.C., Yang S.-P., Pohl C., Wallis J.W., Spieth J., Bieri T.A., Berkowicz N., Nelson J.O., Osborne J., Ding L., Meyer R., Sabo A., Shotland Y., Sinha P., Wohldmann P.E., Cook L.L., Hickenbotham M.T., Eldred J., Williams D., Jones T.A., She X., Ciccarelli F.D., Izaurralde E., Taylor J., Schmutz J., Myers R.M., Cox D.R., Huang X., McPherson J.D., Mardis E.R., Clifton S.W., Warren W.C., Chinwalla A.T., Eddy S.R., Marra M.A., Ovcharenko I., Furey T.S., Miller W., Eichler E.E., Bork P., Suyama M., Torrents D., Waterston R.H., Wilson R.K.
    Nature 434:724-731(2005)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  5. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 2).
    Tissue: Brain and Testis.
  6. "Prediction of the coding sequences of unidentified human genes. XXI. The complete sequences of 60 new cDNA clones from brain which code for large proteins."
    Nagase T., Kikuno R., Ohara O.
    DNA Res. 8:179-187(2001)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 263-1042 (ISOFORM 1), TISSUE SPECIFICITY.
    Tissue: Brain.
  7. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 633-1042 (ISOFORM 1).
    Tissue: Adrenal gland.
  8. "Cloning and enzymatic analysis of 22 novel human ubiquitin-specific proteases."
    Quesada V., Diaz-Perales A., Gutierrez-Fernandez A., Garabaya C., Cal S., Lopez-Otin C.
    Biochem. Biophys. Res. Commun. 314:54-62(2004)
    Cited for: TISSUE SPECIFICITY, ENZYME ACTIVITY.
  9. "Profiling ubiquitin linkage specificities of deubiquitinating enzymes with branched ubiquitin isopeptide probes."
    Iphofer A., Kummer A., Nimtz M., Ritter A., Arnold T., Frank R., van den Heuvel J., Kessler B.M., Jansch L., Franke R.
    ChemBioChem 13:1416-1420(2012)
    Cited for: FUNCTION, LINKAGE SPECIFICITY.

Entry information

Entry name: UBP38_HUMAN
Primary (citable) accession number: Q8NB14
Secondary accession number(s): B3KX93, Q3ZCV1, Q8NDF5, Q96DK6, Q96PZ6, Q9BY55
Entry history
Integrated into UniProtKB/Swiss-Prot: October 10, 2003
Last sequence update: October 10, 2003
Last modified: June 24, 2015
This is version 108 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Reference proteome

Documents

  1. Human chromosome 4
    Human chromosome 4: entries, gene names and cross-references to MIM
  2. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  3. Peptidase families
    Classification of peptidase families and list of entries
  4. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments of 11 or more residues, from any organism, into a single UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster (a.k.a. the seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
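The UniRef90 and UniRef50 rules above combine two thresholds: sequence identity to the seed and overlap with it. The Python sketch below is a hypothetical membership test for a single pairwise alignment, not UniProt's clustering pipeline. The Alignment record and the definitions used (identity = identical positions / aligned positions, overlap = aligned positions / seed length) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of a pairwise alignment against a cluster seed."""
    identical: int    # aligned positions carrying identical residues
    aligned: int      # number of aligned positions
    seed_length: int  # length of the seed (longest) sequence


UNIREF90 = 0.90
UNIREF50 = 0.50


def meets_uniref_threshold(aln: Alignment, min_identity: float,
                           min_overlap: float = 0.80) -> bool:
    """Apply the identity and overlap thresholds described for UniRef90/50.

    Assumed definitions (illustrative only):
      identity = identical / aligned
      overlap  = aligned / seed_length
    """
    if aln.aligned <= 0 or aln.seed_length <= 0:
        return False
    identity = aln.identical / aln.aligned
    overlap = aln.aligned / aln.seed_length
    return identity >= min_identity and overlap >= min_overlap


if __name__ == "__main__":
    close = Alignment(identical=950, aligned=1000, seed_length=1042)
    distant = Alignment(identical=600, aligned=1000, seed_length=1042)
    fragment = Alignment(identical=500, aligned=500, seed_length=1042)
    print(meets_uniref_threshold(close, UNIREF90))     # identity 0.95, overlap ~0.96
    print(meets_uniref_threshold(distant, UNIREF90))   # identity 0.60 is below 0.90
    print(meets_uniref_threshold(distant, UNIREF50))   # passes the 50% threshold
    print(meets_uniref_threshold(fragment, UNIREF50))  # overlap ~0.48 is below 0.80
```

With these made-up alignments the script prints True, False, True, False: the overlap requirement is what keeps a short, perfectly identical fragment out of a cluster seeded by a longer sequence.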