Protein

Protein HIRA homolog

Gene

Hira

Organism
Drosophila melanogaster (Fruit fly)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Required for the periodic repression of histone gene transcription during the cell cycle (By similarity). Required for replication-independent chromatin assembly. Promotes remodeling of sperm chromatin following fertilization via the incorporation of histone H3.3 and histone H4. Evidence: By similarity, 4 Publications

GO - Molecular function

  • chromatin binding Source: UniProtKB

GO - Biological process

  • chromatin remodeling Source: FlyBase
  • DNA replication-independent nucleosome assembly Source: FlyBase
  • fertilization, exchange of chromosomal proteins Source: FlyBase
  • regulation of transcription, DNA-templated Source: UniProtKB-KW
  • sperm chromatin decondensation Source: FlyBase
  • transcription, DNA-templated Source: UniProtKB-KW

Keywords - Molecular function

Chromatin regulator, Repressor

Keywords - Biological process

Transcription, Transcription regulation

Enzyme and pathway databases

SignaLink: O17468.

Names & Taxonomy

Protein names
Recommended name: Protein HIRA homolog
Alternative name(s): Protein sesame, dHIRA
Gene names
Name: Hira
Synonyms: Dhh, ssm
ORF Names: CG12153
Organism: Drosophila melanogaster (Fruit fly)
Taxonomic identifier: 7227 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Ecdysozoa > Arthropoda > Hexapoda > Insecta > Pterygota > Neoptera > Endopterygota > Diptera > Brachycera > Muscomorpha > Ephydroidea > Drosophilidae > Drosophila > Sophophora
Proteomes
  • UP000000803 Component: Chromosome X

Organism-specific databases

FlyBase: FBgn0022786. Hira.

Subcellular location

  • Nucleus (1 Publication)

  • Note: Maternally contributed protein localizes specifically to the male nucleus in fertilized eggs. This localization persists from the initiation of sperm nucleus decondensation to the end of pronucleus formation.

GO - Cellular component

  • germinal vesicle Source: FlyBase
  • male germ cell nucleus Source: UniProtKB
  • nucleus Source: FlyBase

Keywords - Cellular component

Nucleus

Pathology & Biotech

Mutagenesis

Feature key  Position(s)  Length  Description
Mutagenesis  225          1       R → K in allele ssm; maternal effect embryonic lethal mutation which impairs maternal histone deposition in the male pronucleus. 1 Publication

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description           Feature identifier
Chain        1 – 1047     1047    Protein HIRA homolog  PRO_0000051022

Amino acid modifications

Feature key       Position(s)  Length  Description
Modified residue  519          1       Phosphoserine. 1 Publication

Keywords - PTM

Phosphoprotein

Proteomic databases

PaxDb: O17468.

PTM databases

iPTMnet: O17468.

Expression

Developmental stage

Expressed maternally and zygotically throughout development to adults (male and female). 2 Publications

Gene expression databases

Bgee: O17468.
Genevisible: O17468. DM.

Interaction

Protein-protein interaction databases

BioGrid: 58155. 20 interactions.
IntAct: O17468. 2 interactions.
STRING: 7227.FBpp0071028.

Structure

3D structure databases

ProteinModelPortal: O17468.
SMR: O17468. Positions 12-352, 610-669.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Repeat       11 – 53      43      WD 1
Repeat       68 – 107     40      WD 2
Repeat       127 – 166    40      WD 3
Repeat       170 – 209    40      WD 4
Repeat       218 – 263    46      WD 5
Repeat       264 – 319    56      WD 6
Repeat       323 – 364    42      WD 7
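Feature positions in this entry are 1-based and inclusive, so each length equals end minus start plus one. The sketch below recomputes the WD repeat lengths from the positions listed above; the dictionary layout and the helper name `span_length` are illustrative choices, not part of the entry.

```python
# WD repeat coordinates copied from the feature table (1-based, inclusive).
WD_REPEATS = {
    "WD 1": (11, 53),
    "WD 2": (68, 107),
    "WD 3": (127, 166),
    "WD 4": (170, 209),
    "WD 5": (218, 263),
    "WD 6": (264, 319),
    "WD 7": (323, 364),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive residue range."""
    return end - start + 1


lengths = {name: span_length(start, end) for name, (start, end) in WD_REPEATS.items()}

for name, length in lengths.items():
    print(f"{name}: {length} residues")
```

The same helper applies to any other ranged feature in the entry, such as the Poly-Ser compositional bias or the sequence conflicts.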

Compositional bias

Feature key         Position(s)  Length  Description
Compositional bias  944 – 972    29      Poly-Ser

Sequence similarities

Belongs to the WD repeat HIR1 family. Curated
Contains 7 WD repeats. PROSITE-ProRule annotation

Keywords - Domain

Repeat, WD repeat

Phylogenomic databases

eggNOG: KOG0973. Eukaryota.
  ENOG410XP1H. LUCA.
GeneTree: ENSGT00550000074919.
InParanoid: O17468.
KO: K11293.
OMA: ILTYSHT.
OrthoDB: EOG74J975.
PhylomeDB: O17468.

Family and domain databases

Gene3D: 2.130.10.10. 2 hits.
InterPro: IPR020472. G-protein_beta_WD-40_rep.
  IPR031120. HIR1.
  IPR011494. Hira.
  IPR019015. HIRA_B_motif.
  IPR015943. WD40/YVTN_repeat-like_dom.
  IPR001680. WD40_repeat.
  IPR019775. WD40_repeat_CS.
  IPR017986. WD40_repeat_dom.
PANTHER: PTHR13831. PTHR13831. 1 hit.
Pfam: PF07569. Hira. 1 hit.
  PF09453. HIRA_B. 1 hit.
  PF00400. WD40. 4 hits.
PRINTS: PR00320. GPROTEINBRPT.
SMART: SM00320. WD40. 8 hits.
SUPFAM: SSF50978. SSF50978. 2 hits.
PROSITE: PS00678. WD_REPEATS_1. 1 hit.
  PS50082. WD_REPEATS_2. 3 hits.
  PS50294. WD_REPEATS_REGION. 1 hit.

Sequences (3)

Sequence status: Complete.

This entry describes 3 isoforms produced by alternative splicing.

Isoform 1 (identifier: O17468-1)

Also known as: Long

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MRLLKPAWVH HDDKQIFSVD IHKDCTKFAT GGQGSDCGRV VIWNLLPVLS
60 70 80 90 100
DKAEFDADVP KMLCQMDQHL ACVNCVRWSQ NGQNLASGSD DKLIMIWRKS
110 120 130 140 150
AGSSGVFGTG GMQKNHESWK CFYTLRGHDG DVLDLAWSPN DVYLASCSID
160 170 180 190 200
NTVIIWDAQA FPHSVATLKG HTGLVKGVSW DPLGRFLASQ SDDRSIKIWN
210 220 230 240 250
TMNWSLSHTI TEPFEECGGT THILRLSWSP DGQYLVSAHA MNGGGPTAQI
260 270 280 290 300
IEREGWKCDK DFVGHRKAVT CVRFHNSILS RQENDGSPSK PLQYCCLAVG
310 320 330 340 350
SRDRSLSVWM TALQRPMVVI HELFNASILD LTWGPQECLL MACSVDGSIA
360 370 380 390 400
CLKFTEEELG KAISEEEQNA IIRKMYGKNY VNGLGKSAPV LEHPQRLLLP
410 420 430 440 450
QGDKPTKFPL SNNNEANQRP ISKQTETRTK DGKRRITPMF IPLHEDGPTS
460 470 480 490 500
LSMNIVSSSG SSTTALTSCS AAIGTLPAAA PTESAATPLM PLEPLVSKID
510 520 530 540 550
LGRLDSRLKT QPASQRRQSL PFDPGQSNEL LRTPRLEEHQ SSTCSPSNLN
560 570 580 590 600
VTATGKSEFV KAALDYRLHV SNGHLKTQHG MLAKVTASDS KEMLWEFYVG
610 620 630 640 650
SPLVNLNLCE KYAMLCSLDG SMRLISMETG CPVFPAISLT SSAVHCAFSP
660 670 680 690 700
DNSLVGVLTE CGLLRIWDIA KKVVSLAAGC LELLNKHGTA AQFSVTNQGM
710 720 730 740 750
PLIGFPSGNS YSYSTSLQSW LVLATKDAIM YHGIRGTLPR DMDQMQQKFP
760 770 780 790 800
LLSMQASSQN YFSFTGSMEL RHSESWQQCA KIRFIENQIK LCEALQSLDE
810 820 830 840 850
LQHWHKMLTF QLATHGSEKR MRVFLDDLLS MPEPGISQFV PKLELMQCVL
860 870 880 890 900
DTLKPHSEWN RLHSEYTELL KECKSERQKD IFATPAPPQQ KTASSAGSSP
910 920 930 940 950
RSGEATGEEV TEKDGATAVA AAVVAGSRMA VTTGTSTTTT TTASSSLSSS
960 970 980 990 1000
GSSSSTSGSG SSSSSSSTSS LSVPQPAPSL SPEIQTLDSP TVCIDDEILS
1010 1020 1030 1040
ASSSLPPLDT SPVEVSPAST SGGAASTSPA ASVAGSAPVS SSKTDQT
Length: 1,047
Mass (Da): 113,415
Last modified: December 1, 2000 - v2
Checksum: 3614D5F411DC440C
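The sequence display above interleaves position rulers with residues written in groups of ten. A minimal sketch of turning such a block into one contiguous string follows; the function name `parse_sequence_block` is hypothetical, and only the first two residue rows of the entry are used as sample input.

```python
import re


def parse_sequence_block(text: str) -> str:
    """Join residue groups from a ruler-annotated sequence display."""
    residues = []
    for line in text.splitlines():
        stripped = line.strip()
        # Ruler lines contain only digits and spaces; skip them and blank lines.
        if not stripped or re.fullmatch(r"[\d\s]+", stripped):
            continue
        # Residue lines are letters in groups of ten; drop the separating spaces.
        residues.append(re.sub(r"\s+", "", stripped))
    return "".join(residues)


# First two residue rows of the canonical sequence, copied from this entry.
block = """
        10         20         30         40         50
MRLLKPAWVH HDDKQIFSVD IHKDCTKFAT GGQGSDCGRV VIWNLLPVLS
60 70 80 90 100
DKAEFDADVP KMLCQMDQHL ACVNCVRWSQ NGQNLASGSD DKLIMIWRKS
"""

first_100 = parse_sequence_block(block)
print(len(first_100))
print(first_100[:20])
```

Applied to the full display, the result should have the stated length of 1,047 residues, which is a quick sanity check on the copy.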

Isoform 2 (identifier: O17468-2)

Also known as: Short

The sequence of this isoform differs from the canonical sequence as follows:
     63-63: L → LPVLSDKAEFDADVPKML
     430-437: KDGKRRIT → LSLICKIF
     438-1047: Missing.

Length: 454
Mass (Da): 50,491
Checksum: 58AE96763EDA2093

Isoform 3 (identifier: O17468-3)

The sequence of this isoform differs from the canonical sequence as follows:
     430-437: KDGKRRIT → LSLICKIF
     438-1047: Missing.

Length: 437
Mass (Da): 48,633
Checksum: A19A2A76FC6357B4
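The isoform descriptions are expressed as edits against the canonical sequence, so the stated isoform lengths can be cross-checked by applying those edits. The sketch below does this on a placeholder string of the canonical length (only lengths are checked, not residues); the function name `apply_variants` and the tuple layout are illustrative assumptions.

```python
def apply_variants(canonical: str, variants: list[tuple[int, int, str]]) -> str:
    """Apply (start, end, replacement) edits given in 1-based inclusive coordinates.

    An empty replacement models a 'Missing' region.
    """
    seq = canonical
    # Work from the C-terminal end so earlier coordinates stay valid.
    for start, end, replacement in sorted(variants, reverse=True):
        seq = seq[: start - 1] + replacement + seq[end:]
    return seq


CANONICAL_LENGTH = 1047
placeholder = "X" * CANONICAL_LENGTH  # stand-in for the real sequence

# Edits copied from the isoform descriptions in this entry.
INSERTION_AT_63 = (63, 63, "LPVLSDKAEFDADVPKML")
REPLACEMENT_430_437 = (430, 437, "LSLICKIF")
MISSING_438_1047 = (438, 1047, "")

isoform_2 = apply_variants(placeholder, [INSERTION_AT_63, REPLACEMENT_430_437, MISSING_438_1047])
isoform_3 = apply_variants(placeholder, [REPLACEMENT_430_437, MISSING_438_1047])

print(len(isoform_2), len(isoform_3))
```

Passing the real canonical sequence instead of the placeholder would yield the isoform sequences themselves rather than only their lengths.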

Sequence caution

The sequence AAC48360.1 differs from that shown. Reason: Frameshift at position 1043. Curated

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  53           1       A → G in AAC48360 (PubMed:9611274). Curated
Sequence conflict  58           1       D → E in AAC48360 (PubMed:9611274). Curated
Sequence conflict  64           1       C → G in AAC48360 (PubMed:9611274). Curated
Sequence conflict  72           1       C → S in AAC48360 (PubMed:9611274). Curated
Sequence conflict  159 – 163    5       QAFPH → RHFHN in AAC64041 (PubMed:9712723). Curated
Sequence conflict  169          1       K → E in AAC64041 (PubMed:9712723). Curated
Sequence conflict  179          1       S → W in AAC48360 (PubMed:9611274). Curated
Sequence conflict  232          1       G → A in AAC64041 (PubMed:9712723). Curated
Sequence conflict  242          1       N → D in CAA10954 (PubMed:9712723). Curated
Sequence conflict  330          1       D → Y in AAC48360 (PubMed:9611274). Curated
Sequence conflict  416          1       A → V in AAC48360 (PubMed:9611274). Curated
Sequence conflict  417          1       N → I in AAV37052 (Ref. 6). Curated
Sequence conflict  451          1       L → M in AAV37052 (Ref. 6). Curated
Sequence conflict  453 – 455    3       MNI → LNF in AAC48360 (PubMed:9611274). Curated
Sequence conflict  459          1       S → R in CAA10954 (PubMed:9712723). Curated
Sequence conflict  536          1       L → V in AAC48360 (PubMed:9611274). Curated
Sequence conflict  830          1       S → T in AAV37052 (Ref. 6). Curated
Sequence conflict  890 – 892    3       QKT → PKA in AAV37052 (Ref. 6). Curated
Sequence conflict  1043         1       K → Q in AAC48360 (PubMed:9611274). Curated
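Each conflict names the canonical residue first and the residue reported in the other submission second. As a sanity check, the sketch below confirms that the canonical residues for the four conflicts within positions 1 to 100 match the sequence given in this entry, then builds the corresponding variant of that segment. The helper names are hypothetical, and only the first 100 residues are used.

```python
# First 100 residues of the canonical sequence, copied from this entry.
FIRST_100 = (
    "MRLLKPAWVHHDDKQIFSVDIHKDCTKFATGGQGSDCGRVVIWNLLPVLS"
    "DKAEFDADVPKMLCQMDQHLACVNCVRWSQNGQNLASGSDDKLIMIWRKS"
)

# (position, canonical residue, conflicting residue) for AAC48360, positions <= 100.
CONFLICTS = [(53, "A", "G"), (58, "D", "E"), (64, "C", "G"), (72, "C", "S")]


def residue_at(seq: str, position: int) -> str:
    """Residue at a 1-based position."""
    return seq[position - 1]


def apply_substitution(seq: str, position: int, new_residue: str) -> str:
    """Replace the residue at a 1-based position."""
    return seq[: position - 1] + new_residue + seq[position:]


# Does the canonical sequence carry the residue the table says it does?
checks = {pos: residue_at(FIRST_100, pos) == ref for pos, ref, _ in CONFLICTS}

# Build the segment as it would read with all four conflicts applied.
variant = FIRST_100
for pos, _, alt in CONFLICTS:
    variant = apply_substitution(variant, pos, alt)

print(checks)
```

A mismatch in `checks` would point to a coordinate or transcription error rather than a biological difference.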

Alternative sequence

Feature key           Position(s)  Length  Description                                                         Feature identifier
Alternative sequence  63           1       L → LPVLSDKAEFDADVPKML in isoform 2. 1 Publication                  VSP_006775
Alternative sequence  430 – 437    8       KDGKRRIT → LSLICKIF in isoform 2 and isoform 3. 2 Publications      VSP_006776
Alternative sequence  438 – 1047   610     Missing in isoform 2 and isoform 3. 2 Publications                  VSP_006777

Sequence databases

EMBL / GenBank / DDBJ:
AF031081 mRNA. Translation: AAC48360.1. Frameshift.
AJ222709 mRNA. Translation: CAA10954.1.
AF071881 mRNA. Translation: AAC64041.1.
AE014298 Genomic DNA. Translation: AAF46267.1.
AY069414 mRNA. Translation: AAL39559.1.
BT016167 mRNA. Translation: AAV37052.1.
PIR: A59246.
RefSeq: NP_572401.2. NM_132173.3. [O17468-1]

Genome annotation databases

EnsemblMetazoa: FBtr0071070; FBpp0071028; FBgn0022786. [O17468-1]
GeneID: 31680.
KEGG: dme:Dmel_CG12153.

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Organism-specific databases

CTD: 7290.
FlyBase: FBgn0022786. Hira.

Miscellaneous databases

ChiTaRS: Hira. fly.
GenomeRNAi: 31680.
PRO: O17468.

Publications

  1. "Isolation and characterization of a new gene encoding a member of the HIRA family of proteins from Drosophila melanogaster."
    Kirov N., Shtilbans A., Rushlow C.
    Gene 212:323-332(1998)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), DEVELOPMENTAL STAGE.
    Tissue: Embryo.
  2. "Cloning, chromosome mapping and expression analysis of the HIRA gene from Drosophila melanogaster."
    Llevadot R., Marques G., Pritchard M., Estivill X., Ferrus A., Scambler P.
    Biochem. Biophys. Res. Commun. 249:486-491(1998)
    Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS 1 AND 2), ALTERNATIVE SPLICING, DEVELOPMENTAL STAGE.
    Tissue: Embryo.
  3. "The genome sequence of Drosophila melanogaster."
    Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D., Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F., George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N., Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X., Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D., Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G., Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D., Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M., Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S., Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P., Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I., Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P., de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M., Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P., Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W., Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K., Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M., Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J., Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C., Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A., Kimmel B.E., Kodira C.D., Kraft C.L., Kravitz S., Kulp D., Lai Z., Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X., Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D., Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A., Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L., Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M., Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G., Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H., Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.J., Spier E., Spradling A.C., Stapleton M., Strong R., Sun E., Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X., Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J., Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A., Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L., Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S.C., Zhu X., Smith H.O., Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.
    Science 287:2185-2195(2000)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
    Strain: Berkeley.
  4. Cited for: GENOME REANNOTATION.
    Strain: Berkeley.
  5. Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 3).
    Strain: Berkeley.
    Tissue: Embryo.
  6. Stapleton M., Carlson J.W., Chavez C., Frise E., George R.A., Pacleb J.M., Park S., Wan K.H., Yu C., Rubin G.M., Celniker S.E.
    Submitted (OCT-2004) to the EMBL/GenBank/DDBJ databases
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
    Strain: Berkeley.
    Tissue: Testis.
  7. "The maternal effect mutation sesame affects the formation of the male pronucleus in Drosophila melanogaster."
    Loppin B., Docquier M., Bonneton F., Couble P.
    Dev. Biol. 222:392-404(2000)
    Cited for: FUNCTION.
  8. "The Drosophila maternal gene sesame is required for sperm chromatin remodeling at fertilization."
    Loppin B., Berger F., Couble P.
    Chromosoma 110:430-440(2001)
    Cited for: FUNCTION.
  9. "Replacement by Drosophila melanogaster protamines and Mst77F of histones during chromatin condensation in late spermatids and role of sesame in the removal of these proteins from the male pronucleus."
    Jayaramaiah Raja S., Renkawitz-Pohl R.
    Mol. Cell. Biol. 25:6165-6177(2005)
    Cited for: FUNCTION.
  10. Erratum
    Jayaramaiah Raja S., Renkawitz-Pohl R.
    Mol. Cell. Biol. 26:3682-3682(2006)
  11. "The histone H3.3 chaperone HIRA is essential for chromatin assembly in the male pronucleus."
    Loppin B., Bonnefoy E., Anselme C., Laurencon A., Karr T.L., Couble P.
    Nature 437:1386-1390(2005)
    Cited for: FUNCTION, SUBCELLULAR LOCATION, MUTAGENESIS OF ARG-225.
  12. "An integrated chemical, mass spectrometric and computational strategy for (quantitative) phosphoproteomics: application to Drosophila melanogaster Kc167 cells."
    Bodenmiller B., Mueller L.N., Pedrioli P.G.A., Pflieger D., Juenger M.A., Eng J.K., Aebersold R., Tao W.A.
    Mol. Biosyst. 3:275-286(2007)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-519, IDENTIFICATION BY MASS SPECTROMETRY.

Entry information

Entry name: HIRA_DROME
Accession
Primary (citable) accession number: O17468
Secondary accession number(s): O46105, O77144, Q5U0S5, Q8T0C3, Q9W3Q3
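When copying accession numbers out of a page like this, a format check can catch stray characters. The sketch below tests the primary and secondary accessions against the accession pattern as UniProt documents it; the pattern is quoted from memory of that documentation and should be treated as an assumption to verify, and the variable names are illustrative.

```python
import re

# Accession pattern as described in UniProt's help pages (assumed, verify before relying on it).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Primary accession first, then the secondary accessions listed in this entry.
accessions = ["O17468", "O46105", "O77144", "Q5U0S5", "Q8T0C3", "Q9W3Q3"]

valid = {acc: bool(ACCESSION_RE.fullmatch(acc)) for acc in accessions}

for acc, ok in valid.items():
    print(acc, "ok" if ok else "malformed")
```

Note that the entry name HIRA_DROME is a different kind of identifier and is not expected to match this pattern.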
Entry history
Integrated into UniProtKB/Swiss-Prot: July 15, 1998
Last sequence update: December 1, 2000
Last modified: July 6, 2016
This is version 145 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Drosophila annotation project

Miscellaneous

Caution

Was originally thought to be involved in protamine removal, but this was shown to be incorrect in the subsequently published erratum. 1 Publication

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Drosophila
    Drosophila: entries, gene names and cross-references to FlyBase
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.