Protein

Heterogeneous nuclear ribonucleoprotein D0

Gene

HNRNPD

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Binds with high affinity to RNA molecules that contain AU-rich elements (AREs) found within the 3'-UTR of many proto-oncogenes and cytokine mRNAs. Also binds to double- and single-stranded DNA sequences in a specific manner and functions as a transcription factor. Each of the RNA-binding domains can bind specifically and solely to a single-stranded non-monotonous 5'-UUAG-3' sequence, and also, more weakly, to the single-stranded 5'-TTAGGG-3' telomeric DNA repeat. Binds RNA oligonucleotides with 5'-UUAGGG-3' repeats more tightly than the telomeric single-stranded DNA 5'-TTAGGG-3' repeats. Binding of RRM1 to DNA inhibits the formation of DNA quadruplex structure, which may play a role in telomere elongation. May be involved in translationally coupled mRNA turnover. Implicated with other RNA-binding proteins in the cytoplasmic deadenylation/translational and decay interplay of the FOS mRNA mediated by the major coding-region determinant of instability (mCRD) domain. May play a role in the regulation of the rhythmic expression of circadian clock core genes. Directly binds to the 3'-UTR of CRY1 mRNA and induces CRY1 rhythmic translation. May also be involved in the regulation of PER2 translation. (3 publications)
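
The sequence motifs named above (5'-UUAG-3', 5'-UUAGGG-3' and 5'-TTAGGG-3') can be located in a nucleotide string with plain pattern matching. The following is a minimal sketch, not a binding-affinity model: the function name `find_motif` and the two test sequences (three tandem repeats of each hexamer) are hypothetical and are not part of this entry.

```python
import re


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 1-based start positions of every hit of `motif` in `seq`.

    A zero-width lookahead is used so that overlapping hits are reported too.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() + 1 for m in pattern.finditer(seq)]


# Hypothetical inputs: three tandem repeats of the RNA and DNA hexamers.
rna_repeat = "UUAGGG" * 3
dna_repeat = "TTAGGG" * 3

print(find_motif(rna_repeat, "UUAG"))    # core RNA motif in the RNA repeat
print(find_motif(dna_repeat, "TTAGGG"))  # telomeric DNA repeat
print(find_motif(dna_repeat, "UUAG"))    # RNA motif is absent from DNA text
```

Matching only shows where a motif occurs; the relative affinities described above (RNA repeats bound more tightly than DNA repeats) come from experiment and are not captured by this sketch.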

GO - Molecular function

  • AT DNA binding Source: Ensembl
  • chromatin binding Source: Ensembl
  • mRNA 3'-UTR AU-rich region binding Source: Ensembl
  • nucleotide binding Source: InterPro
  • poly(A) RNA binding Source: UniProtKB
  • RNA binding Source: UniProtKB
  • telomeric DNA binding Source: UniProtKB

GO - Biological process

  • 3'-UTR-mediated mRNA destabilization Source: Ensembl
  • cellular response to amino acid stimulus Source: Ensembl
  • cellular response to estradiol stimulus Source: Ensembl
  • cellular response to nitric oxide Source: Ensembl
  • cellular response to putrescine Source: Ensembl
  • cerebellum development Source: Ensembl
  • circadian regulation of translation Source: UniProtKB
  • gene expression Source: Reactome
  • hepatocyte dedifferentiation Source: Ensembl
  • liver development Source: Ensembl
  • mRNA splicing, via spliceosome Source: Reactome
  • mRNA stabilization Source: Ensembl
  • positive regulation of transcription, DNA-templated Source: UniProtKB
  • positive regulation of translation Source: UniProtKB
  • regulation of circadian rhythm Source: UniProtKB
  • regulation of mRNA stability Source: Reactome
  • regulation of transcription, DNA-templated Source: UniProtKB
  • response to calcium ion Source: Ensembl
  • response to electrical stimulus Source: Ensembl
  • response to fluoxetine Source: Ensembl
  • response to rapamycin Source: Ensembl
  • response to sodium phosphate Source: Ensembl
  • RNA catabolic process Source: ProtInc
  • RNA processing Source: ProtInc
  • transcription, DNA-templated Source: UniProtKB-KW

Keywords - Molecular function

Ribonucleoprotein

Keywords - Biological process

Biological rhythms, Transcription, Transcription regulation

Keywords - Ligand

DNA-binding, RNA-binding

Enzyme and pathway databases

Reactome: R-HSA-450408. AUF1 (hnRNP D0) binds and destabilizes mRNA.
          R-HSA-72163. mRNA Splicing - Major Pathway.
          R-HSA-72203. Processing of Capped Intron-Containing Pre-mRNA.
SIGNOR: Q14103.

Names & Taxonomy

Protein names
Recommended name: Heterogeneous nuclear ribonucleoprotein D0
Short name: hnRNP D0
Alternative name(s): AU-rich element RNA-binding protein 1
Gene names
Name: HNRNPD
Synonyms: AUF1, HNRPD
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 4

Organism-specific databases

HGNC: HGNC:5036. HNRNPD.

Subcellular location

  • Nucleus
  • Cytoplasm

  • Note: Localized in cytoplasmic mRNP granules containing untranslated mRNAs. Component of ribonucleosomes. Cytoplasmic localization oscillates diurnally.

GO - Cellular component

  • cytosol Source: Reactome
  • extracellular exosome Source: UniProtKB
  • intracellular ribonucleoprotein complex Source: UniProtKB
  • nucleoplasm Source: HPA
  • nucleus Source: UniProtKB

Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Organism-specific databases

PharmGKB: PA29361.

Polymorphism and mutation databases

BioMuta: HNRNPD.
DMDM: 13124489.

PTM / Processing

Molecule processing

Feature key           Position(s)  Length  Description                                 Feature identifier
Initiator methionine                       Removed (combined sources)
Chain                 2-355        354     Heterogeneous nuclear ribonucleoprotein D0  PRO_0000081849

Amino acid modifications

Feature key       Position(s)  Length  Description                                                          Evidence
Modified residue  2            1       N-acetylserine                                                       Combined sources
Modified residue  71           1       Phosphoserine                                                        Combined sources
Modified residue  80           1       Phosphoserine                                                        Combined sources
Modified residue  82           1       Phosphoserine                                                        Combined sources
Modified residue  83           1       Phosphoserine                                                        Combined sources
Modified residue  91           1       Phosphothreonine                                                     Combined sources
Modified residue  119          1       N6-methyllysine                                                      1 publication
Modified residue  127          1       Phosphothreonine                                                     Combined sources
Modified residue  165          1       N6-acetyllysine                                                      Combined sources
Modified residue  190          1       Phosphoserine                                                        Combined sources
Modified residue  193          1       Phosphothreonine                                                     Combined sources
Cross-link        197          1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)  Combined sources
Modified residue  243          1       N6-acetyllysine                                                      By similarity
Modified residue  251          1       N6-acetyllysine                                                      Combined sources
Modified residue  271          1       Phosphoserine                                                        Combined sources
Modified residue  345          1       Dimethylated arginine                                                Combined sources
Isoform 4 (identifier: Q14103-4)
Modified residue  273          1       N6-acetyllysine                                                      Combined sources
Isoform 3 (identifier: Q14103-3)
Modified residue  292          1       N6-acetyllysine                                                      Combined sources

Post-translational modification

Arg-345 is dimethylated, probably to asymmetric dimethylarginine.
Methylated by PRMT1 in an insulin-dependent manner. The PRMT1-mediated methylation regulates tyrosine phosphorylation (by similarity).
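
The annotated modification sites can be cross-checked against the canonical isoform 1 sequence given in this entry: a phosphoserine should sit on S, an N6-acetyllysine on K, and so on. The sketch below assumes the standard one-letter amino acid codes; the names `CANONICAL_ROWS`, `SITES` and `find_mismatches` are illustrative and not part of the entry, and only canonical-numbered sites are checked (the isoform 3 and isoform 4 sites use their own numbering).

```python
# Canonical isoform 1 (Q14103-1), copied row by row from this entry.
CANONICAL_ROWS = """
MSEEQFGGDG AAAAATAAVG GSAGEQEGAM VAATQGAAAA AGSGAGTGGG
TASGGTEGGS AESEGAKIDA SKNEEDEGHS NSSPRHSEAA TAQREEWKMF
IGGLSWDTTK KDLKDYFSKF GEVVDCTLKL DPITGRSRGF GFVLFKESES
VDKVMDQKEH KLNGKVIDPK RAKAMKTKEP VKKIFVGGLS PDTPEEKIRE
YFGGFGEVES IELPMDNKTN KRRGFCFITF KEEEPVKKIM EKKYHNVGLS
KCEIKVAMSK EQYQQQQQWG SRGGFAGRAR GRGGGPSQNW NQGYSNYWNQ
GYGNYGYNSQ GYGGYGGYDY TGYNNYYGYG DYSNQQSGYG KVSRRGGHQN
SYKPY
"""
# Dropping all whitespace turns the 10-residue display blocks into one string.
CANONICAL = "".join(CANONICAL_ROWS.split())

# (1-based position, expected residue, annotation) for canonical-numbered sites.
SITES = [
    (2, "S", "N-acetylserine"),
    (71, "S", "Phosphoserine"),
    (80, "S", "Phosphoserine"),
    (82, "S", "Phosphoserine"),
    (83, "S", "Phosphoserine"),
    (91, "T", "Phosphothreonine"),
    (119, "K", "N6-methyllysine"),
    (127, "T", "Phosphothreonine"),
    (165, "K", "N6-acetyllysine"),
    (190, "S", "Phosphoserine"),
    (193, "T", "Phosphothreonine"),
    (197, "K", "Glycyl lysine isopeptide cross-link"),
    (243, "K", "N6-acetyllysine"),
    (251, "K", "N6-acetyllysine"),
    (271, "S", "Phosphoserine"),
    (345, "R", "Dimethylated arginine"),
]


def find_mismatches(seq, sites):
    """Return (position, expected, found, annotation) for every inconsistent site."""
    bad = []
    for pos, expected, label in sites:
        found = seq[pos - 1]  # convert 1-based position to 0-based index
        if found != expected:
            bad.append((pos, expected, found, label))
    return bad


print(len(CANONICAL), find_mismatches(CANONICAL, SITES))
```

An empty mismatch list means every listed position carries the residue type that its annotation requires.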

Keywords - PTM

Acetylation, Isopeptide bond, Methylation, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q14103.
MaxQB: Q14103.
PaxDb: Q14103.
PeptideAtlas: Q14103.
PRIDE: Q14103.
TopDownProteomics: Q14103-1. [Q14103-1]
                   Q14103-2. [Q14103-2]
                   Q14103-3. [Q14103-3]

2D gel databases

SWISS-2DPAGE: Q14103.

PTM databases

iPTMnet: Q14103.
PhosphoSite: Q14103.
SwissPalm: Q14103.

Miscellaneous databases

PMAP-CutDB: Q14103.

Expression

Gene expression databases

Bgee: ENSG00000138668.
ExpressionAtlas: Q14103. baseline and differential.
Genevisible: Q14103. HS.

Organism-specific databases

HPA: HPA004911.

Interaction

Subunit structure

Identified in an IGF2BP1-dependent mRNP granule complex containing untranslated mRNAs. Part of a complex associated with the FOS mCRD domain and consisting of PABPC1, PAIP1, CSDE1/UNR and SYNCRIP. Interacts with IGF2BP2. Interacts with GTPBP1. Interacts with EIF4G1; the interaction requires RNA. Interacts with EIF3B and RPS3. (5 publications)

Binary interactions

With     Entry   #Exp.  IntAct                   Notes
EIF4G1   Q04637  3      EBI-432545,EBI-73711
IGF2BP2  Q9Y6M1  4      EBI-432545,EBI-1024419
ING4     Q9UNL4  9      EBI-299674,EBI-2866661
LDHAL6B  Q9BYZ2  2      EBI-299674,EBI-1108377
PABPC1   P11940  2      EBI-432545,EBI-81531
SFN      P31947  7      EBI-432545,EBI-476295
SYNCRIP  O60506  3      EBI-432545,EBI-1024357
YBX1     P67809  3      EBI-432545,EBI-354065

Protein-protein interaction databases

BioGrid: 109425. 275 interactions.
DIP: DIP-31163N.
IntAct: Q14103. 141 interactions.
MINT: MINT-5001251.
STRING: 9606.ENSP00000313199.

Structure

Secondary structure

Feature key  Position(s)  Length  Evidence
Beta strand  99-102       4       Combined sources
Helix        110-118      9       Combined sources
Beta strand  123-127      5       Combined sources
Turn         132-135      4       Combined sources
Beta strand  139-147      9       Combined sources
Helix        148-156      9       Combined sources
Beta strand  168-170      3       Combined sources
Beta strand  184-187      4       Combined sources
Helix        195-205      11      Combined sources
Beta strand  209-211      3       Combined sources
Turn         217-219      3       Combined sources
Beta strand  220-222      3       Combined sources
Beta strand  226-229      4       Combined sources
Beta strand  231-233      3       Combined sources
Helix        234-240      7       Combined sources
Beta strand  244-247      4       Combined sources
Beta strand  250-253      4       Combined sources
Beta strand  254-256      3       Combined sources
Beta strand  349-351      3       Combined sources

3D structure databases

PDBe / RCSB PDB / PDBj:
Entry  Method  Resolution (Å)  Chain  Positions
1HD0   NMR     -               A      98-172
1HD1   NMR     -               A      98-172
1IQT   NMR     -               A      183-257
1WTB   NMR     -               A      181-259
1X0F   NMR     -               A      181-259
2Z5N   X-ray   3.20            B      332-355
ProteinModelPortal: Q14103.
SMR: Q14103. Positions 98-259.

Miscellaneous databases

EvolutionaryTrace: Q14103.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description  Evidence
Domain       97-179       83      RRM 1        PROSITE-ProRule annotation
Domain       182-261      80      RRM 2        PROSITE-ProRule annotation

Compositional bias

Feature key         Position(s)  Length  Description
Compositional bias  11-45        35      Ala-rich
Compositional bias  270-347      78      Gly-rich
Compositional bias  294-332      39      Tyr-rich

Sequence similarities

Contains 2 RRM (RNA recognition motif) domains. (PROSITE-ProRule annotation)

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: KOG0118. Eukaryota.
        COG0724. LUCA.
GeneTree: ENSGT00760000118873.
HOVERGEN: HBG002295.
InParanoid: Q14103.
KO: K13044.
OMA: TFKDEEP.
OrthoDB: EOG091G1CPI.
PhylomeDB: Q14103.
TreeFam: TF314808.

Family and domain databases

Gene3D: 3.30.70.330. 2 hits.
InterPro: IPR012956. CARG-binding_factor_N.
          IPR012677. Nucleotide-bd_a/b_plait.
          IPR000504. RRM_dom.
Pfam: PF08143. CBFNT. 1 hit.
      PF00076. RRM_1. 2 hits.
SMART: SM00360. RRM. 2 hits.
SUPFAM: SSF54928. SSF54928. 2 hits.
PROSITE: PS50102. RRM. 2 hits.

Sequences (4)

Sequence status: Complete.

Sequence processing: The displayed sequence is further processed into a mature form.

This entry describes 4 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q14103-1)
Also known as: p45, Dx9

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MSEEQFGGDG AAAAATAAVG GSAGEQEGAM VAATQGAAAA AGSGAGTGGG
        60         70         80         90        100
TASGGTEGGS AESEGAKIDA SKNEEDEGHS NSSPRHSEAA TAQREEWKMF
       110        120        130        140        150
IGGLSWDTTK KDLKDYFSKF GEVVDCTLKL DPITGRSRGF GFVLFKESES
       160        170        180        190        200
VDKVMDQKEH KLNGKVIDPK RAKAMKTKEP VKKIFVGGLS PDTPEEKIRE
       210        220        230        240        250
YFGGFGEVES IELPMDNKTN KRRGFCFITF KEEEPVKKIM EKKYHNVGLS
       260        270        280        290        300
KCEIKVAMSK EQYQQQQQWG SRGGFAGRAR GRGGGPSQNW NQGYSNYWNQ
       310        320        330        340        350
GYGNYGYNSQ GYGGYGGYDY TGYNNYYGYG DYSNQQSGYG KVSRRGGHQN

SYKPY
Length: 355
Mass (Da): 38,434
Last modified: November 1, 1996 - v1
Checksum: D0B6EA177BEF789E
Isoform 2 (identifier: Q14103-2)
Also known as: p42, Dx4

The sequence of this isoform differs from the canonical sequence as follows:
     79-97: Missing.

Length: 336
Mass (Da): 36,272
Checksum: FEE18D61B7714B51

Isoform 3 (identifier: Q14103-3)
Also known as: p40, Dx7

The sequence of this isoform differs from the canonical sequence as follows:
     285-334: GPSQNWNQGYSNYWNQGYGNYGYNSQGYGGYGGYDYTGYNNYYGYGDYSN → D

Length: 306
Mass (Da): 32,835
Checksum: ABCDD6ACF812F647

Isoform 4 (identifier: Q14103-4)
Also known as: p37

The sequence of this isoform differs from the canonical sequence as follows:
     79-97: Missing.
     285-334: GPSQNWNQGYSNYWNQGYGNYGYNSQGYGGYGGYDYTGYNNYYGYGDYSN → D

Length: 287
Mass (Da): 30,672
Checksum: 98DF6E78EAF3BBC1
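
The isoform descriptions above are edits against the canonical sequence, so the isoform sequences and their lengths can be derived mechanically. The sketch below is one way to do that, assuming 1-based inclusive coordinates on isoform 1; the helper name `apply_variants` and the variable names are illustrative and not part of the entry. Variants are applied from the C-terminal end backwards so that earlier coordinates stay valid.

```python
# Canonical isoform 1 (Q14103-1), copied row by row from this entry.
CANONICAL_ROWS = """
MSEEQFGGDG AAAAATAAVG GSAGEQEGAM VAATQGAAAA AGSGAGTGGG
TASGGTEGGS AESEGAKIDA SKNEEDEGHS NSSPRHSEAA TAQREEWKMF
IGGLSWDTTK KDLKDYFSKF GEVVDCTLKL DPITGRSRGF GFVLFKESES
VDKVMDQKEH KLNGKVIDPK RAKAMKTKEP VKKIFVGGLS PDTPEEKIRE
YFGGFGEVES IELPMDNKTN KRRGFCFITF KEEEPVKKIM EKKYHNVGLS
KCEIKVAMSK EQYQQQQQWG SRGGFAGRAR GRGGGPSQNW NQGYSNYWNQ
GYGNYGYNSQ GYGGYGGYDY TGYNNYYGYG DYSNQQSGYG KVSRRGGHQN
SYKPY
"""
CANONICAL = "".join(CANONICAL_ROWS.split())

# (start, end, replacement) in 1-based inclusive canonical coordinates.
MISSING_79_97 = (79, 97, "")       # "79-97: Missing."
REPLACE_285_334 = (285, 334, "D")  # "285-334: GPSQN...GDYSN -> D"

ISOFORM_VARIANTS = {
    "Q14103-1": [],
    "Q14103-2": [MISSING_79_97],
    "Q14103-3": [REPLACE_285_334],
    "Q14103-4": [MISSING_79_97, REPLACE_285_334],
}


def apply_variants(seq, variants):
    """Apply non-overlapping variants, rightmost first, and return the new sequence."""
    for start, end, replacement in sorted(variants, reverse=True):
        seq = seq[:start - 1] + replacement + seq[end:]
    return seq


isoforms = {name: apply_variants(CANONICAL, v) for name, v in ISOFORM_VARIANTS.items()}
for name, seq in isoforms.items():
    print(name, len(seq))
```

The derived lengths can be compared with the Length values listed for each isoform (355, 336, 306 and 287).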

Sequence caution

The sequence AAA35781 differs from that shown. Reason: Contaminating sequence. Sequence of unknown origin in the N-terminal part. (Curated)
The sequence AAA35781 differs from that shown. Reason: Frameshift at positions 45, 59 and 355. (Curated)
The sequence CAA27544 differs from that shown. Reason: Several sequence conflicts. (Curated)

Experimental Info

Feature key        Position(s)  Length  Description                            Evidence
Sequence conflict  150          1       S → R, AA sequence (PubMed:8321232)    Curated
Sequence conflict  225          1       F → L in AAA35781 (PubMed:1433497)     Curated

Alternative sequence

Feature key           Position(s)  Length  Description                                                      Feature identifier
Alternative sequence  79-97        19      Missing in isoform 2 and isoform 4 (2 publications)              VSP_005834
Alternative sequence  285-334      50      GPSQN…GDYSN → D in isoform 3 and isoform 4 (4 publications)      VSP_005835

Sequence databases

EMBL / GenBank / DDBJ:
D55671 mRNA. Translation: BAA09522.1.
D55672 mRNA. Translation: BAA09523.1.
D55673 mRNA. Translation: BAA09524.1.
D55674 mRNA. Translation: BAA09525.1.
AF026126 Genomic DNA. Translation: AAC23474.1.
AF026126 Genomic DNA. Translation: AAC23475.1.
AF026126 Genomic DNA. Translation: AAC23476.1.
AK292707 mRNA. Translation: BAF85396.1.
AC124016 Genomic DNA. Translation: AAY40913.1.
CH471057 Genomic DNA. Translation: EAX05874.1.
BC002401 mRNA. Translation: AAH02401.1.
BC023977 mRNA. Translation: AAH23977.1.
BC026015 mRNA. Translation: AAH26015.1.
X03910 mRNA. Translation: CAA27544.1. Sequence problems.
AF039575 mRNA. Translation: AAB96683.1.
M94630 mRNA. Translation: AAA35781.1. Sequence problems.
CCDS: CCDS3590.1. [Q14103-3]
      CCDS3591.1. [Q14103-2]
      CCDS3592.1. [Q14103-1]
PIR: A24016.
     A44192.
     B48138.
RefSeq: NP_001003810.1. NM_001003810.1. [Q14103-4]
        NP_002129.2. NM_002138.3. [Q14103-3]
        NP_112737.1. NM_031369.2. [Q14103-2]
        NP_112738.1. NM_031370.2. [Q14103-1]
UniGene: Hs.480073.

Genome annotation databases

Ensembl: ENST00000313899; ENSP00000313199; ENSG00000138668. [Q14103-1]
         ENST00000352301; ENSP00000305860; ENSG00000138668. [Q14103-2]
         ENST00000353341; ENSP00000313327; ENSG00000138668. [Q14103-3]
GeneID: 3184.
KEGG: hsa:3184.
UCSC: uc003hmm.2. human. [Q14103-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Web resources

Atlas of Genetics and Cytogenetics in Oncology and Haematology

Protocols and materials databases

DNASU: 3184.

Organism-specific databases

CTD: 3184.
GeneCards: HNRNPD.
HGNC: HGNC:5036. HNRNPD.
HPA: HPA004911.
MIM: 601324. gene.
neXtProt: NX_Q14103.
PharmGKB: PA29361.

Miscellaneous databases

ChiTaRS: HNRNPD. human.
EvolutionaryTrace: Q14103.
GeneWiki: HNRPD.
GenomeRNAi: 3184.
PMAP-CutDB: Q14103.
PRO: Q14103.

Entry information

Entry name: HNRPD_HUMAN
Accession: Primary (citable) accession number: Q14103
Secondary accession number(s): A8K9J2, P07029, Q01858, Q14100, Q14101, Q14102, Q4W5A1, Q9UCE8, Q9UCE9
Entry history
Integrated into UniProtKB/Swiss-Prot: February 21, 2001
Last sequence update: November 1, 1996
Last modified: September 7, 2016
This is version 193 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Direct protein sequencing, Reference proteome

Documents

  1. Human chromosome 4
    Human chromosome 4: entries, gene names and cross-references to MIM
  2. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  3. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  4. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.