Protein

DNA excision repair protein ERCC-1

Gene

ERCC1

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 5 out of 5. Experimental evidence at protein level.

Function

Isoform 1: Non-catalytic component of a structure-specific DNA repair endonuclease responsible for the 5'-incision during DNA repair. Responsible, in conjunction with SLX4, for the first step in the repair of interstrand cross-links (ICL). Participates in the processing of anaphase bridge-generating DNA structures, which consist of incompletely processed DNA lesions that arise during S or G2 phase and can result in cytokinesis failure. Also required for homology-directed repair (HDR) of DNA double-strand breaks, in conjunction with SLX4.

Regions

Feature key | Position(s) | Description | Length
DNA binding | 134 – 156 | (Sequence analysis) | 23

GO - Molecular function

  • damaged DNA binding Source: UniProtKB
  • endonuclease activity Source: UniProtKB-KW
  • protein C-terminus binding Source: UniProtKB
  • protein domain specific binding Source: UniProtKB
  • single-stranded DNA binding Source: UniProtKB
  • TFIID-class transcription factor binding Source: Ensembl

GO - Biological process

  • cell proliferation Source: Ensembl
  • DNA recombination Source: MGI
  • DNA repair Source: MGI
  • double-strand break repair via nonhomologous end joining Source: BHF-UCL
  • embryonic organ development Source: Ensembl
  • global genome nucleotide-excision repair Source: Reactome
  • interstrand cross-link repair Source: Reactome
  • isotype switching Source: Ensembl
  • male gonad development Source: Ensembl
  • meiotic mismatch repair Source: GO_Central
  • mitotic recombination Source: UniProtKB
  • multicellular organism aging Source: Ensembl
  • multicellular organism growth Source: Ensembl
  • negative regulation of protection from non-homologous end joining at telomere Source: BHF-UCL
  • negative regulation of telomere maintenance Source: UniProtKB
  • nucleotide-excision repair Source: MGI
  • nucleotide-excision repair, DNA incision Source: Reactome
  • nucleotide-excision repair, DNA incision, 3'-to lesion Source: UniProtKB
  • nucleotide-excision repair, DNA incision, 5'-to lesion Source: UniProtKB
  • nucleotide-excision repair, preincision complex stabilization Source: Reactome
  • oogenesis Source: Ensembl
  • positive regulation of t-circle formation Source: BHF-UCL
  • post-embryonic hemopoiesis Source: Ensembl
  • pyrimidine dimer repair by nucleotide-excision repair Source: Ensembl
  • replicative cell aging Source: Ensembl
  • response to nutrient Source: Ensembl
  • response to oxidative stress Source: UniProtKB
  • response to sucrose Source: Ensembl
  • response to X-ray Source: Ensembl
  • spermatogenesis Source: Ensembl
  • syncytium formation Source: Ensembl
  • t-circle formation Source: BHF-UCL
  • telomeric DNA-containing double minutes formation Source: BHF-UCL
  • transcription-coupled nucleotide-excision repair Source: Reactome
  • UV-damage excision repair Source: GO_Central
  • UV protection Source: Ensembl

Keywords

Molecular function: DNA-binding, Endonuclease, Hydrolase, Nuclease
Biological process: DNA damage, DNA repair

Enzyme and pathway databases

Reactome: R-HSA-5685938. HDR through Single Strand Annealing (SSA).
R-HSA-5696395. Formation of Incision Complex in GG-NER.
R-HSA-5696400. Dual Incision in GG-NER.
R-HSA-6782135. Dual incision in TC-NER.
R-HSA-6783310. Fanconi Anemia Pathway.
SIGNOR: P07992.

Names & Taxonomy

Protein names
Recommended name: DNA excision repair protein ERCC-1
Gene names
Name: ERCC1
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 19

Organism-specific databases

HGNC: HGNC:3433. ERCC1.

Subcellular location

GO - Cellular component

  • cytosol Source: HPA
  • ERCC4-ERCC1 complex Source: BHF-UCL
  • nuclear chromosome, telomeric region Source: UniProtKB
  • nucleoplasm Source: HPA
  • nucleotide-excision repair complex Source: UniProtKB
  • nucleotide-excision repair factor 1 complex Source: UniProtKB
  • transcription factor TFIID complex Source: Ensembl

Keywords - Cellular component

Cytoplasm, Nucleus

Pathology & Biotech

Involvement in disease

Cerebro-oculo-facio-skeletal syndrome 4 (COFS4) (2 publications)
The disease is caused by mutations affecting the gene represented in this entry.
Disease description: A disorder of prenatal onset characterized by microcephaly, congenital cataracts, facial dysmorphism, neurogenic arthrogryposis, growth failure and severe psychomotor retardation. COFS is considered part of the spectrum of nucleotide-excision repair disorders, which also includes xeroderma pigmentosum, trichothiodystrophy and Cockayne syndrome.
See also OMIM:610758
Feature key | Position(s) | Description | Length
Natural variant (VAR_032776) | 231 | F → L in COFS4; does not alter interaction with XPF/ERCC4 or GTF2H1. 2 publications. Corresponds to variant dbSNP:rs121913028. | 1

Keywords - Disease

Cataract

Organism-specific databases

DisGeNET: 2067.
MalaCards: ERCC1.
MIM: 610758. phenotype.
OpenTargets: ENSG00000012061.
Orphanet: 90322. Cockayne syndrome type 2.
1466. COFS syndrome.
PharmGKB: PA155.

Polymorphism and mutation databases

BioMuta: ERCC1.
DMDM: 119538.

PTM / Processing

Molecule processing

Feature key | Position(s) | Description | Length
Chain (PRO_0000087006) | 1 – 297 | DNA excision repair protein ERCC-1 | 297

Amino acid modifications

Feature key | Position(s) | Description | Length
Modified residue | 1 | N-acetylmethionine (Combined sources) | 1

Keywords - PTM

Acetylation

Proteomic databases

EPD: P07992.
MaxQB: P07992.
PaxDb: P07992.
PeptideAtlas: P07992.
PRIDE: P07992.

PTM databases

iPTMnet: P07992.
PhosphoSitePlus: P07992.

Expression

Gene expression databases

Bgee: ENSG00000012061.
CleanEx: HS_ERCC1.
ExpressionAtlas: P07992. baseline and differential.
Genevisible: P07992. HS.

Organism-specific databases

HPA: CAB004390.
CAB072859.
CAB072860.
HPA029773.
HPA050182.

Interaction

Subunit structure

Heterodimer composed of ERCC1 isoform 1 and XPF/ERCC4. (3 publications)

Protein-protein interaction databases

BioGrid: 108379. 33 interactors.
DIP: DIP-24235N.
IntAct: P07992. 14 interactors.
MINT: MINT-1457664.
STRING: 9606.ENSP00000013807.

Structure

Secondary structure

Feature key | Position(s) | Description | Length
Beta strand | 101 – 103 | Combined sources | 3
Helix | 105 – 107 | Combined sources | 3
Helix | 111 – 115 | Combined sources | 5
Beta strand | 121 – 123 | Combined sources | 3
Beta strand | 127 – 133 | Combined sources | 7
Beta strand | 136 – 142 | Combined sources | 7
Helix | 143 – 148 | Combined sources | 6
Helix | 152 – 160 | Combined sources | 9
Beta strand | 163 – 172 | Combined sources | 10
Beta strand | 175 – 177 | Combined sources | 3
Helix | 179 – 192 | Combined sources | 14
Beta strand | 195 – 201 | Combined sources | 7
Helix | 202 – 213 | Combined sources | 12
Helix | 222 – 226 | Combined sources | 5
Beta strand | 242 – 244 | Combined sources | 3
Helix | 247 – 257 | Combined sources | 11
Helix | 260 – 264 | Combined sources | 5
Helix | 268 – 272 | Combined sources | 5
Beta strand | 274 – 276 | Combined sources | 3
Helix | 280 – 290 | Combined sources | 11
Beta strand | 293 – 295 | Combined sources | 3

3D structure databases

PDB entry | Method | Resolution (Å) | Chain | Positions
1Z00 | NMR | - | A | 220-297
2A1I | X-ray | 1.90 | A | 96-227
2A1J | X-ray | 2.70 | B | 220-296
2JNW | NMR | - | A | 96-214
2JPD | NMR | - | A | 96-219
2MUT | NMR | - | A | 220-297
ProteinModelPortal: P07992.
SMR: P07992.
ModBase: Search...
MobiDB: Search...
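
The PDB position ranges listed above overlap, so the part of the canonical sequence with a known structure is the union of those ranges, not their sum. The following is a minimal Python sketch of that calculation. The names `PDB_RANGES`, `SEQUENCE_LENGTH` and `merge_ranges` are illustrative and not part of the entry; the ranges and the length of 297 are copied from this entry.

```python
# PDB entries and the residue ranges they cover, copied from the PDB table.
PDB_RANGES = {
    "1Z00": (220, 297),
    "2A1I": (96, 227),
    "2A1J": (220, 296),
    "2JNW": (96, 214),
    "2JPD": (96, 219),
    "2MUT": (220, 297),
}
SEQUENCE_LENGTH = 297  # length of the canonical isoform given in this entry


def merge_ranges(ranges):
    """Merge overlapping or adjacent 1-based inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


covered = merge_ranges(PDB_RANGES.values())
n_covered = sum(end - start + 1 for start, end in covered)

if __name__ == "__main__":
    print(covered)                     # [(96, 297)]
    print(n_covered, SEQUENCE_LENGTH)  # 202 297
```

Under these assumptions the six entries together cover residues 96 to 297, that is 202 of 297 positions, which leaves the N-terminal 95 residues without a listed structure.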

Miscellaneous databases

EvolutionaryTrace: P07992.

Family & Domains

Region

Feature key | Position(s) | Description | Length
Region | 220 – 297 | HhH2, dimerization with ERCC4 | 78

Motif

Feature key | Position(s) | Description | Length
Motif | 17 – 23 | Nuclear localization signal (Sequence analysis) | 7

Sequence similarities

Belongs to the ERCC1/RAD10/SWI10 family. (Curated)

Phylogenomic databases

eggNOG: KOG2841. Eukaryota.
COG5241. LUCA.
GeneTree: ENSGT00390000011275.
HOGENOM: HOG000037440.
HOVERGEN: HBG051497.
InParanoid: P07992.
KO: K10849.
OMA: SWGKERA.
OrthoDB: EOG091G0IRO.
PhylomeDB: P07992.
TreeFam: TF101231.

Family and domain databases

InterPro: IPR004579. ERCC1/RAD10/SWI10.
IPR011335. Restrct_endonuc-II-like.
IPR010994. RuvA_2-like.
PANTHER: PTHR12749. PTHR12749. 1 hit.
Pfam: PF03834. Rad10. 1 hit.
ProDom: PD013585. DNA_repair_Rad10. 1 hit.
SUPFAM: SSF47781. SSF47781. 1 hit.
SSF52980. SSF52980. 1 hit.
TIGRFAMs: TIGR00597. rad10. 1 hit.

Sequences (4)

Sequence status: Complete.

This entry describes 4 isoforms produced by alternative splicing.

Isoform 1 (identifier: P07992-1)
Also known as: 202

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MDPGKDKEGV PQPSGPPARK KFVIPLDEDE VPPGVAKPLF RSTQSLPTVD
        60         70         80         90        100
TSAQAAPQTY AEYAISQPLE GAGATCPTGS EPLAGETPNQ ALKPGAKSNS
       110        120        130        140        150
IIVSPRQRGN PVLKFVRNVP WEFGDVIPDY VLGQSTCALF LSLRYHNLHP
       160        170        180        190        200
DYIHGRLQSL GKNFALRVLL VQVDVKDPQQ ALKELAKMCI LADCTLILAW
       210        220        230        240        250
SPEEAGRYLE TYKAYEQKPA DLLMEKLEQD FVSRVTECLT TVKSVNKTDS
       260        270        280        290
QTLLTTFGSL EQLIAASRED LALCPGLGPQ KARRLFDVLH EPFLKVP
Length: 297
Mass (Da): 32,562
Last modified: August 1, 1988 - v1
Checksum: 6FCE3615732349E5
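
Because all positional information in this entry refers to the canonical sequence, it can help to check feature positions directly against it. The following is a minimal Python sketch that assembles the sequence from the ten-residue blocks above and looks up positions using 1-based, inclusive coordinates, as the entry does. The names `CANONICAL_BLOCKS`, `CANONICAL` and `residues` are illustrative and not part of the entry.

```python
# Canonical ERCC1 sequence (isoform 1, P07992-1), copied from the entry in
# blocks of ten residues; all whitespace is stripped when it is assembled.
CANONICAL_BLOCKS = """
MDPGKDKEGV PQPSGPPARK KFVIPLDEDE VPPGVAKPLF RSTQSLPTVD
TSAQAAPQTY AEYAISQPLE GAGATCPTGS EPLAGETPNQ ALKPGAKSNS
IIVSPRQRGN PVLKFVRNVP WEFGDVIPDY VLGQSTCALF LSLRYHNLHP
DYIHGRLQSL GKNFALRVLL VQVDVKDPQQ ALKELAKMCI LADCTLILAW
SPEEAGRYLE TYKAYEQKPA DLLMEKLEQD FVSRVTECLT TVKSVNKTDS
QTLLTTFGSL EQLIAASRED LALCPGLGPQ KARRLFDVLH EPFLKVP
"""

CANONICAL = "".join(CANONICAL_BLOCKS.split())


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    return seq[start - 1:end]


if __name__ == "__main__":
    print(len(CANONICAL))                  # 297, matching "Length"
    print(residues(CANONICAL, 17, 23))     # nuclear localization signal motif
    print(residues(CANONICAL, 231, 231))   # F, reference residue of F -> L
    print(residues(CANONICAL, 266, 266))   # A, reference residue of A -> T
    print(residues(CANONICAL, 53, 53))     # A, site of the sequence conflict
```

The reference residues at positions 231, 266 and 53 agree with the natural variants and the sequence conflict reported elsewhere in this entry.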
Isoform 2 (identifier: P07992-2)
Also known as: 203

The sequence of this isoform differs from the canonical sequence as follows:
     235-258: Missing.

Note: Not functional in the nucleotide excision repair pathway. Does not interact with XPF/ERCC4.
Length: 273
Mass (Da): 29,993
Checksum: 04DA21E774A33524
Isoform 3 (identifier: P07992-3)
Also known as: 201

The sequence of this isoform differs from the canonical sequence as follows:
     282-297: ARRLFDVLHEPFLKVP → VRALGKNPRSWGKERAPNKHNLRPQSFKVKKEPKTRHSGFRL

Note: Not functional in the nucleotide excision repair pathway. Does not interact with XPF/ERCC4.
Length: 323
Mass (Da): 35,563
Checksum: D99BFAC9CE8E912E
Isoform 4 (identifier: P07992-4)
Also known as: 204

The sequence of this isoform differs from the canonical sequence as follows:
     36-107: Missing.

Note: Not functional in the nucleotide excision repair pathway. Does not interact with XPF/ERCC4.
Length: 225
Mass (Da): 25,211
Checksum: AF78F4C26AC7DA7E
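
The isoform descriptions above are edits against the canonical sequence, so each isoform can be rebuilt from it. The following is a minimal Python sketch that applies the three listed differences and compares the resulting lengths with those given in the entry. The helper names `delete_range` and `replace_range` are illustrative and not part of the entry, and the sketch does not attempt to reproduce the mass or checksum values.

```python
# Canonical sequence (isoform 1), copied from this entry.
CANONICAL = "".join("""
MDPGKDKEGV PQPSGPPARK KFVIPLDEDE VPPGVAKPLF RSTQSLPTVD
TSAQAAPQTY AEYAISQPLE GAGATCPTGS EPLAGETPNQ ALKPGAKSNS
IIVSPRQRGN PVLKFVRNVP WEFGDVIPDY VLGQSTCALF LSLRYHNLHP
DYIHGRLQSL GKNFALRVLL VQVDVKDPQQ ALKELAKMCI LADCTLILAW
SPEEAGRYLE TYKAYEQKPA DLLMEKLEQD FVSRVTECLT TVKSVNKTDS
QTLLTTFGSL EQLIAASRED LALCPGLGPQ KARRLFDVLH EPFLKVP
""".split())


def delete_range(seq: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive)."""
    return seq[:start - 1] + seq[end:]


def replace_range(seq: str, start: int, end: int, old: str, new: str) -> str:
    """Replace residues start..end with `new`, checking they equal `old`."""
    if seq[start - 1:end] != old:
        raise ValueError(f"residues {start}-{end} do not match {old!r}")
    return seq[:start - 1] + new + seq[end:]


# Differences as listed for each isoform in this entry.
ISOFORM_2 = delete_range(CANONICAL, 235, 258)
ISOFORM_3 = replace_range(
    CANONICAL, 282, 297,
    "ARRLFDVLHEPFLKVP",
    "VRALGKNPRSWGKERAPNKHNLRPQSFKVKKEPKTRHSGFRL",
)
ISOFORM_4 = delete_range(CANONICAL, 36, 107)

if __name__ == "__main__":
    for name, seq in [("2", ISOFORM_2), ("3", ISOFORM_3), ("4", ISOFORM_4)]:
        print(f"Isoform {name}: {len(seq)} residues")
```

The lengths obtained this way (273, 323 and 225) match the "Length" values stated for isoforms 2, 3 and 4.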

Experimental Info

Feature key | Position(s) | Description | Length
Sequence conflict | 53 | A → P in BAG37398 (PubMed:14702039). Curated | 1

Natural variant

Feature key | Position(s) | Description | Length
Natural variant (VAR_032776) | 231 | F → L in COFS4; does not alter interaction with XPF/ERCC4 or GTF2H1. 2 publications. Corresponds to variant dbSNP:rs121913028. | 1
Natural variant (VAR_019167) | 266 | A → T. 1 publication. Corresponds to variant dbSNP:rs3212977. | 1

Alternative sequence

Feature key | Position(s) | Description | Length
Alternative sequence (VSP_053474) | 36 – 107 | Missing in isoform 4. 1 publication | 72
Alternative sequence (VSP_042727) | 235 – 258 | Missing in isoform 2. 1 publication | 24
Alternative sequence (VSP_043455) | 282 – 297 | ARRLF…FLKVP → VRALGKNPRSWGKERAPNKHNLRPQSFKVKKEPKTRHSGFRL in isoform 3. 1 publication | 16

Sequence databases

EMBL / GenBank / DDBJ:
M13194 mRNA. Translation: AAA52394.1.
M26163 Genomic DNA. Translation: AAA52395.1.
M28650 mRNA. Translation: AAA35810.1.
AF001925 mRNA. Translation: AAC16253.1.
AB069681 mRNA. Translation: BAB62810.1.
BT019806 mRNA. Translation: AAV38609.1.
AF512555 Genomic DNA. Translation: AAM34796.1.
AK092039 mRNA. Translation: BAG52472.1.
AK314884 mRNA. Translation: BAG37398.1.
AC092309 Genomic DNA. No translation available.
AC138128 Genomic DNA. No translation available.
AC138534 Genomic DNA. No translation available.
AC139353 Genomic DNA. No translation available.
CH471126 Genomic DNA. Translation: EAW57349.1.
BC008930 mRNA. Translation: AAH08930.1.
BC052813 mRNA. Translation: AAH52813.1.
CCDS: CCDS12662.1. [P07992-1]
CCDS12663.1. [P07992-3]
CCDS54279.1. [P07992-2]
PIR: A32875. A24781.
RefSeq: NP_001159521.1. NM_001166049.1. [P07992-2]
NP_001974.1. NM_001983.3. [P07992-1]
NP_973730.1. NM_202001.2. [P07992-3]
XP_005258691.1. XM_005258634.1. [P07992-3]
XP_005258692.1. XM_005258635.2. [P07992-3]
XP_005258693.1. XM_005258636.4. [P07992-3]
XP_011524912.1. XM_011526610.2. [P07992-3]
XP_016881948.1. XM_017026459.1. [P07992-3]
XP_016881949.1. XM_017026460.1. [P07992-1]
XP_016881950.1. XM_017026461.1. [P07992-1]
XP_016881951.1. XM_017026462.1. [P07992-1]
XP_016881952.1. XM_017026463.1. [P07992-1]
XP_016881953.1. XM_017026464.1. [P07992-1]
XP_016881954.1. XM_017026465.1. [P07992-2]
XP_016881955.1. XM_017026466.1. [P07992-2]
UniGene: Hs.435981.

Genome annotation databases

Ensembl: ENST00000013807; ENSP00000013807; ENSG00000012061. [P07992-3]
ENST00000300853; ENSP00000300853; ENSG00000012061. [P07992-1]
ENST00000340192; ENSP00000345203; ENSG00000012061. [P07992-2]
ENST00000423698; ENSP00000394875; ENSG00000012061. [P07992-4]
ENST00000589165; ENSP00000468035; ENSG00000012061. [P07992-1]
GeneID: 2067.
KEGG: hsa:2067.
UCSC: uc002pbs.3. human. [P07992-1]

Keywords - Coding sequence diversity

Alternative splicing, Polymorphism

Cross-references

Web resources

Atlas of Genetics and Cytogenetics in Oncology and Haematology
NIEHS-SNPs

Protocols and materials databases

DNASU: 2067.
Structural Biology Knowledgebase: Search...

Organism-specific databases

CTD: 2067.
GeneCards: ERCC1.
GeneReviews: ERCC1.
MIM: 126380. gene.
610758. phenotype.
neXtProt: NX_P07992.
GenAtlas: Search...

Miscellaneous databases

ChiTaRS: ERCC1. human.
GeneWiki: ERCC1.
GenomeRNAi: 2067.
PRO: PR:P07992.
SOURCE: Search...

Family and domain databases

ProtoNet: Search...

Entry information

Entry name: ERCC1_HUMAN
Accession: Primary (citable) accession number: P07992
Secondary accession number(s): B2RC01, B3KRR0, Q7Z7F5, Q96S40
Entry history: Integrated into UniProtKB/Swiss-Prot: August 1, 1988
Last sequence update: August 1, 1988
Last modified: May 10, 2017
This is version 174 of the entry and version 1 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

3D-structure, Complete proteome, Reference proteome

Documents

  1. Human chromosome 19
    Human chromosome 19: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. PDB cross-references
    Index of Protein Data Bank (PDB) cross-references
  6. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.