Protein

Trichohyalin

Gene

TCHH

Organism
Homo sapiens (Human)
Status
Reviewed - Annotation score: 5 out of 5 - Experimental evidence at protein level

Function

Intermediate filament-associated protein that associates in regular arrays with keratin intermediate filaments (KIF) of the inner root sheath cells of the hair follicle and the granular layer of the epidermis. It later becomes cross-linked to KIF by isodipeptide bonds. It may serve as scaffold protein, together with involucrin, in the organization of the cell envelope or even anchor the cell envelope to the KIF network. It may be involved in its own calcium-dependent postsynthetic processing during terminal differentiation.

Regions

Feature key      Position(s)  Length  Description
Calcium binding  22 – 33      12      1; low affinity (PROSITE-ProRule annotation)
Calcium binding  62 – 73      12      2; high affinity (PROSITE-ProRule annotation)

GO - Molecular function

  • calcium ion binding Source: UniProtKB

Keywords - Biological process

Keratinization

Keywords - Ligand

Calcium, Metal-binding

Names & Taxonomy

Protein names
Recommended name:
Trichohyalin
Gene names
Name: TCHH
Synonyms: THH, THL, TRHY
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 1

Organism-specific databases

HGNC: HGNC:11791. TCHH.

Subcellular location

GO - Cellular component

  • cytoskeleton Source: UniProtKB

Pathology & Biotech

Organism-specific databases

PharmGKB: PA36503.

Polymorphism and mutation databases

DMDM: 215273930.

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description   Feature identifier
Chain        1 – 1943     1943    Trichohyalin  PRO_0000144042

Post-translational modification

Substrate of transglutaminase. Some 200 arginines are probably converted to citrullines by peptidylarginine deimidase.

Keywords - PTM

Citrullination

Proteomic databases

MaxQB: Q07283.
PaxDb: Q07283.
PRIDE: Q07283.

PTM databases

iPTMnet: Q07283.

Expression

Tissue specificity

Found in the hard keratinizing tissues such as the inner root sheath (IRS) of hair follicles and medulla, and in the filiform papillae of dorsal tongue epithelium.

Developmental stage

Expressed during late differentiation of the epidermis.

Gene expression databases

Bgee: Q07283.
CleanEx: HS_TCHH.
Genevisible: Q07283. HS.

Organism-specific databases

HPA: HPA028375.

Interaction

Subunit structure

Monomer. Curated

Protein-protein interaction databases

BioGrid: 112919. 6 interactions.
STRING: 9606.ENSP00000357794.

Structure

3D structure databases

ProteinModelPortal: Q07283.
SMR: Q07283. Positions 1-86.

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       23 – 48      26      EF-hand 1 (PROSITE-ProRule annotation)
Domain       49 – 84      36      EF-hand 2 (PROSITE-ProRule annotation)
Repeat       314 – 326    13      1-1; approximate
Repeat       327 – 339    13      1-2; approximate
Repeat       340 – 351    12      1-3; approximate
Repeat       352 – 364    13      1-4
Repeat       365 – 377    13      1-5
Repeat       378 – 383    6       2-1
Repeat       384 – 389    6       2-2
Repeat       390 – 395    6       2-3
Repeat       396 – 401    6       2-4
Repeat       402 – 407    6       2-5
Repeat       408 – 413    6       2-6
Repeat       414 – 419    6       2-7
Repeat       420 – 425    6       2-8
Repeat       906 – 935    30      4-1
Repeat       936 – 965    30      4-2
Repeat       966 – 995    30      4-3
Repeat       996 – 1025   30      4-4
Repeat       1026 – 1055  30      4-5
Repeat       1056 – 1085  30      4-6
Repeat       1086 – 1115  30      4-7
Repeat       1116 – 1145  30      4-8
Repeat       1146 – 1175  30      4-9
Repeat       1176 – 1204  29      4-10
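
The positions in this table are 1-based and inclusive, so each length is end minus start plus one. A minimal sketch of that arithmetic follows, together with a check that the listed repeats cover their parent region without gaps or overlaps. The helper names `feature_length` and `tiles_region` are illustrative assumptions, not part of any UniProt tooling; the coordinates are copied from this entry.

```python
from typing import List, Tuple

Feature = Tuple[int, int]  # (start, end), 1-based and inclusive


def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def tiles_region(repeats: List[Feature], region: Feature) -> bool:
    """True if the repeats cover the region end to end with no gaps or overlaps."""
    ordered = sorted(repeats)
    if not ordered:
        return False
    if ordered[0][0] != region[0] or ordered[-1][1] != region[1]:
        return False
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# Repeats 1-1 .. 1-5 and 4-1 .. 4-10, copied from the table in this entry.
REPEATS_1 = [(314, 326), (327, 339), (340, 351), (352, 364), (365, 377)]
REPEATS_4 = [
    (906, 935), (936, 965), (966, 995), (996, 1025), (1026, 1055),
    (1056, 1085), (1086, 1115), (1116, 1145), (1146, 1175), (1176, 1204),
]

if __name__ == "__main__":
    print([feature_length(s, e) for s, e in REPEATS_1])
    print(tiles_region(REPEATS_4, (906, 1204)))
```

The same check can be pointed at any other repeat family in the entry by swapping in its coordinates.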

Region

Feature key  Position(s)  Length  Description
Region       1 – 91       91      S-100-like
Region       314 – 377    64      5 X 13 AA tandem repeats of R-R-E-Q-E-E-E-R-R-E-Q-Q-L
Region       378 – 425    48      8 X 6 AA tandem repeats of R-R-E-Q-Q-L
Region       425 – 683    259     9 X 28 AA approximate tandem repeats
Region       906 – 1204   299     10 X 30 AA tandem repeats
Region       1292 – 1894  603     23 X 26 AA approximate tandem repeats
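
As an illustration of the "8 X 6 AA tandem repeats of R-R-E-Q-Q-L" annotation, the sketch below slices residues 378 – 425 out of a window of the sequence and splits them into 6-residue units. The window is residues 351 – 450 as printed in the Sequence section of this entry; `subsequence` and `split_repeats` are hypothetical helper names introduced only for this example.

```python
# Residues 351-450 of Q07283, copied from the sequence listing in this entry.
WINDOW_START = 351
WINDOW = (
    "LRREQEEERR EQQLRREQEE ERREQQLRRE QQLRREQQLR REQQLRREQQ"
    "LRREQQLRRE QQLRREQQLR REQQLRREQE EERHEQKHEQ ERREQRLKRE"
).replace(" ", "")


def subsequence(start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) from WINDOW."""
    if start < WINDOW_START or end >= WINDOW_START + len(WINDOW) or start > end:
        raise ValueError("coordinates fall outside the stored window")
    return WINDOW[start - WINDOW_START : end - WINDOW_START + 1]


def split_repeats(fragment: str, unit_len: int) -> list:
    """Cut a fragment into consecutive units of unit_len residues."""
    return [fragment[i : i + unit_len] for i in range(0, len(fragment), unit_len)]


region = subsequence(378, 425)
units = split_repeats(region, 6)

if __name__ == "__main__":
    for number, unit in enumerate(units, start=1):
        print(f"2-{number}: {unit}")
```

The approximate repeat families (for example 425 – 683) would need a tolerant comparison rather than this exact split.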

Domain

Consists of nine domains. Domain 1 contains two EF-hand calcium-binding domains. Domains 2-4, 6, and 8 are almost entirely alpha-helical, configured as a series of peptide repeats of varying regularity, and are thought to form a single-stranded alpha-helical rod stabilized by ionic interactions. Domain 6 is the most regular and may bind KIF directly by ionic interactions. Domains 5 and 7 are less well organized and may induce folds in the molecule. Domain 9 contains the C-terminus, conserved among different species.

Sequence similarities

Belongs to the S100-fused protein family. Curated
Contains 2 EF-hand domains. PROSITE-ProRule annotation

Keywords - Domain

Repeat

Phylogenomic databases

eggNOG: ENOG410KDDK. Eukaryota.
ENOG4111C3S. LUCA.
GeneTree: ENSGT00530000063634.
InParanoid: Q07283.
KO: K18626.
OMA: QYREAEQ.
OrthoDB: EOG7QC7VH.
TreeFam: TF344077.

Family and domain databases

Gene3D: 1.10.238.10. 1 hit.
InterPro: IPR011992. EF-hand-dom_pair.
IPR018247. EF_Hand_1_Ca_BS.
IPR002048. EF_hand_dom.
IPR001751. S100/CaBP-9k_CS.
IPR013787. S100_Ca-bd_sub.
IPR033200. TCHH.
PANTHER: PTHR34855:SF1. PTHR34855:SF1. 3 hits.
Pfam: PF01023. S_100. 1 hit.
SMART: SM01394. S_100. 1 hit.
SUPFAM: SSF47473. SSF47473. 1 hit.
PROSITE: PS00018. EF_HAND_1. 1 hit.
PS50222. EF_HAND_2. 1 hit.
PS00303. S100_CABP. 1 hit.

Sequence

Sequence status: Complete.

Q07283-1

        10         20         30         40         50
MSPLLRSICD ITEIFNQYVS HDCDGAALTK KDLKNLLERE FGAVLRRPHD
60 70 80 90 100
PKTVDLILEL LDLDSNGRVD FNEFLLFIFK VAQACYYALG QATGLDEEKR
110 120 130 140 150
ARCDGKESLL QDRRQEEDQR RFEPRDRQLE EEPGQRRRQK RQEQERELAE
160 170 180 190 200
GEEQSEKQER LEQRDRQRRD EELWRQRQEW QEREERRAEE EQLQSCKGHE
210 220 230 240 250
TEEFPDEEQL RRRELLELRR KGREEKQQQR RERQDRVFQE EEEKEWRKRE
260 270 280 290 300
TVLRKEEEKL QEEEPQRQRE LQEEEEQLRK LERQELRRER QEEEQQQQRL
310 320 330 340 350
RREQQLRRKQ EEERREQQEE RREQQERREQ QEERREQQLR REQEERREQQ
360 370 380 390 400
LRREQEEERR EQQLRREQEE ERREQQLRRE QQLRREQQLR REQQLRREQQ
410 420 430 440 450
LRREQQLRRE QQLRREQQLR REQQLRREQE EERHEQKHEQ ERREQRLKRE
460 470 480 490 500
QEERRDWLKR EEETERHEQE RRKQQLKRDQ EEERRERWLK LEEEERREQQ
510 520 530 540 550
ERREQQLRRE QEERREQRLK RQEEEERLQQ RLRSEQQLRR EQEERREQLL
560 570 580 590 600
KREEEKRLEQ ERREQRLKRE QEERRDQLLK REEERRQQRL KREQEERLEQ
610 620 630 640 650
RLKREEVERL EQEERREQRL KREEPEEERR QQLLKSEEQE ERRQQQLRRE
660 670 680 690 700
QQERREQRLK REEEEERLEQ RLKREHEEER REQELAEEEQ EQARERIKSR
710 720 730 740 750
IPKWQWQLES EADARQSKVY SRPRKQEGQR RRQEQEEKRR RRESELQWQE
760 770 780 790 800
EERAHRQQQE EEQRRDFTWQ WQAEEKSERG RQRLSARPPL REQRERQLRA
810 820 830 840 850
EERQQREQRF LPEEEEKEQR RRQRREREKE LQFLEEEEQL QRRERAQQLQ
860 870 880 890 900
EEEDGLQEDQ ERRRSQEQRR DQKWRWQLEE ERKRRRHTLY AKPALQEQLR
910 920 930 940 950
KEQQLLQEEE EELQREEREK RRRQEQERQY REEEQLQQEE EQLLREEREK
960 970 980 990 1000
RRRQERERQY RKDKKLQQKE EQLLGEEPEK RRRQEREKKY REEEELQQEE
1010 1020 1030 1040 1050
EQLLREEREK RRRQEWERQY RKKDELQQEE EQLLREEREK RRLQERERQY
1060 1070 1080 1090 1100
REEEELQQEE EQLLGEERET RRRQELERQY RKEEELQQEE EQLLREEPEK
1110 1120 1130 1140 1150
RRRQERERQC REEEELQQEE EQLLREEREK RRRQELERQY REEEEVQQEE
1160 1170 1180 1190 1200
EQLLREEPEK RRRQELERQY REEEELQQEE EQLLREEQEK RRQERERQYR
1210 1220 1230 1240 1250
EEEELQRQKR KQRYRDEDQR SDLKWQWEPE KENAVRDNKV YCKGRENEQF
1260 1270 1280 1290 1300
RQLEDSQLRD RQSQQDLQHL LGEQQERDRE QERRRWQQRD RHFPEEEQLE
1310 1320 1330 1340 1350
REEQKEAKRR DRKSQEEKQL LREEREEKRR RQETDRKFRE EEQLLQEREE
1360 1370 1380 1390 1400
QPLRRQERDR KFREEELRHQ EQGRKFLEEE QRLRRQERER KFLKEEQQLR
1410 1420 1430 1440 1450
CQEREQQLRQ DRDRKFREEE QQLSRQERDR KFREEEQQVR RQERERKFLE
1460 1470 1480 1490 1500
EEQQLRQERH RKFREEEQLL QEREEQQLHR QERDRKFLEE EQQLRRQERD
1510 1520 1530 1540 1550
RKFREQELRS QEPERKFLEE EQQLHRQQRQ RKFLQEEQQL RRQERGQQRR
1560 1570 1580 1590 1600
QDRDRKFREE EQLRQEREEQ QLSRQERDRK FRLEEQKVRR QEQERKFMED
1610 1620 1630 1640 1650
EQQLRRQEGQ QQLRQERDRK FREDEQLLQE REEQQLHRQE RDRKFLEEEP
1660 1670 1680 1690 1700
QLRRQEREQQ LRHDRDRKFR EEEQLLQEGE EQQLRRQERD RKFREEEQQL
1710 1720 1730 1740 1750
RRQERERKFL QEEQQLRRQE LERKFREEEQ LRQETEQEQL RRQERYRKIL
1760 1770 1780 1790 1800
EEEQLRPERE EQQLRRQERD RKFREEEQLR QEREEQQLRS QESDRKFREE
1810 1820 1830 1840 1850
EQLRQEREEQ QLRPQQRDGK YRWEEEQLQL EEQEQRLRQE RDRQYRAEEQ
1860 1870 1880 1890 1900
FATQEKSRRE EQELWQEEEQ KRRQERERKL REEHIRRQQK EEQRHRQVGE
1910 1920 1930 1940
IKSQEGKGHG RLLEPGTHQF ASVPVRSSPL YEYIQEQRSQ YRP
Length: 1,943
Mass (Da): 253,925
Last modified: November 25, 2008 - v2
Checksum: 52FA297BD7AE3E53
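
The listing above interleaves position rulers with blocks of ten residues. A minimal sketch of turning that display format back into a plain sequence string is shown here; `parse_sequence_listing` is an illustrative name, and the sample input is just the first 100 residues copied from this entry.

```python
def parse_sequence_listing(text: str) -> str:
    """Strip ruler lines and block spacing from a displayed sequence listing."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Ruler lines hold only position numbers; blank lines hold nothing.
        if not tokens or all(token.isdigit() for token in tokens):
            continue
        residues.append("".join(tokens))
    return "".join(residues)


# First two rows of the listing, copied from this entry.
LISTING_HEAD = """\
        10         20         30         40         50
MSPLLRSICD ITEIFNQYVS HDCDGAALTK KDLKNLLERE FGAVLRRPHD
60 70 80 90 100
PKTVDLILEL LDLDSNGRVD FNEFLLFIFK VAQACYYALG QATGLDEEKR
"""

head = parse_sequence_listing(LISTING_HEAD)

if __name__ == "__main__":
    print(len(head), head[:10])
```

Run over the whole listing, the length of the result can be compared against the Length field as a sanity check on the copy.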

Sequence caution

The sequence AAA65582.1 differs from that shown. Reason: Erroneous gene model prediction. Curated

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  115 – 115    1       Q → T in AAA65582 (PubMed:7685034). Curated
Sequence conflict  546 – 546    1       R → L in AAA65582 (PubMed:7685034). Curated
Sequence conflict  617 – 618    2       EQ → DE in AAA65582 (PubMed:7685034). Curated
Sequence conflict  631 – 632    2       QQ → HE in AAA65582 (PubMed:7685034). Curated
Sequence conflict  644 – 645    2       QQ → HE in AAA65582 (PubMed:7685034). Curated
Sequence conflict  821 – 821    1       R → G in AAA65582 (PubMed:7685034). Curated
Sequence conflict  865 – 865    1       Missing in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1289 – 1290  2       RD → AN in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1354 – 1354  1       R → L in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1368 – 1368  1       R → L in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1385 – 1386  2       RQ → E in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1401 – 1402  2       CQ → LE in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1406 – 1406  1       Missing in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1617 – 1617  1       Missing in AAA65582 (PubMed:7685034). Curated
Sequence conflict  1782 – 1782  1       E → G in AAA65582 (PubMed:7685034). Curated

Natural variant

Feature key      Position(s)  Length  Feature identifier  Description
Natural variant  63 – 63      1       VAR_047519          L → R. 1 Publication. Corresponds to variant rs2515663.
Natural variant  237 – 237    1       VAR_047520          V → L. Corresponds to variant rs3134814.
Natural variant  552 – 552    1       VAR_047521          R → S. Corresponds to variant rs6680692.
Natural variant  790 – 790    1       VAR_047522          L → M. Corresponds to variant rs11803731.
Natural variant  1258 – 1258  1       VAR_047523          L → V. 1 Publication. Corresponds to variant rs2496253.
Natural variant  1400 – 1400  1       VAR_064757          R → P. Found in a renal cell carcinoma sample; somatic mutation. 1 Publication.
Natural variant  1902 – 1902  1       VAR_047524          K → Q. 1 Publication. Corresponds to variant rs1131471.
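
Each natural variant above is a single-residue substitution at a 1-based position. The sketch below applies one to a sequence after confirming that the reference residue matches. The compact `L63R` notation and the function name `apply_substitution` are assumptions made for this example; the entry itself writes the change as "L → R" at position 63. Only the first 100 residues are included, copied from the Sequence section.

```python
import re

# Residues 1-100 of Q07283, copied from the sequence listing in this entry.
N_TERMINUS = (
    "MSPLLRSICD ITEIFNQYVS HDCDGAALTK KDLKNLLERE FGAVLRRPHD"
    "PKTVDLILEL LDLDSNGRVD FNEFLLFIFK VAQACYYALG QATGLDEEKR"
).replace(" ", "")


def apply_substitution(sequence: str, variant: str) -> str:
    """Apply a substitution written as <ref><1-based position><alt>, e.g. 'L63R'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[: pos - 1] + alt + sequence[pos:]


variant_l63r = apply_substitution(N_TERMINUS, "L63R")

if __name__ == "__main__":
    print(N_TERMINUS[60:66], "->", variant_l63r[60:66])
```

Checking the reference residue first guards against off-by-one errors when converting between 1-based feature positions and 0-based string indices.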

Sequence databases

EMBL / GenBank / DDBJ:
L09190 Genomic DNA. Translation: AAA65582.1. Sequence problems.
AL589986 Genomic DNA. Translation: CAH70024.1.
CCDS: CCDS41396.1.
PIR: A45973.
RefSeq: NP_009044.2. NM_007113.3.
UniGene: Hs.432416.

Genome annotation databases

Ensembl: ENST00000368804; ENSP00000357794; ENSG00000159450.
ENST00000614923; ENSP00000480484; ENSG00000159450.
GeneID: 7062.
KEGG: hsa:7062.
UCSC: uc001ezp.3. human.

Keywords - Coding sequence diversity

Polymorphism

Cross-references

Organism-specific databases

CTD: 7062.
GeneCards: TCHH.
HGNC: HGNC:11791. TCHH.
HPA: HPA028375.
MIM: 190370. gene.
neXtProt: NX_Q07283.
PharmGKB: PA36503.

Miscellaneous databases

GeneWiki: TCHH.
GenomeRNAi: 7062.
NextBio: 27613.
PRO: Q07283.

Publications

  1. "The structure of human trichohyalin. Potential multiple roles as a functional EF-hand-like calcium-binding protein, a cornified cell envelope precursor, and an intermediate filament-associated (cross-linking) protein."
    Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M.
    J. Biol. Chem. 268:12164-12176(1993)
    Cited for: NUCLEOTIDE SEQUENCE [GENOMIC DNA], VARIANTS ARG-63; VAL-1258 AND GLN-1902.
  2. "The DNA sequence and biological annotation of human chromosome 1."
    Gregory S.G., Barlow K.F., McLay K.E., Kaul R., Swarbreck D., Dunham A., Scott C.E., Howe K.L., Woodfine K., Spencer C.C.A., Jones M.C., Gillson C., Searle S., Zhou Y., Kokocinski F., McDonald L., Evans R., Phillips K., Atkinson A., Cooper R., Jones C., Hall R.E., Andrews T.D., Lloyd C., Ainscough R., Almeida J.P., Ambrose K.D., Anderson F., Andrew R.W., Ashwell R.I.S., Aubin K., Babbage A.K., Bagguley C.L., Bailey J., Beasley H., Bethel G., Bird C.P., Bray-Allen S., Brown J.Y., Brown A.J., Buckley D., Burton J., Bye J., Carder C., Chapman J.C., Clark S.Y., Clarke G., Clee C., Cobley V., Collier R.E., Corby N., Coville G.J., Davies J., Deadman R., Dunn M., Earthrowl M., Ellington A.G., Errington H., Frankish A., Frankland J., French L., Garner P., Garnett J., Gay L., Ghori M.R.J., Gibson R., Gilby L.M., Gillett W., Glithero R.J., Grafham D.V., Griffiths C., Griffiths-Jones S., Grocock R., Hammond S., Harrison E.S.I., Hart E., Haugen E., Heath P.D., Holmes S., Holt K., Howden P.J., Hunt A.R., Hunt S.E., Hunter G., Isherwood J., James R., Johnson C., Johnson D., Joy A., Kay M., Kershaw J.K., Kibukawa M., Kimberley A.M., King A., Knights A.J., Lad H., Laird G., Lawlor S., Leongamornlert D.A., Lloyd D.M., Loveland J., Lovell J., Lush M.J., Lyne R., Martin S., Mashreghi-Mohammadi M., Matthews L., Matthews N.S.W., McLaren S., Milne S., Mistry S., Moore M.J.F., Nickerson T., O'Dell C.N., Oliver K., Palmeiri A., Palmer S.A., Parker A., Patel D., Pearce A.V., Peck A.I., Pelan S., Phelps K., Phillimore B.J., Plumb R., Rajan J., Raymond C., Rouse G., Saenphimmachak C., Sehra H.K., Sheridan E., Shownkeen R., Sims S., Skuce C.D., Smith M., Steward C., Subramanian S., Sycamore N., Tracey A., Tromans A., Van Helmond Z., Wall M., Wallis J.M., White S., Whitehead S.L., Wilkinson J.E., Willey D.L., Williams H., Wilming L., Wray P.W., Wu Z., Coulson A., Vaudin M., Sulston J.E., Durbin R.M., Hubbard T., Wooster R., Dunham I., Carter N.P., McVean G., Ross M.T., Harrow J., Olson M.V., Beck S., Rogers J., Bentley D.R.
    Nature 441:315-321(2006)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  3. "Trichohyalin: a structural protein of hair, tongue, nail, and epidermis."
    O'Keefe E.J., Hamilton E.H., Lee S.-C., Steinert P.M.
    J. Invest. Dermatol. 101:65S-71S(1993)
    Cited for: PRELIMINARY NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1776-1943, CHARACTERIZATION.
  4. Cited for: VARIANT PRO-1400.

Entry information

Entry name: TRHY_HUMAN
Accession: Primary (citable) accession number: Q07283
Secondary accession number(s): Q5VUI3
Entry history
Integrated into UniProtKB/Swiss-Prot: October 1, 1994
Last sequence update: November 25, 2008
Last modified: May 11, 2016
This is version 137 of the entry and version 2 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 1
    Human chromosome 1: entries, gene names and cross-references to MIM
  2. Human entries with polymorphisms or disease mutations
    List of human entries with polymorphisms or disease mutations
  3. Human polymorphisms and disease mutations
    Index of human polymorphisms and disease mutations
  4. MIM cross-references
    Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot
  5. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
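
To make the identity and overlap thresholds concrete, here is a simplified sketch that tests a pre-aligned pair of sequences against them. It is not UniRef's actual clustering procedure, which this entry does not describe. The function names, the use of `-` for gaps, and the exact definitions (identity over columns where both sequences have a residue, overlap as the fraction of seed residues aligned) are all assumptions made for illustration.

```python
from typing import Tuple


def identity_and_overlap(seed_aligned: str, member_aligned: str) -> Tuple[float, float]:
    """Return (identity, overlap) for two equal-length aligned strings.

    Assumed definitions: identity is matches divided by columns where both
    sequences have a residue; overlap is those columns divided by the number
    of residues in the seed.
    """
    if len(seed_aligned) != len(member_aligned):
        raise ValueError("aligned strings must have equal length")
    seed_len = sum(1 for residue in seed_aligned if residue != "-")
    aligned = [
        (a, b)
        for a, b in zip(seed_aligned, member_aligned)
        if a != "-" and b != "-"
    ]
    if seed_len == 0 or not aligned:
        return 0.0, 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return matches / len(aligned), len(aligned) / seed_len


def passes_threshold(
    seed_aligned: str,
    member_aligned: str,
    min_identity: float,
    min_overlap: float = 0.8,
) -> bool:
    """True if the pair meets both the identity and the overlap cut-off."""
    identity, overlap = identity_and_overlap(seed_aligned, member_aligned)
    return identity >= min_identity and overlap >= min_overlap


if __name__ == "__main__":
    # Toy alignment: the member covers 8 of the 10 seed residues, all identical.
    print(passes_threshold("RREQQLRREQ", "RREQQLRR--", min_identity=0.9))
```

With `min_identity=0.9` the check mirrors the UniRef90 wording and with `min_identity=0.5` the UniRef50 wording; producing the alignment itself is outside the scope of this sketch.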