Protein

Centrosomal protein of 170 kDa protein B

Gene

CEP170B

Organism
Homo sapiens (Human)
Status
Reviewed. Annotation score: 3 out of 5. Experimental evidence at protein level.

Function

Plays a role in microtubule organization (By similarity).

Names & Taxonomy

Protein names
Recommended name: Centrosomal protein of 170 kDa protein B
Alternative name(s): Centrosomal protein 170B
Short name: Cep170B
Gene names
Name: CEP170B
Synonyms: FAM68C, KIAA0284
Organism: Homo sapiens (Human)
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo
Proteomes
  • UP000005640 Component: Chromosome 14

Organism-specific databases

HGNC: HGNC:20362. CEP170B.

Subcellular location

Keywords - Cellular component

Cytoplasm, Cytoskeleton, Microtubule

Pathology & Biotech

Organism-specific databases

PharmGKB: PA134863153.

Polymorphism and mutation databases

BioMuta: CEP170B.
DMDM: 143342098.

PTM / Processing

Molecule processing

Feature key  Position(s)  Length  Description                               Feature identifier
Chain        1 – 1589     1589    Centrosomal protein of 170 kDa protein B  PRO_0000282889

Amino acid modifications

Feature key       Position(s)  Length  Description
Cross-link        289 – 289    1       Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin) (1 Publication)
Modified residue  360 – 360    1       Phosphoserine (By similarity)
Modified residue  480 – 480    1       Phosphoserine (By similarity)
Modified residue  536 – 536    1       Phosphoserine (By similarity)
Modified residue  542 – 542    1       Phosphothreonine (Combined sources)
Modified residue  597 – 597    1       Phosphoserine (Combined sources)
Modified residue  619 – 619    1       Phosphoserine (Combined sources)
Modified residue  655 – 655    1       Phosphoserine (Combined sources)
Modified residue  711 – 711    1       Phosphoserine (By similarity)
Modified residue  721 – 721    1       Phosphoserine (Combined sources)
Modified residue  746 – 746    1       Phosphoserine (Combined sources)
Modified residue  748 – 748    1       Phosphoserine (Combined sources)
Modified residue  751 – 751    1       Phosphoserine (Combined sources)
Modified residue  753 – 753    1       Phosphoserine (Combined sources)
Modified residue  829 – 829    1       Phosphoserine (Combined sources)
Modified residue  853 – 853    1       Phosphoserine (Combined sources)
Modified residue  954 – 954    1       Phosphoserine (Combined sources)
Modified residue  972 – 972    1       Phosphoserine (By similarity)
Modified residue  986 – 986    1       Phosphoserine (Combined sources)
Modified residue  988 – 988    1       Phosphoserine (Combined sources)
Modified residue  1135 – 1135  1       Phosphoserine (Combined sources)
Modified residue  1179 – 1179  1       Phosphoserine (By similarity)
Modified residue  1199 – 1199  1       Phosphoserine (By similarity)
Modified residue  1304 – 1304  1       Phosphothreonine (Combined sources)
Modified residue  1356 – 1356  1       Phosphoserine (By similarity)
Modified residue  1362 – 1362  1       Phosphoserine (By similarity)
Modified residue  1545 – 1545  1       Phosphoserine (Combined sources)
Modified residue  1548 – 1548  1       Phosphoserine (Combined sources)
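
Feature positions in this entry are 1-based and refer to the canonical sequence (isoform 1), so each annotated modification can be checked against the residue found at that position. The sketch below is a minimal illustration, not part of the entry: it uses three 10-residue blocks copied from the canonical sequence, and the names BLOCKS and residue_at are hypothetical helpers introduced only for this example.

```python
# 10-residue blocks copied from the canonical sequence of this entry,
# keyed by the 1-based position of their first residue.
BLOCKS = {
    281: "MKIKDHITKF",  # residues 281-290
    351: "GHKHEDGTQS",  # residues 351-360
    541: "PTPPPAPTDP",  # residues 541-550
}

# Residue expected at each annotated position:
# Lys-289 (ubiquitin cross-link), Ser-360 and Thr-542 (phosphorylation).
EXPECTED = {289: "K", 360: "S", 542: "T"}


def residue_at(position: int) -> str:
    """Return the residue at a 1-based position covered by BLOCKS."""
    block_start = (position - 1) // 10 * 10 + 1  # first position of the block
    return BLOCKS[block_start][position - block_start]


for position, expected in EXPECTED.items():
    found = residue_at(position)
    status = "ok" if found == expected else "MISMATCH"
    print(f"{position}: expected {expected}, found {found} ({status})")
```

The same lookup works on the full 1,589-residue sequence as sequence[position - 1]; the blocks are used here only to keep the example short.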

Keywords - PTM

Isopeptide bond, Phosphoprotein, Ubl conjugation

Proteomic databases

EPD: Q9Y4F5.
MaxQB: Q9Y4F5.
PaxDb: Q9Y4F5.
PRIDE: Q9Y4F5.

PTM databases

iPTMnet: Q9Y4F5.
PhosphoSite: Q9Y4F5.

Expression

Gene expression databases

Bgee: Q9Y4F5.
CleanEx: HS_KIAA0284.
ExpressionAtlas: Q9Y4F5. baseline and differential.
Genevisible: Q9Y4F5. HS.

Organism-specific databases

HPA: HPA000871.
    HPA059017.

Interaction

Protein-protein interaction databases

BioGrid: 129628. 19 interactions.
IntAct: Q9Y4F5. 14 interactions.
STRING: 9606.ENSP00000404151.

Structure

3D structure databases

ProteinModelPortal: Q9Y4F5.
SMR: Q9Y4F5. Positions 1-120.
ModBase: Search...
MobiDB: Search...

Family & Domains

Domains and Repeats

Feature key  Position(s)  Length  Description
Domain       23 – 73      51      FHA (PROSITE-ProRule annotation)

Sequence similarities

Belongs to the CEP170 family. (Curated)
Contains 1 FHA domain. (PROSITE-ProRule annotation)

Phylogenomic databases

eggNOG: ENOG410IJXX. Eukaryota.
    ENOG410XQU2. LUCA.
GeneTree: ENSGT00640000091476.
HOGENOM: HOG000111524.
HOVERGEN: HBG108016.
InParanoid: Q9Y4F5.
KO: K16463.
PhylomeDB: Q9Y4F5.
TreeFam: TF328469.

Family and domain databases

Gene3D: 2.60.200.20. 1 hit.
InterPro: IPR029300. CEP170_C.
    IPR000253. FHA_dom.
    IPR008984. SMAD_FHA_domain.
Pfam: PF15308. CEP170_C. 1 hit.
    PF00498. FHA. 1 hit.
SMART: SM00240. FHA. 1 hit.
SUPFAM: SSF49879. SSF49879. 1 hit.
PROSITE: PS50006. FHA_DOMAIN. 1 hit.

Sequences (3)

Sequence status: Complete.

This entry describes 3 isoforms produced by alternative splicing.

Isoform 1 (identifier: Q9Y4F5-1)

This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.

        10         20         30         40         50
MSATSWFLVS SSGARHRLPR ELIFVGREEC ELMLQSRSVD KQHAVINYDQ
60 70 80 90 100
DRDEHWVKDL GSLNGTFVND MRIPDQKYVT LKLNDVIRFG YDSNMYVLER
110 120 130 140 150
VQHRVPEEAL KHEKYTSQLQ VSVKGLAPKR SEALPEHTPY CEASNPRPEK
160 170 180 190 200
GDRRPGTEAA SYRTPLYGQP SWWGEDDGST LPDAQRQGEP YPERPKGPVQ
210 220 230 240 250
QDGELHGFRA PAEPQGCSFR REPSYFEIPT KETPQPSQPP EVPAHEMPTK
260 270 280 290 300
DAEAGGGGAA PVVQSHASFT IEFDDCSPGK MKIKDHITKF SLRQRRPPGK
310 320 330 340 350
EATPGEMVSA ETKVADWLVQ NDPSLLHRVG PGDDRHSTKS DLPVHTRTLK
360 370 380 390 400
GHKHEDGTQS DSEDPLAKAA SAAGVPLEAS GEQVRLQRQI KRDPQELLHN
410 420 430 440 450
QQAFVIEFFD EDTPRKKRSQ SFTHSPSGDP KADKRRGPTP ADRDRPSVPA
460 470 480 490 500
PVQAGGRSSG PQRAGSLKRE KTEERLGSPS PASRTPARPF GSVGRRSRLA
510 520 530 540 550
QDFMAQCLRE SSPAARPSPE KVPPVLPAPL TPHGTSPVGP PTPPPAPTDP
560 570 580 590 600
QLTKARKQEE DDSLSDAGTY TIETEAQDTE VEEARKMIDQ VFGVLESPEL
610 620 630 640 650
SRASSATFRP VIRGDRDESD DGGVAQRMAL LQEFASRPLG AAPQAEHQGL
660 670 680 690 700
PVPGSPGGQK WVSRWASLAD SYSDPGLTED GLGRRGGEPE GSLPVRMRRR
710 720 730 740 750
LPQLPSERAD SPAGPESSRR SGPGPPELDS EQPSRLFGQE ELDPDSLSDA
760 770 780 790 800
SGSDGGRGPE PGVEPQDSRR RSPQEGPTWS RGRRSPRAPG EPTPASFFIG
810 820 830 840 850
DQNGDAVLSR KPLAAPGDGE GLGQTAQPSP PARDGVYVSA NGRMVIQLRP
860 870 880 890 900
GRSPEPDGPA PAFLRQESFT KEPASGPPAP GKPPHISSHP LLQDLAATRA
910 920 930 940 950
ARMDFHSQDT HLILKETETA LAALEARLLS NSVDAECEGG STPRPPEDAL
960 970 980 990 1000
SGDSDVDTAS TVSLRSGKSG PSPTTPQPLR AQKEMSPSPP AAQDPGGTAL
1010 1020 1030 1040 1050
VSAREQSSER QHHPLGPTDM GRGEPVRRSA IRRGHRPRGS LDWPSEERGP
1060 1070 1080 1090 1100
VLAHLPSSDV MASNHETPEA TGAGRLGSRR KPAAPPPSPA AREEQSRSSA
1110 1120 1130 1140 1150
SSQKGPQALT RSNSLSTPRP TRASRLRRAR LGDASDTEAA DGERGSLGNP
1160 1170 1180 1190 1200
EPVGRPAAEQ AKKLSRLDIL AMPRKRAGSF TGTSDPEAAP ARTSFSGRSV
1210 1220 1230 1240 1250
ELCCASRKPT MAEARAVSRK AANTATTTGP RQPFSRARSG SARYTSNTRR
1260 1270 1280 1290 1300
RQQGSDYTST SEEEYGSRHG SPKHTRSHTS TATQTPRAGS SSRARSRAPG
1310 1320 1330 1340 1350
PRDTDDDEEE PDPYGFIVQT AEIAEIARLS QTLVKDVAIL AQEIHDVAGD
1360 1370 1380 1390 1400
GDTLGSSEPA HSASLSNMPS TPASTISARE ELVQRIPEAS LNFQKVPPGS
1410 1420 1430 1440 1450
LNSRDFDQNM NDSCEDALAN KTRPRNREEV IFDNLMLNPV SQLSQAIREN
1460 1470 1480 1490 1500
TEHLAEKMKI LFQNTGRAWE DLEARINAEN EVPILKTSNK EISSILKELR
1510 1520 1530 1540 1550
RVQKQLEVIN AIVDPSGSLD LLTGNRSLAS SAQPGLGKGR VAAQSPPSPA
1560 1570 1580
SAEALLPALP LRNFPQRASC GPPSLPDPTF LPDAERFLI
Length:1,589
Mass (Da):171,688
Last modified:April 3, 2007 - v4
Checksum: B232960C12A81244
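
The sequence display above interleaves position rulers with residue lines grouped in blocks of ten, so it has to be flattened before positions can be used as indices. The following is a minimal sketch under that assumption: it treats any line that contains only letters (after removing spaces) as residues and skips everything else. The function name parse_sequence_display is illustrative, and only the first 100 residues are included to keep the example short.

```python
def parse_sequence_display(text: str) -> str:
    """Join residue lines into one string, skipping ruler and blank lines."""
    residues = []
    for line in text.splitlines():
        compact = "".join(line.split())  # drop the spaces between blocks
        if compact.isalpha():            # ruler lines contain only digits
            residues.append(compact.upper())
    return "".join(residues)


# First two residue lines of the canonical sequence, with their rulers.
DISPLAY = """
        10         20         30         40         50
MSATSWFLVS SSGARHRLPR ELIFVGREEC ELMLQSRSVD KQHAVINYDQ
60 70 80 90 100
DRDEHWVKDL GSLNGTFVND MRIPDQKYVT LKLNDVIRFG YDSNMYVLER
"""

first_100 = parse_sequence_display(DISPLAY)

# 1-based inclusive positions map to Python slices as [start - 1:end].
fha_domain = first_100[23 - 1:73]   # FHA domain, positions 23-73
after_70 = first_100[70:]           # what remains once residues 1-70 are removed

print(len(first_100), len(fha_domain), after_70[:10])
```

The slice for positions 23 to 73 has the annotated domain length of 51, and the residue at position 71 is a methionine, which is consistent with isoform 3 lacking residues 1 to 70.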
Isoform 2 (identifier: Q9Y4F5-2)

The sequence of this isoform differs from the canonical sequence as follows:
     1247-1282: NTRRRQQGSDYTSTSEEEYGSRHGSPKHTRSHTSTA → T

Length:1,554
Mass (Da):167,714
Checksum: 1EBC4872698A20BF
Isoform 3 (identifier: Q9Y4F5-3)

The sequence of this isoform differs from the canonical sequence as follows:
     1-70: Missing.

Note: No experimental confirmation available.
Length:1,519
Mass (Da):163,627
Checksum: 240FDF3FC1A9D934
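
The isoform lengths follow directly from the canonical length and the listed differences: a replaced span removes (end - start + 1) residues and adds the length of the replacement. The sketch below works through that arithmetic and applies the isoform 2 substitution to residues 1201 to 1300 of the canonical sequence. The function names and the WINDOW constant are hypothetical, introduced only for illustration.

```python
CANONICAL_LENGTH = 1589


def isoform_length(canonical_length: int, start: int, end: int,
                   replacement: str = "") -> int:
    """Length after replacing residues start..end (1-based, inclusive)."""
    removed = end - start + 1
    return canonical_length - removed + len(replacement)


def apply_variant(window: str, window_start: int, start: int, end: int,
                  replacement: str = "") -> str:
    """Apply a substitution to a sequence window that begins at window_start."""
    lo = start - window_start        # 0-based index of the first replaced residue
    hi = end - window_start + 1      # 0-based index just past the last one
    return window[:lo] + replacement + window[hi:]


# Residues 1201-1300 of the canonical sequence, copied from the entry.
WINDOW = ("ELCCASRKPTMAEARAVSRKAANTATTTGPRQPFSRARSGSARYTSNTRR"
          "RQQGSDYTSTSEEEYGSRHGSPKHTRSHTSTATQTPRAGSSSRARSRAPG")

# Isoform 2: residues 1247-1282 are replaced by a single T.
print(isoform_length(CANONICAL_LENGTH, 1247, 1282, "T"))  # 1554
# Isoform 3: residues 1-70 are missing.
print(isoform_length(CANONICAL_LENGTH, 1, 70))            # 1519

isoform2_window = apply_variant(WINDOW, 1201, 1247, 1282, "T")
print(len(isoform2_window))                               # 65
```

Both results match the lengths stated for isoforms 2 and 3, and the replaced span in the window is the 36-residue string given in the isoform 2 description.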

Sequence caution

The sequence AAH47913.1 differs from that shown. Reason: Frameshift at position 1379. Curated
The sequence BAA22953.3 differs from that shown. Reason: Erroneous initiation. Curated

Experimental Info

Feature key        Position(s)  Length  Description
Sequence conflict  315 – 315    1       A → T in BAA22953 (PubMed:9179496). (Curated)

Alternative sequence

Feature key           Position(s)  Length  Feature identifier  Description
Alternative sequence  1 – 70       70      VSP_024247          Missing in isoform 3. (1 Publication)
Alternative sequence  1247 – 1282  36      VSP_024248          NTRRR…HTSTA → T in isoform 2. (1 Publication)

Sequence databases

EMBL / GenBank / DDBJ:
    AB006622 mRNA. Translation: BAA22953.3. Different initiation.
    AL583810 Genomic DNA. No translation available.
    BC047913 mRNA. Translation: AAH47913.1. Frameshift.
    BC112928 mRNA. Translation: AAI12929.1.
CCDS: CCDS45175.1. [Q9Y4F5-2]
    CCDS45176.2. [Q9Y4F5-3]
PIR: T00037.
RefSeq: NP_001106197.1. NM_001112726.2. [Q9Y4F5-2]
    NP_055820.2. NM_015005.2. [Q9Y4F5-3]
    XP_005267607.1. XM_005267550.2. [Q9Y4F5-1]
    XP_011534967.1. XM_011536665.1. [Q9Y4F5-1]
    XP_011534968.1. XM_011536666.1. [Q9Y4F5-1]
UniGene: Hs.533721.

Genome annotation databases

Ensembl: ENST00000414716; ENSP00000404151; ENSG00000099814. [Q9Y4F5-2]
    ENST00000556508; ENSP00000451249; ENSG00000099814. [Q9Y4F5-3]
GeneID: 283638.
KEGG: hsa:283638.
UCSC: uc001yps.4. human. [Q9Y4F5-1]

Keywords - Coding sequence diversity

Alternative splicing

Cross-references

Protocols and materials databases

DNASU: 283638.
Structural Biology Knowledgebase: Search...

Organism-specific databases

CTD: 283638.
GeneCards: CEP170B.
HGNC: HGNC:20362. CEP170B.
HPA: HPA000871.
    HPA059017.
neXtProt: NX_Q9Y4F5.
PharmGKB: PA134863153.
HUGE: Search...
GenAtlas: Search...

Miscellaneous databases

ChiTaRS: CEP170B. human.
GenomeRNAi: 283638.
PRO: Q9Y4F5.

Family and domain databases

ProtoNet: Search...

Publications

  1. "Construction and characterization of human brain cDNA libraries suitable for analysis of cDNA clones encoding relatively large proteins."
    Ohara O., Nagase T., Ishikawa K., Nakajima D., Ohira M., Seki N., Nomura N.
    DNA Res. 4:53-59(1997)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
    Tissue: Brain.
  2. Ohara O., Nagase T., Kikuno R., Nomura N.
    Submitted (AUG-2005) to the EMBL/GenBank/DDBJ databases
    Cited for: SEQUENCE REVISION.
  3. "The DNA sequence and analysis of human chromosome 14."
    Heilig R., Eckenberg R., Petit J.-L., Fonknechten N., Da Silva C., Cattolico L., Levy M., Barbe V., De Berardinis V., Ureta-Vidal A., Pelletier E., Vico V., Anthouard V., Rowen L., Madan A., Qin S., Sun H., Du H., Pepin K., Artiguenave F., Robert C., Cruaud C., Bruels T., Jaillon O., Friedlander L., Samson G., Brottier P., Cure S., Segurens B., Aniere F., Samain S., Crespeau H., Abbasi N., Aiach N., Boscus D., Dickhoff R., Dors M., Dubois I., Friedman C., Gouyvenoux M., James R., Madan A., Mairey-Estrada B., Mangenot S., Martins N., Menard M., Oztas S., Ratcliffe A., Shaffer T., Trask B., Vacherie B., Bellemere C., Belser C., Besnard-Gonnet M., Bartol-Mavel D., Boutard M., Briez-Silla S., Combette S., Dufosse-Laurent V., Ferron C., Lechaplais C., Louesse C., Muselet D., Magdelenat G., Pateau E., Petit E., Sirvain-Trukniewicz P., Trybou A., Vega-Czarny N., Bataille E., Bluet E., Bordelais I., Dubois M., Dumont C., Guerin T., Haffray S., Hammadi R., Muanga J., Pellouin V., Robert D., Wunderle E., Gauguet G., Roy A., Sainte-Marthe L., Verdier J., Verdier-Discala C., Hillier L.W., Fulton L., McPherson J., Matsuda F., Wilson R., Scarpelli C., Gyapay G., Wincker P., Saurin W., Quetier F., Waterston R., Hood L., Weissenbach J.
    Nature 421:601-607(2003)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
  4. "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
    The MGC Project Team
    Genome Res. 14:2121-2127(2004)
    Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 3), NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 1404-1589 (ISOFORMS 1/2/3).
    Tissue: Eye and PNS.
  5. "Tryptic digestion of ubiquitin standards reveals an improved strategy for identifying ubiquitinated proteins by mass spectrometry."
    Denis N.J., Vasilescu J., Lambert J.-P., Smith J.C., Figeys D.
    Proteomics 7:868-874(2007)
    Cited for: UBIQUITINATION [LARGE SCALE ANALYSIS] AT LYS-289.
    Tissue: Mammary cancer.
  6. Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-597; SER-655; SER-721; SER-746; SER-748; SER-751; SER-753; SER-986; SER-988; SER-1135; THR-1304 AND SER-1545, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  7. "Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions."
    Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K., Rodionov V., Han D.K.
    Sci. Signal. 2:RA46-RA46(2009)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-986; SER-988; SER-1135 AND THR-1304, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Leukemic T-cell.
  8. "Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis."
    Olsen J.V., Vermeulen M., Santamaria A., Kumar C., Miller M.L., Jensen L.J., Gnad F., Cox J., Jensen T.S., Nigg E.A., Brunak S., Mann M.
    Sci. Signal. 3:RA3-RA3(2010)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-1545 AND SER-1548, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Cervix carcinoma.
  9. "System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation."
    Rigbolt K.T., Prokhorova T.A., Akimov V., Henningsen J., Johansen P.T., Kratchmarova I., Kassem M., Mann M., Olsen J.V., Blagoev B.
    Sci. Signal. 4:RS3-RS3(2011)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-986 AND SER-988, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
  10. "An enzyme assisted RP-RPLC approach for in-depth analysis of human liver phosphoproteome."
    Bian Y., Song C., Cheng K., Dong M., Wang F., Huang J., Sun D., Wang L., Ye M., Zou H.
    J. Proteomics 96:253-262(2014)
    Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT THR-542; SER-619; SER-655; SER-829; SER-853; SER-954 AND SER-1548, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
    Tissue: Liver.

Entry information

Entry name: C170B_HUMAN
Accession: Primary (citable) accession number: Q9Y4F5
Secondary accession number(s): Q2KHR7, Q86TI7
Entry history
Integrated into UniProtKB/Swiss-Prot: April 3, 2007
Last sequence update: April 3, 2007
Last modified: June 8, 2016
This is version 113 of the entry and version 4 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Miscellaneous

Keywords - Technical term

Complete proteome, Reference proteome

Documents

  1. Human chromosome 14
    Human chromosome 14: entries, gene names and cross-references to MIM
  2. SIMILARITY comments
    Index of protein domains and families

Similar proteins

Links to similar proteins from the UniProt Reference Clusters (UniRef) at 100%, 90% and 50% sequence identity:
100%: UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into one UniRef entry.
90%: UniRef90 is built by clustering UniRef100 sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence (a.k.a. seed sequence).
50%: UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
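
The UniRef90 and UniRef50 rules quoted above come down to two thresholds applied against the seed sequence of a cluster: a minimum sequence identity and a minimum overlap. The sketch below only encodes that membership test, assuming the identity and overlap fractions have already been computed by some alignment step. It is not the UniRef clustering procedure itself, and the names IDENTITY_THRESHOLD, MIN_OVERLAP and meets_cluster_criteria are hypothetical.

```python
# Minimum sequence identity to the seed sequence, per cluster level,
# as stated in the descriptions above.
IDENTITY_THRESHOLD = {"UniRef90": 0.90, "UniRef50": 0.50}

# Minimum overlap with the seed (longest) sequence, same for both levels.
MIN_OVERLAP = 0.80


def meets_cluster_criteria(level: str, identity: float, overlap: float) -> bool:
    """Check the stated identity and overlap thresholds for one sequence.

    identity and overlap are fractions between 0 and 1, measured against
    the seed sequence of the candidate cluster.
    """
    return identity >= IDENTITY_THRESHOLD[level] and overlap >= MIN_OVERLAP


# Illustrative values only; they are not taken from this entry.
print(meets_cluster_criteria("UniRef90", identity=0.93, overlap=0.85))
print(meets_cluster_criteria("UniRef90", identity=0.93, overlap=0.60))
print(meets_cluster_criteria("UniRef50", identity=0.55, overlap=0.80))
```

Both conditions must hold: a sequence that is highly identical to the seed over only a short stretch fails the overlap requirement and would not join the cluster under these rules.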