O15047 (SET1A_HUMAN) Reviewed, UniProtKB/Swiss-Prot

Last modified July 9, 2014. Version 125.

Names and origin

Protein names
Recommended name:
Histone-lysine N-methyltransferase SETD1A
EC=2.1.1.43
Alternative name(s):
Lysine N-methyltransferase 2F
SET domain-containing protein 1A
Short name=hSET1A
Set1/Ash2 histone methyltransferase complex subunit SET1
Gene names
Name: SETD1A
Synonyms: KIAA0339, KMT2F, SET1, SET1A
Organism: Homo sapiens (Human) [Reference proteome]
Taxonomic identifier: 9606 [NCBI]
Taxonomic lineage: Eukaryota > Metazoa > Chordata > Craniata > Vertebrata > Euteleostomi > Mammalia > Eutheria > Euarchontoglires > Primates > Haplorrhini > Catarrhini > Hominidae > Homo

Protein attributes

Sequence length: 1707 AA.
Sequence status: Complete.
Protein existence: Evidence at protein level

General annotation (Comments)

Function

Histone methyltransferase that specifically methylates 'Lys-4' of histone H3, when part of the SET1 histone methyltransferase (HMT) complex, but not if the neighboring 'Lys-9' residue is already methylated. H3 'Lys-4' methylation represents a specific tag for epigenetic transcriptional activation. The non-overlapping localization with SETD1B suggests that SETD1A and SETD1B make non-redundant contributions to the epigenetic control of chromatin structure and gene expression. Ref.4

Catalytic activity

S-adenosyl-L-methionine + L-lysine-[histone] = S-adenosyl-L-homocysteine + N(6)-methyl-L-lysine-[histone].

Subunit structure

Component of the SET1 complex, at least composed of the catalytic subunit (SETD1A or SETD1B), WDR5, WDR82, RBBP5, ASH2L/ASH2, CXXC1/CFP1, HCFC1 and DPY30. Interacts with HCFC1. Interacts with ASH2/ASH2L, CXXC1/CFP1, WDR5 and RBBP5. Interacts (via the RRM domain) with WDR82. Interacts (via the RRM domain) with hyperphosphorylated C-terminal domain (CTD) of RNA polymerase II large subunit (POLR2A) only in the presence of WDR82. Binds specifically to CTD heptad repeats phosphorylated on 'Ser-5' of each heptad. Interacts with ZNF335. Interacts with SUPT6H. Ref.4 Ref.5 Ref.7 Ref.8 Ref.9 Ref.15 Ref.16

Subcellular location

Nucleus speckle. Chromosome. Note: Localizes to a largely non-overlapping set of euchromatic nuclear speckles with SETD1B, suggesting that SETD1A and SETD1B each bind to a unique set of target genes. Ref.7

Sequence similarities

Belongs to the class V-like SAM-binding methyltransferase superfamily.

Contains 1 post-SET domain.

Contains 1 RRM (RNA recognition motif) domain.

Contains 1 SET domain.

Sequence caution

The sequence AAH35795.1 differs from that shown. Reason: Contaminating sequence. Potential poly-A sequence.

The sequence BAA20797.2 differs from that shown. Reason: Erroneous initiation. Translation N-terminally shortened.

Binary interactions

With     Entry     #Exp.   IntAct                    Notes
HCFC1    P51610    2       EBI-540779,EBI-396176
RBBP5    Q15291    3       EBI-540779,EBI-592823

Sequence annotation (Features)

Feature key          Position(s)    Length   Description                                 Feature identifier

Molecule processing

Chain                1 – 1707       1707     Histone-lysine N-methyltransferase SETD1A   PRO_0000186056

Regions

Domain               84 – 172       89       RRM
Domain               1568 – 1685    118      SET
Domain               1691 – 1707    17       Post-SET
Region               1415 – 1450    36       Interaction with CFP1
Region               1450 – 1537    88       Interaction with ASH2L, RBBP5 and WDR5
Motif                1299 – 1303    5        HCFC1-binding motif (HBM)
Compositional bias   244 – 362      119      Ser-rich
Compositional bias   383 – 654      272      Pro-rich
Compositional bias   899 – 1010     112      Glu-rich
Compositional bias   1011 – 1062    52       Ser-rich
Compositional bias   1071 – 1194    124      Pro-rich
Compositional bias   1334 – 1375    42       Glu-rich
Compositional bias   1403 – 1417    15       Pro-rich

Amino acid modifications

Modified residue     508            1        Phosphoserine Ref.12

Natural variations

Natural variant      639            1        D → N. Corresponds to variant rs897985 [dbSNP | Ensembl].   VAR_059318

Experimental info

Sequence conflict    242 – 248      7        PCSQDTS → ACPVTHV in AAH35795. Ref.3
Sequence conflict    1240 – 1242    3        TEE → FLG in AAH27450. Ref.3
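
The length column in the feature table follows from the positions, which are 1-based and inclusive. As a minimal sketch (the helper names `span_length`, `length_mismatches` and `shared_residues` are illustrative and not part of any UniProt tool), a subset of the rows can be cross-checked like this:

```python
# Feature rows transcribed from the table above: (key, start, end, stated length).
# Positions are 1-based and inclusive, as in the entry.
FEATURES = [
    ("Chain", 1, 1707, 1707),
    ("Domain: RRM", 84, 172, 89),
    ("Domain: SET", 1568, 1685, 118),
    ("Domain: Post-SET", 1691, 1707, 17),
    ("Region: interaction with CFP1", 1415, 1450, 36),
    ("Region: interaction with ASH2L, RBBP5 and WDR5", 1450, 1537, 88),
    ("Motif: HBM", 1299, 1303, 5),
    ("Sequence conflict", 242, 248, 7),
    ("Sequence conflict", 1240, 1242, 3),
]


def span_length(start, end):
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def length_mismatches(features):
    """Return the rows whose stated length disagrees with their positions."""
    return [row for row in features if span_length(row[1], row[2]) != row[3]]


def shared_residues(a, b):
    """Count residues common to two inclusive ranges given as (start, end)."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return max(0, hi - lo + 1)


# Rows with inconsistent lengths (an empty list means the table is self-consistent).
print(length_mismatches(FEATURES))
# Residues shared by the two annotated interaction regions.
print(shared_residues((1415, 1450), (1450, 1537)))
```

The second call shows that the CFP1 region and the ASH2L/RBBP5/WDR5 region are annotated as sharing their boundary residue, 1450.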

Secondary structure

[Secondary-structure graphic along the 1707-residue sequence (legend: Helix, Strand, Turn); not reproduced in this text.]

Sequences

Sequence: O15047 [UniParc]
Length: 1,707
Mass (Da): 186,034
Last modified June 21, 2005. Version 3.
Checksum: 0084217B0D425050
        10         20         30         40         50         60 
MDQEGGGDGQ KAPSFQWRNY KLIVDPALDP ALRRPSQKVY RYDGVHFSVN DSKYIPVEDL 

        70         80         90        100        110        120 
QDPRCHVRSK NRDFSLPVPK FKLDEFYIGQ IPLKEVTFAR LNDNVRETFL KDMCRKYGEV 

       130        140        150        160        170        180 
EEVEILLHPR TRKHLGLARV LFTSTRGAKE TVKNLHLTSV MGNIIHAQLD IKGQQRMKYY 

       190        200        210        220        230        240 
ELIVNGSYTP QTVPTGGKAL SEKFQGSGAA TETAESRRRS SSDTAAYPAG TTAVGTPGNG 

       250        260        270        280        290        300 
TPCSQDTSFS SSRQDTPSSF GQFTPQSSQG TPYTSRGSTP YSQDSAYSSS TTSTSFKPRR 

       310        320        330        340        350        360 
SENSYQDAFS RRHFSASSAS TTASTAIAAT TAATASSSAS SSSLSSSSSS SSSSSSSQFR 

       370        380        390        400        410        420 
SSDANYPAYY ESWNRYQRHT SYPPRRATRE EPPGAPFAEN TAERFPPSYT SYLPPEPSRP 

       430        440        450        460        470        480 
TDQDYRPPAS EAPPPEPPEP GGGGGGGGPS PEREEVRTSP RPASPARSGS PAPETTNESV 

       490        500        510        520        530        540 
PFAQHSSLDS RIEMLLKEQR SKFSFLASDT EEEEENSSMV LGARDTGSEV PSGSGHGPCT 

       550        560        570        580        590        600 
PPPAPANFED VAPTGSGEPG ATRESPKANG QNQASPCSSG DDMEISDDDR GGSPPPAPTP 

       610        620        630        640        650        660 
PQQPPPPPPP PPPPPPYLAS LPLGYPPHQP AYLLPPRPDG PPPPEYPPPP PPPPHIYDFV 

       670        680        690        700        710        720 
NSLELMDRLG AQWGGMPMSF QMQTQMLTRL HQLRQGKGLI AASAGPPGGA FGEAFLPFPP 

       730        740        750        760        770        780 
PQEAAYGLPY ALYAQGQEGR GAYSREAYHL PMPMAAEPLP SSSVSGEEAR LPPREEAELA 

       790        800        810        820        830        840 
EGKTLPTAGT VGRVLAMLVQ EMKSIMQRDL NRKMVENVAF GAFDQWWESK EEKAKPFQNA 

       850        860        870        880        890        900 
AKQQAKEEDK EKTKLKEPGL LSLVDWAKSG GTTGIEAFAF GSGLRGALRL PSFKVKRKEP 

       910        920        930        940        950        960 
SEISEASEEK RPRPSTPAEE DEDDPEQEKE AGEPGRPGTK PPKRDEERGK TQGKHRKSFA 

       970        980        990       1000       1010       1020 
LDSEGEEASQ ESSSEKDEED DEEDEEDEDR EEAVDTTKKE TEVSDGEDEE SDSSSKCSLY 

      1030       1040       1050       1060       1070       1080 
ADSDGENDST SDSESSSSSS SSSSSSSSSS SSSSSSSSES SSEDEEEEER PAALPSASPP 

      1090       1100       1110       1120       1130       1140 
PREVPVPTPA PVEVPVPERV AGSPVTPLPE QEASPARPAG PTEESPPSAP LRPPEPPAGP 

      1150       1160       1170       1180       1190       1200 
PAPAPRPDER PSSPIPLLPP PKKRRKTVSF SAIEVVPAPE PPPATPPQAK FPGPASRKAP 

      1210       1220       1230       1240       1250       1260 
RGVERTIRNL PLDHASLVKS WPEEVSRGGR SRAGGRGRLT EEEEAEPGTE VDLAVLADLA 

      1270       1280       1290       1300       1310       1320 
LTPARRGLPA LPAVEDSEAT ETSDEAERPR PLLSHILLEH NYALAVKPTP PAPALRPPEP 

      1330       1340       1350       1360       1370       1380 
VPAPAALFSS PADEVLEAPE VVVAEAEEPK PQQLQQQREE GEEEGEEEGE EEEEESSDSS 

      1390       1400       1410       1420       1430       1440 
SSSDGEGALR RRSLRSHARR RRPPPPPPPP PPRAYEPRSE FEQMTILYDI WNSGLDSEDM 

      1450       1460       1470       1480       1490       1500 
SYLRLTYERL LQQTSGADWL NDTHWVHHTI TNLTTPKRKR RPQDGPREHQ TGSARSEGYY 

      1510       1520       1530       1540       1550       1560 
PISKKEKDKY LDVCPVSARQ LEGVDTQGTN RVLSERRSEQ RRLLSAIGTS AIMDSDLLKL 

      1570       1580       1590       1600       1610       1620 
NQLKFRKKKL RFGRSRIHEW GLFAMEPIAA DEMVIEYVGQ NIRQMVADMR EKRYVQEGIG 

      1630       1640       1650       1660       1670       1680 
SSYLFRVDHD TIIDATKCGN LARFINHCCT PNCYAKVITI ESQKKIVIYS KQPIGVDEEI 

      1690       1700 
TYDYKFPLED NKIPCLCGTE SCRGSLN 
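
The listing above prints 60 residues per row in blocks of ten, with the number over each block marking its last residue. A minimal sketch of how a 1-based feature position maps onto this layout, using the row that covers residues 241 to 300 (the function names are illustrative, not part of any UniProt tool):

```python
import re

# One display row copied from the sequence above; it covers residues 241-300.
ROW_241 = "TPCSQDTSFS SSRQDTPSSF GQFTPQSSQG TPYTSRGSTP YSQDSAYSSS TTSTSFKPRR"


def join_blocks(row):
    """Remove the whitespace that separates the 10-residue blocks."""
    return re.sub(r"\s+", "", row)


def residues(row, row_start, start, end):
    """Return residues start..end (1-based, inclusive) from a row whose
    first residue has position row_start."""
    seq = join_blocks(row)
    lo = start - row_start
    hi = end - row_start + 1
    if lo < 0 or hi > len(seq) or lo >= hi:
        raise ValueError("range falls outside this row")
    return seq[lo:hi]


# Residues 242-248, the span annotated as a sequence conflict with AAH35795.
conflict_region = residues(ROW_241, 241, 242, 248)
print(conflict_region)
```

The extracted residues can be compared with the left-hand side of the corresponding "Sequence conflict" entry in the feature table.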

References

[1] "Prediction of the coding sequences of unidentified human genes. VII. The complete sequences of 100 new cDNA clones from brain which can code for large proteins in vitro."
Nagase T., Ishikawa K., Nakajima D., Ohira M., Seki N., Miyajima N., Tanaka A., Kotani H., Nomura N., Ohara O.
DNA Res. 4:141-150(1997)
Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
Tissue: Brain.
[2] "The sequence and analysis of duplication-rich human chromosome 16."
Martin J., Han C., Gordon L.A., Terry A., Prabhakar S., She X., Xie G., Hellsten U., Chan Y.M., Altherr M., Couronne O., Aerts A., Bajorek E., Black S., Blumer H., Branscomb E., Brown N.C., Bruno W.J., Buckingham J.M., Callen D.F., Campbell C.S., Campbell M.L., Campbell E.W., Caoile C., Challacombe J.F., Chasteen L.A., Chertkov O., Chi H.C., Christensen M., Clark L.M., Cohn J.D., Denys M., Detter J.C., Dickson M., Dimitrijevic-Bussod M., Escobar J., Fawcett J.J., Flowers D., Fotopulos D., Glavina T., Gomez M., Gonzales E., Goodstein D., Goodwin L.A., Grady D.L., Grigoriev I., Groza M., Hammon N., Hawkins T., Haydu L., Hildebrand C.E., Huang W., Israni S., Jett J., Jewett P.B., Kadner K., Kimball H., Kobayashi A., Krawczyk M.-C., Leyba T., Longmire J.L., Lopez F., Lou Y., Lowry S., Ludeman T., Manohar C.F., Mark G.A., McMurray K.L., Meincke L.J., Morgan J., Moyzis R.K., Mundt M.O., Munk A.C., Nandkeshwar R.D., Pitluck S., Pollard M., Predki P., Parson-Quintana B., Ramirez L., Rash S., Retterer J., Ricke D.O., Robinson D.L., Rodriguez A., Salamov A., Saunders E.H., Scott D., Shough T., Stallings R.L., Stalvey M., Sutherland R.D., Tapia R., Tesmer J.G., Thayer N., Thompson L.S., Tice H., Torney D.C., Tran-Gyamfi M., Tsai M., Ulanovsky L.E., Ustaszewska A., Vo N., White P.S., Williams A.L., Wills P.L., Wu J.-R., Wu K., Yang J., DeJong P., Bruce D., Doggett N.A., Deaven L., Schmutz J., Grimwood J., Richardson P., Rokhsar D.S., Eichler E.E., Gilna P., Lucas S.M., Myers R.M., Rubin E.M., Pennacchio L.A.
Nature 432:988-994(2004)
Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
[3] "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)."
The MGC Project Team
Genome Res. 14:2121-2127(2004)
Cited for: NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 1-248 AND 1239-1707.
Tissue: Brain and Duodenum.
[4] "Human Sin3 deacetylase and trithorax-related Set1/Ash2 histone H3-K4 methyltransferase are tethered together selectively by the cell-proliferation factor HCF-1."
Wysocka J., Myers M.P., Laherty C.D., Eisenman R.N., Herr W.
Genes Dev. 17:896-911(2003)
Cited for: FUNCTION, INTERACTION WITH HCFC1.
[5] "CpG-binding protein (CXXC finger protein 1) is a component of the mammalian Set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex."
Lee J.-H., Skalnik D.G.
J. Biol. Chem. 280:41725-41731(2005)
Cited for: IDENTIFICATION IN THE SET1 COMPLEX.
[6] "Global, in vivo, and site-specific phosphorylation dynamics in signaling networks."
Olsen J.V., Blagoev B., Gnad F., Macek B., Kumar C., Mortensen P., Mann M.
Cell 127:635-648(2006)
Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
Tissue: Cervix carcinoma.
[7] "Identification and characterization of the human Set1B histone H3-Lys4 methyltransferase complex."
Lee J.-H., Tate C.M., You J.-S., Skalnik D.G.
J. Biol. Chem. 282:13419-13428(2007)
Cited for: SUBCELLULAR LOCATION, IDENTIFICATION IN THE SET1 COMPLEX.
[8] "Wdr82 is a C-terminal domain-binding protein that recruits the Setd1A Histone H3-Lys4 methyltransferase complex to transcription start sites of transcribed human genes."
Lee J.H., Skalnik D.G.
Mol. Cell. Biol. 28:609-618(2008)
Cited for: IDENTIFICATION IN SET1 COMPLEX, INTERACTION WITH ASH2L; RBBP5; CXXC1; HCFC1; WDR5; WDR82 AND POLR2A.
[9] "Molecular regulation of H3K4 trimethylation by Wdr82, a component of human Set1/COMPASS."
Wu M., Wang P.F., Lee J.S., Martin-Brown S., Florens L., Washburn M., Shilatifard A.
Mol. Cell. Biol. 28:7337-7344(2008)
Cited for: IDENTIFICATION IN SET1 COMPLEX.
[10] "A quantitative atlas of mitotic phosphorylation."
Dephoure N., Zhou C., Villen J., Beausoleil S.A., Bakalarski C.E., Elledge S.J., Gygi S.P.
Proc. Natl. Acad. Sci. U.S.A. 105:10762-10767(2008)
Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
Tissue: Cervix carcinoma.
[11] "Lys-N and trypsin cover complementary parts of the phosphoproteome in a refined SCX-based approach."
Gauci S., Helbig A.O., Slijper M., Krijgsveld J., Heck A.J., Mohammed S.
Anal. Chem. 81:4493-4501(2009)
Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
[12] "Quantitative phosphoproteomic analysis of T cell receptor signaling reveals system-wide modulation of protein-protein interactions."
Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K., Rodionov V., Han D.K.
Sci. Signal. 2:RA46-RA46(2009)
Cited for: PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-508, IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
Tissue: Leukemic T-cell.
[13] "Initial characterization of the human central proteome."
Burkard T.R., Planyavsky M., Kaupe I., Breitwieser F.P., Buerckstuemmer T., Bennett K.L., Superti-Furga G., Colinge J.
BMC Syst. Biol. 5:17-17(2011)
Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
[14] "System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation."
Rigbolt K.T., Prokhorova T.A., Akimov V., Henningsen J., Johansen P.T., Kratchmarova I., Kassem M., Mann M., Olsen J.V., Blagoev B.
Sci. Signal. 4:RS3-RS3(2011)
Cited for: IDENTIFICATION BY MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
[15] "Microcephaly gene links trithorax and REST/NRSF to control neural stem cell proliferation and differentiation."
Yang Y.J., Baltus A.E., Mathew R.S., Murphy E.A., Evrony G.D., Gonzalez D.M., Wang E.P., Marshall-Walker C.A., Barry B.J., Murn J., Tatarakis A., Mahajan M.A., Samuels H.H., Shi Y., Golden J.A., Mahajnah M., Shenhav R., Walsh C.A.
Cell 151:1097-1112(2012)
Cited for: INTERACTION WITH ZNF335.
[16] "The histone chaperone Spt6 is required for activation-induced cytidine deaminase target determination through H3K4me3 regulation."
Begum N.A., Stanlie A., Nakata M., Akiyama H., Honjo T.
J. Biol. Chem. 287:32415-32429(2012)
Cited for: INTERACTION WITH SUPT6H.

Cross-references

Sequence databases

EMBL / GenBank / DDBJ
AB002337 mRNA. Translation: BAA20797.2. Different initiation.
AC135048 Genomic DNA. No translation available.
BC027450 mRNA. Translation: AAH27450.1.
BC035795 mRNA. Translation: AAH35795.1. Sequence problems.
CCDS: CCDS32435.1.
RefSeq: NP_055527.1. NM_014712.1.
XP_005255780.1. XM_005255723.1.
XP_006721169.1. XM_006721106.1.
XP_006721170.1. XM_006721107.1.
UniGene: Hs.297483.

3D structure databases

PDBe / RCSB-PDB / PDBj
Entry   Method   Resolution (Å)   Chain   Positions
3S8S    X-ray    1.30             A       89-197
3UVN    X-ray    1.79             B/D     1492-1502
4EWR    X-ray    1.50             C       1488-1501
ProteinModelPortal: O15047.
SMR: O15047. Positions 89-195.

Protein-protein interaction databases

BioGrid: 115088. 18 interactions.
DIP: DIP-33494N.
IntAct: O15047. 12 interactions.
STRING: 9606.ENSP00000262519.

PTM databases

PhosphoSite: O15047.

Proteomic databases

MaxQB: O15047.
PaxDb: O15047.
PRIDE: O15047.

Protocols and materials databases

DNASU: 9739.

Genome annotation databases

Ensembl: ENST00000262519; ENSP00000262519; ENSG00000099381.
GeneID: 9739.
KEGG: hsa:9739.
UCSC: uc002ead.1. human.

Organism-specific databases

CTD: 9739.
GeneCards: GC16P030968.
HGNC: HGNC:29010. SETD1A.
HPA: HPA020646.
MIM: 611052. gene.
neXtProt: NX_O15047.
PharmGKB: PA128394556.

Phylogenomic databases

eggNOG: COG2940.
HOGENOM: HOG000154291.
HOVERGEN: HBG067119.
InParanoid: O15047.
KO: K11422.
OMA: GYLRLTY.
OrthoDB: EOG7GQXTT.
PhylomeDB: O15047.
TreeFam: TF106436.

Enzyme and pathway databases

BRENDA: 2.1.1.43. 2681.

Gene expression databases

ArrayExpress: O15047.
Bgee: O15047.
CleanEx: HS_SETD1A.
Genevestigator: O15047.

Family and domain databases

Gene3D: 3.30.70.330. 1 hit.
InterPro: IPR024657. COMPASS_Set1_N-SET.
IPR012677. Nucleotide-bd_a/b_plait.
IPR003616. Post-SET_dom.
IPR000504. RRM_dom.
IPR001214. SET_dom.
Pfam: PF11764. N-SET. 1 hit.
PF00076. RRM_1. 1 hit.
PF00856. SET. 1 hit.
SMART: SM00508. PostSET. 1 hit.
SM00360. RRM. 1 hit.
SM00317. SET. 1 hit.
PROSITE: PS50868. POST_SET. 1 hit.
PS50102. RRM. 1 hit.
PS50280. SET. 1 hit.

Other

ChiTaRS: SETD1A. human.
GenomeRNAi: 9739.
NextBio: 36651.
PRO: O15047.

Entry information

Entry name: SET1A_HUMAN
Accession
Primary (citable) accession number: O15047
Secondary accession number(s): A6NP62, Q6PIF3, Q8TAJ6
Entry history
Integrated into UniProtKB/Swiss-Prot: June 21, 2005
Last sequence update: June 21, 2005
Last modified: July 9, 2014
This is version 125 of the entry and version 3 of the sequence.
Entry status: Reviewed (UniProtKB/Swiss-Prot)
Annotation program: Chordata Protein Annotation Program
Disclaimer: Any medical or genetic information present in this entry is provided for research, educational and informational purposes only. It is not in any way intended to be used as a substitute for professional medical advice, diagnosis, treatment or care.

Relevant documents

SIMILARITY comments

Index of protein domains and families

PDB cross-references

Index of Protein Data Bank (PDB) cross-references

MIM cross-references

Online Mendelian Inheritance in Man (MIM) cross-references in UniProtKB/Swiss-Prot

Human polymorphisms and disease mutations

Index of human polymorphisms and disease mutations

Human entries with polymorphisms or disease mutations

List of human entries with polymorphisms or disease mutations

Human chromosome 16

Human chromosome 16: entries, gene names and cross-references to MIM