annbioch.txt
----------------------------------------------------------------------------
UniProt - Swiss-Prot Protein Knowledgebase
Swiss Institute of Bioinformatics (SIB); Geneva, Switzerland
European Bioinformatics Institute (EBI); Hinxton, United Kingdom
Protein Information Resource (PIR); Washington DC, USA
----------------------------------------------------------------------------
Description: A primer on UniProtKB/Swiss-Prot annotation
Name: annbioch.txt
Release: 57.10 of 03-Nov-2009
----------------------------------------------------------------------------
============
Introduction
============
UniProtKB/Swiss-Prot is defined as an annotated protein sequence database.
We make every effort possible to ensure that all available biochemical
information accompanies the sequence data and that this information is as
complete and up-to-date as possible. This annotation is a labor-intensive
process that involves assessment of information from published articles
along with use of a variety of programs/algorithms. Use is also made of
Swiss-Prot itself in order to maintain standard nomenclature and
description comments. We describe here the steps we take to add all
relevant biochemical information to new entries going into Swiss-Prot.
There are different scenarios with respect to biochemical information that
accompanies sequence data reports. Sometimes scientists isolate and then
biochemically characterize the protein encoded by the gene they have
sequenced. Other times they infer this information through similarity to
other proteins within the same conserved family. If the protein does not
belong to a particular family, they infer it purely from sequence
similarity. Finally, there is the genome sequence data, which often has no
accompanying citation reporting any such classification. Below are the
steps we use to analyze these reports and to decide what information to add
to the sequence entries, and how.
In all the scenarios below a new entry is taken from TrEMBL and, generally,
the first step is to get a copy of the article(s) given in the reference
lines. Then the sequence is aligned, using FASTA or BLAST, against all
existing Swiss-Prot and TrEMBL entries. This allows us to assess, quickly and
easily, whether and how the sequence relates to existing families in
Swiss-Prot. The next step is to read the article(s), assess the information
given and add relevant comments and features to the entry.
It is important to note that the following is just an outline of the
annotation process. The whole process of assessing information for addition
into Swiss-Prot entries is MUCH MORE complex.
(I) Article(s) reports sequencing (nucleic acid and/or amino acid) and
biochemical characterization
Often, from reading the abstract of the paper and analyzing the FASTA
results, we can see that the protein belongs to a particular family. In
these cases, care is taken to look at other members of the family and to
become familiar with the annotation that already exists. Any standard
annotation that is common to the family, for example, the description
line(s) and the keywords, can be added to the new entry. Other comments and
features, specific to the family, can be added in conjunction with reading
the paper. Any additional information from the paper, for example
post-translational modifications, is added to the entry.
(II) Article(s) reports sequencing with no biochemical characterization
In the majority of articles reporting gene sequencing, the gene is
translated to give the protein sequence but the in vivo protein is rarely
isolated and characterized. Often a probe from a similar organism is used to
pinpoint the gene and then the authors infer biochemical characteristics. In
these cases, curators assess what the authors imply together with the
results of the alignments against Swiss-Prot and TrEMBL. When the sequence
"hits" a particular family, the description line(s), the similarity comments
and the keywords specific to the family can be added to the new entry. More
care is taken when looking at function, subunit and sequence features. This
is the first of the cases where we can introduce three of the four
qualifiers commonly found in Swiss-Prot, namely "probable", "potential" and
"by similarity" (for a description of "putative" please see later under
Genome Data).
When a gene has been identified by probing with the gene from another
organism, and that gene encodes a characterized protein, the description
line is copied over from the corresponding protein sequence entry. When the
function and other comment lines are present in the existing entry and are
not species-specific, they are added along with "by similarity" in
parentheses. Note that "by similarity" is used only when the comment or
feature in the existing entry has been proved, categorically, to be so.
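The copying rule above can be sketched in a few lines of Python. This is only
an illustration, not the curators' actual tooling: the function names
add_by_similarity and transfer_comments are hypothetical, and the decision
about which comment topics are species-specific is assumed to be supplied by a
curator.

```python
def add_by_similarity(comment: str) -> str:
    """Append the "(By similarity)" qualifier to a copied comment.

    The qualifier is placed before the trailing full stop, and a comment
    that already carries the qualifier is returned unchanged.
    """
    text = comment.rstrip()
    body = text[:-1].rstrip() if text.endswith(".") else text
    if body.lower().endswith("(by similarity)"):
        return body + "."
    return body + " (By similarity)."


def transfer_comments(source: dict, species_specific: set) -> dict:
    """Copy comment topics from a characterized entry to a new entry.

    Hypothetical helper: topics named in species_specific are skipped,
    every other comment is copied with the qualifier added.
    """
    return {
        topic: add_by_similarity(text)
        for topic, text in source.items()
        if topic not in species_specific
    }
```

Applied to the comment "Homohexamer." this gives "Homohexamer (By
similarity).", which is the form such comments take in entries annotated by
similarity.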
Examples:
a) Swiss-Prot entry where authors have biochemically characterized the
protein.
ID AMPA_ECOLI Reviewed; 503 AA.
AC P68767; P11648; Q2M649;
DT 21-DEC-2004, integrated into UniProtKB/Swiss-Prot.
DT 21-DEC-2004, sequence version 1.
DT 16-DEC-2008, entry version 43.
DE RecName: Full=Cytosol aminopeptidase;
DE EC=3.4.11.1;
DE AltName: Full=Leucine aminopeptidase;
DE Short=LAP;
DE AltName: Full=Leucyl aminopeptidase;
DE AltName: Full=Aminopeptidase A/I;
GN Name=pepA; Synonyms=carP, xerB; OrderedLocusNames=b4260, JW4217;
OS Escherichia coli (strain K12).
OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC Enterobacteriaceae; Escherichia.
OX NCBI_TaxID=83333;
RN [1]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND PROTEIN SEQUENCE OF 1-20.
RC STRAIN=K12;
RX MEDLINE=89356633; [Pubmed: 2670557]
RA Stirling C.J., Colloms S., Collins J.F., Szatmari G., Sherratt D.J.;
RT "xerB, an Escherichia coli gene required for plasmid ColE1 site-
RT specific recombination, is identical to pepA, encoding aminopeptidase
RT A, a protein with substantial similarity to bovine lens leucine
RT aminopeptidase.";
RL EMBO J. 8:1623-1627(1989).
RN [2]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RC STRAIN=K12;
RX MEDLINE=95341674; [Pubmed: 7616564] [Article from publisher]
RA Charlier D., Hassanzadeh G., Kholti A., Gigot D., Pierard A.,
RA Glansdorff N.;
RT "carP, involved in pyrimidine regulation of the Escherichia coli
RT carbamoylphosphate synthetase operon encodes a sequence-specific DNA-
RT binding protein identical to XerB and PepA, also required for
RT resolution of ColEI multimers.";
RL J. Mol. Biol. 250:392-406(1995).
RN [3]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=K12 / MG1655 / ATCC 47076;
RX MEDLINE=95334362; [Pubmed: 7610040] [Article from publisher]
RA Burland V.D., Plunkett G. III, Sofia H.J., Daniels D.L.,
RA Blattner F.R.;
RT "Analysis of the Escherichia coli genome VI: DNA sequence of the
RT region from 92.8 through 100 minutes.";
RL Nucleic Acids Res. 23:2105-2119(1995).
RN [4]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=K12 / MG1655 / ATCC 47076;
RX MEDLINE=97426617; [Pubmed: 9278503] [Article from publisher]
RA Blattner F.R., Plunkett G. III, Bloch C.A., Perna N.T., Burland V.,
RA Riley M., Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F.,
RA Gregor J., Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J.,
RA Mau B., Shao Y.;
RT "The complete genome sequence of Escherichia coli K-12.";
RL Science 277:1453-1474(1997).
RN [5]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=K12 / W3110 / ATCC 27325 / DSM 5911;
RX [Pubmed: 16738553] [Article from publisher]
RA Hayashi K., Morooka N., Yamamoto Y., Fujita K., Isono K., Choi S.,
RA Ohtsubo E., Baba T., Wanner B.L., Mori H., Horiuchi T.;
RT "Highly accurate genome sequences of Escherichia coli K-12 strains
RT MG1655 and W3110.";
RL Mol. Syst. Biol. 2:E1-E5(2006).
RN [6]
RP MUTAGENESIS OF GLU-354.
RX MEDLINE=94335644; [Pubmed: 8057849]
RX [Article from publisher]
RA McCulloch R., Burke M.E., Sherratt D.J.;
RT "Peptidase activity of Escherichia coli aminopeptidase A is not
RT required for its role in Xer site-specific recombination.";
RL Mol. Microbiol. 12:241-251(1994).
RN [7]
RP X-RAY CRYSTALLOGRAPHY (2.5 ANGSTROMS).
RX [Pubmed: 10449417] [Article from publisher]
RA Strater N., Sherratt D.J., Colloms S.D.;
RT "X-ray structure of aminopeptidase A from Escherichia coli and a model
RT for the nucleoprotein complex in Xer site-specific recombination.";
RL EMBO J. 18:4513-4522(1999).
CC -!- FUNCTION: Presumably involved in the processing and regular
CC turnover of intracellular proteins. Catalyzes the removal of
CC unsubstituted N-terminal amino acids from various peptides.
CC Required for plasmid ColE1 site-specific recombination but not in
CC its aminopeptidase activity. Could act as a structural component
CC of the putative nucleoprotein complex in which the Xer
CC recombination reaction takes place.
CC -!- CATALYTIC ACTIVITY: Release of an N-terminal amino acid, Xaa-|-
CC Yaa-, in which Xaa is preferably Leu, but may be other amino acids
CC including Pro although not Arg or Lys, and Yaa may be Pro. Amino
CC acid amides and methyl esters are also readily hydrolyzed, but
CC rates on arylamides are exceedingly low.
CC -!- COFACTOR: Binds 2 manganese ions per subunit (By similarity).
CC -!- ENZYME REGULATION: Inhibited by zinc and EDTA.
CC -!- SUBUNIT: Homohexamer.
CC -!- SIMILARITY: Belongs to the peptidase M17 family.
CC -!- CAUTION: The ligation for manganese is based on the ligation for
CC zinc, an inhibitor, in the crystallographic structure reported in
CC PubMed:10449417. The ligation for manganese in the active form of
CC the enzyme may differ.
DR EMBL; X15130; CAA33225.1; -; Genomic_DNA.
DR EMBL; X86443; CAA60164.1; -; Genomic_DNA.
DR EMBL; U14003; AAA97157.1; -; Genomic_DNA.
DR EMBL; U00096; AAC77217.1; -; Genomic_DNA.
DR EMBL; AP009048; BAE78257.1; -; Genomic_DNA.
DR PIR; S04462; APECA.
DR RefSeq; AP_004756.1; -.
DR RefSeq; NP_418681.1; -.
DR PDB; 1GYT; X-ray; 2.50 A; A/B/C/D/E/F/G/H/I/J/K/L=1-503.
DR PDBsum; 1GYT; -.
DR GeneID; 948791; -.
DR GenomeReviews; AP009048_GR; JW4217.
DR GenomeReviews; U00096_GR; b4260.
DR KEGG; ecj:JW4217; -.
DR KEGG; eco:b4260; -.
DR EchoBASE; EB0688; -.
DR EcoGene; EG10694; pepA.
DR HOGENOM; P68767; -.
DR BioCyc; EcoCyc:EG10694-MON; -.
DR GO; GO:0005737; C:cytoplasm; IEA:HAMAP.
DR GO; GO:0004177; F:aminopeptidase activity; IEA:HAMAP.
DR GO; GO:0030145; F:manganese ion binding; IEA:HAMAP.
DR GO; GO:0008235; F:metalloexopeptidase activity; IEA:HAMAP.
DR GO; GO:0006508; P:proteolysis; IEA:InterPro.
DR HAMAP; MF_00181; -; 1.
DR InterPro; IPR011356; Peptidase_M17.
DR InterPro; IPR000819; Peptidase_M17_C.
DR InterPro; IPR008283; Peptidase_M17_N.
DR PANTHER; PTHR11963:SF3; Peptidase_M17; 1.
DR Pfam; PF00883; Peptidase_M17; 1.
DR Pfam; PF02789; Peptidase_M17_N; 1.
DR PRINTS; PR00481; LAMNOPPTDASE.
DR PROSITE; PS00631; CYTOSOL_AP; 1.
PE 1: Evidence at protein level;
KW 3D-structure; Aminopeptidase; Complete proteome;
KW Direct protein sequencing; Hydrolase; Manganese; Metal-binding;
KW Protease.
FT CHAIN 1 503 Cytosol aminopeptidase.
FT /FTId=PRO_0000165750.
FT ACT_SITE 282 282 Potential.
FT ACT_SITE 356 356 Potential.
FT METAL 270 270 Manganese 2 (Probable).
FT METAL 275 275 Manganese 1 (Probable).
FT METAL 275 275 Manganese 2 (Probable).
FT METAL 293 293 Manganese 2 (Probable).
FT METAL 352 352 Manganese 1 (Probable).
FT METAL 354 354 Manganese 1 (Probable).
FT METAL 354 354 Manganese 2 (Probable).
FT MUTAGEN 354 354 E->A: Loss of activity.
FT STRAND 2 6
FT HELIX 10 12
FT STRAND 18 23
FT TURN 24 26
FT HELIX 30 36
FT STRAND 39 41
FT HELIX 42 49
FT STRAND 59 64
FT STRAND 69 77
FT HELIX 86 102
FT STRAND 106 110
FT HELIX 112 114
FT HELIX 122 137
FT STRAND 156 160
FT HELIX 164 166
FT HELIX 167 192
FT TURN 195 197
FT HELIX 200 213
FT TURN 214 217
FT STRAND 218 223
FT HELIX 225 230
FT HELIX 234 241
FT STRAND 243 245
FT STRAND 248 255
FT STRAND 265 275
FT HELIX 287 294
FT HELIX 295 310
FT STRAND 313 325
FT STRAND 337 339
FT STRAND 345 347
FT HELIX 355 365
FT HELIX 366 369
FT STRAND 372 378
FT HELIX 382 388
FT TURN 389 391
FT STRAND 392 398
FT HELIX 400 413
FT STRAND 417 419
FT HELIX 424 427
FT HELIX 428 430
FT STRAND 433 439
FT HELIX 446 455
FT STRAND 463 467
FT TURN 469 471
FT STRAND 472 474
FT HELIX 476 478
FT HELIX 486 496
SQ SEQUENCE 503 AA; 54880 MW; 643DED17EAC44DCD CRC64;
MEFSVKSGSP EKQRSACIVV GVFEPRRLSP IAEQLDKISD GYISALLRRG ELEGKPGQTL
LLHHVPNVLS ERILLIGCGK ERELDERQYK QVIQKTINTL NDTGSMEAVC FLTELHVKGR
NNYWKVRQAV ETAKETLYSF DQLKTNKSEP RRPLRKMVFN VPTRRELTSG ERAIQHGLAI
AAGIKAAKDL GNMPPNICNA AYLASQARQL ADSYSKNVIT RVIGEQQMKE LGMHSYLAVG
QGSQNESLMS VIEYKGNASE DARPIVLVGK GLTFDSGGIS IKPSEGMDEM KYDMCGAAAV
YGVMRMVAEL QLPINVIGVL AGCENMPGGR AYRPGDVLTT MSGQTVEVLN TDAEGRLVLC
DVLTYVERFE PEAVIDVATL TGACVIALGH HITGLMANHN PLAHELIAAS EQSGDRAWRL
PLGDEYQEQL ESNFADMANI GGRPGGAITA GCFLSRFTRK YNWAHLDIAG TAWRSGKAKG
ATGRPVALLA QFLLNRAGFN GEE
//
b) Swiss-Prot entry where no characterization has taken place but where
information has been added because the sequences are highly comparable
and so we believe, beyond reasonable doubt, that it is such a protein.
The lines that have been indented are those where information has been
added.
ID AMPA_HAEIN Reviewed; 491 AA.
AC P45334;
DT 01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
DT 01-NOV-1995, sequence version 1.
DT 20-JAN-2009, entry version 66.
DE RecName: Full=Cytosol aminopeptidase;
DE EC=3.4.11.1;
DE AltName: Full=Leucine aminopeptidase;
DE Short=LAP;
DE AltName: Full=Leucyl aminopeptidase;
GN Name=pepA; OrderedLocusNames=HI1705;
OS Haemophilus influenzae.
OC Bacteria; Proteobacteria; Gammaproteobacteria; Pasteurellales;
OC Pasteurellaceae; Haemophilus.
OX NCBI_TaxID=727;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 51907 / DSM 11121 / KW20 / Rd;
RX MEDLINE=95350630; [Pubmed: 7542800]
RA Fleischmann R.D., Adams M.D., White O., Clayton R.A., Kirkness E.F.,
RA Kerlavage A.R., Bult C.J., Tomb J.-F., Dougherty B.A., Merrick J.M.,
RA McKenney K., Sutton G.G., FitzHugh W., Fields C.A., Gocayne J.D.,
RA Scott J.D., Shirley R., Liu L.-I., Glodek A., Kelley J.M.,
RA Weidman J.F., Phillips C.A., Spriggs T., Hedblom E., Cotton M.D.,
RA Utterback T.R., Hanna M.C., Nguyen D.T., Saudek D.M., Brandon R.C.,
RA Fine L.D., Fritchman J.L., Fuhrmann J.L., Geoghagen N.S.M.,
RA Gnehm C.L., McDonald L.A., Small K.V., Fraser C.M., Smith H.O.,
RA Venter J.C.;
RT "Whole-genome random sequencing and assembly of Haemophilus influenzae
RT Rd.";
RL Science 269:496-512(1995).
CC -!- FUNCTION: Presumably involved in the processing and regular
CC turnover of intracellular proteins. Catalyzes the removal of
CC unsubstituted N-terminal amino acids from various peptides (By
CC similarity).
CC -!- CATALYTIC ACTIVITY: Release of an N-terminal amino acid, Xaa-|-
CC Yaa-, in which Xaa is preferably Leu, but may be other amino acids
CC including Pro although not Arg or Lys, and Yaa may be Pro. Amino
CC acid amides and methyl esters are also readily hydrolyzed, but
CC rates on arylamides are exceedingly low.
CC -!- COFACTOR: Binds 2 manganese ions per subunit (By similarity).
CC -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity).
CC -!- SIMILARITY: Belongs to the peptidase M17 family.
DR EMBL; L42023; AAC23351.1; -; Genomic_DNA.
DR PIR; C64137; C64137.
DR RefSeq; NP_439847.1; -.
DR HSSP; P11648; 1GYT.
DR MEROPS; M17.003; -.
DR GeneID; 949712; -.
DR GenomeReviews; L42023_GR; HI1705.
DR KEGG; hin:HI1705; -.
DR NMPDR; fig|71421.1.peg.1618; -.
DR TIGR; HI1705; -.
DR HOGENOM; P45334; -.
DR BioCyc; HINF71421:HI_1705-MON; -.
DR BRENDA; 3.4.11.1; 109.
DR GO; GO:0005737; C:cytoplasm; IEA:HAMAP.
DR GO; GO:0004177; F:aminopeptidase activity; IEA:HAMAP.
DR GO; GO:0030145; F:manganese ion binding; IEA:HAMAP.
DR GO; GO:0008235; F:metalloexopeptidase activity; IEA:HAMAP.
DR GO; GO:0006508; P:proteolysis; IEA:InterPro.
DR HAMAP; MF_00181; -; 1.
DR InterPro; IPR011356; Peptidase_M17.
DR InterPro; IPR000819; Peptidase_M17_C.
DR InterPro; IPR008283; Peptidase_M17_N.
DR PANTHER; PTHR11963:SF3; Peptidase_M17; 1.
DR Pfam; PF00883; Peptidase_M17; 1.
DR Pfam; PF02789; Peptidase_M17_N; 1.
DR PRINTS; PR00481; LAMNOPPTDASE.
DR PROSITE; PS00631; CYTOSOL_AP; 1.
PE 3: Inferred from homology;
KW Aminopeptidase; Complete proteome; Cytoplasm; Hydrolase; Manganese;
KW Metal-binding; Protease.
FT CHAIN 1 491 Cytosol aminopeptidase.
FT /FTId=PRO_0000165758.
FT ACT_SITE 275 275 Potential.
FT ACT_SITE 349 349 Potential.
FT METAL 263 263 Manganese 2 (By similarity).
FT METAL 268 268 Manganese 1 (By similarity).
FT METAL 268 268 Manganese 2 (By similarity).
FT METAL 286 286 Manganese 2 (By similarity).
FT METAL 345 345 Manganese 1 (By similarity).
FT METAL 347 347 Manganese 1 (By similarity).
FT METAL 347 347 Manganese 2 (By similarity).
SQ SEQUENCE 491 AA; 53529 MW; 71376DDB1B0076EB CRC64;
MKYQAKNTAL SQATDCIVLG VYENNKFSKS FNEIDQLTQG YLNDLVKSGE LTGKLAQTVL
LRDLQGLSAK RLLIVGCGKK GELTERQYKQ IIQAVLKTLK ETNTREVISY LTEIELKDRD
LYWNIRFAIE TIEHTNYQFD HFKSQKAETS VLESFIFNTD CAQAQQAISH ANAISSGIKA
ARDIANMPPN ICNPAYLAEQ AKNLAENSTA LSLKVVDEEE MAKLGMNAYL AVSKGSENRA
YMSVLTFNNA PDKNAKPIVL VGKGLTFDAG GISLKPAADM DEMKYDMCGA ASVFGTMKTI
AQLNLPLNVI GVLAGCENLP DGNAYRPGDI LTTMNGLTVE VLNTDAEGRL VLCDTLTYVE
RFEPELVIDV ATLTGACVVA LGQHNSGLVS TDNNLANALL QAATETTDKA WRLPLSEEYQ
EQLKSPFADL ANIGGRWGGA ITAGAFLSNF TKKYRWAHLD IAGTAWLQGA NKGATGRPVS
LLTQFLINQV K
//
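Entries such as the two above can be read programmatically by grouping lines
on their two-letter line code. The following Python sketch assumes the layout
shown in this document (line code, then the content, with the sequence rows
following the SQ line up to the "//" terminator). The function names are
hypothetical and the parser covers only what is needed to compare, for
example, the PE lines of the two entries.

```python
from collections import defaultdict


def parse_entry(entry_text: str):
    """Split one flat-file entry into (fields, sequence).

    fields maps each two-letter line code to the list of its line contents;
    sequence is the residue string with all whitespace removed.
    """
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in entry_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):          # entry terminator
            break
        if in_sequence:                    # rows after the SQ line are residues
            sequence_parts.append("".join(line.split()))
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
        if code == "SQ":
            in_sequence = True
    return dict(fields), "".join(sequence_parts)


def protein_existence_level(fields: dict) -> int:
    """Return the number in front of the colon on the PE line."""
    return int(fields["PE"][0].split(":", 1)[0])


def declared_length(fields: dict) -> int:
    """Return the length stated on the SQ line ('SEQUENCE <n> AA; ...')."""
    return int(fields["SQ"][0].split()[1])
```

A simple consistency check is to compare declared_length(fields) with
len(sequence) for each parsed entry.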
The alignment below shows that the degree of sequence similarity is such
that we can classify, beyond reasonable doubt, this protein as an
aminopeptidase A/I.
AMPA_ECOLI MEFSVKSGSPEKQRSACIVVGVFEPRRLSPIAEQLDKISDGYISALLRRG
AMPA_HAEIN MKYQAKN-TALSQATDCIVLGVYENNKFSKSFNEIDQLTQGYLNDLVKSG
*.. .*. .. .* ..***.**.* ...* ...*....**...*...*
AMPA_ECOLI ELEGKPGQTLLLHHVPNVLSERILLIGCGKERELDERQYKQVIQKTINTL
AMPA_HAEIN ELTGKLAQTVLLRDLQGLSAKRLLIVGCGKKGELTERQYKQIIQAVLKTL
**.** .**.**...... ..*.*..****. **.******.** ...**
AMPA_ECOLI NDTGSMEAVCFLTELHVKGRNNYWKVRQAVETAKETLYSFDQLKTNKSEP
AMPA_HAEIN KETNTREVISYLTEIELKDRDLYWNIRFAIETIEHTNYQFDHFKSQKAET
..*...*....***...*.*. **..* *.** ..* * **..*..*.*.
AMPA_ECOLI RRPLRKMVFNVPTRRELTSGERAIQHGLAIAAGIKAAKDLGNMPPNICNA
AMPA_HAEIN S-VLESFIFNTDC----AQAQQAISHANAISSGIKAARDIANMPPNICNP
. * ...**. . ...** *. **..*****.*..********.
AMPA_ECOLI AYLASQARQLADSYSKNVITRVIGEQQMKELGMHSYLAVGQGSQNESLMS
AMPA_HAEIN AYLAEQAKNLAEN-STALSLKVVDEEEMAKLGMNAYLAVSKGSENRAYMS
****.**..**.. *... .*..*..* .***..****..**.* . **
AMPA_ECOLI VIEYKGNASEDARPIVLVGKGLTFDSGGISIKPSEGMDEMKYDMCGAAAV
AMPA_HAEIN VLTFNNAPDKNAKPIVLVGKGLTFDAGGISLKPAADMDEMKYDMCGAASV
*..........*.************.****.**...************.*
AMPA_ECOLI YGVMRMVAELQLPINVIGVLAGCENMPGGRAYRPGDVLTTMSGQTVEVLN
AMPA_HAEIN FGTMKTIAQLNLPLNVIGVLAGCENLPDGNAYRPGDILTTMNGLTVEVLN
.*.*. .*.*.**.***********.*.*.******.****.* ******
AMPA_ECOLI TDAEGRLVLCDVLTYVERFEPEAVIDVATLTGACVIALGHHITGLMANHN
AMPA_HAEIN TDAEGRLVLCDTLTYVERFEPELVIDVATLTGACVVALGQHNSGLVSTDN
***********.********** ************.***.* .**....*
AMPA_ECOLI PLAHELIAASEQSGDRAWRLPLGDEYQEQLESNFADMANIGGRPGGAITA
AMPA_HAEIN NLANALLQAATETTDKAWRLPLSEEYQEQLKSPFADLANIGGRWGGAITA
**..*..*.....*.******..******.* ***.****** ******
AMPA_ECOLI GCFLSRFTRKYNWAHLDIAGTAWRSGKAKGATGRPVALLAQFLLNRAGFNGEE
AMPA_HAEIN GAFLSNFTKKYRWAHLDIAGTAWLQGANKGATGRPVSLLTQFLINQVK
* ***.**.**.*********** * .********.**.***.*..
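Features transferred by similarity have to be renumbered, because gaps shift
residue positions between the two sequences. In the entries above, the METAL
features sit at 270, 275, 293, 352 and 354 in AMPA_ECOLI and at 263, 268, 286,
345 and 347 in AMPA_HAEIN, which appears consistent with the seven gap columns
in the AMPA_HAEIN row before that region of the alignment. The Python sketch
below shows one way to map a position through a pairwise alignment; the
function name map_position is hypothetical and "-" is assumed to be the gap
character.

```python
def map_position(aligned_source: str, aligned_target: str, source_pos: int):
    """Map a 1-based residue position from the source to the target sequence.

    Both arguments are rows of the same pairwise alignment, with '-' for
    gaps. Returns the 1-based target position, or None when the source
    residue is aligned to a gap in the target.
    """
    if len(aligned_source) != len(aligned_target):
        raise ValueError("aligned rows must have the same length")
    src_index = tgt_index = 0
    for src_char, tgt_char in zip(aligned_source, aligned_target):
        if src_char != "-":
            src_index += 1
        if tgt_char != "-":
            tgt_index += 1
        if src_char != "-" and src_index == source_pos:
            return tgt_index if tgt_char != "-" else None
    raise ValueError("source_pos lies outside the source sequence")


# First block of the alignment shown above.
ECOLI_BLOCK = "MEFSVKSGSPEKQRSACIVVGVFEPRRLSPIAEQLDKISDGYISALLRRG"
HAEIN_BLOCK = "MKYQAKN-TALSQATDCIVLGVYENNKFSKSFNEIDQLTQGYLNDLVKSG"

print(map_position(ECOLI_BLOCK, HAEIN_BLOCK, 17))
```

A residue that falls opposite a gap has no counterpart, so a feature at that
position cannot be transferred and needs a curator's attention.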
(III) Protein sequence data from translation of genome sequencing data
Genome sequencing has caused a massive influx of data into the nucleotide
sequence databases and this has led to the same influx into TrEMBL, giving
thousands of entries waiting to go into Swiss-Prot. This sequence data is
submitted to the nucleotide sequence databases and is reported in
publications that show the entire genome sequence as well as genes that are
predicted by a number of methods. Apart from these gene designations the
papers rarely include experimental information about any of the predicted
proteins from these analyses. By combining what is reported with an
assessment of the sequence alignment results, which hit against both
characterized and partly characterized protein sequences (see above), we make
an effort to add relevant biochemical information to these translated
protein sequences.
The first step here is to align the translated sequences against Swiss-Prot
and TrEMBL. (We run against TrEMBL as an additional check for exact matches,
which helps to reduce redundancy in our data and to pick up PROSITE/Pfam
information that may be missing from the entry being worked on.) This is
described fully further on. The results give rise to the following
scenarios:
1. identical to an existing sequence in Swiss-Prot from the same organism
2. identical to an existing sequence in Swiss-Prot from a different
organism which may or may not be related
3. strong similarity (i.e. many residues are identical or conserved), over the
entire sequence, to an existing entry (from a related or different
organism)
4. strong similarity only at regions in the sequence (from same, related
or different organism)
5. some similarity to one or more existing entries
6. no similarity to any existing entries
Here is a detailed description of each of the above scenarios.
1) Identical to an existing sequence in Swiss-Prot from the same organism
Update the existing Swiss-Prot entry by adding the new reference and new
EMBL DR line. Check new reference for any additional information.
2) Identical to an existing sequence in Swiss-Prot from a different organism
which may or may not be related
We create a new entry based on the template entry. The majority of the
annotation information (comments, features, etc.) is copied with the
qualifier "By similarity" added. For example, the entry shown below has
been annotated based on the 100% identical (at protein level) entry from
E. coli which was shown in section II above.
ID AMPA_ECO57 Reviewed; 503 AA.
AC P68768; P11648;
DT 21-DEC-2004, integrated into UniProtKB/Swiss-Prot.
DT 21-DEC-2004, sequence version 1.
DT 16-DEC-2008, entry version 29.
DE RecName: Full=Cytosol aminopeptidase;
DE EC=3.4.11.1;
DE AltName: Full=Leucine aminopeptidase;
DE Short=LAP;
DE AltName: Full=Leucyl aminopeptidase;
DE AltName: Full=Aminopeptidase A/I;
GN Name=pepA; Synonyms=carP, xerB; OrderedLocusNames=Z5872, ECs5237;
OS Escherichia coli O157:H7.
OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC Enterobacteriaceae; Escherichia.
OX NCBI_TaxID=83334;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=O157:H7 / EDL933 / ATCC 700927 / EHEC;
RX MEDLINE=21074935; [Pubmed: 11206551] [Article from publisher]
RA Perna N.T., Plunkett G. III, Burland V., Mau B., Glasner J.D.,
RA Rose D.J., Mayhew G.F., Evans P.S., Gregor J., Kirkpatrick H.A.,
RA Posfai G., Hackett J., Klink S., Boutin A., Shao Y., Miller L.,
RA Grotbeck E.J., Davis N.W., Lim A., Dimalanta E.T., Potamousis K.,
RA Apodaca J., Anantharaman T.S., Lin J., Yen G., Schwartz D.C.,
RA Welch R.A., Blattner F.R.;
RT "Genome sequence of enterohaemorrhagic Escherichia coli O157:H7.";
RL Nature 409:529-533(2001).
RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=O157:H7 / Sakai / RIMD 0509952 / EHEC;
RX MEDLINE=21156231; [Pubmed: 11258796] [Article from publisher]
RA Hayashi T., Makino K., Ohnishi M., Kurokawa K., Ishii K., Yokoyama K.,
RA Han C.-G., Ohtsubo E., Nakayama K., Murata T., Tanaka M., Tobe T.,
RA Iida T., Takami H., Honda T., Sasakawa C., Ogasawara N., Yasunaga T.,
RA Kuhara S., Shiba T., Hattori M., Shinagawa H.;
RT "Complete genome sequence of enterohemorrhagic Escherichia coli
RT O157:H7 and genomic comparison with a laboratory strain K-12.";
RL DNA Res. 8:11-22(2001).
CC -!- FUNCTION: Presumably involved in the processing and regular
CC turnover of intracellular proteins. Catalyzes the removal of
CC unsubstituted N-terminal amino acids from various peptides.
CC Required for plasmid ColE1 site-specific recombination but not in
CC its aminopeptidase activity. Could act as a structural component
CC of the putative nucleoprotein complex in which the Xer
CC recombination reaction takes place (By similarity).
CC -!- CATALYTIC ACTIVITY: Release of an N-terminal amino acid, Xaa-|-
CC Yaa-, in which Xaa is preferably Leu, but may be other amino acids
CC including Pro although not Arg or Lys, and Yaa may be Pro. Amino
CC acid amides and methyl esters are also readily hydrolyzed, but
CC rates on arylamides are exceedingly low.
CC -!- COFACTOR: Binds 2 manganese ions per subunit (By similarity).
CC -!- ENZYME REGULATION: Inhibited by zinc and EDTA (By similarity).
CC -!- SUBUNIT: Homohexamer (By similarity).
CC -!- SIMILARITY: Belongs to the peptidase M17 family.
DR EMBL; AE005174; AAG59459.1; -; Genomic_DNA.
DR EMBL; BA000007; BAB38660.1; -; Genomic_DNA.
DR PIR; E91283; E91283.
DR PIR; G86124; G86124.
DR RefSeq; NP_290893.1; -.
DR RefSeq; NP_313264.1; -.
DR SMR; P68768; 1-503.
DR GeneID; 913804; -.
DR GeneID; 959777; -.
DR GenomeReviews; AE005174_GR; Z5872.
DR GenomeReviews; BA000007_GR; ECs5237.
DR KEGG; ece:Z5872; -.
DR KEGG; ecs:ECs5237; -.
DR HOGENOM; P68768; -.
DR BioCyc; ECOL83334:ECS5237-MON; -.
DR GO; GO:0005737; C:cytoplasm; IEA:HAMAP.
DR GO; GO:0004177; F:aminopeptidase activity; IEA:HAMAP.
DR GO; GO:0030145; F:manganese ion binding; IEA:HAMAP.
DR GO; GO:0008235; F:metalloexopeptidase activity; IEA:HAMAP.
DR GO; GO:0006508; P:proteolysis; IEA:InterPro.
DR HAMAP; MF_00181; -; 1.
DR InterPro; IPR011356; Peptidase_M17.
DR InterPro; IPR000819; Peptidase_M17_C.
DR InterPro; IPR008283; Peptidase_M17_N.
DR PANTHER; PTHR11963:SF3; Peptidase_M17; 1.
DR Pfam; PF00883; Peptidase_M17; 1.
DR Pfam; PF02789; Peptidase_M17_N; 1.
DR PRINTS; PR00481; LAMNOPPTDASE.
DR PROSITE; PS00631; CYTOSOL_AP; 1.
PE 3: Inferred from homology;
KW Aminopeptidase; Complete proteome; Hydrolase; Manganese;
KW Metal-binding; Protease.
FT CHAIN 1 503 Cytosol aminopeptidase.
FT /FTId=PRO_0000165752.
FT ACT_SITE 282 282 Potential.
FT ACT_SITE 356 356 Potential.
FT METAL 270 270 Manganese 2 (By similarity).
FT METAL 275 275 Manganese 1 (By similarity).
FT METAL 275 275 Manganese 2 (By similarity).
FT METAL 293 293 Manganese 2 (By similarity).
FT METAL 352 352 Manganese 1 (By similarity).
FT METAL 354 354 Manganese 1 (By similarity).
FT METAL 354 354 Manganese 2 (By similarity).
SQ SEQUENCE 503 AA; 54880 MW; 643DED17EAC44DCD CRC64;
MEFSVKSGSP EKQRSACIVV GVFEPRRLSP IAEQLDKISD GYISALLRRG ELEGKPGQTL
LLHHVPNVLS ERILLIGCGK ERELDERQYK QVIQKTINTL NDTGSMEAVC FLTELHVKGR
NNYWKVRQAV ETAKETLYSF DQLKTNKSEP RRPLRKMVFN VPTRRELTSG ERAIQHGLAI
AAGIKAAKDL GNMPPNICNA AYLASQARQL ADSYSKNVIT RVIGEQQMKE LGMHSYLAVG
QGSQNESLMS VIEYKGNASE DARPIVLVGK GLTFDSGGIS IKPSEGMDEM KYDMCGAAAV
YGVMRMVAEL QLPINVIGVL AGCENMPGGR AYRPGDVLTT MSGQTVEVLN TDAEGRLVLC
DVLTYVERFE PEAVIDVATL TGACVIALGH HITGLMANHN PLAHELIAAS EQSGDRAWRL
PLGDEYQEQL ESNFADMANI GGRPGGAITA GCFLSRFTRK YNWAHLDIAG TAWRSGKAKG
ATGRPVALLA QFLLNRAGFN GEE
//
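Scenarios 1 and 2 are the only ones that turn on an exact sequence match, so
they can be separated mechanically before a curator looks at the remaining
cases. The Python sketch below illustrates that triage. It is an assumption of
this example that organisms are compared by NCBI taxonomy identifier (the OX
line); the Entry class, the function name and the action strings are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_name: str
    taxon_id: int      # assumed to come from the OX NCBI_TaxID line
    sequence: str


def triage_exact_match(new: Entry, existing: list) -> tuple:
    """Return (action, matching entry or None) for a new sequence.

    Scenario 1: identical sequence, same organism -> update that entry.
    Scenario 2: identical sequence, other organism -> new entry, with the
                annotation copied "By similarity".
    Anything else is left for a curator to assess.
    """
    identical = [e for e in existing if e.sequence == new.sequence]
    for candidate in identical:
        if candidate.taxon_id == new.taxon_id:
            return ("update existing entry", candidate)
    if identical:
        return ("create new entry by similarity", identical[0])
    return ("needs curator assessment", None)


# Shortened sequences, for illustration only.
k12 = Entry("AMPA_ECOLI", 83333, "MEFSVKSGSP")
o157 = Entry("AMPA_ECO57", 83334, "MEFSVKSGSP")
print(triage_exact_match(o157, [k12])[0])
```

Scenarios 3 to 6 are deliberately not decided here, since they depend on a
curator's judgment rather than on a fixed rule.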
3) Strong similarity (i.e. many residues are identical or conserved), over
the entire sequence, to an existing entry (from a related or different
organism)
There is no fixed cut-off point in percentage sequence similarity. Curators
judge from experience whether similarity is strong or weak. For each
individual case, we must also check whether the sequences are highly
conserved between species. The following example illustrates this.
This entry has been created from data submitted by the Schizosaccharomyces
pombe genome project.
ID CHMU_SCHPO Reviewed; 251 AA.
AC O13739;
DT 15-JUL-1998, integrated into UniProtKB/Swiss-Prot.
DT 01-JAN-1998, sequence version 1.
DT 20-JAN-2009, entry version 59.
DE RecName: Full=Probable chorismate mutase;
DE Short=CM;
DE EC=5.4.99.5;
GN ORFNames=SPAC16E8.04c;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; [Pubmed: 11859360] [Article from publisher]
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
CC -!- CATALYTIC ACTIVITY: Chorismate = prephenate.
CC -!- ENZYME REGULATION: Allosterically regulated.
CC -!- PATHWAY: Metabolic intermediate biosynthesis; prephenate
CC biosynthesis; prephenate from chorismate: step 1/1.
CC -!- SUBUNIT: Homodimer (By similarity).
CC -!- SIMILARITY: Contains 1 chorismate mutase domain.
DR EMBL; CU329670; CAB11033.1; -; Genomic_DNA.
DR PIR; T37784; T37784.
DR RefSeq; NP_594216.1; -.
DR HSSP; P32178; 5CSM.
DR GeneID; 2542334; -.
DR KEGG; spo:SPAC16E8.04c; -.
DR NMPDR; fig|4896.1.peg.4186; -.
DR GeneDB_Spombe; SPAC16E8.04c; -.
DR BioCyc; SPOM-XXX-01:SPOM-XXX-01-001828-MON; -.
DR BRENDA; 5.4.99.5; 653.
DR ArrayExpress; O13739; -.
DR GO; GO:0005829; C:cytosol; IDA:GeneDB_SPombe.
DR GO; GO:0005634; C:nucleus; IDA:GeneDB_SPombe.
DR GO; GO:0004106; F:chorismate mutase activity; IEA:InterPro.
DR GO; GO:0009073; P:aromatic amino acid family biosynthetic pro...; IEA:InterPro.
DR InterPro; IPR002701; Chorismate_mut.
DR InterPro; IPR008238; Chorismate_mutase_AroQ_euk.
DR Gene3D; G3DSA:1.10.590.10; Chor_mut_AroQ_eu; 1.
DR PANTHER; PTHR21145; Chor_mut_AroQ_eu; 1.
DR Pfam; PF01817; CM_2; 1.
DR PIRSF; PIRSF017318; Chor_mut_AroQ_eu; 1.
DR TIGRFAMs; TIGR01802; CM_pl-yst; 1.
DR PROSITE; PS51169; CHORISMATE_MUT_3; 1.
PE 2: Evidence at transcript level;
KW Allosteric enzyme; Amino-acid biosynthesis;
KW Aromatic amino acid biosynthesis; Complete proteome; Isomerase.
FT CHAIN 1 251 Probable chorismate mutase.
FT /FTId=PRO_0000119205.
FT DOMAIN 1 251 Chorismate mutase.
SQ SEQUENCE 251 AA; 29050 MW; 1AC18AE4C1E6C4B7 CRC64;
MSLVNEKLKL ENIRSALIRQ EDTIIFNFLE RAQFPRNEKV YKSGKEGCLN LENYDGSFLN
YLLHEEEKVY ALVRRYASPE EYPFTDNLPE PILPKFSGKF PLHPNNVNVN SEILEYYINE
IVPKISSPGD DFDNYGSTVV CDIRCLQSLS RRIHYGKFVA EAKYLANPEK YKKLILARDI
KGIENEIVDA AQEERVLKRL HYKALNYGRD AADPTKPSDR INADCVASIY KDYVIPMTKK
VEVDYLLARL L
//
Aligning this sequence to its closest homolog in Swiss-Prot and TrEMBL gives
the following result:
CHMU_YEAST MDFTKPETVLNLQNIRDELVRMEDSIIFKFIERSHFATCPSVYEANHPG-
CHMU_SCHPO MSLVNEK--LKLENIRSALIRQEDTIIFNFLERAQFPRNEKVYKSGKEGC
*.... . *.*.***..*.* **.***.*.**..*. .**.... *
CHMU_YEAST LEIPNFKGSFLDWALSNLEIAHSRIRRFESPDETPFFPDKIQKSFLPSIN
CHMU_SCHPO LNLENYDGSFLNYLLHEEEKVYALVRRYASPEEYPF-TDNLPEPILP--K
*.. *..****.. * . * ... .**..**.* ** .*......** .
CHMU_YEAST YPQILAPYAPEVNYNDKIKKVYIEKIIPLISKRDGDDKNNFGSVATRDIE
CHMU_SCHPO FSGKFPLHPNNVNVNSEILEYYINEIVPKISSP-GDDFDNYGSTVVCDIR
.. .. .. .** *..* . **..*.* **.. *** .*.**... **
CHMU_YEAST CLQSLSRRIHFGKFVAEAKFQSDIPLYTKLIKSKDVEGIMKNITNSAVEE
CHMU_SCHPO CLQSLSRRIHYGKFVAEAKYLANPEKYKKLILARDIKGIENEIVDAAQEE
**********.********. .. *.*** ..*..** ..*...* **
CHMU_YEAST KILERLTKKAEVYGVDPTNES-GERRITPEYLVKIYKEIVIPITKEVEVE
CHMU_SCHPO RVLKRLHYKALNYGRDAADPTKPSDRINADCVASIYKDYVIPMTKKVEVD
..*.** ** ** *... . . **.......***. ***.**.***.
CHMU_YEAST YLLRRLEE
CHMU_SCHPO YLLARLL
*** **
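As a rough illustration of how an alignment such as the one above can be
quantified, the following Python sketch computes percent identity over the
aligned (non-gap) columns of one alignment block. The function name and the
decision to ignore gap columns are assumptions made for this example; it is
not the program used to produce the alignment, and no numerical threshold is
implied.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) columns that hold identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# One 50-column block of the CHMU_YEAST / CHMU_SCHPO alignment shown above.
yeast = "CLQSLSRRIHFGKFVAEAKFQSDIPLYTKLIKSKDVEGIMKNITNSAVEE"
schpo = "CLQSLSRRIHYGKFVAEAKYLANPEKYKKLILARDIKGIENEIVDAAQEE"
print(f"{percent_identity(yeast, schpo):.1f}% identity over this block")
```

To score the whole alignment, the blocks of each sequence would first be
concatenated in order.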
The sequences show a high degree of similarity over their entire lengths, so
it is highly likely that the sequence from the Schizosaccharomyces pombe
genome project is indeed a chorismate mutase. This allows us to add the
standard description line, comments describing the catalytic activity and
the pathway the enzyme is involved in, and the relevant keywords. We can
also add a subunit comment, but here we add "(By similarity)" to show that
this information comes from a characterized protein (in this case
CHMU_YEAST (P32178)) and has not been experimentally determined in S.
pombe. In addition, because this protein has not been biochemically
characterized, we add "Probable" to the DE line to indicate this, e.g.
"Probable chorismate mutase."
4) Strong similarity only at regions in the sequence (from same, related
or different organism)
These cases often pick up areas within a sequence that are responsible for
binding, for example, cofactors, metals, DNA or ATP/GTP. Here, a function
can often be assigned, leading to description lines, comments and keywords
being added to the new entry. In some cases, however, even though areas are
conserved there is insufficient evidence to characterize the protein. It
should be noted that we also make use of domain and family databases such as
PROSITE and Pfam in these cases. Below are examples of both these cases.
The entry below is again from the S. pombe genome project.
ID PPK14_SCHPO Reviewed; 566 AA.
AC Q09831;
DT 01-FEB-1996, integrated into UniProtKB/Swiss-Prot.
DT 01-FEB-1996, sequence version 1.
DT 20-JAN-2009, entry version 60.
DE RecName: Full=Serine/threonine-protein kinase ppk14;
DE EC=2.7.11.1;
GN Name=ppk14; ORFNames=SPAC4G8.05;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; PubMed=11859360;
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
RN [2]
RP IDENTIFICATION.
RX PubMed=15821139;
RA Bimbo A., Jia Y., Poh S.L., Karuturi R.K.M., den Elzen N., Peng X.,
RA Zheng L., O'Connell M., Liu E.T., Balasubramanian M.K., Liu J.;
RT "Systematic deletion analysis of fission yeast protein kinases.";
RL Eukaryot. Cell 4:799-813(2005).
RN [3]
RP PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT THR-379; SER-381 AND
RP THR-385, AND MASS SPECTROMETRY.
RX PubMed=18257517;
RA Wilson-Grady J.T., Villen J., Gygi S.P.;
RT "Phosphoproteome analysis of fission yeast.";
RL J. Proteome Res. 7:1088-1097(2008).
CC -!- CATALYTIC ACTIVITY: ATP + a protein = ADP + a phosphoprotein.
CC -!- SIMILARITY: Belongs to the protein kinase superfamily. Ser/Thr
CC protein kinase family. KIN82 subfamily.
CC -!- SIMILARITY: Contains 1 protein kinase domain.
DR EMBL; CU329670; CAA91206.1; -; Genomic_DNA.
DR PIR; S62482; S62482.
DR RefSeq; NP_593065.1; -.
DR HSSP; P31751; 1GZK.
DR GeneID; 2543432; -.
DR KEGG; spo:SPAC4G8.05; -.
DR NMPDR; fig|4896.1.peg.3035; -.
DR GeneDB_Spombe; SPAC4G8.05; -.
DR BioCyc; SPOM-XXX-01:SPOM-XXX-01-000780-MON; -.
DR BRENDA; 2.7.11.1; 653.
DR ArrayExpress; Q09831; -.
DR GO; GO:0005524; F:ATP binding; IEA:InterPro.
DR GO; GO:0004674; F:protein serine/threonine kinase activity; IEA:InterPro.
DR GO; GO:0006468; P:protein amino acid phosphorylation; IEA:InterPro.
DR InterPro; IPR000719; Prot_kinase_core.
DR InterPro; IPR017441; Protein_kinase_ATP_bd_CS.
DR InterPro; IPR017442; Se/Thr_pkinase-rel.
DR InterPro; IPR008271; Ser_thr_pkin_AS.
DR InterPro; IPR002290; Ser_thr_pkinase.
DR Pfam; PF00069; Pkinase; 1.
DR ProDom; PD000001; Prot_kinase; 1.
DR SMART; SM00220; S_TKc; 1.
DR PROSITE; PS00107; PROTEIN_KINASE_ATP; FALSE_NEG.
DR PROSITE; PS50011; PROTEIN_KINASE_DOM; 1.
DR PROSITE; PS00108; PROTEIN_KINASE_ST; 1.
PE 1: Evidence at protein level;
KW ATP-binding; Complete proteome; Kinase; Nucleotide-binding;
KW Phosphoprotein; Serine/threonine-protein kinase; Transferase.
FT CHAIN 1 566 Serine/threonine-protein kinase ppk14.
FT /FTId=PRO_0000086043.
FT DOMAIN 195 485 Protein kinase.
FT NP_BIND 201 209 ATP (By similarity).
FT ACT_SITE 320 320 Proton acceptor (By similarity).
FT BINDING 224 224 ATP (By similarity).
FT MOD_RES 379 379 Phosphothreonine.
FT MOD_RES 381 381 Phosphoserine.
FT MOD_RES 385 385 Phosphothreonine.
SQ SEQUENCE 566 AA; 63482 MW; 3D18B4F84E10AA13 CRC64;
MNELHDGESS EEGRINVEDH LEEAKKDDTG HWKHSGTAKP SKFRAFIRLH FKDSRKFAFS
RKKEKELTSE DSDAANQSPS GAPESQTEEE SDRKIDGTGS SAEGGDGSGT DSISVIKKSF
FKSGRKKKDV PKSRNVSRSN GADTSVQREK LKDIFSPHGK EKELAHIKKT VATRARTYSS
NSIKICDVEV GPSSFEKVFL LGKGDVGRVY LVREKKSGKF YAMKVLSKQE MIKRNKSKRA
FAEQHILATS NHPFIVTLYH SFQSDEYLYL CMEYCMGGEF FRALQRRPGR CLSENEAKFY
IAEVTAALEY LHLMGFIYRD LKPENILLHE SGHIMLSDFD LSKQSNSAGA PTVIQARNAP
SAQNAYALDT KSCIADFRTN SFVGTEEYIA PEVIKGCGHT SAVDWWTLGI LFYEMLYATT
PFKGKNRNMT FSNILHKDVI FPEYADAPSI SSLCKNLIRK LLVKDENDRL GSQAGAADVK
LHPFFKNVQW ALLRHTEPPI IPKLAPIDEK GNPNISHLKE SKSLDITHSP QNTQTVEVPL
SNLSGADHGD DPFESFNSVT VHHEWD
//
The alignment shows that the conserved areas lie around the ATP-binding
sites (which are also picked up by PROSITE and Pfam) and that the active
site is conserved as well. Hence we can add this information to the feature
table of the entry, flagged "By similarity". This shows that there is no
experimental proof but that the protein is very likely to be a
serine/threonine protein kinase, because conserved features of that family
of proteins are present in the sequence.
Below is the alignment to highlight this.
PPK14_SCHPO MNELHDGESSEEGRINVEDHLEEAKKDD---TGHWKHSGTAKPSKFRAFIRLHFKDSR
NRC2_NEUCR MPSTKNANGEGHFPSRIKQFFRINSGSKDHKDRDAHTTSSSHGGAPRADAKTPSGFRQSR
.: :**. .**: :::...**. .* . *. . ..: * *::**
PPK14_SCHPO KFAFSRKKEKELTSED-------SDAANQSPSGAPESQ--TEEESD-----RKIDGTGSS
NRC2_NEUCR FFSVGRLRSTTVVSEGNPLDESMSPTAHANPYFAHQGQPGLRHHNDGSVPPSPPDTPSLK
*:..* :.. :.**. * :*: .* * :.* ....* * .. .
PPK14_SCHPO AEGGDGSGTDSISVIKKSFFKSGRKKKDVPKSRNVS---RSNG---ADTSVQRE---KLK
NRC2_NEUCR VDGPEGS-QQPTAATKEELARKLRRVASAPNAQGLFSKGQGNGDRPATAELSKEPLEESK
.:* :** :. :. *:.: :. *: ..*:::.: :.** * :.:.:* : *
PPK14_SCHPO DIFSPHGKEKE--------------------LAHIKKTVATRARTYSSNSIKICDVEVGP
NRC2_NEUCR DSNTVGFAEQKPNNDSSTSLAAPDADGLGALPPPIRQSPLAFRRTYSSNSIKVRNVEVGP
* : *:: . *::: : *********: :*****
PPK14_SCHPO SSFEKVFLLGKGDVGRVYLVREKKSGKFYAMKVLSKQEMIKRNKSKRAFAEQHILATSNH
NRC2_NEUCR QSFDKIKLIGKGDVGKVYLVKEKKSGRLYAMKVLSKKEMIKRNKIKRALAEQEILATSNH
.**:*: *:******:****:*****::********:******* ***:***.*******
PPK14_SCHPO PFIVTLYHSFQSDEYLYLCMEYCMGGEFFRALQRRPGRCLSENEAKFYIAEVTAALEYLH
NRC2_NEUCR PFIVTLYHSFQSEDYLYLCMEYCSGGEFFRALQTRPGKCIPEDDARFYAAEVTAALEYLH
************::********* ********* ***:*:.*::*:** ***********
PPK14_SCHPO LMGFIYRDLKPENILLHESGHIMLSDFDLSKQSNSAGAPTVIQARNAPSAQNAYALDTKS
NRC2_NEUCR LMGFIYRDLKPENILLHQSGHIMLSDFDLSKQSDPGGKPTMIIGKNGTSTSSLPTIDTKS
*****************:***************:..* **:* .:*..*:.. ::****
PPK14_SCHPO CIADFRTNSFVGTEEYIAPEVIKGCGHTSAVDWWTLGILFYEMLYATTPFKGKNRNMTFS
NRC2_NEUCR CIANFRTNSFVGTEEYIAPEVIKGSGHTSAVDWWTLGILIYEMLYGTTPFKGKNRNATFA
***:********************.**************:*****.********** **:
PPK14_SCHPO NILHKDVIFPEYADAPSISSLCKNLIRKLLVKDENDRLGSQAGAADVKLHPFFKNVQWAL
NRC2_NEUCR NILREDIPFPDHAGAPQISNLCKSLIRKLLIKDENRRLGARAGASDIKTHPFFRTTQWAL
***::*: **::*.**.**.***.******:**** ***::***:*:* ****:..****
PPK14_SCHPO LRHTEPPIIPKLAPIDEKGNPNISHLKESKSLDITHSPQNTQTVEVPLSNLSG-ADHGDD
NRC2_NEUCR IRHMKPPIVPNQGRG--IDTLNFRNVKESESVDISGSRQMGLKGEPLESGMVTPGENAVD
:** :***:*: . .. *: ::***:*:**: * * . * *.: .::. *
PPK14_SCHPO PFESFNSVTVHHEWD
NRC2_NEUCR PFEEFNSVTLHHDGDEEYHSDAYEKR
***.*****:**: *
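The "(By similarity)" qualifiers in the PPK14_SCHPO feature table can also
be picked out programmatically. The following Python sketch is an
illustration only: the names Feature and parse_ft_line are hypothetical, and
the parser splits on whitespace as in the excerpts shown in this document
rather than relying on fixed column positions.

```python
import re
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    key: str
    start: int
    end: int
    description: str
    qualifier: Optional[str]  # "By similarity", "Potential", "Probable" or None


# A non-experimental qualifier at the end of the description, with or
# without surrounding parentheses.
QUALIFIER = re.compile(r"\(?(By similarity|Potential|Probable)\)?\.?$", re.IGNORECASE)


def parse_ft_line(line: str) -> Optional[Feature]:
    """Parse one FT line; return None for continuation lines such as /FTId."""
    fields = line.split(None, 4)
    if (len(fields) < 4 or fields[0] != "FT"
            or not (fields[2].isdigit() and fields[3].isdigit())):
        return None
    text = fields[4].strip() if len(fields) == 5 else ""
    match = QUALIFIER.search(text)
    if match:
        return Feature(fields[1], int(fields[2]), int(fields[3]),
                       text[:match.start()].strip(), match.group(1).capitalize())
    return Feature(fields[1], int(fields[2]), int(fields[3]), text.rstrip("."), None)


ppk14_ft = [
    "FT DOMAIN 195 485 Protein kinase.",
    "FT NP_BIND 201 209 ATP (By similarity).",
    "FT ACT_SITE 320 320 Proton acceptor (By similarity).",
    "FT BINDING 224 224 ATP (By similarity).",
    "FT MOD_RES 379 379 Phosphothreonine.",
]
for feature in filter(None, map(parse_ft_line, ppk14_ft)):
    status = feature.qualifier or "no qualifier"
    print(f"{feature.key:<9}{feature.start:>4}-{feature.end:<4} {status}")
```

Features without a qualifier, such as the phosphorylation sites taken from
reference [3], are reported separately from those inferred by similarity.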
5) Some similarity to one or more existing entries
It is in this category that the adjective "putative" comes into play. In
these cases, again, there is no experimental proof that the protein exists
and there is only limited evidence assigning the protein to a particular
family. Again, we have no fixed rules on what is "limited" and what isn't.
It is a judgement that we make based on which family it is and which, if
any, areas are conserved. Below is one example of the many that exist in
Swiss-Prot. From the alignments and from hits to the pattern databases we
attempt to add any information so that it is not lost. By using "putative"
in the description line we are showing that there is evidence within the
sequence data but that we do not want to classify the protein definitively
until experimental proof is available. When it is, the entry will be updated
accordingly. Staying with the S. pombe project, the following entry shows
this.
ID YA55_SCHPO Reviewed; 513 AA.
AC Q09735;
DT 01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
DT 01-NOV-1995, sequence version 1.
DT 25-NOV-2008, entry version 63.
DE RecName: Full=Putative aminopeptidase C13A11.05;
DE EC=3.4.11.-;
GN ORFNames=SPAC13A11.05;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; PubMed=11859360;
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
CC -!- COFACTOR: Binds 2 zinc ions per subunit (By similarity).
CC -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity).
CC -!- SIMILARITY: Belongs to the peptidase M17 family.
DR EMBL; CU329670; CAA90806.1; -; Genomic_DNA.
DR PIR; T37612; T37612.
DR RefSeq; NP_592993.1; -.
DR HSSP; P00727; 1BPN.
DR MEROPS; M17.009; -.
DR GeneID; 2542130; -.
DR KEGG; spo:SPAC13A11.05; -.
DR NMPDR; fig|4896.1.peg.2963; -.
DR GeneDB_Spombe; SPAC13A11.05; -.
DR BioCyc; SPOM-XXX-01:SPOM-XXX-01-000717-MON; -.
DR ArrayExpress; Q09735; -.
DR GO; GO:0005737; C:cytoplasm; IEA:InterPro.
DR GO; GO:0004177; F:aminopeptidase activity; IEA:InterPro.
DR GO; GO:0030145; F:manganese ion binding; IEA:InterPro.
DR GO; GO:0008235; F:metalloexopeptidase activity; IEA:InterPro.
DR GO; GO:0008270; F:zinc ion binding; IEA:UniProtKB-KW.
DR GO; GO:0006508; P:proteolysis; IEA:InterPro.
DR InterPro; IPR011356; Peptidase_M17.
DR InterPro; IPR000819; Peptidase_M17_C.
DR InterPro; IPR008283; Peptidase_M17_N.
DR PANTHER; PTHR11963:SF3; Peptidase_M17; 1.
DR Pfam; PF00883; Peptidase_M17; 1.
DR Pfam; PF02789; Peptidase_M17_N; 1.
DR PRINTS; PR00481; LAMNOPPTDASE.
DR PROSITE; PS00631; CYTOSOL_AP; 1.
PE 2: Evidence at transcript level;
KW Aminopeptidase; Complete proteome; Cytoplasm; Hydrolase;
KW Metal-binding; Protease; Zinc.
FT CHAIN 1 513 Putative aminopeptidase C13A11.05.
FT /FTId=PRO_0000165853.
FT ACT_SITE 292 292 Potential.
FT ACT_SITE 366 366 Potential.
FT METAL 280 280 Zinc 2 (By similarity).
FT METAL 285 285 Zinc 1 (By similarity).
FT METAL 285 285 Zinc 2 (By similarity).
FT METAL 303 303 Zinc 2 (By similarity).
FT METAL 362 362 Zinc 1 (By similarity).
FT METAL 364 364 Zinc 1 (By similarity).
FT METAL 364 364 Zinc 2 (By similarity).
SQ SEQUENCE 513 AA; 56195 MW; F904CC0607502018 CRC64;
MKGLGLSTRT FNWSSLSSIL LPRIPLATTK ADSLILAVRH DKQVFSEDYR QVVDQYFETS
PKKNDIRLFW NTQGFVRLAI VQLEENVSEK SVRSAAAEAA KILKSNGAKS IAVDGMGFPK
DAALGAALAT YDFSLRRDHL SVYQDEKVVE KENLFTSPAP ERLTFQLLSN TSEKKTATAE
ENAFKVGLIE AAAQNLARSL MECPANYMTS LQFCHFAQEL FQNSSKVKVF VHDEKWIDEQ
KMNGLLTVNA GSDIPPRFLE VQYIGKEKSK DDGWLGLVGK GVTFDSGGIS IKPSQNMKEM
RADMGGAAVM LSSIYALEQL SIPVNAVFVT PLTENLPSGS AAKPGDVIFM RNGLSVEIDN
TDAEGRLILA DAVHYVSSQY KTKAVIEAST LTGAMLVALG NVFTGAFVQG EELWKNLETA
SHDAGDLFWR MPFHEAYLKQ LTSSSNADLC NVSRAGGGCC TAAAFIKCFL AQKDLSFAHL
DIAGVMDKQL NSWDCDGMSG RPVRTIIEVA RKY
//
The alignment below shows that all the functional sites, i.e. the metal ion
binding sites and the active sites, are conserved between the S. pombe
sequence and the bovine one. However, because of the nature of the family it
is not possible, with the evidence available, to classify this protein
completely. Hence all available information is added and the entry is
referred to as a "putative" aminopeptidase.
YA55_SCHPO MKGLGLSTRTFNWSSLSSILLPRIPLATTKADSLIL-AVRHDKQVFSEDYRQVVDQYFET
AMPL_BOVIN TKGLVLGIYSKEKEEDE----PQFTSAGENFNKLVSGKLREILNISGPSLKAGKTRTFYG
*** *. : : .. . *::. * : :.*: :*. :: . . : : *
YA55_SCHPO SPKKNDIRLFWNTQGFVRLAIVQLEENVSE--KSVRSAAAEAAKILKSNGAKSIAVDGMG
AMPL_BOVIN --LHEDFPSVVVVGLGKKTAGIDEQENWHEGKENIRAAVAAGCRQIQDLEIPSVEVDPCG
::*: . . : * :: :** * :.:*:*.* ..: ::. *: ** *
YA55_SCHPO FPKDAALGAALATYDFSLRRDHLSVYQDEKVVEKENLFTSPAPERLTFQLLSNTSEKKTA
AMPL_BOVIN DAQAAAEGAVLGLYEYDDLK------QKRKVVVSAKLHGSEDQE----------------
.: ** **.*. *::. : *..*** . :*. * *
YA55_SCHPO TAEENAFKVGLIEAAAQNLARSLMECPANYMTSLQFCHFAQELFQ-NSSKVKVFVHDEKW
AMPL_BOVIN -----AWQRGVLFASGQNLARRLMETPANEMTPTKFAEIVEENLKSASIKTDVFIRPKSW
*:: *:: *:.***** *** *** **. :*..:.:* :: * *..**:: :.*
YA55_SCHPO IDEQKMNGLLTVNAGSDIPPRFLEVQYIGKEKSKDDGWLGLVGKGVTFDSGGISIKPSQN
AMPL_BOVIN IEEQEMGSFLSVAKGSEEPPVFLEIHYKGSPNASE-PPLVFVGKGITFDSGGISIKAAAN
*:**:*..:*:* **: ** ***::* *. ::.: * :****:**********.: *
YA55_SCHPO MKEMRADMGGAAVMLSSIYALEQLSIPVNAVFVTPLTENLPSGSAAKPGDVIFMRNGLSV
AMPL_BOVIN MDLMRADMGGAATICSAIVSAAKLDLPINIVGLAPLCENMPSGKANKPGDVVRARNGKTI
*. *********.: *:* : :*.:*:* * ::** **:***.* *****: *** ::
YA55_SCHPO EIDNTDAEGRLILADAVHYVSSQYKTKAVIEASTLTGAMLVALGNVFTGAFVQGEELWKN
AMPL_BOVIN QVDNTDAEGRLILADALCYAHT-FNPKVIINAATLTGAMDIALGSGATGVFTNSSWLWNK
::**************: *. : ::.*.:*:*:****** :***. **.*.:.. **::
YA55_SCHPO LETASHDAGDLFWRMPFHEAYLKQLTSSSNADLCNVSRAG-GGCCTAAAFIKCFLAQKDL
AMPL_BOVIN LFEASIETGDRVWRMPLFEHYTRQVIDCQLADVNNIGKYRSAGACTAAAFLKEFVTHP--
* ** ::** .****:.* * :*: ... **: *:.: .*.******:* *:::
YA55_SCHPO SFAHLDIAGVMD-KQLNSWDCDGMSGRPVRTIIEVARKY-----
AMPL_BOVIN KWAHLDIAGVMTNKDEVPYLRKGMAGRPTRTLIEFLFRFSQDSA
.:********* *: .: .**:***.**:**. ::
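The prefixes used in the description lines of the example entries
("Probable", "Putative", "Uncharacterized") can be read back from a DE line.
The following Python sketch is only an illustration: the function name and
the list of prefixes are assumptions drawn from the entries shown in this
document, not an official vocabulary.

```python
import re
from typing import Optional, Tuple

DE_FULL = re.compile(r"RecName:\s*Full=(?P<name>[^;]+);")
# Prefixes seen in the example entries of this document (assumed list).
PREFIXES = ("Probable", "Putative", "Uncharacterized")


def name_qualifier(de_line: str) -> Tuple[str, Optional[str]]:
    """Return (full name, leading qualifier or None) for a DE RecName line."""
    match = DE_FULL.search(de_line)
    if match is None:
        raise ValueError("no 'RecName: Full=' field found")
    name = match.group("name").strip()
    first_word = name.split()[0]
    return (name, first_word if first_word in PREFIXES else None)


for de in (
    "DE RecName: Full=Serine/threonine-protein kinase ppk14;",
    "DE RecName: Full=Putative aminopeptidase C13A11.05;",
    "DE RecName: Full=Uncharacterized WD repeat-containing protein C1A6.02;",
):
    name, qualifier = name_qualifier(de)
    print(f"{qualifier or 'no prefix':<16}{name}")
```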
6) No similarity to any existing entries
From the genome sequencing data the majority of proteins translated from
predicted open reading frames have no sequence similarity to any existing
proteins. In these cases the proteins remain "hypothetical". It should be
noted here that we analyze these sequences by a number of programs so that
we can at least add some potential information, rather than having just an
entry containing submission and sequence data. Again, in these cases, care
is taken to show that this information is potential so that it cannot be
mixed up with data from classified proteins.
The features we currently look for are signal sequences, transmembrane
regions, coiled coil domains and a number of conserved domains described
in PROSITE, Pfam and SMART.
a) Signal sequence prediction
We make use of the SignalP program [R1] in its latest implementation
(version 3.0). The method incorporates a prediction of cleavage sites and
a signal peptide/non-signal peptide prediction based on a combination of
several artificial neural networks and hidden Markov models. The result in
the entry is of the type:
FT SIGNAL 1 x Potential.
FT CHAIN x y
b) Transmembrane region prediction
Transmembrane helices are predicted using the TMHMM (version 2.0) program
[R2] which we have found [R3] to give the best results. In some cases we
complement the results of this method with predictions obtained with two
other programs, ESKM [R4] and MEMSAT [R5].
Predicted transmembrane helices are indicated as:
FT TRANSMEM x y Potential.
c) Coiled coil prediction
We make use of a program based on the algorithm of Lupas et al. [R6] that
predicts coiled coil regions within the sequence. A positive result of this
program is annotated as:
FT DOMAIN x y Coiled coil (Potential).
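The three kinds of predicted feature described above share one convention:
the result is written to the feature table and marked "Potential". A minimal
Python sketch of that convention follows. The function name, the example
coordinates and the fixed-width column layout are assumptions made for
illustration; the excerpts in this document show the same fields separated
by single spaces.

```python
def potential_feature(key: str, start: int, end: int, description: str = "") -> str:
    """Format one FT line for a predicted (non-experimental) feature."""
    note = f"{description} (Potential)." if description else "Potential."
    # Assumed layout: key in columns 6-13, positions right-aligned in
    # columns 15-20 and 22-27, description starting at column 35.
    return f"FT   {key:<8} {start:>6} {end:>6}       {note}"


# Hypothetical coordinates, for illustration only.
for line in (
    potential_feature("SIGNAL", 1, 22),
    potential_feature("TRANSMEM", 105, 125),
    potential_feature("DOMAIN", 240, 290, "Coiled coil"),
):
    print(line)
```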
d) REP
The program REP [R7] is used to annotate a number of well defined, yet
very variable protein repeats. The program currently recognizes the
following types of repeats: Ankyrin, Armadillo, HAT, HEAT, HEAT_AAA,
HEAT_ADB, HEAT_IMB, Kelch, Leucine-rich Repeats, PFTA, PFTB, RCC1, TPR
and WD40.
Repeats detected by this program are annotated in the feature table;
specific keywords and CC lines are also added to the entry. In the
following example, the information added following the detection of a
repeat appears in the CC SIMILARITY line, the "Repeat" and "WD repeat"
keywords and the REPEAT features:
ID YEX2_SCHPO Reviewed; 361 AA.
AC O13856; Q1K9B5;
DT 16-AUG-2004, integrated into UniProtKB/Swiss-Prot.
DT 01-JAN-1998, sequence version 1.
DT 03-MAR-2009, entry version 46.
DE RecName: Full=Uncharacterized WD repeat-containing protein C1A6.02;
GN ORFNames=SPAC1A6.02, SPAC23C4.21;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; PubMed=11859360;
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
RN [2]
RP SUBCELLULAR LOCATION [LARGE SCALE ANALYSIS].
RX PubMed=16823372;
RA Matsuyama A., Arai R., Yashiroda Y., Shirai A., Kamata A., Sekido S.,
RA Kobayashi Y., Hashimoto A., Hamamoto M., Hiraoka Y., Horinouchi S.,
RA Yoshida M.;
RT "ORFeome cloning and global analysis of protein localization in the
RT fission yeast Schizosaccharomyces pombe.";
RL Nat. Biotechnol. 24:841-847(2006).
CC -!- SUBCELLULAR LOCATION: Nucleus, nucleolus.
CC -!- SIMILARITY: Contains 6 WD repeats.
DR EMBL; CU329670; CAB16892.2; -; Genomic_DNA.
DR PIR; T38005; T38005.
DR RefSeq; XP_001713048.1; -.
DR GeneID; 3361484; -.
DR KEGG; spo:SPAC1A6.02; -.
DR GeneDB_Spombe; SPAC1A6.02; -.
DR ArrayExpress; O13856; -.
DR GO; GO:0005730; C:nucleolus; IDA:GeneDB_SPombe.
DR InterPro; IPR015943; WD40/YVTN_repeat-like.
DR InterPro; IPR001680; WD40_repeat.
DR InterPro; IPR017986; WD40_repeat_region.
DR InterPro; IPR017422; WD_repeat_p55.
DR Gene3D; G3DSA:2.130.10.10; WD40/YVTN_repeat-like; 1.
DR PIRSF; PIRSF038169; WD_repeat_p55; 1.
DR SMART; SM00320; WD40; 5.
DR PROSITE; PS00678; WD_REPEATS_1; FALSE_NEG.
DR PROSITE; PS50082; WD_REPEATS_2; FALSE_NEG.
DR PROSITE; PS50294; WD_REPEATS_REGION; 1.
PE 2: Evidence at transcript level;
KW Complete proteome; Nucleus; Repeat; WD repeat.
FT CHAIN 1 361 Uncharacterized WD repeat-containing
FT protein C1A6.02.
FT /FTId=PRO_0000051486.
FT REPEAT 57 96 WD 1.
FT REPEAT 103 142 WD 2.
FT REPEAT 146 184 WD 3.
FT REPEAT 187 229 WD 4.
FT REPEAT 237 275 WD 5.
FT REPEAT 280 318 WD 6.
SQ SEQUENCE 361 AA; 39780 MW; 38DD785710325C03 CRC64;
MGGTINAAIK QKFENEIFDL ACFGENQVLL GFSNGRVSSY QYDVAQISLV EQWSTKRHKK
SCRNISVNES GTEFISVGSD GVLKIADTST GRVSSKWIVD KNKEISPYSV VQWIENDMVF
ATGDDNGCVS VWDKRTEGGI IHTHNDHIDY ISSISPFEER YFVATSGDGV LSVIDARNFK
KPILSEEQDE EMTCGAFTRD QHSKKKFAVG TASGVITLFT KGDWGDHTDR ILSPIRSHDF
SIETITRADS DSLYVGGSDG CIRLLHILPN KYERIIGQHS SRSTVDAVDV TTEGNFLVSC
SGTELAFWPV DQKEGDESSS SDNLDSDEDS SSDSEFSSPK KKKKVGNQGK KPLGTDFFDG
L
//
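In the YEX2_SCHPO entry above, the REPEAT features and the "Contains 6 WD
repeats." comment carry the same information. The following Python sketch
shows one way such a comment could be checked against the feature table. It
is a hypothetical consistency check written for this example, not part of
the REP program.

```python
from collections import Counter
from typing import Iterable, List


def repeat_summary(ft_lines: Iterable[str]) -> List[str]:
    """Build 'Contains N <type> repeat(s).' sentences from FT REPEAT lines."""
    counts: Counter = Counter()
    for line in ft_lines:
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[:2] == ["FT", "REPEAT"]:
            # "WD 3." -> repeat type "WD"; the trailing number is the ordinal.
            words = fields[4].rstrip(".").split()
            if words and words[-1].isdigit():
                words = words[:-1]
            counts[" ".join(words)] += 1
    return [
        f"Contains {n} {name} repeat{'s' if n != 1 else ''}."
        for name, n in counts.items()
    ]


yex2_ft = [
    "FT REPEAT 57 96 WD 1.",
    "FT REPEAT 103 142 WD 2.",
    "FT REPEAT 146 184 WD 3.",
    "FT REPEAT 187 229 WD 4.",
    "FT REPEAT 237 275 WD 5.",
    "FT REPEAT 280 318 WD 6.",
]
print(repeat_summary(yex2_ft))
```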
e) PROSITE
PROSITE (http://www.expasy.org/prosite/), the database of protein domains
and families, plays a major role in the addition of features to Swiss-Prot
entries, especially when no other information is available for the
sequence. Where patterns are matched, this can lead to the addition of
comment lines, keywords and features, either individually or in any
combination. As an example:
ID NOP12_SCHPO Reviewed; 438 AA.
AC O13741;
DT 02-NOV-2001, integrated into UniProtKB/Swiss-Prot.
DT 01-JAN-1998, sequence version 1.
DT 03-MAR-2009, entry version 59.
DE RecName: Full=Nucleolar protein 12;
GN Name=nop12; ORFNames=SPAC16E8.06c;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; PubMed=11859360;
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
RN [2]
RP PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-94 AND SER-95, AND MASS
RP SPECTROMETRY.
RX PubMed=18257517;
RA Wilson-Grady J.T., Villen J., Gygi S.P.;
RT "Phosphoproteome analysis of fission yeast.";
RL J. Proteome Res. 7:1088-1097(2008).
CC -!- FUNCTION: Involved in pre-25S rRNA processing (By similarity).
CC -!- SUBCELLULAR LOCATION: Nucleus, nucleolus (By similarity).
CC -!- SIMILARITY: Belongs to the RRM RBM34 family.
CC -!- SIMILARITY: Contains 2 RRM (RNA recognition motif) domains.
DR EMBL; CU329670; CAB11047.1; -; Genomic_DNA.
DR PIR; T37786; T37786.
DR RefSeq; NP_594218.1; -.
DR HSSP; P33240; 1P1T.
DR GeneID; 2542314; -.
DR KEGG; spo:SPAC16E8.06c; -.
DR NMPDR; fig|4896.1.peg.4188; -.
DR GeneDB_Spombe; SPAC16E8.06c; -.
DR BioCyc; SPOM-XXX-01:SPOM-XXX-01-001830-MON; -.
DR ArrayExpress; O13741; -.
DR GO; GO:0005730; C:nucleolus; IDA:GeneDB_SPombe.
DR GO; GO:0000166; F:nucleotide binding; IEA:InterPro.
DR GO; GO:0003723; F:RNA binding; IEA:UniProtKB-KW.
DR GO; GO:0006364; P:rRNA processing; IEA:UniProtKB-KW.
DR InterPro; IPR012677; a_b_plait_nuc_bd.
DR InterPro; IPR000504; RRM_RNP1.
DR Gene3D; G3DSA:3.30.70.330; a_b_plait_nuc_bd; 2.
DR Pfam; PF00076; RRM_1; 1.
DR SMART; SM00360; RRM; 2.
DR PROSITE; PS50102; RRM; 2.
PE 1: Evidence at protein level;
KW Complete proteome; Nucleus; Phosphoprotein; Repeat;
KW Ribosome biogenesis; RNA-binding; rRNA processing.
FT CHAIN 1 438 Nucleolar protein 12.
FT /FTId=PRO_0000081673.
FT DOMAIN 164 262 RRM 1.
FT DOMAIN 270 348 RRM 2.
FT COMPBIAS 20 23 Poly-Ser.
FT COMPBIAS 81 90 Poly-Lys.
FT MOD_RES 94 94 Phosphoserine.
FT MOD_RES 95 95 Phosphoserine.
SQ SEQUENCE 438 AA; 49381 MW; 3E943401F95E7C12 CRC64;
MGETNSSLDN ENTSFVGKLS SSSNVDPTLN LLFSQSKPIP KPVAKETTVL TKKDVEVEEA
NGVEEAAETI ESDTKEVQNI KPKSKKKKKK LNDSSDDIEG KYFEELLAEE DEEKDKDSAG
LINDEEDKSP AKQSVLEERT SQEDVKSERE VAEKLANELE KSDKTVFVNN LPARVVTNKG
DYKDLTKHFR QFGAVDSIRF RSLAFSEAIP RKVAFFEKKF HSERDTVNAY IVFRDSSSAR
SALSLNGTMF MDRHLRVDSV SHPMPQDTKR CVFVGNLAFE AEEEPLWRYF GDCGSIDYVR
IVRDPKTNLG KGFAYIQFKD TMGVDKALLL NEKKMPEGRT LRIMRAKSTK PKSITRSKRG
DEKTRTLQGR ARKLIGKAGN ALLQQELALE GHRAKPGENP LAKKKVNKKR KERAAQWRNK
KAESVGKKQK TAAGKKDK
//
In the above example note the PROSITE entries represented in the DR lines.
These matches have helped in the addition of the similarity comment and the
RNA-binding RRM domains to the feature table.
We have a method that automatically annotates a number of sites or domains
using PROSITE patterns and profiles. All features copied into the feature
table using this facility are closely assessed to ensure that they are valid
for the particular sequence from that particular organism.
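To make the idea of a pattern match concrete, the following Python sketch
translates a pattern written in PROSITE syntax into a regular expression and
reports the matching positions in a sequence. It is a simplified
illustration, not the annotation method referred to above: the function
names, the example pattern and the test sequence are assumptions, only the
basic syntax elements are handled, and PROSITE profiles (such as the RRM
profile matched in the entry above) cannot be expressed this way.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # '<' anchors to the N-terminus
        at_end = element.endswith(">")       # '>' anchors to the C-terminus
        element = element.strip("<>")
        residue, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if residue == "x":
            core = "."                            # x: any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"     # {P}: any residue except P
        else:
            core = residue                        # single residue or [ST] set
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(parts)


def find_matches(pattern: str, sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) positions of all matches, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in regex.finditer(sequence)]


# Illustrative pattern and made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_matches("N-{P}-[ST]-{P}.", "AANGSAANPSA"))
```

A lookahead is used so that overlapping matches are all reported. As the
paragraph above stresses, a match of this kind is only a starting point and
still has to be assessed for the particular sequence.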
f) Pfam
Pfam [R10] (http://pfam.sanger.ac.uk/) is a large collection of multiple
sequence alignments and hidden Markov models covering many common protein
domains. Great use is made of this database, in conjunction with PROSITE,
for the automatic addition of annotation to TrEMBL entries. It also provides
important information for the curators as they begin to annotate TrEMBL
entries by highlighting the type of domain the sequence has.
g) Tyrosine sulfation sites
Tyrosine sulfation sites are predicted using a software tool called the
Sulfinator [R12]. The Sulfinator employs four different hidden Markov
models. The program is only run on eukaryotic proteins that are predicted
or supposed to be secreted or to have at least one extracellular
domain. The sulfation sites are indicated as being "Potential". Example:
FT MOD_RES 200 200 Sulfotyrosine (Potential).
Extradom.txt
This file outlines the nomenclature proposal for domains (or modules) found
mainly in extracellular proteins of higher eukaryotes. It shows the standard
nomenclature applied to these classified domains in Swiss-Prot entries. It
can be found via the Web at http://www.uniprot.org/docs/extradom. It is one
of numerous documents (available from http://www.uniprot.org/docs/) that are
distributed with UniProtKB/Swiss-Prot.
Please note that when a modification or a binding event is annotated,
"Potential" is added to show that it has not been determined experimentally.
Below is an example of such a case.
ID YA9A_SCHPO Reviewed; 530 AA.
AC Q09788;
DT 01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
DT 01-NOV-1995, sequence version 1.
DT 25-NOV-2008, entry version 47.
DE RecName: Full=Uncharacterized serine-rich protein C13G6.10c;
DE Flags: Precursor;
GN ORFNames=SPAC13G6.10c;
OS Schizosaccharomyces pombe (Fission yeast).
OC Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina;
OC Schizosaccharomycetes; Schizosaccharomycetales;
OC Schizosaccharomycetaceae; Schizosaccharomyces.
OX NCBI_TaxID=4896;
RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=ATCC 38366 / 972;
RX MEDLINE=21848401; PubMed=11859360;
RA Wood V., Gwilliam R., Rajandream M.A., Lyne M.H., Lyne R., Stewart A.,
RA Sgouros J.G., Peat N., Hayles J., Baker S.G., Basham D., Bowman S.,
RA Brooks K., Brown D., Brown S., Chillingworth T., Churcher C.M.,
RA Collins M., Connor R., Cronin A., Davis P., Feltwell T., Fraser A.,
RA Gentles S., Goble A., Hamlin N., Harris D.E., Hidalgo J., Hodgson G.,
RA Holroyd S., Hornsby T., Howarth S., Huckle E.J., Hunt S., Jagels K.,
RA James K.D., Jones L., Jones M., Leather S., McDonald S., McLean J.,
RA Mooney P., Moule S., Mungall K.L., Murphy L.D., Niblett D., Odell C.,
RA Oliver K., O'Neil S., Pearson D., Quail M.A., Rabbinowitsch E.,
RA Rutherford K.M., Rutter S., Saunders D., Seeger K., Sharp S.,
RA Skelton J., Simmonds M.N., Squares R., Squares S., Stevens K.,
RA Taylor K., Taylor R.G., Tivey A., Walsh S.V., Warren T., Whitehead S.,
RA Woodward J.R., Volckaert G., Aert R., Robben J., Grymonprez B.,
RA Weltjens I., Vanstreels E., Rieger M., Schaefer M., Mueller-Auer S.,
RA Gabel C., Fuchs M., Duesterhoeft A., Fritzc C., Holzer E., Moestl D.,
RA Hilbert H., Borzym K., Langer I., Beck A., Lehrach H., Reinhardt R.,
RA Pohl T.M., Eger P., Zimmermann W., Wedler H., Wambutt R., Purnelle B.,
RA Goffeau A., Cadieu E., Dreano S., Gloux S., Lelaure V., Mottier S.,
RA Galibert F., Aves S.J., Xiang Z., Hunt C., Moore K., Hurst S.M.,
RA Lucas M., Rochet M., Gaillardin C., Tallada V.A., Garzon A., Thode G.,
RA Daga R.R., Cruzado L., Jimenez J., Sanchez M., del Rey F., Benito J.,
RA Dominguez A., Revuelta J.L., Moreno S., Armstrong J., Forsburg S.L.,
RA Cerutti L., Lowe T., McCombie W.R., Paulsen I., Potashkin J.,
RA Shpakovski G.V., Ussery D., Barrell B.G., Nurse P.;
RT "The genome sequence of Schizosaccharomyces pombe.";
RL Nature 415:871-880(2002).
DR EMBL; CU329670; CAA91103.1; -; Genomic_DNA.
DR PIR; S62439; S62439.
DR RefSeq; NP_592836.1; -.
DR GeneID; 2542895; -.
DR KEGG; spo:SPAC13G6.10c; -.
DR NMPDR; fig|4896.1.peg.2806; -.
DR GeneDB_Spombe; SPAC13G6.10c; -.
DR BioCyc; SPOM-XXX-01:SPOM-XXX-01-000580-MON; -.
DR ArrayExpress; Q09788; -.
DR GO; GO:0009986; C:cell surface; NAS:GeneDB_SPombe.
DR GO; GO:0005783; C:endoplasmic reticulum; IDA:GeneDB_SPombe.
DR GO; GO:0005794; C:Golgi apparatus; IDA:GeneDB_SPombe.
DR GO; GO:0003824; F:catalytic activity; IEA:InterPro.
DR GO; GO:0043169; F:cation binding; IEA:InterPro.
DR GO; GO:0005975; P:carbohydrate metabolic process; IEA:InterPro.
DR InterPro; IPR013781; Glyco_hydro_sub_cat.
DR Gene3D; G3DSA:3.20.20.80; Glyco_hydro_cat; 1.
PE 2: Evidence at transcript level;
KW Complete proteome; Glycoprotein; Signal.
FT SIGNAL 1 18 Potential.
FT CHAIN 19 530 Uncharacterized serine-rich protein
FT C13G6.10c.
FT /FTId=PRO_0000014190.
FT CARBOHYD 55 55 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 120 120 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 128 128 N-linked (GlcNAc...) (Potential).
SQ SEQUENCE 530 AA; 54211 MW; 1C6A0261F63DFF02 CRC64;
MRTTFATVAL AFLSTVGALP YAPNHRHHRR DDDGVLTVYE TILETVYVTA VPGANSSSSY
TSYSTGLASV TESSDDGAST ALPTTSTESV VVTTSAPAAS SSATSYPATF VSTPLYTMDN
VTAPVWSNTS VPVSTPETSA TSSSEFFTSY PATSSESSSS YPASSTEVAS SYSASSTEVT
SSYPASSEVA TSTSSYVAPV SSSVASSSEI SAGSATSYVP TSSSSIALSS VVASASVSAA
NKGVSTPAVS SAAASSSAVV SSVVSSATSV AASSTISSAT SSSASASPTS SSVSGKRGLA
WIPGTDLGYS DNFVNKGINW YYNWGSYSSG LSSSFEYVLN QHDANSLSSA SSVFTGGATV
IGFNEPDLSA AGNPIDAATA ASYYLQYLTP LRESGAIGYL GSPAISNVGE DWLSEFMSAC
SDCKIDFIAC HWYGIDFSNL QDYINSLANY GLPIWLTEFA CTNWDDSNLP SLDEVKTLMT
SALGFLDGHG SVERYSWFAP ATELGAGVGN NNALISSSGG LSEVGEIYIS
//
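As an illustration of how the non-experimental features in the entry above
could be picked out programmatically, here is a minimal Python sketch. It
assumes the whitespace-separated layout shown in this document (the
distributed flat file uses fixed columns, and positions may carry symbols such
as "<" or "?" that this sketch does not handle); the function name is
hypothetical.

```python
# FT lines copied from the example entry above.
ENTRY_LINES = """\
FT SIGNAL 1 18 Potential.
FT CHAIN 19 530 Uncharacterized serine-rich protein
FT C13G6.10c.
FT /FTId=PRO_0000014190.
FT CARBOHYD 55 55 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 120 120 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 128 128 N-linked (GlcNAc...) (Potential).
""".splitlines()


def potential_features(lines):
    """Return (key, start, end) for each feature flagged as Potential."""
    found = []
    for line in lines:
        # The first line of a feature has five fields:
        # line code, feature key, start, end, description.
        parts = line.split(None, 4)
        if len(parts) < 5 or parts[0] != "FT":
            continue  # continuation line or a different line type
        key, start, end, description = parts[1:]
        if not (start.isdigit() and end.isdigit()):
            continue  # not a plain numeric position pair
        if "Potential" in description:
            found.append((key, int(start), int(end)))
    return found


for key, start, end in potential_features(ENTRY_LINES):
    print(key, start, end)
```

For this entry the sketch reports the signal peptide and the three
N-glycosylation sites, while the CHAIN feature, which carries no qualifier, is
skipped.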
=======
Summary
=======
This has been an introduction to the world of annotation at Swiss-Prot.
There are numerous sources of information available to the curators and it
is our job to assess these and to add only relevant information to the
entries. These sources are primarily publications reporting the isolation of
particular genes and proteins. It is not only biochemical data that are
gleaned but also molecular biology and genetic information. For example,
we have thousands of entries in Swiss-Prot with reports of alternative
splicing as well as genetic map information. Coupled to reading publications
is looking at the data bank itself. In an attempt to maintain consistency
all new entries are checked, via alignments, to see if they belong to a
particular family. If they do, information is copied from similar entries,
but is checked at the same time.
All sources of information are given in Swiss-Prot entries. The reference
blocks show what is represented in the corresponding publication(s). They
therefore act as sources for the information given in the entry. This can be
direct sequencing of the isolated protein (RP SEQUENCE), sequencing of the
gene encoding the protein (RP SEQUENCE FROM N.A.), biochemical studies (RP
CHARACTERIZATION) and 3D studies (e.g. RP X-RAY CRYSTALLOGRAPHY) to name but
a few. It should be noted that in the earlier days of Swiss-Prot annotation
characterization studies may have been carried out but were represented
only as "SEQUENCE FROM N.A." It would be possible to alter these retrospectively,
although doing so would detract from our current, labor-intensive process
of making new sequences available.
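Since the RP lines state what each cited publication contributed, they can be
listed to see where the information in an entry comes from. The following
Python sketch is a hypothetical helper, not part of any UniProt tool; it
assumes that each RP line starts with the line code "RP" followed by a space,
and it treats every RP line as a separate item rather than joining
continuation lines.

```python
def reference_scopes(entry_lines):
    """Collect the text of each RP line, without line code or final period."""
    scopes = []
    for line in entry_lines:
        if line.startswith("RP "):
            scopes.append(line[3:].strip().rstrip("."))
    return scopes


# Reference lines taken from the example entry Q09788.
example = [
    "RN [1]",
    "RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].",
    "RC STRAIN=ATCC 38366 / 972;",
]
print(reference_scopes(example))
```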
The annotation of Swiss-Prot entries involves extensive knowledge of all
types of proteins, a complete understanding of the Swiss-Prot database
itself as well as skills in assessing alignment programs and pattern
databases. All of these must be considered as one, for each individual
sequence, and all information resulting from these sources is skillfully
assessed before addition to the entry. Therefore we can say that every
effort is made to ensure that the features and comments in Swiss-Prot are
complete, correct and have pointers to the information source.
Note: a short version of this document was originally published as:
Junker V.L., Apweiler R., Bairoch A.
Representation of functional information in the Swiss-Prot data bank.
Bioinformatics 15:1066-1067(1999).
==================
Methods references
==================
[R1] Nielsen H., Engelbrecht J., Brunak S., von Heijne G.
Identification of prokaryotic and eukaryotic signal peptides and
prediction of their cleavage sites.
Protein Eng. 10:1-6(1997).
[Pubmed: 9051728]
[R2] Krogh A., Larsson B., von Heijne G., Sonnhammer E.L.L.
Predicting transmembrane protein topology with a hidden Markov
model: application to complete genomes.
J. Mol. Biol. 305:567-580(2001).
[Pubmed: 11152613]
[R3] Moeller S., Croning M.D.R., Apweiler R.
Evaluation of methods for the prediction of membrane spanning
regions.
Bioinformatics 17:646-653(2001).
[R4] Eisenberg D., Schwarz E., Komaromy M., Wall R.
Analysis of membrane and surface protein sequences with the
hydrophobic moment plot.
J. Mol. Biol. 179:125-142(1984).
[R5] Jones D.T., Taylor W.R., Thornton J.M.
A model recognition approach to the prediction of all-helical
membrane protein structure and topology.
Biochemistry 33:3038-3049(1994).
[R6] Lupas A., Van Dyke M., Stock J.
Predicting coiled coils from protein sequences.
Science 252:1162-1164(1991).
[Pubmed: 2031185]
[R7] Andrade M.A., Ponting C., Gibson T., Bork P.
Identification of protein repeats and statistical significance of
sequence comparisons.
J. Mol. Biol. 298:521-537(2000).
[R8] Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E.,
Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., Durbin R.,
Falquet L., Fleischmann W., Gouzy J., Hermjakob H., Hulo N.,
Jonassen I., Kahn D., Kanapin A., Karavidopoulou Y., Lopez R.,
Marx B., Mulder N.J., Oinn T.M., Pagni M., Servant F., Sigrist C.J.,
Zdobnov E.M.
InterPro -- an integrated documentation resource for protein
families, domains and functional sites.
Bioinformatics 16:1145-1150(2000).
[Pubmed: 11125043]
[R9] Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K.,
Bairoch A.
The PROSITE database, its status in 2002.
Nucleic Acids Res. 30:235-238(2002).
[Pubmed: 11752303]
[R10] Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R.,
Griffiths-Jones S., Howe K.L., Marshall M., Sonnhammer E.L.
The Pfam protein families database.
Nucleic Acids Res. 30:276-280(2002).
[R11] Ponting C.P., Schultz J., Milpetz F., Bork P.
SMART: identification and annotation of domains from signalling and
extracellular protein sequences.
Nucleic Acids Res. 27:229-232(1999).
[Pubmed: 9847187]
[R12] Monigatti F., Gasteiger E., Bairoch A., Jung E.
The Sulfinator: predicting tyrosine sulfation sites in protein
sequences.
Bioinformatics 18:769-770(2002).
[Pubmed: 12050077]
-----------------------------------------------------------------------
Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
Distributed under the Creative Commons Attribution-NoDerivs License
-----------------------------------------------------------------------



