| UniProt Knowledgebase (Swiss-Prot and TrEMBL): What's new in XML? Release 15.5 of 07-Jul-2009 |
Also read about forthcoming XML changes, and about recent and forthcoming changes to the flat file version of the UniProt Knowledgebase.
Questions regarding UniProtKB XML should be directed to our Help Desk.
| UniProtKB release 15.0 of 24-Mar-2009 |
|---|
The controlled vocabulary for organelles has changed in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we replaced the value chromatophore (commented out below) with
organellar chromatophore in the enumeration of the type attribute of geneLocationType in the XSD:
<xs:complexType name="geneLocationType">
...
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="apicoplast"/>
<xs:enumeration value="chloroplast"/>
<!-- <xs:enumeration value="chromatophore"/> -->
<xs:enumeration value="cyanelle"/>
<xs:enumeration value="hydrogenosome"/>
<xs:enumeration value="mitochondrion"/>
<xs:enumeration value="non-photosynthetic plastid"/>
<xs:enumeration value="nucleomorph"/>
<xs:enumeration value="organellar chromatophore"/>
<xs:enumeration value="plasmid"/>
<xs:enumeration value="plastid"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
Example:
<geneLocation type="organellar chromatophore"/>
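Software that reads files written under both the old and the new schema may want to normalize this attribute. The following minimal Python sketch assumes that an old chromatophore value maps one-to-one to organellar chromatophore; the function and constant names are illustrative and not part of any UniProt distribution.

```python
# Values allowed for geneLocation/@type as of release 15.0 (copied from the XSD above).
ALLOWED_GENE_LOCATION_TYPES = {
    "apicoplast", "chloroplast", "cyanelle", "hydrogenosome",
    "mitochondrion", "non-photosynthetic plastid", "nucleomorph",
    "organellar chromatophore", "plasmid", "plastid",
}

# Assumption: a pre-15.0 value maps 1:1 to its release 15.0 replacement.
RENAMED_VALUES = {"chromatophore": "organellar chromatophore"}


def normalize_gene_location_type(value: str) -> str:
    """Return a geneLocation type that is valid under the release 15.0 schema."""
    value = RENAMED_VALUES.get(value, value)
    if value not in ALLOWED_GENE_LOCATION_TYPES:
        raise ValueError(f"unknown geneLocation type: {value!r}")
    return value
```

Values that are already valid pass through unchanged, so the function can be applied to every geneLocation element regardless of the release that produced the file.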
| UniProtKB release 14.7 of 20-Jan-2009 |
|---|
A new comment topic, DISRUPTION PHENOTYPE, has been introduced in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we added the value disruption phenotype to the enumeration of the type attribute of the XSD type
commentType:
<xs:complexType name="commentType">
...
<xs:attribute name="type" use="required">
...
<xs:enumeration value="disruption phenotype"/>
| UniProtKB release 14.0 of 22-Jul-2008 |
|---|
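To increase the consistency of the different comment types, we changed the XSD type commentType in the following way (removed definitions are commented out):
The <note> element was replaced by the <text> element, which is now of type evidencedStringType.
The status and evidence attributes were moved from the <comment> element to the <text> element.
The <bpcCommentGroup> reference was moved into the xs:choice.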
<xs:complexType name="commentType">
...
<xs:sequence>
<!-- <xs:element name="text" type="xs:string" minOccurs="0" maxOccurs="1">
<xs:annotation>
<xs:documentation>If a CC line type does not have a defined structure,
the text of this comment is stored in the element.
</xs:documentation>
</xs:annotation>
</xs:element> -->
<!-- <xs:group ref="bpcCommentGroup"/> -->
<xs:choice minOccurs="0" maxOccurs="1">
<xs:group ref="bpcCommentGroup"/>
...
</xs:choice>
<xs:element name="location" type="locationType" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>Used in 'mass spectrometry' and 'sequence caution' comments.</xs:documentation>
</xs:annotation>
</xs:element>
<!-- <xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="1">
<xs:annotation>
<xs:documentation>If a CC line type contains a 'Note=',
the text of that note is stored in this element.
</xs:documentation>
</xs:annotation>
</xs:element> -->
<xs:element name="text" type="evidencedStringType" minOccurs="0">
<xs:annotation>
<xs:documentation>Used to store the contents of non-structured comment types,
as well as the contents of the flat file 'Note=' field of structured comment types.
</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
...
<!-- <xs:attribute name="status" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>Some comments have a status reflecting their reliability (By similarity, Potential and Probable).
</xs:documentation>
</xs:annotation>
</xs:attribute> -->
...
<!-- <xs:attribute name="evidence" type="xs:string" use="optional"/> -->
</xs:complexType>
The XSD type evidencedStringType is defined as follows:
<xs:complexType name="evidencedStringType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="evidence" type="xs:string" use="optional"/>
<xs:attribute name="status" use="optional">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="By similarity"/>
<xs:enumeration value="Probable"/>
<xs:enumeration value="Potential"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Examples:
From
<comment type="sequence caution">
<conflict type="erroneous gene model prediction">
<sequence version="1" resource="EMBL-CDS" id="BAA97015"/>
</conflict>
<note>The predicted gene At5g49940 has been split into 2 genes: At5g49940 and At5g49945.</note>
</comment>
To
<comment type="sequence caution">
<conflict type="erroneous gene model prediction">
<sequence version="1" resource="EMBL-CDS" id="BAA97015"/>
</conflict>
<text>The predicted gene At5g49940 has been split into 2 genes: At5g49940 and At5g49945.</text>
</comment>
From
<comment type="function" status="By similarity" evidence="EA3">
<text>Cytochrome c oxidase is the component of the respiratory chain.</text>
</comment>
To
<comment type="function">
<text status="By similarity" evidence="EA3">Cytochrome c oxidase is the component of the respiratory chain.</text>
</comment>
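Code that must read comments from files on both sides of this change can look for the text, status and evidence in either place. A minimal Python sketch using the standard library's xml.etree.ElementTree; the function names are hypothetical, and namespace prefixes are stripped so that the sketch does not depend on a particular namespace URI:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so the code works with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def comment_text(comment: ET.Element) -> dict:
    """Return text, status and evidence of a comment in either schema layout.

    Before release 14.0, status/evidence sat on <comment> and the 'Note='
    text of structured comments was in <note>; from 14.0 on, both are on <text>.
    """
    text_el = None
    for child in comment:
        if local(child.tag) in ("text", "note"):
            text_el = child
            break
    result = {
        "text": None,
        "status": comment.get("status"),      # old layout
        "evidence": comment.get("evidence"),  # old layout
    }
    if text_el is not None:
        result["text"] = text_el.text
        # Attributes on <text> (new layout) take precedence over those on <comment>.
        result["status"] = text_el.get("status", result["status"])
        result["evidence"] = text_el.get("evidence", result["evidence"])
    return result
```

Applied to the two "function" comments above, both layouts yield the same dictionary.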
A new controlled vocabulary has been introduced in order to structure subcellular location comments. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we added the following sequence to the xs:choice of the XSD type
commentType:
<xs:complexType name="commentType">
...
<xs:sequence>
...
<xs:choice minOccurs="0" maxOccurs="1">
...
<xs:sequence>
<xs:annotation>
<xs:documentation>Used in 'subcellular location' comments.</xs:documentation>
</xs:annotation>
<xs:element name="molecule" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="subcellularLocation" type="subcellularLocationType" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
...
The XSD type subcellularLocationType is defined as follows:
<xs:complexType name="subcellularLocationType">
<xs:sequence>
<xs:element name="location" type="evidencedStringType" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="topology" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="orientation" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
Examples:
<comment type="subcellular location">
<subcellularLocation>
<location evidence="EA3">Mitochondrion inner membrane</location>
<topology evidence="EA3" status="By similarity">Multi-pass membrane protein</topology>
</subcellularLocation>
</comment>
<comment type="subcellular location">
<subcellularLocation>
<location>Cytoplasm</location>
</subcellularLocation>
<subcellularLocation>
<location>Endoplasmic reticulum membrane</location>
<topology>Peripheral membrane protein</topology>
</subcellularLocation>
<subcellularLocation>
<location>Golgi apparatus membrane</location>
<topology>Peripheral membrane protein</topology>
</subcellularLocation>
</comment>
<comment type="subcellular location">
<subcellularLocation>
<location>Cell membrane</location>
<topology status="By similarity">Peripheral membrane protein</topology>
</subcellularLocation>
<subcellularLocation>
<location status="By similarity">Secreted</location>
</subcellularLocation>
<text>The last 22 C-terminal amino acids may participate in cell membrane attachment.</text>
</comment>
<comment type="subcellular location">
<molecule>Isoform 2</molecule>
<subcellularLocation>
<location status="Probable">Cytoplasm</location>
</subcellularLocation>
</comment>
<comment type="subcellular location">
<subcellularLocation>
<location>Golgi apparatus</location>
<location>trans-Golgi network membrane</location>
<topology status="By similarity">Multi-pass membrane protein</topology>
</subcellularLocation>
<text>Predominantly found in the trans-Golgi network (TGN). Not redistributed to the plasma membrane in response to elevated copper levels.</text>
</comment>
<comment type="subcellular location">
<molecule>Isoform 2</molecule>
<subcellularLocation>
<location>Cytoplasm</location>
</subcellularLocation>
</comment>
<comment type="subcellular location">
<molecule>WND/140 kDa</molecule>
<subcellularLocation>
<location>Mitochondrion</location>
</subcellularLocation>
</comment>
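The nested structure above maps naturally onto plain data. The following Python sketch (hypothetical function name, standard library only) summarizes one subcellular location comment, keeping the optional status of each location, topology and orientation:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def subcellular_locations(comment: ET.Element) -> dict:
    """Summarize a 'subcellular location' comment as plain Python data."""
    summary = {"molecule": None, "locations": [], "note": None}
    for child in comment:
        name = local(child.tag)
        if name == "molecule":
            summary["molecule"] = child.text
        elif name == "subcellularLocation":
            entry = {"location": [], "topology": [], "orientation": []}
            for part in child:
                key = local(part.tag)
                if key in entry:
                    # Each value is stored with its (possibly missing) status.
                    entry[key].append((part.text, part.get("status")))
            summary["locations"].append(entry)
        elif name == "text":
            summary["note"] = child.text
    return summary
```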
The names which are stored in the <protein> element have been categorized to distinguish recommended, alternative and submitted names, etc.
For details of this change, please read the UniProt document
"What's new?".
To represent this data in the XML format, we modified the XSD type
proteinType as follows (removed definitions are commented out):
<xs:complexType name="proteinType">
<xs:annotation>
<xs:documentation>Stores protein names.</xs:documentation>
</xs:annotation>
<xs:sequence>
<!-- <xs:element name="name" type="proteinNameType" maxOccurs="unbounded"/> -->
<xs:group ref="proteinNameGroup"/>
<xs:element name="domain" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>The domain list is equivalent to the INCLUDES section of the DE line.</xs:documentation>
</xs:annotation>
<xs:complexType>
<!-- <xs:sequence>
<xs:element name="name" type="proteinNameType" maxOccurs="unbounded"/>
</xs:sequence> -->
<xs:group ref="proteinNameGroup"/>
</xs:complexType>
</xs:element>
<xs:element name="component" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>The component list is equivalent to the CONTAINS section of the DE line.</xs:documentation>
</xs:annotation>
<xs:complexType>
<!-- <xs:sequence>
<xs:element name="name" type="proteinNameType" maxOccurs="unbounded"/>
</xs:sequence> -->
<xs:group ref="proteinNameGroup"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<!-- <xs:attribute name="type">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="fragment"/>
<xs:enumeration value="fragments"/>
<xs:enumeration value="version1"/>
<xs:enumeration value="version2"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="evidence" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>This contains all evidences that are connected to the complete DE line.</xs:documentation>
</xs:annotation>
</xs:attribute> -->
</xs:complexType>
The proteinNameType definition was deleted. The proteinNameGroup is defined as follows:
<xs:group name="proteinNameGroup">
<xs:sequence>
<xs:element name="recommendedName" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="fullName" type="evidencedStringType"/>
<xs:element name="shortName" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="ref" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="alternativeName" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="fullName" type="evidencedStringType" minOccurs="0"/>
<xs:element name="shortName" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="ref" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="submittedName" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="fullName" type="evidencedStringType"/>
</xs:sequence>
<xs:attribute name="ref" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="allergenName" type="evidencedStringType" minOccurs="0"/>
<xs:element name="biotechName" type="evidencedStringType" minOccurs="0"/>
<xs:element name="CdAntigenName" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="innName" type="evidencedStringType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:group>
Two new attributes, precursor and fragment,
were added to the sequenceType:
<xs:complexType name="sequenceType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="length" type="xs:integer" use="required"/>
<xs:attribute name="mass" type="xs:integer" use="required"/>
<xs:attribute name="checksum" type="xs:string" use="required"/>
<xs:attribute name="modified" type="xs:date" use="required"/>
<xs:attribute name="version" type="xs:integer" use="required"/>
<xs:attribute name="precursor" type="xs:boolean" use="optional"/>
<xs:attribute name="fragment" use="optional">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="single"/>
<xs:enumeration value="multiple"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Examples:
<protein>
<recommendedName>
<fullName>Interleukin-2</fullName>
<shortName>IL-2</shortName>
</recommendedName>
<alternativeName>
<fullName>T-cell growth factor</fullName>
<shortName>TCGF</shortName>
</alternativeName>
<innName>Aldesleukin</innName>
</protein>
<sequence precursor="true" ...>
<protein>
<recommendedName ref="1">
<fullName>A disintegrin and metalloproteinase domain 10</fullName>
<shortName>ADAM 10</shortName>
</recommendedName>
<alternativeName>
<fullName>Mammalian disintegrin-metalloprotease</fullName>
</alternativeName>
<alternativeName>
<fullName>Kuzbanian protein homolog</fullName>
</alternativeName>
<CdAntigenName>CD156c</CdAntigenName>
</protein>
<dbReference type="EC" key="1" id="EC 3.4.24.81"/>
<sequence fragment="single" precursor="true" ...>
<protein>
<recommendedName>
<fullName>Arginine biosynthesis bifunctional protein argJ</fullName>
</recommendedName>
<domain>
<recommendedName ref="1">
<fullName>Glutamate N-acetyltransferase</fullName>
</recommendedName>
<alternativeName>
<fullName>Ornithine acetyltransferase</fullName>
<shortName>OATase</shortName>
</alternativeName>
<alternativeName>
<fullName>Ornithine transacetylase</fullName>
</alternativeName>
</domain>
<domain>
<recommendedName ref="2">
<fullName>Amino-acid acetyltransferase</fullName>
</recommendedName>
<alternativeName>
<fullName>N-acetylglutamate synthase</fullName>
<shortName>AGS</shortName>
</alternativeName>
</domain>
<component>
<recommendedName>
<fullName>Arginine biosynthesis bifunctional protein argJ alpha chain</fullName>
</recommendedName>
</component>
<component>
<recommendedName>
<fullName>Arginine biosynthesis bifunctional protein argJ beta chain</fullName>
</recommendedName>
</component>
</protein>
<dbReference type="EC" key="1" id="EC 2.3.1.35"/>
<dbReference type="EC" key="2" id="EC 2.3.1.1"/>
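The ref attribute of a name element links it to the key of an EC dbReference, as in the examples above. A minimal Python sketch that resolves these links for recommended names; the function name is hypothetical, and treating ref as a space-separated list of keys is an assumption based on the ref="1 2" example in the release 6.0 notes:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def recommended_names_with_ec(protein: ET.Element, db_refs: list) -> list:
    """Pair each recommendedName full name with the EC numbers its 'ref' points to.

    db_refs holds the entry's <dbReference> elements; only type="EC" is used.
    Names inside <domain> and <component> are included, in document order.
    """
    ec_by_key = {d.get("key"): d.get("id") for d in db_refs if d.get("type") == "EC"}
    pairs = []
    for rec in protein.iter():
        if local(rec.tag) != "recommendedName":
            continue
        full = next((c.text for c in rec if local(c.tag) == "fullName"), None)
        keys = (rec.get("ref") or "").split()  # assumed: space-separated keys
        pairs.append((full, [ec_by_key[k] for k in keys if k in ec_by_key]))
    return pairs
```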
We have added a new value, chromatophore, to the controlled vocabulary of organelle names. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we added chromatophore as a new enumeration value of the type attribute of geneLocationType in the XSD:
<xs:complexType name="geneLocationType">
<xs:annotation>
<xs:documentation>Defines the locations/origins of the shown sequence (OG line).</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="name" type="statusType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="apicoplast"/>
<xs:enumeration value="chloroplast"/>
<xs:enumeration value="chromatophore"/>
<xs:enumeration value="cyanelle"/>
<xs:enumeration value="hydrogenosome"/>
<xs:enumeration value="mitochondrion"/>
<xs:enumeration value="non-photosynthetic plastid"/>
<xs:enumeration value="nucleomorph"/>
<xs:enumeration value="plasmid"/>
<xs:enumeration value="plastid"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="evidence" type="xs:string" use="optional"/>
</xs:complexType>
Example:
<geneLocation type="chromatophore"/>
| UniProtKB release 13.6 of 01-Jul-2008 |
|---|
A new type of cross-reference, AGRICOLA, was added to the RX (Reference cross-reference) line in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
The type of a cross-reference is stored in the type attribute of the dbReference element.
The modification requires no change of the schema.
Example:
<dbReference type="AGRICOLA" id="IND20450567" key="31"/>
| UniProtKB release 13.0 of 26-Feb-2008 |
|---|
A new feature key, NON_STD, was introduced in the flat file format of UniProtKB entries to replace the key SE_CYS. At the same time, we changed the sequence to use the IUPAC/IUBMB recommended one-letter codes 'U' for selenocysteine and 'O' for pyrrolysine. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we modified the XSD type
featureType by replacing the feature type selenocysteine (commented out below) with non-standard amino acid:
<xs:complexType name="featureType">
...
<xs:attribute name="type" use="required">
...
<!-- <xs:enumeration value="selenocysteine"/> -->
<xs:enumeration value="non-standard amino acid"/>
| UniProtKB release 12.5 of 13-Nov-2007 |
|---|
A new field, RESOLUTION, was added to the cross-references to the PDB database in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
This optional field is represented in the XML format as an additional
<property> element.
The modification requires no change of the schema.
Example:
From
<dbReference type="PDB" id="1AUW" key="31">
<property type="method" value="X-ray"/>
<property type="chains" value="A/B/C/D=1-468"/>
</dbReference>
To
<dbReference type="PDB" id="1AUW" key="31">
<property type="method" value="X-ray"/>
<property type="resolution" value="1.80 A"/>
<property type="chains" value="A/B/C/D=1-468"/>
</dbReference>
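Because property elements are distinguished by their type attribute, a dbReference can be read into a dictionary, and the optional resolution simply appears as a missing key when it is absent. The following Python sketch also converts the value to a number, assuming the "<number> A" form shown in the example; both function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def db_reference_properties(db_ref: ET.Element) -> dict:
    """Collect the <property type=... value=...> children of a dbReference."""
    return {
        p.get("type"): p.get("value")
        for p in db_ref
        if p.tag.rsplit("}", 1)[-1] == "property"
    }


def resolution_angstrom(props: dict):
    """Return the PDB resolution as a float, or None if the field is absent."""
    value = props.get("resolution")
    if value is None:
        return None  # RESOLUTION is an optional field
    number, unit = value.split()  # assumes the "<number> A" layout of the example
    if unit != "A":
        raise ValueError(f"unexpected resolution unit: {value!r}")
    return float(number)
```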
| UniProtKB release 12.4 of 23-Oct-2007 |
|---|
We now provide all XSD files in uncompressed form, and we have changed the names and locations of the XSD and XML files for keywords so that the same names are used for all distribution formats:
From:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot.xsd.gz
To:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot.xsd
From:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/keyword.xsd.gz
To:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/keywlist.xsd
From:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/keydef.xml.gz and
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/keydef.xml.gz
To:
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/keywlist.xml.gz
The format of partial EC numbers has been modified. For details of this change, please read the UniProt document "What's new?".
EC numbers are stored in the id attribute of the dbReference element.
The modification requires no change of the schema.
Example:
From
<dbReference type="EC" key="1" id="EC 3.4.24.-"/> <dbReference type="EC" key="1" id="EC 3.1.3.-"/>
To
<dbReference type="EC" key="1" id="EC 3.4.24.-"/> <dbReference type="EC" key="1" id="EC 3.1.3.n1"/>
| UniProtKB release 12.0 of 24-Jul-2007 |
|---|
A new line type, PE (Protein Existence), has been introduced in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we have added a new child element,
proteinExistence, to the entry element in the XSD:
<xs:element name="entry">
...
<xs:element name="proteinExistence" type="proteinExistenceType"/>
...
</xs:element>
The proteinExistenceType is defined as follows:
<xs:complexType name="proteinExistenceType">
<xs:annotation>
<xs:documentation>Protein Existence (flat file: PE line).</xs:documentation>
</xs:annotation>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="evidence at protein level"/>
<xs:enumeration value="evidence at transcript level"/>
<xs:enumeration value="inferred from homology"/>
<xs:enumeration value="predicted"/>
<xs:enumeration value="uncertain"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
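The enumeration order above matches the numbering used on the flat file PE line (1 = evidence at protein level, through 5 = uncertain). A small Python sketch of that mapping; the names are illustrative:

```python
# Enumeration values of proteinExistence/@type, in the order of the XSD above.
PROTEIN_EXISTENCE_LEVELS = [
    "evidence at protein level",     # PE 1
    "evidence at transcript level",  # PE 2
    "inferred from homology",        # PE 3
    "predicted",                     # PE 4
    "uncertain",                     # PE 5
]


def pe_level(existence_type: str) -> int:
    """Map an XML proteinExistence type to the numeric level of the PE line.

    Raises ValueError for a value that is not in the enumeration.
    """
    return PROTEIN_EXISTENCE_LEVELS.index(existence_type) + 1
```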
The controlled vocabulary that is used for database names in submission citations was modified. For details of this change, please read the UniProt document "What's new?".
This information is stored in the db attribute of the citation element.
The modification requires no change of the schema.
Example:
From
<citation type="submission" db="Swiss-Prot" date="2007-03">
To
<citation type="submission" db="UniProtKB" date="2007-03">
| UniProtKB release 11.2 of 26-Jun-2007 |
|---|
The evidence attribute and the evidence element are
used in UniProtKB/TrEMBL to indicate the source of an annotation. We have begun to
introduce such evidence in UniProtKB/Swiss-Prot as well. In the initial phase,
automatic procedures are used to infer the evidence from the existing data
(mainly the contents of the scope element). Evidence assignment will also
gradually become part of the manual curation process. Retrofitting
existing UniProtKB/Swiss-Prot entries with evidence information will be an ongoing
process.
A new comment topic, SEQUENCE CAUTION, has been introduced in the flat file format of UniProtKB entries. For details of this change, please read the UniProt document "What's new?".
To represent this data in the XML format, we have modified the XSD type
commentType in the following way:
<xs:complexType name="commentType">
...
<xs:sequence>
...
<xs:choice minOccurs="0" maxOccurs="1">
...
<xs:element name="conflict">
<xs:annotation>
<xs:documentation>Used in the 'sequence caution' comment (flat file format: CC SEQUENCE CAUTION).</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element name="sequence" minOccurs="0" maxOccurs="1">
<xs:complexType>
<xs:attribute name="resource" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="EMBL-CDS"/>
<xs:enumeration value="EMBL"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="id" type="xs:string" use="required"/>
<xs:attribute name="version" type="xs:integer" use="optional"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="frameshift"/>
<xs:enumeration value="erroneous initiation"/>
<xs:enumeration value="erroneous termination"/>
<xs:enumeration value="erroneous gene model prediction"/>
<xs:enumeration value="erroneous translation"/>
<xs:enumeration value="miscellaneous discrepancy"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="ref" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>Refers to the 'key' attribute of a 'reference' element.</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
</xs:element>
...
</xs:choice>
<xs:element name="location" type="locationType" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>Used in 'mass spectrometry' and 'sequence caution' comments.</xs:documentation>
</xs:annotation>
</xs:element>
...
</xs:sequence>
...
<xs:attribute name="type" use="required">
...
<xs:enumeration value="sequence caution"/>
Note that the location element has been moved out of the xs:choice.
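A minimal Python sketch that lists the conflicts of a sequence caution comment, using only the standard library; the function name is hypothetical, and the optional <sequence> child and version attribute are handled as the schema above allows:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def sequence_cautions(comment: ET.Element) -> list:
    """List (conflict type, resource, id, version) tuples of a 'sequence caution' comment."""
    results = []
    for conflict in comment:
        if local(conflict.tag) != "conflict":
            continue
        seq = next((c for c in conflict if local(c.tag) == "sequence"), None)
        if seq is None:
            # The <sequence> child is optional (minOccurs="0").
            results.append((conflict.get("type"), None, None, None))
            continue
        version = seq.get("version")  # optional integer attribute
        results.append((
            conflict.get("type"),
            seq.get("resource"),
            seq.get("id"),
            int(version) if version is not None else None,
        ))
    return results
```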
| UniProtKB release 10.1 of 06-Mar-2007 |
|---|
value="-"
Following a previous agreement, we no longer include cross-reference properties with value="-".
Examples of cross-references that are affected:
PDB:
<dbReference type="PDB" id="2PGK" key="11"> <property type="method" value="X-ray"/> <property type="chains" value="-"/> </dbReference>
Becomes:
<dbReference type="PDB" id="2PGK" key="11"> <property type="method" value="X-ray"/> </dbReference>
EMBL:
<dbReference type="EMBL" id="BC001051" key="17"> <property type="protein sequence ID" value="-"/> <property type="status" value="NOT_ANNOTATED_CDS"/> <property type="molecule type" value="mRNA"/> </dbReference>
Becomes:
<dbReference type="EMBL" id="BC001051" key="17"> <property type="status" value="NOT_ANNOTATED_CDS"/> <property type="molecule type" value="mRNA"/> </dbReference>
GeneFarm:
<dbReference type="GeneFarm" id="2241" key="14"> <property type="family number" value="-"/> </dbReference>
Becomes:
<dbReference type="GeneFarm" id="2241" key="14"/>
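Files produced before this release may still contain such placeholder properties. A Python sketch that removes them in place so that older data looks like the current output; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def drop_placeholder_properties(db_ref: ET.Element) -> ET.Element:
    """Remove <property value="-"> children in place, mirroring the release 10.1 change."""
    for prop in list(db_ref):  # iterate over a copy because we remove children
        if prop.tag.rsplit("}", 1)[-1] == "property" and prop.get("value") == "-":
            db_ref.remove(prop)
    return db_ref
```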
| UniProtKB release 8.0 of 30-May-2006 |
|---|
The format of the ALTERNATIVE PRODUCTS comment (CC) line in UniProt has changed. For details of this change, please see the UniProt flat file news. To accommodate this change, a new value, ribosomal frameshifting, has been added to the type attribute of the event element (eventType):
<xs:complexType name="eventType">
...
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="alternative splicing"/>
<xs:enumeration value="alternative initiation"/>
<xs:enumeration value="alternative promoter"/>
<xs:enumeration value="ribosomal frameshifting"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
Additionally, the ALTERNATIVE PRODUCTS comment may now have a note subelement:
<xs:complexType name="commentType">
...
<xs:sequence>
<xs:element name="event" type="eventType" minOccurs="1" maxOccurs="4"/>
<xs:element name="isoform" type="isoformType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
A new line type, OH (Organism Host), was introduced in viral UniProtKB entries. For details of this change, please see the UniProt flat file news. To represent this data in the XML format, we introduced a new subelement of entry: organismHost.
The following has been added to the entry element in the XSD:
<xs:element name="organismHost" type="organismType" minOccurs="0" maxOccurs="unbounded"/>
| UniProtKB release 7.0 of 07-Feb-2006 |
|---|
The format of the Date (DT) lines in UniProt has changed. To
accommodate this change, a version attribute has been added to both
<entry> and <sequence>.
The following was added to the entry and sequence elements in the XSD:
<xs:attribute name="version" type="xs:integer"/>
Example <entry> and <sequence>
elements using the old schema:
<entry dataset="Swiss-Prot" created="2004-10-25" modified="2005-09-13"> <sequence length="868" mass="95979" checksum="5EAF32DBB48A184C" modified="2004-10-25">
Example <entry> and <sequence>
elements under the new schema:
<entry dataset="Swiss-Prot" created="2004-10-25" modified="2005-09-13" version="1"> <sequence length="868" mass="95979" checksum="5EAF32DBB48A184C" modified="2004-10-25" version="2">
| UniProtKB release 6.1 of 27-Sep-2005 |
|---|
As part of our continuing effort to make the UniProtKB schema easier to use with tools such as JAXB, we have created types for organism, keyword, and sequence (organismType, keywordType, and sequenceType) outside of the entry element. If these types are defined inside the entry type, the generated Java classes become inner classes of the entry type class (e.g. EntryType.KeywordType), although logically these three types are independent of the entry type. THIS DOES NOT CHANGE THE WAY THE XML FILES LOOK. IT IS A CONVENIENCE MODIFICATION ONLY. All XML documents that are valid against the old schema will also be valid against the new schema. In the near future we plan to release a UniProt parser and writer based on JAXB, and this is one step in preparing the schema for it. In the new schema, the following three types are defined at the root level:
<!-- Organism definition begins -->
<xs:complexType name="organismType">
<xs:sequence>
<xs:element name="name" type="organismNameType" maxOccurs="unbounded"/>
<xs:element name="dbReference" type="dbReferenceType" maxOccurs="unbounded"/>
<xs:element name="lineage" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="taxon" type="xs:string" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="key" type="xs:string" use="required"/>
</xs:complexType>
<!-- Organism definition ends -->
<!-- Keyword definition begins -->
<xs:complexType name="keywordType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="evidence" type="xs:string" use="optional"/>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<!-- Keyword definition ends -->
<!-- sequence definition begins -->
<xs:complexType name="sequenceType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="length" type="xs:integer" use="required"/>
<xs:attribute name="mass" type="xs:integer" use="required"/>
<xs:attribute name="checksum" type="xs:string" use="required"/>
<xs:attribute name="modified" type="xs:date" use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<!-- sequence definition ends -->
| UniProtKB release 6.0 of 13-Sep-2005 |
|---|
The format of the OG Chloroplast and Cyanelle lines has changed in the flat file (see the flat file recent changes) to indicate more precisely the kind of plastid organelle represented by the sequence. No values were deleted from geneLocationType, but 3 tokens were added to the list. The new geneLocationType is shown below:
<xs:complexType name="geneLocationType">
<xs:annotation>
<xs:documentation>Defines the locations/origins of the shown sequence (OG line).</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="name" type="statusType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="apicoplast"/>
<xs:enumeration value="chloroplast"/>
<xs:enumeration value="cyanelle"/>
<xs:enumeration value="hydrogenosome"/>
<xs:enumeration value="mitochondrion"/>
<xs:enumeration value="non-photosynthetic plastid"/>
<xs:enumeration value="nucleomorph"/>
<xs:enumeration value="plasmid"/>
<xs:enumeration value="plastid"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="evidence" type="xs:string" use="optional"/>
</xs:complexType>
Users of tools such as JAXB were unable to process files on an entry-by-entry basis and were
therefore forced to hold the entire file in memory. To make it possible to parse and write XML
entry by entry, our solution was to allow the <entry> and
<copyright> elements to be root elements as well as
sub-elements of <uniprot>. THIS DID NOT
CHANGE THE WAY THE XML FILES LOOK. IT WAS A CONVENIENCE MODIFICATION
ONLY. In the near future we plan to release a UniProt parser and writer
based on JAXB, and this is one step in the preparation of the schema.
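The same entry-by-entry idea can be sketched outside JAXB. The following Python sketch uses the standard library's xml.etree.ElementTree.iterparse instead; the function name is hypothetical, and it accepts both a <uniprot> root and a standalone <entry> root. The namespace in the demo is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(source):
    """Yield <entry> elements one at a time from UniProtKB XML.

    source is a file name or a binary file object. The root may be either
    <uniprot> or a standalone <entry>. Finished entries are removed from the
    tree, so memory use stays roughly bounded by the size of one entry.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag.rsplit("}", 1)[-1] == "entry":
            yield elem
            root.clear()  # discard the processed entry once the caller resumes


if __name__ == "__main__":
    doc = b'<uniprot xmlns="urn:example"><entry><accession>P1</accession></entry></uniprot>'
    for entry in iter_entries(io.BytesIO(doc)):
        print(entry.find("{urn:example}accession").text)
```

Each yielded entry must be fully processed before the loop advances, because it is cleared from the tree afterwards.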
The EC numbers of a protein are now all stored as database references. In the past, only the first EC number in a protein name was represented as a database reference, which was linked by a "ref" attribute from a protein, domain, or component element. In order to better handle cases of multifunctional proteins, domains, or components with more than one EC number, we have moved the "ref" attribute from the protein, domain, or component element to the name element. For general information on the DE line, please see the user manual. DE lines have been converted into XML using the following rules:
Each EC number goes with the name immediately preceding it.
Example:
A (EC1) (B) (C) (D) (EC2): EC1 belongs with A, and EC2 belongs with D.
Example:
A (EC1) (B) (C) (D) (EC2) (EC3): EC1 belongs with A, and EC2 and EC3 belong with D.
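The rule above can be sketched in Python. The input is assumed to be the DE items already split into an ordered list, with EC numbers written as "EC x.x.x.x"; splitting the raw DE text is not shown, and the function name is hypothetical:

```python
import re

# Assumption: an EC item looks like "EC 1.6.5.3" (partial numbers such as "EC 3.4.24.-" also match).
EC_PATTERN = re.compile(r"^EC\s+(\S+)$")


def assign_ec_numbers(items):
    """Attach each EC number to the name immediately preceding it.

    items is the ordered list of DE items, names and EC numbers mixed.
    Returns a list of (name, [ec, ...]) pairs in the original order.
    """
    assigned = []
    for item in items:
        match = EC_PATTERN.match(item)
        if match:
            if not assigned:
                raise ValueError("EC number with no preceding name")
            assigned[-1][1].append(match.group(1))  # goes with the last name seen
        else:
            assigned.append((item, []))
    return assigned
```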
The following changes were made to the schema:
Removed the "ref" attribute from the protein, domain, and component elements. This ensures that only the name elements can refer to EC numbers, which makes the reference much more specific. The following was removed from the schema:
<xs:complexType name="proteinType">
[...]
<xs:attribute name="ref" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>This is referring to a possible EC number (ENZYME database cross reference).</xs:documentation>
</xs:annotation>
</xs:attribute>
[...]
</xs:complexType>
<xs:element name="domain" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>The domain list is equivalent to the INCLUDES section of the DE line.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="nameType" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="ref" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>This is referring to a possible EC number (ENZYME database cross reference).</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
</xs:element>
<xs:element name="component" minOccurs="0" maxOccurs="unbounded">
<xs:annotation>
<xs:documentation>The component list is equivalent to the CONTAINS section of the DE line.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="nameType" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="ref" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>This is referring to a possible EC number (ENZYME database cross reference).</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
</xs:element>
Added a "ref" attribute to the <name> element for the
DE line only. This required a renaming of the complex type "nameType" to
"proteinNameType" to make the change explicit, and so only names inside
<protein> elements would be allowed to have refs. The
new proteinNameType is as follows:
<xs:complexType name="proteinNameType">
<xs:annotation>
<xs:documentation>The name type is used for protein names in an
entry.</xs:documentation>
</xs:annotation>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="evidence" type="xs:string" use="optional" />
<xs:attribute name="ref" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>This is referring to a possible EC number
(ENZYME database cross reference).</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Examples:
DE NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial precursor
DE (EC 1.6.5.3) (EC 1.6.99.3) (Complex I-75Kd) (CI-75Kd).
In XML format, this becomes:
<protein>
<name ref="1 2">NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial precursor</name>
<name>Complex I-75Kd</name>
<name>CI-75Kd</name>
</protein>
[...]
<dbReference type="EC" id="1.6.5.3" key="1"/>
<dbReference type="EC" id="1.6.99.3" key="2"/>
If the description had instead been:
DE NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial precursor
DE (EC 1.6.5.3) (Complex I-75Kd) (EC 1.6.99.3) (CI-75Kd).
then the XML would be:
<protein>
<name ref="1">NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial precursor</name>
<name ref="2">Complex I-75Kd</name>
<name>CI-75Kd</name>
</protein>
[...]
<dbReference type="EC" id="1.6.5.3" key="1"/>
<dbReference type="EC" id="1.6.99.3" key="2"/>
The same goes for the names inside component and domain elements.
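A consumer of the new format has to follow each `ref` value, a whitespace-separated list of keys, to the matching EC `<dbReference>`. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It wraps the first example above in a hypothetical `<entry>` root and ignores the XML namespace that real UniProt files declare. The helper name `ec_numbers_by_name` is illustrative only.

```python
import xml.etree.ElementTree as ET

# The first example above, wrapped in a hypothetical <entry> root.
ENTRY_XML = """
<entry>
  <protein>
    <name ref="1 2">NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial precursor</name>
    <name>Complex I-75Kd</name>
    <name>CI-75Kd</name>
  </protein>
  <dbReference type="EC" id="1.6.5.3" key="1"/>
  <dbReference type="EC" id="1.6.99.3" key="2"/>
</entry>
"""

def ec_numbers_by_name(entry):
    """Map each protein name to the EC numbers its "ref" attribute points to."""
    # Index the EC cross-references by their "key" attribute.
    ec_by_key = {
        ref.get("key"): ref.get("id")
        for ref in entry.iter("dbReference")
        if ref.get("type") == "EC"
    }
    result = {}
    for name in entry.iter("name"):
        # "ref" is optional and may list several keys, e.g. "1 2".
        keys = (name.get("ref") or "").split()
        result[name.text] = [ec_by_key[k] for k in keys]
    return result

entry = ET.fromstring(ENTRY_XML)
```

Calling `ec_numbers_by_name(entry)` maps the full protein name to both EC numbers and maps the two short names to empty lists.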
| UniProtKB release 5.5 of 19-Jul-2005 |
|---|
UniProtKB contains protein data from two databases: Swiss-Prot and TrEMBL. Currently, the schema places no restriction on the "dataset" attribute:
<xs:attribute name="dataset" type="xs:string" use="required"/>
To specify that only those two databases are valid values for the "dataset" attribute, a restriction for this attribute will be introduced.
<xs:attribute name="dataset" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Swiss-Prot"/>
<xs:enumeration value="TrEMBL"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
At the request of the Genew maintainers, we changed the UniProtKB cross-reference from "Genew" to "HGNC". For details of this change, please read the UniProt document What's new?. The modification requires no change to the schema; however, the XML elements for Genew will change from
<dbReference type="Genew" id="HGNC:12849" key="27">
<property type="entry name" value="YWHAB"/>
</dbReference>
to
<dbReference type="HGNC" id="HGNC:12849" key="27">
<property type="entry name" value="YWHAB"/>
</dbReference>
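Code that reads files from before and after such a rename may want to normalize the old `type` values. This page records three renames: Genew to HGNC, MGD to MGI, and GeneDB_SPombe to GeneDB_Spombe. The lookup table and the function `normalize_db_types` below are a hypothetical sketch built from those three entries only, using `xml.etree.ElementTree` on namespace-free fragments.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: only the renames described on this page.
RENAMED_DB_TYPES = {
    "Genew": "HGNC",
    "MGD": "MGI",
    "GeneDB_SPombe": "GeneDB_Spombe",
}

def normalize_db_types(root):
    """Rewrite old dbReference "type" values in place; return how many changed."""
    changed = 0
    # Element.iter() also visits root itself if it is a <dbReference>.
    for ref in root.iter("dbReference"):
        new_type = RENAMED_DB_TYPES.get(ref.get("type"))
        if new_type is not None:
            ref.set("type", new_type)
            changed += 1
    return changed
```

Applied to the "from" example above, the function sets `type="HGNC"` and leaves the `id`, `key` and `<property>` child untouched.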
| UniProtKB release 5.3 of 21-Jun-2005 |
|---|
With this release of UniProtKB, "Hydrogenosome" was added to the list of valid values in the Organelle (OG) section. For more information, see the current news for the flat-file format. The following is the updated geneLocationType for the schema:
<xs:complexType name="geneLocationType">
<xs:annotation>
<xs:documentation>Defines the locations/origins of the shown sequence (OG line).</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="name" type="statusType" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="type" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="chloroplast"/>
<xs:enumeration value="cyanelle"/>
<xs:enumeration value="mitochondrion"/>
<xs:enumeration value="nucleomorph"/>
<xs:enumeration value="hydrogenosome"/>
<xs:enumeration value="plasmid"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="evidence" type="xs:string" use="optional"/>
</xs:complexType>
| UniProtKB release 5.2 of 07-Jun-2005 |
|---|
We no longer provide uniprot.dtd. Many relationships and restrictions present in the UniProtKB XML format can only be defined in an xsd file, and the dtd is simply not specific enough to be a good indicator of the XML data. This change was announced six months in advance. Please contact our Help Desk with any questions or comments.
The UniProt flat-file feature keys SIMILAR, THIOETH, THIOLEST are obsolete. They have not been used in UniProt for a number of months now and their removal from the schema is a maintenance issue only: no changes due to their removal are seen in the UniProt data. The following lines have been removed from uniprot.xsd:
<xs:enumeration value="similar"/>
<xs:enumeration value="thioether bond"/>
<xs:enumeration value="thiolester bond"/>
| UniProtKB release 5.0 of 10-May-2005 |
|---|
The feature keys DOMAIN and SITE were used to describe distinct types of regions in a protein sequence and we found this situation unsatisfactory. We therefore redefined these two feature keys and introduced 5 new ones. The 5 new keys are: COILED, COMPBIAS, MOTIF, REGION, and TOPO_DOM. This required the following addition to the schema:
<xs:enumeration value="coiled-coil region"/>
<xs:enumeration value="compositionally biased region"/>
<xs:enumeration value="region of interest"/>
<xs:enumeration value="short sequence motif"/>
<xs:enumeration value="topological domain"/>
<property> element for EMBL Cross-References
The cross-reference to the EMBL database has been modified in the flat-file
format, with the biological source of the molecule added as a quaternary identifier (see
recent flat-file news for more information).
This change is propagated into the XML format as an additional <property> element.
This requires no change to the schema.
An example is shown below:
From
<dbReference type="EMBL" id="X57346" key="18">
<property type="protein sequence ID" value="CAA40621.1"/>
</dbReference>
To
<dbReference type="EMBL" id="X57346" key="18">
<property type="protein sequence ID" value="CAA40621.1"/>
<property type="molecule type" value="mRNA"/>
</dbReference>
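Because the new information is just one more `<property>`, readers that collect properties by type handle both layouts. In the old layout the "molecule type" entry is simply missing. The sketch below uses `xml.etree.ElementTree` and assumes each property type occurs at most once per `<dbReference>`. The helper `properties_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

def properties_of(db_reference):
    """Collect the <property> children of a <dbReference> into a dict.

    Assumes each property "type" appears at most once per cross-reference.
    """
    return {p.get("type"): p.get("value") for p in db_reference.findall("property")}

new_style = ET.fromstring("""
<dbReference type="EMBL" id="X57346" key="18">
  <property type="protein sequence ID" value="CAA40621.1"/>
  <property type="molecule type" value="mRNA"/>
</dbReference>
""")
props = properties_of(new_style)
# Older files lack the "molecule type" property, so read it with .get().
molecule = props.get("molecule type")
```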
A new mailing list is now available to augment this webpage and to announce schema changes as they appear on the public ftp site. Early warning of schema changes will still be provided through the forthcoming UniProt XML changes page. You may subscribe to the XML mailing list through the form available on http://www.uniprot.org/support/alerts.shtml. Please note that this is an announcement-only mailing list. Any questions or comments you have about UniProt XML should be sent to help@uniprot.org.
| UniProtKB release 4.6 of 26-Apr-2005 |
|---|
The cross-reference to the Mouse Genome Informatics Database used to be called "MGD", but has now been changed to "MGI" to reflect the name of the organization. This requires no change to the schema. An example is shown below:
From
<dbReference type="MGD" id="MGI:1891917" key="16">
<property type="gene designation" value="Ywhab"/>
</dbReference>
To
<dbReference type="MGI" id="MGI:1891917" key="16">
<property type="gene designation" value="Ywhab"/>
</dbReference>
| UniProtKB release 4.5 of 12-Apr-2005 |
|---|
Due to increased user demand for species-specific XML files, Integr8 now provides UniProt XML formatted files for all complete proteomes in the UniProt Knowledgebase. You may download these files directly from the EBI's ftp site (ftp://ftp.ebi.ac.uk/pub/databases/integr8/xml) or through the "Downloads" page provided for each proteome on the Integr8 website. These files will be updated every two weeks in conjunction with each new release of Integr8, which is synchronously produced with releases of the UniProt Knowledgebase.
SMR (The SWISS-MODEL Repository) is a database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL. More information on SMR may be found on their website.
The SMR database will be stored in <dbReference> elements.
An example is shown below:
<dbReference type="SMR" id="P21958" key="29">
<property type="residue range" value="468-718"/>
</dbReference>
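The "residue range" property holds a start-end pair as text. The following sketch turns it into integers with `xml.etree.ElementTree`. It assumes the single "start-end" form seen in the example above; other forms, such as multiple ranges, are not handled. The function name `smr_residue_range` is hypothetical.

```python
import xml.etree.ElementTree as ET

def smr_residue_range(db_reference):
    """Return (start, end) from an SMR "residue range" property, or None.

    Assumes the value has the single "start-end" form, e.g. "468-718".
    """
    for prop in db_reference.findall("property"):
        if prop.get("type") == "residue range":
            start, end = prop.get("value").split("-")
            return int(start), int(end)
    return None
```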
| UniProtKB release 4.4 of 29-Mar-2005 |
|---|
<comment> element
With the modification of the Mass Spectrometry comment line (see
recent changes
for details), the only comment type using the "note" attribute was the CC
DATABASE line. This attribute is no longer required, and therefore the
information stored in the "note" attribute has been moved to the
<note> sub-element of <comment>.
An example of this change is shown below.
<comment type="online information" name="PROW" note="CD guide CD13 entry">
<link uri="http://www.ncbi.nlm.nih.gov/prow/cd/cd13.htm"/>
</comment>
New:
<comment type="online information" name="PROW">
<link uri="http://www.ncbi.nlm.nih.gov/prow/cd/cd13.htm"/>
<note>CD guide CD13 entry</note>
</comment>
This required the deletion of the following section of the schema:
<xs:attribute name="note" type="xs:string" use="optional">
<xs:annotation>
<xs:documentation>Contains a note regarding this comment. Used by the comment type online information.</xs:documentation>
</xs:annotation>
</xs:attribute>
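A reader that must accept files from both before and after this change can check for the `<note>` sub-element first and fall back to the old attribute. The sketch below does this with `xml.etree.ElementTree` on the two PROW examples above. The helper `comment_note` is hypothetical.

```python
import xml.etree.ElementTree as ET

def comment_note(comment):
    """Return the note of a <comment>, accepting both the old and new layout.

    New layout: a <note> sub-element. Old layout: a "note" attribute.
    """
    note = comment.find("note")
    if note is not None:
        return note.text
    return comment.get("note")

OLD = ('<comment type="online information" name="PROW" note="CD guide CD13 entry">'
       '<link uri="http://www.ncbi.nlm.nih.gov/prow/cd/cd13.htm"/></comment>')
NEW = ('<comment type="online information" name="PROW">'
       '<link uri="http://www.ncbi.nlm.nih.gov/prow/cd/cd13.htm"/>'
       '<note>CD guide CD13 entry</note></comment>')
```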
<strain> element
Strain information from a given reference is no longer tokenized into individual strains. If a strain name contained slashes, it used to be further tokenized to represent the strings that are separated by the slashes as in the example below:
<strain>
<name>V583</name>
<name>ATCC 700802</name>
</strain>
While this makes sense for the cases where the slashes are used to separate
synonyms, slashes are unfortunately also used to append serotype, isolate,
substrain and other information which forms an integral part of the strain name
and should hence not be tokenized. Due to this ambiguity of the slash, we have
ceased to tokenize strain names. Therefore there is no longer a
<name> sub-element of <strain>. The
updated schema is as follows:
<xs:element name="strain">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="evidence" type="xs:string" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
The example strain shown above is instead represented as:
<strain>V583 / ATCC 700802</strain>
We try to adopt a consistent policy for naming UniProt XML elements and attributes. One aspect of this policy is that names should not contain underscores, because avoiding them makes XML-to-object mapping much more straightforward. However, a small number of underscores made it into the schema, and we have now removed them. The following changes to the xsd have been made:
<xs:element name="pH_dependence" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="redox_potential" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="temperature_dependence" type="xs:string" minOccurs="0" maxOccurs="1"/>
New:
<xs:element name="phDependence" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="redoxPotential" type="xs:string" minOccurs="0" maxOccurs="1"/>
<xs:element name="temperatureDependence" type="xs:string" minOccurs="0" maxOccurs="1"/>
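Older files can be brought in line with the new names by renaming elements in place. Note that "pH_dependence" becomes "phDependence", so a generic snake_case-to-camelCase conversion would not produce the right name; an explicit table is safer. This is a sketch with `xml.etree.ElementTree` on namespace-free XML. The table and the function `rename_underscore_elements` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Old-to-new element names taken from the schema change above.
RENAMED_ELEMENTS = {
    "pH_dependence": "phDependence",
    "redox_potential": "redoxPotential",
    "temperature_dependence": "temperatureDependence",
}

def rename_underscore_elements(root):
    """Rename old underscore element names in place; return how many changed."""
    changed = 0
    for elem in root.iter():
        new_tag = RENAMED_ELEMENTS.get(elem.tag)
        if new_tag is not None:
            elem.tag = new_tag  # changing a tag does not alter the tree shape
            changed += 1
    return changed
```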
| UniProtKB release 4.2 of 01-Mar-2005 |
|---|
<location> and <position>
These two complex types have been updated to allow more fine-grained parsing of ranges in both comments and features.
positionType has an additional status of "unknown", and the "position" attribute
is no longer mandatory. This allows "unknown" ranges to be accommodated
explicitly. The new schema forces the <begin> and
<end> elements inside locationType to always be used
together: if one is present, both must be present. locationType also has a new
optional attribute, "sequence", which is used in the mass spectrometry comments to name the
sequence to which a <location> element refers. The new
definitions of these complex types are as follows:
<xs:complexType name="positionType">
<xs:attribute name="position" type="xs:unsignedLong" use="optional" />
<xs:attribute name="status" use="optional" default="certain">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="certain" />
<xs:enumeration value="uncertain" />
<xs:enumeration value="less than" />
<xs:enumeration value="greater than" />
<xs:enumeration value="unknown" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:complexType name="locationType">
<xs:annotation>
<xs:documentation>A location can be either a position or have
both a begin and end.</xs:documentation>
</xs:annotation>
<xs:choice>
<xs:sequence>
<xs:element name="begin" type="positionType" minOccurs="1" />
<xs:element name="end" type="positionType" minOccurs="1" />
</xs:sequence>
<xs:element name="position" type="positionType" />
</xs:choice>
<xs:attribute name="sequence" type="xs:string" use="optional" />
</xs:complexType>
For general information on the MASS SPECTROMETRY comment, please see the user manual.
Since the RANGE portion of the MASS SPECTROMETRY comment has been slightly
modified in the flat file format (see the
recent changes),
it became possible to represent all ranges as location elements instead of using
the "note" attribute (as was done up until now for ranges which also included
sequence identifiers). Representing this required changes to the schema file,
specifically to the <location> element, as
represented by the complex type locationType, and to the
<position>, <begin> and
<end> elements, represented by the complex type positionType.
Further, the "note" attribute is no longer used in the mass spectrometry comment
type. It is now only found in the online information comment type.
Examples of these changes are below.
<comment type="mass spectrometry" mass="14919.9" method="Electrospray" note="1-162 (Allele F1C)">
New:
<comment type="mass spectrometry" mass="14919.9" method="Electrospray">
<location sequence="Allele F1C">
<begin position="1"/>
<end position="162"/>
</location>
</comment>
<comment type="mass spectrometry" mass="14919.9" method="Electrospray" note="?-162 (Allele F1C)">
New:
<comment type="mass spectrometry" mass="14919.9" method="Electrospray">
<location sequence="Allele F1C">
<begin status="unknown"/>
<end position="162"/>
</location>
</comment>
The <location> element is used in both the CC lines and the FT lines.
Therefore a schema change to this element also changes how features are
displayed.
<feature type="transit peptide" description="Mitochondrion" status="potential">
<location>
<begin position="1"/>
</location>
</feature>
New:
<feature type="transit peptide" description="Mitochondrion" status="potential">
<location>
<begin position="1"/>
<end status="unknown"/>
</location>
</feature>
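With `<begin>` and `<end>` now always paired and "unknown" available as a status, a location can be reduced to a (begin, end) pair. The sketch below does this with `xml.etree.ElementTree`, returning `None` for unknown endpoints. It treats a single `<position>` as a one-residue range. It ignores the "uncertain", "less than" and "greater than" statuses, which a real consumer would need to keep. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def position_value(pos):
    """Return the integer position, or None if it is unknown or absent."""
    value = pos.get("position")
    # "status" defaults to "certain" in positionType; only "unknown" matters here.
    if pos.get("status") == "unknown" or value is None:
        return None
    return int(value)

def location_range(location):
    """Return (begin, end) for a <location>; a single <position> gives (p, p)."""
    single = location.find("position")
    if single is not None:
        value = position_value(single)
        return value, value
    return position_value(location.find("begin")), position_value(location.find("end"))
```

For the second mass spectrometry example above, `location_range` yields `(None, 162)`, and the `sequence` attribute still carries "Allele F1C".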
| UniProtKB release 4.1 of 15-Feb-2005 |
|---|
The database cross-reference GeneDB_SPombe will be renamed to
GeneDB_Spombe. This database is stored in the UniProt XML in
<dbReference> elements. This will not cause a change to the
xsd or dtd but will affect the value of the "type" attribute of
<dbReference>. An example is shown below:
From
<dbReference type="GeneDB_SPombe" id="SPAP8A3.09c" key="15"/>
To
<dbReference type="GeneDB_Spombe" id="SPAP8A3.09c" key="15"/>
LegioList, a database dedicated to the analysis of the genomes of Legionella pneumophila strain Paris and strain Lens, will be introduced into UniProt as a new database cross-reference. More information on LegioList may be found on their website.
The LegioList database will be stored in <dbReference> elements.
An example is shown below:
<dbReference type="LegioList" id="lpp0849" key="15"/>
| UniProtKB release 4.0 of 1-Feb-2005 |
|---|
A new CC INTERACTION line was introduced with this release of UniProt (see the user manual for more details). Implementing this in XML required a small number of changes. Because this change fell over the holiday period, we had to implement it without the usual announcement in the XML forthcoming changes.
Within the complexType "commentType":
<xs:sequence>
<xs:element name="interactant" type="interactantType" minOccurs="2" maxOccurs="2"/>
<xs:element name="organismsDiffer" type="xs:boolean" minOccurs="1" default="false"/>
<xs:element name="experiments" type="xs:integer" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
General additions:
<xs:group name="interactantGroup">
<xs:sequence>
<xs:element name="id" type="xs:string" minOccurs="1"/>
<xs:element name="label" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
<xs:complexType name="interactantType">
<xs:group ref="interactantGroup" minOccurs="0"/>
<xs:attribute name="intactId" type="xs:string" use="required"/>
</xs:complexType>
The following are examples of the new comment type:
<comment type="interaction">
<interactant intactId="EBI-163407"/>
<interactant intactId="EBI-111903">
<id>Q9VGZ4</id>
<label>cg6325</label>
</interactant>
<organismsDiffer>false</organismsDiffer>
<experiments>1</experiments>
</comment>
<comment type="interaction">
<interactant intactId="EBI-356498"/>
<interactant intactId="EBI-457639">
<id>P84198</id>
<label>vim</label>
</interactant>
<organismsDiffer>true</organismsDiffer>
<experiments>4</experiments>
</comment>
<comment type="interaction">
<interactant intactId="EBI-77613"/>
<interactant intactId="EBI-79084">
<id>Q9UCX5</id>
</interactant>
<organismsDiffer>false</organismsDiffer>
<experiments>2</experiments>
</comment>
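The examples show that the first interactant usually carries only an `intactId`. The second may add an `<id>` and an optional `<label>`. The following sketch summarizes an interaction comment as a plain dict using `xml.etree.ElementTree`. It accepts both lexical forms of `xs:boolean` ("true"/"1"). The function `parse_interaction` and the dict layout are hypothetical choices.

```python
import xml.etree.ElementTree as ET

def parse_interaction(comment):
    """Summarize an interaction <comment> as a plain dict."""
    interactants = [
        {
            "intactId": i.get("intactId"),
            "id": i.findtext("id"),        # None when absent
            "label": i.findtext("label"),  # optional, may be None
        }
        for i in comment.findall("interactant")
    ]
    return {
        "interactants": interactants,
        # xs:boolean allows "true"/"false" and "1"/"0".
        "organismsDiffer": comment.findtext("organismsDiffer") in ("true", "1"),
        "experiments": int(comment.findtext("experiments")),
    }
```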
| UniProtKB release 3.4 of 21-Dec-2004 |
|---|
The order of <event> and <isoform>
elements of alternative product comments was taken directly from the order of
the flat-file (see the user manual for more
details). Implementing this in XML required the following xs:sequence to be
used:
<xs:sequence>
<xs:element name="event" type="eventType" minOccurs="1" maxOccurs="2"/>
<xs:element name="isoform" type="isoformType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="event" type="eventType" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
Elements with the same name and type in a sequence can make a content model
ambiguous for an XML parser. In a situation with both types of events but no
isoforms, the content model becomes non-deterministic, because the parser cannot
determine whether the second <event> element matches the first event
declaration (before the isoforms) or the last one (after them). The placement of the
<event> elements is not important in the XML format;
therefore, to remove this problem, the xsd now has the
following structure:
<xs:sequence>
<xs:element name="event" type="eventType" minOccurs="1" maxOccurs="3"/>
<xs:element name="isoform" type="isoformType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
For more information on alternative splicing, and comments in general, please see the user manual.
The "namedIsoforms" attribute of the complex type eventType used to display the
number of <isoform> elements that are listed in an ALTERNATIVE
PRODUCTS comment. Since the value of this attribute can be easily computed, we
have removed this attribute from the XML format.
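Code that relied on "namedIsoforms" can recompute it by counting `<isoform>` children, assuming they sit directly under the comment as in the xs:sequence above. The sketch uses `xml.etree.ElementTree`. The example comment is simplified and hypothetical: real `<event>` and `<isoform>` elements carry more content, and the accession P00000 is a placeholder.

```python
import xml.etree.ElementTree as ET

def named_isoform_count(comment):
    """Recompute the removed "namedIsoforms" value: the number of <isoform> children."""
    return len(comment.findall("isoform"))

# Simplified, hypothetical comment; P00000 is a placeholder accession.
EXAMPLE = """
<comment type="alternative products">
  <event type="alternative splicing"/>
  <isoform><id>P00000-1</id></isoform>
  <isoform><id>P00000-2</id></isoform>
</comment>
"""
```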
The biophysicochemical properties comment type is new, and some of its sections are well structured. Evidence tags will be allowed in certain parts of the comment, but none are present at the moment. For general information regarding the biophysicochemical properties comment type, please see the recent changes for the flat file format. Below is an example of the new XML format.
<comment type="biophysicochemical properties">
<absorption>
<max>~390 nm</max>
<text>free text</text>
</absorption>
<kinetics>
<KM>145 mM for ATP (in the PPAT reaction)</KM>
<KM>34.4 mM for ATP (in the DPCK reaction)</KM>
<Vmax>0.11 mmol/min/mg enzyme in the DPCK reaction</Vmax>
<text>free text</text>
</kinetics>
<pH_dependence>free text</pH_dependence>
<redox_potential>free text</redox_potential>
<temperature_dependence>free text</temperature_dependence>
</comment>
The attribute "updated" of both <entry> and <sequence>
has been renamed to "modified". This is in keeping with standards such as the
Dublin Core Metadata Initiative.
The author lists provided for every citation in XML through the
<person> element do not make use of either the "surname" or
"forename" attributes. Therefore these have been dropped, and only the "name"
attribute is used, which displays the author's last name and initials.
| UniProtKB release 3.3 of 07-Dec-2004 |
|---|
All classes of geneLocation except Plasmid (namely Chloroplast, Cyanelle,
Mitochondrion and Nucleomorph) have their evidence tags displayed as an
attribute of <geneLocation>. Up until now, Plasmids have
held their evidence tags as attributes of the <name>
element. To provide consistency across the whole of the geneLocation types, we
have moved the plasmid evidence attribute into the
<geneLocation> element.
| UniProtKB release 3.0 of 25-Oct-2004 |
|---|
to include this word in the <name> sub-element of
<geneLocation>. Therefore this word is now stripped from
<name>.
In TrEMBL there are cases where the only word in the OG line is
"Plasmid". In these cases a status="unknown" attribute
is added to the <name> element.
As this no longer fits the criteria for nameType, a new complex type was created called geneLocationNameType. This type is shown below:
<xs:complexType name="geneLocationNameType">
<xs:annotation>
<xs:documentation>The name type is used for gene location name.</xs:documentation>
</xs:annotation>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="evidence" type="xs:string" use="optional"/>
<xs:attribute name="status" use="optional" default="known" >
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="known"/>
<xs:enumeration value="unknown"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Some examples follow:
<geneLocation type="plasmid"> <name>pZM2</name> </geneLocation>
<geneLocation type="plasmid"> <name evidence="EI3" status="unknown"/> </geneLocation>
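Schema defaults such as `status="known"` are applied only by a schema-aware parser. A plain `xml.etree.ElementTree` reader has to supply the default itself. The sketch below reads both examples above. The function `plasmid_name` is hypothetical. It returns `None` as the name when only `status="unknown"` is given.

```python
import xml.etree.ElementTree as ET

def plasmid_name(gene_location):
    """Return (name, status) for a <geneLocation>, applying the "known" default."""
    name = gene_location.find("name")
    if name is None:
        return None, None
    # A self-closing <name .../> has no text, so name.text is None.
    return name.text, name.get("status", "known")
```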