UniProt release 6.5
Published November 22, 2005
Headlines
Keyword hierarchies and categories
We have changed the structure of the UniProtKB keyword list, and would like to take this opportunity to describe some concepts behind the use of the keywords in UniProtKB/Swiss-Prot.
UniProtKB/Swiss-Prot entries are tagged with keywords. Keywords help summarize the contents of individual entries, simplify retrieval of sets of entries, and allow entries to be grouped easily according to different aspects such as biological processes, molecular function, subcellular location, domains, ligands, sequence modifications and diseases.
The keywords are described in the keywlist.txt file using the following format:
--------- --------------------------- ---------------------- Line code Content Occurrence in an entry --------- --------------------------- ---------------------- ID Identifier (keyword) Once; starts an entry AC Accession (KW-xxxx) Once DE Definition Once or more SY Synonyms Optional; Once or more GO Gene ontology (GO) mapping Optional; Once or more HI Hierarchy Optional; Once or more CA Category Once // Terminator Once; ends an entry
Example of a complete keyword description:
ID Calcium channel. AC KW-0107 DE Cell membrane glycoprotein forming a channel in a biological membrane DE selectively permeable to calcium ions. Calcium is essential for a DE variety of bodily functions, such as neurotransmission, muscle DE contraction and proper heart function. GO GO:0005262; calcium channel activity HI Molecular function: Ionic channel; Calcium channel. HI Biological process: Transport; Ion transport; Calcium transport; Calcium channel. HI Ligand: Calcium; Calcium channel. CA Molecular function. //
Some keywords are by definition supersets or subsets of others. Such hierarchical relationships are stated in HI lines:
HI Category: Keyword(1); ...; Keyword(n); Described keyword.
From the previous example we can infer that a UniProtKB/Swiss-Prot entry that is tagged with the keyword "Calcium channel" will at least have the following additional keywords appear in the KW line:
KW Calcium; Calcium transport; Ion transport; Ionic channel; Transport.
This formalization of the relationships between keywords enables our curators (assisted by automated procedures) to ensure coherence, and to increase the coverage of UniProtKB/Swiss-Prot entries which keywords describing both specific and more general concepts. This in turn facilitates the retrieval of complete and coherent entry sets by keyword. The current UniProtKB/Swiss-Prot release contains close to one million keywords in almost 200'000 entries.
A "Category" is a top-level keyword that never appears directly in UniProtKB/Swiss-Prot entries. Categories are described along with the other keywords, but are introduced by an IC rather than an ID line using the following format:
--------- --------------------------- ---------------------- Line code Content Occurrence in an entry --------- --------------------------- ---------------------- IC Identifier (category) Once; starts a category entry AC Accession (KW-xxxx) Once DE Definition Once or more
Example of a category description:
IC PTM. AC KW-9991 DE Keywords assigned to proteins because their sequences can differ from DE the mere translation of their corresponding genes, due to some post- DE translational modification.
Changes concerning keywords
New keywords:
