UniProt release 6.5

Published November 22, 2005

Headlines

Keyword hierarchies and categories

We have changed the structure of the UniProtKB keyword list, and would like to take this opportunity to describe some concepts behind the use of the keywords in UniProtKB/Swiss-Prot.

UniProtKB/Swiss-Prot entries are tagged with keywords. Keywords help summarize the contents of individual entries, simplify retrieval of sets of entries, and allow entries to be grouped easily according to different aspects such as biological processes, molecular function, subcellular location, domains, ligands, sequence modifications and diseases.


The keywords are described in the keywlist.txt file using the following format:


---------  ---------------------------     ----------------------
Line code  Content                         Occurrence in an entry
---------  ---------------------------     ----------------------
ID         Identifier (keyword)            Once; starts an entry
AC         Accession (KW-xxxx)             Once
DE         Definition                      Once or more
SY         Synonyms                        Optional; Once or more
GO         Gene ontology (GO) mapping      Optional; Once or more
HI         Hierarchy                       Optional; Once or more
CA         Category                        Once
//         Terminator                      Once; ends an entry

Example of a complete keyword description:

ID   Calcium channel.
AC   KW-0107
DE   Cell membrane glycoprotein forming a channel in a biological membrane
DE   selectively permeable to calcium ions. Calcium is essential for a
DE   variety of bodily functions, such as neurotransmission, muscle
DE   contraction and proper heart function.
GO   GO:0005262; calcium channel activity
HI   Molecular function: Ionic channel; Calcium channel.
HI   Biological process: Transport; Ion transport; Calcium transport; Calcium channel.
HI   Ligand: Calcium; Calcium channel.
CA   Molecular function.
//
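
As an illustration of the line-code layout above, here is a minimal sketch of a reader for entries in this format. It is not an official UniProt tool: the function name parse_keyword_entries is hypothetical, and the sketch assumes that every line begins with a two-character line code followed by whitespace and that each entry ends with a "//" line, as in the example.

```python
from collections import defaultdict


def parse_keyword_entries(text):
    """Yield one dict per entry, mapping line code -> list of line contents.

    Assumption: each line starts with a two-character line code and each
    entry is closed by a '//' terminator line, as in the example above.
    """
    entry = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines between entries
        if line.startswith("//"):
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
            continue
        code, content = line[:2], line[2:].strip()
        entry[code].append(content)


# The complete keyword description quoted above, used as sample input.
sample = """\
ID   Calcium channel.
AC   KW-0107
DE   Cell membrane glycoprotein forming a channel in a biological membrane
DE   selectively permeable to calcium ions. Calcium is essential for a
DE   variety of bodily functions, such as neurotransmission, muscle
DE   contraction and proper heart function.
GO   GO:0005262; calcium channel activity
HI   Molecular function: Ionic channel; Calcium channel.
HI   Biological process: Transport; Ion transport; Calcium transport; Calcium channel.
HI   Ligand: Calcium; Calcium channel.
CA   Molecular function.
//
"""

entries = list(parse_keyword_entries(sample))
first = entries[0]
print(first["ID"][0], first["AC"][0], len(first["HI"]))
```

Repeatable line codes (DE, SY, GO, HI) are kept as lists, so the multi-line definition can be rejoined with " ".join(first["DE"]) when needed.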

Some keywords are by definition supersets or subsets of others. Such hierarchical relationships are stated in HI lines:

HI   Category: Keyword(1); ...; Keyword(n); Described keyword.

From the previous example we can infer that a UniProtKB/Swiss-Prot entry tagged with the keyword "Calcium channel" will also carry at least the following keywords in its KW line:

KW   Calcium; Calcium transport; Ion transport; Ionic channel; Transport.
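
The inference above can be sketched in a few lines of code. This is an illustrative sketch only, not the procedure used by the curators: the function name implied_keywords is hypothetical, and it assumes that the text before the first ": " in an HI line is the category, that keywords are separated by ";", and that the last keyword on the line is the described keyword itself. Sorting the result alphabetically is also an assumption, chosen to match the KW line shown above.

```python
def implied_keywords(hi_lines):
    """Collect the more general keywords named in a keyword's HI lines.

    Assumptions: 'Category: Keyword(1); ...; Keyword(n); Described keyword.'
    The category and the described keyword itself are left out.
    """
    implied = set()
    for line in hi_lines:
        _category, _, path = line.partition(": ")
        terms = [t.strip() for t in path.rstrip(".").split(";")]
        implied.update(terms[:-1])  # drop the described keyword (last term)
    return sorted(implied)


# HI lines of the "Calcium channel" keyword, without the line code.
hi_lines = [
    "Molecular function: Ionic channel; Calcium channel.",
    "Biological process: Transport; Ion transport; Calcium transport; Calcium channel.",
    "Ligand: Calcium; Calcium channel.",
]

kw_line = "KW   " + "; ".join(implied_keywords(hi_lines)) + "."
print(kw_line)
# prints: KW   Calcium; Calcium transport; Ion transport; Ionic channel; Transport.
```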

This formalization of the relationships between keywords enables our curators (assisted by automated procedures) to ensure coherence, and to increase the coverage of UniProtKB/Swiss-Prot entries with keywords describing both specific and more general concepts. This in turn facilitates the retrieval of complete and coherent entry sets by keyword. The current UniProtKB/Swiss-Prot release contains close to one million keywords in almost 200,000 entries.

A "Category" is a top-level keyword that never appears directly in UniProtKB/Swiss-Prot entries. Categories are described along with the other keywords, but are introduced by an IC line rather than an ID line, using the following format:

---------  ---------------------------     ----------------------
Line code  Content                         Occurrence in an entry
---------  ---------------------------     ----------------------
IC         Identifier (category)           Once; starts a category entry
AC         Accession (KW-xxxx)             Once
DE         Definition                      Once or more

Example of a category description:

IC   PTM.
AC   KW-9991
DE   Keywords assigned to proteins because their sequences can differ from
DE   the mere translation of their corresponding genes, due to some post-
DE   translational modification.

Changes concerning keywords

New keywords: