How can I access resources on this web site programmatically?
Last modified December 15, 2009
This document describes how this web site can be accessed with programs. All resources (individual entries as well as sets of entries retrieved by queries) are accessible using simple URLs (REST) that can be bookmarked, linked and used in programs.
Contents
- Retrieving Individual Entries
- Batch Retrieval of Entries
- Retrieving Entries via Queries
- Mapping Identifiers
- Format Conversion
Retrieving Individual Entries
The web address for an entry consists of a data set name
(e.g. uniprot, uniref,
uniparc, taxonomy, ...) and the entry's
unique identifier, e.g.:
http://www.uniprot.org/uniprot/P12345
By default, a web page is returned. Depending on the data set, other formats may also be available (check the orange buttons on the entry's web page). Here are some examples:
http://www.uniprot.org/uniprot/P12345.txt
http://www.uniprot.org/uniprot/P12345.xml
http://www.uniprot.org/uniprot/P12345.rdf
http://www.uniprot.org/uniprot/P12345.fasta
http://www.uniprot.org/uniprot/P12345.gff
http://www.uniprot.org/uniref/UniRef90_P33810.xml
http://www.uniprot.org/uniref/UniRef90_P33810.rdf
http://www.uniprot.org/uniref/UniRef90_P33810.fasta
http://www.uniprot.org/uniref/UniRef90_P33810.tab
http://www.uniprot.org/uniparc/UPI000000001F.xml
http://www.uniprot.org/uniparc/UPI000000001F.rdf
http://www.uniprot.org/uniparc/UPI000000001F.fasta
http://www.uniprot.org/uniparc/UPI000000001F.tab
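The address pattern above (data set name, identifier, optional format extension) is easy to assemble in a program. The following is a minimal sketch; the class and method names are hypothetical and chosen only for illustration, and it performs no check that a given format is actually offered for a data set.

```java
/** Builds web addresses for individual entries, following the pattern described above. */
final class EntryUrls {
    // Base address taken from the examples in this document.
    static final String BASE = "http://www.uniprot.org";

    /**
     * Returns the address of an entry in the given data set.
     * Pass null or "" as format to get the address of the default web page.
     */
    static String entryUrl(String dataset, String id, String format) {
        String url = BASE + "/" + dataset + "/" + id;
        if (format != null && !format.isEmpty()) {
            url += "." + format;
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(entryUrl("uniprot", "P12345", null));
        System.out.println(entryUrl("uniref", "UniRef90_P33810", "fasta"));
        System.out.println(entryUrl("uniparc", "UPI000000001F", "tab"));
    }
}
```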
There is an option to have data from referenced data sets included directly in the returned data, if there is any available:
http://www.uniprot.org/uniprot/P12345.rdf?include=yes
The following status codes may be returned:
| Code | Description |
|---|---|
| 200 | The request was processed successfully. |
| 400 | Bad request. There is a problem with your input. |
| 404 | Not found. The resource you requested doesn't exist. |
| 410 | Gone. The resource you requested was removed. |
| 500 | Internal server error. Most likely a temporary problem, but if the problem persists please contact us. |
| 503 | Service not available. The server is being updated, try again later. |
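A program will usually want to react differently to these codes. The sketch below mirrors the table in a small helper; the class and method names are hypothetical, and the choice to treat only 500 and 503 as worth retrying is an assumption based on the table describing them as temporary conditions.

```java
/** Interprets the status codes listed in the table above. */
final class StatusCodes {
    /** Returns a short description of a status code, following the table. */
    static String describe(int code) {
        switch (code) {
            case 200: return "The request was processed successfully.";
            case 400: return "Bad request.";
            case 404: return "Not found.";
            case 410: return "Gone.";
            case 500: return "Internal server error.";
            case 503: return "Service not available.";
            default:  return "Unexpected status code " + code;
        }
    }

    /**
     * Assumption: only 500 and 503 are worth retrying later, because the
     * table describes them as (most likely) temporary problems.
     */
    static boolean isWorthRetrying(int code) {
        return code == 500 || code == 503;
    }

    public static void main(String[] args) {
        int[] codes = {200, 404, 503};
        for (int code : codes) {
            System.out.println(code + ": " + describe(code)
                    + " retry=" + isWorthRetrying(code));
        }
    }
}
```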
Resolving RDF Identifiers
A request for an address such as
http://purl.uniprot.org/uniprot/P12345
will be resolved, where possible, by redirection to the corresponding resource (see previous section).
Batch Retrieval of Entries
Entries can be retrieved in batch by querying the batch retrieval service with a list of UniProt identifiers. Here is an example using Perl.
Retrieving Entries via Queries
You can use any query to define the set of entries that you are interested in, e.g. all reviewed human entries:
http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606
You can then retrieve the set of entries in one of the supported download formats, a simple list of entry identifiers (which can be stored and later used to detect changes), or the result table itself.
Tips: Explore the various download options of
each data set. Get familiar with the query builder by using the
Fields link next to the query form. Use the
Customize display to select the columns for retrieving
result tables in tab delimited or Excel format.
The web address for a query result consists of a data set name
(e.g. uniprot, uniref, uniparc,
taxonomy, ...) and the actual query.
The following parameters are supported:
| Parameter | Values | Description |
|---|---|---|
| query | string | See query syntax and query fields for UniProtKB. An empty query string will retrieve all entries in a data set, except for controlled vocabularies, documents and help, where it opens a splash page. Tip: use the Fields link next to the query form. |
| format | one of: html, tab, fasta, gff, txt, xml, rdf, rss, list | Format in which to return results. tab returns tab-delimited data for the given columns. fasta returns sequence data only, where applicable. gff returns sequence annotation, where applicable. txt, xml and rdf return full entries. rss returns an OpenSearch RSS feed. list returns a list of identifiers. Tip: click on the Download button above the list of results. |
| columns | comma-separated list of values, e.g. for UniProtKB: citation, clusters, comments, database, domains, domain, ec, id, entry name, existence, families, features, genes, go, go-id, interpro, interactor, keywords, keyword-id, last-modified, length, organism, organism-id, pathway, protein names, reviewed, score, sequence, 3d, subcellular locations, taxon, tools, version, virus hosts | Columns to select for retrieving results in tab format. Tip: use the Customize display option located above the table showing the results of your query. Some columns can be parameterized, e.g. database(PDB) (see the example at the end of this section). |
| compress | yes or no | Return results gzipped. Note that if the client supports HTTP compression, results may be compressed transparently even if this parameter is not set to yes. |
| limit | integer | Maximum number of results to retrieve. |
| offset | integer | Offset of the first result, typically used together with the limit parameter. |
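The limit and offset parameters can be combined to fetch a large result set in pages. The following sketch appends both parameters to an existing query address; the class and method names are hypothetical, and it assumes that offsets count from 0, which this document does not state explicitly.

```java
/** Adds paging parameters to a query address. */
final class ResultPages {
    /**
     * Returns the address for one page of results.
     * pageIndex is zero-based; assumption: the service also counts offsets from 0.
     */
    static String pageUrl(String queryUrl, int pageSize, int pageIndex) {
        long offset = (long) pageSize * pageIndex;
        // Use '?' if the address has no parameters yet, '&' otherwise.
        String separator = queryUrl.contains("?") ? "&" : "?";
        return queryUrl + separator + "limit=" + pageSize + "&offset=" + offset;
    }

    public static void main(String[] args) {
        String query = "http://www.uniprot.org/uniprot/"
                + "?query=reviewed:yes+AND+organism:9606&format=list";
        // Print the addresses of the first three pages of 100 identifiers each.
        for (int page = 0; page < 3; page++) {
            System.out.println(pageUrl(query, 100, page));
        }
    }
}
```

A caller would typically request successive pages until a page returns fewer results than the limit.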
The following examples retrieve all human entries matching the term 'antigen'
in RDF/XML and tab-delimited format, respectively.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=rdf&compress=yes
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=tab&compress=yes&columns=id,reviewed,protein names
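With compress=yes the response body is gzipped, so a program has to decompress it before use. Here is a minimal sketch using the standard java.util.zip classes; the class name is hypothetical, decoding the result as UTF-8 is an assumption, and the sample text in main is made up purely to exercise the code without a network call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Decompresses gzipped response bodies. */
final class GzipBodies {
    /** Decompresses gzipped bytes and decodes them as UTF-8 text (assumed encoding). */
    static String gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper used only to produce sample input for the demonstration below. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up stand-in for a downloaded, gzipped response body.
        byte[] sample = gzip("first line\nsecond line\n");
        System.out.print(gunzip(sample));
    }
}
```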
The next example retrieves all human entries with cross-references to PDB in tab-delimited format, showing only the UniProtKB and PDB identifiers.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+database:pdb&format=tab&compress=yes&columns=id,database(PDB)
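When query addresses like the ones above are built in a program, values that contain spaces or punctuation (such as the column name "protein names") should be URL-encoded. The sketch below uses java.net.URLEncoder for this; the class and method names are hypothetical. The encoded addresses look different from the hand-written examples (for instance ':' becomes %3A), and the sketch assumes the server applies standard form decoding, so that both spellings are treated alike.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds query addresses with URL-encoded parameter values. */
final class QueryUrls {
    // Base address taken from the examples in this document.
    static final String BASE = "http://www.uniprot.org";

    /** URL-encodes a single parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    /** Returns a query address; columns may be null when the format is not tab. */
    static String queryUrl(String dataset, String query, String format, String columns) {
        StringBuilder url = new StringBuilder(BASE)
                .append('/').append(dataset)
                .append("/?query=").append(encode(query))
                .append("&format=").append(encode(format));
        if (columns != null) {
            url.append("&columns=").append(encode(columns));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("uniprot", "organism:9606 AND antigen",
                "tab", "id,reviewed,protein names"));
        System.out.println(queryUrl("uniprot", "organism:9606 AND database:pdb",
                "tab", "id,database(PDB)"));
    }
}
```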
Mapping Identifiers
To use the database mapping service programmatically you need to know the abbreviations for the database names. Some databases map only one way.
| Name | Abbreviation | Direction |
|---|---|---|
Examples for querying the database mapping service are available in Perl, Ruby and Java.
Format Conversion
This service allows you to convert data between different formats.
Example, using the Jakarta Commons HttpClient library:
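import java.io.File;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.*;
public class ConvertExample {
  public static void main(String[] args) throws Exception {
    // Send the entry file as a multipart POST request to the conversion service.
    PostMethod method = new PostMethod("http://www.uniprot.org/convert");
    Part[] parts = {
      new StringPart("type", "uniprot"),
      new StringPart("from", "txt"),  // input format
      new StringPart("to", "rdf"),    // output format
      new FilePart("data", new File("P05067.txt"))
    };
    method.setRequestEntity(new MultipartRequestEntity(parts, method.getParams()));
    HttpClient client = new HttpClient();
    try {
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) System.out.println(method.getStatusLine());
      else System.out.println(method.getResponseBodyAsString());
    } finally {
      method.releaseConnection();  // return the connection to the client
    }
  }
}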
Note that at the moment only single entries are supported.



