How do I access resources on this web site programmatically?
Last modified January 15, 2007
This document describes how to access the web application programmatically. Individual records and queries are accessible using simple URLs (REST). All pages can be bookmarked and linked.
Contents
- Running Queries
- Resolving Identifiers
- Retrieving Resources (entry records)
- Using Tools: Perl, Ruby and Java Examples
- Format Conversion
Running Queries
You may either download full entries for further analysis, or a simple list of matching entries, which could, for example, be stored and later used to detect changes.
Tips: Get familiar with the query builder by using the Fields link next to the query form.
Use the Customize display option to select the columns for retrieving specific data in tab-delimited form.
Explore the various download options.
The web address for a query result consists of a data set name
(e.g. uniprot, uniref, uniparc,
taxonomy, ...) and the actual query.
The following parameters are supported:
| Parameter | Values | Description |
|---|---|---|
| query | string | See query syntax and query fields for UniProtKB. An empty query string will retrieve all entries in a data set, except for controlled vocabularies, documents and help, where it opens a splash page. Tip: use the Fields link next to the query form. |
| format | html, tab, fasta, gff, txt, xml, rdf, rss, list | Format in which to return results. tab returns tab-delimited data for the given columns. fasta returns sequence data only, where applicable. gff returns sequence annotation, where applicable. txt, xml and rdf return full entries. rss returns an OpenSearch RSS feed. list returns a list of identifiers. Tip: click on the Download button above the list of results. |
| columns | comma-separated list of values, e.g. for UniProtKB: citation, clusters, comments, database, domains, domain, ec, id, entry name, existence, families, features, genes, go, go-id, interpro, interactor, keywords, keyword-id, last-modified, length, organism, organism-id, pathway, protein names, reviewed, score, sequence, 3d, subcellular locations, taxon, tools, version, virus hosts | Columns to select for retrieving results in tab format. Tip: use the Customize display option located above the table showing the results of your query. Some columns can be parameterized, e.g. database(PDB) (see the example at the end of this section). |
| compress | yes, no | Return results gzipped. Note that if the client supports HTTP compression, results may be compressed transparently even if this parameter is not set to yes. |
| limit | integer | Maximum number of results to retrieve. |
| offset | integer | Offset of the first result, typically used together with the limit parameter. |
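The limit and offset parameters described in the table can be combined to retrieve a large result set page by page. The following is a minimal sketch that only builds the page URLs. The class and method names are hypothetical, the base URL is expected to already contain a query string, and the sketch assumes that offset counts from 0, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the web site): builds paged query URLs.
final class PagedQuery {

    // Returns the URL of one page of results.
    // Assumption: offset is zero-based, so page n starts at n * limit.
    static String pageUrl(String baseQueryUrl, int limit, int pageIndex) {
        if (limit <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("limit must be > 0 and pageIndex >= 0");
        }
        long offset = (long) limit * pageIndex;
        return baseQueryUrl + "&limit=" + limit + "&offset=" + offset;
    }

    // Returns the URLs of the first pageCount pages, in order.
    static List<String> pageUrls(String baseQueryUrl, int limit, int pageCount) {
        List<String> urls = new ArrayList<>();
        for (int page = 0; page < pageCount; page++) {
            urls.add(pageUrl(baseQueryUrl, limit, page));
        }
        return urls;
    }

    public static void main(String[] args) {
        String base = "http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=list";
        for (String url : pageUrls(base, 100, 3)) {
            System.out.println(url);
        }
    }
}
```

A client would typically request successive pages until a page comes back with fewer than limit results.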
The following example retrieves all human entries matching antigen
in RDF/XML and tab-delimited format, respectively.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=rdf&compress=yes
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=tab&compress=yes&columns=id,reviewed,protein names
The next example retrieves all human entries with cross-references to PDB in tab-delimited format, showing only the UniProtKB and PDB identifiers.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+database:pdb&format=tab&compress=yes&columns=id,database(PDB)
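When such URLs are assembled in a program, parameter values that contain spaces, commas or parentheses (for example the column name protein names, or database(PDB)) should be URL-encoded. The sketch below shows one way to do this with the standard java.net.URLEncoder class. The class QueryUrlBuilder and its methods are hypothetical names, and the sketch assumes that the server decodes standard form encoding, in which a space becomes + and characters such as : and , are percent-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the web site): builds encoded query URLs.
final class QueryUrlBuilder {

    // Builds http://www.uniprot.org/<dataset>/?name=value&... with encoded values.
    static String build(String dataset, Map<String, String> params) {
        StringBuilder url = new StringBuilder("http://www.uniprot.org/");
        url.append(dataset).append('/');
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Form-encodes a value as UTF-8 (space -> '+', reserved characters -> %XX).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("query", "organism:9606 AND database:pdb");
        params.put("format", "tab");
        params.put("compress", "yes");
        params.put("columns", "id,database(PDB)");
        System.out.println(build("uniprot", params));
    }
}
```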
Resolving Identifiers
A request for an address such as
http://purl.uniprot.org/uniprot/P12345
will be resolved and redirected to the actual resource (see the next section).
Retrieving Resources (entry records)
The web address for a resource consists of a data set name and an identifier:
http://www.uniprot.org/uniprot/P12345
By default, a web page is returned. Depending on the data set, other formats may also be available:
http://www.uniprot.org/uniprot/P12345.rdf
http://www.uniprot.org/uniprot/P12345.fasta
http://www.uniprot.org/uniprot/P68441
http://www.uniprot.org/uniprot/P06213.txt
http://www.uniprot.org/uniref/UniRef90_P33810.xml
http://www.uniprot.org/uniparc/UPI000000001F
There is an option to have data from referenced data sets included directly in the returned data, if there is any available:
http://www.uniprot.org/uniprot/P12345.rdf?include=yes
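The address scheme above (data set name, identifier, optional format extension, optional include=yes) can be captured in a small helper. This is only an illustrative sketch: EntryUrl and its method are hypothetical names, and the helper does not check whether a given format is actually available for a data set.

```java
// Hypothetical helper (not part of the web site): builds resource addresses.
final class EntryUrl {

    // dataset: e.g. "uniprot", "uniref", "uniparc"
    // id:      the entry identifier, e.g. "P12345"
    // format:  e.g. "rdf", "fasta", "txt", "xml"; null or "" for the default web page
    // include: true to append include=yes (data from referenced data sets)
    static String of(String dataset, String id, String format, boolean include) {
        StringBuilder url = new StringBuilder("http://www.uniprot.org/");
        url.append(dataset).append('/').append(id);
        if (format != null && !format.isEmpty()) {
            url.append('.').append(format);
        }
        if (include) {
            url.append("?include=yes");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(of("uniprot", "P12345", "rdf", true));
        System.out.println(of("uniref", "UniRef90_P33810", "xml", false));
        System.out.println(of("uniparc", "UPI000000001F", null, false));
    }
}
```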
The following status codes may be returned:
| Code | Description |
|---|---|
| 200 | The request was processed successfully. |
| 400 | Bad request. There is a problem with your input. |
| 404 | Not found. The resource you requested doesn't exist. |
| 410 | Gone. The resource you requested was removed. |
| 500 | Internal server error. Most likely a temporary problem, but if the problem persists please contact us. |
| 503 | Service not available. The server is being updated, try again later. |
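A client can use these codes to decide what to do next. The sketch below is one possible reading of the table, not a prescribed policy: the class StatusPolicy and the Action names are hypothetical, and treating any code that is not listed above as a failure is an assumption.

```java
// Hypothetical helper (not part of the web site): maps status codes to client actions.
final class StatusPolicy {

    enum Action { USE_RESPONSE, FIX_REQUEST, GIVE_UP, RETRY_LATER }

    static Action actionFor(int statusCode) {
        switch (statusCode) {
            case 200:
                return Action.USE_RESPONSE;  // processed successfully
            case 400:
                return Action.FIX_REQUEST;   // problem with the input
            case 404:                        // resource does not exist
            case 410:                        // resource was removed
                return Action.GIVE_UP;
            case 500:                        // most likely a temporary problem
            case 503:                        // server is being updated
                return Action.RETRY_LATER;
            default:
                return Action.GIVE_UP;       // assumption: unlisted codes are failures
        }
    }

    public static void main(String[] args) {
        int[] codes = {200, 400, 404, 410, 500, 503};
        for (int code : codes) {
            System.out.println(code + " -> " + actionFor(code));
        }
    }
}
```

For RETRY_LATER it is usually sensible to wait between attempts and to stop after a bounded number of retries.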
Format Conversion
This service allows you to convert data between different formats.
Example, using the Jakarta Commons HttpClient library:
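import java.io.File;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.*;

public class ConvertExample { // illustrative class name
  public static void main(String[] args) throws Exception {
    PostMethod method = new PostMethod("http://www.uniprot.org/convert");
    // Multipart form fields: data set, source format, target format, entry file.
    Part[] parts = {
      new StringPart("type", "uniprot"),
      new StringPart("from", "txt"),
      new StringPart("to", "rdf"),
      new FilePart("data", new File("P05067.txt"))
    };
    method.setRequestEntity(new MultipartRequestEntity(parts, method.getParams()));
    HttpClient client = new HttpClient();
    int status = client.executeMethod(method);
    if (status != HttpStatus.SC_OK)
      System.out.println(method.getStatusLine()); // report the failure
    else
      System.out.println(method.getResponseBodyAsString()); // converted entry
    method.releaseConnection();
  }
}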
Note that at the moment only single entries are supported.



