
How can I access resources on this web site programmatically?

Last modified December 15, 2009

This document describes how to access this web site programmatically. All resources (individual entries as well as sets of entries retrieved by queries) are accessible through simple URLs (REST) that can be bookmarked, linked to and used in programs.

Contents

Retrieving Individual Entries
Batch Retrieval of Entries
Retrieving Entries via Queries
Mapping Identifiers
Format Conversion

Retrieving Individual Entries

The web address for an entry consists of a data set name (e.g. uniprot, uniref, uniparc, taxonomy, ...) and the entry's unique identifier, e.g.:

http://www.uniprot.org/uniprot/P12345

By default, a web page is returned. Depending on the data set, other formats may also be available (check the orange buttons on the entry's web page). Here are some examples:

http://www.uniprot.org/uniprot/P12345.txt
http://www.uniprot.org/uniprot/P12345.xml
http://www.uniprot.org/uniprot/P12345.rdf 
http://www.uniprot.org/uniprot/P12345.fasta
http://www.uniprot.org/uniprot/P12345.gff

http://www.uniprot.org/uniref/UniRef90_P33810.xml
http://www.uniprot.org/uniref/UniRef90_P33810.rdf
http://www.uniprot.org/uniref/UniRef90_P33810.fasta
http://www.uniprot.org/uniref/UniRef90_P33810.tab

http://www.uniprot.org/uniparc/UPI000000001F.xml
http://www.uniprot.org/uniparc/UPI000000001F.rdf
http://www.uniprot.org/uniparc/UPI000000001F.fasta
http://www.uniprot.org/uniparc/UPI000000001F.tab

Where data from referenced data sets is available, you can have it included directly in the returned data:

http://www.uniprot.org/uniprot/P12345.rdf?include=yes
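A minimal Java sketch of composing the entry addresses above and downloading one. It assumes Java 11+ and uses the standard java.net.http.HttpClient rather than the Commons HttpClient shown under Format Conversion; the class EntryUrls and its methods are hypothetical names, and only the URL pattern comes from this page.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EntryUrls {
    // Pattern from the examples above: http://www.uniprot.org/<data set>/<identifier>.<format>
    static final String BASE = "http://www.uniprot.org/";

    /** Builds an entry address; pass format = null for the default web page. */
    static String entryUrl(String dataSet, String id, String format, boolean include) {
        StringBuilder url = new StringBuilder(BASE).append(dataSet).append('/').append(id);
        if (format != null) url.append('.').append(format);
        if (include) url.append("?include=yes"); // shown above only for RDF
        return url.toString();
    }

    /** Downloads an address and returns the body; any status other than 200 raises an exception. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // follow moved addresses
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

For example, fetch(entryUrl("uniprot", "P12345", "fasta", false)) should return the FASTA record for P12345, and entryUrl("uniprot", "P12345", "rdf", true) yields the include=yes address above.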

The following status codes may be returned:

Code  Description
200   The request was processed successfully.
400   Bad request. There is a problem with your input.
404   Not found. The resource you requested doesn't exist.
410   Gone. The resource you requested was removed.
500   Internal server error. Most likely a temporary problem; if the problem persists, please contact us.
503   Service not available. The server is being updated; try again later.
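The table suggests a simple client policy: 500 and 503 are described as temporary and may be worth retrying, while 400, 404 and 410 will not change on a retry. A sketch of that policy, assuming Java 11+; RetryingFetch is a hypothetical name, and the doubling delay (1 s, 2 s, 4 s, ... capped at 60 s) is an arbitrary choice, not a value recommended by this site.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RetryingFetch {
    /** 500 and 503 are described as temporary; every other status is treated as final. */
    static boolean isRetryable(int status) {
        return status == 500 || status == 503;
    }

    /** Delay before retry number `attempt` (0-based): 1 s, 2 s, 4 s, ... capped at 60 s. */
    static long backoffMillis(int attempt) {
        return Math.min(60_000L, 1_000L << Math.min(attempt, 6));
    }

    /** Makes at most maxAttempts requests, sleeping between retryable failures. */
    static String fetchWithRetry(String url, int maxAttempts) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        for (int attempt = 0; ; attempt++) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            int status = response.statusCode();
            if (status == 200) return response.body();
            if (!isRetryable(status) || attempt + 1 >= maxAttempts) {
                throw new IOException("HTTP " + status + " for " + url); // give up
            }
            Thread.sleep(backoffMillis(attempt)); // wait before trying again
        }
    }
}
```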

Resolving RDF Identifiers

A request for an address such as

http://purl.uniprot.org/uniprot/P12345

will be resolved, where possible, by redirection to the corresponding resource (see previous section).
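To see where such an address points without downloading the target, a client can turn off automatic redirects and read the Location header itself. A sketch assuming Java 11+; PurlResolver is a hypothetical name, and a relative Location value, if the server were to send one, would still need to be resolved against the original address.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

public class PurlResolver {
    /** Builds an RDF identifier address such as http://purl.uniprot.org/uniprot/P12345. */
    static String purl(String dataSet, String id) {
        return "http://purl.uniprot.org/" + dataSet + "/" + id;
    }

    /** Returns the redirect target, or empty if the server did not answer with a redirect. */
    static Optional<String> resolve(String purl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // inspect the redirect ourselves
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(purl)).GET().build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        int status = response.statusCode();
        if (status >= 300 && status < 400) {
            return response.headers().firstValue("Location");
        }
        return Optional.empty(); // not resolvable by redirection
    }
}
```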

Batch Retrieval of Entries

Entries can be retrieved in batch by querying the batch retrieval service with a list of UniProt identifiers. Here is an example using Perl.

Retrieving Entries via Queries

You can use any query to define the set of entries that you are interested in, e.g. all reviewed human entries:

http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606

You can then retrieve the matching entries in one of the supported download formats, a simple list of entry identifiers (which could be stored and later used to detect changes), or the result table itself.
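Comparing a stored identifier list with a later download is one way to detect changes. The sketch below assumes two saved downloads of the same query in list format (one identifier per line) and reports which identifiers were added or removed; it does not detect changes inside entries that appear in both lists. IdListDiff is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class IdListDiff {
    /** Parses a format=list response: one identifier per line, blank lines ignored. */
    static Set<String> parse(String listBody) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : listBody.split("\\R")) {
            String id = line.trim();
            if (!id.isEmpty()) ids.add(id);
        }
        return ids;
    }

    /** Identifiers present in `current` but not in `previous`, sorted. */
    static Set<String> added(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(previous);
        return result;
    }

    /** Identifiers present in `previous` but no longer in `current`, sorted. */
    static Set<String> removed(Set<String> previous, Set<String> current) {
        return added(current, previous);
    }

    public static void main(String[] args) throws IOException {
        // args[0], args[1]: two saved format=list downloads of the same query
        Set<String> before = parse(Files.readString(Path.of(args[0])));
        Set<String> after = parse(Files.readString(Path.of(args[1])));
        System.out.println("added:   " + added(before, after));
        System.out.println("removed: " + removed(before, after));
    }
}
```

Run it as java IdListDiff before.txt after.txt, where both file names are placeholders for local copies of the list results.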

Tips: Explore the various download options of each data set. Get familiar with the query builder by using the Fields link next to the query form. Use the Customize display option to select the columns for result tables retrieved in tab-delimited or Excel format.

The web address for a query result consists of a data set name (e.g. uniprot, uniref, uniparc, taxonomy, ...) and the actual query. The following parameters are supported:

query
  Values: string
  See query syntax and query fields for UniProtKB. An empty query string will retrieve all entries in a data set, except for controlled vocabularies, documents and help, where it opens a splash page. Tip: use the Fields link next to the query form.

format
  Values: html | tab | fasta | gff | txt | xml | rdf | rss | list
  Format in which to return results. tab returns tab-delimited data for the given columns. fasta returns sequence data only, where applicable. gff returns sequence annotation, where applicable. txt, xml and rdf return full entries. rss returns an OpenSearch RSS feed. list returns a list of identifiers. Tip: click on the Download button above the list of results.

columns
  Values: comma-separated list of values, e.g. for UniProtKB: citation | clusters | comments | database | domains | domain | ec | id | entry name | existence | families | features | genes | go | go-id | interpro | interactor | keywords | keyword-id | last-modified | length | organism | organism-id | pathway | protein names | reviewed | score | sequence | 3d | subcellular locations | taxon | tools | version | virus hosts
  Columns to select for retrieving results in tab format. Tip: use the Customize display option located above the table showing the results of your query. Some columns can be parameterized, e.g. database(PDB) (see the example at the end of this section).

compress
  Values: yes | no
  Return results gzipped. Note that if the client supports HTTP compression, results may be compressed transparently even if this parameter is not set to yes.

limit
  Values: integer
  Maximum number of results to retrieve.

offset
  Values: integer
  Offset of the first result, typically used together with the limit parameter.
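Query values contain spaces, colons and commas, so when building the address in code it is safest to URL-encode each parameter value. A sketch assuming Java 10+ (for URLEncoder.encode with a Charset); QueryUrl.build is a hypothetical helper. URLEncoder turns spaces into '+', which matches the '+' between query terms in the examples on this page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryUrl {
    /** Builds http://www.uniprot.org/<dataSet>/?name=value&... with every value URL-encoded. */
    static String build(String dataSet, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // e.g. ':' becomes %3A, ',' becomes %2C, ' ' becomes '+'
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return "http://www.uniprot.org/" + dataSet + "/?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("query", "organism:9606 AND antigen");
        params.put("format", "tab");
        params.put("columns", "id,reviewed,protein names");
        params.put("limit", "100");
        params.put("offset", "0");
        System.out.println(build("uniprot", params));
    }
}
```

To page through a large result set, one approach is to keep limit fixed and increase offset by that amount until a page returns fewer rows than limit.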

The following examples retrieve all human entries matching the term 'antigen' in RDF/XML and tab-delimited format, respectively:

http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=rdf&compress=yes
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=tab&compress=yes&columns=id,reviewed,protein names
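With compress=yes the response body is gzip data. The standard java.net.http.HttpClient neither requests nor undoes HTTP compression on its own, so the bytes arrive as sent and can be decompressed with java.util.zip.GZIPInputStream. A sketch assuming Java 11+; GzipDownload is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipDownload {
    /** Decompresses a gzip payload (as returned with compress=yes) into a UTF-8 string. */
    static String gunzip(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Downloads a compress=yes address as raw bytes and decompresses it. */
    static String download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return gunzip(response.body());
    }
}
```

URI.create rejects the literal space in columns=id,reviewed,protein names, so encode it (for example as protein%20names) before passing the second address to download.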

The next example retrieves all human entries with cross-references to PDB in tab-delimited format, showing only the UniProtKB and PDB identifiers.

http://www.uniprot.org/uniprot/?query=organism:9606+AND+database:pdb&format=tab&compress=yes&columns=id,database(PDB)
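Tab-delimited results like this one can be read into one map per entry, keyed by column header. The sketch assumes the first line is a header row and that multi-valued cells such as database(PDB) separate their values with semicolons; both are assumptions to check against a real download. TabResult is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TabResult {
    /** Parses a format=tab body: assumed header row, then one entry per non-empty line. */
    static List<Map<String, String>> parse(String body) {
        String[] lines = body.split("\\R");
        String[] header = lines[0].split("\t", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) continue;
            String[] cells = lines[i].split("\t", -1); // -1 keeps trailing empty cells
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c], c < cells.length ? cells[c] : "");
            }
            rows.add(row);
        }
        return rows;
    }

    /** Splits a multi-valued cell, assuming ';' as the separator; empty pieces are dropped. */
    static List<String> values(String cell) {
        List<String> out = new ArrayList<>();
        for (String v : cell.split(";")) {
            if (!v.isBlank()) out.add(v.trim());
        }
        return out;
    }
}
```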

Mapping Identifiers

To use the database mapping service programmatically you need to know the abbreviations for the database names. Some databases map only one way.

Name Abbreviation Direction

Here are some examples for querying the database mapping service using:

Perl
Ruby
Java

Format Conversion

This service allows you to convert data between different formats.

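Example using the Jakarta Commons HttpClient 3.x library:

import java.io.File;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.multipart.*;

// Convert one UniProtKB entry from flat-file (txt) format to RDF.
// The enclosing method must declare "throws IOException".
PostMethod method = new PostMethod("http://www.uniprot.org/convert");
Part[] parts =
{
  new StringPart("type", "uniprot"),           // data set
  new StringPart("from", "txt"),               // input format
  new StringPart("to", "rdf"),                 // output format
  new FilePart("data", new File("P05067.txt")) // the entry to convert
};
method.setRequestEntity(new MultipartRequestEntity(parts, method.getParams()));
HttpClient client = new HttpClient();
try {
  int status = client.executeMethod(method);
  if (status != HttpStatus.SC_OK)
    System.out.println(method.getStatusLine());
  else
    System.out.println(method.getResponseBodyAsString());
} finally {
  method.releaseConnection(); // return the connection to the pool
}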

Note that at the moment only single entries are supported.