How can I access resources on this web site programmatically?
Last modified March 21, 2012
This document describes how this web site can be accessed with programs. All resources (individual entries as well as sets of entries retrieved by queries) are accessible using simple URLs (REST) that can be bookmarked, linked and used in programs.
Please consider providing your email address as part of the User-Agent header that your programs set. This will allow us to contact you in case of problems.
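As an illustration of the User-Agent recommendation above, here is a minimal Python sketch using only the standard library. The tool name, the contact address and the `mailto:` layout of the header value are placeholders chosen for this example; the site does not prescribe a particular format.

```python
import urllib.request

# Hypothetical contact address and tool name; replace with your own.
CONTACT_EMAIL = "you@example.org"
TOOL_NAME = "my-script/0.1"


def make_request(url, email=CONTACT_EMAIL, tool=TOOL_NAME):
    """Build a request whose User-Agent header carries a contact address."""
    user_agent = f"{tool} (mailto:{email})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, email=CONTACT_EMAIL):
    """Download a resource and return the response body as bytes."""
    with urllib.request.urlopen(make_request(url, email)) as response:
        return response.read()
```

Calling `fetch("http://www.uniprot.org/uniprot/P12345.txt")` would then send the request with the contact address attached.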
Contents
- Retrieving individual entries
- Batch retrieval of entries
- Retrieving entries via queries
- Retrieving a random entry
- Downloading data at every UniProt release
- Release number and date
- Format conversion
- Mapping database identifiers
Retrieving individual entries
The web address for an entry consists of a data set name
(e.g. uniprot, uniref,
uniparc, taxonomy, ...) and the entry's
unique identifier, e.g.:
http://www.uniprot.org/uniprot/P12345
By default, a web page is returned. Depending on the data set, other formats may also be available (check the orange buttons on the entry's web page). Here are some examples:
http://www.uniprot.org/uniprot/P12345.txt
http://www.uniprot.org/uniprot/P12345.xml
http://www.uniprot.org/uniprot/P12345.rdf
http://www.uniprot.org/uniprot/P12345.fasta
http://www.uniprot.org/uniprot/P12345.gff
http://www.uniprot.org/uniref/UniRef90_Q76ZR4.xml
http://www.uniprot.org/uniref/UniRef90_Q76ZR4.rdf
http://www.uniprot.org/uniref/UniRef90_Q76ZR4.fasta
http://www.uniprot.org/uniref/UniRef90_Q76ZR4.tab
http://www.uniprot.org/uniparc/UPI000000001F.xml
http://www.uniprot.org/uniparc/UPI000000001F.rdf
http://www.uniprot.org/uniparc/UPI000000001F.fasta
http://www.uniprot.org/uniparc/UPI000000001F.tab
For the RDF/XML format there is an option to include data from referenced data sets directly in the returned data:
http://www.uniprot.org/uniprot/P12345.rdf?include=yes
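The address pattern described above (data set name, identifier, optional format extension) can be assembled with a small helper. The following Python sketch is only an illustration; the function name `entry_url` is made up for this example, and the check on `include` reflects that the text mentions this option for the RDF/XML format only.

```python
BASE_URL = "http://www.uniprot.org"


def entry_url(dataset, identifier, fmt=None, include=False):
    """Return the address of a single entry.

    dataset    -- data set name, e.g. "uniprot", "uniref", "uniparc"
    identifier -- the entry's unique identifier, e.g. "P12345"
    fmt        -- optional format extension, e.g. "txt", "xml", "rdf", "fasta"
    include    -- append ?include=yes (described for the rdf format only)
    """
    url = f"{BASE_URL}/{dataset}/{identifier}"
    if fmt:
        url += f".{fmt}"
    if include:
        if fmt != "rdf":
            raise ValueError("include=yes is only described for the rdf format")
        url += "?include=yes"
    return url
```

For instance, `entry_url("uniprot", "P12345", "rdf", include=True)` reproduces the RDF/XML address shown above.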
The following status codes may be returned:
| Code | Description |
|---|---|
| 200 | The request was processed successfully. |
| 400 | Bad request. There is a problem with your input. |
| 404 | Not found. The resource you requested doesn't exist. |
| 410 | Gone. The resource you requested was removed. |
| 500 | Internal server error. Most likely a temporary problem, but if the problem persists please contact us. |
| 503 | Service not available. The server is being updated, try again later. |
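A program can use these status codes to decide whether a failed request is worth repeating. The Python sketch below treats 500 and 503 as temporary and everything else in the 4xx/5xx range as final. The number of attempts and the delay are arbitrary values chosen for this example, not recommendations from this site.

```python
import time
import urllib.error
import urllib.request

# Codes that the table above describes as (probably) temporary.
RETRYABLE = {500, 503}


def classify_status(code):
    """Map an HTTP status code to 'ok', 'retry' or 'fail'."""
    if code == 200:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    return "fail"


def fetch_with_retry(url, attempts=3, delay=5.0, opener=urllib.request.urlopen):
    """Fetch a URL, retrying only on codes classified as temporary."""
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            last_attempt = attempt == attempts - 1
            if classify_status(err.code) != "retry" or last_attempt:
                raise  # 400, 404, 410 or retries exhausted: give up
            time.sleep(delay)
```

The `opener` argument exists only so that the function can be exercised without network access.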
Resolving RDF identifiers
A request for an address such as
http://purl.uniprot.org/uniprot/P12345
will be resolved, where possible, by redirection to the
corresponding resource (see previous section). For UniProt resources,
entries are returned in RDF/XML format if the HTTP
'Accept' request header is set to
'application/rdf+xml'.
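Setting the 'Accept' header as described might look as follows in Python. This is a sketch using the standard library; the helper name `rdf_request` is hypothetical.

```python
import urllib.request


def rdf_request(identifier, dataset="uniprot"):
    """Build a request for a purl.uniprot.org address that asks for RDF/XML."""
    url = f"http://purl.uniprot.org/{dataset}/{identifier}"
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(identifier, dataset="uniprot"):
    """Follow the redirect and return the RDF/XML document as bytes."""
    with urllib.request.urlopen(rdf_request(identifier, dataset)) as response:
        return response.read()
```

`urllib.request.urlopen` follows HTTP redirects by default, so no extra handling of the redirection is needed here.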
Batch retrieval of entries
Entries can be retrieved in batch by querying our batch retrieval service with a list of UniProt identifiers. Here is a Perl example.
Retrieving entries via queries
You can use any query to define the set of entries that you are interested in. It is best to start with an interactive text search to find the base URL for your set, e.g. all reviewed human entries:
http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606
Click the orange Download button on the search results page to see the available download formats for your query. Hover your mouse over Download for the format of your choice to see the additional parameters (described in the table below) that you need to append to your base URL to retrieve the entries in this format.
Tips: Explore the various download formats of each data set. Get familiar with the query builder of the Search tab by clicking Advanced Search ». Click Customize on the search results page to select the columns for retrieving result tables in tab-delimited or Excel format.
The URL for a query result consists of a data set name
(e.g. uniprot, uniref, uniparc,
taxonomy, ...) and the actual query.
The following query parameters are supported:
| Parameter | Values | Description |
|---|---|---|
| query | string | See query syntax and query fields for UniProtKB. An empty query string will retrieve all entries in a data set, except for controlled vocabularies, documents and help, where it opens a splash page. Tip: Click Advanced Search » in the Search tab. |
| format | html, tab, xls, fasta, gff, txt, xml, rdf, list or rss | Format in which to return results. The formats available for a data set are shown when you click Download above the list of results. |
| columns | comma-separated list of values, e.g. for UniProtKB: citation, clusters, comments, database, domains, domain, ec, id, entry name, existence, families, features, genes, go, go-id, interpro, interactor, keywords, keyword-id, last-modified, length, organism, organism-id, pathway, protein names, reviewed, score, sequence, 3d, subcellular locations, taxon, tools, version, virus hosts | Columns to select for retrieving results in tab or xls format. Tip: Some columns can be parameterized, e.g. database(PDB) (see the example at the end of this section). Click Customize on the search results page to select columns interactively. |
| include | yes or no | Include isoform sequences when the format parameter is set to fasta. Include description of referenced data when the format parameter is set to rdf. This parameter is ignored for all other values of the format parameter. |
| compress | yes or no | Return results gzipped. Note that if the client supports HTTP compression, results may be compressed transparently even if this parameter is not set to yes. |
| limit | integer | Maximum number of results to retrieve. |
| offset | integer | Offset of the first result, typically used together with the limit parameter. |
The following two URLs retrieve all human entries matching the term 'antigen'
in RDF/XML and tab-delimited format, respectively.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=rdf&compress=yes
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=tab&compress=yes&columns=id,reviewed,protein names
The next example retrieves all human entries with cross-references to PDB in tab-delimited format, showing only the UniProtKB and PDB identifiers.
http://www.uniprot.org/uniprot/?query=organism:9606+AND+database:pdb&format=tab&compress=yes&columns=id,database(PDB)
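Query URLs like the ones above can also be built in a program from the parameters in the table. The Python sketch below is one way to do this; the function name `query_url` and its defaults are choices made for this example. Spaces are encoded as `+`, and the characters `:`, `,`, `(` and `)` are left unescaped so that the result looks like the URLs shown in this section.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.uniprot.org"


def query_url(dataset, query, fmt="tab", columns=None, include=False,
              compress=False, limit=None, offset=None):
    """Assemble a query URL from the documented parameters."""
    params = {"query": query, "format": fmt}
    if include:
        params["include"] = "yes"
    if compress:
        params["compress"] = "yes"
    if columns:
        # The columns parameter is a single comma-separated value.
        params["columns"] = ",".join(columns)
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    # Keep ':' ',' '(' ')' readable; spaces become '+'.
    return f"{BASE_URL}/{dataset}/?" + urlencode(params, safe=":,()")
```

For example, `query_url("uniprot", "organism:9606 AND database:pdb", columns=["id", "database(PDB)"], compress=True)` yields the PDB cross-reference URL given above. Column names that contain a space, such as `protein names`, are encoded as `protein+names`.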
Retrieving a random entry
You can retrieve a random entry by appending &random=yes to any query, e.g. the following query returns a random reviewed human entry:
http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606&random=yes
Downloading data at every UniProt release
The HTTP header
Last-Modified: lets you avoid downloading data more than once per release,
provided you use a download tool that makes use of this information, e.g. the Unix commands
lwp-mirror or curl with the -z
option. Here are examples of how to do this in Perl:
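For readers working in Python rather than Perl, the same idea can be sketched with a conditional request. The example assumes that the server answers an `If-Modified-Since` request header with status 304 (Not Modified) when nothing has changed, which is the standard HTTP mechanism that `curl -z` relies on; this behaviour is not spelled out in this document. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_request(url, last_modified):
    """Build a request that asks only for data newer than last_modified.

    last_modified is the value of the Last-Modified header saved from the
    previous download, e.g. "Tue, 13 Jul 2010 00:00:00 GMT".
    """
    return urllib.request.Request(url, headers={"If-Modified-Since": last_modified})


def download_if_newer(url, last_modified):
    """Return (body, new Last-Modified value), or None if nothing changed."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_modified)) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the stored copy is still current
            return None
        raise
```

A script run on a schedule would store the returned Last-Modified value next to the downloaded file and pass it back in on the next run.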
Release number and date
If you would like to record the UniProt release number and/or date of the data which you retrieve, you can extract this information from the HTTP header of the response (see this Perl example):
- X-UniProt-Release: contains the UniProt release number, e.g. 2010_08
- Last-Modified: contains the UniProt release date, e.g. Tue, 13 Jul 2010 00:00:00 GMT
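In Python, these two headers could be read from a response roughly as follows. This is a sketch; `release_info` is a name made up for this example, and it accepts any mapping of header names to values, such as the `headers` attribute of a `urllib` response.

```python
from email.utils import parsedate_to_datetime


def release_info(headers):
    """Extract the release number and release date from response headers.

    Returns a (release, date) tuple; either element is None if the
    corresponding header is missing.
    """
    release = headers.get("X-UniProt-Release")
    last_modified = headers.get("Last-Modified")
    # Last-Modified uses the standard HTTP date format.
    date = parsedate_to_datetime(last_modified) if last_modified else None
    return release, date
```

With a live response this would be called as `release_info(response.headers)`. Header lookup on a response object is case-insensitive, whereas a plain dictionary needs the exact spelling used here.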
Format conversion
This service allows you to convert data between different formats. Note that at the moment only single entries are supported. Here is a Java example (using the Jakarta Commons HttpClient library) to convert a UniProtKB entry from txt to rdf format.
Mapping database identifiers
To use our database identifier mapping service programmatically you need to know the abbreviations for the database names. Some databases map only one way.
| Name | Abbreviation | Direction |
|---|---|---|
Here are some examples for querying the database mapping service using:
- Perl
- Python
- Ruby
- Java
