How can I access resources on this web site programmatically?

Last modified May 27, 2015

This document describes how this web site can be accessed with programs. All resources (individual entries as well as sets of entries retrieved by queries) are accessible using simple URLs (REST) that can be bookmarked, linked and used in programs.

Please consider providing your email address in the User-Agent header that your programs send. This will allow us to contact you in case of problems.
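A minimal Python sketch of setting that header with the standard library's urllib.request. The client name "my-uniprot-client/0.1" and the contact address are placeholders (assumptions), not values the service requires; replace them with your own.

```python
import urllib.request

# Placeholder contact address: replace with your own so you can be reached.
CONTACT = "your.name@example.org"


def make_request(url: str, contact: str = CONTACT) -> urllib.request.Request:
    """Build a GET request whose User-Agent names the client and a contact address."""
    # "my-uniprot-client/0.1" is a hypothetical client name chosen for illustration.
    return urllib.request.Request(
        url, headers={"User-Agent": f"my-uniprot-client/0.1 ({contact})"}
    )
```

The request is sent with urllib.request.urlopen(make_request(url)); every other example on this page can reuse the same header.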

Contents

Retrieving individual entries
Batch retrieval of entries (cf. "Retrieve/ID mapping")
Retrieving entries via queries
Retrieving a random entry
Downloading data at every UniProt release
Release number and date
Format conversion
Mapping database identifiers (cf. "Retrieve/ID mapping")

Retrieving individual entries

The web address for an entry consists of a data set name (e.g. uniprot, uniref, uniparc, taxonomy, ...) and the entry’s unique identifier, e.g.:

 http://www.uniprot.org/uniprot/P12345

By default, a web page is returned. Depending on the data set, other formats may also be available (click on “Formats” on the entry’s web page). Here are some examples:

http://www.uniprot.org/uniprot/P12345.txt
http://www.uniprot.org/uniprot/P12345.xml
http://www.uniprot.org/uniprot/P12345.rdf
http://www.uniprot.org/uniprot/P12345.fasta
http://www.uniprot.org/uniprot/P12345.gff

http://www.uniprot.org/uniref/UniRef90_P04259.xml
http://www.uniprot.org/uniref/UniRef90_P04259.rdf
http://www.uniprot.org/uniref/UniRef90_P04259.fasta
http://www.uniprot.org/uniref/UniRef90_P04259.tab

http://www.uniprot.org/uniparc/UPI000000001F.xml
http://www.uniprot.org/uniparc/UPI000000001F.rdf
http://www.uniprot.org/uniparc/UPI000000001F.fasta
http://www.uniprot.org/uniparc/UPI000000001F.tab

Note that UniRef identifiers cannot be guaranteed to be stable, since the sequence clusters are recomputed at every release, and the representative protein may change. See also: How to link to UniProt entries.

For the RDF/XML format there is an option to include data from referenced data sets directly in the returned data:

 http://www.uniprot.org/uniprot/P12345.rdf?include=yes 
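The entry addresses above follow one pattern: data set name, identifier, optional format extension, optional query parameters. A small Python sketch that builds them, assuming only the URL pattern shown in this section; entry_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "http://www.uniprot.org"


def entry_url(dataset: str, identifier: str, fmt: str = None, **params: str) -> str:
    """Build the address of a single entry, e.g. uniprot/P12345.fasta."""
    url = f"{BASE}/{dataset}/{identifier}"
    if fmt:
        # The format is given as a file-style extension; no extension means the HTML page.
        url += f".{fmt}"
    if params:
        # Extra options such as include=yes go into the query string.
        url += "?" + urlencode(params)
    return url
```

For example, entry_url("uniprot", "P12345", "rdf", include="yes") gives the RDF/XML address with referenced data included.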

The following status codes may be returned:

Code Description
200 The request was processed successfully.
400 Bad request. There is a problem with your input.
404 Not found. The resource you requested doesn’t exist.
410 Gone. The resource you requested was removed.
500 Internal server error. Most likely a temporary problem, but if the problem persists please contact us.
503 Service not available. The server is being updated; try again later.
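The table suggests a simple client policy: 500 and 503 are usually temporary and worth retrying after a pause, while 400, 404 and 410 will not go away by themselves. A minimal Python sketch of that policy; the number of attempts and the delay are arbitrary assumptions, and the opener parameter exists only so the function can be exercised without network access.

```python
import time
import urllib.error
import urllib.request

# Status codes the table above describes as (usually) temporary.
RETRYABLE = {500, 503}


def should_retry(status: int) -> bool:
    """Return True if a request that failed with this HTTP status is worth repeating."""
    return status in RETRYABLE


def fetch(url, attempts=3, delay=5.0, opener=urllib.request.urlopen):
    """Return the response body for url, retrying on 500/503 with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # 400, 404 and 410 will not fix themselves: give up at once.
            # Also give up once the last attempt has failed.
            if not should_retry(err.code) or attempt == attempts:
                raise
            time.sleep(delay * attempt)
    raise ValueError("attempts must be at least 1")
```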

Resolving RDF identifiers

A request for an address such as

 http://purl.uniprot.org/uniprot/P12345

will be resolved, where possible, by redirection to the corresponding resource (see previous section). For UniProt resources, entries are returned in RDF/XML format if the HTTP 'Accept' request header is set to 'application/rdf+xml'.
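A minimal Python sketch of asking a purl.uniprot.org address for RDF/XML by setting the Accept header. rdf_request is a hypothetical helper name; the address and header value are the ones given above.

```python
import urllib.request


def rdf_request(accession: str) -> urllib.request.Request:
    """Request a UniProtKB entry via its purl.uniprot.org address, asking for RDF/XML."""
    return urllib.request.Request(
        f"http://purl.uniprot.org/uniprot/{accession}",
        headers={"Accept": "application/rdf+xml"},
    )
```

urllib.request.urlopen(rdf_request("P12345")) follows the redirection automatically; as far as we know, urllib's redirect handler copies non-content headers such as Accept to the redirected request, so no extra handling should be needed.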

Batch retrieval of entries

Entries can be retrieved in batch by querying our Retrieve/ID mapping service with a list of UniProt identifiers. Here is a Perl example.
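Long identifier lists are easier to submit in several smaller requests. A minimal Python sketch that cleans a list of identifiers (dropping blanks and duplicates) and groups them into whitespace-separated batches. The batch size is an arbitrary assumption, and the endpoint and parameters for submitting each batch are not specified here; see the Perl example and the mapping section.

```python
from typing import Iterable, Iterator


def batches(ids: Iterable[str], size: int) -> Iterator[str]:
    """Yield whitespace-joined groups of at most `size` unique, non-blank identifiers."""
    seen = set()
    unique = []
    for raw in ids:
        acc = raw.strip()
        # Skip empty lines and identifiers we have already queued, keeping input order.
        if acc and acc not in seen:
            seen.add(acc)
            unique.append(acc)
    for start in range(0, len(unique), size):
        yield " ".join(unique[start:start + size])
```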

Retrieving entries via queries

You can use any query to define the set of entries that you are interested in. It is best to start with an interactive text search to find the base URL for your set, e.g. all reviewed human entries:

http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606

or all reviewed entries that were created in the current UniProtKB/Swiss-Prot release:

http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+created:[current TO *]

There are several ways of obtaining the URL corresponding to your query for programmatic use:
  • The URL for the html format with the default column setup can always be found in the browser’s location bar.
  • For alternative formats (e.g. fasta, tab-separated), you can click on Download to explore the various download formats available for your query. You can request a preview by selecting a format and clicking on Preview first 10. In that case, the browser’s location bar will contain the URL corresponding to your query and format selections, provided that it is not longer than 1000 characters. Before using the URL programmatically, remove the "&limit=10" part.
  • The Share button gives access to the base URL for your query with all the columns in your web view (see also Customise and share your search results). The URL is shown irrespective of its length, even if it exceeds the maximum length of GET requests (which depends on the client and server software). By default, this URL returns the HTML view. To choose a download format, append the format parameter to the URL (&format=...). The table below describes the parameters that you can append to your base URL. For example, to download the UniProtKB results for ‘insulin’ with the default columns in tab-separated format:
http://www.uniprot.org/uniprot/?query=insulin&sort=score&columns=id,entry name,reviewed,protein names,genes,organism,length&format=tab
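URLs copied from the browser, like the one above, contain spaces and other characters that a program should encode. A minimal Python sketch that builds a query URL with urllib.parse.urlencode, leaving ":", "," and "()" readable and encoding spaces as "+"; query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "http://www.uniprot.org"


def query_url(dataset, query, fmt=None, columns=None, **extra):
    """Build a query URL for a data set, e.g. uniprot/?query=...&format=tab."""
    params = {"query": query}
    if fmt:
        params["format"] = fmt
    if columns:
        # Column names such as "entry name" contain spaces; urlencode turns them into "+".
        params["columns"] = ",".join(columns)
    params.update({key: str(value) for key, value in extra.items()})
    # Keep ":", "," and "()" unescaped so the URL stays readable (e.g. database(PDB)).
    return f"{BASE}/{dataset}/?" + urlencode(params, safe=":,()")
```

For example, query_url("uniprot", "insulin", fmt="tab", columns=["id", "entry name", "reviewed"], sort="score") returns http://www.uniprot.org/uniprot/?query=insulin&format=tab&columns=id,entry+name,reviewed&sort=score.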

Tips:

The URL for a query result consists of a data set name (e.g. uniprot, uniref, uniparc, taxonomy, ...) and the actual query. The following query parameters are supported:

query
  Values: string
  Description: See query syntax and query fields for UniProtKB. An empty query string will retrieve all entries in a data set.
  Tip: Click Advanced in the search bar.

format
  Values: html | tab | xls | fasta | gff | txt | xml | rdf | list | rss
  Description: Format in which to return results:
    • tab returns data for the selected columns in tab-separated format.
    • xls returns data for the selected columns for import into Excel.
    • fasta returns sequence data only, where applicable.
    • gff returns sequence annotation, where applicable.
    • txt, xml and rdf return full entries.
    • list returns a list of identifiers.
    • rss returns an OpenSearch RSS feed.
  Tip: Click Download above the list of results.

columns
  Values: comma-separated list of values, e.g. for UniProtKB: citation | clusters | comments | domains | domain | ec | id | entry name | existence | families | features | genes | go | go-id | interactor | keywords | last-modified | length | organism | organism-id | pathway | protein names | reviewed | sequence | 3d | version | virus hosts
  Description: Columns to select for retrieving results in tab or xls format.
  Tip: Some columns can be parameterized, e.g. database(PDB) (see the example at the end of this section). Click Columns on the search results page (Full list of UniProtKB column names).

include
  Values: yes | no
  Description: Include isoform sequences when the format parameter is set to fasta. Include description of referenced data when the format parameter is set to rdf. This parameter is ignored for all other values of the format parameter.

compress
  Values: yes | no
  Description: Return results gzipped. Note that if the client supports HTTP compression, results may be compressed transparently even if this parameter is not set to yes.

limit
  Values: integer
  Description: Maximum number of results to retrieve.

offset
  Values: integer
  Description: Offset of the first result, typically used together with the limit parameter.
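The limit and offset parameters make it possible to page through a large result set. A minimal Python sketch, assuming format=list (one identifier per line, no header line) and a hypothetical fetch callable that returns the lines of one page. It keeps requesting pages until one comes back shorter than the page size.

```python
from typing import Callable, Iterator, List


def fetch_all(base_url: str, page_size: int, fetch: Callable[[str], List[str]]) -> Iterator[str]:
    """Yield every line of a paged result, requesting page_size results at a time."""
    offset = 0
    while True:
        # base_url is assumed to already contain "?query=...", so more parameters start with "&".
        lines = fetch(f"{base_url}&limit={page_size}&offset={offset}")
        yield from lines
        if len(lines) < page_size:  # a short (or empty) page means the end was reached
            return
        offset += page_size
```

In real use, fetch could wrap urllib.request.urlopen and return response.read().decode("utf-8").splitlines(). With format=tab each page would presumably start with its own header line, which the caller would then have to skip.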

The following example retrieves all human entries matching the term ‘antigen’ in RDF/XML and tab-separated format, respectively.

http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=rdf&compress=yes
http://www.uniprot.org/uniprot/?query=organism:9606+AND+antigen&format=tab&compress=yes&columns=id,reviewed,protein names

The next example retrieves all human entries with cross-references to PDB in tab-separated format, showing only the UniProtKB and PDB identifiers.

http://www.uniprot.org/uniprot/?query=organism:9606+AND+database:pdb&format=tab&compress=yes&columns=id,database(PDB)
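With compress=yes the response body is gzip data. A minimal Python sketch that decompresses a tab-separated result like the one requested above and turns it into a dictionary from UniProtKB accession to PDB identifiers. It assumes that the first line is a column header and that the database(PDB) column holds semicolon-separated identifiers; check a real response before relying on either assumption.

```python
import gzip


def parse_pdb_mapping(payload: bytes) -> dict:
    """Map each UniProtKB accession to its list of PDB identifiers."""
    text = gzip.decompress(payload).decode("utf-8")
    mapping = {}
    for line in text.splitlines()[1:]:  # skip the assumed column header line
        if not line.strip():
            continue
        accession, _, pdb_field = line.partition("\t")
        # Assumed: identifiers separated by ";" (possibly with a trailing ";").
        mapping[accession] = [p.strip() for p in pdb_field.split(";") if p.strip()]
    return mapping
```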

Retrieving a random entry

You can retrieve a random entry by appending &random=yes to any query, e.g. the following query returns a random reviewed human entry:

http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606&random=yes
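Since random=yes can be appended to any query, a program may want to add it (or change a parameter such as limit) on a URL it already has. A small Python sketch using only urllib.parse; with_params is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url: str, **extra) -> str:
    """Return url with the given parameters added, replacing any existing values."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop old values for the keys we are setting, then append the new ones.
    params = [(key, value) for key, value in params if key not in extra]
    params.extend((key, str(value)) for key, value in extra.items())
    return urlunsplit(parts._replace(query=urlencode(params, safe=":")))
```

For example, with_params("http://www.uniprot.org/uniprot/?query=reviewed:yes+AND+organism:9606", random="yes") reproduces the URL above.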

Downloading data at every UniProt release

If you use a download tool that makes use of the Last-Modified HTTP header, such as the Unix command lwp-mirror or curl with the -z option, you will avoid downloading the same data more than once per release. Here are examples of how to do this in Perl:

  • Download all UniProt sequences for a given organism in FASTA format
  • Download the UniProt complete or reference proteomes for all organisms below a given taxonomic node in FASTA format
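The same idea can be sketched in Python with the standard library: send If-Modified-Since based on the local file's modification time, treat 304 Not Modified as "nothing new", and stamp a fresh download with the server's Last-Modified date (similar to what curl -z combined with -R does). This is a minimal sketch, not a port of the Perl examples; conditional_request and mirror are hypothetical helper names.

```python
import email.utils
import os
import urllib.error
import urllib.request


def conditional_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request that asks for the data only if it changed since local_path was written."""
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        req.add_header("If-Modified-Since", email.utils.formatdate(mtime, usegmt=True))
    return req


def mirror(url: str, local_path: str) -> bool:
    """Download url to local_path unless the server answers 304 Not Modified.

    Returns True if a new copy was written, False if the local copy is current.
    """
    try:
        with urllib.request.urlopen(conditional_request(url, local_path)) as response:
            data = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return False
        raise
    with open(local_path, "wb") as fh:
        fh.write(data)
    if last_modified:
        # Use the server's release date as the file time, so the next request compares against it.
        stamp = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(local_path, (stamp, stamp))
    return True
```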

Release number and date

If you would like to record the UniProt release number and/or date of the data which you retrieve, you can extract this information from the HTTP header of the response (see this Perl example):
  • X-UniProt-Release: contains the UniProt release number, e.g. 2010_08
  • Last-Modified: contains the UniProt release date, e.g. Tue, 13 Jul 2010 00:00:00 GMT
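A minimal Python sketch that reads both headers from a response. It accepts any mapping with a .get() method, such as response.headers from urllib.request.urlopen; release_info is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime


def release_info(headers):
    """Return (release number, release date) from UniProt response headers.

    Either value is None if the corresponding header is missing.
    """
    release = headers.get("X-UniProt-Release")
    last_modified = headers.get("Last-Modified")
    # Last-Modified uses the HTTP date format, e.g. "Tue, 13 Jul 2010 00:00:00 GMT".
    released = parsedate_to_datetime(last_modified).date() if last_modified else None
    return release, released
```

With the example values above, release_info returns the release "2010_08" and the date 2010-07-13.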

Format conversion

This service allows you to convert data between different formats. Note that at the moment only single entries are supported. Here is a Java example (using the Jakarta Commons HttpClient library) to convert a UniProtKB entry from txt to rdf format.

Mapping database identifiers

To use our database identifier mapping service programmatically you need to know the abbreviations for the database names. Some databases map only one way.

Name Abbreviation Direction

Here are some examples for querying the database mapping service using:

  • Perl
  • Python
  • Ruby
  • Java
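For orientation, a minimal Python sketch of a mapping request and of reading its tab-separated answer. The endpoint http://www.uniprot.org/mapping/, the parameter names from, to, format and query, the abbreviations ACC and P_REFSEQ_AC, and the "From\tTo" header line are all assumptions modelled on UniProt's legacy examples rather than details given on this page; check them against the language examples listed above and the abbreviation table before use.

```python
import urllib.parse
import urllib.request

# Assumed endpoint, modelled on UniProt's legacy mapping examples.
MAPPING_URL = "http://www.uniprot.org/mapping/"


def mapping_request(ids, from_db, to_db, contact):
    """Build a POST request mapping ids from one database abbreviation to another."""
    params = {
        "from": from_db,         # e.g. "ACC" (assumed abbreviation)
        "to": to_db,             # e.g. "P_REFSEQ_AC" (assumed abbreviation)
        "format": "tab",
        "query": " ".join(ids),  # identifiers separated by whitespace
    }
    data = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(
        MAPPING_URL, data=data, headers={"User-Agent": f"my-uniprot-client/0.1 ({contact})"}
    )


def parse_mapping(text):
    """Turn a two-column tab-separated result into {source id: [target ids]}."""
    result = {}
    for line in text.splitlines()[1:]:  # skip the assumed "From\tTo" header line
        if "\t" in line:
            source, target = line.split("\t", 1)
            # One source identifier may map to several targets.
            result.setdefault(source, []).append(target)
    return result
```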

Related terms: programmatic access, program, script, wget, curl, web services, API