Downloaded data seems incomplete or corrupted - how can I get help with download problems?

Last modified May 15, 2015

FTP downloads

Every folder on our FTP server contains a file called RELEASE.metalink that specifies the size and MD5 checksum of every file in that folder, e.g.
ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/RELEASE.metalink

Metalink is an extensible metadata file format that describes one or more computer files available for download. It facilitates file verification and recovery from data corruption and lists alternate download sources (mirror URIs).

Various command line download tools, e.g. cURL version 7.30 or higher and aria2, support metalink.

Example: The following command will download all files in the current_release/ folder and verify their MD5 checksums:

curl --metalink ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/RELEASE.metalink

The files will be downloaded from one of the alternative locations listed in the metalink file. If one FTP server goes down during a download, metalink-aware programs can automatically switch to another mirror location. Some programs can also download segments from several FTP locations at the same time, which can make downloads much faster.
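For download tools without metalink support, the same size and MD5 check can be done by hand after the download. The following is a minimal sketch, not an official UniProt script: the function names read_metalink and verify_file are hypothetical, and it assumes only that each file entry in RELEASE.metalink is a "file" element with a "name" attribute, a "size" element and a "hash" element of type "md5". XML namespaces are ignored so that the exact metalink version does not matter.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:...}file' becomes 'file'."""
    return tag.rsplit("}", 1)[-1]


def read_metalink(xml_text):
    """Return {file name: (size or None, md5 hex or None)} from metalink XML.

    Assumption: file entries look like <file name="..."> with <size> and
    <hash type="md5"> descendants; anything else is skipped.
    """
    entries = {}
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "file":
            continue
        size, md5 = None, None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "size" and child.text:
                size = int(child.text.strip())
            elif (name == "hash" and child.text
                  and child.get("type", "").lower() == "md5"):
                md5 = child.text.strip().lower()
        entries[elem.get("name")] = (size, md5)
    return entries


def verify_file(path, size, md5, block_size=1 << 20):
    """Check a local file against the expected size and MD5 checksum."""
    path = Path(path)
    if size is not None and path.stat().st_size != size:
        return False
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in blocks so that large files do not need to fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return md5 is None or digest.hexdigest() == md5
```

A file that fails this check is incomplete or corrupted and should be downloaded again, if possible from another of the mirror locations.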

Please note that UniProt can be downloaded from the consortium member FTP sites at three different geographical locations:

USA: ftp://ftp.uniprot.org/pub/databases/uniprot
UK: ftp://ftp.ebi.ac.uk/pub/databases/uniprot
Switzerland: ftp://ftp.expasy.org/databases/uniprot

HTTP downloads

Because long HTTP transfers are unreliable (HTTP streams tend to fail after a while due to packet loss), large downloads should be split into smaller chunks using the “offset” and “limit” parameters. These are described in our FAQ for programmatic access.

1) Start by retrieving the number of results for your query from the “X-Total-Results” response header, as shown in the example “Download all UniProt sequences for a given organism in FASTA format”.

2) If the number of results is greater than 50000, repeat your query once per chunk of 50000 results, each time appending the next of the following parameter pairs to the URL:

&offset=0&limit=50000
&offset=50000&limit=50000
&offset=100000&limit=50000 etc.

Also add compress=yes to each URL so that every chunk is returned gzip-compressed.

For example (using a limit of 50 instead of 50000 to keep the output manageable in the browser):
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20[9606]%22&fil=&offset=0&limit=50&compress=yes&format=fasta
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20[9606]%22&fil=&offset=50&limit=50&compress=yes&format=fasta
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20[9606]%22&fil=&offset=100&limit=50&compress=yes&format=fasta

etc.

3) Once all chunks are downloaded, use gzip -t on each one to check its integrity. Then uncompress the chunks and concatenate them, in offset order, into a single download file.
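Steps 2 and 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a supported client: chunk_urls and join_chunks are hypothetical helper names, the query URL is assumed to already contain the query and format parameters, and the total number of results is assumed to have been read from the “X-Total-Results” header as in step 1. Downloading the chunks themselves is left to the tool of your choice.

```python
import gzip
import shutil


def chunk_urls(query_url, total_results, chunk_size=50000):
    """Build one URL per chunk by appending offset, limit and compress=yes."""
    return [
        f"{query_url}&offset={offset}&limit={chunk_size}&compress=yes"
        for offset in range(0, total_results, chunk_size)
    ]


def join_chunks(chunk_paths, output_path):
    """Decompress gzip chunks in the given order into a single file.

    Reading a chunk to its end makes the gzip module verify the stored
    checksum, much like gzip -t, so a truncated or corrupted chunk raises
    an exception (OSError, EOFError or zlib.error) instead of being
    silently appended.
    """
    with open(output_path, "wb") as output:
        for chunk_path in chunk_paths:
            with gzip.open(chunk_path, "rb") as chunk:
                shutil.copyfileobj(chunk, output)
```

If join_chunks raises an exception, the output file is incomplete: download the failing chunk again and rerun the concatenation from the start.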