Downloaded data seems incomplete or corrupted - how can I get help with download problems?
Last modified May 15, 2015
Every folder on our FTP server contains a file called RELEASE.metalink that specifies the size and MD5 checksum of every file in that folder.
Metalink is an extensible metadata file format that describes one or more computer files available for download. It facilitates file verification and recovery from data corruption and lists alternate download sources (mirror URIs).
Example: The following command will download all files in the current_release/ folder and verify their MD5 checksums:
curl --metalink ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/RELEASE.metalink
The files will be downloaded from one of the alternative (mirror) locations listed in the metalink file. If one FTP server goes down during a download, programs that support metalink can automatically switch to another mirror. Some programs can also download segments of a file from several FTP locations at the same time, which can make downloads much faster.
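If you fetched the files some other way (for example with a plain FTP client, or with a recent curl release, since curl dropped --metalink support in version 7.78.0), you can still check them against RELEASE.metalink yourself. The following is a minimal sketch, not an official UniProt tool. It assumes that each <file name="..."> element carries a <size> child and a <hash type="md5"> descendant, and it ignores XML namespaces so that both Metalink 3.0 and 4.0 style files are accepted. The function names read_metalink, md5sum and verify are hypothetical.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace: '{http://www.metalinker.org/}file' -> 'file'."""
    return tag.rsplit("}", 1)[-1]


def read_metalink(path):
    """Return {file name: (size or None, md5 or None)} from a metalink file.

    Assumption: sizes live in <size> and MD5 sums in <hash type="md5">
    somewhere below each <file> element; namespaces are ignored.
    """
    entries = {}
    for elem in ET.parse(path).getroot().iter():
        if local_name(elem.tag) != "file":
            continue
        size = md5 = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "size" and child.text:
                size = int(child.text.strip())
            elif (name == "hash" and child.text
                  and child.get("type", "").lower() == "md5"):
                md5 = child.text.strip().lower()
        entries[elem.get("name")] = (size, md5)
    return entries


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in 1 MiB blocks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(metalink_path, directory):
    """Return (file name, problem) tuples; an empty list means every file passed."""
    problems = []
    for name, (size, md5) in read_metalink(metalink_path).items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            problems.append((name, "missing"))
        elif size is not None and os.path.getsize(path) != size:
            problems.append((name, "size mismatch"))
        elif md5 is not None and md5sum(path) != md5:
            problems.append((name, "md5 mismatch"))
    return problems
```

For example, verify("RELEASE.metalink", ".") run inside a downloaded current_release/ folder lists every file that is missing, truncated or corrupted. Files listed in the metalink that you chose not to download are reported as "missing", so filter the result to the files you actually need. The size check runs first because it is cheap and already catches most incomplete downloads.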
Please note that UniProt can be downloaded from the consortium member FTP sites at three different geographical locations:
Because HTTP transport is unreliable for long transfers (HTTP streams tend to fail after a while due to packet loss), large downloads should be split into smaller chunks using the “offset” and “limit” query parameters. These are described in our FAQ for programmatic access.
1) Start by retrieving the number of results for your query from the “X-Total-Results” response header, as in the example “Download all UniProt sequences for a given organism in FASTA format”.
2) If the number of results is greater than 50000, repeat your query once per chunk, appending the following to the URL:
&offset=0&limit=50000
&offset=50000&limit=50000
&offset=100000&limit=50000
etc.
For example, with a chunk size of 50 results:
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20%22&fil=&offset=0&limit=50&compress=yes&format=fasta
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20%22&fil=&offset=50&limit=50&compress=yes&format=fasta
http://www.uniprot.org/uniprot/?query=organism:%22Homo%20sapiens%20(Human)%20%22&fil=&offset=100&limit=50&compress=yes&format=fasta
3) Once you have downloaded all chunks, use gzip -t to check the integrity of each file. Then uncompress the chunks and concatenate them, in order, into a single download file.
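The three steps above can be scripted. The sketch below is illustrative only and uses just the Python standard library. It assumes, as in step 1, that the server returns an “X-Total-Results” header, and that each chunk is gzip-compressed because the query URL contains compress=yes. The helper names (total_results, chunk_urls, download, gzip_ok, concatenate) are hypothetical, and gzip_ok is a rough stand-in for gzip -t, not a reimplementation of it.

```python
import gzip
import shutil
import urllib.request
import zlib


def total_results(query_url):
    """Step 1: read the result count from the X-Total-Results response header."""
    # Assumption: the server sends this header, as described in step 1.
    with urllib.request.urlopen(query_url) as resp:
        return int(resp.headers["X-Total-Results"])


def chunk_urls(base_url, total, limit=50000):
    """Step 2: yield one URL per chunk by appending &offset=...&limit=..."""
    for offset in range(0, total, limit):
        yield f"{base_url}&offset={offset}&limit={limit}"


def download(url, out_path):
    """Stream one chunk to disk without holding it all in memory."""
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)


def gzip_ok(path):
    """Step 3 check, similar in spirit to `gzip -t`: decompress fully, report success."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC -> OSError, truncated stream -> EOFError,
        # corrupted deflate data -> zlib.error.
        return False


def concatenate(chunk_paths, out_path):
    """Step 3: uncompress each chunk in order and append it to one output file."""
    with open(out_path, "wb") as out:
        for path in chunk_paths:
            with gzip.open(path, "rb") as fh:
                shutil.copyfileobj(fh, out)
```

A typical run would call total_results on the query URL, download each URL from chunk_urls to files such as chunk0.fasta.gz, chunk1.fasta.gz and so on, re-download any chunk for which gzip_ok returns False, and finally pass the chunk paths, in offset order, to concatenate. Because a failed chunk can be fetched again on its own, a dropped connection costs at most one chunk instead of the whole download.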