How can I download the sequences corresponding to a specified domain or region from a list of UniProt entries?

Last modified March 21, 2017

Run your query, e.g. to retrieve the UniProtKB entries annotated as containing disintegrin domains (alternatively, start from a list of identifiers).

Then click on “Download” and choose to download the results in GFF format.
You can modify the GFF file as follows:

Keep only the lines containing your domain/region (e.g. “Disintegrin”, “Cytoplasmic” or “Transit”) and ignore all other lines (e.g. using grep). These lines include the start and end positions of the domains/regions.

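Transform the relevant lines (e.g. using a scripting language or a word processor) so that each one keeps only the accession followed by the start and end positions in square brackets, i.e. from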

Q9R158 UniProtKB Domain 392 478 . . . Note=Disintegrin
Q10741 UniProtKB Domain 457 551 . . . Note=Disintegrin
O14672 UniProtKB Domain 457 551 . . . Note=Disintegrin
to
Q9R158[392-478]
Q10741[457-551]
O14672[457-551]

Discard all other lines, so that the file contains only these range identifiers, one per line.
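
The filtering and transformation steps above can be sketched in a few lines of Python. This is only an illustration, not an official UniProt tool: the function name gff_to_ranges is hypothetical, and the sketch assumes the downloaded GFF file is tab-separated with the usual nine columns (accession, source, feature type, start, end, score, strand, phase, attributes). The keyword is matched case-insensitively against both the feature type and the attributes column, which is a design choice of this sketch.

```python
def gff_to_ranges(gff_lines, keyword):
    """Return 'ACCESSION[start-end]' strings for matching GFF feature lines."""
    keyword = keyword.lower()
    ranges = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        accession, _source, feature_type, start, end = fields[:5]
        attributes = fields[8]
        # Match the keyword against the feature type (e.g. "Transit peptide")
        # or the attributes column (e.g. "Note=Disintegrin").
        if keyword in feature_type.lower() or keyword in attributes.lower():
            ranges.append(f"{accession}[{start}-{end}]")
    return ranges


# The three feature lines from the example above, written with tabs,
# plus one made-up "Chain" line (an assumption for illustration only)
# that should be filtered out.
example = [
    "Q9R158\tUniProtKB\tDomain\t392\t478\t.\t.\t.\tNote=Disintegrin",
    "Q10741\tUniProtKB\tDomain\t457\t551\t.\t.\t.\tNote=Disintegrin",
    "O14672\tUniProtKB\tDomain\t457\t551\t.\t.\t.\tNote=Disintegrin",
    "Q9R158\tUniProtKB\tChain\t1\t500\t.\t.\t.\tNote=Hypothetical example line",
]

for item in gff_to_ranges(example, "Disintegrin"):
    print(item)
```

To process a real download, the same function could be given an open file object instead of the list, and the resulting identifiers written one per line to a text file for upload.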

You can then upload the modified file to the “Retrieve/ID mapping” service to retrieve the corresponding entries. To download the subsequences, select the format “FASTA (source list)” from the download menu.

If you only have a short list of entries, you can also select the domains manually from the entry views by clicking on “Add to basket” at the right-hand side of the feature descriptions in the “Family and domains” section of these entries. When you have finished selecting your domains, open the basket and click on “Download”.