How can I download the sequences corresponding to a specified domain or region from a list of UniProt entries?
Last modified February 1, 2012
Then click on "Download" (the orange button at the top right) and choose to download the results in GFF format. You can modify the GFF file as follows:
Keep only the lines containing your domain/region (e.g. "Disintegrin", "Cytoplasmic" or "Transit") and ignore all other lines (e.g. using grep). These lines include the start and end positions of the domains/regions.
Transform the relevant lines (e.g. using a scripting language, or a word processor) from
Q9R158 UniProtKB Domain 392 478 . . . Note=Disintegrin
Q10741 UniProtKB Domain 457 551 . . . Note=Disintegrin
O14672 UniProtKB Domain 457 551 . . . Note=Disintegrin
to
Q9R158[392-478]
Q10741[457-551]
O14672[457-551]
and ignore all other lines.
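The filtering and transformation steps above could be scripted along the following lines. This is only a minimal sketch in Python: the function name gff_to_ranges is hypothetical, and it assumes the column layout shown in the example (accession first, then source, feature type, start and end). Like grep, it matches the keyword anywhere on the line, so a keyword that also occurs in unrelated annotations would need a stricter test.

```python
import re

# Accession, source, feature type (may contain spaces), start, end.
# Assumption: the GFF lines follow the layout shown in the example above.
GFF_LINE = re.compile(r"^(\S+)\s+\S+\s+.+?\s+(\d+)\s+(\d+)\s")

def gff_to_ranges(gff_lines, keyword):
    """Return 'ACCESSION[start-end]' strings for GFF lines containing keyword."""
    ranges = []
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and GFF header/comment lines.
        if not line or line.startswith("#"):
            continue
        # Same effect as grep: keep only lines mentioning the domain/region.
        if keyword not in line:
            continue
        match = GFF_LINE.match(line)
        if match:
            accession, start, end = match.groups()
            ranges.append(f"{accession}[{start}-{end}]")
    return ranges

# Demonstration on the three lines from the example above.
sample = [
    "Q9R158\tUniProtKB\tDomain\t392\t478\t.\t.\t.\tNote=Disintegrin",
    "Q10741\tUniProtKB\tDomain\t457\t551\t.\t.\t.\tNote=Disintegrin",
    "O14672\tUniProtKB\tDomain\t457\t551\t.\t.\t.\tNote=Disintegrin",
]
print("\n".join(gff_to_ranges(sample, "Disintegrin")))
```

To process a downloaded file, the lines of that file can be passed in place of the sample list and the returned strings written to a new text file, one per line.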
You will then be able to use the "Retrieve" tab to upload the file you obtained from modifying the GFF, and retrieve the corresponding sequences in FASTA format.
If you only have a short list of entries, you can also select the domains manually by ticking the checkboxes in the section "Sequence annotation (Features)" of these entries. Once one or more sequences have been marked, the "Retrieve" button in the green bar becomes available. After clicking on this button to submit your data, you are forwarded to a download page that lists the available formats, including FASTA.