Retrieve / ID mapping
Last modified August 25, 2015
Select the Retrieve/ID mapping tab of the toolbar and enter or upload a list of identifiers (or gene names) to do one of the following:
- Retrieve the corresponding UniProt entries to download them or work with them on this website.
- Convert identifiers of another type to UniProt identifiers, or vice versa, and download the resulting identifier lists.
How to use this tool
- Enter identifiers or upload them from a file, separated by a space or a new line, into the form field, for example: P31946 P62258 ALBU_HUMAN
- If you need to convert to another identifier type (as performed previously by the “ID mapping” service), select the source and target type from the “From/To” dropdown menus under “Options”. Otherwise, to retrieve or download a list of UniProtKB entries, keep the default selection of these menus (from UniProtKB AC/ID to UniProtKB).
- Click the Go button.
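If you prepare the identifier list in a script before pasting or uploading it, the input format described above (identifiers separated by a space or a new line) can be sketched as follows. The function name `parse_identifiers` is hypothetical, and dropping duplicates is a convenience assumed here, not something the form requires.

```python
def parse_identifiers(text):
    """Split raw input on spaces and new lines, dropping duplicates in order.

    Hypothetical helper: de-duplication is an assumption of this sketch.
    """
    seen = set()
    identifiers = []
    for token in text.split():  # str.split() with no argument splits on any whitespace
        if token not in seen:
            seen.add(token)
            identifiers.append(token)
    return identifiers


raw = "P31946 P62258\nALBU_HUMAN\nP31946"
print(parse_identifiers(raw))  # ['P31946', 'P62258', 'ALBU_HUMAN']
```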
The following kinds of UniProt identifiers are supported:
|P00750-2|UniProtKB entry isoform sequence|
|P00750[39-81]|UniProtKB sequence range|
|A4_HUMAN|UniProtKB entry name|
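To check a list before submitting it, the identifier kinds in the table can be told apart with regular expressions. This is a minimal sketch, not the validation the website itself performs: the accession pattern follows the format published in the UniProt accession number documentation, the entry-name pattern is a loose approximation assumed for illustration, and `classify` is a hypothetical helper name.

```python
import re

# Accession format as published by UniProt (6 or 10 characters).
ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Order matters only for readability; fullmatch keeps the patterns exclusive.
PATTERNS = [
    ("isoform", re.compile(ACCESSION + r"-\d+")),          # e.g. P00750-2
    ("range", re.compile(ACCESSION + r"\[\d+-\d+\]")),     # e.g. P00750[39-81]
    # Assumed approximation of the entry-name format (e.g. A4_HUMAN).
    ("entry name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
    ("accession", re.compile(ACCESSION)),                  # e.g. P00750
]


def classify(identifier):
    """Return the kind of UniProt identifier, or 'unknown' if nothing matches."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


for example in ["P00750-2", "P00750[39-81]", "A4_HUMAN", "P00750"]:
    print(example, "->", classify(example))
```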
When mapping from a source database external to UniProt, you can submit any identifier as used in the UniProtKB cross-references. If your job is not successful and you are not sure which source database to use, try a text search in UniProtKB with one of your identifiers, and look at an example entry. Check the cross-reference section of that entry to find out which database uses these identifiers.
Further queries involving your UniProtKB data sets
After you have submitted your data, you are forwarded to a query result page showing the correspondence of submitted identifiers (from external databases, or obsolete UniProtKB identifiers) with current UniProtKB accession numbers. As in any query result, you can use the basket, download and align services, reconfigure the table layout (“Columns”) or add constraints to your query.
Jobs have unique identifiers, which (depending on the job type) can be used in queries (e.g. to get the intersection of two sequence similarity searches). Job identifiers and the related data are kept for 7 days, and are then deleted.
The list of identifiers that could not be mapped can be retrieved for further inspection or analysis.
When mapping popular sequence database identifiers such as RefSeq, gi numbers, EMBL, EMBLCDS to UniProtKB, unmapped identifiers can be further mapped to UniParc. This can be particularly useful for proteins from redundant proteomes.
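If you work with a downloaded mapping result rather than the website, the unmapped identifiers can also be recovered by comparing your submitted list with the result. The sketch below assumes the result is a two-column tab-separated table with one header row and the source identifier in the first column; that layout, the helper name `unmapped_identifiers`, and the placeholder identifiers `ID_A` to `ID_C` are assumptions for illustration.

```python
def unmapped_identifiers(submitted, mapping_table):
    """Return submitted identifiers that have no row in the mapping table.

    Assumes a tab-separated table whose first line is a header and whose
    first column holds the source identifier.
    """
    mapped = set()
    for line in mapping_table.strip().splitlines()[1:]:  # skip the header row
        if line.strip():
            mapped.add(line.split("\t")[0])
    return [identifier for identifier in submitted if identifier not in mapped]


# Placeholder data, not real mapping output.
table = "From\tTo\nID_A\tP31946\nID_C\tP62258\n"
print(unmapped_identifiers(["ID_A", "ID_B", "ID_C"], table))  # ['ID_B']
```

The resulting list could then be resubmitted with UniParc as the target, as described above.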
Code examples for programmatic access to the database identifier mapping service are available in our FAQ about programmatic access.
- Very large mapping requests (>100,000 identifiers) are likely to fail. If you prefer to run your mapping locally, you can also download the data underlying this service.
- For performance reasons, databases where the mapping relationship to UniProtKB identifiers is one-to-many, e.g. GO, InterPro or PubMed, are not supported. For limited lists of such identifiers, it is possible to query UniProtKB using the text search form with identifiers combined by “or”, e.g. “interpro IPR014000” OR “interpro IPR014002” OR “interpro IPR014003”. You can then use the Columns button to remove unwanted columns from the table view, or edit the query string (URL) to add “&columns=id,database(interpro)” to it. The same addition can be made to the URL for download of the tab-separated view, e.g. /uniprot/?query=%22interpro+IPR014000%22+or+%22interpro+IPR014002%22+or+%22interpro+IPR014003%22&format=tab&columns=id,database(interpro).
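Both notes above can be handled in a script. The following is a rough sketch under stated assumptions: `batches` splits a long list into smaller requests (the default size of 50,000 is an arbitrary value chosen to stay well below the 100,000 mentioned above), and `interpro_query_path` rebuilds the tab-separated query URL pattern quoted in the InterPro example. Both function names are hypothetical, and the sketch only constructs the path; it does not send a request or verify that the website accepts it.

```python
from urllib.parse import urlencode


def batches(identifiers, size=50000):
    """Yield consecutive slices of at most `size` identifiers (size is an assumption)."""
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


def interpro_query_path(interpro_ids, columns="id,database(interpro)"):
    """Build the query path for InterPro identifiers combined by 'or'."""
    query = " or ".join('"interpro {}"'.format(i) for i in interpro_ids)
    # safe=",()" keeps the columns value readable, as in the example URL.
    params = urlencode(
        {"query": query, "format": "tab", "columns": columns}, safe=",()"
    )
    return "/uniprot/?" + params


print(list(batches(["a", "b", "c", "d", "e"], size=2)))
print(interpro_query_path(["IPR014000", "IPR014002", "IPR014003"]))
```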
See also: Related questions from our FAQ
Related terms: batch, bulk