UniProt, GeneID, RefSeq mapping: how does it work?
Last modified August 6, 2010
How does UniProt do GeneID and RefSeq mappings?
Following a protocol that we have formalized with the NCBI, we create a RefSeq protein-centric mapping. If a UniProtKB protein (or one of its isoforms) is 100% identical over its full length (end-to-end) to a RefSeq protein, and is from the same organism and/or shares EMBL/DDBJ/GenBank accession numbers with it, then that RefSeq accession is mapped to the UniProtKB protein. The entry consequently also receives the corresponding GeneID cross-reference.
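The rule above can be sketched in a few lines of Python. This is only an illustration, not the actual UniProt pipeline: the `Protein` class, the `is_mapped` function and the accessions used below are hypothetical, and the sketch assumes one reading of "and/or", namely that full-length identity is always required and that either the same organism or a shared nucleotide accession is then sufficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical minimal record for a UniProtKB or RefSeq protein."""
    accession: str
    sequences: tuple          # canonical sequence plus any isoform sequences
    taxon_id: int             # organism, as an NCBI taxonomy identifier
    nucleotide_accessions: frozenset = frozenset()  # EMBL/DDBJ/GenBank


def is_mapped(uniprot: Protein, refseq: Protein) -> bool:
    """Return True if the RefSeq protein would be mapped to the UniProtKB one.

    Assumption: identity is mandatory; same organism OR a shared
    nucleotide accession satisfies the second condition.
    """
    # End-to-end identity: plain string equality, no partial matches.
    identical = any(u == r for u in uniprot.sequences for r in refseq.sequences)
    same_organism = uniprot.taxon_id == refseq.taxon_id
    shared_accession = bool(
        uniprot.nucleotide_accessions & refseq.nucleotide_accessions
    )
    return identical and (same_organism or shared_accession)


# Made-up records for illustration only.
up = Protein("UP_EXAMPLE", ("MKTAYIAK", "MKTAYK"), 9606, frozenset({"X00001"}))
rs_same = Protein("RS_EXAMPLE_1", ("MKTAYK",), 9606)
rs_diff = Protein("RS_EXAMPLE_2", ("MKTAYR",), 9606, frozenset({"X00001"}))
```

Here `is_mapped(up, rs_same)` is true because one isoform matches end-to-end and the organism is the same, while `is_mapped(up, rs_diff)` is false because a single differing residue breaks the 100% identity requirement, even though a nucleotide accession is shared.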
Why are GeneID cross-references absent from some human entries?
A GeneID cross-reference is only added as a consequence of a RefSeq mapping. If a UniProtKB protein does not map to a RefSeq protein, its entry will not have a GeneID cross-reference.
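The dependency of GeneID cross-references on RefSeq mappings can be illustrated with a short Python sketch. The function name `geneid_xrefs`, the lookup table and all identifiers below are hypothetical placeholders, not real accessions or an actual UniProt interface.

```python
def geneid_xrefs(mapped_refseq, refseq_to_geneid):
    """Derive GeneID cross-references from an entry's mapped RefSeq accessions.

    mapped_refseq: RefSeq accessions mapped to one UniProtKB entry.
    refseq_to_geneid: lookup from RefSeq accession to GeneID.
    """
    # Several RefSeq proteins may belong to the same gene, so collect
    # the GeneIDs in a set to avoid duplicates before sorting.
    return sorted(
        {refseq_to_geneid[acc] for acc in mapped_refseq if acc in refseq_to_geneid}
    )


# Made-up lookup table for illustration only.
table = {"RS_EXAMPLE_1": "G100", "RS_EXAMPLE_2": "G100", "RS_EXAMPLE_3": "G200"}
```

With this table, an entry mapped to `RS_EXAMPLE_1` and `RS_EXAMPLE_2` gets the single cross-reference `G100`, whereas an entry with no mapped RefSeq accession gets an empty list, which corresponds to the missing GeneID cross-reference described above.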
Why do some GeneID entries link to UniProtKB entries, but those UniProtKB entries do not have the GeneID cross-reference?
Apart from the UniProtKB-RefSeq mappings that the UniProt Consortium provides to NCBI, which are reported in the 'NCBI Reference Sequences (RefSeq)' section of RefSeq entry reports, NCBI also computes additional 'Related sequences'. These can include UniProtKB proteins and are displayed in a separate section (see NCBI's definition of 'related sequences'). A UniProtKB entry that is linked only through these related sequences was not mapped by the protocol described above, so it does not carry the GeneID cross-reference.
