UniProt release 6.7
Published December 20, 2005
'De-merge' of multi-species in UniProtKB/Swiss-Prot
UniProtKB/Swiss-Prot as a non-redundant protein database used to "merge" entries originating from different species, if there were 100% conserved. In merged entries, information about the source of each organism was noted in the OS (Organism Species) lines, e.g. actin, P03996 (ACTA_HUMAN):
OS Homo sapiens (Human), Mus musculus (Mouse), Rattus norvegicus (Rat), OS Bos taurus (Bovine), and Oryctolagus cuniculus (Rabbit).
However, the OC (Organism Classification) lines only contained the taxonomy of the first listed species, and the "species part" of the entry name was built on the first organism in the list ("_HUMAN").
As the type of information on proteins has greatly evolved, and more and more data have been documented that are species specific, Swiss-Prot had to adapt and change its merging policy. While it may seem to contradict the principle of non-redundancy on the sequence level to create two or more entries for an identical sequence, this does make sense from the annotation point of view. The new policy allows to clarify which information item has been proven for which organism. Even if a protein has the same sequence in two or more different organisms, there may be evidence for different post-translational modifications, sequence variants, alternative splicing, protein-protein interactions, tissue specificity, and implication in diseases. Moreover, since some organism-specific scientific communities use different gene name nomenclatures, it is important to reflect such species-specific nomenclature usage.
With this release, we have completed the de-merging of all the UniProtKB/Swiss-Prot entries (almost 6'000) that contained information relative to two or more distinct species.
The primary accession number of a formerly merged entry has been retained as a secondary accession number in all of the resulting de-merged entries. A new primary accession number has been attributed to all de-merged entries.
In the example above: ACTA_HUMAN (old primary AC: P03996, old secondary AC: P04108) has been de-merged into:
|entry name||new primary AC||secondary ACs|
|ACTA_BOVIN||P62739||P03996 P04108 Q862W5|
|ACTA_RAT||P62738||P03996 P04108 P70476|
Changes concerning keywords
- Matrix protein -> Viral matrix protein
- Seed embryo
Changes concerning the controlled vocabulary for PTMs
New terms for the feature key 'CROSSLNK':
- Cyclopeptide (Arg-Cys)
- Cyclopeptide (Cys-Arg)
New terms for the feature key 'MOD_RES':
- Cysteine methyl disulfide