UniProt release 1.12

Published June 21, 2004


Noah's ark or biodiversity in Swiss-Prot

While Swiss-Prot annotation focuses mainly on a number of model organisms (human, Arabidopsis, Drosophila, E. coli, etc.), the number of species represented in the knowledgebase continues to grow. Swiss-Prot currently contains sequences from about 8'550 different species. About 50% of these species have only one associated entry in Swiss-Prot, often a protein whose gene is used for building phylogenetic trees, such as RuBisCO, cytochrome b or hemoglobin. At the other end of the spectrum, the 20 most represented species account for about 40% of the database (60'000 sequences). The most represented species is, of course, our own: Homo sapiens accounts for 7% of Swiss-Prot.

UniProtKB News

Digital Object Identifier (DOI) in the RX line

The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment. We have introduced a new optional topic, "DOI", in the RX line. It stores the Digital Object Identifier of a cited document. The format for this RX line topic is:

RX   DOI=Digital_object_identifier;

The order of the optional topics in an RX line is:

RX   [MEDLINE=Medline_identifier; ][PubMed=Pubmed_identifier; ][DOI=Digital_object_identifier;]


Example:

RX   MEDLINE=97291283; PubMed=9145897; DOI=10.1007/s00248-002-2038-4;

Note: The length of a DOI is not restricted. If the topic DOI does not fit into an RX line that already contains a topic, a further RX line will be created, which may be longer than 76 characters.
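As an illustration of the RX format described above, the following Python sketch splits an RX line into its optional topics. It is a minimal example rather than an official parser: the names `RX_PATTERN`, `parse_rx_line` and `parse_rx_lines` are hypothetical, and the assumption that MEDLINE and PubMed identifiers are purely numeric is taken from the example line only. Because the length of a DOI is not restricted, the DOI is read up to the final semicolon of the line.

```python
import re

# Optional topics in the order given above. The numeric-only pattern for
# MEDLINE and PubMed identifiers is an assumption based on the example line.
RX_PATTERN = re.compile(
    r"^RX   "
    r"(?:MEDLINE=(?P<MEDLINE>\d+); ?)?"
    r"(?:PubMed=(?P<PubMed>\d+); ?)?"
    r"(?:DOI=(?P<DOI>.+);)?$"
)


def parse_rx_line(line):
    """Return {topic: identifier} for the topics present in one RX line."""
    match = RX_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("unrecognised RX line: %r" % line)
    # Keep only the topics that actually occur in this line.
    return {k: v for k, v in match.groupdict().items() if v is not None}


def parse_rx_lines(lines):
    """Merge the topics of all RX lines belonging to one reference."""
    merged = {}
    for line in lines:
        merged.update(parse_rx_line(line))
    return merged
```

For a reference whose DOI was moved to a further RX line, `parse_rx_lines` can be called with both lines to obtain a single dictionary of topics.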

New line type: RG (Reference Group)

The new reference line 'RG' (Reference Group) has been introduced to list the consortium name associated with a given citation. The RG line is mainly used in submission reference blocks, but can also be used in paper references, if the working group is cited as an author in the paper.

Note: RA (Reference Author) and RG lines can both be present in the same reference block; at least one RA or RG line is mandatory per reference block.

The same line type has recently been introduced in the EMBL nucleotide sequence database.

The format for this line is:

RG   Consortium_name;


Examples:

RG   The C. elegans sequencing consortium;
RG   The Brazilian network for HIV isolation and characterization;
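The rule that each reference block needs at least one RA or RG line can be checked with a few lines of Python. The sketch below is illustrative only: the function name `parse_reference_block` is hypothetical, it assumes that every line starts with a two-letter line code followed by three spaces (as in the examples above), and it keeps RA lines as raw text because their internal format is not described here.

```python
def parse_reference_block(lines):
    """Collect the RA and RG values found in one reference block."""
    authors, groups = [], []
    for line in lines:
        code, value = line[:2], line[5:].strip()
        if code == "RG":
            # RG   Consortium_name;  -> drop the terminating semicolon
            groups.append(value.rstrip(";"))
        elif code == "RA":
            # RA lines are kept verbatim (their layout is an assumption here)
            authors.append(value)
    if not authors and not groups:
        raise ValueError("reference block needs at least one RA or RG line")
    return {"RA": authors, "RG": groups}
```

A block that contains neither line type raises `ValueError`, which mirrors the "at least one RA or RG line" requirement.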

Cross-references to EchoBASE

We have added cross-references to EchoBASE, the integrated post-genomic database for E. coli.

The identifiers of the appropriate DR line are:

Resource abbreviation   EchoBASE
Resource identifier     EchoBASE's unique identifier for a gene.

Example:

DR   EchoBASE; EB4119; -.
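A DR line such as the one above can be split into its resource abbreviation and resource identifier with a short Python sketch. This is a minimal illustration, not an official parser: the function names `parse_dr_line` and `echobase_ids` are hypothetical, and the sketch assumes that fields are separated by semicolons and that the line ends with a period, as in the example. Any fields after the identifier (here "-") are returned unchanged, since their meaning is not described in this section.

```python
def parse_dr_line(line):
    """Split a DR line into (resource abbreviation, identifier, other fields)."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    body = line[5:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 2:
        raise ValueError("DR line needs a resource and an identifier: %r" % line)
    return fields[0], fields[1], fields[2:]


def echobase_ids(lines):
    """Return the EchoBASE identifiers found among the DR lines of an entry."""
    ids = []
    for line in lines:
        if line.startswith("DR   "):
            resource, identifier, _ = parse_dr_line(line)
            if resource == "EchoBASE":
                ids.append(identifier)
    return ids
```

Calling `echobase_ids` on the lines of an entry returns only the identifiers whose resource abbreviation is EchoBASE and ignores every other line.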