
Release 5.0

Published May 10, 2005

Headlines

UniProtKB/Swiss-Prot major release (47.0)

Release 47.0 of UniProtKB/Swiss-Prot contains 181'571 sequence entries, comprising 65'742'349 amino acids abstracted from 128'438 references. 11'531 sequences have been added since release 46, the sequence data of 841 existing entries has been updated and the annotations of 166'572 entries have been revised. This represents an increase of 6%.

Many improvements were made over the last three months. In particular, we have introduced 5 new feature keys to better describe different types of regions in a protein sequence, and we have added a new qualifier, the molecule type, to our cross-references to nucleotide sequence databases.

All the recent changes to the Swiss-Prot format are described in detail in the continuously updated document:

UniProt Knowledgebase release 5.0 includes Swiss-Prot release 47.0 and TrEMBL release 30.0.

Full statistics and release notes

UniProtKB News

Format change in the DR line

The DR (Database cross-Reference) lines are used as pointers to information related to entries and found in data collections other than Swiss-Prot. Until now, the format of a DR line was:

DR   DATABASE_IDENTIFIER; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER[; TERTIARY_IDENTIFIER].

We have introduced a fourth identifier, changing the DR line format to:

DR   DATABASE_IDENTIFIER; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER[; TERTIARY_IDENTIFIER][; QUATERNARY_IDENTIFIER].

The database cross-references to EMBL are affected by this modification (see below).
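As a rough illustration of the new layout, a DR line can be split into its identifiers as sketched below. This is a minimal Python sketch and not an official parser: the function name `parse_dr_line` and the dictionary keys are hypothetical, and the code assumes that identifiers never contain a semicolon and that every DR line ends with a single period.

```python
def parse_dr_line(line):
    """Split a DR line into the database name and up to four identifiers.

    Hypothetical helper for illustration; optional identifiers that are
    absent from the line are returned as None.
    """
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    body = line[5:].strip()
    # Remove only the terminating period; identifiers such as
    # 'AAA26107.1' contain periods of their own.
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    # Database, primary and secondary identifiers are mandatory;
    # tertiary and quaternary identifiers are optional.
    if not 3 <= len(fields) <= 5:
        raise ValueError("unexpected number of fields: %r" % line)
    return {
        "database": fields[0],
        "primary": fields[1],
        "secondary": fields[2],
        "tertiary": fields[3] if len(fields) > 3 else None,
        "quaternary": fields[4] if len(fields) > 4 else None,
    }


record = parse_dr_line("DR   EMBL; M68939; AAA26107.1; -; Genomic_DNA.")
```

Returning None for absent identifiers lets the same function read lines written before and after the format change.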

Changes in the cross-references to EMBL

The biological source of the molecule has been added as a quaternary identifier in the cross-references (DR lines) to the EMBL database.

Former format:

DR   EMBL; ACCESSION_NUMBER; PROTEIN_ID; STATUS_IDENTIFIER.

New format:

DR   EMBL; ACCESSION_NUMBER; PROTEIN_ID; STATUS_IDENTIFIER; MOLECULE_TYPE.

The molecule type is a controlled vocabulary and currently includes:

Examples:

DR   EMBL; M68939; AAA26107.1; -; Genomic_DNA.
DR   EMBL; U56386; AAB72034.1; -; mRNA.
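Software that reads EMBL cross-references may encounter both the former and the new format during a transition period. The sketch below, in Python, shows one way to accept both; the names `EmblXref` and `parse_embl_dr` are hypothetical, and treating a missing molecule type as None is an assumption made for this example.

```python
from typing import NamedTuple, Optional


class EmblXref(NamedTuple):
    """One EMBL cross-reference (hypothetical container for illustration)."""
    accession: str
    protein_id: str
    status: str
    molecule_type: Optional[str]  # None for lines in the former format


def parse_embl_dr(line: str) -> EmblXref:
    """Parse an EMBL DR line in either the former or the new format."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    # Drop the line code and the terminating period, then split on ';'.
    fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
    if fields[0] != "EMBL" or len(fields) not in (4, 5):
        raise ValueError("not an EMBL cross-reference: %r" % line)
    molecule_type = fields[4] if len(fields) == 5 else None
    return EmblXref(fields[1], fields[2], fields[3], molecule_type)


new_style = parse_embl_dr("DR   EMBL; U56386; AAB72034.1; -; mRNA.")
old_style = parse_embl_dr("DR   EMBL; U56386; AAB72034.1; -.")
```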

Cross-references to PANTHER

Cross-references have been added to PANTHER, which stands for Protein ANalysis THrough Evolutionary Relationships, a classification system that was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. Proteins have been classified according to families and subfamilies, molecular functions, biological processes and pathways. PANTHER is available at https://panther.appliedbiosystems.com/.

The format of the explicit links is:

Data bank identifier: PANTHER
Primary identifier: PANTHER's unique identifier for a protein family or sub-family.
Secondary identifier: PANTHER's entry name for a protein family or sub-family.
Tertiary identifier: Number of domains found, which is generally 1, rarely 2 for the fusion of identical domains/proteins.

Example (O59826):

DR   PANTHER; PTHR11732:SF69; KCNAB_channel; 1.
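The PANTHER fields can be read along the following lines. This Python sketch is illustrative only: the function name `parse_panther_dr` and the dictionary keys are hypothetical, and splitting the primary identifier at the colon into a family part and a sub-family part is an assumption based on the example above, not something the format description states.

```python
def parse_panther_dr(line):
    """Parse a PANTHER DR line into its three identifiers.

    Hypothetical helper; the family/sub-family split at ':' is assumed
    from the shape of the example identifier 'PTHR11732:SF69'.
    """
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
    if len(fields) != 4 or fields[0] != "PANTHER":
        raise ValueError("not a PANTHER cross-reference: %r" % line)
    family, _, subfamily = fields[1].partition(":")
    return {
        "identifier": fields[1],
        "family": family,
        "subfamily": subfamily or None,  # None if no ':' part is present
        "entry_name": fields[2],
        "domain_count": int(fields[3]),
    }


xref = parse_panther_dr("DR   PANTHER; PTHR11732:SF69; KCNAB_channel; 1.")
```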

New feature (FT) keys and redefinition of existing FT keys

The feature keys DOMAIN and SITE were each used to describe several distinct types of regions in a protein sequence, which we found unsatisfactory. We therefore redefined these two feature keys and introduced 5 new ones.

Redefinition of the feature keys DOMAIN and SITE:

Description of the 5 new feature keys:

The introduction of these new feature keys makes it possible to establish a clear sorting order for feature tables. The following order is used:

1. Molecule processing
     * INIT_MET, SIGNAL, PROPEP, TRANSIT, CHAIN, PEPTIDE
2. Regions
     * TOPO_DOM, TRANSMEM
     * DOMAIN, REPEAT
     * CA_BIND, ZN_FING, DNA_BIND, NP_BIND
     * REGION
     * COILED
     * MOTIF
     * COMPBIAS
3. Sites
     * ACT_SITE
     * METAL
     * BINDING
     * SITE
4. Amino acid modifications (pre- and post-translational)
     * SE_CYS
     * MOD_RES
     * LIPID
     * CARBOHYD
     * DISULFID
     * CROSSLNK
5. Natural variations
     * VARSPLIC
     * VARIANT
6. Experimental info
     * MUTAGEN
     * UNSURE
     * CONFLICT
     * NON_CONS
     * NON_TER
7. Secondary structure
     * HELIX, TURN, STRAND

Keys of equal priority (listed on one line above) are ordered according to sequence positions.