Skip Header

You are using a version of Internet Explorer that may not display all features of this website. Please upgrade to a modern browser.

UniProt release 2.0

Published July 5, 2004

Headlines

UniProtKB/Swiss-Prot major release (44.0)

Release 44.0 of Swiss-Prot contains 153'825 sequence entries, comprising 56'599'343 amino acids abstracted from 117'387 references. 6'633 sequences have been added since release 43, the sequence data of 582 existing entries has been updated and the annotations of 139'855 entries have been revised. This represents an increase of 4%.

Many improvements were carried out in the last 3 months at the level of the GN, RX, CC and FT lines.

Full statistics and release notes

UniProtKB News

Incorporation of new entries into the biweekly UniProtKB releases of Swiss-Prot and TrEMBL

The files provided in the ftp directory /databases/uniprot/knowledgebase/new/ (and known as TrEMBL_New) have been removed. These files contained new sequence entries and sequences to be used to update existing Swiss-Prot or TrEMBL entries. Until now, these entries were integrated into Swiss-Prot and TrEMBL mostly only at full releases of these databases. We now incorporate these new and updated sequences into the biweekly UniProtKB releases of Swiss-Prot and TrEMBL (/databases/uniprot/knowledgebase/uniprot_sprot* and /databases/uniprot/knowledgebase/uniprot_trembl*), and, therefore, the distribution of these files is no longer necessary.

New format for the GN (Gene Name) line

We have introduced a new format for the GN (Gene Name) line and all gene names have been converted to mixed case. The new format is more structured than the previous one, in order to distinguish between three types of information:

  1. Gene names (a.k.a gene symbols). The names(s) used to represent a gene. As there can be more than one name assigned to a gene. We make a distinction between the one which we believe should be used as the official gene name and the other names which are listed as "Synonyms".
  2. Ordered locus names (a.k.a. OLN, ORF numbers, CDS numbers or Gene numbers). A name used to represent an ORF in a completely sequenced genome or chromosome. It is generally based on a prefix representing the organism and a number which usually represents the sequential ordering of genes on the chromosome. Depending on the genome sequencing center, numbers are attributed only to protein-coding genes, or also to pseudogenes, or also to tRNAs and other features. Examples: HI0934, Rv3245c, At5g34500, YER456W.
  3. ORF names (a.k.a. Sequencing names or Contig names or Temporary ORFNames). A name temporarily attributed by a sequencing project to an open reading frame. This name is generally based on a cosmid numbering system. Examples: MtCY277.28c, SYGP-ORF50, SpBC2F12.04, C06E1.1, CG10954.

The new format of the GN line is:

GN   Name=<name>; Synonyms=<name1>[, <name2>...]; OrderedLocusNames=<name1>[, <name2>...];
GN   ORFNames=<name1>[, <name2>...];

None of the above four tokens are mandatory. But a "Synonyms" token can only be present if there is a "Name" token.

If there is more than one gene, GN line blocks for the different genes are separated by the following line:

GN   and

Wrapping is done preferentially at a semicolon, otherwise at a comma.

Examples:

GN   Name=atpG; Synonyms=uncG, papC;
GN   OrderedLocusNames=b3733, c4659, z5231, ECs4675, SF3813, S3955;
GN   ORFNames=SPAC1834.11c;
GN   Name=cysA1; Synonyms=cysA; OrderedLocusNames=Rv3117, MT3199;
GN   ORFNames=MTCY164.27;
GN   and
GN   Name=cysA2; OrderedLocusNames=Rv0815c, MT0837; ORFNames=MTV043.07c;

Cross-references to IntAct

We have added cross-references to IntAct, the Protein interaction database and analysis system available at http://www.ebi.ac.uk/intact/.

The identifiers of the appropriate DR line are:

Resource abbreviation IntAct
Resource identifier The Swiss-Prot primary AC number for the protein. This is used by IntAct as a link to all the interactions in which that protein is involved.
Example
P14653:
DR   IntAct; P14653; -.

Change in cross-references to Reactome (former GK)

The Genome Knowledgebase (GK) was renamed to Reactome. We changed the database name in the relevant cross-references (DR lines) accordingly.

Example:

DR   Reactome; Q9BZJ0; -.

Changes concerning the controlled vocabulary for PTMs

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the new introduced controlled vocabularies for PTMs are:

  • Hydroxylation: All entries with annotated hydroxylation sites have the keyword Hydroxylation.

     3,4-dihydroxyarginine
     3,5-dihydroxylysine
    
  • Unidentified N-terminal blocking modifications:

     Blocked amino end (Ala)
     Blocked amino end (Arg)
     Blocked amino end (Asp)
     Blocked amino end (Cys)
     Blocked amino end (Gln)
     Blocked amino end (Glu)
     Blocked amino end (Gly)
     Blocked amino end (Ile)
     Blocked amino end (Met)
     Blocked amino end (Pro)
     Blocked amino end (Ser)
     Blocked amino end (Thr)
     Blocked amino end (Val)
    
  • Unidentified N-terminal blocking modifications:

     Blocked carboxyl end (His)