Skip Header

You are using a version of browser that may not display all the features of this website. Please consider upgrading your browser.

UniProt release 2.1

Published July 19, 2004


Annotation of HERV protein sequences

The human genome contains a number of human endogenous retroviruses (HERVs). These proviruses (the integrated form of retroviral DNA) are retroviral sequences that are transmitted vertically as part of the host germ line. A number of HERV 'families' have been identified, each derived from an independent colonisation event.

Some proviruses display open reading frames with coding capacity for a variety of viral-like proteins (Env, Gag, Pol, Pro, etc.). We have already annotated in Swiss-Prot a significant number of HERV proteins. We only include such potential proteins if they are meeting one of these three criteria: i) if there is evidence of their expression by the host, ii) if the derived sequence encodes a full-length protein, iii) if the protein has a potential cellular function.

UniProtKB News

Change in the keyword line (KW)

Keywords are now stored by alphabetical order on the KW lines of both Swiss-Prot and TrEMBL entries.

Format change in the comment line (CC) topic: MASS SPECTROMETRY

We have slightly changed the format for the comment line topic MASS SPECTROMETRY, which reports the exact molecular weight of a protein or part of a protein as determined by mass spectrometric methods. The modifications concern the topic RANGE, which has become mandatory, and the introduction of the new mandatory topic NOTE, which is used to indicate the relevant reference number.

New format:

CC   -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX]; RANGE=XX-XX[ (Name)]; NOTE={Free text (Ref.n)|Ref.n}.


  • 'MW=XXX' is the determined molecular weight (MW);
  • 'MW_ERR=XX' (optional) is the accuracy or error range of the MW measurement;
  • 'METHOD=XX' (optional) is the ionization method;
  • 'RANGE=XX-XX[ (Name)]' (mandatory) is used to indicate what part of the protein sequence entry corresponds to the molecular weight. In case of multiple products, the name of the relevant isoform is enclosed.
  • 'NOTE={Free text (Ref.n)|Ref.n}' (mandatory) indicates the relevant reference, which optionally can be preceded by a comment in free text format.


CC       RANGE=1-284 (Isoform 3); NOTE=Ref.6.

Cross-references to AGD

We have added cross-references to Ashbya genome database, available at

The identifiers of the appropriate DR line are:

Resource abbreviation AGD
Resource identifier AGD's unique identifier for a gene. This is generally the OLN (Ordered Locus Name) for that gene (eg: AAR059C), except for mitochondrial genes where AGD uses an identifier based on the gene name (eg: AgCOB1).
Optional information 1 None; a dash '-' is stored in that field.
DR   AGD; AAR059C; -.

Changes concerning the controlled vocabulary for PTMs

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the new introduced controlled vocabularies for PTMs are:

  • Hydroxylation: All entries with annotated hydroxylation sites have the keyword Hydroxylation.

  • Unidentified N-terminal blocking modifications:

     Blocked amino end (Asx)
     Blocked amino end (Xaa)
  • Other new controlled vocabularies for PTMs that are annotated with the feature key MOD_RES:

     Pros-8alpha-FAD-histidine (Keyword: FAD)
     N6-murein peptidoglycan lysine
UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health

We'd like to inform you that we have updated our Privacy Notice to comply with Europe’s new General Data Protection Regulation (GDPR) that applies since 25 May 2018.

Do not show this banner again