UniProt release 2017_06

Published June 7, 2017


Sexual reproduction: good ideas shared with viruses

Sexual reproduction is a brilliant eukaryotic invention that allows the reassortment of alleles through recombination. The first step is the formation of haploid male and female gametes that unite to form a new individual. Most gametes unite by membrane fusion, a process mediated by specialized proteins called fusogens. These proteins are difficult to study, since they are often scarce. The few identified so far are clade-specific, such as bindin in echinoderms or izumo in mammals, suggesting that each clade has evolved its own fusion strategy. This, at least, was the prevailing view until the discovery of hapless-2 (HAP2¹), also called generative cell specific-1 (GCS1).

Hapless-2 is a single-span transmembrane protein located at the gamete cell surface, typically at mating structures. It is essential for gamete fusion in the green alga Chlamydomonas reinhardtii, but also in other plants, including Arabidopsis thaliana and Lilium longiflorum, and in protozoans, such as Plasmodium berghei or Tetrahymena thermophila. A thorough examination of eukaryotic genomes reveals the existence of this gene in many major eukaryotic taxa, from slime molds to the honey bee. It is, however, absent from fungi and from most animals, including humans. The wide evolutionary distribution of hapless-2 suggests that it was present in the last eukaryotic common ancestor and was later lost in some clades. Disruption of hapless-2 blocks gamete fusion, but not adhesion to gametes of the opposite mating type (or sex), suggesting that gamete adhesion relies on species-specific proteins, whereas fusion itself is mediated by an ancestral common gene product.

Earlier this year, the 3D structure of Chlamydomonas reinhardtii hapless-2 was solved. The secondary and tertiary structures of the ectodomain are almost identical to those of viral class II fusion proteins, such as the envelope protein E of flaviviruses, which are also involved in membrane fusion and with which hapless-2 shares very low identity at the amino acid level. Fédry et al. hypothesize that these fusion proteins very likely derive from a common ancestor, whose gene was probably transferred via horizontal exchange.

Like the flavivirus class II proteins, the hapless-2 ectodomain trimerizes concomitantly with insertion into the membrane of the partner gamete. The trigger for trimerization of hapless-2 is not yet known, although acidification, which drives trimerization of flavivirus class II proteins in late endosomes, is not required.

Information gained from the 3D structure of hapless-2 may help in the development of transmission-blocking vaccines (TBVs), a new strategy to fight malaria (and other protozoan diseases). Successful transmission of Plasmodium from humans to mosquitoes relies on hapless-2-dependent fusion of the parasite gametes and fertilization, which occurs rapidly after ingestion by the mosquito. If TBVs could be designed to induce anti-hapless-2 antibodies in human hosts, these antibodies would be ingested by Anopheles mosquitoes along with blood Plasmodium gametocytes. The initial gamete fusion step could then be prevented and the deadly cycle of transmission blocked. This approach has already been tested in model animals and, although the preliminary results look promising, they are not yet sufficient for clinical development. The identification of new peptides that are both functionally crucial and immunogenic may prove very helpful in the design of efficient anti-malaria TBVs.

As of this release, hapless-2 UniProtKB/Swiss-Prot entries have been created and are publicly available.

1 The acronym HAP2 is somewhat unfortunate, since this protein has nothing to do with the yeast HAP2 transcription factor. These are the mysterious ways of nomenclature, which sometimes may be quite confusing...

UniProtKB news

Modification of cross-references to PATRIC

We have modified our cross-references to the PATRIC database in order to reflect the new PATRIC primary identifier scheme. The earlier scheme used simple numeric identifiers, which have been replaced by more informative primary identifiers, as the examples below show.

Text format

Example: Q9ZNI1

Previous format:

DR   PATRIC; 19579917; VBIStaAur99865_1117.

New format:

DR   PATRIC; fig|93061.5.peg.1117; -.
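For users who parse the flat file themselves, the change can be handled along the following lines. This is only a minimal sketch in Python, not an official parser: the function name `parse_patric_dr` is hypothetical, and the rule that old-scheme identifiers are purely numeric is an assumption drawn from the example above.

```python
import re

# Matches a flat-file DR line for PATRIC: "DR   PATRIC; <id>; <second field>."
DR_PATRIC = re.compile(r"^DR\s+PATRIC;\s*(?P<id>[^;]+);\s*(?P<extra>.+)\.\s*$")

def parse_patric_dr(line):
    """Return (identifier, gene_designation_or_None, is_new_scheme), or None
    if the line is not a PATRIC cross-reference."""
    m = DR_PATRIC.match(line)
    if m is None:
        return None
    ident = m.group("id").strip()
    extra = m.group("extra").strip()
    # In the new format the second field is a placeholder dash.
    gene = None if extra == "-" else extra
    # Assumption: old-scheme identifiers are purely numeric, new ones are not.
    is_new = not ident.isdigit()
    return ident, gene, is_new
```

Applied to the two lines shown for Q9ZNI1, the function reports a gene designation only for the previous format and flags only the `fig|...` identifier as belonging to the new scheme.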

XML format

Example: Q9ZNI1

Previous format:

<dbReference type="PATRIC" id="19579917">
  <property type="gene designation" value="VBIStaAur99865_1117"/>
</dbReference>

New format:

<dbReference type="PATRIC" id="fig|93061.5.peg.1117"/>
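Code that reads the XML format may need to accept PATRIC cross-references both with and without the "gene designation" property. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper name `patric_refs` is hypothetical, and the code assumes a namespace-free fragment like the ones shown here. Full UniProtKB XML files declare a default namespace, so the tag names would have to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

def patric_refs(xml_fragment):
    """Return a list of (id, gene_designation_or_None) for PATRIC dbReference
    elements found in a namespace-free XML fragment."""
    # Wrap the fragment so that it has a single root element.
    root = ET.fromstring(f"<entry>{xml_fragment}</entry>")
    results = []
    for ref in root.iter("dbReference"):
        if ref.get("type") != "PATRIC":
            continue
        gene = None
        # The property child exists only in the previous format.
        for prop in ref.findall("property"):
            if prop.get("type") == "gene designation":
                gene = prop.get("value")
        results.append((ref.get("id"), gene))
    return results
```

With the new format the element has no children, so the gene designation is simply returned as `None`.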

RDF format

Example: Q9ZNI1

Previous format:

  rdfs:seeAlso <> .
  rdf:type up:Resource ;
  up:database <> ;
  rdfs:comment "VBIStaAur99865_1117" .

New format:

  rdfs:seeAlso <> .
  rdf:type up:Resource ;
  up:database <> .

New file linking deleted entries to their subsequently reinstated versions

Since release 2015_04, we have applied a procedure at each release to identify highly redundant proteomes within selected species groups, using a combination of manual and automatic methods. This procedure prevents the creation of UniProtKB/TrEMBL entries from these redundant proteomes, but it also meant that a huge number of previously existing entries had to be deleted from UniProtKB when the procedure was put in place.

It may happen that proteomes that were identified as redundant are later reinstated as non-redundant, e.g. a proteome for a strain used as a model by a significant community or with proteins that have been crystallized. In the past, it has also happened on rare occasions that entries were deleted but later reinstated for other reasons. In such cases, the UniProtKB entries are created anew, with new accession numbers.

To help users link deleted entries to their subsequently reinstated versions, we are introducing a file that maps old to new accession numbers via their protein_ids. This file is available (in compressed format) by FTP at

This mapping will also be used to make queries for obsolete identifiers on the UniProt website more meaningful.
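The idea of linking accession numbers via shared protein_ids can be illustrated with a small sketch. It does not reflect the layout of the mapping file, which is not described here: the function name `link_reinstated`, the two input dictionaries, and all accession numbers and protein_ids in the usage example are hypothetical.

```python
def link_reinstated(deleted, current):
    """Link deleted accessions to reinstated ones through shared protein_ids.

    deleted: dict of old accession -> iterable of protein_ids (hypothetical input)
    current: dict of protein_id -> new accession (hypothetical input)
    Returns a dict of old accession -> sorted list of new accessions.
    """
    mapping = {}
    for old_acc, protein_ids in deleted.items():
        # Keep only protein_ids that occur in a currently existing entry.
        new_accs = sorted({current[p] for p in protein_ids if p in current})
        if new_accs:
            mapping[old_acc] = new_accs
    return mapping

# Usage with made-up identifiers:
deleted = {"OLD001": ["PID1.1"], "OLD002": ["PID2.1"], "OLD003": ["PID9.1"]}
current = {"PID1.1": "NEW001", "PID2.1": "NEW002"}
links = link_reinstated(deleted, current)
```

Deleted entries whose protein_ids do not occur in any current entry (here the made-up "OLD003") are simply left out of the result.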

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • Cyclopeptide (Glu-Asn)

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • S-methylmethionine

Deleted term:

  • N-acetylated lysine

Changes in subcellular location controlled vocabulary

New subcellular location:

Changes to keywords

New keyword:

UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
