Comparative genomics of protoploid Saccharomycetaceae.
Souciet J.-L., Dujon B., Gaillardin C., Johnston M., Baret P.V., Cliften P., Sherman D.J., Weissenbach J., Westhof E., Wincker P., Jubin C., Poulain J., Barbe V., Segurens B., Artiguenave F., Anthouard V., Vacherie B., Val M.-E., Fulton R.S., Minx P., Wilson R., Durrens P., Jean G., Marck C., Martin T., Nikolski M., Rolland T., Seret M.-L., Casaregola S., Despons L., Fairhead C., Fischer G., Lafontaine I., Leh V., Lemaire M., de Montigny J., Neuveglise C., Thierry A., Blanc-Lenfle I., Bleykasten C., Diffels J., Fritsch E., Frangeul L., Goeffon A., Jauniaux N., Kachouri-Lafond R., Payen C., Potier S., Pribylova L., Ozanne C., Richard G.-F., Sacerdot C., Straub M.-L., Talla E.
Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call "protoploid" because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified.