The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates.
Whole-genome sequencing has been skewed toward bacterial pathogens as a consequence of the prioritization of medical and veterinary diseases. However, it is becoming clear that in order to accurately measure genetic variation within and between pathogenic groups, multiple isolates, as well as commensal species, must be sequenced. This study examined the pangenomic content of Escherichia coli. Six distinct E. coli pathovars can be distinguished using molecular or phenotypic markers, but only two of the six pathovars have been subjected to any genome sequencing previously. Thus, this report provides a seminal description of the genomic contents and unique features of three unsequenced pathovars, enterotoxigenic E. coli, enteropathogenic E. coli, and enteroaggregative E. coli. We also determined the first genome sequence of a human commensal E. coli isolate, E. coli HS, which will undoubtedly provide a new baseline from which workers can examine the evolution of pathogenic E. coli. Comparison of 17 E. coli genomes, 8 of which are new, resulted in identification of approximately 2,200 genes conserved in all isolates. We were also able to identify genes that were isolate and pathovar specific. Fewer pathovar-specific genes were identified than anticipated, suggesting that each isolate may have independently developed virulence capabilities. Pangenome calculations indicate that E. coli genomic diversity represents an open pangenome model containing a reservoir of more than 13,000 genes, many of which may be uncharacterized but important virulence factors. This comparative study of the species E. coli, while descriptive, should provide the basis for future functional work on this important group of pathogens.