Abstract
A sound genome assembly and robust annotations are essential to the differential analysis of bacterial genomes. Using a case study data set of newly sequenced Vibrio vulnificus genomes, this research examined both the biology of these bacteria and the bioinformatics processes that support identification of the similarities and differences among V. vulnificus isolates. The two main themes of this research are 1) identification of the virulence and survival mechanisms of clinical and environmental biotypes of Vibrio vulnificus and 2) quantification of the impact of different analysis choices on the overall biological conclusions of the study. Whole genome sequencing, in conjunction with comparative genomics, is currently used to capture the genetic and functional repertoire of organisms. It is important to consider and track analytic provenance in bacterial genomics because alternate workflow choices can change the biological interpretation of hundreds of genes, even in relatively simple bacterial genomes. Chapter 1 describes the bioinformatics analyses used to determine the draft genome sequences of three environmental genotype Vibrio vulnificus reference genomes and to identify genotype-specific genomic regions. Chapter 1 also highlights the functional systems, including the virulence and survival genes, that differentiate clinical and environmental Vibrio vulnificus genotypes. Chapter 2 explores the direct impact of the parameters and methods selected during the assembly and annotation stages of a genome project. Despite decades of advances in ab initio gene prediction, method and parameter choices still strongly influence the identification of genes, and therefore the biologically significant results of a comparative genomics analysis.
Using a benchmarking approach based on simulation studies with a related genome, it is possible to identify an optimal assembly-to-annotation pipeline for the collection of V. vulnificus strains. A software framework for comparing the outcomes of different assembly-to-annotation workflows was constructed in the Taverna workflow management system and used to carry out the bioinformatics experiments described in Chapter 3. Chapter 3 expands on the analysis performed in Chapter 1 with an extensive comparative genomics analysis of newly sequenced Vibrio vulnificus genomes, each of which represents one of the biological classifications found within this species. The analysis of these genomes reveals genes that are specific to each of the biotypes. Comparative analysis of representative strains from each of the established Vibrio vulnificus biotypes is used to identify differentiating genes, which may relate to the apparent host specificity of the different biotypes.