[PhD Thesis] Integrating distributed post-genomic data to infer the molecular basis of bacterial phenotypes

Craddock, T

Lookup NU author(s): Tracy Craddock

Downloads

Full text is not currently available for this publication.

Abstract

The aim of the project described in this thesis is to understand and predict the characteristics and behaviour of a family of bacteria through an analysis of genome-wide data from a variety of sources. The focus of the research is a family of bacteria, Bacillus, whose members show a diverse range of phenotypes, from the non-pathogenic B. subtilis to B. anthracis, the causative agent of anthrax. Specifically, the focus is on the genome-scale identification and characterisation of secreted proteins from Bacillus species.

Firstly, the application of Grid-based computational approaches to problems in genomic analysis and annotation was investigated, applying myGrid technology to a biological problem not previously addressed using this approach. e-Science workflows and a service-oriented approach were developed and applied to predict and characterise secreted proteins, and the results were automatically integrated into a custom relational database. An associated Web portal was also developed to facilitate expert curation, results browsing and querying over the database. Workflow technology was also used to classify the putative secreted proteins into families and to study the relationships between and within these families. The design of the workflows, the architecture and the reasoning behind the approach used to build this system, called BaSPP, are discussed.

Analysis of the putative Bacillus secretomes revealed clear distinctions between proteins present in the pathogens and those in the non-pathogens. The properties of the protein families present in all Bacillus secretomes, as well as those specific either to the pathogens or to the non-pathogens, were investigated. Many of the protein families contained members of unknown function.

In the second part of the project, these families were investigated in more depth, using additional data integration strategies not previously applied to these organisms.
The secretomes were modelled at the system level, in the broader context of interactomes. A system called SubtilNet was therefore developed, using B. subtilis as the reference organism. As part of SubtilNet, a toolkit and architecture were developed and implemented for building and analysing probabilistic functional integrated networks (PFINs). The PFINs built for each Bacillus species using this system were subsequently used to delve further into the interactions specific to the secreted proteins by extracting and exploring the cross-species PFINs of these proteins. The cross-species PFINs for the protein families specific to the pathogens and non-pathogens were explored, with particular emphasis on the core PrsA-like protein family, which acted as a use case to show how the PFINs can be used to shed light on protein function. The addition of orthologous links between species was demonstrated to facilitate network clustering and analysis, enabling putative annotations to be applied to proteins previously of unknown function.

Publication metadata

Author(s): Craddock T

Publication type: Report

Publication status: Published

Year: 2007

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne
