A current approach to integrating genomic data with other information sources is gene set enrichment analysis (GSEA). In GSEA, one first identifies a group of genes that are individually associated with virulence and then searches for motifs that include more nodes from that set than expected by chance. In this project we propose a different method that directly evaluates the significance of network motifs, which can improve the detection of associations between data sources. The project will start with the reconstruction of the metabolic and transcriptional networks of S. pneumoniae from databases and the literature. Both reconstructions will be major achievements in their own right, useful for the study of virulence determinants and for the broad community studying the fundamental biology and pathogenesis of S. pneumoniae, and will constitute an important model for other streptococcal pathogens. Next, we will develop new methods to integrate network information with comparative genomic hybridization (CGH) and epidemiological data for a collection of pneumococcal strains, allowing us to identify nodes or motifs significantly associated with virulence. Lastly, topological analysis of the virulence motifs and of the positions where they are embedded within the networks will answer questions about the typical structural properties of significant motifs. The study of network motifs has been a pivotal tool for understanding the connection between complex network topology and cellular function (13). We strongly believe that it can also be a fruitful tool for clarifying the molecular mechanisms of virulence, where, to our knowledge, it has never been applied.
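The set-based step described above (checking whether a motif contains more virulence-associated genes than expected by chance) is commonly scored with a one-sided hypergeometric over-representation test. The sketch below illustrates only that baseline, not the direct motif-significance method proposed in this project. The function name `motif_enrichment_pvalue` and the toy gene identifiers are hypothetical and chosen for illustration.

```python
from math import comb


def motif_enrichment_pvalue(motif_nodes, gene_set, all_genes):
    """One-sided hypergeometric p-value (illustrative baseline only).

    Probability that a node set drawn uniformly at random from all_genes,
    with the same size as the motif, contains at least as many members of
    gene_set (e.g. virulence-associated genes) as the motif does.
    """
    universe = set(all_genes)
    motif = set(motif_nodes) & universe      # motif nodes present in the network
    targets = set(gene_set) & universe       # virulence-associated genes in the network
    N, K, n = len(universe), len(targets), len(motif)
    k = len(motif & targets)                 # observed overlap
    # Sum hypergeometric probabilities P(X = i) for i >= k; comb() returns 0
    # when n - i exceeds N - K, so impossible draws contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical toy network of 10 genes, 3 of them virulence-associated.
    genes = [f"g{i}" for i in range(1, 11)]
    virulence = ["g1", "g2", "g3"]
    motif = ["g1", "g2", "g4"]               # 2 of its 3 nodes are in the set
    p = motif_enrichment_pvalue(motif, virulence, genes)
    print(f"{p:.4f}")                        # (3*7 + 1) / 120 = 22/120
```

Running the script prints `0.1833`. A test like this treats each motif as an unordered gene set and discards its wiring, which is one reason to evaluate motif significance directly against the network and strain data instead.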
LASIGE is supported by FCT, project UID/CEC/00408/2019