Functional analysis of gene lists, networks and regulatory systems
Date
2010-05-19T09:21:47Z
Abstract
Modern biotechnology makes it possible to describe the foundations of life in large-scale experiments. For example, one can measure the expression levels of thousands of genes, search DNA for patterns, or describe interaction networks. Interpreting these data involves computational challenges. This doctoral thesis comprises methods, computational experiments and software for the analysis of gene groups, networks and regulatory systems. Functional analysis is an interpretative task in which experimental findings are linked with facts about biological processes, components, etc. Because the knowledge base is incomplete, an estimate of statistical enrichment is used for this purpose. Three tools were developed in the course of the work: g:Profiler, GraphWeb and KEGGanim are web-based tools for interpreting gene groups, gene networks and pathways. In addition, the eGLM algorithm (Ensemble Generalised Linear Models) was created on the basis of logistic regression; it selects the strongest predictive effects from the data. The biological part of the work is devoted to analysing regulation in the baker's yeast S. cerevisiae. First, a dataset describing transcription factors (TFs) was compiled and validated by functional analysis. The eGLM method was then applied to this dataset to predict process-specific TFs. Computations on well-characterised cell cycle data show that eGLM selects all core regulators and outperforms standard methods. Analysis of the little-known stationary phase (G0) found relevant functions, including ageing and the slowing of metabolism. Some of the predicted TFs are also validated experimentally: certain yeast mutants have a viability in G0 that differs significantly from that of healthy cells.
Modern high-throughput technologies provide scientists with large quantities of data that characterise life at the molecular level. These include expression levels of thousands of genes, regulatory DNA patterns, and complex networks of interacting molecules. Interpretation of these data involves computational challenges. This dissertation includes methods, computational experiments and software for the functional analysis of gene lists, networks and regulatory systems. Functional analysis is an interpretative task that associates experimental findings with existing knowledge of biological processes, components, etc. Because this knowledge is often incomplete, the analysis relies on statistical enrichment. As part of this dissertation, three web-based tools were developed and published: g:Profiler, GraphWeb and KEGGanim perform functional analysis of gene lists, interaction networks and biological pathways, respectively. Logistic regression models were extended to the eGLM algorithm (Ensemble Generalised Linear Models), which highlights the strongest predictors in categorical data. The biological contribution of the dissertation is the functional analysis of the transcription regulatory network of the budding yeast S. cerevisiae. First, a large-scale dataset of transcription factor (TF) activities was compiled and validated with functional analysis. The eGLM method was then applied to the computational prediction of process-specific TFs. eGLM performed well on the well-described cell cycle pathway and outperformed several common methods, recovering all core TFs with a low error rate. Analysis of the cryptic state of quiescence (G0) revealed functions such as ageing and metabolic decline that agree with current knowledge. Furthermore, experimental validation demonstrates the power of eGLM, as several deletion mutants of predicted TFs have significantly different survival profiles in G0.
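The statistical enrichment mentioned in the abstract can be illustrated with a one-sided hypergeometric test, a common choice for gene list enrichment. The sketch below is an illustration under that assumption and is not taken from the dissertation; the function name `enrichment_p_value` and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes that carry the functional annotation
    sample:     size of the gene list under study
    overlap:    genes in the list that carry the annotation
    """
    largest_overlap = min(annotated, sample)
    # Count the ways to draw a list with k annotated genes, for every k >= overlap.
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, largest_overlap + 1)
    )
    # Divide by the number of ways to draw any list of this size.
    return favourable / comb(population, sample)


# Hypothetical numbers: 6000 background genes, 100 of them annotated to a
# process, and a list of 50 genes of which 10 carry the annotation.
p = enrichment_p_value(population=6000, annotated=100, sample=50, overlap=10)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value suggests that the annotation is over-represented in the list relative to the background. In practice many annotations are tested at once, so a multiple testing correction would be applied on top of this raw value.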
Description
The publications are missing from the electronic version of the dissertation.