Developing and applying bioinformatics tools for gene expression data interpretation
Date
2021-05-19
Authors
Kolberg, Liis
Abstract
Modern technologies enable researchers to measure the expression levels of all genes simultaneously, under different conditions and in different groups of people. For example, gene expression is measured in the cancerous and normal tissue of people diagnosed with a tumour. The result is a high-dimensional data set containing the expression levels of tens of thousands of genes, which is searched for genes with similar expression patterns that may be involved in the development of a particular type of cancer. Various data mining methods and statistical tests are used to detect groups of genes that behave similarly. To better understand these groups, previously known information about their genes is gathered and examined for common descriptions. In this way, new functions can be found for less studied genes, or new genes can be linked to the disease under study. Such analyses require applying several methods and performing a large number of statistical tests, so bioinformaticians develop tools that carry out these calculations. In this thesis, we developed two tools, g:Profiler and funcExplorer, that make it easy to interpret gene expression data. g:Profiler finds the significant common ground in the descriptions of the genes in a list, while funcExplorer groups genes with similar profiles, taking into account the descriptions found by g:Profiler. Both tools also present their results through plots and interactive views, allowing users to get a quick overview of the data and to share the results with others. In the second part of the thesis, we studied genetic variants that affect gene expression levels. To do this, we first used funcExplorer to detect groups of genes with similar expression. We then identified the genetic variants that influence the expression levels of these genes. Finally, we used g:Profiler to interpret the groups and, through them, the genetic variants that affect them. As a result, we identified a novel association for which the time and conditions of expression measurement are essential, and we confirmed several previously reported strong associations between genetic variants and gene expression.
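To illustrate what finding a significant shared description for a gene list can mean in practice, the sketch below computes a one-sided hypergeometric p-value for the overlap between a query gene list and the set of genes carrying one annotation. It is only a minimal illustration: the abstract does not state which statistical test the tools use, so the choice of test is an assumption, as are the function name `enrichment_p_value` and the gene identifiers, which are hypothetical.

```python
from math import comb


def enrichment_p_value(query, annotated, background):
    """One-sided hypergeometric p-value for an annotation overlap.

    Returns P(overlap >= observed) when len(query) genes are drawn at
    random, without replacement, from the background.
    """
    background = set(background)
    # Genes outside the background cannot be tested, so drop them.
    query = set(query) & background
    annotated = set(annotated) & background

    n_background = len(background)       # all genes considered
    n_annotated = len(annotated)         # genes carrying the annotation
    n_query = len(query)                 # genes in the query list
    n_overlap = len(query & annotated)   # observed overlap

    total = comb(n_background, n_query)
    upper = min(n_annotated, n_query)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    background = [f"g{i}" for i in range(10)]
    annotated = ["g0", "g1", "g2"]
    query = ["g0", "g1", "g5"]
    print(enrichment_p_value(query, annotated, background))
```

In a real analysis this test would be repeated for every annotation term, so the resulting p-values would also need a multiple testing correction, which this sketch leaves out.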
Description
The electronic version of the dissertation does not include the publications.
Keywords
bioinformatics, data processing, gene expression, interpretation