Integration methods for heterogeneous biological data
Date
2019-05-22
Abstract
Thanks to advances in technology, the volume of biological data has multiplied in recent years. These data cover different areas of biology. When limited to a single data set, biological processes or diseases can be studied from only one aspect at a time. This has created a growing need for machine learning methods that help combine data from different fields in order to study biological processes as a whole. In addition, there is demand for reliable collections of disease-specific data sets that would allow such analyses to be carried out more efficiently. This thesis describes how to apply machine-learning-based integration methods to investigate different biological questions. We show how analysis based on integrated data leads to a better understanding of biological processes in three fields: Alzheimer's disease, toxicology, and immunology. Alzheimer's disease is an age-related neurodegenerative disease for which there is no effective treatment. In the thesis we show how to integrate different Alzheimer's disease-specific data sets to form HENA, a heterogeneous graph-based data set specific to Alzheimer's disease. We then demonstrate the application of a deep learning method, the graph convolutional neural network, to HENA in order to find genes potentially associated with the disease. Second, we studied psoriasis, a chronic immune-mediated inflammatory disease. To do so, we combined laboratory measurements from patients' blood and skin with clinical information and integrated the results of the respective analyses on the basis of domain-specific knowledge. The last part of the work focuses on advancing toxicity testing strategies. Toxicity testing is the process of assessing whether the chemicals under study have harmful effects on an organism. It is needed, for example, when assessing the safety of drugs. In this work we identified groups of toxic compounds with a similar mechanism of action. In addition, we developed a classification model that makes it possible to assess the toxicity of new compounds.
Rapid advances in biotechnological innovation and decreasing production costs have led to an explosion of experimental data produced in laboratories around the world. Individual experiments make it possible to understand biological processes, e.g. diseases, from different angles. However, to get a systematic view of a disease it is necessary to combine these heterogeneous data. The large amounts of diverse data require building machine learning models that can help, for example, to identify which genes are related to a disease. Additionally, there is a need to compose reliable integrated data sets that researchers can work with effectively. In this thesis we demonstrate how to combine and analyze different types of biological data using the example of three biological domains: Alzheimer’s disease, immunology, and toxicology. More specifically, we combine data sets related to Alzheimer’s disease into a novel heterogeneous network-based data set for Alzheimer’s disease (HENA). We then apply graph convolutional networks, a state-of-the-art deep learning method, to a node classification task in HENA to find genes that are potentially associated with the disease. Combining patients’ data related to an immune disease helps to uncover its pathological mechanisms and to find better treatments in the future. We analyze laboratory data from patients’ skin and blood samples by combining them with clinical information. Subsequently, we bring together the results of the individual analyses using available domain knowledge to form a more systematic view of the disease pathogenesis. Toxicity testing is the process of determining the harmful effects of substances on living organisms. One of its applications is the safety assessment of drugs or other chemicals for the human organism. In this work we identify groups of toxicants that have similar mechanisms of action. Additionally, we develop a classification model that makes it possible to assess the toxic action of unknown compounds.
Description
The electronic version of the dissertation does not include the publications
Keywords
bioinformatics, databases, data analysis, biological processes, Alzheimer's disease, immune diseases, toxicology, neural networks