Risk scores and their predictive ability for common complex diseases
Date
2019-05-09
Abstract
The falling cost of genotyping and sequencing technologies has caused an explosive growth in the amount of genetic data. Combined with existing phenotypic data, these data make it possible to study the genetic background of many traits and diseases. The most studied sources of genetic variation are single nucleotide polymorphisms (SNPs). The effects of common SNPs on traits are mostly quite small, so each SNP on its own has little predictive ability. By contrast, a variable obtained by combining the effects of many SNPs, called a genetic risk score, has proven very valuable in predicting several complex diseases.
This thesis introduces the doubly-weighting method, which includes many weakly correlated SNPs from an existing genome-wide association study (GWAS) in a genetic risk score at once. The method is applied both in simulations and to data from the Estonian Biobank (Eesti Geenivaramu, EGV) to characterise its performance for different diseases and to compare it with simpler methods used previously.
It was also investigated whether and how the predictive ability of genetic risk scores depends on which GWAS the SNP weights are taken from. Genetic risk scores for the same disease built from different GWASs turned out not to be necessarily strongly correlated with one another. The assessment of an individual's genetic predisposition therefore depends on the genetic risk score considered and is not uniquely determined. The behaviour of the distribution of genetic risk scores in different ethnic populations was studied as well. The distribution was found to depend on the genetic structure of the populations under study, so genetic predisposition cannot be determined from a genetic risk score without taking population structure into account.
Finally, three well-known non-genetic risk scores and their predictive ability for cardiovascular diseases were studied in EGV data. Two of the risk scores were well calibrated in the Estonian data, but the newest of them, which also has the most complex algorithm (QRISK2), underestimated the number of incident cases. It also emerged that, according to the treatment guidelines accompanying these non-genetic scores, nearly half of the middle-aged men and a quarter of the middle-aged women who took part in the study should be recommended cholesterol-lowering medication to reduce their risk of cardiovascular disease.
The prices of genotyping and whole genome sequencing have been decreasing rapidly over the past few years. As a result, genotypic data has become available in large quantities, allowing extensive investigation of the genetic background of many common complex diseases. The most studied genetic variants are single nucleotide polymorphisms (SNPs). Each SNP on its own tends to have a small effect on common complex diseases. However, by combining the effects of many SNPs into one variable, called a genetic risk score (GRS), one can compose a useful predictor of the genetic predisposition for a disease. In this thesis, a new method called doubly-weighting is introduced. It allows many uncorrelated markers to be included, instead of only the few genome-wide significant ones from a genome-wide association study (GWAS), and at the same time aims to correct for winner's curse bias. We illustrate its predictive ability under several scenarios, with both simulations and Estonian Biobank data, to show that it systematically performs better than simpler methods. The second article investigates how the choice of GWAS affects the predictive ability of GRSs for breast cancer. We also tried combining several GRSs into one metaGRS to achieve the best predictive genetic score, and we addressed the problem that different GRSs for the same disease with similar predictive ability are not necessarily highly correlated. Another important aspect influencing the predictive ability of GRSs is the similarity between the discovery dataset and the target dataset for which the GRS is intended. This is investigated in the third article, where it is shown that the distributions of GRSs depend heavily on the ancestral background of the population. In the fourth article, three known non-genetic risk scores for ASCVD are validated in the Estonian Biobank data. Two of them were well calibrated, but the newest and most complicated algorithm, developed in the UK, predicted only about half as many cases as were observed. We also compared the statin treatment recommendations based on guideline-specific criteria and found that statins for primary prevention were recommended for almost half of the men and a quarter of the women under investigation, illustrating the high risk levels of ASCVD in Estonia.
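The idea of combining many small SNP effects into one genetic risk score can be sketched as a weighted sum of risk-allele counts. The sketch below is a generic illustration only, not the doubly-weighting method of the thesis: the function name genetic_risk_score, the example weights, and the example genotypes are all hypothetical and chosen for readability.

```python
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages for one person.

    dosages: number of risk alleles carried at each SNP (0, 1 or 2).
    weights: per-SNP effect sizes, e.g. log odds ratios taken from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for three SNPs (assumed values, not from the thesis).
weights = [0.5, 0.25, -0.125]
# Hypothetical genotypes of one person at the same three SNPs.
dosages = [2, 1, 0]

score = genetic_risk_score(dosages, weights)
print(score)  # 0.5*2 + 0.25*1 + (-0.125)*0 = 1.25
```

Because the weights come from a particular GWAS, two scores built for the same disease from different studies can rank the same person differently, which is the issue the second article addresses.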
Description
The electronic version of the dissertation does not include the publications.
Keywords
diseases, risk factors, disease prognosis, hereditary predisposition, genetic variation, single nucleotide polymorphisms, genetic association studies, statistical methods