Estimating Concordance Between Measured and Predicted Genetic Variant Effects on Chromatin Accessibility
Date
2023
Publisher
Tartu Ülikool
Abstract
Many genome-wide association studies (GWAS) have identified genetic variants associated with human traits or
diseases. However, understanding the underlying molecular mechanisms of those associations
has been challenging. Chromatin accessibility is one molecular trait that has
been associated with a higher risk of disease. If chromatin is not accessible, then
transcription factors cannot bind to it and gene expression or protein synthesis cannot
be initiated. This can lead to an altered risk for some diseases. Therefore, it is essential
to study chromatin accessibility quantitative trait loci (caQTLs). One
approach to finding such variants is caQTL mapping, which uses open chromatin data
and genotype imputation to find associations between genetic variants and chromatin
accessibility. Additional fine-mapping then distinguishes the potentially causal variants. In
addition, deep learning models that predict the effects of genetic variants on molecular traits
have been integrated into such studies to better understand the biological mechanisms
behind the associations between genetic variants and phenotypic traits. However, the
predictive accuracy of these models is still unclear. In this thesis, we created five caQTL
datasets for five different cell types based on the fine-mapping results. These datasets
were then used to evaluate the performance of Enformer, a state-of-the-art deep learning
model, in predicting genetic variant effects on chromatin accessibility. Although
other studies have already evaluated Enformer predictions, they have done so from a
gene expression perspective, based on measured effects from RNA-seq data. This thesis,
however, compares measured genetic variants’ effects on chromatin accessibility from
ATAC-seq data to Enformer’s predicted effects. It compares both the size and the
direction of the effects, and provides an initial overview of how Enformer performs on
chromatin accessibility. The results showed that Enformer performs reasonably well, especially on
variants for which it predicts stronger effects. In addition, it behaved as expected
when the cell type of a measured variant differed from the cell type of the predicted
variant, showing more opposite-direction effects than it would with a similar cell type.
On the other hand, it also predicted near-zero effect scores in many cases where
the measured effect was larger.
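The comparison of effect direction described in the abstract can be illustrated with a minimal sketch. This is not the thesis's pipeline: the function names, the `min_abs_predicted` threshold, and the toy values are all assumptions introduced only for illustration. It computes the fraction of variants whose measured and predicted effects share a sign, optionally restricted to variants with a stronger predicted effect.

```python
from typing import Sequence


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_concordance(measured: Sequence[float],
                          predicted: Sequence[float],
                          min_abs_predicted: float = 0.0) -> float:
    """Fraction of variants whose measured and predicted effects share a sign.

    Variants with |predicted| <= min_abs_predicted (a hypothetical threshold)
    or with a measured effect of exactly zero are skipped.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    kept = [(m, p) for m, p in zip(measured, predicted)
            if abs(p) > min_abs_predicted and m != 0]
    if not kept:
        return float("nan")
    agree = sum(1 for m, p in kept if sign(m) == sign(p))
    return agree / len(kept)


# Toy values invented purely for illustration (not thesis data).
measured_beta = [0.8, -0.5, 0.3, -0.9, 0.4, -0.2]
predicted_score = [0.6, -0.4, -0.01, -0.7, 0.02, 0.01]

# Concordance over all variants, then only over stronger predictions.
all_variants = direction_concordance(measured_beta, predicted_score)
strong_only = direction_concordance(measured_beta, predicted_score,
                                    min_abs_predicted=0.1)
```

In this toy input the near-zero predictions are the ones that disagree in sign, so filtering on predicted effect strength raises the concordance. That is only a property of the invented numbers, chosen to mirror the pattern the abstract reports.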
Keywords
bioinformatics, caQTLs, chromatin accessibility