Estimating Concordance Between Measured and Predicted Genetic Variant Effects on Chromatin Accessibility

Date

2023

Publisher

Tartu Ülikool

Abstract

Genome-wide association studies (GWAS) have identified many genetic variants associated with human traits and diseases. However, understanding the molecular mechanisms underlying these associations has been challenging. Chromatin accessibility is one molecular trait that has been linked to disease risk: if chromatin is not accessible, transcription factors cannot bind to it, and gene expression, and consequently protein synthesis, cannot be initiated. This can alter the risk of some diseases. Therefore, it is essential to study quantitative trait loci that affect chromatin accessibility (caQTLs). One approach to finding such variants is caQTL mapping, which uses open chromatin data and imputed genotypes to find associations between genetic variants and chromatin accessibility. Subsequent fine-mapping then distinguishes the potentially causal variants. In addition, deep learning models that predict genetic variants’ effects on molecular traits have been integrated into such studies to better understand the biological mechanisms linking genetic variants to phenotypic traits. However, the predictive accuracy of these models is still unclear. In this thesis, we created five caQTL datasets for five different cell types based on fine-mapping results. These datasets were then used to evaluate how well the state-of-the-art deep learning model Enformer predicts genetic variant effects on chromatin accessibility. Other studies have already evaluated Enformer’s predictions, but from a gene expression perspective, using effects measured from RNA-seq data. This thesis instead compares genetic variants’ effects on chromatin accessibility measured from ATAC-seq data with Enformer’s predicted effects, comparing both the magnitude and the direction of the effects. It thus provides an initial overview of how Enformer performs on chromatin accessibility.
Results showed that Enformer performs reasonably well, especially on variants for which it predicts stronger effects. In addition, it gave the expected results when the cell type of a measured variant differed from the cell type used for the prediction: in such cases it predicted effects in the opposite direction more often than when the cell types matched. On the other hand, in many cases it also predicted near-zero effects where the measured effect was larger.
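The comparison of effect magnitude and direction described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical Python sketch, not the thesis’s actual pipeline. It assumes that each fine-mapped caQTL variant has one measured effect size from ATAC-seq and one predicted effect score (for example, an alternative-minus-reference difference in a predicted accessibility track). It computes the fraction of variants whose measured and predicted effects agree in direction, optionally restricted to variants with stronger predicted effects, plus a Pearson correlation of effect sizes. The `VariantEffect` class, the function names, the threshold and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass
from math import nan, sqrt


@dataclass
class VariantEffect:
    """Hypothetical record pairing a measured and a predicted effect for one variant."""
    variant_id: str
    measured: float   # measured caQTL effect size (assumed: from ATAC-seq fine-mapping)
    predicted: float  # model-predicted effect score (assumed: alt minus ref prediction)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between measured and predicted effect sizes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sign_concordance(effects: list[VariantEffect], min_abs_pred: float = 0.0) -> float:
    """Fraction of variants whose measured and predicted effects share a sign.

    Only variants with |predicted| > min_abs_pred and a non-zero measured
    effect are counted, so raising the threshold drops near-zero predictions.
    """
    kept = [e for e in effects if abs(e.predicted) > min_abs_pred and e.measured != 0]
    if not kept:
        return nan
    agree = sum((e.measured > 0) == (e.predicted > 0) for e in kept)
    return agree / len(kept)


# Made-up example values, for illustration only.
effects = [
    VariantEffect("var1", measured=0.8, predicted=0.5),
    VariantEffect("var2", measured=-0.6, predicted=-0.4),
    VariantEffect("var3", measured=0.7, predicted=-0.01),  # near-zero prediction
    VariantEffect("var4", measured=-0.3, predicted=0.02),  # near-zero prediction
    VariantEffect("var5", measured=0.5, predicted=0.3),
]

print(sign_concordance(effects))                    # prints 0.6 (all 5 variants)
print(sign_concordance(effects, min_abs_pred=0.1))  # prints 1.0 (stronger predictions only)
print(pearson([e.measured for e in effects], [e.predicted for e in effects]))
```

Raising `min_abs_pred` mirrors the idea of looking separately at variants with stronger predicted effects; in this made-up example, direction concordance rises from 0.6 to 1.0 once the two near-zero predictions are excluded. A per-cell-type version would simply group the records by the measured and predicted cell type before calling the same functions.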

Keywords

bioinformatics, caQTLs, chromatin accessibility