Prediction of Cell Counts from DNA Methylation

Date

2022

Publisher

Tartu Ülikool

Abstract

DNA methylation is an epigenetic factor that modulates gene expression. The close relationship between gene expression and cell differentiation serves as a basis for methylation-based cell mixture deconvolution, a method for determining the proportions of constituent cell types in a biological sample. Previous work has demonstrated its usefulness in predicting lymphocyte subtypes in blood samples but has neglected TEMRA, a type of senescent lymphocyte associated with aging and autoimmune diseases. This thesis explores the feasibility of estimating the proportions of T cells in various stages of differentiation, including TEMRA, from methylation sequencing data using machine learning. The results show that although prediction accuracy is lower for TEMRA subtypes than for more general cell types such as T cells, methylation-based prediction is nonetheless a viable approach for this task, especially since DNA sequencing is cheaper and more scalable than traditional laboratory methods for blood sample analysis.

Keywords

Methylation, cell mixture deconvolution, TEMRA, machine learning, regression
