Prediction of Cell Counts from DNA Methylation
Date
2022
Publisher
Tartu Ülikool
Abstract
DNA methylation is an epigenetic factor that modulates gene expression. The close relationship
between gene expression and cell differentiation serves as a basis for methylation-based
cell mixture deconvolution, a method for determining the proportions of constituent
cell types in a biological sample. Previous work has demonstrated its usefulness
in predicting lymphocyte subtypes in blood samples, but has neglected TEMRA, a type of
senescent lymphocyte associated with aging and autoimmune diseases. This thesis sets
out to explore the feasibility of estimating the proportions of T cells in various stages
of differentiation, including TEMRA, from methylation sequencing data using machine
learning. The results show that although prediction accuracy is lower for TEMRA subtypes
than for general subtypes such as T cells, the approach is nonetheless viable for this
task, especially since DNA sequencing is cheaper and more scalable than traditional
laboratory methods for blood sample analysis.
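
To illustrate the idea of cell mixture deconvolution described above, the following is a minimal sketch in Python, not the method used in the thesis. It swaps in a plain closed-form least-squares fit for a two-cell-type mixture in place of the machine learning regression the abstract refers to. The function name estimate_fraction, the reference profiles ref_temra and ref_naive, and all methylation values are hypothetical and chosen only for illustration.

```python
def estimate_fraction(sample, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumes sample ~= p * ref_a + (1 - p) * ref_b at every CpG site and
    solves for p, clipping the result to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical; fraction is undefined")
    num = sum((s - b) * d for s, b, d in zip(sample, ref_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical methylation fractions (beta values) at four CpG sites.
ref_temra = [0.9, 0.1, 0.8, 0.2]  # assumed profile of a pure TEMRA population
ref_naive = [0.1, 0.7, 0.3, 0.6]  # assumed profile of a pure naive T cell population

# Simulate a sample made of 25% TEMRA and 75% naive T cells.
mixed = [0.25 * a + 0.75 * b for a, b in zip(ref_temra, ref_naive)]

fraction = estimate_fraction(mixed, ref_temra, ref_naive)
print(f"Estimated TEMRA fraction: {fraction:.2f}")
```

With more than two cell types, the same idea is usually posed as a constrained regression over all reference profiles, or, as in the thesis, as a supervised model trained on samples with known cell counts.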
Keywords
Methylation, cell mixture deconvolution, TEMRA, machine learning, regression