Clustering Methods for Interpreting Medical Data
Date
2020
Authors
Publisher
Tartu Ülikool
Abstract
Medical bills can be analyzed to identify disease trajectories. By applying
machine learning methods, it is possible to answer questions such as which diagnoses
occur together and what these conditions arise from.
This study uses several clustering methods, such as Bernoulli mixture models and
autoencoder compression with K-means, to divide patients into groups based on the
diagnoses they have received. The results of the models are visualized as heatmaps
showing how likely specific diagnoses are to occur in each group.
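As an illustration of the Bernoulli mixture approach, the following is a minimal sketch and not the thesis's implementation. It assumes each patient is encoded as a binary vector over diagnosis codes and fits the mixture with a plain EM loop in NumPy. The function name `fit_bernoulli_mixture`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def fit_bernoulli_mixture(X, n_clusters, n_iter=100, seed=0, eps=1e-6):
    """Fit a Bernoulli mixture with EM; X is a binary (patients x diagnoses) matrix."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_clusters, 1.0 / n_clusters)
    # Random initial diagnosis probabilities, kept away from 0 and 1.
    probs = rng.uniform(0.25, 0.75, size=(n_clusters, d))
    for _ in range(n_iter):
        # E-step: log p(x_i, cluster k), normalised into responsibilities.
        log_joint = (X @ np.log(probs).T
                     + (1 - X) @ np.log(1 - probs).T
                     + np.log(weights))
        log_joint -= log_joint.max(axis=1, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and per-cluster diagnosis probabilities.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        probs = np.clip((resp.T @ X) / nk[:, None], eps, 1 - eps)
    return weights, probs, resp

# Synthetic data (assumption): two patient groups with different diagnosis profiles.
rng = np.random.default_rng(42)
true_probs = np.array([[0.9, 0.9, 0.9, 0.05, 0.05, 0.05],
                       [0.05, 0.05, 0.05, 0.9, 0.9, 0.9]])
groups = rng.integers(0, 2, size=400)
X = (rng.random((400, 6)) < true_probs[groups]).astype(float)

weights, probs, resp = fit_bernoulli_mixture(X, n_clusters=2)
labels = resp.argmax(axis=1)  # hard cluster assignment per patient
```

Here `probs` is a clusters-by-diagnoses matrix of estimated probabilities, which is the kind of quantity a heatmap of diagnosis likelihood per group would display.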
A guided hidden Markov model was also used to form a lifelong disease path from
short segments of different patients’ treatment. This provides a way to observe
how certain conditions arise at different ages and makes it possible to track disease
development over time. Its results were similar to those previously reported in medical
studies, such as the development of J35 from H65.
The models’ interpretability was also improved by using support vector machines as a
feature selection method for I11. This made it possible to discard all diagnoses
that had no connection to I11 and keep only those contributing to the development of
the disease. Results on the processed data also agreed with medical findings, such as
the development of I50 from I11.
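One common way to use a support vector machine for feature selection is to fit a linear SVM with an L1 penalty and keep the features with non-zero weights. The sketch below shows that idea with scikit-learn on synthetic data. It is an assumption about the general technique, not the configuration used in the thesis; the data, the target definition, and the value of `C` are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic binary patient-by-diagnosis matrix (assumption, not real data).
rng = np.random.default_rng(0)
n_patients, n_diagnoses = 500, 20
X = (rng.random((n_patients, n_diagnoses)) < 0.2).astype(float)
# Hypothetical target diagnosis, constructed to depend only on columns 0 and 1.
y = ((X[:, 0] + X[:, 1]) > 0).astype(int)

# The L1 penalty pushes weights of uninformative diagnoses towards zero.
clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000, random_state=0)
clf.fit(X, y)

importance = np.abs(clf.coef_[0])
ranking = np.argsort(importance)[::-1]        # diagnoses ordered by weight magnitude
selected = np.flatnonzero(importance > 1e-6)  # diagnoses kept for later modelling
X_reduced = X[:, selected]
```

The reduced matrix `X_reduced` could then be passed to a clustering or trajectory model, so that only diagnoses related to the target condition remain.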
Description
Keywords
disease trajectory, medical bills, clustering, visualisation, interpretation, diagnose ranking, unsupervised learning, Bernoulli mixture models, hidden Markov models, K-means, autoencoder, support vector machines