Clustering Methods for Interpreting Medical Data

Date

2020

Publisher

Tartu Ülikool

Abstract

Medical bills can be analyzed to identify disease trajectories. Machine learning methods make it possible to answer questions such as which diagnoses occur together and which conditions they arise from. This study uses various clustering methods, such as Bernoulli mixture models and autoencoder compression combined with K-means, to divide patients into groups based on the diagnoses they have received. The results of the models are visualized as heatmaps showing how likely specific diagnoses are to occur in each group. A guided hidden Markov model was also used to assemble a lifelong disease path from short treatment segments of different patients. This makes it possible to observe how certain conditions arise at different ages and to track disease development over time. This approach reproduced results previously reported in medical studies, such as the development of J35 from H65. The interpretability of the models was further improved by using support vector machines as a feature selection method for I11. This made it possible to discard all diagnoses with no connection to I11 and keep only those contributing to the development of the disease. Results on the processed data also agreed with medical findings, such as the development of I50 from I11.
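The abstract does not describe the thesis implementation, so the following is only a minimal sketch of the Bernoulli mixture idea, assuming each patient is encoded as a binary vector with one entry per diagnosis code (1 if the diagnosis was received). The model is fitted with expectation-maximization (EM). The function name `fit_bernoulli_mixture` and its parameters are hypothetical. The returned `probs` matrix holds the estimated probability of each diagnosis within each cluster, which is the kind of value a cluster-by-diagnosis heatmap would display.

```python
import numpy as np


def fit_bernoulli_mixture(X, k, n_iter=200, seed=0, eps=1e-9):
    """Fit a k-component Bernoulli mixture to a binary matrix with EM (hypothetical helper).

    X: (n_patients, n_diagnoses) array of 0/1 values.
    Returns (weights, probs, resp):
      weights: (k,) mixing proportions of the clusters
      probs:   (k, n_diagnoses) estimated P(diagnosis present | cluster)
      resp:    (n_patients, k) posterior cluster memberships from the final E-step
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    weights = np.full(k, 1.0 / k)
    # Random initial diagnosis probabilities break the symmetry between clusters.
    probs = rng.uniform(0.25, 0.75, size=(k, d))
    for _ in range(n_iter):
        # E-step: log-likelihood of every patient under every cluster.
        log_p = np.log(probs + eps)
        log_q = np.log(1.0 - probs + eps)
        log_lik = X @ log_p.T + (1.0 - X) @ log_q.T + np.log(weights + eps)
        # Subtract the row maximum before exponentiating for numerical stability.
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate cluster sizes and per-cluster diagnosis probabilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        probs = (resp.T @ X) / (nk[:, None] + eps)
    return weights, probs, resp
```

As a usage example, a patient can be assigned to the cluster with the largest posterior, `resp.argmax(axis=1)`, and the rows of `probs` can be drawn as a heatmap with any plotting library. A soft EM fit like this one is a common choice for binary data because it gives each cluster an interpretable probability profile instead of only a centroid.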

Keywords

disease trajectory, medical bills, clustering, visualisation, interpretation, diagnosis ranking, unsupervised learning, Bernoulli mixture models, hidden Markov models, K-means, autoencoder, support vector machines