Generalising Health Events Using Word Vectors (Tervisesündmuste üldistamine sõnavektorite abil)
Date
2024
Publisher
Tartu Ülikool
Abstract
In an electronic health record, each visit to a doctor can generate multiple data points. The same health issue can be linked to multiple diagnoses, drug prescriptions and measurements, all recorded as separate events. This high resolution makes the data difficult to analyse. In this thesis, a word2vec model and K-means clustering are used to aggregate related health events into generalised events in an OMOP CDM dataset. It is shown that word2vec can successfully identify related events. As the number of clusters grows, each cluster becomes more homogeneous, but the number of similar clusters also increases. As a result of generalisation, the number of events in a patient’s dataset decreased significantly.
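The generalisation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-dimensional vectors below are made-up stand-ins for learned word2vec embeddings, the measurement labels `systolic_bp` and `hba1c` are hypothetical, K-means is written out in plain Python with a deterministic farthest-point initialisation, and collapsing events to one per visit and cluster is an assumed aggregation rule.

```python
import math

# Hypothetical 2-D embeddings. Real word2vec vectors would be learned from
# patient event sequences and would have many more dimensions.
EMBEDDINGS = {
    "I10": (0.9, 0.1),          # ICD-10: essential hypertension
    "C09AA05": (0.8, 0.2),      # ATC: ramipril
    "systolic_bp": (0.85, 0.0), # hypothetical measurement label
    "E11": (0.1, 0.9),          # ICD-10: type 2 diabetes mellitus
    "A10BA02": (0.2, 0.8),      # ATC: metformin
    "hba1c": (0.0, 0.95),       # hypothetical measurement label
}


def kmeans(points, k, iterations=20):
    """Plain K-means; returns one cluster label per input point."""
    # Deterministic farthest-point seeding, starting from the first point.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def generalise(events, cluster_of):
    """Replace each event code by its cluster, keeping one event per (visit, cluster)."""
    seen, result = set(), []
    for visit_id, code in events:
        key = (visit_id, cluster_of[code])
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


codes = list(EMBEDDINGS)
labels = kmeans([EMBEDDINGS[c] for c in codes], k=2)
cluster_of = dict(zip(codes, labels))

# Toy patient history as (visit_id, event_code) pairs.
events = [
    (1, "I10"), (1, "C09AA05"), (1, "systolic_bp"),
    (2, "E11"), (2, "A10BA02"), (2, "hba1c"), (2, "I10"),
]
generalised = generalise(events, cluster_of)
print(len(events), "events reduced to", len(generalised))
```

In this toy setup the hypertension-related codes share one cluster and the diabetes-related codes share the other, so several raw events per visit collapse into a single generalised event. Choosing a larger `k` would split the clusters more finely, which mirrors the trade-off between homogeneity and the number of similar clusters noted in the abstract.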
Keywords
Word2vec, K-means, OMOP CDM, ICD10