Patient Treatment Trajectories Using Vector Embeddings

dc.contributor.advisor: Laur, Sven, supervisor
dc.contributor.author: Siimon, Õie Renata
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned: 2023-10-19T10:43:42Z
dc.date.available: 2023-10-19T10:43:42Z
dc.date.issued: 2023
dc.description.abstract: In this thesis, data from the Estonian Health Insurance Fund (Haigekassa) for 2010–2019 were used to construct vector representations of patient treatment trajectories with BERT and, for comparison, with word2vec. The goal was to see how well such natural language processing (NLP) models perform when sequences of medical services are used as input instead of sentences, and whether BERT performs better than word2vec. Research on how well NLP models work with non-natural-language sequences is still limited, and this thesis helps fill that gap. Treatment trajectories were built as sequences of the service codes appearing on 41 million medical invoices. The models were constructed in two stages. First, service code embeddings were trained with BERT and word2vec. Then, classification models were built by fine-tuning BERT and by training KNN and SVM classifiers on top of the word2vec embeddings. Despite BERT's poor performance in the pre-training stage, it outperformed the models built on word2vec embeddings in all seven classification tasks. The highest accuracy (0.9918) was achieved in classifying treatment types (5 classes) and the lowest (0.4121) in classifying diagnoses (174 classes). It was concluded that BERT is useful with this type of non-natural-language input data and that BERT's contextual embeddings work better than the non-contextual embeddings of word2vec. Of the four BERT models built in this thesis, the second largest was the best overall, suggesting that when the 'language' is simpler than natural language, BERT models with reduced dimensions might work better.
dc.identifier.uri: https://hdl.handle.net/10062/93620
dc.language.iso: eng
dc.publisher: Tartu Ülikool
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: machine learning
dc.subject: treatment trajectory
dc.subject: medical bill
dc.subject: word2vec
dc.subject: BERT
dc.subject.other: magistritööd (master's theses)
dc.subject.other: informaatika
dc.subject.other: infotehnoloogia
dc.subject.other: informatics
dc.subject.other: infotechnology
dc.title: Patient Treatment Trajectories Using Vector Embeddings
dc.type: Thesis
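
The abstract describes treatment trajectories as sequences of service codes taken from medical invoices. The following is a minimal sketch of how such sequences might be assembled before being passed to an embedding model. It is not the thesis's implementation: grouping by patient, ordering by service date, the function name build_trajectories, and the sample patient identifiers and service codes are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def build_trajectories(invoice_lines):
    """Group invoice rows into one service-code sequence per patient.

    Each row is a (patient_id, service_date, service_code) tuple.
    Grouping by patient and ordering by date are assumptions; the
    abstract only says trajectories are sequences of service codes.
    """
    by_patient = defaultdict(list)
    for patient_id, service_date, service_code in invoice_lines:
        by_patient[patient_id].append((service_date, service_code))

    # sorted() is stable, so codes billed on the same date keep input order.
    return {
        patient_id: [code for _, code in sorted(rows, key=lambda row: row[0])]
        for patient_id, rows in by_patient.items()
    }


# Hypothetical rows: the identifiers and codes are made up, not real data.
invoice_lines = [
    ("p1", date(2015, 3, 2), "3002"),
    ("p2", date(2012, 7, 9), "66100"),
    ("p1", date(2015, 1, 15), "9001"),
    ("p1", date(2015, 3, 2), "6074"),
]

trajectories = build_trajectories(invoice_lines)
for patient_id, codes in trajectories.items():
    print(patient_id, " ".join(codes))
```

Each resulting sequence plays the role of a sentence and each service code the role of a word, which is what lets models such as word2vec or BERT be trained on this kind of input.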

Files

Original bundle

Name: Siimon_MSc_DataScience_2023.pdf
Size: 1.82 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission