Fact Extraction from Medical Text using Neural Networks
Date
2020
Authors
Publisher
Tartu Ülikool
Abstract
Fact extraction from free text is a challenging task requiring a great deal of human effort
to program regular expressions and build rule-based solutions. It is essential in the
medical field where many care details are only stored as free text and automated fact
extraction is the only way to interpret large-scale medical databases. Such medical
texts represent communication between doctors: they are often not syntactically
valid, concepts are not represented consistently, and misspellings are rife.
These problems make it challenging to develop rule-based solutions that handle
all the ways a fact might be written down. In this thesis, we explored the effectiveness of
neural networks for fact extraction from discharge reports
in the Estonian Health Information System. We used the whole dataset of medical
texts to train word embedding models. On the subsets of the data annotated with
particular facts, we tested different classification models to detect those facts. We found that
employing pre-trained word embeddings allowed us to efficiently learn new models for
fact extraction using relatively small amounts of annotated data. We achieved
an F1 score of 0.86 for a new tag using 732 samples for training, 82 samples for
validation, and 3258 samples for testing.
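
For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The following is a minimal Python sketch of that standard definition; the function name and the counts are illustrative assumptions and are not taken from the thesis.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # everything the model tagged
    actual = true_positives + false_negatives      # everything that should be tagged
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the thesis).
score = f1_score(true_positives=8, false_positives=2, false_negatives=2)
print(round(score, 2))
```

Because F1 lies between 0 and 1, a value of 0.86 corresponds to 86 on a percentage scale.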
Keywords
Medical Named Entity Recognition, Fact Extraction, Word Embedding, Bi-Directional Long Short Term Memory, Interpretability