Fact Extraction from Medical Text using Neural Networks





Tartu Ülikool


Fact extraction from free text is a challenging task that requires a great deal of human effort to write regular expressions and build rule-based solutions. It is essential in the medical field, where many care details are stored only as free text and automated fact extraction is the only way to interpret large-scale medical databases. Such medical texts record communication between doctors: they are often not syntactically valid, concepts are not represented consistently, and the text is rife with misspellings. These problems make it difficult to develop rule-based solutions that handle every way a fact might be written down. In this thesis, we explored the effectiveness of neural networks for fact extraction on discharge reports from the Estonian Health Information System. We used the whole dataset of medical texts to train word embedding models. On subsets of the data annotated with particular facts, we tested different classification models for detecting those facts. We found that pre-trained word embeddings allowed us to learn new fact extraction models efficiently from relatively small amounts of annotated data. For a new tag, we achieved an F1 score of 0.86 by training on 732 samples, validating on 82 samples and testing on 3,258 samples.
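The idea of reusing embeddings learned from the whole unannotated corpus, so that a small annotated subset is enough to train a fact detector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the two-dimensional `EMBEDDINGS` table, the English example sentences and the smoking label are all hypothetical, and a nearest-centroid classifier over averaged word vectors is swapped in for the neural classifiers (such as the bidirectional LSTM listed in the keywords) that the thesis evaluates. The `f1_score` helper also shows why an F1 score always lies between 0 and 1.

```python
import math

# Hypothetical toy vectors standing in for word embeddings pre-trained on the
# full, unannotated corpus (real embeddings have hundreds of dimensions).
EMBEDDINGS = {
    "patient": [0.0, 1.0],
    "smokes": [1.0, 0.0],
    "smoking": [0.8, 0.2],
    "cigarettes": [0.9, 0.1],
    "daily": [0.5, 0.5],
    "denies": [-1.0, 0.2],
    "never": [-0.9, 0.1],
}
DIM = 2


def sentence_vector(text):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def train_centroids(samples):
    """Nearest-centroid 'training': the mean sentence vector for each label."""
    sums, counts = {}, {}
    for text, label in samples:
        vec = sentence_vector(text)
        acc = sums.setdefault(label, [0.0] * DIM)
        for i in range(DIM):
            acc[i] += vec[i]
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest (Euclidean) to the sentence."""
    vec = sentence_vector(text)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def f1_score(gold, predicted, positive=1):
    """Harmonic mean of precision and recall; always in the range [0, 1]."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotated subset: 1 = "patient is a smoker", 0 = otherwise.
TRAIN = [
    ("patient smokes cigarettes daily", 1),
    ("patient smokes daily", 1),
    ("patient denies smoking", 0),
    ("patient never smokes", 0),
]
```

For example, `predict(train_centroids(TRAIN), "patient denies cigarettes")` returns `0`, because the averaged vector lands next to the negative centroid. Averaging discards word order, so a negation cue such as "denies" only nudges the average; a sequence model that reads the text in both directions, like a bidirectional LSTM, does not have this limitation.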



Medical Named Entity Recognition, Fact Extraction, Word Embedding, Bidirectional Long Short-Term Memory, Interpretability