Fact Extraction from Medical Text using Neural Networks

dc.contributor.advisor: Kolde, Raivo, supervisor
dc.contributor.author: Mahmoud, Nesma
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2023-11-07T12:38:02Z
dc.date.available: 2023-11-07T12:38:02Z
dc.date.issued: 2020
dc.description.abstract: Fact extraction from free text is a challenging task that requires a great deal of human effort to write regular expressions and build rule-based solutions. It is essential in the medical field, where many care details are stored only as free text and automated fact extraction is the only way to interpret large-scale medical databases. Such medical texts record communication between doctors: the text is often not syntactically valid, concepts are not represented consistently, and misspellings are rife. These problems make it difficult to develop rule-based solutions that handle all the ways a fact might be written down. In this thesis, we explored the effectiveness of neural networks for fact extraction on texts from discharge reports in the Estonian Health Information System. We used the whole dataset of medical texts to train word embedding models. On the subsets of the data annotated with particular facts, we tested different classification models for detecting those facts. We found that employing pre-trained word embeddings allowed us to efficiently learn new models for fact extraction from relatively small amounts of annotated data. We achieved an F1 score of 0.86 for a new tag using 732 samples for training, 82 samples for validation, and 3258 samples for testing. [et]
dc.identifier.uri: https://hdl.handle.net/10062/94074
dc.language.iso: eng [et]
dc.publisher: Tartu Ülikool [et]
dc.rights: openAccess [et]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Medical Named Entity Recognition [et]
dc.subject: Fact Extraction [et]
dc.subject: Word Embedding [et]
dc.subject: Bi-Directional Long Short Term Memory [et]
dc.subject: Interpretability [et]
dc.subject.other: magistritööd (master's theses) [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [et]
dc.subject.other: infotechnology [et]
dc.title: Fact Extraction from Medical Text using Neural Networks [et]
dc.type: Thesis [et]

Files

Original bundle
Name: Mahmoud_computerScience_2020.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission