Fine-tuning GPT-3.5 for Annotating Health Data
Date
2024
Publisher
Tartu Ülikool
Description
The aim of this thesis was to explore how well GPT-3.5 Turbo can label named entities. Patient health data contains a great deal of useful information in free-text form. To use this information in statistical analyses, structured information has to be extracted from the text, for example by annotating named entities. Machine-learning-based approaches require a large amount of annotated data for this, whereas a large language model such as GPT-3.5 Turbo has been shown to adapt to different tasks from only a few examples. This general language understanding can be leveraged to label named entities. In this thesis, models were fine-tuned on different amounts of data to see how much fine-tuning would benefit labelling. The results showed that fine-tuning does enhance the model’s proficiency in recognising entities in health data. Additionally, the models fine-tuned on English electronic health records were found to outperform their base counterpart at annotating synthetic Estonian electronic health records.
Keywords
GPT, language models, named entity annotation, natural language processing, health data