Fine-tuning GPT-3.5 for Labelling Health Data (GPT-3.5 peenhäälestamine terviseandmete märgendamiseks)

Tammin, Anna Maria

Files

Tammin_informaatika_2024.pdf (549.82 KB)

Date

2024

Authors

Tammin, Anna Maria

Publisher

Tartu Ülikool

Description

The aim of this thesis was to explore how well GPT-3.5 Turbo can label named entities. Patient health data contains a lot of useful information in free-text form. To use it for statistical analyses, structured information has to be extracted from the text, for example by annotating named entities. Machine learning based approaches require a lot of annotated data for this; however, a large language model such as GPT-3.5 Turbo has been shown to adapt to different tasks from only a few examples. This general understanding can be leveraged to label named entities. In this thesis, models were fine-tuned with different amounts of data to see how this would benefit labelling. The results showed that fine-tuning does enhance the model’s proficiency in recognising entities in health data. Additionally, it was found that the models fine-tuned on English electronic health records outperform their base counterpart at annotating synthetic Estonian electronic health records.

Keywords

GPT, language models, named entity annotation, natural language processing, health data

URI

https://hdl.handle.net/10062/105213

Collections

MTAT bakalaureusetööd – Bachelor's theses
