The Effect of GPT Model Input and Temperature on the Annotation of Medical Data (GPT mudeli sisendi ja temperatuuri mõju meditsiiniliste andmete märgendamisele)

Kukk, Veronika

Files

kukk_informaatika_2024.pdf (561.74 KB)

Date

2024

Authors

Kukk, Veronika

Publisher

Tartu Ülikool

Abstract

Unstructured texts written by doctors contain valuable information about patients. One approach to extracting this information is to label named entities (for example, disease or procedure) using machine learning models. However, high-quality labeling models are difficult to train for low-resource languages such as Estonian, because the necessary training data is scarce. In this thesis, synthetic patient data was used to examine the quality of GPT-3.5 model annotations on Estonian data. The model's annotations were compared at three temperature settings. In addition, the thesis explores how the number of classes requested affects the model’s annotations. The results showed that in two out of three cases the lowest temperature produced higher-quality annotations than the other temperatures. As for the number of classes, prompting for two or three classes together achieved better results than prompting for only one class.

Keywords

named entity recognition, natural language processing, large language models, artificial intelligence, healthcare informatics

URI

https://hdl.handle.net/10062/104914

Collections

MTAT bakalaureusetööd – Bachelor's theses
