The Effect of GPT Model Input and Temperature on the Annotation of Medical Data (Estonian: GPT mudeli sisendi ja temperatuuri mõju meditsiiniliste andmete märgendamisele)

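dc.contributor.advisor: Šuvalov, Hendrik, supervisor
dc.contributor.author: Kukk, Veronika
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et] (University of Tartu, Faculty of Science and Technology)
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et] (University of Tartu, Institute of Computer Science)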
dc.date.accessioned: 2024-09-26T07:08:39Z
dc.date.available: 2024-09-26T07:08:39Z
dc.date.issued: 2024
dc.description.abstract: Unstructured texts written by doctors contain valuable information about patients. One approach to extracting this information is to label named entities (for example, diseases and procedures) with machine learning models. However, high-quality labeling models are difficult to train for low-resource languages such as Estonian, because the necessary training data is scarce. In this thesis, synthetic patient data was used to examine the quality of GPT-3.5 annotations on Estonian text. Annotations produced by the GPT model at three temperature settings were compared. The thesis also explores how the number of entity classes requested affects the model's annotations. In two of the three cases, the lowest temperature produced higher-quality annotations than the other temperatures. Regarding the number of classes, requesting two or three classes together gave better results than requesting a single class.
dc.identifier.uri: https://hdl.handle.net/10062/104914
dc.language.iso: est
dc.publisher: Tartu Ülikool [et] (University of Tartu)
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Estonia [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject: named entity recognition
dc.subject: natural language processing
dc.subject: large language models
dc.subject: artificial intelligence
dc.subject: healthcare informatics
dc.subject.other: bakalaureusetööd [et] (bachelor's theses)
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [en]
dc.subject.other: infotechnology [en]
dc.title: GPT mudeli sisendi ja temperatuuri mõju meditsiiniliste andmete märgendamisele
dc.type: Thesis
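The abstract compares GPT-3.5 annotations at three temperature settings. As general background only, the sketch below shows the textbook temperature-scaled softmax that a sampling temperature is commonly understood to apply to a model's next-token logits. It is not taken from the thesis: the logits and temperature values are hypothetical, the function name `softmax_with_temperature` is illustrative, and treating temperature 0 as greedy (argmax) selection is an assumption about how APIs typically handle that edge case, not documented GPT-3.5 internals.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities, sharpened or flattened by temperature.

    Hypothetical helper for illustration; not the thesis's or OpenAI's code.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed convention: temperature 0 means always pick the top logit.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits for three candidate next tokens.
    logits = [2.0, 1.0, 0.1]
    for t in (0.0, 0.7, 1.5):  # hypothetical temperature settings
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability mass on the most likely token, so repeated runs tend to produce more consistent output; higher temperatures spread the mass and increase variation between runs.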

Files

Original bundle

Name: kukk_informaatika_2024.pdf
Size: 561.74 KB
Format: Adobe Portable Document Format