GPT-3.5 peenhäälestamine terviseandmete märgendamiseks (Fine-tuning GPT-3.5 for Annotating Health Data)

dc.contributor.advisor: Šuvalov, Hendrik, supervisor
dc.contributor.author: Tammin, Anna Maria
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2024-10-07T10:19:17Z
dc.date.available: 2024-10-07T10:19:17Z
dc.date.issued: 2024
dc.description: The aim of this thesis was to explore how well GPT-3.5 Turbo can label named entities. Patient health data contains a lot of useful information in free-text form. To use it for statistical analyses, structured information has to be extracted from it, for example by annotating named entities. Machine-learning-based approaches require a lot of annotated data for this; however, a large language model such as GPT-3.5 Turbo has been shown to adapt to different tasks from only a few examples. This general understanding can be leveraged to label named entities. In this thesis, models were fine-tuned with different amounts of data to see how this would benefit labelling. Results showed that fine-tuning does enhance the model’s proficiency in recognising entities in health data. Additionally, it was found that the models fine-tuned on English electronic health records outperform their base counterpart at annotating synthetic Estonian electronic health records.
dc.identifier.uri: https://hdl.handle.net/10062/105213
dc.language.iso: et
dc.publisher: Tartu Ülikool [et]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Estonia [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject: GPT
dc.subject: language models
dc.subject: named entity annotation
dc.subject: natural language processing
dc.subject: health data
dc.subject.other: bakalaureusetööd [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [en]
dc.subject.other: infotechnology [en]
dc.title: GPT-3.5 peenhäälestamine terviseandmete märgendamiseks
dc.type: Thesis

Files

Original bundle

Name: Tammin_informaatika_2024.pdf
Size: 549.82 KB
Format: Adobe Portable Document Format