Tekstandmete ettevalmistamine suurte keelemudelite treenimiseks (Preparing Text Data for Training Large Language Models)

dc.contributor.advisor: Fišel, Mark, supervisor
dc.contributor.author: Pastarus, Tanel
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2024-10-04T07:09:37Z
dc.date.available: 2024-10-04T07:09:37Z
dc.date.issued: 2024
dc.description.abstract: This bachelor’s thesis focuses on restoring the original order of translated text data by referencing the documents of the original text corpus. Some sentences contained errors after translation, which the author attempted to correct through further processing. Additionally, a pilot test was conducted in which three GPT-2 models were fine-tuned on the processed data to assess the viability of using translated text data for training language models.
dc.identifier.uri: https://hdl.handle.net/10062/105100
dc.language.iso: et
dc.publisher: Tartu Ülikool [et]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Estonia [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject: Keelemudelid
dc.subject: tekstikorpus
dc.subject: tekstiandmestik (text dataset)
dc.subject: peenhäälestamine (fine-tuning)
dc.subject: töötlemine (processing)
dc.subject: Language models
dc.subject: text corpus
dc.subject.other: bakalaureusetööd (bachelor's theses) [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [en]
dc.subject.other: infotechnology [en]
dc.title: Tekstandmete ettevalmistamine suurte keelemudelite treenimiseks
dc.type: Thesis

Files

Original bundle

Name: Pastarus_informaatika_2024.pdf
Size: 239.63 KB
Format: Adobe Portable Document Format