Preparing text data for training large language models
Date
2024
Authors
Publisher
University of Tartu
Abstract
This bachelor's thesis focuses on restoring the original order of translated text data, using the documents of the original text corpus as a reference. After translation, some sentences contained errors, which the author attempted to correct through post-processing. In addition, a pilot experiment was conducted in which three GPT-2 models were fine-tuned on the processed data to assess whether translated text data is viable for training language models.
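The abstract does not say how the original order was recovered. A minimal sketch follows. It assumes that every translated sentence still carries the ID of its source document and its sentence position in the original corpus. The `TranslatedSegment` record and the `restore_order` function are hypothetical illustrations, not the thesis's own code. The function rebuilds each document in source order and reports the sentence positions for which no translation came back.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedSegment:
    # Hypothetical record: assumes the translation pipeline preserved the
    # source document ID and the sentence index from the original corpus.
    doc_id: str
    sent_idx: int
    text: str


def restore_order(original_docs, segments):
    """Rebuild translated documents in the order of the original corpus.

    original_docs: dict mapping doc_id -> list of source sentences, in corpus order.
    segments: translated segments, possibly in arbitrary order.
    Returns (restored, missing), where restored maps doc_id -> translated
    sentences in source order, and missing lists (doc_id, sent_idx) pairs
    that have no translation.
    """
    # Index translations by document, then by sentence position.
    by_doc = defaultdict(dict)
    for seg in segments:
        by_doc[seg.doc_id][seg.sent_idx] = seg.text

    restored = {}
    missing = []
    # Iterating over original_docs keeps the original corpus document order,
    # because Python dicts preserve insertion order.
    for doc_id, source_sentences in original_docs.items():
        translated = by_doc.get(doc_id, {})
        ordered = []
        for idx in range(len(source_sentences)):
            if idx in translated:
                ordered.append(translated[idx])
            else:
                missing.append((doc_id, idx))
        restored[doc_id] = ordered
    return restored, missing


original = {"d1": ["Tere.", "Kuidas läheb?"], "d2": ["Head aega."]}
segments = [
    TranslatedSegment("d2", 0, "Goodbye."),
    TranslatedSegment("d1", 1, "How are you?"),
    TranslatedSegment("d1", 0, "Hello."),
]
restored, missing = restore_order(original, segments)
```

In this sketch, the source corpus defines the target order, and the translation output only supplies the text. Sentences lost or dropped during translation are therefore surfaced explicitly in `missing`, where they can be inspected or re-translated, rather than being silently skipped.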
Keywords
Language models, text corpus, text dataset, fine-tuning, processing