Preparing Text Data for Training Large Language Models

Date

2024

Publisher

Tartu Ülikool

Abstract

This bachelor’s thesis focuses on restoring the original order of translated text data by referencing the documents of the original text corpus. Some sentences contained errors after translation, which the author attempted to correct through additional processing. In addition, a pilot test fine-tuned three GPT-2 models on the processed data to assess whether translated text data is viable for training language models.

Keywords

Language models, text corpus, text dataset, fine-tuning, processing
