Tekstandmete ettevalmistamine suurte keelemudelite treenimiseks (Preparing Text Data for Training Large Language Models)
| dc.contributor.advisor | Fišel, Mark, supervisor | |
| dc.contributor.author | Pastarus, Tanel | |
| dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
| dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
| dc.date.accessioned | 2024-10-04T07:09:37Z | |
| dc.date.available | 2024-10-04T07:09:37Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | This bachelor’s thesis focuses on restoring the original order of translated text data by referencing the documents of the original text corpus. Some sentences contained errors after translation, which the author attempted to correct through further processing. Additionally, a pilot test was conducted in which three GPT-2 models were fine-tuned on the processed data to assess the viability of using translated text data for training language models. | |
| dc.identifier.uri | https://hdl.handle.net/10062/105100 | |
| dc.language.iso | et | |
| dc.publisher | Tartu Ülikool | et |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Estonia | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ee/ | |
| dc.subject | Language models | |
| dc.subject | text corpus | |
| dc.subject | text dataset | |
| dc.subject | fine-tuning | |
| dc.subject | processing | |
| dc.subject.other | bachelor's theses | en |
| dc.subject.other | informatics | en |
| dc.subject.other | infotechnology | en |
| dc.title | Tekstandmete ettevalmistamine suurte keelemudelite treenimiseks | |
| dc.type | Thesis |
Files
Original bundle
- Name: Pastarus_informaatika_2024.pdf
- Size: 239.63 KB
- Format: Adobe Portable Document Format