Paragraph-Level Machine Translation for Low-Resource Finno-Ugric Languages
| dc.contributor.author | Pashchenko, Dmytro | |
| dc.contributor.author | Yankovskaya, Lisa | |
| dc.contributor.author | Fishel, Mark | |
| dc.contributor.editor | Johansson, Richard | |
| dc.contributor.editor | Stymne, Sara | |
| dc.coverage.spatial | Tallinn, Estonia | |
| dc.date.accessioned | 2025-02-18T14:03:22Z | |
| dc.date.available | 2025-02-18T14:03:22Z | |
| dc.date.issued | 2025-03 | |
| dc.description.abstract | We develop paragraph-level machine translation for four low-resource Finno-Ugric languages: Proper Karelian, Livvi, Ludian, and Veps. The approach is based on sentence-level pre-trained translation models, which are fine-tuned with paragraph-parallel data. This allows the resulting model to develop a native ability to handle discourse-level phenomena correctly, in particular translating from grammatically gender-neutral input in Finno-Ugric languages. We collect monolingual and parallel paragraph-level corpora for these languages. Our experiments show that paragraph-level translation models can translate sentences no worse than sentence-level systems, while handling discourse-level phenomena better. For evaluation, we manually translate part of FLORES-200 into these four languages. All our results, data, and models are released openly. | |
| dc.identifier.uri | https://hdl.handle.net/10062/107242 | |
| dc.language.iso | en | |
| dc.publisher | University of Tartu Library | |
| dc.relation.ispartofseries | NEALT Proceedings Series, No. 57 | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | Paragraph-Level Machine Translation for Low-Resource Finno-Ugric Languages | |
| dc.type | Article |