Low-resource Finno-Ugric Neural Machine Translation through Cross-lingual Transfer Learning
dc.contributor.advisor | Tättar, Andre, supervisor | |
dc.contributor.author | Tars, Maali | |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
dc.date.accessioned | 2023-10-26T13:24:15Z | |
dc.date.available | 2023-10-26T13:24:15Z | |
dc.date.issued | 2023 | |
dc.description.abstract | The first high-quality machine translation models focused mainly on high-resource languages such as English and German. Since then, attention has increasingly shifted toward languages with fewer resources. Most Finno-Ugric languages are low-resource, so translating them requires additional techniques and supporting information from larger languages. Several large companies have recently released multilingual pre-trained neural machine translation (NMT) models that can be adapted to low-resource languages. However, some of the Finno-Ugric languages covered in our work were not included in the training of these pre-trained models. We therefore use cross-lingual transfer to fine-tune the models for our selected languages. In addition, we augment the training data through back-translation to alleviate the data scarcity of low-resource languages. We train several models to determine the best setting for our selected languages and improve over previous results for all language pairs. Finally, we deploy the best model, creating the first multilingual NMT system for multiple low-resource Finno-Ugric languages. | et |
dc.identifier.uri | https://hdl.handle.net/10062/93787 | |
dc.language.iso | eng | et |
dc.publisher | Tartu Ülikool | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | neural networks | et |
dc.subject | automatic learning | et |
dc.subject | machine translation | et |
dc.subject | language technology | et |
dc.subject | transfer learning | et |
dc.subject.other | master's theses | et |
dc.subject.other | informaatika | et |
dc.subject.other | infotehnoloogia | et |
dc.subject.other | informatics | et |
dc.subject.other | infotechnology | et |
dc.title | Low-resource Finno-Ugric Neural Machine Translation through Cross-lingual Transfer Learning | et |
dc.type | Thesis | et |