Low-resource Finno-Ugric Neural Machine Translation through Cross-lingual Transfer Learning

Tars, Maali

Files

Tars_MSc_computer_science_2023.pdf (691.86 KB)

Date

2023

Authors

Tars, Maali

Publisher

Tartu Ülikool

Abstract

The first high-quality machine translation models focused mainly on large languages such as English and German. Fortunately, attention has been shifting toward languages with fewer resources. Most Finno-Ugric languages are low-resource and need additional techniques, along with information from larger languages, to be translated well. Recently, several large companies have released multilingual pre-trained neural machine translation (NMT) models that can be adapted to low-resource languages. However, some of the Finno-Ugric languages covered in our work were not included in the training of these pre-trained models. We therefore use cross-lingual transfer to fine-tune the models to our selected languages. In addition, we augment the data through back-translation to alleviate the data scarcity of low-resource languages. We train several models to determine the best setting for our selected languages and improve over previous results for all language pairs. As a result, we deploy the best model and create the first multilingual NMT system for multiple low-resource Finno-Ugric languages.

Keywords

neural networks, automatic learning, machine translation, language technology, transfer learning

URI

https://hdl.handle.net/10062/93787

Collections

MTAT magistritööd – Master's theses
