Low-resource Finno-Ugric Neural Machine Translation through Cross-lingual Transfer Learning
Publisher
Tartu Ülikool (University of Tartu)
Abstract
The first high-quality machine translation models focused mainly on large,
high-resource languages such as English and German. Fortunately, research has
increasingly turned toward languages with fewer resources. Most Finno-Ugric languages
are low-resource, so translating them requires additional techniques and information
borrowed from larger languages. Recently, several large companies have released
multilingual pre-trained neural machine translation (NMT) models that can be adapted to
low-resource languages. However, some of the Finno-Ugric languages in our work were
not part of these models' training data, so we use cross-lingual transfer to fine-tune
the models to our selected languages. In addition, we augment the training data with
back-translation to alleviate the data scarcity of low-resource languages. We train
multiple models to determine the best setting for our selected languages and improve
over previous results for all language pairs. Finally, we deploy the best model,
creating the first multilingual NMT system for multiple low-resource Finno-Ugric
languages.
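Back-translation, named in the abstract as the data-augmentation step, turns monolingual target-language text into synthetic parallel data. A reverse (target-to-source) model translates each target sentence, and the resulting pair joins the authentic training data. The snippet below is a minimal sketch of that general idea under stated assumptions, not the thesis's implementation. `back_translate` and `toy_reverse` are hypothetical names, and `toy_reverse` is a dictionary lookup standing in for a trained target-to-source NMT model.

```python
from typing import Callable, List, Tuple

# A training example: (source sentence, target sentence).
Pair = Tuple[str, str]


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[List[str]], List[str]],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    `reverse_translate` is a hypothetical target->source translator. Its output
    becomes the synthetic source side, while the genuine target text is kept
    unchanged as the target side.
    """
    synthetic_sources = reverse_translate(monolingual_target)
    if len(synthetic_sources) != len(monolingual_target):
        raise ValueError("reverse model must return one translation per input")
    return list(zip(synthetic_sources, monolingual_target))


def toy_reverse(sentences: List[str]) -> List[str]:
    # Hypothetical stand-in for a trained target->source NMT model:
    # a tiny lexicon lookup that echoes unknown sentences unchanged.
    lexicon = {"tere": "hello", "aitäh": "thank you"}
    return [lexicon.get(s, s) for s in sentences]


# Authentic parallel data plus synthetic pairs from monolingual target text.
authentic: List[Pair] = [("good morning", "tere hommikust")]
synthetic = back_translate(["tere", "aitäh"], toy_reverse)
training = authentic + synthetic
print(training)
```

In practice, the reverse model would be an NMT system trained or fine-tuned in the opposite direction. The synthetic pairs are usually mixed with, not substituted for, the authentic parallel data.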
Keywords
neural networks, automatic learning, machine translation, language technology, transfer learning