Improving translation for low-resource Finno-Ugric languages with Neural Machine Translation models
Date
2021
Publisher
Tartu Ülikool
Abstract
Training a good neural machine translation model requires large amounts of data, yet
for the majority of the world's languages little suitable data is available for this task.
One possible solution to this problem is developing a multilingual model, combining
high-resource and low-resource languages and creating a shared vocabulary space, where
knowledge gained from high-resource languages is applied to translating low-resource
languages. Another useful technique is to produce new data for low-resource languages
by creating synthetic translations of monolingual data with a baseline model. In this
thesis we use both methods: we train a multilingual baseline model on Finno-Ugric
language family data, then increase the amount of data for the smaller Finno-Ugric
languages by translating monolingual data with that baseline model, with the aim of
improving machine translation quality for low-resource languages.
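The data-augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the `<2xx>` target-language tag, the function names, the language codes, and the choice to place the synthetic text on the source side (as in back-translation) are all assumptions made for illustration, and `toy_translate` is a stand-in for a real multilingual baseline model.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (tagged source sentence, target sentence)


def tag_source(sentence: str, target_lang: str) -> str:
    """Prefix a sentence with a target-language token.

    Assumption: the multilingual model is steered with a tag such as <2et>;
    the abstract does not say how the target language is signalled.
    """
    return f"<2{target_lang}> {sentence}"


def synthesize_pairs(
    monolingual: Iterable[str],
    translate: Callable[[str], str],
    mono_lang: str,
    other_lang: str,
) -> List[Pair]:
    """Create synthetic training pairs from monolingual text.

    Each sentence in `mono_lang` is translated into `other_lang` by the
    baseline model. The synthetic translation becomes the source side and
    the original, human-written sentence becomes the target side
    (an assumed, back-translation-style arrangement).
    """
    pairs: List[Pair] = []
    for sentence in monolingual:
        synthetic = translate(tag_source(sentence, other_lang))
        pairs.append((tag_source(synthetic, mono_lang), sentence))
    return pairs


def augment(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Combine real parallel data with synthetic pairs for retraining."""
    return list(parallel) + list(synthetic)


def toy_translate(tagged: str) -> str:
    """Hypothetical stand-in for the baseline model: drops the tag and uppercases."""
    return tagged.split(" ", 1)[1].upper()


# Hypothetical language codes: "lo" for a low-resource language, "hi" for a
# high-resource one. Real data would use the corpus's own codes.
synthetic_pairs = synthesize_pairs(["sample sentence"], toy_translate, "lo", "hi")
training_data = augment([("<2lo> real source", "real target")], synthetic_pairs)
```

In practice `translate` would wrap the trained multilingual baseline, and the augmented data would be used to train or fine-tune a new model.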
Keywords
neural networks, automatic learning, machine translation, language technology