Improving translation for low-resource Finno-Ugric languages with Neural Machine Translation models

Tars, Maali

Files

tars_informaatika_2021.pdf (179.18 KB)

Date

2021

Authors

Tars, Maali

Publisher

Tartu Ülikool

Abstract

Training a good neural machine translation model requires large amounts of data, yet most of the world's languages have little suitable data available for this task. One possible solution is to develop a multilingual model that combines high-resource and low-resource languages in a shared vocabulary space, so that knowledge gained from high-resource languages can be applied to translating low-resource ones. Another useful technique is to produce new data for low-resource languages by creating synthetic translations of monolingual data with a baseline model. In this thesis we use both methods: we train a multilingual baseline model on Finno-Ugric language family data, then increase the amount of data for the smaller Finno-Ugric languages by translating monolingual data with that baseline model, in order to improve machine translation quality for low-resource languages.

Keywords

neural networks, automatic learning, machine translation, language technology

URI

https://hdl.handle.net/10062/91989

Collections

MTAT bakalaureusetööd – Bachelor's theses
