Training the Best Neural Machine Translation Model for the Estonian-English Language Pair

Date

2021

Publisher

Tartu Ülikool

Abstract

Many neural machine translation models have been developed that produce high-quality translations for many language directions, including Estonian-English. However, the models trained on this language pair are mostly multilingual or already outdated and need improvement. This bachelor’s thesis presents a bilingual approach that uses recent, effective technologies and the most current data available. It introduces a state-of-the-art bilingual neural machine translation system that outperforms the previous best result achieved for Estonian-English. The system is built in several steps: it trains baseline models on parallel data, generates additional data from available monolingual data through back-translation, combines the synthetic data with the initial parallel corpus, trains a new model on the augmented corpus, and, in the final step, uses ensembles of the already trained models.
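
The data flow of the steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: every function name here is hypothetical, the `reverse_translate` callable stands in for a trained target-to-source baseline model, and averaging next-token probabilities is only one common way to ensemble models, which the abstract does not specify.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-language text.

    `reverse_translate` is a placeholder for a target-to-source model;
    the human-written sentence always stays on the target side.
    """
    return [(reverse_translate(t), t) for t in monolingual_target]


def augment(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Combine the original parallel corpus with the synthetic pairs."""
    return parallel + synthetic


def ensemble_next_token(distributions: List[List[float]]) -> int:
    """Average per-model next-token distributions and pick the best index.

    Assumption: all models share one vocabulary, so index i means the
    same token in every distribution.
    """
    n_models = len(distributions)
    averaged = [sum(column) / n_models for column in zip(*distributions)]
    return max(range(len(averaged)), key=averaged.__getitem__)


# Toy stand-in for an English-to-Estonian baseline model (assumption).
toy_reverse = {"Hello": "Tere", "Thank you": "Aitäh"}.get

parallel_corpus = [("Tere hommikust", "Good morning")]
synthetic_corpus = back_translate(["Hello", "Thank you"], toy_reverse)
augmented_corpus = augment(parallel_corpus, synthetic_corpus)
```

In this sketch the augmented corpus would then be used to train a new Estonian-to-English model, and `ensemble_next_token` would be called at each decoding step with one distribution per trained model.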

Keywords

neural networks, machine translation, BLEU, language technology
