Efficient Use of Pre-trained NMT Models Through Mixing and Matching

dc.contributor.advisor: Tättar, Andre (supervisor)
dc.contributor.author: Purason, Taido
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned: 2023-10-30T08:00:26Z
dc.date.available: 2023-10-30T08:00:26Z
dc.date.issued: 2023
dc.description.abstract: With an increasing number of pre-trained language models and neural machine translation (NMT) models becoming available, it is important to investigate how to use them when training new models, to avoid expensive training from scratch. This thesis investigates how to effectively use pre-trained models, focusing on combining the encoders and decoders of different independent pre-trained NMT models as modules. This is not directly possible, since the intermediate representations of any two independent NMT models differ and cannot be combined without modification. To get around this, firstly, a dimension adapter is added if the encoder and decoder have different embedding dimensionalities, and secondly, extra encoder layers are added after the pre-trained encoder to align the intermediate representations. As a proof of concept, this thesis looks at many-to-Estonian translation and combines a massively multilingual encoder with a high-quality language-specific decoder. The results show significant improvements in both translation quality and speed for many-to-one translation over the baseline multilingual model. Furthermore, the ability to rapidly train a high-quality NMT system is successfully demonstrated with Estonian-Ukrainian and Ukrainian-Estonian translation, achieving competitive results compared to previous works. More broadly, the thesis demonstrates that the sentence representations of two independent NMT models can be made compatible without changing the pre-trained components, while keeping translation quality from deteriorating.
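The dimension-adapter idea from the abstract can be illustrated with a minimal sketch. This is not the thesis code: the dimensionalities, the random stand-ins for the frozen pre-trained modules, and the function name `dimension_adapter` are all illustrative assumptions; the point is only that a learned linear projection can bridge an encoder and a decoder with mismatched embedding sizes.

```python
import numpy as np

# Hypothetical sizes: the pre-trained encoder emits 1024-d states,
# while the pre-trained decoder was trained to consume 512-d inputs.
ENC_DIM, DEC_DIM = 1024, 512

rng = np.random.default_rng(0)

def dimension_adapter(enc_states, W, b):
    """Trainable linear projection bridging mismatched embedding
    dimensionalities (the 'dimension adapter' of the abstract).
    In the thesis, extra encoder layers would additionally be
    trained on top to align the two models' representations."""
    return enc_states @ W + b

# Stand-in for a frozen encoder's output over 7 source tokens.
encoder_output = rng.standard_normal((7, ENC_DIM))

# Adapter parameters (the only newly trained weights in this sketch).
W = rng.standard_normal((ENC_DIM, DEC_DIM)) * 0.02
b = np.zeros(DEC_DIM)

adapted = dimension_adapter(encoder_output, W, b)
print(adapted.shape)  # (7, 512): now shaped for the 512-d decoder
```

Both pre-trained modules stay frozen; only the adapter (and, in the thesis, the added alignment layers) are trained, which is what makes reuse cheap compared to training from scratch.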
dc.identifier.uri: https://hdl.handle.net/10062/93820
dc.language.iso: eng
dc.publisher: Tartu Ülikool
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: natural language processing
dc.subject: neural machine translation
dc.subject: machine translation
dc.subject: multilingual machine translation
dc.subject: artificial neural networks
dc.subject.other: magistritööd (master's theses)
dc.subject.other: informaatika
dc.subject.other: infotehnoloogia
dc.subject.other: informatics
dc.subject.other: information technology
dc.title: Efficient Use of Pre-trained NMT Models Through Mixing and Matching
dc.type: Thesis

Files

Original bundle
Name: taido_purason_masters_thesis.pdf
Size: 416.61 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission