Improving translation for low-resource Finno-Ugric languages with Neural Machine Translation models
dc.contributor.advisor | Tättar, Andre, supervisor | |
dc.contributor.author | Tars, Maali | |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
dc.date.accessioned | 2023-09-05T12:29:41Z | |
dc.date.available | 2023-09-05T12:29:41Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Training a good neural machine translation model requires a large amount of data, yet most of the world's languages have little suitable data available for this task. One possible solution is to develop a multilingual model that combines high-resource and low-resource languages in a shared vocabulary space, so that knowledge gained from high-resource languages can be applied to translating low-resource ones. Another useful technique is to produce new data for low-resource languages by creating synthetic translations of monolingual data with a baseline model. In this thesis we use both methods: we train a multilingual baseline model on Finno-Ugric language family data and increase the amount of data for the smaller Finno-Ugric languages by translating monolingual data with that baseline model, in order to improve machine translation quality for low-resource languages. | et |
dc.identifier.uri | https://hdl.handle.net/10062/91989 | |
dc.language.iso | eng | et |
dc.publisher | Tartu Ülikool | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | neural networks | et |
dc.subject | automatic learning | et |
dc.subject | machine translation | et |
dc.subject | language technology | et |
dc.subject.other | bachelor's theses | et |
dc.subject.other | informaatika | et |
dc.subject.other | infotehnoloogia | et |
dc.subject.other | informatics | et |
dc.subject.other | infotechnology | et |
dc.title | Improving translation for low-resource Finno-Ugric languages with Neural Machine Translation models | et |
dc.type | Thesis | et |