Title: Low-resource Grammatical Error Correction via Synthetic Pre-training and Monolingual Zero-shot Translation
Type: Master's thesis
Author: Luhtaru, Agnes
Supervisor: Fišel, Mark
Institution: University of Tartu, Faculty of Science and Technology, Institute of Computer Science
Issued: 2022
Available: 2023-08-25
URI: https://hdl.handle.net/10062/91760
Language: English
Rights: Open access; Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords: natural language processing; neural machine translation; grammatical error correction; Master's theses; informatics; information technology

Abstract: State-of-the-art neural grammatical error correction (GEC) systems are valuable for correcting various grammatical mistakes in text. However, training neural models requires large numbers of error correction examples, which are scarce for less common languages. We study two methods that work without human-annotated data and examine how a small GEC corpus improves the performance of both. The first method is pre-training on mainly language-independent synthetic data. The second is correcting errors with a multilingual neural machine translation (NMT) model via monolingual zero-shot translation. We found that the model trained only on synthetic data corrects few mistakes but rarely proposes incorrect edits. In contrast, the NMT model corrects many different kinds of mistakes but also introduces numerous unnecessary changes. Fine-tuning on the GEC data narrows the gap between the two models: the synthetic model starts to correct more errors, and the NMT model alters the text less freely.
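The first method mentioned in the abstract, synthetic pre-training, typically works by corrupting clean monolingual text to produce (noisy, clean) sentence pairs for sequence-to-sequence training; because the corruptions operate on characters and word order rather than language-specific grammar, they are largely language-independent. The following is a minimal Python sketch of that general idea only; the corruption operations and probabilities are illustrative assumptions, not the exact noising recipe used in the thesis.

import random

def corrupt_sentence(tokens, rng, p_drop=0.05, p_swap=0.05, p_char=0.05):
    """Corrupt a clean token list into a noisy 'source' sentence.

    The noisy tokens serve as the model input and the original clean
    tokens as the target, yielding a synthetic GEC training pair.
    """
    noisy = []
    for tok in tokens:
        if rng.random() < p_drop:
            continue  # simulate a missing word
        if rng.random() < p_char and len(tok) > 1:
            i = rng.randrange(len(tok) - 1)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]  # transpose two characters
        noisy.append(tok)
    if len(noisy) > 1 and rng.random() < p_swap:
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]  # simulate a word-order error
    return noisy

if __name__ == "__main__":
    rng = random.Random(0)
    clean = "the quick brown fox jumps over the lazy dog".split()
    for _ in range(3):
        print(" ".join(corrupt_sentence(clean, rng)), "->", " ".join(clean))

A model pre-trained on such pairs learns a conservative copy-and-fix behavior, which is consistent with the abstract's observation that the synthetic model corrects few mistakes but rarely proposes incorrect edits.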
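The second method treats error correction as "translation" from a language into itself: an erroneous sentence is passed to a multilingual NMT model with the target language forced to equal the source language, a direction the model never saw during training (hence zero-shot). The sketch below illustrates the technique with the publicly available M2M-100 model from Hugging Face Transformers; the model choice, language code, and generation settings are assumptions for illustration and are not the thesis's actual setup.

# pip install transformers sentencepiece torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

name = "facebook/m2m100_418M"  # illustrative public multilingual NMT model
tokenizer = M2M100Tokenizer.from_pretrained(name)
model = M2M100ForConditionalGeneration.from_pretrained(name)

def zero_shot_correct(sentence, lang="et"):
    """'Translate' a sentence into its own language.

    The multilingual model tends to output a fluent version of the
    input, implicitly correcting some grammatical errors along the way.
    """
    tokenizer.src_lang = lang
    inputs = tokenizer(sentence, return_tensors="pt")
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.get_lang_id(lang),  # target = source language
        max_new_tokens=64,
    )
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]

print(zero_shot_correct("Ma lähen homme kinno koos sõbraga ."))

Because the same-language direction is unseen in training, the model does not merely copy the input but normalizes it toward fluent text, which is both what makes it usable for GEC and why, per the abstract, it over-edits until fine-tuned on real GEC data.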