Browse by Author "Luhtaru, Agnes"
Now showing 1 - 3 of 3
Record: Automatic Transcription for Estonian Children's Speech (University of Tartu Library, 2023-05) Luhtaru, Agnes; Jaaska, Rauno; Kruusamäe, Karl; Fišel, Mark

Record: Grammatiliste vigade parandamine mitmekeelse neuromasintõlkega [Grammatical Error Correction with Multilingual Neural Machine Translation] (Tartu Ülikool, 2020) Luhtaru, Agnes; Fišel, Mark, supervisor; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Arvutiteaduse instituut
We introduce an approach to grammatical error correction (GEC) that does not require annotated training data: we train a multilingual neural machine translation model using only language-parallel translations. Far more open translation data is available than GEC corpora, especially for low-resource languages like Estonian. We find that this system has high recall but low precision: it corrects many mistakes but also introduces errors into correct text. Adding artificial mistakes to the training data increases recall and has a markedly positive impact on spelling error correction. Our model reliably corrects grammatical errors, such as subject-verb agreement and noun number, but struggles with lexical errors and tends to paraphrase unnecessarily.

Record: Low-resource Grammatical Error Correction via Synthetic Pre-training and Monolingual Zero-shot Translation (Tartu Ülikool, 2022) Luhtaru, Agnes; Fišel, Mark, supervisor; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Arvutiteaduse instituut
State-of-the-art neural grammatical error correction (GEC) systems are valuable for correcting various grammatical mistakes in texts. However, training neural models requires many error correction examples, a resource that is scarce for less common languages. We study two methods that work without human-annotated data and examine how a small GEC corpus improves the performance of both models. The first method we explore is pre-training on mainly language-independent synthetic data.
The second is correcting errors with multilingual neural machine translation (NMT) via monolingual zero-shot translation. We find that the model trained using only synthetic data corrects few mistakes but rarely proposes incorrect edits. By contrast, the NMT model corrects many different mistakes but adds numerous unnecessary changes. Training with the GEC data narrows the gap between the models: the synthetic model starts to correct more errors, and the NMT model becomes more conservative about changing the text.
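The synthetic pre-training described in the abstracts relies on injecting artificial errors into clean text to manufacture (erroneous, correct) training pairs. A minimal, hypothetical sketch of character-level noising is shown below; the function name, error types, and probabilities are illustrative assumptions, not the noising scheme actually used in the theses:

```python
import random

def add_synthetic_errors(text, p=0.1, seed=0):
    """Inject character-level noise (delete, duplicate, swap) into clean
    text, producing an artificial 'mistake' for GEC training data.
    Illustrative sketch only; real schemes also model word-level errors."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < p:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                i += 1          # drop this character entirely
                continue
            if op == "duplicate":
                out.append(chars[i])  # emit it twice
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose with next char
                out.append(chars[i])
                i += 2
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)
```

With `p=0.0` the text passes through unchanged, and a fixed seed makes the corruption deterministic, which is convenient when regenerating a synthetic corpus.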
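The monolingual zero-shot translation setup can be illustrated with the target-language-token convention common in multilingual NMT (a `<2xx>` tag prepended to the source sentence): asking the model to "translate" Estonian into Estonian nudges it to emit normalized, error-free output. The helper below only formats the model input; the tag format and function name are assumptions for illustration, and the trained multilingual model itself is not shown:

```python
def make_zero_shot_gec_input(sentence, target_lang="et"):
    """Format a sentence for monolingual zero-shot 'translation':
    a multilingual NMT model trained with target-language tags is asked
    to produce the same language as the input, correcting errors as a
    side effect. Tag style is illustrative."""
    return f"<2{target_lang}> {sentence}"
```

For example, `make_zero_shot_gec_input("Ma lähen kooli")` yields `"<2et> Ma lähen kooli"`, which a tag-conditioned multilingual model would decode back into (corrected) Estonian.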