Browse by Author "Masciolini, Arianna"

Now showing 1 - 4 of 4

A query engine for L1-L2 parallel dependency treebanks
(University of Tartu Library, 2023-05) Masciolini, Arianna
Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies
(University of Tartu Library, 2025-11) Masciolini, Arianna; Lange, Herbert; Tóth, Márton András; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
We introduce STUnD (Search Tool for Universal Dependencies), a corpus search tool designed to facilitate working with parallel data. STUnD employs a query language that allows describing syntactic structures and specifying divergence patterns, which in turn make it possible to look for systematic differences between texts. Furthermore, the tool can automatically detect the differences between two similar documents. To achieve all this, STUnD leverages Universal Dependencies (UD), a cross-lingually consistent standard for morphosyntactic annotation. Input can consist of preannotated UD treebanks or raw text, which the tool automatically processes through a third-party parser. As demonstrated in the case study included in the present chapter, STUnD is especially well-suited for comparing syntactic structures across languages, with applications in the context of typology and translation studies. Other use cases include retrieving grammatical errors from parallel learner corpora and comparing different analyses of the same text.
SweLL with pride: How to put a learner corpus to good use
(University of Tartu Library, 2025-11) Volodina, Elena; Masciolini, Arianna; Megyesi, Beáta; Prentice, Julia; Rudebeck, Lisa; Sundberg, Gunlög; Wirén, Mats; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios
Second language (L2) learner corpora are collections of language samples that demonstrate learners’ abilities to perform certain learning tasks, e.g. to write essays, answer reading comprehension questions, or talk on a given topic. Such corpora are necessary both for empirically based research within Second Language Acquisition (SLA) and for the development of methods for automatic processing of such data. L2 corpora are notoriously difficult to collect, and their value depends to a great degree on the representativeness and balance of the sampled data, the type of associated metadata and the reliability of manual annotations. In this chapter we thoroughly describe the SweLL-gold corpus of L2 Swedish, its annotation, statistics and metadata, and showcase its main types of use: (1) in research on SLA, through detailed instructions on how to perform corpus searches given SweLL-specific annotation, combined with guidelines for using SVALA, a tool for correction annotation; and (2) in NLP research on problems such as grammatical error correction, through guidelines on how to use the different file formats in which the SweLL-gold corpus is released. Both cases are further supported by case studies and, where available, relevant scripts ready for reuse by researchers.
The MultiGEC-2025 Shared Task on Multilingual Grammatical Error Correction at NLP4CALL
(University of Tartu Library, 2025-03) Masciolini, Arianna; Caines, Andrew; De Clercq, Orphée; Kruijsbergen, Joni; Kurfalı, Murathan; Muñoz Sánchez, Ricardo; Volodina, Elena; Östling, Robert; Alfter, David; Kallas, Jelena
This paper reports on MultiGEC-2025, the first shared task in text-level Multilingual Grammatical Error Correction. The shared task features twelve European languages (Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian) and is organized into two tracks, one for systems producing minimally corrected texts, thus preserving as much as possible of the original language use, and one dedicated to systems that prioritize fluency and idiomaticity. We introduce the task setup, data, evaluation metrics and baseline; present results obtained by the submitted systems and discuss key takeaways and ideas for future work.