Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies

Masciolini, Arianna; Lange, Herbert; Tóth, Márton András

Files

Huminfra_Handbook_Chapter14.pdf (3.01 MB)

Date

2025-11

Authors

Masciolini, Arianna

Lange, Herbert

Tóth, Márton András

Publisher

University of Tartu Library

Abstract

We introduce STUnD (Search Tool for Universal Dependencies), a corpus search tool designed to facilitate working with parallel data. STUnD employs a query language that allows describing syntactic structures and specifying divergence patterns, which in turn make it possible to look for systematic differences between texts. Furthermore, the tool can automatically detect the differences between two similar documents. To achieve all this, STUnD leverages Universal Dependencies (UD), a cross-lingually consistent standard for morphosyntactic annotation. Input can consist of preannotated UD treebanks or raw text, which the tool automatically processes through a third-party parser. As demonstrated in the case study included in the present chapter, STUnD is especially well-suited for comparing syntactic structures across languages, with applications in the context of typology and translation studies. Other use cases include retrieving grammatical errors from parallel learner corpora and comparing different analyses of the same text.

URI

https://hdl.handle.net/10062/117353
https://doi.org/10.58009/aere-perennius0183

Collections

Huminfra handbook: Empowering digital and experimental humanities
