Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies
Laen...
Kuupäev
Ajakirja pealkiri
Ajakirja ISSN
Köite pealkiri
Kirjastaja
University of Tartu Library
Abstrakt
We introduce STUnD (Search Tool for Universal Dependencies), a corpus search tool designed to facilitate working with parallel data. STUnD employs a query language that allows describing syntactic structures and specifying divergence patterns, which in turn make it possible to look for systematic differences between texts. Furthermore, the tool can automatically detect the differences between two similar documents. To achieve all this, STUnD leverages Universal Dependencies (UD), a cross-lingually consistent standard for morphosyntactic annotation. Input can consist of preannotated UD treebanks or raw text, which the tool automatically processes through a third-party parser. As demonstrated in the case study included in the present chapter, STUnD is especially well-suited for comparing syntactic structures across languages, with applications in the context of typology and translation studies. Other use cases include retrieving grammatical errors from parallel learner corpora and comparing different analyses of the same text.