Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies

dc.contributor.author: Masciolini, Arianna
dc.contributor.author: Lange, Herbert
dc.contributor.author: Tóth, Márton András
dc.contributor.editor: Bouma, Gerlof
dc.contributor.editor: Dannélls, Dana
dc.contributor.editor: Kokkinakis, Dimitrios
dc.contributor.editor: Volodina, Elena
dc.date.accessioned: 2025-11-10T12:33:28Z
dc.date.available: 2025-11-10T12:33:28Z
dc.date.issued: 2025-11
dc.description.abstract: We introduce STUnD (Search Tool for Universal Dependencies), a corpus search tool designed to facilitate working with parallel data. STUnD employs a query language that allows describing syntactic structures and specifying divergence patterns, which in turn make it possible to look for systematic differences between texts. Furthermore, the tool can automatically detect the differences between two similar documents. To achieve all this, STUnD leverages Universal Dependencies (UD), a cross-lingually consistent standard for morphosyntactic annotation. Input can consist of preannotated UD treebanks or raw text, which the tool automatically processes through a third-party parser. As demonstrated in the case study included in the present chapter, STUnD is especially well-suited for comparing syntactic structures across languages, with applications in the context of typology and translation studies. Other use cases include retrieving grammatical errors from parallel learner corpora and comparing different analyses of the same text.
dc.identifier.isbn: 9789908536125
dc.identifier.uri: https://hdl.handle.net/10062/117353
dc.identifier.uri: https://doi.org/10.58009/aere-perennius0183
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartof: Huminfra handbook: Empowering digital and experimental humanities
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies
dc.type: Article

Files

Original bundle

Name: Huminfra_Handbook_Chapter14.pdf
Size: 3.01 MB
Format: Adobe Portable Document Format