Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages

Muradoglu, Saliha; Gray, James; Simpson, Jane Helen; Proctor, Michael; Harvey, Mark

Files

2025_resourceful_1_7.pdf (1.37 MB)

Date

2025-03

Publisher

University of Tartu Library

Abstract

Linguistic datasets are essential across fields: computational linguists use them to develop NLP tools, theoretical linguists to make statistical arguments for hypotheses about language, and documentary linguists to preserve examples and support grammatical descriptions. Transforming raw data (e.g., recordings or dictionaries) into structured forms (e.g., tables) requires non-trivial decisions within processing pipelines. This paper highlights the importance of these processes in understanding linguistic systems. Our contributions are (1) an interactive dashboard with custom filters for four Central Australian languages, and (2) a demonstration of how data-processing decisions influence measured outcomes.
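The point that processing decisions change measured outcomes can be illustrated with a minimal sketch. It assumes a hypothetical digraph inventory (`DIGRAPHS`) and two toy word forms that are not taken from the paper's data, and it compares biphone (adjacent-segment) counts under two segmentation decisions: treating every character as a segment versus merging listed digraphs into single segments. The function names are illustrative only and do not describe the authors' pipeline.

```python
from collections import Counter

# Hypothetical digraph inventory for illustration only;
# not the orthography profile used in the paper.
DIGRAPHS = ("ng", "ny", "rr", "tj", "ly")


def segment_chars(word: str) -> list[str]:
    """Decision A: treat every orthographic character as its own segment."""
    return list(word)


def segment_digraphs(word: str, digraphs=DIGRAPHS) -> list[str]:
    """Decision B: greedily merge listed two-letter digraphs into one segment."""
    segments, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in digraphs:  # exact match against the inventory
            segments.append(pair)
            i += 2
        else:
            segments.append(word[i])
            i += 1
    return segments


def biphone_counts(words, segmenter) -> Counter:
    """Count adjacent segment pairs (biphones) across a word list."""
    counts = Counter()
    for word in words:
        segs = segmenter(word)
        counts.update(zip(segs, segs[1:]))
    return counts


if __name__ == "__main__":
    toy_words = ["ngarra", "tjanya"]  # hypothetical forms, not corpus data
    by_char = biphone_counts(toy_words, segment_chars)
    by_digraph = biphone_counts(toy_words, segment_digraphs)
    # Number of distinct biphone types under each decision.
    print(len(by_char), len(by_digraph))  # prints "10 6"
```

On these two toy forms, character-level segmentation yields 10 biphone tokens (all distinct), while digraph merging yields 6, so the same input produces different phonotactic statistics depending on one pipeline choice. A greedy match like this also cannot tell a true digraph from an n+g sequence across a morpheme boundary, which is another decision a real pipeline would have to make explicitly.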

URI

https://aclanthology.org/2025.resourceful-1.0/
https://hdl.handle.net/10062/107113

Collections

Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
