Towards large-scale speech foundation models for a low-resource minority language

dc.contributor.author: Getman, Yaroslav
dc.contributor.author: Grósz, Tamás
dc.contributor.author: Hiovain-Asikainen, Katri
dc.contributor.author: Lehtonen, Tommi
dc.contributor.author: Kurimo, Mikko
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-17T14:27:54Z
dc.date.available: 2025-02-17T14:27:54Z
dc.date.issued: 2025-03
dc.description.abstract: Modern ASR systems require massive amounts of training data. Since transcribed ASR training data are scarce and expensive to produce for most languages, a practical solution is to collect large amounts of raw untranscribed speech and pre-train the ASR model in a self-supervised manner. Unfortunately, for many low-resource minority languages, even untranscribed speech data are scarce. In this paper, we propose a solution for the Northern Sámi language, using 22,400 hours of speech extracted from the Finnish radio and television archives. We evaluated the model's performance with different decoding algorithms and examined its internal behavior with interpretation-based techniques.
dc.identifier.uri: https://hdl.handle.net/10062/107210
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Towards large-scale speech foundation models for a low-resource minority language
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_19.pdf
Size: 967.06 KB
Format: Adobe Portable Document Format