Andmebaasi logo
Valdkonnad ja kollektsioonid
Kogu ADA
Eesti
English
Deutsch
  1. Esileht
  2. Sirvi autori järgi

Sirvi Autor "Getman, Yaroslav" järgi

Tulemuste filtreerimiseks trükkige paar esimest tähte
Nüüd näidatakse 1 - 1 1
  • Tulemused lehekülje kohta
  • Sorteerimisvalikud
  • Laen...
    Pisipilt
    listelement.badge.dso-type Kirje ,
    Towards large-scale speech foundation models for a low-resource minority language
    (University of Tartu Library, 2025-03) Getman, Yaroslav; Grósz, Tamás; Hiovain-Asikainen, Katri; Lehtonen, Tommi; Kurimo, Mikko; Johansson, Richard; Stymne, Sara
    Modern ASR systems require massive amounts of training data. While ASR training data for most languages are scarce and expensive to transcribe, a practical solution is to collect huge amounts of raw untranscribed speech and pre-train the ASR model in a self-supervised manner. Unfortunately, for many low-resource minority languages, even untranscribed speech data are scarce. In this paper, we propose a solution for the Northern Sámi language with 22,400 hours of speech extracted from the Finnish radio and television archives. We evaluated the model performance with different decoding algorithms and examined the models' internal behavior with interpretation-based techniques.

DSpace tarkvara autoriõigus © 2002-2025 LYRASIS

  • Teavituste seaded
  • Saada tagasisidet