Match ‘em: Multi-Tiered Alignment for Error Analysis in ASR

dc.contributor.author Parsons, Phoebe
dc.contributor.author Kvale, Knut
dc.contributor.author Svendsen, Torbjørn
dc.contributor.author Salvi, Giampiero
dc.contributor.editor Johansson, Richard
dc.contributor.editor Stymne, Sara
dc.coverage.spatial Tallinn, Estonia
dc.date.accessioned 2025-02-18T13:59:06Z
dc.date.available 2025-02-18T13:59:06Z
dc.date.issued 2025-03
dc.description.abstract We introduce “Match ‘em”: a new framework for aligning output from automatic speech recognition (ASR) with reference transcriptions. This allows a more detailed analysis of the errors produced by end-to-end ASR systems than word error rate (WER) provides. Match ‘em performs the alignment at both the word and the character level, with each level relying on information from the other to produce the most meaningful global alignment. At the character level, we define a character similarity metric motivated by speech production. At the word level, we rely on character similarities to define word similarity, and we additionally reconcile compounding (the insertion or deletion of spaces). We evaluated Match ‘em on transcripts in three European languages produced by wav2vec2 and Whisper. We show that Match ‘em yields word substitution pairs whose members are more similar to each other, and that compound reconciliation can capture a broad range of spacing errors. We believe Match ‘em to be a valuable tool for ASR error analysis across many languages.
dc.identifier.uri https://hdl.handle.net/10062/107240
dc.language.iso en
dc.publisher University of Tartu Library
dc.relation.ispartofseries NEALT Proceedings Series, No. 57
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title Match ‘em: Multi-Tiered Alignment for Error Analysis in ASR
dc.type Article
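
The abstract describes alignment on two tiers: a character similarity feeds a word similarity, and the word-level alignment also reconciles inserted or deleted spaces. The Python sketch below illustrates that general idea only; it is not the Match ‘em implementation. All names are hypothetical: `char_sim` is a crude vowel/consonant split standing in for the paper's speech-production-motivated metric (the vowel set is an assumption), `word_sim` turns a weighted character edit distance into a similarity in [0, 1], and `align_words` adds 2:1 and 1:2 steps as one possible way to match a compound against its split form.

```python
"""Toy two-tier ASR alignment sketch (hypothetical; not the Match 'em code)."""

# Assumption: a vowel/consonant split stands in for a phonetic similarity metric.
VOWELS = set("aeiouyæøå")


def char_sim(a: str, b: str) -> float:
    """Hypothetical character similarity in [0, 1]."""
    if a == b:
        return 1.0
    if (a in VOWELS) == (b in VOWELS):
        return 0.5  # same broad class: both vowels or both consonants
    return 0.0


def word_sim(x: str, y: str) -> float:
    """1 minus the length-normalized cost of the best character alignment."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete every character of x[:i]
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert every character of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # character deletion
                d[i][j - 1] + 1.0,  # character insertion
                # substitution is cheaper for similar characters, free for a match
                d[i - 1][j - 1] + 1.0 - char_sim(x[i - 1], y[j - 1]),
            )
    longest = max(n, m)
    # the cost never exceeds max(n, m), so the result stays in [0, 1]
    return 1.0 if longest == 0 else 1.0 - d[n][m] / longest


def align_words(ref: list[str], hyp: list[str]) -> list[tuple[str, str]]:
    """Align word lists; '' marks a gap. The 2:1 and 1:2 steps let a space
    inserted or deleted by the recognizer ('ice cream' vs 'icecream') be
    matched as one unit instead of a substitution plus a deletion."""
    n, m = len(ref), len(hyp)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    back: list[list[tuple[int, int]]] = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # each option: (cumulative cost, (ref words consumed, hyp words consumed))
            options = []
            if i >= 1:
                options.append((d[i - 1][j] + 1.0, (1, 0)))  # word deletion
            if j >= 1:
                options.append((d[i][j - 1] + 1.0, (0, 1)))  # word insertion
            if i >= 1 and j >= 1:
                cost = 1.0 - word_sim(ref[i - 1], hyp[j - 1])
                options.append((d[i - 1][j - 1] + cost, (1, 1)))
            if i >= 2 and j >= 1:  # two reference words vs one hypothesis word
                cost = 1.0 - word_sim(ref[i - 2] + ref[i - 1], hyp[j - 1])
                options.append((d[i - 2][j - 1] + cost, (2, 1)))
            if i >= 1 and j >= 2:  # one reference word vs two hypothesis words
                cost = 1.0 - word_sim(ref[i - 1], hyp[j - 2] + hyp[j - 1])
                options.append((d[i - 1][j - 2] + cost, (1, 2)))
            d[i][j], back[i][j] = min(options)
    # trace back from the end to recover the aligned pairs
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((" ".join(ref[i - di:i]), " ".join(hyp[j - dj:j])))
        i, j = i - di, j - dj
    return pairs[::-1]


if __name__ == "__main__":
    print(align_words(["the", "ice", "cream", "melted"], ["the", "icecream", "melted"]))
    # → [('the', 'the'), ('ice cream', 'icecream'), ('melted', 'melted')]
```

In this toy version a reconciled compound costs nothing, just like an exact match; a real error analysis would presumably report such pairs separately as spacing errors rather than hide them.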

Files

Original bundle

Name: 2025_nodalida_1_48.pdf
Size: 200.01 KB
Format: Adobe Portable Document Format