How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment

Kuulmets, Hele-Andra; Purason, Taido; Fishel, Mark

Files

2025_nodalida_1_37.pdf (325.29 KB)

Date

2025-03

Authors

Kuulmets, Hele-Andra

Purason, Taido

Fishel, Mark

Publisher

University of Tartu Library

Abstract

We present a systematic evaluation of the multilingual capabilities of open large language models (LLMs), focusing on five Finno-Ugric (FiU) languages. Our investigation covers multiple prompting strategies across several benchmarks and reveals that Llama-2 7B and Llama-2 13B perform poorly on most FiU languages. In contrast, Llama 3.1 models show impressive improvements, even for extremely low-resource languages such as Võro and Komi, indicating successful cross-lingual knowledge transfer inside the models. Finally, we show that stronger base models outperform weaker, language-adapted models, which emphasizes the importance of the base model in successful language adaptation.

URI

https://hdl.handle.net/10062/107228

Collections

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
