From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM

Date

2025-03

Publisher

University of Tartu Library

Abstract

Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, which creates challenges for languages with limited resources. This paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.
