From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
Date
2025-03
Publisher
University of Tartu Library
Abstract
Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.