SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models

Kurfalı, Murathan; Zahra, Shorouq; Gogoulou, Evangelia; Dürlich, Luise; Carlsson, Fredrik; Nivre, Joakim

Files

2025_nodalida_1_36.pdf (1.05 MB)

Date

2025-03

Publisher

University of Tartu Library

Abstract

This paper introduces SweSAT-1.0, a new benchmark dataset built from the Swedish university entrance exam (Högskoleprovet) to assess large language models in Swedish. The current version of the benchmark contains 867 questions across six tasks, covering reading comprehension, mathematical problem solving, and logical reasoning. We find that some widely used open-source and commercial models excel at verbal tasks, but all models, including the commercial ones, struggle with reasoning tasks in Swedish. We hope that SweSAT-1.0 will facilitate research on large language models for Swedish by broadening the range of available tasks and offering a challenging evaluation benchmark that is free from translation biases.

URI

https://hdl.handle.net/10062/107227

Collections

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)