SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models

dc.contributor.author: Kurfalı, Murathan
dc.contributor.author: Zahra, Shorouq
dc.contributor.author: Gogoulou, Evangelia
dc.contributor.author: Dürlich, Luise
dc.contributor.author: Carlsson, Fredrik
dc.contributor.author: Nivre, Joakim
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-18T09:33:52Z
dc.date.available: 2025-02-18T09:33:52Z
dc.date.issued: 2025-03
dc.description.abstract: This paper introduces SweSAT-1.0, a new benchmark dataset created from the Swedish university entrance exam (Högskoleprovet) to assess large language models in Swedish. The current version of the benchmark comprises 867 questions across six different tasks, including reading comprehension, mathematical problem solving, and logical reasoning. We find that some widely used open-source and commercial models excel in verbal tasks, but we also see that all models, even the commercial ones, struggle with reasoning tasks in Swedish. We hope that SweSAT-1.0 will facilitate research on large language models for Swedish by enriching the breadth of available tasks, offering a challenging evaluation benchmark that is free from any translation biases.
dc.identifier.uri: https://hdl.handle.net/10062/107227
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_36.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format