Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles

Touileb, Samia; Mikhailov, Vladislav; Kroka, Marie Ingeborg; Velldal, Erik; Øvrelid, Lilja

dc.contributor.author	Touileb, Samia
dc.contributor.author	Mikhailov, Vladislav
dc.contributor.author	Kroka, Marie Ingeborg
dc.contributor.author	Velldal, Erik
dc.contributor.author	Øvrelid, Lilja
dc.contributor.editor	Johansson, Richard
dc.contributor.editor	Stymne, Sara
dc.coverage.spatial	Tallinn, Estonia
dc.date.accessioned	2025-02-19T08:42:56Z
dc.date.available	2025-02-19T08:42:56Z
dc.date.issued	2025-03
dc.description.abstract	We introduce a dataset of high-quality human-authored summaries of news articles in Norwegian. The dataset is intended for benchmarking the abstractive summarisation capabilities of generative language models. Each document in the dataset is provided with three different candidate gold-standard summaries written by native Norwegian speakers, and all summaries are provided in both of the written variants of Norwegian – Bokmål and Nynorsk. The paper describes the data creation effort in detail and evaluates existing open LLMs for Norwegian on the dataset. We also provide insights from a manual human evaluation comparing human-authored to model-generated summaries. Our results indicate that the dataset provides a challenging LLM benchmark for Norwegian summarisation capabilities.
dc.identifier.uri	https://hdl.handle.net/10062/107266
dc.language.iso	en
dc.publisher	University of Tartu Library
dc.relation.ispartofseries	NEALT Proceedings Series, No. 57
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles
dc.type	Article

Files

Original bundle

Name: 2025_nodalida_1_73.pdf
Size: 338.96 KB
Format: Adobe Portable Document Format

Collections

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)