Browse by Author "Velldal, Erik"
Now showing 1 - 20 of 21
Record: A Collection of Question Answering Datasets for Norwegian (University of Tartu Library, 2025-03). Mikhailov, Vladislav; Mæhlum, Petter; Langø, Victoria Ovedie Chruickshank; Velldal, Erik; Øvrelid, Lilja; Johansson, Richard; Stymne, Sara.
This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both of the written standards of Norwegian – Bokmål and Nynorsk – our datasets comprise over 10k question-answer pairs, created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than Nynorsk, struggle most with commonsense reasoning, and are often untruthful in generating answers to questions. All our datasets and annotation materials are publicly available.

Record: Annotating evaluative sentences for sentiment analysis: a dataset for Norwegian (Turku, Finland, Linköping University Electronic Press, pp. 121--130, 2019). Mæhlum, Petter; Barnes, Jeremy; Øvrelid, Lilja; Velldal, Erik; Hartmann, Mareike; Plank, Barbara.

Record: Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles (University of Tartu Library, 2025-03). Touileb, Samia; Mikhailov, Vladislav; Kroka, Marie Ingeborg; Velldal, Erik; Øvrelid, Lilja; Johansson, Richard; Stymne, Sara.
We introduce a dataset of high-quality human-authored summaries of news articles in Norwegian. The dataset is intended for benchmarking the abstractive summarisation capabilities of generative language models. Each document in the dataset is provided with three different candidate gold-standard summaries written by native Norwegian speakers, and all summaries are provided in both of the written variants of Norwegian – Bokmål and Nynorsk. The paper details the data creation effort as well as an evaluation of existing open LLMs for Norwegian on the dataset. We also provide insights from a manual human evaluation comparing human-authored to model-generated summaries. Our results indicate that the dataset provides a challenging LLM benchmark for Norwegian summarisation capabilities.

Record: HPC-ready Language Analysis for Human Beings (Oslo, Norway, Linköping University Electronic Press, Sweden, pp. 447--452, 2013). Lapponi, Emanuele; Velldal, Erik; Vazov, Nikolay A.; Oepen, Stephan; Hagen, Kristin; Johannessen, Janne Bondi.

Record: Improving cross-domain dependency parsing with dependency-derived clusters (Vilnius, Lithuania, Linköping University Electronic Press, Sweden, pp. 117--126, 2015). Lien, Jostein; Velldal, Erik; Øvrelid, Lilja; Megyesi, Beáta.

Record: Joint UD Parsing of Norwegian Bokmål and Nynorsk (Gothenburg, Sweden, Association for Computational Linguistics, pp. 1--10, 2017). Velldal, Erik; Øvrelid, Lilja; Hohle, Petter; Tiedemann, Jörg; Tahmasebi, Nina.

Record: Large-Scale Contextualised Language Modelling for Norwegian (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 30--40, 2021). Kutuzov, Andrey; Barnes, Jeremy; Velldal, Erik; Øvrelid, Lilja; Oepen, Stephan; Dobnik, Simon.

Record: Lexicon information in neural sentiment analysis: a multi-task learning approach (Turku, Finland, Linköping University Electronic Press, pp. 175--186, 2019). Barnes, Jeremy; Touileb, Samia; Øvrelid, Lilja; Velldal, Erik; Hartmann, Mareike; Plank, Barbara.

Record: Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback (University of Tartu Library, 2025-03). Rønningstad, Egil; Storset, Lilja Charlotte; Mæhlum, Petter; Øvrelid, Lilja; Velldal, Erik; Johansson, Richard; Stymne, Sara.
Sentiment analysis of patient feedback from the public health domain can aid decision makers in evaluating the provided services. The current paper focuses on free-text comments in patient surveys about general practitioners and psychiatric healthcare, annotated with four sentence-level polarity classes (positive, negative, mixed, and neutral), while also attempting to alleviate data scarcity by leveraging general-domain sources in the form of reviews. For several different architectures, we compare in-domain and out-of-domain effects, as well as the effects of training joint multi-domain models.

Record: Multilingual ELMo and the Effects of Corpus Sampling (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 378--384, 2021). Ravishankar, Vinit; Kutuzov, Andrey; Øvrelid, Lilja; Velldal, Erik; Dobnik, Simon.

Record: Multilingual Probing of Deep Pre-Trained Contextual Encoders (Turku, Finland, Linköping University Electronic Press, pp. 37--47, 2019). Ravishankar, Vinit; Gökırmak, Memduh; Øvrelid, Lilja; Velldal, Erik; Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders; Tiedemann, Jörg.

Record: Negation in Norwegian: an annotated dataset (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 299--308, 2021). Mæhlum, Petter; Barnes, Jeremy; Kurtz, Robin; Øvrelid, Lilja; Velldal, Erik; Dobnik, Simon.

Record: NorBench – A Benchmark for Norwegian Language Models (University of Tartu Library, 2023-05). Samuel, David; Kutuzov, Andrey; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja; Rønningstad, Egil; Sigdel, Elina; Palatkina, Anna.

Record: NorEventGen: generative event extraction from Norwegian news (University of Tartu Library, 2025-03). You, Huiling; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja; Johansson, Richard; Stymne, Sara.
In this work, we approach event extraction from Norwegian news text using a generation-based approach which formulates the task as text-to-structure generation. We present experiments assessing the effect of different modeling configurations and provide an analysis of the model predictions and typical system errors. Finally, we apply our system to a large corpus of raw news texts and analyze the resulting distribution of event structures in a fairly representative snapshot of the Norwegian news landscape.

Record: Optimizing a PoS Tagset for Norwegian Dependency Parsing (Gothenburg, Sweden, Association for Computational Linguistics, pp. 142--151, 2017). Hohle, Petter; Øvrelid, Lilja; Velldal, Erik; Tiedemann, Jörg; Tahmasebi, Nina.

Record: Random Indexing Re-Hashed (2011-05-09). Velldal, Erik.

Record: Random Indexing Re-Hashed (Riga, Latvia, Northern European Association for Language Technology (NEALT), pp. 224--229, 2011). Velldal, Erik; Pedersen, Bolette Sandford; Nešpore, Gunta; Skadiņa, Inguna.

Record: Small Languages, Big Models: A Study of Continual Training on Languages of Norway (University of Tartu Library, 2025-03). Samuel, David; Mikhailov, Vladislav; Velldal, Erik; Øvrelid, Lilja; Charpentier, Lucas Georges Gabriel; Kutuzov, Andrey; Oepen, Stephan; Johansson, Richard; Stymne, Sara.
Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian and even more so for truly low-resource languages like Northern Sámi. To address this issue, we present a novel three-stage continual training approach that substantially improves the downstream performance together with the inference efficiency for the target languages. Based on our findings, we train, evaluate, and openly release a new generative language model for Norwegian Bokmål, Nynorsk, and Northern Sámi with 11.4 billion parameters: NorMistral-11B.

Record: The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective (University of Tartu Library, 2025-03). Rosa, Javier de la; Mikhailov, Vladislav; Zhang, Lemei; Wetjen, Freddy; Samuel, David; Liu, Peng; Braaten, Rolv-Arild; Mæhlum, Petter; Birkenes, Magnus Breder; Kutuzov, Andrey; Enstad, Tita; Farsethås, Hans Christian; Brygfjeld, Svein Arne; Gulla, Jon Atle; Oepen, Stephan; Velldal, Erik; Østgulen, Wilfred; Øvrelid, Lilja; Myhre, Aslak Sira; Johansson, Richard; Stymne, Sara.
The use of copyrighted materials in training language models raises critical legal and ethical questions. This paper presents a framework for, and the results of, empirically assessing the impact of publisher-controlled copyrighted corpora on the performance of generative large language models (LLMs) for Norwegian. When evaluating on a diverse set of tasks, we find that adding both books and newspapers to the data mixture of LLMs tends to improve their performance, while the addition of fiction works seems to be detrimental. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.

Record: Word vectors, reuse, and replicability: Towards a community repository of large-text resources (Gothenburg, Sweden, Association for Computational Linguistics, pp. 271--276, 2017). Fares, Murhaf; Kutuzov, Andrey; Oepen, Stephan; Velldal, Erik; Tiedemann, Jörg; Tahmasebi, Nina.