
Browsing by Author "Kutuzov, Andrey"

Now showing 1 - 8 of 8
  • Large-Scale Contextualised Language Modelling for Norwegian
    (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 30–40, 2021) Kutuzov, Andrey; Barnes, Jeremy; Velldal, Erik; Øvrelid, Lilja; Oepen, Stephan; Dobnik, Simon; Øvrelid, Lilja
  • Multilingual ELMo and the Effects of Corpus Sampling
    (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 378–384, 2021) Ravishankar, Vinit; Kutuzov, Andrey; Øvrelid, Lilja; Velldal, Erik; Dobnik, Simon; Øvrelid, Lilja
  • NorBench – A Benchmark for Norwegian Language Models
    (University of Tartu Library, 2023-05) Samuel, David; Kutuzov, Andrey; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja; Rønningstad, Egil; Sigdel, Elina; Palatkina, Anna
  • Redefining Context Windows for Word Embedding Models: An Experimental Study
    (Gothenburg, Sweden, Association for Computational Linguistics, pp. 284–288, 2017) Lison, Pierre; Kutuzov, Andrey; Tiedemann, Jörg; Tahmasebi, Nina
  • Small Languages, Big Models: A Study of Continual Training on Languages of Norway
    (University of Tartu Library, 2025-03) Samuel, David; Mikhailov, Vladislav; Velldal, Erik; Øvrelid, Lilja; Charpentier, Lucas Georges Gabriel; Kutuzov, Andrey; Oepen, Stephan; Johansson, Richard; Stymne, Sara
    Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian and even more so for truly low-resource languages like Northern Sámi. To address this issue, we present a novel three-stage continual training approach that substantially improves the downstream performance together with the inference efficiency for the target languages. Based on our findings, we train, evaluate, and openly release a new generative language model for Norwegian Bokmål, Nynorsk, and Northern Sámi with 11.4 billion parameters: NorMistral-11B.
  • The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
    (University of Tartu Library, 2025-03) Rosa, Javier de la; Mikhailov, Vladislav; Zhang, Lemei; Wetjen, Freddy; Samuel, David; Liu, Peng; Braaten, Rolv-Arild; Mæhlum, Petter; Birkenes, Magnus Breder; Kutuzov, Andrey; Enstad, Tita; Farsethås, Hans Christian; Brygfjeld, Svein Arne; Gulla, Jon Atle; Oepen, Stephan; Velldal, Erik; Østgulen, Wilfred; Øvrelid, Lilja; Myhre, Aslak Sira; Johansson, Richard; Stymne, Sara
    The use of copyrighted materials in training language models raises critical legal and ethical questions. This paper presents a framework for empirically assessing the impact of publisher-controlled copyrighted corpora on the performance of generative large language models (LLMs) for Norwegian, together with the results of that assessment. When evaluated on a diverse set of tasks, we found that adding both books and newspapers to the data mixture of LLMs tends to improve their performance, while the addition of fiction works seems to be detrimental. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.
  • To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation
    (Turku, Finland, Linköping University Electronic Press, pp. 22–28, 2019) Kutuzov, Andrey; Kuzmenko, Elizaveta; Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders; Tiedemann, Jörg
  • Word vectors, reuse, and replicability: Towards a community repository of large-text resources
    (Gothenburg, Sweden, Association for Computational Linguistics, pp. 271–276, 2017) Fares, Murhaf; Kutuzov, Andrey; Oepen, Stephan; Velldal, Erik; Tiedemann, Jörg; Tahmasebi, Nina

DSpace software copyright © 2002-2025 UTLIB