Andmebaasi logo
Valdkonnad ja kollektsioonid
Kogu ADA
Eesti
English
Deutsch
  1. Esileht
  2. Sirvi autori järgi

Sirvi Autor "Jensen, Anette" järgi

Tulemuste filtreerimiseks trükkige paar esimest tähte
Nüüd näidatakse 1 - 1 1
  • Tulemused lehekülje kohta
  • Sorteerimisvalikud
  • Laen...
    Pisipilt
    listelement.badge.dso-type Kirje ,
    MorSeD: Morphological Segmentation of Danish and its Effect on Language Modeling
    (University of Tartu Library, 2025-03) Goot, Rob van der; Jensen, Anette; Schledermann, Emil Allerslev; Kildeberg, Mikkel Wildner; Larsen, Nicolaj; Zhang, Mike; Bassignana, Elisa; Johansson, Richard; Stymne, Sara
    Current language models (LMs) mostly exploit subwords as input units based on statistical co-occurrences of characters. Adjacently, previous work has shown that modeling morphemes can aid performance for Natural Language Processing (NLP) models. However, morphemes are challenging to obtain as there is no annotated data in most languages. In this work, we release a wide-coverage Danish morphological segmentation evaluation set. We evaluate a range of unsupervised token segmenters and evaluate the downstream effect of using morphemes as input units for transformer-based LMs. Our results show that popular subword algorithms perform poorly on this task, scoring at most an F1 of 57.6 compared to 68.0 for an unsupervised morphological segmenter (Morfessor). Furthermore, evaluate a range of segmenters on the task of language modeling.

DSpace tarkvara autoriõigus © 2002-2025 LYRASIS

  • Teavituste seaded
  • Saada tagasisidet