Testing relevant linguistic features in automatic CEFR skill level classification for Icelandic

dc.contributor.author: Richter, Caitlin Laura
dc.contributor.author: Ingason, Anton Karl
dc.contributor.author: Glišić, Isidora
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-17T14:34:53Z
dc.date.available: 2025-02-17T14:34:53Z
dc.date.issued: 2025-03
dc.description.abstract: This paper explores the use of various linguistic features to develop models for automatic classification of language proficiency on the CEFR scale for Icelandic, a low-resourced and morphologically complex language. We train two classifiers to assess the skill level of learner texts. The first serves as a baseline: it takes the original, unaltered text written by a learner and uses predominantly surface features to assess the level. The second uses surface features together with additional morphological and lexical features, as well as context vectors from a transformer model (IceBERT). It takes both the original and corrected versions of the text and accounts for the errors and deviations of the original text relative to the corrected version. Both classifiers show promising results, with the baseline models achieving between 62.2% and 67.1% accuracy and the dual-version models between 75% and 80.3%.
dc.identifier.uri: https://hdl.handle.net/10062/107213
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Testing relevant linguistic features in automatic CEFR skill level classification for Icelandic
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_22.pdf
Size: 98.73 KB
Format: Adobe Portable Document Format