Boosting up the sentiment analysis models’ accuracy by blending multi-label learning with a large sentiment lexicon

Publisher

Tartu University Library

Abstract

This study compares sentiment analysis approaches for Swedish texts using a manually annotated gold-standard dataset. Two methods were examined: (i) a multi-label sentiment classifier trained for Swedish, and (ii) the Swedish version of VADER (svVADER), a lexicon-based tool that computes sentiment scores from a vocabulary of polarity-weighted words. The analysis also examined where the two methods agree and disagree, focusing on mixed or context-dependent sentiment. Results indicate that the multi-label classifier aligns more closely with human judgments, especially for medium-length or long text segments with complex or subtle emotional tones. VADER is prone to errors on idiomatic or nuanced expressions, but it performs reliably on short, informal utterances and offers computational efficiency and transparency. A hybrid approach that combines classifier predictions with lexicon-based scores was investigated to exploit the complementary strengths of the two methods. The findings underscore the value of rigorous evaluation against human annotations and point to strategies for improving sentiment analysis in under-resourced languages such as Swedish.
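The abstract contrasts lexicon-based scoring with a multi-label classifier and mentions a hybrid of the two. The sketch below illustrates that general idea under stated assumptions; it is not the study's implementation. It uses VADER's published compound normalization, x / sqrt(x² + α) with α = 15, and its conventional ±0.05 label thresholds, but omits VADER's negation, booster-word, capitalization and punctuation rules. The toy Swedish lexicon and its weights, the mapping of classifier output to p(positive) − p(negative), the blend weight of 0.7, and all function names are hypothetical.

```python
import math
import re

# Hypothetical toy lexicon: Swedish words mapped to polarity weights on
# VADER's -4..+4 scale. Values are illustrative, NOT taken from svVADER.
TOY_LEXICON = {
    "bra": 1.9,         # "good"
    "fantastisk": 3.1,  # "fantastic"
    "dålig": -2.5,      # "bad"
    "tråkig": -1.7,     # "boring"
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """VADER-style normalization of a raw valence sum into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def lexicon_score(text: str, lexicon: dict[str, float] = TOY_LEXICON) -> float:
    """Sum weights of known words, then normalize.

    Simplification: no negation, booster-word or punctuation rules.
    """
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware for str
    raw = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return normalize(raw)


def classifier_score(probs: dict[str, float]) -> float:
    """Collapse multi-label probabilities into one polarity in [-1, 1].

    Assumption: the classifier exposes 'positive'/'negative' probabilities.
    """
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)


def hybrid_score(text: str, probs: dict[str, float], weight: float = 0.7) -> float:
    """Linear blend of classifier and lexicon scores.

    `weight` is a hypothetical hyperparameter (classifier share of the blend).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * classifier_score(probs) + (1.0 - weight) * lexicon_score(text)


def label(score: float, threshold: float = 0.05) -> str:
    """Map a score to a label using VADER's conventional +/-0.05 cut-offs."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

For instance, the toy lexicon alone labels "Filmen var bra men lite tråkig" as weakly positive, whereas `hybrid_score` with hypothetical classifier probabilities `{"positive": 0.2, "negative": 0.6, "neutral": 0.2}` labels it negative, because the classifier carries most of the weight. A linear blend keeps the lexicon's contribution transparent; in practice the weight would have to be tuned against held-out human annotations.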

Keywords

sentiment analysis, multi-label classifier, multi-class model, lexicon-based method (VADER/svVADER), Swedish dataset