VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings

dc.contributor.author: Avetisyan, Hayastan
dc.contributor.author: Broneske, David
dc.contributor.editor: Tudor, Crina Madalina
dc.contributor.editor: Debess, Iben Nyholm
dc.contributor.editor: Bruton, Micaella
dc.contributor.editor: Scalvini, Barbara
dc.contributor.editor: Ilinykh, Nikolai
dc.contributor.editor: Holdt, Špela Arhar
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-14T10:33:23Z
dc.date.available: 2025-02-14T10:33:23Z
dc.date.issued: 2025-03
dc.description.abstract: Understanding and generating morphologically complex verb forms is a critical challenge in Natural Language Processing (NLP), particularly for low-resource languages like Armenian. Armenian's verb morphology encodes multiple layers of grammatical information, such as tense, aspect, mood, voice, person, and number, requiring nuanced computational modeling. We introduce VerbCraft, a novel neural model that integrates explicit morphological classifiers into the mBART-50 architecture. VerbCraft achieves a BLEU score of 0.4899 on test data, compared to the baseline's 0.9975, reflecting its focus on prioritizing morphological precision over fluency. With over 99% accuracy in aspect and voice predictions and robust performance on rare and irregular verb forms, VerbCraft addresses data scarcity through synthetic data generation with human-in-the-loop validation. Beyond Armenian, it offers a scalable framework for morphologically rich, low-resource languages, paving the way for linguistically informed NLP systems and advancing language preservation efforts.
dc.identifier.uri: https://aclanthology.org/2025.resourceful-1.0/
dc.identifier.uri: https://hdl.handle.net/10062/107123
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings
dc.type: Article

Files

Original bundle

Name: 2025_resourceful_1_25.pdf
Size: 190.53 KB
Format: Adobe Portable Document Format