VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings
| dc.contributor.author | Avetisyan, Hayastan | |
| dc.contributor.author | Broneske, David | |
| dc.contributor.editor | Tudor, Crina Madalina | |
| dc.contributor.editor | Debess, Iben Nyholm | |
| dc.contributor.editor | Bruton, Micaella | |
| dc.contributor.editor | Scalvini, Barbara | |
| dc.contributor.editor | Ilinykh, Nikolai | |
| dc.contributor.editor | Holdt, Špela Arhar | |
| dc.coverage.spatial | Tallinn, Estonia | |
| dc.date.accessioned | 2025-02-14T10:33:23Z | |
| dc.date.available | 2025-02-14T10:33:23Z | |
| dc.date.issued | 2025-03 | |
| dc.description.abstract | Understanding and generating morphologically complex verb forms is a critical challenge in Natural Language Processing (NLP), particularly for low-resource languages like Armenian. Armenian's verb morphology encodes multiple layers of grammatical information, such as tense, aspect, mood, voice, person, and number, requiring nuanced computational modeling. We introduce VerbCraft, a novel neural model that integrates explicit morphological classifiers into the mBART-50 architecture. VerbCraft achieves a BLEU score of 0.4899 on test data, compared to the baseline's 0.9975, reflecting its focus on prioritizing morphological precision over fluency. With over 99% accuracy in aspect and voice predictions and robust performance on rare and irregular verb forms, VerbCraft addresses data scarcity through synthetic data generation with human-in-the-loop validation. Beyond Armenian, it offers a scalable framework for morphologically rich, low-resource languages, paving the way for linguistically informed NLP systems and advancing language preservation efforts. | |
| dc.identifier.uri | https://aclanthology.org/2025.resourceful-1.0/ | |
| dc.identifier.uri | https://hdl.handle.net/10062/107123 | |
| dc.language.iso | en | |
| dc.publisher | University of Tartu Library | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | VerbCraft: Morphologically-Aware Armenian Text Generation Using LLMs in Low-Resource Settings | |
| dc.type | Article |
Files
Original bundle
- Name: 2025_resourceful_1_25.pdf
- Size: 190.53 KB
- Format: Adobe Portable Document Format