SweLL with pride: How to put a learner corpus to good use

dc.contributor.author: Volodina, Elena
dc.contributor.author: Masciolini, Arianna
dc.contributor.author: Megyesi, Beáta
dc.contributor.author: Prentice, Julia
dc.contributor.author: Rudebeck, Lisa
dc.contributor.author: Sundberg, Gunlög
dc.contributor.author: Wirén, Mats
dc.contributor.editor: Bouma, Gerlof
dc.contributor.editor: Dannélls, Dana
dc.contributor.editor: Kokkinakis, Dimitrios
dc.contributor.editor: Volodina, Elena
dc.date.accessioned: 2025-11-10T11:49:30Z
dc.date.available: 2025-11-10T11:49:30Z
dc.date.issued: 2025-11
dc.description.abstract: Second language (L2) learner corpora are collections of language samples that demonstrate learners’ abilities to perform certain learning tasks, e.g. writing essays, answering reading comprehension questions, or talking on a given topic. Such corpora are necessary both for empirically based research within Second Language Acquisition (SLA) and for the development of methods for automatic processing of such data. L2 corpora are notoriously difficult to collect, and their value depends to a great degree on the representativeness and balance of the sampled data, the type of associated metadata and the reliability of manual annotations. In this chapter we thoroughly describe the SweLL-gold corpus of L2 Swedish, its annotation, statistics and metadata, and showcase its main types of use: (1) research on SLA, through detailed instructions on how to perform corpus searches given SweLL-specific annotation, combined with guidelines for using SVALA, a tool for correction annotation; and (2) NLP research on problems such as grammatical error correction, through guidelines on how to use the different file formats in which the SweLL-gold corpus is released. Both cases are further supported by case studies and, where available, relevant scripts ready for reuse by researchers.
dc.identifier.isbn: 9789908536125
dc.identifier.uri: https://hdl.handle.net/10062/117348
dc.identifier.uri: https://doi.org/10.58009/aere-perennius0178
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartof: Huminfra handbook: Empowering digital and experimental humanities
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: SweLL with pride: How to put a learner corpus to good use
dc.type: Article

Files

Original bundle

Name: Huminfra_Handbook_Chapter9.pdf
Size: 1.41 MB
Format: Adobe Portable Document Format