Error rate of automated part-of-speech tagging of Estonian academic learner English

Kaljuste, Karl August

Files

Kaljuste_BA_2021.pdf (412.91 KB)

Date

2021

Authors

Kaljuste, Karl August

Publisher

Tartu Ülikool

Abstract

Corpora are a valuable tool for linguistic research and for improving learner language. The Tartu Corpus of Estonian Learner English (TCELE) already exists, but it is small and lacks academic learner English. Building a corpus of Estonian academic learner English (EALE) could fill this gap and provide useful information for students, teachers and researchers alike. Modern corpora include various types of annotation, of which part-of-speech (POS) tagging is the most common, but tagging manually is slow and difficult. Automated taggers make the process relatively fast and easy. However, while automated tagger performance has been evaluated on both native and learner writing, there is little research on its performance on academic learner writing. This paper studies the accuracy of automated POS tagging of EALE. To this end, a corpus of EALE was built and tagged with the Natural Language Toolkit (NLTK) POS tagger, and the results were compared against a sample of manually added tags.
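The comparison described in the abstract, automated tags checked against a manually tagged sample, can be illustrated with a minimal sketch. The abstract does not say how the error rate was computed, so the token-level mismatch ratio below is an assumption, as are the function name `tagging_error_rate` and the four-token example data, which are hypothetical and not taken from the thesis. In practice the automated tags would presumably come from a call such as `nltk.pos_tag(tokens)`; here they are written out by hand so that the sketch runs without NLTK or its model data.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag)


def tagging_error_rate(auto: List[TaggedToken], gold: List[TaggedToken]) -> float:
    """Return the share of tokens whose automated tag differs from the manual tag.

    Assumes both sequences cover the same tokens in the same order.
    """
    if len(auto) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("tag sequences must not be empty")

    mismatches = 0
    for (auto_word, auto_tag), (gold_word, gold_tag) in zip(auto, gold):
        if auto_word != gold_word:
            raise ValueError(f"token mismatch: {auto_word!r} vs {gold_word!r}")
        if auto_tag != gold_tag:
            mismatches += 1
    return mismatches / len(gold)


# Hypothetical sample (Penn Treebank style tags), not data from the thesis.
manual = [("The", "DT"), ("results", "NNS"), ("show", "VBP"), ("differences", "NNS")]
# Hand-written stand-in for tagger output, with one deliberate error on "show".
automatic = [("The", "DT"), ("results", "NNS"), ("show", "NN"), ("differences", "NNS")]

error_rate = tagging_error_rate(automatic, manual)
print(f"Error rate: {error_rate:.2%}")
```

Accuracy is simply one minus this value; a per-tag breakdown of the mismatches would be a natural extension for finding which word classes the tagger handles worst.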

Keywords

academic learner language, tagging

URI

http://hdl.handle.net/10062/74209

Collections

Anglistika bakalaureusetööd – Bachelor's theses
