Error rate of automated part-of-speech tagging of Estonian academic learner English

Kaljuste, Karl August

Files

Kaljuste_BA_2021.pdf (412.91 KB)

Date

2021

Authors

Kaljuste, Karl August

Publisher

Tartu Ülikool

Abstract

Corpora are a valuable tool for linguistic research and for improving learner language. The Tartu Corpus of Estonian Learner English (TCELE) already exists, but it is small and lacks academic learner English. Building a corpus of Estonian academic learner English (EALE) could fill this gap in TCELE and provide useful information for students, teachers and researchers alike. Modern corpora include various types of annotation, of which tagging words for their part of speech (POS) is the most common, but manual tagging is a slow and laborious task. Automated taggers can make this process relatively fast and easy. However, while automated tagger performance has been evaluated on both native writing and learner writing, research on automated tagger performance on academic learner writing is lacking. This paper aims to study the accuracy of automated POS tagging of EALE. To this end, a corpus of EALE was built and tagged using the Natural Language Toolkit (NLTK) POS tagger, and the results were compared against a sample of manually added tags.
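
The comparison of automated tags against manually added tags can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the function name `tagging_error_rate`, the sample sentence and both tag sequences are hypothetical, and the tags are assumed to be `(word, tag)` pairs in the shape that NLTK's `nltk.pos_tag` returns. The error rate is taken here as the share of tokens whose automated tag differs from the manual one.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag), the shape nltk.pos_tag returns


def tagging_error_rate(auto: List[TaggedToken], manual: List[TaggedToken]) -> float:
    """Return the share of tokens whose automated tag differs from the manual tag."""
    if not manual:
        raise ValueError("the manually tagged sample is empty")
    if len(auto) != len(manual):
        raise ValueError("both sequences must cover the same tokens")

    mismatches = 0
    for (auto_word, auto_tag), (manual_word, manual_tag) in zip(auto, manual):
        if auto_word != manual_word:
            raise ValueError(f"token mismatch: {auto_word!r} vs {manual_word!r}")
        if auto_tag != manual_tag:
            mismatches += 1
    return mismatches / len(manual)


# Hypothetical sample, not data from the thesis.
manual_tags = [("The", "DT"), ("results", "NNS"), ("show", "VBP"),
               ("a", "DT"), ("clear", "JJ"), ("trend", "NN")]
# Stand-in for tagger output; here the verb "show" is mistagged as a noun.
auto_tags = [("The", "DT"), ("results", "NNS"), ("show", "NN"),
             ("a", "DT"), ("clear", "JJ"), ("trend", "NN")]

error_rate = tagging_error_rate(auto_tags, manual_tags)
print(f"Error rate: {error_rate:.1%}")
```

In practice the automated sequence would come from running the tagger over the same tokenised text that was tagged by hand, so that the two sequences align token by token.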

Keywords

academic learner language, tagging

URI

http://hdl.handle.net/10062/74209

Collections

Inglise filoloogia bakalaureusetööd – Bachelor's theses
