Error rate of automated part-of-speech tagging of Estonian academic learner English

Author: Kaljuste, Karl August
Supervisor: Klavan, Jane
Institution: Tartu Ülikool. Humanitaarteaduste ja kunstide valdkond; Tartu Ülikool. Anglistika osakond; Tartu Ülikool. Maailma keelte ja kultuuride kolledž
Date: 2021-09-22
Year: 2021
URI: http://hdl.handle.net/10062/74209
Language: eng
Access: openAccess
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords: academic learner language; tagging; bachelor's theses; English language; corpora (linguistics); linguistics; grammar; parts of speech; corpus linguistics
Type: Thesis

Abstract: Corpora are a valuable tool for linguistic research and for improving learner language. At present there is the Tartu Corpus of Estonian Learner English (TCELE), but it is small and lacks academic learner English. Building a corpus of Estonian academic learner English (EALE) could fill this gap in TCELE and provide useful information for students, teachers and researchers alike. Modern corpora include various types of annotation, and tagging words for their part of speech (POS) is the most common of them, but manual tagging is an extremely long and difficult task. Automated taggers can make this process relatively fast and easy. However, while automated tagger performance has been evaluated on both native writing and learner writing, there is little research on automated tagger performance on academic learner writing. This paper aims to study the accuracy of automated POS tagging of EALE. To achieve this, a corpus of EALE was built and tagged using the Natural Language Toolkit (NLTK) POS tagger, and the results were compared against a sample of manually added tags.
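
The comparison of automated tags against a manually tagged sample can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the script used in the thesis: the function name `tagging_error_rate` and the toy token/tag pairs are hypothetical, and the tags shown are Penn Treebank-style labels written by hand. In practice the automated side could come from NLTK's `nltk.pos_tag`, which returns a list of (token, tag) pairs.

```python
from collections import Counter


def tagging_error_rate(auto_tagged, manual_tagged):
    """Compare two aligned lists of (token, tag) pairs.

    Returns the share of tokens whose automated tag differs from the
    manual tag, plus a count of each (manual_tag, auto_tag) confusion.
    """
    if len(auto_tagged) != len(manual_tagged):
        raise ValueError("tag sequences must be aligned token by token")
    if not manual_tagged:
        raise ValueError("cannot compute an error rate for an empty sample")

    confusions = Counter()
    for (auto_tok, auto_tag), (gold_tok, gold_tag) in zip(auto_tagged, manual_tagged):
        if auto_tok != gold_tok:
            raise ValueError(f"token mismatch: {auto_tok!r} vs {gold_tok!r}")
        if auto_tag != gold_tag:
            confusions[(gold_tag, auto_tag)] += 1

    errors = sum(confusions.values())
    return errors / len(manual_tagged), confusions


# Hypothetical, hand-written sample (not data from the thesis).
auto = [("The", "DT"), ("results", "NNS"), ("show", "NN"), ("that", "IN")]
manual = [("The", "DT"), ("results", "NNS"), ("show", "VBP"), ("that", "IN")]

rate, confusions = tagging_error_rate(auto, manual)
print(f"error rate: {rate:.2%}")
for (gold_tag, auto_tag), count in confusions.most_common():
    print(f"manual {gold_tag} tagged as {auto_tag}: {count}")
```

Keeping the per-pair confusion counts alongside the overall rate makes it possible to see which tag pairs account for the errors, not only how many errors there are.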