Proceedings of the 9th Workshop on Constraint Grammar and Finite State NLP
Permanent URI for this collection: https://hdl.handle.net/10062/107144
Browsing Proceedings of the 9th Workshop on Constraint Grammar and Finite State NLP by author "Rueter, Jack"
Item: A Mansi FST and spellchecker (University of Tartu Library, 2025-03)
Rueter, Jack; Horváth, Csilla; Trosterud, Trond; Wiechetek, Linda; Pirinen, Flammie
The article presents a finite-state transducer and spellchecker for Mansi, an Ob-Ugric Uralic language spoken in northwestern Siberia. Mansi has a rich but mostly agglutinative morphology, with a morphophonology dominated by sandhi phenomena. With a small set of morphophonological rules (32 twolc rules) and a lexicon consisting of 12,000 Mansi entries and a larger set of proper nouns, we were able to build a transducer covering 98.9% of a large (700k) newspaper corpus. As part of the GiellaLT infrastructure, the transducer was turned into a spellchecker. The most common spelling error in Mansi is the omission of length marks on vowels. For the 1000 most common words containing long vowels, the spellchecker gave a correct suggestion among the top five in 98.3% of cases, and as the first suggestion in 91.3% of cases.