Universal Dependencies Treebank for Uzbek

Date

2025

Publisher

University of Tartu Library

Abstract

We present the first Universal Dependencies (UD) treebank for Uzbek, a low-resource language of the Turkic family. The treebank contains 500 sentences (5850 tokens) drawn from the news and fiction genres, annotated for lemmas, part-of-speech (POS) tags, morphological features, and dependency relations. We describe our methodology for building the treebank, which combines manual and automatic annotation, and discuss some constructions of the Uzbek language that pose challenges for the UD framework.
