Sirvi Autor "Dorkin, Aleksei, juhendaja" järgi
Nüüd näidatakse 1 - 1 1
- Tulemused lehekülje kohta
- Sorteerimisvalikud
listelement.badge.dso-type Kirje , Optical Character Recognition of Estonian Fraktur(Tartu Ülikool, 2025) Väli, Mattias; Dorkin, Aleksei, juhendaja; Sirts, Kairit, juhendaja; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Arvutiteaduse instituutThe DIGAR portal of the National Library of Estonia hosts a diverse collection of historical Estonian newspapers. This publicly accessible dataset provides valuable resources for historians and other researchers, supporting a wide range of scholarly inquiries. For example, it can be used to investigate contemporary public opinion, trace the activities of individuals, and document historical locations. The National Library of Estonia’s Digilab also supplies machine-recognized text; however, recognition accuracy is often limited, particularly for older newspapers and publications printed in Fraktur script. This study focuses on newspapers published prior to 1944, many of which are regional titles characterized by lower print quality and more limited circulation. The primary objective is to enhance the accuracy of existing machine-recognized text corpora by leveraging state-of-the-art text recognition technologies. Specifically, the project employs advanced models from the Qwen2.5-VL family alongside the Transkribus platform. The proposed framework enables efficient and traceable local processing of data retrieved from the digital archive with predefined storage architecture. The resulting cleaned datasets are prepared for downstream processing on other platforms, and accompanying code is provided to facilitate model training. The data, models, and the associated code base are freely available in Huggingface, Transkribus and Github.