Browse by Author "Dorkin, Aleksei, supervisor"

    Optical Character Recognition of Estonian Fraktur
    (University of Tartu, 2025) Väli, Mattias; Dorkin, Aleksei, supervisor; Sirts, Kairit, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
    The DIGAR portal of the National Library of Estonia hosts a diverse collection of historical Estonian newspapers. This publicly accessible dataset is a valuable resource for historians and other researchers and supports a wide range of scholarly inquiries. For example, it can be used to investigate contemporary public opinion, trace the activities of individuals, and document historical locations. The National Library of Estonia’s Digilab also supplies machine-recognized text; however, recognition accuracy is often limited, particularly for older newspapers and publications printed in Fraktur script. This study focuses on newspapers published prior to 1944, many of which are regional titles characterized by lower print quality and more limited circulation. The primary objective is to improve the accuracy of existing machine-recognized text corpora using state-of-the-art text recognition technologies. Specifically, the project employs models from the Qwen2.5-VL family alongside the Transkribus platform. The proposed framework enables efficient and traceable local processing of data retrieved from the digital archive, using a predefined storage architecture. The resulting cleaned datasets are prepared for downstream processing on other platforms, and accompanying code is provided to facilitate model training. The data, models, and associated code base are freely available on Hugging Face, Transkribus, and GitHub.
