Browse by Author "Danilova, Vera"
Now showing 1 - 2 of 2
Exploring Patient Organization Periodicals with the Topic Timelines Text Visualization Method
(Tartu University Library, 2025) Skeppstedt, Maria; Maen, Adam; Danilova, Vera; Aangenendt, Gijs; Burchell, Andrew; Söderfeldt, Ylva; Nermo, Magnus; Papadopoulou Skarp, Frantzeska; Tienken, Susanne; Widholm, Andreas; Blåder, Anna
The text visualization technique Topic Timelines offers a compact visualization to represent the evolution and clustering of topics over time, while also providing direct access to the texts in which these topics appear. In this paper, we describe how Topic Timelines was further developed within the ActDisease project, by adding functionality for generating timelines using different types of topic extraction techniques and connecting the visualization to existing interfaces for the close reading of texts. Additionally, we evaluate how the updated temporal topic overview can support corpus exploration. The experiments were conducted on a digitalized corpus from the ActDisease project, consisting of patient organization periodicals from the Swedish Diabetes Association, published between 1949 and 1990. Timelines were generated based on topics extracted using sentence transformers clustering and integrated with the ActDisease text database interface - a user interface developed for exploring and reading texts digitalized within the project.

Post-OCR Correction of Historical German Periodicals using LLMs
(University of Tartu Library, 2025-03) Danilova, Vera; Aangenendt, Gijs; Tudor, Crina Madalina; Debess, Iben Nyholm; Bruton, Micaella; Scalvini, Barbara; Ilinykh, Nikolai; Holdt, Špela Arhar
Optical Character Recognition (OCR) is critical for accurate access to historical corpora, providing a foundation for processing pipelines and the reliable interpretation of historical texts.
Despite advances, the quality of OCR in historical documents remains limited, often requiring post-OCR correction to address residual errors. Building on recent progress with instruction-tuned Llama 2 models applied to English historical newspapers, we examine the potential of German Llama 2 and Mistral models for post-OCR correction of German medical historical periodicals. We perform instruction tuning using two configurations of training data, augmenting our small annotated dataset with two German datasets from the same time period. The results demonstrate that German Mistral enhances the raw OCR output, achieving a lower average word error rate (WER). However, the average character error rate (CER) either increases or remains unchanged across all models considered. We perform an analysis of performance within the error groups and provide an interpretation of the results.
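
For readers unfamiliar with the two metrics named in the abstract above, the following is a minimal sketch of how WER and CER are commonly computed from Levenshtein (edit) distance. The function names and the sample strings are invented for illustration and are not taken from the paper, whose own evaluation tooling is not described in this listing. The toy strings show how the two metrics can move in different directions for the same correction.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented toy strings (assumption, not data from the paper).
reference = "abc def ghi"
raw_ocr = "abd dcf ghi"      # two words wrong, one character each
corrected = "abc xyzw ghi"   # one word wrong, but wrong in every character

print(wer(reference, raw_ocr), cer(reference, raw_ocr))
print(wer(reference, corrected), cer(reference, corrected))
```

In this toy case the hypothetical correction repairs one word exactly but replaces another with an unrelated string, so fewer words are wrong while more characters are wrong.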