Browse by Author "Aangenendt, Gijs"

Now showing 1 - 5 of 5
  • A machine learning pipeline for digitalising historical printed materials – from data collection to a searchable database
    (University of Tartu Library, 2025-11) Pablo, Dalia Ortiz; Badri, Sushruth; Aangenendt, Gijs; von Bychelberg, Mo; Lindström, Matts; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Recent developments in the fields of machine learning and computer vision have created new opportunities for the digitalisation of printed historical materials. However, successful integration of machine learning models requires interdisciplinary collaboration between computer- and data scientists, researchers, librarians and/or archivists, and digitisation experts. This chapter describes a comprehensive pipeline designed to address the challenges of digitalising printed historical materials, from document-scanning best practices to incorporating state-of-the-art machine learning techniques. It aims to streamline the management and processing of historical data, making the digitalised materials accessible and searchable through the application of machine learning techniques. The content of this chapter encompasses scanning best practices, annotation approaches, model training, and deployment. This chapter presents a collection of useful tools for each stage of building a machine learning model, step-by-step instructions and example notebooks designed to be easily adapted to other cases.
  • Applied NLP for humanities research
    (University of Tartu Library, 2025-11) Aangenendt, Gijs; Skeppstedt, Maria; Berglund, Karl; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Natural language processing (NLP) has become a field of interest for many researchers within the humanities. However, framing humanities research questions as NLP problems and identifying suitable methods can be a difficult task. Taking previous and ongoing projects from the Centre for Digital Humanities and Social Sciences at Uppsala University (CDHU) as a point of departure, this chapter presents concrete use cases of how humanities research questions can be approached using various NLP methods and tools, from ready-to-use text analysis tools to programming libraries that require basic familiarity with Python. Two case studies from the fields of history and literature will be introduced to illuminate how texts can be processed for humanities research purposes. With this chapter, we hope to give the reader the means to directly explore NLP methods for their research as well as encourage further learning.
  • Exploring Patient Organization Periodicals with the Topic Timelines Text Visualization Method
    (Tartu University Library, 2025) Skeppstedt, Maria; Maen, Adam; Danilova, Vera; Aangenendt, Gijs; Burchell, Andrew; Söderfeldt, Ylva; Nermo, Magnus; Papadopoulou Skarp, Frantzeska; Tienken, Susanne; Widholm, Andreas; Blåder, Anna
    The text visualization technique Topic Timelines offers a compact visualization to represent the evolution and clustering of topics over time, while also providing direct access to the texts in which these topics appear. In this paper, we describe how Topic Timelines was further developed within the ActDisease project, by adding functionality for generating timelines using different types of topic extraction techniques and connecting the visualization to existing interfaces for the close reading of texts. Additionally, we evaluate how the updated temporal topic overview can support corpus exploration. The experiments were conducted on a digitalized corpus from the ActDisease project, consisting of patient organization periodicals from the Swedish Diabetes Association, published between 1949 and 1990. Timelines were generated based on topics extracted using sentence transformers clustering and integrated with the ActDisease text database interface - a user interface developed for exploring and reading texts digitalized within the project.
  • Post-OCR Correction of Historical German Periodicals using LLMs
    (University of Tartu Library, 2025-03) Danilova, Vera; Aangenendt, Gijs; Tudor, Crina Madalina; Debess, Iben Nyholm; Bruton, Micaella; Scalvini, Barbara; Ilinykh, Nikolai; Holdt, Špela Arhar
    Optical Character Recognition (OCR) is critical for accurate access to historical corpora, providing a foundation for processing pipelines and the reliable interpretation of historical texts. Despite advances, the quality of OCR in historical documents remains limited, often requiring post-OCR correction to address residual errors. Building on recent progress with instruction-tuned Llama 2 models applied to English historical newspapers, we examine the potential of German Llama 2 and Mistral models for post-OCR correction of German medical historical periodicals. We perform instruction tuning using two configurations of training data, augmenting our small annotated dataset with two German datasets from the same time period. The results demonstrate that German Mistral enhances the raw OCR output, achieving a lower average word error rate (WER). However, the average character error rate (CER) either decreases or remains unchanged across all models considered. We perform an analysis of performance within the error groups and provide an interpretation of the results.
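    The two metrics named in this abstract, word error rate (WER) and character error rate (CER), can be illustrated with a minimal sketch. It assumes the common definition of both as Levenshtein edit distance divided by the length of the reference, counted in words or in characters. The function names and the sample strings are hypothetical and are not taken from the paper, whose exact evaluation setup is not described here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one OCR error ("wnrden" instead of "wurden").
reference = "die Kranken wurden behandelt"
ocr_output = "die Kranken wnrden behandelt"
print(f"WER = {wer(reference, ocr_output):.3f}")
print(f"CER = {cer(reference, ocr_output):.3f}")
```

    Because a single wrong character makes a whole word count as an error, WER and CER can move differently for the same correction model, which is why the abstract reports them separately.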
  • The Word Rain visualisation technique applied to digital history: How to visualise, explore and compare texts using semantically structured word clouds
    (University of Tartu Library, 2025-11) Skeppstedt, Maria; Ahltorp, Magnus; Kucher, Kostiantyn; Aangenendt, Gijs; Lindström, Matts; Söderfeldt, Ylva; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    The Word Rain text visualisation technique aims to retain the simplicity of the classic word cloud, while addressing some of its limitations. In particular, the Word Rain visualisation uses word embeddings to automatically give the visualised words a semantically meaningful position along the horizontal axis. In this handbook chapter, we showcase how this novel approach for word positioning makes the Word Rain technique suitable for exploring, analysing and comparing texts. More specifically, we show how the Word Rain Python module can be used to visualise longitudinal changes in periodicals published by the Swedish Diabetes Association, and how the Word Rain web service can be used to create visualisations that compare the patient organisation periodicals to journals published by the Swedish Medical Association.
