Proceedings
Permanent URI for this community: https://hdl.handle.net/10062/4117
NEALT Proceedings Series
ISSN 1736-6305 (Online)
ISSN 1736-8197 (Print)
Series Editor-in-Chief: Marcel Bollmann (until December 31, 2027)
https://nealt-org.github.io/proceedings/
The NEALT Proceedings Series publishes peer-reviewed proceedings of scientific events that have a thematic and geographical relation to the purposes of NEALT.
Requests for publication can be sent to the editor-in-chief. Published proceedings must document the composition of the program committee, the review procedure, and the acceptance rate in the preface.
Browsing Proceedings by Author "Aangenendt, Gijs"
Now showing 1 - 4 of 4
A machine learning pipeline for digitalising historical printed materials – from data collection to a searchable database
(University of Tartu Library, 2025-11)
Pablo, Dalia Ortiz; Badri, Sushruth; Aangenendt, Gijs; von Bychelberg, Mo; Lindström, Matts; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
Recent developments in the fields of machine learning and computer vision have created new opportunities for the digitalisation of printed historical materials. However, successful integration of machine learning models requires interdisciplinary collaboration between computer and data scientists, researchers, librarians and/or archivists, and digitisation experts. This chapter describes a comprehensive pipeline designed to address the challenges of digitalising printed historical materials, from document-scanning best practices to incorporating state-of-the-art machine learning techniques. It aims to streamline the management and processing of historical data, making the digitalised materials accessible and searchable through the application of machine learning techniques. The content of this chapter encompasses scanning best practices, annotation approaches, model training, and deployment. This chapter presents a collection of useful tools for each stage of building a machine learning model, step-by-step instructions and example notebooks designed to be easily adapted to other cases.

Applied NLP for humanities research
(University of Tartu Library, 2025-11)
Aangenendt, Gijs; Skeppstedt, Maria; Berglund, Karl; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
Natural language processing (NLP) has become a field of interest for many researchers within the humanities. However, framing humanities research questions as NLP problems and identifying suitable methods can be a difficult task. Taking previous and ongoing projects from the Centre for Digital Humanities and Social Sciences at Uppsala University (CDHU) as a point of departure, this chapter presents concrete use cases of how humanities research questions can be approached using various NLP methods and tools, from ready-to-use text analysis tools to programming libraries that require basic familiarity with Python. Two case studies from the fields of history and literature will be introduced to illuminate how texts can be processed for humanities research purposes. With this chapter, we hope to give the reader the means to directly explore NLP methods for their research as well as encourage further learning.

Post-OCR Correction of Historical German Periodicals using LLMs
(University of Tartu Library, 2025-03)
Danilova, Vera; Aangenendt, Gijs; Tudor, Crina Madalina; Debess, Iben Nyholm; Bruton, Micaella; Scalvini, Barbara; Ilinykh, Nikolai; Holdt, Špela Arhar
Optical Character Recognition (OCR) is critical for accurate access to historical corpora, providing a foundation for processing pipelines and the reliable interpretation of historical texts. Despite advances, the quality of OCR in historical documents remains limited, often requiring post-OCR correction to address residual errors. Building on recent progress with instruction-tuned Llama 2 models applied to English historical newspapers, we examine the potential of German Llama 2 and Mistral models for post-OCR correction of German medical historical periodicals. We perform instruction tuning using two configurations of training data, augmenting our small annotated dataset with two German datasets from the same time period. The results demonstrate that German Mistral enhances the raw OCR output, achieving a lower average word error rate (WER). However, the average character error rate (CER) either decreases or remains unchanged across all models considered. We perform an analysis of performance within the error groups and provide an interpretation of the results.

The Word Rain visualisation technique applied to digital history: How to visualise, explore and compare texts using semantically structured word Clouds
(University of Tartu Library, 2025-11)
Skeppstedt, Maria; Ahltorp, Magnus; Kucher, Kostiantyn; Aangenendt, Gijs; Lindström, Matts; Söderfeldt, Ylva; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
The Word Rain text visualisation technique aims to retain the simplicity of the classic word cloud, while addressing some of its limitations. In particular, the Word Rain visualisation uses word embeddings to automatically give the visualised words a semantically meaningful position along the horizontal axis. In this handbook chapter, we showcase how this novel approach for word positioning makes the Word Rain technique suitable for exploring, analysing and comparing texts. More specifically, we show how the Word Rain Python module can be used to visualise longitudinal changes in periodicals published by the Swedish Diabetes Association, and how the Word Rain web service can be used to create visualisations that compare the patient organisation periodicals to journals published by the Swedish Medical Association.