A Distant Technology? Experiments with a Generative Model for Retouching Noisy Newspaper OCR
Publisher
Tartu University Library
Abstract
This paper explores the use of generative models to enhance digitized historical newspaper text. While these models offer new means of addressing noisy OCR, their opaque, probabilistic processes raise epistemological concerns. Within the project The Order of Criticism Revisited, which integrates literary and computational approaches to Swedish criticism, we used zero-shot prompting to test GPT-4o's ability to “retouch” OCR data from the National Library of Sweden. Comparisons with the flawed OCR output and with manually transcribed texts show that the model produced more legible versions, often closer to the originals than the raw OCR. This suggests potential for improving the quality of digitized sources and enabling more robust large-scale analysis. However, drawing on the notions of artificial communication and distant technology, we argue that such models extend analytical capacity while creating perceptual and methodological distance. Their outputs, better understood as probabilistic “retouching” than as correction or reconstruction, weaken the link to the original sources.
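The abstract does not state how the retouched and raw OCR texts were compared with the manual transcriptions. One common measure for this kind of comparison is character error rate (CER). CER is the Levenshtein edit distance between a candidate text and a reference transcription, divided by the length of the reference. The sketch below illustrates CER under that assumption only; it is not presented as the authors' evaluation method. The function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of a hypothesis against a reference transcription."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


# Hypothetical example: a manual transcription, a noisy OCR line, and a retouched line.
reference = "the new novel"
raw_ocr = "tbe ncw nove1"    # three character substitutions
retouched = "the new novel"

print(round(cer(raw_ocr, reference), 3))    # 0.231
print(round(cer(retouched, reference), 3))  # 0.0
```

A lower CER means the text is closer to the reference at the character level. CER does not show whether a "correction" restores what the page actually printed or only produces a plausible reading. That difference is the one the abstract raises about probabilistic retouching.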
Keywords
Generative models, digital epistemology, OCR