A Distant Technology? Experiments with a Generative Model for Retouching Noisy Newspaper OCR

dc.contributor.author: Brodén, Daniel
dc.contributor.author: Samuelsson, Lisa
dc.contributor.author: Alfter, David
dc.contributor.author: Malmstedt, Johan
dc.contributor.editor: Nermo, Magnus
dc.contributor.editor: Papadopoulou Skarp, Frantzeska
dc.contributor.editor: Tienken, Susanne
dc.contributor.editor: Widholm, Andreas
dc.contributor.editor: Blåder, Anna
dc.date.accessioned: 2025-12-19T12:25:03Z
dc.date.available: 2025-12-19T12:25:03Z
dc.date.issued: 2025
dc.description.abstract (en): This paper explores the use of generative models to enhance digitized historical newspaper text. While these models offer new means of addressing noisy OCR, their opaque, probabilistic processes raise epistemological concerns. Within the project The Order of Criticism Revisited, which integrates literary and computational approaches to Swedish criticism, we tested GPT-4o to “retouch” OCR data from the National Library of Sweden using zero-shot prompting. Comparisons with flawed OCR outputs and manually transcribed texts show that the model produced more legible versions, often closer to the originals than the raw OCR. This indicates potential for improving the quality of digitized sources and enabling more robust large-scale analysis. However, drawing on the notions of artificial communication and distant technology, we argue that such models extend analytical capacity while creating perceptual and methodological distance. Their outputs, better seen as probabilistic “retouching” than correction or reconstruction, weaken the link to original sources.
dc.identifier.issn: 1736-6305
dc.identifier.uri: https://hdl.handle.net/10062/118291
dc.language.iso: en
dc.publisher: Tartu University Library
dc.relation.ispartofseries: NEALT Proceedings Series 60
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Generative models
dc.subject: digital epistemology
dc.subject: OCR
dc.title: A Distant Technology? Experiments with a Generative Model for Retouching Noisy Newspaper OCR
dc.type: Article

Files

Original bundle

Name: paper_1.pdf
Size: 809.87 KB
Format: Adobe Portable Document Format