A machine learning pipeline for digitalising historical printed materials – from data collection to a searchable database

Laen...
Pisipilt

Kuupäev

Ajakirja pealkiri

Ajakirja ISSN

Köite pealkiri

Kirjastaja

University of Tartu Library

Abstrakt

Recent developments in the fields of machine learning and computer vision have created new opportunities for the digitalisation of printed historical materials. However, successful integration of machine learning models requires interdisciplinary collaboration between computer- and data scientists, researchers, librarians and/or archivists, and digitisation experts. This chapter describes a comprehensive pipeline designed to address the challenges of digitalising printed historical materials, from document-scanning best practices to incorporating state-of-the-art machine learning techniques. It aims to streamline the management and processing of historical data, making the digitalised materials accessible and searchable through the application of machine learning techniques. The content of this chapter encompasses scanning best practices, annotation approaches, model training, and deployment. This chapter presents a collection of useful tools for each stage of building a machine learning model, step-by-step instructions and example notebooks designed to be easily adapted to other cases.

Kirjeldus

Märksõnad

Viide