Bruton, Micaella; Megyesi, Beáta; Antal, Eugen; Marák, Pavol
2025-05-16
2025-05-16
2025
1736-6305
https://hdl.handle.net/10062/109741
Ciphertext manuscripts found in archival collections are often intermingled with plaintext manuscripts in various languages, making the manual analysis required to match the documents labour-intensive and complex. Automating the alignment of these texts to reconstruct corresponding cipher keys is therefore highly beneficial, particularly when handling large volumes of documents. This study introduces a novel approach using modern neural networks, specifically Long Short-Term Memory (LSTM) architectures, to develop an automated method for aligning homophonic substitution ciphertexts with plaintext. These neural models are compared to traditional statistical approaches, demonstrating that LSTMs achieve significant accuracy improvements, including perfect alignment for ciphertexts of 50 characters or less. Additionally, to facilitate practical application, a program has been developed to enable the upload of transcribed ciphertext and plaintext documents, using the optimized models to automatically align the texts and extract the substitution key.
en
Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by/4.0/
Ciphertext alignment; Plaintext alignment; Historical cryptanalysis; Neural cryptanalysis; Homophonic substitution ciphers; Long Short-Term Memory (LSTM); Text alignment; Automated key extraction; Historical manuscripts; Computational cryptanalysis; Sequence-to-sequence models
From Statistics to Neural Networks: Enhancing Ciphertext-Plaintext Alignment in Historical Substitution Ciphers for Automatic Key Extraction
Article