Proceedings of the 8th International Conference on Historical Cryptology (HistoCrypt 2025)
Permanent URI for this collection: https://hdl.handle.net/10062/109727
Recently added

Solving Anagrams with Integer Linear Programming (Tartu University Library, 2025) Zajac, Pavol; Selep, Tomáš; Antal, Eugen; Antal, Eugen; Marák, Pavol
Given some sequence of letters, an anagram is formed by changing their order to create a different text. In a historical context, anagrams were popular mainly as puzzles, but they are also connected to classical transposition ciphers. To solve an anagram means to rearrange the letter sequence into a form that is acceptable as a word or sentence in some language. In this article, we formalize the anagram solving problem. We focus on anagrams under a simplified language model based on fixed dictionaries. We study the applicability of known methods for this problem. We propose a method of anagram solving based on integer linear programming. The new method is not strictly superior to existing methods but provides new tools to tackle the problem. The new representation shows potential for integration with Word2Vec representation of words for finding potentially meaningful anagrams in natural languages.

A Florentine ‘polyalphabetic’ cipher in the 15th century (Tartu University Library, 2025) Vito, Marco; Antal, Eugen; Marák, Pavol
The 15th century in Italy was a period of revolution in cryptography. Leon Battista Alberti developed the first western polyalphabetic cipher, while the monoalphabetic system spread throughout the peninsula.
The aim of this study is to present a never-before-published late medieval 15th-century Florentine polyalphabetic cipher, explain its functioning, and shed light on a system (specifically the polyalphabetic cipher) that, although seemingly unused during the 15th century, was in fact employed in Florentine diplomacy.

Solving a 750-Letter General Bigram Substitution Challenge (Tartu University Library, 2025) Schmeh, Klaus; Dunin, Elonka; Van Eycke, Jarl; Helm, Louie; Antal, Eugen; Marák, Pavol
The general bigram substitution cipher is an encryption method originating in the Renaissance. It operates using a substitution table that maps each possible letter pair (bigram) to a unique replacement. While conceptually straightforward, this cipher is notably challenging to break, particularly when dealing with short ciphertexts. To inspire further research, one of the authors initiated a bigram substitution challenge featuring a 750-character ciphertext. In this paper, we present the solution to that challenge, achieved by two other authors using a hill climbing algorithm combined with a scoring function based on 8-gram (eight-letter sequence) frequencies. Since no prior 8-gram frequency statistics existed for the English language, one of the authors developed a comprehensive dataset by analyzing 2 terabytes of text, including 5.8 million books and the entire content of Wikipedia. This achievement, to our knowledge, marks the shortest bigram substitution ciphertext ever successfully decrypted.
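
As a rough illustration of the substitution table described above, the following minimal Python sketch builds a random one-to-one mapping over all 676 letter pairs and applies it to a short message. It is not the authors' challenge cipher or their solver: the function names, the seed, the sample message, and the "X" padding rule are all assumptions made for this example.

```python
import random
from itertools import product
from string import ascii_uppercase


def make_bigram_table(seed):
    """Build a random one-to-one mapping over all 26 * 26 = 676 letter pairs."""
    bigrams = ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]
    shuffled = bigrams[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(bigrams, shuffled))


def substitute(text, table):
    """Replace each consecutive, non-overlapping letter pair using the table."""
    letters = [c for c in text.upper() if c in ascii_uppercase]
    if len(letters) % 2:
        letters.append("X")  # assumed padding convention for odd-length texts
    pairs = ("".join(letters[i:i + 2]) for i in range(0, len(letters), 2))
    return "".join(table[pair] for pair in pairs)


# Hypothetical key and message, chosen only for illustration.
table = make_bigram_table(seed=2025)
inverse = {cipher: plain for plain, cipher in table.items()}

ciphertext = substitute("ATTACK AT DAWN", table)
plaintext = substitute(ciphertext, inverse)
print(ciphertext, plaintext)
```

Because the key is a permutation of 676 bigrams rather than 26 letters, a short ciphertext exposes only a small part of the table, which is consistent with the difficulty the abstract describes.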
Furthermore, we propose a new challenge based on a 600-character ciphertext and invite readers to tackle it, setting the stage for future advancements in this field.

A Caribbean Directory-based Encryption during the American War of Independence (Tartu University Library, 2025) Pierrot, Cécile; Chaline, Olivier; Damoiseau-Malraux, Gaspard; Mekhail, Paul; Perret, Ludovic; Antal, Eugen; Marák, Pavol
The corpus of letters we are studying is located at the Archives Nationales d'Outre-Mer in Aix-en-Provence, France. These late 18th-century letters come from Saint Domingue (now Haiti), a French colony in the Caribbean Sea of which Bellecombe, the author, was governor. They were written in the context of the American War of Independence, in which France took part on the side of the Americans. We have reconstructed Bellecombe's correspondence with the Secretary of State for the Navy, in Versailles: the archives contain hundreds of letters in clear and three encrypted letters, including some clear/cipher pages that were our lever for reconstructing part of the key, and 96% of the encrypted letter that was opaque at first. From a cryptanalytical point of view, Bellecombe used a directory-based encryption. The common use of this type of cipher in 17th- and 18th-century European countries raises the question of the method to be used (then as now!) to decode such messages.

DECODE2LOD: Connecting the DECODE Database with the Linked Open Data Cloud (Tartu University Library, 2025) Palma, Cosimo; Megyesi, Beáta; Antal, Eugen; Marák, Pavol
This paper presents a novel approach to enhancing the analytical power and interoperability of historical cryptology data by transforming the DECODE database into a Linked Open Data (LOD) resource.
We introduce a methodology for modeling encrypted historical documents and cipher keys as a knowledge graph, encompassing ontology development, data transformation, and SPARQL-based querying. This integration enables complex queries across domains, encourages collaboration beyond cryptology, and aligns DECODE with broader efforts in digital humanities and open science. By bridging historical cryptology with LOD principles, we offer a scalable framework for enriching specialized research databases through semantic technologies.

A new attack on the mysterious inscription of Santa Maria La Nova (Tartu University Library, 2025) Palma, Cosimo; Bonavoglia, Paolo; Rugova, Yll; Antal, Eugen; Marák, Pavol
Expanding upon the established hypothesis of monoalphabetic substitution with potential transposition and polyalphabetic elements, this analysis of the Santa Maria la Nova epigraph incorporates Ancient Greek, Old Church Slavonic, Old Romanian and Old Albanian, thus exploring the possibility that the cipher’s plaintext derives from historically under-examined languages, particularly those with cultural and historical ties to medieval Naples and its Eastern Mediterranean networks. Special attention is given to aligning the analyzed corpora with the epigraph's actual textual rendering and to the evaluation of multilingualism.

Decipherment of Historical Manuscripts with Unknown or Rare Writings: The DESCRYPT Project (Tartu University Library, 2025) Megyesi, Beáta; Fornés, Alicia; Héder, Mihály; Heil, Raphaela; Kopal, Nils; Láng, Benedek; Rattenborg, Rune; Waldispühl, Michelle; Antal, Eugen; Marák, Pavol
We present a newly funded research program, DESCRYPT, aimed at deciphering and analyzing historical texts with rare or unknown scripts.
The project leverages advancements in computational linguistics, artificial intelligence (AI), and image processing, alongside traditional philological methods, to develop innovative tools for transcription, recognition, and interpretation of historical writings with rare/unknown scripts, including ciphertexts. By integrating interdisciplinary expertise, DESCRYPT addresses the challenges posed by complex and undeciphered texts, preserving and unlocking the secrets of our shared cultural heritage.

A Typology of Pseudo-Cryptology (Tartu University Library, 2025) Láng, Benedek; Antal, Eugen; Marák, Pavol
Cipher and code systems can be classified in many ways, with numerous typologies available for organizing both modern and historical cryptographic systems based on their structure. In this article, I propose a different type of typology. I organize various ciphers and codes into a system based on the confirmability of their alleged or actual solutions. This approach places side by side ciphers (e.g., monoalphabetic and polyalphabetic) that would otherwise seem far apart in terms of encoding techniques, and it highlights methods (e.g., book ciphers) that typically do not play a central role in cryptology classifications. This typology becomes useful when attempting to navigate the flood of sensational new cipher-breaking claims that surface weekly in popular media, helping to form a preliminary opinion on whether a proposed solution is arbitrary and unfounded or well-grounded and deserving of professional trust.

Antonio Elio “Cipher” and his Polyphonic-Syllabic Cipher (Tartu University Library, 2025) Lasry, George; Biermann, Norbert; Simonetta, Marcello; Antal, Eugen; Marák, Pavol
Antonio Elio (Helius) (1506–1576) was a Roman Catholic prelate who served as Bishop of Capodistria and Pola and Titular Patriarch of Jerusalem.
Also a prolific cryptographer in the service of Pope Paul III, he is credited with the invention of polyphonic ciphers. In this article, we provide an overview of his career and work in cryptography and describe an ingenious polyphonic-syllabic cipher he designed. Although several matching plaintext-ciphertext segments were available, reconstructing the cipher key required a significant and unusual amount of time, underscoring the cipher’s high level of security. Ciphertext-only cryptanalysis for such a cipher would be extremely difficult and nearly impossible, even with modern computing, without prior knowledge of the principles of its complex design.

Overview of Ciphers Used by the Czechoslovak "Maffie" (Tartu University Library, 2025) Krajčovič, Jozef; Antal, Eugen; Antal, Eugen; Marák, Pavol
This paper provides an overview of encryption systems and steganographic techniques used by the Czechoslovak Maffie, which was an anti-Austrian underground resistance organization in 1914-1918.

Dutch Cryptanalysis of Four American Diplomatic Codes in World War I (Tartu University Library, 2025) van Kampen, Florentijn; Antal, Eugen; Marák, Pavol
During the First World War, the Netherlands carefully maintained a neutral position. To guard this neutrality, the Dutch authorities monitored the activities of the belligerent surrounding countries. International telecommunications via telephone and telegraph were closely monitored and censored by censorbureaus. In 2019, the Dutch intelligence and security service released a dossier about these censorbureaus to the Dutch National Archive. In that dossier, a previously unknown history of two groups of pioneering codebreakers based at the censorbureaus in Amsterdam and Rotterdam was uncovered. In 2024, a first publication appeared about this dossier, with particular emphasis on how the local staff successfully broke German codes.
Additionally, the Dutch codebreakers successfully broke four American diplomatic codes between June and December 1918. This breakthrough enabled Dutch intelligence to monitor secret diplomatic traffic between American officials in the Netherlands and Washington during and after World War I. This paper examines the systematic cryptanalysis of U.S. Department of State communications by Dutch codebreakers. Through analysis of original documents and surviving codebooks, it identifies the compromised diplomatic codes and places these findings in a broader historical perspective.

Machine learning for text classification in classical cryptography (Tartu University Library, 2025) Foxon, Floe; Antal, Eugen; Marák, Pavol
This study furthers previous work on text classification to distinguish between ciphertext and gibberish. The statistical/linguistic properties of four text types were studied: meaningful English text, and three gibberish types (n=1,250 each; total N=5,000). Dimension reduction techniques (PCA, t-SNE, and UMAP) were used to reduce the statistical/linguistic feature space of the texts to two dimensions, revealing distinct regions of (lower dimensional) feature space occupied by each text, with some overlap. Machine learning models including random forests, neural networks (NNs), and support vector machines (SVMs) were used to classify the four text types based on their statistical/linguistic properties. Nested cross-validation revealed better generalization performance for the NNs and SVMs, classifying texts with >90% accuracy. Applied to the Dorabella cryptogram, the models suggest that this text resembles meaningful English text more closely than gibberish types, which comports with the Dorabella cryptogram being a monoalphabetic substitution cipher, but this classification should be interpreted with caution.
Features that better separate meaningful English from English-like gibberish are needed, and other encryption schemes/cryptograms should be explored with these methods.

Playfair crib validation as a constraint satisfaction problem (Tartu University Library, 2025) Ekhall, Magnus; Antal, Eugen; Marák, Pavol
This paper shows how a crib for a Playfair-enciphered message can be seen as a constraint satisfaction problem. This problem can then be solved with standard constraint programming tools. The solution will tell whether the crib is possible and, if so, can list all possible Playfair keys that would result in the given crib.

Cryptanalytic and historical challenges with unidentified encrypted documents from the early modern era (Tartu University Library, 2025) Desenclos, Camille; Lasry, George; Antal, Eugen; Marák, Pavol
In most cases, historical encrypted documents include some parts in cleartext, such as headers, dates, signatures, or addresses, which allows the origin, date, and language of these documents to be established. An attached decrypted text, similar documents (same encryption, homogeneity of date or origin) in the same volume or box, or the catalog description may assist in that process. However, in a few cases, none of these are available, posing several challenges both from cryptanalytic and historical perspectives. Based on three 16th-century case studies, this paper aims to discuss a multidisciplinary method to proceed from an unidentified encrypted document to a workable transcription – decipherment and identification.

Practical and Organisational Factors in the Development History of the Typex Cipher Machine and its Use at Bletchley Park (Tartu University Library, 2025) Cheetham, Thomas; Antal, Eugen; Marák, Pavol
The Typex was Britain’s main cipher machine during the Second World War.
The best-described Typex models are the Mark II and the compact Marks III and VI. However, there remain gaps in the Typex ‘family tree’. This paper reviews the development history of Typex and describes several previously unknown models of Typex based on documents produced by the British Signals Intelligence agency, the Government Code and Cypher School, a major user of Typex while based at Bletchley Park during the Second World War. Although these models were not brought into widespread service, the documentation sheds useful light on the design process. The design of successive models of Typex, and their adoption or rejection, had less to do with cryptographic considerations than with the various mechanical and practical problems involved in designing a reliable cipher machine compatible with the communications systems used by the British state and armed forces.

From Statistics to Neural Networks: Enhancing Ciphertext-Plaintext Alignment in Historical Substitution Ciphers for Automatic Key Extraction (Tartu University Library, 2025) Bruton, Micaella; Megyesi, Beáta; Antal, Eugen; Marák, Pavol
Ciphertext manuscripts found in archival collections are often intermingled with plaintext manuscripts in various languages, making the manual analysis required to match the documents labour-intensive and complex. Automating the alignment of these texts to reconstruct corresponding cipher keys is therefore highly beneficial, particularly when handling large volumes of documents. This study introduces a novel approach using modern neural networks, specifically Long Short-Term Memory (LSTM) architectures, to develop an automated method for aligning homophonic substitution ciphertexts with plaintext. These neural models are compared to traditional statistical approaches, demonstrating that LSTMs achieve significant accuracy improvements, including perfect alignment for ciphertexts of 50 characters or fewer.
Additionally, to facilitate practical application, a program has been developed to enable the upload of transcribed ciphertext and plaintext documents, using the optimized models to automatically align the texts and extract the substitution key.

New records for Playfair solutions (Tartu University Library, 2025) Bean, Richard; Helm, Louie; Antal, Eugen; Marák, Pavol
We give solutions to the 24-letter and 22-letter Playfair challenges proposed in Dunin et al. (2022). A number of methods were tried, combining successful approaches of previous solvers and introducing new ideas, using both letter-level and word-level approaches. We used vanilla and positional n-gram models for n values from 6 up to 10. However, these did not greatly help to distinguish the intended solution from other high-scoring solutions. The most effective discriminative approach involved using a multi-terabyte-scale, unpruned large language model from Buck, Heafield, and Van Ooyen (2014), which moved the solution in each case into the top 5,000 ranked possibilities.

Enhancing Classical Cipher Type Detection: Prompt Engineering with Common LLMs versus Usage of Custom AI Models (Tartu University Library, 2025) Bastian, Maik; Esslinger, Bernhard; Hermann, Eckehard; Kopal, Nils; Lampesberger, Harald; Antal, Eugen; Marák, Pavol
In the field of cryptography, identifying the type of cipher used in an encrypted message is crucial to effective cryptanalysis. Thus far, from a machine learning perspective, this classification problem has been tackled using specifically designed models, such as the Neural Cipher Identifier (NCID), which require data generation and model training capabilities. The recent advent of Large Language Models (LLMs) raises the following question: Can this classification problem be approached more effectively through prompt engineering?
This paper explores various generic strategies for prompt engineering, such as chain-of-thought and in-context learning, by evaluating thousands of generated prompts for classical ciphers using open-source LLMs (on an Nvidia DGX system) and ChatGPT (via a browser interface and API). The classification accuracies achieved through these prompting techniques are compared with those obtained by NCID. Although our findings indicate that NCID still significantly outperforms the use of LLMs for cipher-type detection, the latter offers a more accessible approach to cryptography tasks. Both methods can benefit from domain-specific knowledge in cryptanalysis, highlighting the importance of expert input in improving initial classifications and handling complex cipher types.

Proceedings of the 8th International Conference on Historical Cryptology (HistoCrypt 2025) (Tartu University Library, 2025) Antal, Eugen; Marák, Pavol