Browse by Author "Foxon, Floe"

Now showing 1 - 3 of 3

Open access
Artificial neural network for hoax cryptogram identification
(Tartu University Library, 2024) Foxon, Floe; Waldispühl, Michelle; Megyesi, Beáta
Numerous putative cryptograms remain unsolved. Some, including the Dorabella cryptogram, have been suggested as hoaxes, i.e., some sort of gibberish with no meaningful underlying plaintext. The statistical properties of a putative cryptogram may be modelled to determine whether the cryptogram groups more closely with real or with randomly generated plaintext. Ten thousand plaintexts from an English-language corpus, and ten thousand (pseudo-)randomly generated English-alphabet gibberish texts were studied through their statistical properties, including the alphabet length; the frequency, separation, and entropy of n-grams; the index of coincidence; Zipf’s law, and mean associated contact counts. An artificial neural network (deep learning) model was fitted to these data, with a cross-validated mean accuracy of 99.8% (standard deviation: 0.1%). This model correctly predicted that arbitrary, out-of-sample simple substitution ciphers represented meaningful English plaintext (as opposed to gibberish) with probabilities close to 1; correctly predicted that arbitrary, out-of-sample gibberish texts were gibberish (as opposed to simple substitution ciphers) with probabilities close to 1; and assigned a probability of meaningful English plaintext of 0.9996 to the Dorabella cryptogram.
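Two of the features named in the abstract, the index of coincidence and n-gram entropy, can be illustrated with a minimal Python sketch. The function names (`letters_only`, `index_of_coincidence`, `ngram_entropy`) and the choice to keep only the letters A to Z are assumptions made for illustration; the abstract does not describe the authors' actual feature-extraction code.

```python
import math
from collections import Counter


def letters_only(text):
    """Keep only A-Z (an assumed preprocessing step, not taken from the paper)."""
    return [c for c in text.upper() if "A" <= c <= "Z"]


def index_of_coincidence(text):
    """Probability that two symbols drawn without replacement are identical."""
    symbols = letters_only(text)
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def ngram_entropy(text, n=1):
    """Shannon entropy (in bits) of the empirical n-gram distribution."""
    symbols = letters_only(text)
    grams = ["".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    total = len(grams)
    if total == 0:
        return 0.0
    counts = Counter(grams)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())
```

Both statistics depend only on how often symbols repeat, not on which symbols they are, so a simple substitution cipher leaves them unchanged. That invariance is one plausible reason such features can separate enciphered English from gibberish.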
Open access
Machine learning for text classification in classical cryptography
(Tartu University Library, 2025) Foxon, Floe; Antal, Eugen; Marák, Pavol
This study furthers previous work on text classification to distinguish between ciphertext and gibberish. The statistical/linguistic properties of four text types were studied: meaningful English text, and three gibberish types (n=1,250 each; total N=5,000). Dimension reduction techniques (PCA, t-SNE, and UMAP) were used to reduce the statistical/linguistic feature space of the texts to two dimensions, revealing distinct regions of (lower dimensional) feature space occupied by each text, with some overlap. Machine learning models including random forests, neural networks (NNs), and support vector machines (SVMs) were used to classify the four text types based on their statistical/linguistic properties. Nested cross-validation revealed better generalization performance for the NNs and SVMs, classifying texts with >90% accuracy. Applied to the Dorabella cryptogram, the models suggest that this text resembles meaningful English text more closely than gibberish types, which comports with the Dorabella cryptogram as a monoalphabetic substitution cipher, but this classification should be interpreted with caution. Features that better separate meaningful English from English-like gibberish are needed, and other encryption schemes/cryptograms should be explored with these methods.
Open access
Statistical Tests for Randomness on a Typewritten Key Stream Extracted With Computer Vision and Classified With a Convolutional Neural Network
(Tartu University Library, 2026-06-22) Foxon, Floe; Desenclos, Camille; Pierrot, Cécile
For a key stream to be cryptographically secure, it must be sufficiently random (i.e., unpredictable). This study tested the randomness of a set of typewritten, WW2-era German diplomatic key stream tables. Character objects were extracted from images of the tables using computer vision, and a bespoke convolutional neural network (convnet) was trained to classify these objects as digits (from 0–9). The convnet had a mean cross-validated testing balanced accuracy of 93.7% (standard deviation: 0.7%). N = 74,979 digits were extracted and classified from the images. Randomness was tested with the arithmetic mean, chi-squared, runs, and Monte Carlo pi tests; the key stream failed all four tests with 95% confidence. One digit appeared to be over-represented, and two others under-represented in the tables. Analysis suggests that the underrepresented digits may be a simple artefact of computer vision error/bias, but the overrepresented digit did not appear to have resulted from computer vision and/or classification error/bias. Reference streams generated with the Mersenne Twister and Linux OS entropy passed all four tests. WW2-era German diplomatic key stream tables may have lacked randomness. The extent to which this could potentially be exploited by cryptanalysts is unknown.
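Two of the four tests named in the abstract, the chi-squared and Monte Carlo pi tests, can be sketched in Python for a stream of decimal digits. This is an illustration under stated assumptions, not the authors' procedure: the function names, the grouping of three digits per coordinate (`width=3`), and the use of a fixed critical value are all hypothetical choices. The constant 16.919 is the standard chi-squared critical value for 9 degrees of freedom at the 5% level.

```python
import random

# Chi-squared critical value, 9 degrees of freedom, alpha = 0.05.
CHI2_CRIT_DF9_P05 = 16.919


def chi_squared_uniform(digits):
    """Chi-squared statistic for digits 0-9 against a uniform distribution."""
    expected = len(digits) / 10
    counts = [0] * 10
    for d in digits:
        counts[d] += 1
    return sum((c - expected) ** 2 / expected for c in counts)


def monte_carlo_pi(digits, width=3):
    """Estimate pi from the share of (x, y) points inside the unit quarter circle.

    Each coordinate is built from `width` consecutive digits (an assumed
    grouping) and mapped to the midpoint of its cell in [0, 1).
    """
    scale = 10 ** width
    coords = []
    for i in range(0, len(digits) - width + 1, width):
        value = 0
        for d in digits[i:i + width]:
            value = value * 10 + d
        coords.append((value + 0.5) / scale)
    pairs = list(zip(coords[0::2], coords[1::2]))
    if not pairs:
        raise ValueError("not enough digits to form one (x, y) pair")
    inside = sum(1 for x, y in pairs if x * x + y * y <= 1.0)
    return 4 * inside / len(pairs)


def reference_stream(n, seed=0):
    """Pseudo-random digits from Python's `random` (a Mersenne Twister)."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(n)]
```

Under these assumptions, a stream would fail the chi-squared test at the 5% level when `chi_squared_uniform(stream) > CHI2_CRIT_DF9_P05`, and a Monte Carlo estimate far from pi would point to structure in the digit sequence. An over-represented digit, as reported for the key stream tables, raises the chi-squared statistic directly.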