Foxon, FloeWaldispühl, MichelleMegyesi, Beáta2024-05-082024-05-0820241736-6305https://hdl.handle.net/10062/98469https://doi.org/10.58009/aere-perennius0094Numerous putative cryptograms remain unsolved. Some, including the Dorabella cryptogram, have been suggestedas hoaxes, i.e., some sort of gibberish with no meaningful underlying plaintext.The statistical properties of a putative cryptogram may be modelled to determine whether the cryptogram groups moreclosely with real or with randomly generated plaintext. Ten thousand plaintexts from an English-language corpus, and ten thousand (pseudo-)randomly generated English-alphabet gibberish texts were studied through their statistical properties, including the alphabet length; the frequency, separation, and entropy of n-grams; the index of coincidence; Zipf’slaw, and mean associated contact counts. An artificial neural network (deep learning) model was fitted to these data, with a cross-validated mean accuracy of 99.8% (standard deviation: 0.1%). This model correctly predicted that arbitrary, out-of-sample simple substitution ciphers represented meaningful English plaintext (as opposed to gibberish) with probabilities close to 1; correctly predicted that arbitrary, out-of-sample gibberish texts were gibberish (as opposed to simple substitution ciphers) with probabilities close to 1; and assigned a probability of meaningful English plaintext of 0.9996 to the Dorabella cryptogram.enAttribution-NonCommercial-NoDerivatives 4.0 InternationalMachine learningSimple substitution cipherHoaxDorabella cryptogramArtificial neural network for hoax cryptogram identificationArticle