Artificial neural network for hoax cryptogram identification

dc.contributor.author: Foxon, Floe
dc.contributor.editor: Waldispühl, Michelle
dc.contributor.editor: Megyesi, Beáta
dc.date.accessioned: 2024-05-08T11:30:25Z
dc.date.available: 2024-05-08T11:30:25Z
dc.date.issued: 2024
dc.description.abstract: Numerous putative cryptograms remain unsolved. Some, including the Dorabella cryptogram, have been suggested to be hoaxes, i.e., some sort of gibberish with no meaningful underlying plaintext. The statistical properties of a putative cryptogram may be modelled to determine whether the cryptogram groups more closely with real or with randomly generated plaintext. Ten thousand plaintexts from an English-language corpus and ten thousand (pseudo-)randomly generated English-alphabet gibberish texts were studied through their statistical properties, including the alphabet length; the frequency, separation, and entropy of n-grams; the index of coincidence; Zipf’s law; and mean associated contact counts. An artificial neural network (deep learning) model was fitted to these data, with a cross-validated mean accuracy of 99.8% (standard deviation: 0.1%). This model correctly predicted that arbitrary, out-of-sample simple substitution ciphers represented meaningful English plaintext (as opposed to gibberish) with probabilities close to 1; correctly predicted that arbitrary, out-of-sample gibberish texts were gibberish (as opposed to simple substitution ciphers) with probabilities close to 1; and assigned a probability of meaningful English plaintext of 0.9996 to the Dorabella cryptogram.
dc.identifier.issn: 1736-6305
dc.identifier.uri: https://hdl.handle.net/10062/98469
dc.identifier.uri: https://doi.org/10.58009/aere-perennius0094
dc.language.iso: en
dc.publisher: Tartu University Library
dc.relation.ispartofseries: NEALT Proceedings Series 53
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Machine learning
dc.subject: Simple substitution cipher
dc.subject: Hoax
dc.subject: Dorabella cryptogram
dc.title: Artificial neural network for hoax cryptogram identification
dc.type: Article
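
Two of the statistical properties named in the abstract, the index of coincidence and the entropy of n-grams (here only single symbols, n = 1), can be illustrated with a minimal sketch. This is not the author's feature pipeline: the function names, the sample sentence, and the random seed are all assumptions made for illustration. The sketch also shows why such features suit simple substitution ciphers: relabelling the letters leaves both statistics unchanged, so a cipher text keeps the statistical profile of its plaintext.

```python
import math
import random
import string
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two symbols drawn without replacement are identical."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def unigram_entropy(text: str) -> float:
    """Shannon entropy, in bits per symbol, of the single-symbol distribution."""
    n = len(text)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def random_substitution(text: str, rng: random.Random) -> str:
    """Encipher lowercase text with a randomly shuffled alphabet (hypothetical helper)."""
    letters = string.ascii_lowercase
    shuffled = list(letters)
    rng.shuffle(shuffled)
    return text.translate(str.maketrans(letters, "".join(shuffled)))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

# Illustrative sample sentence (an assumption, not taken from the paper's corpus).
sample = (
    "the statistical properties of a text can help to separate meaningful "
    "language from random noise because natural language uses some letters "
    "far more often than others"
)
english = "".join(ch for ch in sample if ch in string.ascii_lowercase)
cipher = random_substitution(english, rng)
gibberish = "".join(rng.choice(string.ascii_lowercase) for _ in range(len(english)))

for label, text in [("english", english), ("cipher", cipher), ("gibberish", gibberish)]:
    print(f"{label:10s} IoC={index_of_coincidence(text):.4f} "
          f"H1={unigram_entropy(text):.3f} bits")
```

For uniformly random text over 26 letters the index of coincidence tends towards 1/26 (about 0.038), whereas English text typically sits noticeably higher. A classifier such as the one described in the abstract would combine many features of this kind rather than rely on any single one.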

Files

Original bundle

Name: Article_10.pdf
Size: 122.89 KB
Format: Adobe Portable Document Format