Artificial neural network for hoax cryptogram identification

Foxon, Floe

dc.contributor.author	Foxon, Floe
dc.contributor.editor	Waldispühl, Michelle
dc.contributor.editor	Megyesi, Beáta
dc.date.accessioned	2024-05-08T11:30:25Z
dc.date.available	2024-05-08T11:30:25Z
dc.date.issued	2024
dc.description.abstract	Numerous putative cryptograms remain unsolved. Some, including the Dorabella cryptogram, have been suggested as hoaxes, i.e., some sort of gibberish with no meaningful underlying plaintext. The statistical properties of a putative cryptogram may be modelled to determine whether the cryptogram groups more closely with real or with randomly generated plaintext. Ten thousand plaintexts from an English-language corpus, and ten thousand (pseudo-)randomly generated English-alphabet gibberish texts were studied through their statistical properties, including the alphabet length; the frequency, separation, and entropy of n-grams; the index of coincidence; Zipf’s law; and mean associated contact counts. An artificial neural network (deep learning) model was fitted to these data, with a cross-validated mean accuracy of 99.8% (standard deviation: 0.1%). This model correctly predicted that arbitrary, out-of-sample simple substitution ciphers represented meaningful English plaintext (as opposed to gibberish) with probabilities close to 1; correctly predicted that arbitrary, out-of-sample gibberish texts were gibberish (as opposed to simple substitution ciphers) with probabilities close to 1; and assigned a probability of meaningful English plaintext of 0.9996 to the Dorabella cryptogram.
dc.identifier.issn	1736-6305
dc.identifier.uri	https://hdl.handle.net/10062/98469
dc.identifier.uri	https://doi.org/10.58009/aere-perennius0094
dc.language.iso	en
dc.publisher	Tartu University Library
dc.relation.ispartofseries	NEALT Proceedings Series 53
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Machine learning
dc.subject	Simple substitution cipher
dc.subject	Hoax
dc.subject	Dorabella cryptogram
dc.title	Artificial neural network for hoax cryptogram identification
dc.type	Article
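
The abstract lists the index of coincidence and n-gram entropy among the features studied. The sketch below is an illustration only, not the author's code: this record does not describe the feature extraction or the network, and the function names, the sample text, and the random seed are all assumptions. It computes three of the named statistics and shows why they suit the task: a simple substitution cipher only relabels symbols, so these statistics are the same for a plaintext and its ciphertext, while uniformly random gibberish will generally give different values.

```python
import math
import random
import string
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two symbols drawn without replacement are identical."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def unigram_entropy(text: str) -> float:
    """Shannon entropy, in bits per symbol, of the single-symbol distribution."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def features(text: str) -> dict:
    """Hypothetical feature dictionary covering three of the listed statistics."""
    return {
        "alphabet_length": len(set(text)),
        "index_of_coincidence": index_of_coincidence(text),
        "unigram_entropy": unigram_entropy(text),
    }


rng = random.Random(0)  # fixed seed chosen arbitrarily for repeatability
alphabet = string.ascii_lowercase

# Sample English text (letters only), adapted from the abstract's first sentences.
plaintext = (
    "numerousputativecryptogramsremainunsolvedsomeincludingthe"
    "dorabellacryptogramhavebeensuggestedashoaxes"
)

# Simple substitution cipher: a random one-to-one relabelling of the alphabet.
shuffled = list(alphabet)
rng.shuffle(shuffled)
ciphertext = plaintext.translate(str.maketrans(alphabet, "".join(shuffled)))

# Gibberish: letters drawn uniformly at random, same length as the plaintext.
gibberish = "".join(rng.choice(alphabet) for _ in range(len(plaintext)))

for label, text in [("plaintext", plaintext),
                    ("ciphertext", ciphertext),
                    ("gibberish", gibberish)]:
    print(label, features(text))
```

In a full pipeline, feature vectors like these would be computed for every labelled text and passed to a classifier; the network architecture used in the paper is not given in this record.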

Files

Original bundle

Name: Article_10.pdf
Size: 122.89 KB
Format: Adobe Portable Document Format

Collections

Proceedings of the 7th International Conference on Historical Cryptology (HistoCrypt 2024)