Artificial neural network for hoax cryptogram identification
dc.contributor.author | Foxon, Floe | |
dc.contributor.editor | Waldispühl, Michelle | |
dc.contributor.editor | Megyesi, Beáta | |
dc.date.accessioned | 2024-05-08T11:30:25Z | |
dc.date.available | 2024-05-08T11:30:25Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Numerous putative cryptograms remain unsolved. Some, including the Dorabella cryptogram, have been suggested as hoaxes, i.e., some sort of gibberish with no meaningful underlying plaintext. The statistical properties of a putative cryptogram may be modelled to determine whether the cryptogram groups more closely with real or with randomly generated plaintext. Ten thousand plaintexts from an English-language corpus, and ten thousand (pseudo-)randomly generated English-alphabet gibberish texts were studied through their statistical properties, including the alphabet length; the frequency, separation, and entropy of n-grams; the index of coincidence; Zipf’s law; and mean associated contact counts. An artificial neural network (deep learning) model was fitted to these data, with a cross-validated mean accuracy of 99.8% (standard deviation: 0.1%). This model correctly predicted that arbitrary, out-of-sample simple substitution ciphers represented meaningful English plaintext (as opposed to gibberish) with probabilities close to 1; correctly predicted that arbitrary, out-of-sample gibberish texts were gibberish (as opposed to simple substitution ciphers) with probabilities close to 1; and assigned a probability of meaningful English plaintext of 0.9996 to the Dorabella cryptogram. | |
dc.identifier.issn | 1736-6305 | |
dc.identifier.uri | https://hdl.handle.net/10062/98469 | |
dc.identifier.uri | https://doi.org/10.58009/aere-perennius0094 | |
dc.language.iso | en | |
dc.publisher | Tartu University Library | |
dc.relation.ispartofseries | NEALT Proceedings Series 53 | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Machine learning | |
dc.subject | Simple substitution cipher | |
dc.subject | Hoax | |
dc.subject | Dorabella cryptogram | |
dc.title | Artificial neural network for hoax cryptogram identification | |
dc.type | Article |
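
Two of the statistical properties named in the abstract, the index of coincidence and the entropy of n-grams, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names `index_of_coincidence` and `ngram_entropy` are hypothetical, and the choice to keep only alphabetic characters and fold case is an assumption made for the illustration.

```python
import math
from collections import Counter


def _letters(text: str) -> list[str]:
    # Assumption for this sketch: keep alphabetic characters only, case-folded.
    return [c for c in text.upper() if c.isalpha()]


def index_of_coincidence(text: str) -> float:
    """Probability that two symbols drawn without replacement are identical."""
    symbols = _letters(text)
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def ngram_entropy(text: str, n: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping n-gram distribution."""
    symbols = _letters(text)
    grams = ["".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())
```

Both statistics depend only on how often symbols repeat, not on which symbols they are, so a simple substitution cipher leaves them unchanged. That invariance is what makes features of this kind usable for comparing a ciphertext against plaintext-like and gibberish-like texts.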