Open Access

Transactions of the Association for Computational Linguistics is Open Access. All content is freely available in electronic format (Full text HTML, PDF, and PDF Plus) to readers across the globe. All articles are published under a CC BY 4.0 license. For more information on allowed uses, please view the CC license.

Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It achieves an average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report results on the Voynich manuscript, an unsolved fifteenth-century cipher, which suggest Hebrew as the language of the document.

Bradley Hauer
Department of Computing Science, University of Alberta, Edmonton, Canada
Grzegorz Kondrak
Department of Computing Science, University of Alberta, Edmonton, Canada
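
To make the two cipher types named in the abstract concrete, here is a minimal Python sketch of a monoalphabetic substitution cipher and of anagramming within words. It is an illustration under stated assumptions, not the authors' implementation: the function names (`make_key`, `encipher`, `anagram_words`, `frequency_profile`, `word_pattern`) are hypothetical, the alphabet is restricted to lowercase ASCII, and the two statistics shown are generic examples of signals that survive encipherment, not necessarily any of the three methods the paper proposes.

```python
import random
import string
from collections import Counter


def make_key(rng):
    """Build a random monoalphabetic key: a one-to-one letter mapping."""
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def encipher(text, key):
    """Replace each letter by its key image; leave other characters as-is."""
    return "".join(key.get(ch, ch) for ch in text)


def anagram_words(text, rng):
    """Arbitrarily transpose the letters inside each space-separated word."""
    words = []
    for word in text.split(" "):
        letters = list(word)
        rng.shuffle(letters)
        words.append("".join(letters))
    return " ".join(words)


def frequency_profile(text):
    """Letter counts sorted from most to least frequent, identities discarded.

    This profile is unchanged by substitution and by anagramming.
    """
    counts = Counter(ch for ch in text if ch.isalpha())
    return sorted(counts.values(), reverse=True)


def word_pattern(word):
    """Repetition pattern of a word, e.g. "hello" gives (0, 1, 2, 2, 3).

    This pattern is unchanged by substitution but not by anagramming.
    """
    ids = {}
    return tuple(ids.setdefault(ch, len(ids)) for ch in word)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    key = make_key(rng)
    plain = "the quick brown fox jumps over the lazy dog"
    cipher = encipher(plain, key)
    scrambled = anagram_words(cipher, rng)
    print(cipher)
    print(scrambled)
    print(frequency_profile(plain) == frequency_profile(scrambled))
```

The sketch shows why the anagrammed setting is harder: a plain substitution cipher preserves both the sorted frequency profile and each word's repetition pattern, while anagramming preserves only the multiset of letters in each word.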
