July 16, 2010

One book that I have on my bookshelf is Lost Languages by Andrew Robinson (McGraw-Hill, 2002). I bought my copy, a hardcover first edition, in a used bookstore at a great price. It's a shame that used bookstores will not exist in just a few years, and that brick-and-mortar bookstores will not exist just a few years thereafter. What will really hurt is when paper books have disappeared. A lot of information is becoming more fragile, since it's now just magnetic blips, microscopic optical smudges on plastic, or fleeting electrons in a piece of silicon. Perhaps all traces of our language and scholarship will have disappeared just a few centuries hence.

One chapter in the Lost Languages book is devoted to the Phaistos Disk, a fired clay disk about six inches in diameter that's completely covered on both sides with stamped symbols arrayed in a spiral from center to edge. The disk is about 3,500 years old, and it was found in the Minoan palace of Phaistos. The language is unknown, although there are similarities to Linear A and Linear B. No other examples of the Phaistos script have been found, so there's no point of reference, and it's unlikely that a translation can be made. Not so for Ugaritic, a language that has been automatically deciphered in a computer-assisted comparison to Hebrew.

Both sides of the Phaistos Disk (photo by Maksim).

No, a computer wasn't the first to decipher this dead language. Ugaritic inscriptions were discovered in 1928, and because of this language's similarity to Hebrew, it was manually deciphered in 1932 using common techniques. The manual decipherment was important; otherwise, the computer people wouldn't know whether their program was working. The key to language decipherment is finding cognates. Cognates are words in different languages that have a common etymological origin. Examples are the English "silver" and German "silber," or the Latin "argentum," French "argent," and Italian "argento."
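The cognate examples above look alike on the page, and that resemblance can be given a rough number. The following is a minimal sketch that scores spelling similarity with Python's standard difflib module. It is only an illustration of the idea of "looking related," not the method used in the paper, and the helper names similarity and best_match are my own. Real cognates can differ far more in spelling than these examples do.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 spelling-similarity score, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(word: str, candidates: list[str]) -> str:
    """Pick the candidate whose spelling is closest to `word`."""
    return max(candidates, key=lambda c: similarity(word, c))


if __name__ == "__main__":
    # Word pairs taken from the cognate examples in the text.
    for a, b in [("silver", "silber"),
                 ("argentum", "argent"),
                 ("argentum", "argento"),
                 ("silver", "argent")]:
        print(f"{a:>9} ~ {b:<8} {similarity(a, b):.2f}")

    # Which of the other words is the likeliest cognate of "silver"?
    print(best_match("silver", ["silber", "argent", "argento"]))
```

The true cognate pairs score high, while the unrelated pair "silver" and "argent" scores low. A surface measure like this works only when both languages use the same alphabet, which is exactly what a lost script does not give you.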

A paper describing the computer approach, "A Statistical Model for Lost Language Decipherment," by Benjamin Snyder and Regina Barzilay of MIT, along with Kevin Knight of USC, will be presented at the Annual Meeting of the Association for Computational Linguistics, July 11-16, 2010.[1-2] The authors used a computer model of Bayesian inference to statistically compare Ugaritic with Hebrew. As they write in their abstract,
"When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words which have cognates in Hebrew."
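To get a feel for what "mapping letters to their counterparts" means, here is a toy sketch. It is emphatically not the Bayesian model from the paper. It is a brute-force search, under the assumption of a tiny made-up alphabet, for the letter substitution that turns the most unknown words into words of a known lexicon. The lexicon, the "lost" words, and the function names score and best_mapping are all hypothetical.

```python
from itertools import permutations


def score(mapping, cipher_words, lexicon):
    """Count unknown words that become lexicon words after substitution."""
    return sum("".join(mapping[ch] for ch in w) in lexicon
               for w in cipher_words)


def best_mapping(cipher_words, lexicon):
    """Try every one-to-one letter assignment and keep the best-scoring one.

    Exhaustive search is only feasible for toy alphabets; the number of
    assignments grows factorially with alphabet size.
    """
    src = sorted(set("".join(cipher_words)))   # letters of the unknown script
    dst = sorted(set("".join(lexicon)))        # letters of the known language
    best, best_score = None, -1
    for perm in permutations(dst, len(src)):
        mapping = dict(zip(src, perm))
        s = score(mapping, cipher_words, lexicon)
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score


if __name__ == "__main__":
    # Hypothetical known-language lexicon (invented for this example).
    lexicon = {"abba", "cab", "bad", "dab"}
    # Hypothetical "lost" words: abba, cab, bad written with other symbols.
    cipher_words = ["xyyx", "zxy", "yxw"]
    mapping, hits = best_mapping(cipher_words, lexicon)
    print(mapping, hits)
```

With only four letters there are just 24 assignments to try. A 30-letter alphabet makes exhaustive search hopeless, which is one reason a statistical approach is needed for a real language.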
Surprisingly, automated computer analysis is not common in attempts to decipher ancient languages. Linguists rely mostly on their intuition, but Snyder et al. have demonstrated that much of this intuition can be coded into a computer program, especially when a relationship is suspected between the lost language and a known language. A 1999 attempt by others using a Hidden Markov Model character substitution cipher correctly translated only 29% of the cognates. Unfortunately, only one-third of Ugaritic words have Hebrew cognates, so even 60% is not that good. It reminds me of The Dukes of Hazzard television episode in which Boss Hogg offers Sheriff Rosco P. Coltrane a cut of profits that was "ten percent of ten percent."
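The arithmetic behind "not that good" is a product of two fractions. Here is a small worked example, taking the one-third figure and the 60% figure at face value as round numbers, with Boss Hogg's offer thrown in for comparison. The variable names are my own.

```python
from fractions import Fraction

# Figures quoted in the text, treated here as exact round numbers.
cognate_share = Fraction(1, 3)        # Ugaritic words that have Hebrew cognates
cognate_accuracy = Fraction(60, 100)  # cognates the model identifies correctly

# Share of all Ugaritic words for which the right Hebrew cognate is found.
overall = cognate_share * cognate_accuracy
print(f"Correct cognate for {float(overall):.0%} of all words")  # 20%

# Boss Hogg's offer: ten percent of ten percent.
boss_hogg_cut = Fraction(10, 100) * Fraction(10, 100)
print(f"Rosco's cut: {float(boss_hogg_cut):.0%} of the profits")  # 1%
```

So the model recovers a Hebrew cognate for roughly one word in five. That's still twenty times better than Rosco's deal.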

The authors point out that Google's translation program works for only 57 languages (the Heinz subset?). Applying the principal features of their approach might allow an extension to thousands of languages. Personally, I'd like a translation of the Voynich Manuscript.[3]


  1. Benjamin Snyder, Regina Barzilay and Kevin Knight, "A Statistical Model for Lost Language Decipherment," to appear, Annual Meeting of the Association for Computational Linguistics, July 11-16, 2010.
  2. Larry Hardesty, "Computer automatically deciphers ancient language," MIT Press Release, June 30, 2010.
  3. Voynich Manuscript Web Site.
