New AI Algorithm is Cracking Undeciphered Languages


Scientists have created a new algorithm to hunt down similarities in ancient languages and it’s set to untangle the mystery of all undeciphered languages.

According to a new MIT report, “most languages that have ever existed are no longer spoken.” The study of lost and “undeciphered languages” is exceptionally challenging because so few ancient records exist to assist common machine-translation tools and algorithms like Google Translate. Since nowhere near enough is understood about the grammar, vocabulary, or syntax of ancient languages, many texts remain undeciphered. Without decipherment, an entire body of knowledge about the people who spoke them has been inaccessible - until now, says the team from MIT.

Tracking the Evolution of Undeciphered Languages

The team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently created a new computer system that can “automatically decipher lost languages” without needing advance knowledge of their relation to other languages - including pauses, punctuation, and inflection. The system was also tested on its ability to automatically determine relationships between language groups, and the results of these tests suggested that the Iberian language of Spain is not related to Basque.

One of the five plaques of the Monumento a los Fueros (Paseo de Sarasate, Pamplona). This one was written in 1905 in the Basque language and in an adaptation of north-eastern Iberian script. (CC BY SA 3.0)

In a new paper on the project, which was partly funded by the Intelligence Advanced Research Projects Activity (IARPA), MIT professor Regina Barzilay explains that the system “relies on several principles grounded in insights from historical linguistics,” because languages evolve in predictable ways. Dr. Barzilay explains that languages rarely add or omit entire sounds, and that certain sound substitutions are more likely than others. For example, a word with a “p” sound in the parent language might evolve a “b” sound in the offspring languages, but because of the significant pronunciation gap, a “p” is less likely to become a “k”.
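To make the idea of likely and unlikely sound changes concrete, here is a minimal sketch in Python. It is not the MIT system: it is a plain weighted edit distance, and every cost in it (`SUB_COST`, `DEFAULT_SUB_COST`, `INDEL_COST`) is a hand-picked assumption chosen only to mirror the two principles above, namely that a “p” to “b” shift is cheap, a “p” to “k” shift is expensive, and adding or dropping a whole sound is rarer still. The words used with it are made-up strings, one letter per sound.

```python
# Toy weighted edit distance between two words, one symbol per sound.
# All costs are illustrative assumptions, not values from the MIT paper.
SUB_COST = {
    frozenset(("p", "b")): 0.2,  # same place of articulation, differ in voicing
    frozenset(("p", "k")): 0.8,  # different place of articulation
}
DEFAULT_SUB_COST = 1.0  # any other substitution
INDEL_COST = 1.5        # adding or dropping a whole sound is assumed to be rare


def sub_cost(a: str, b: str) -> float:
    """Cost of sound `a` turning into sound `b` (symmetric in this sketch)."""
    if a == b:
        return 0.0
    return SUB_COST.get(frozenset((a, b)), DEFAULT_SUB_COST)


def sound_distance(source: str, target: str) -> float:
    """Cheapest way to turn `source` into `target` under the costs above."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # drop a sound
                dist[i][j - 1] + INDEL_COST,  # add a sound
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]
```

With these assumed costs, `sound_distance("pata", "bata")` comes out lower than `sound_distance("pata", "kata")`, which in turn is lower than `sound_distance("pata", "pat")`, where a whole sound has been dropped.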

Translating Sounds in the Vast Silence of Cyber Space

By assembling all of the known linguistic patterns, the team of scientists developed a new “decipherment algorithm” that is designed to cope with what the researchers describe as the “vast space of possible transformations and the scarcity of a guiding signal in the input.” The new algorithm learns by embedding language sounds “into a multidimensional space where differences in pronunciation are reflected in the distance between corresponding vectors.”
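A rough picture of what such an embedding looks like can be sketched in a few lines of Python. The vectors below are hand-written assumptions (place of articulation, voicing, manner), whereas the system described in the paper learns its own; the names `SOUND_VECTORS`, `pronunciation_gap` and `nearest_sounds` are hypothetical and exist only for this illustration.

```python
import math

# Hand-picked 3-d vectors: (place of articulation, voicing, manner).
# These numbers are assumptions for illustration; the real embeddings are learned.
SOUND_VECTORS = {
    "p": (0.0, 0.0, 0.0),  # bilabial, voiceless, stop
    "b": (0.0, 0.5, 0.0),  # bilabial, voiced, stop
    "t": (1.0, 0.0, 0.0),  # alveolar, voiceless, stop
    "k": (2.0, 0.0, 0.0),  # velar, voiceless, stop
}


def pronunciation_gap(a: str, b: str) -> float:
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(SOUND_VECTORS[a], SOUND_VECTORS[b])


def nearest_sounds(sound: str) -> list[str]:
    """Every other sound, ordered from the closest pronunciation to the farthest."""
    others = (s for s in SOUND_VECTORS if s != sound)
    return sorted(others, key=lambda s: pronunciation_gap(sound, s))
```

In this toy space “b” sits closer to “p” than “k” does, so a model that prefers short distances would favor a “p” to “b” change over a “p” to “k” one.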

What this means is that the new algorithm enables researchers to isolate the patterns by which languages change and to turn them into computational constraints. Once a text in a lost language has been segmented into words, those words can be mapped to similar words in related languages. Basically, it hunts down commonalities in sounds and suggests possible links.

Apart from identifying some signs for numbers, Linear A is still an undeciphered language. (Olaf Tausch/CC BY 3.0)

Programming the Vampire Phonetic Mirror

Floating in a conceptual cyberspace, the new algorithm acts like a ‘vampire phonetic mirror’ (my words): it reflects any sound structures that it recognizes as similar to others, yet it offers no reflection of unrelated, or unconnected, sounds (hence the vampire). The system can also identify the proximity between any two given languages and it can accurately determine “language families.” This is why the team tested the new algorithm on the Iberian and Basque languages, “as well as less likely candidates from Romance, Germanic, Turkic and Uralic families.”
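The notion of “proximity” between a lost language and a set of candidates can be illustrated with a small Python sketch. This is not the measure used in the paper: it swaps in the standard library’s `difflib.SequenceMatcher` as a stand-in for a real phonetic model, and the language names and word lists are invented for the example.

```python
from difflib import SequenceMatcher


def best_match_score(word: str, vocabulary: list[str]) -> float:
    """Similarity (0 to 1) between `word` and its closest entry in `vocabulary`."""
    return max(SequenceMatcher(None, word, known).ratio() for known in vocabulary)


def proximity(lost_words: list[str], vocabulary: list[str]) -> float:
    """Average best-match similarity of the lost words against one known language."""
    return sum(best_match_score(w, vocabulary) for w in lost_words) / len(lost_words)


def rank_candidates(lost_words: list[str], candidates: dict[str, list[str]]):
    """Candidate languages ordered from the closest to the farthest."""
    scores = {name: proximity(lost_words, vocab) for name, vocab in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented data: "lang_a" shares sound patterns with the lost words, "lang_b" does not.
lost_words = ["bata", "kilo"]
candidates = {
    "lang_a": ["pata", "gilo"],
    "lang_b": ["zzzz", "mmmm"],
}
ranking = rank_candidates(lost_words, candidates)
```

Here `lang_a` ranks first. A ranking alone does not prove kinship, though: if even the top candidate scores low, the cautious reading is that none of the candidates is closely related.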

While Basque and Latin were found to be closer to Iberian than the other languages, they were still far too different to be considered “related.” Scholars currently disagree on which language, if any, Iberian is related to, with some claiming it doesn’t relate “to any known language,” according to the new paper.

The MIT researchers hope that connecting ancient texts to related words in known languages, a process known as “cognate-based decipherment,” is only the first step toward a far more advanced system that will ultimately be able to identify the semantic meaning of words, even if it is unknown exactly how these ancient words were originally spoken.

Top Image: Rongorongo script is an undeciphered language. Source: Arthur Chapman/CC BY NC 2.0

By Ashley Cowie



As an image maker I take pictographs & hieroglyphics at face value. I do not try to use them, or imagine them as forming a part of what we are SUPPOSED to be able to read. Even supposing there is a reasonable interpretation of hieroglyphics and older languages, and the mistranslations of reign lengths etc. must posit a limit to our accuracy, it is quite possible that characters change meaning in relation to their neighbours, as punctuation alters sense in old Hebrew, and rather more than that:

Egyptian hieroglyphics, Harappa script & Linear A, with Rongorongo, tell us, possibly, about another, divinized form of human life.

Interpreters of hieroglyphics recognize pharaohs’ names & official titles in those ovoid separated colophons they complain are too often erased! But here as elsewhere the lines of drawings are surely representations of PROCESSES, as well as, possibly, bits of military history and kings’ praises.

These cultures had more than a single language and I take it that the truly indecipherable were designed to be so, and that they record processes of initiation & divinization by priests, to preserve the means by which one has become reincarnated etc.

Hence the pictures. What ARE these images? Well, the Harappa script in particular uses many, many figures, in a shorthand by which their individual attributes, at that stage of the process, are added to them. One little stick man is holding one something, then two of them, and then various objects that could be nets or animals. There are those that remind me of the Long Man of Wilmington, but they ONLY remind me of other pictures, and none (except perhaps those that seem to indicate quantities) make me want to read them - they just don’t.

They are quite unlike the Basque language or Sanskrit, or Runes, all of which are read the way we read our languages - as sequences of sounds.

These pictorial languages are not languages, but records of events affecting individuals in their path towards God. The argument that “what about all those bas-reliefs in Egypt?” fails, and for this reason: over the long course of divinization for the priesthood, initiates had to do something, no? They were apprenticed to artists and sculptors TO BE ABLE to record their being in an impermeable, eternal way: they did, after all, claim to live and be led by the gods themselves, and so they needed to celebrate it, just as years later Giotto recorded the life of Christ and Michelangelo (the fallen Angel Michael?) painted the Sistine Chapel and carved the Pieta.

I believe the Rongorongo script, as certainly the Harappa, has exactly the same function, but in societies less public, more secretive, and more into a personal relationship with the Divine than a public one.

The clay tablets from Mohenjo-Daro and Harappa showing animals, and one a man, record their Divinities at the time. The man is unrecognizable, as are the animals: the equivalent of animal-headed gods in Egypt and Babylon.

The reason these gods are represented with animal heads and a bevy of animal attributes is to safeguard them without betraying their identity. Each of the gods exists always in a human being’s body and that body can be killed, as Jesus’ death bears witness to. So, in the old religions the bodhisattvas and saints of the day are shown as animal forms, or human forms with animal heads, just as they still are today in India. I would respectfully suggest that the interpretation of the currently indecipherable be given over to the Asian scholars whose own languages are pictographic at heart, as we in the West will get nowhere, having wrongly decided, a very long time ago, that people would spend their entire adult lives carving merely, only, the histories of kings into hard rock.


Ashley is a Scottish historian, author, and documentary filmmaker presenting original perspectives on historical problems in accessible and exciting ways.

He was raised in Wick, a small fishing village in the county of Caithness on the north east coast of...
