Artificial Intelligence Inching Closer to Deciphering Long Lost Languages


With new technology available to us, we’re inching closer to the end of the days when deciphering ancient languages is a painstaking task filled with frustration and confusion. Nifty machines following complex algorithms are helping researchers around the globe as they take on the often monumental task of understanding ancient texts and lost languages.

Big Think reports that linguistic experts estimate there have been approximately 31,000 languages spoken throughout human history. Many of them are now dead and forgotten, but a new AI project may be part of the answer to how the writing of ancient languages can be deciphered.

How could something like this work? As Big Think points out:

“While languages change, many of the symbols and how the words and characters are distributed stay relatively constant over time. Because of that, you could attempt to decode a long-lost language if you understood its relationship to a known progenitor language.”

And that knowledge is the basis for the work of a joint team of researchers from MIT’s Computer Science and Artificial Intelligence Lab and the AI project called Google Brain. According to Discover Magazine, the project team has ‘devised an algorithm that can begin to match words from unknown languages to related words, or cognates, in languages that share the same root.’ By using advancements in computing and linguistics, the project is making headway in creating algorithms that will help other researchers decipher ancient texts.

And while the algorithm hasn’t yet been applied to undeciphered scripts such as Olmec, Linear A, or Proto-Elamite, the researchers behind it have shown it’s an advance in translating texts that have enough examples to provide a decent dataset for the algorithms to work with. So far their work has focused on training the system with Linear B and Ugaritic – two ancient languages that have mostly been translated by other means in the past.

AI (peshkova/Adobe Stock) may be the key to deciphering ancient languages. (alagunasr/Adobe Stock)

Working with Linear B and Ugaritic

Linear B is a script that was used by the Mycenaean civilization in the Late Bronze Age, 3000-plus years ago. It was first deciphered in 1953 by an architect named Michael Ventris. Ugaritic, on the other hand, is an early Semitic language closely related to Hebrew and written in a cuneiform script; it also dates back some 3000 years. It was first identified by French archaeologists in 1929.

Ugaritic script. (Public Domain)

To test the AI system, Big Think reports the researchers “focused on 4 key properties related to the context and alignment of the characters to be deciphered – distributional similarity, monotonic character mapping, structural sparsity and significant cognate overlap.”

It seems the effort was worthwhile because a report on the project states that: “When applied to the decipherment of Ugaritic, we achieve a 5.5% absolute improvement over state-of-the-art results. We also report the first automatic results in deciphering Linear B, a syllabic language related to ancient Greek, where our model correctly translates 67.3% of cognates.”
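To make the idea of matching cognates under a monotonic character mapping more concrete, here is a minimal sketch. It is not the researchers' model; it swaps in plain edit distance, which aligns characters left to right without reordering them, and picks the closest candidate from a known language. The function names and the short transliterated word lists are illustrative assumptions.

```python
# Toy illustration only: plain edit distance stands in for the learned
# character mapping described in the article. All names are hypothetical.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and
    substitutions needed to turn a into b (alignment is monotonic)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_cognate(word: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to word."""
    return min(candidates, key=lambda c: edit_distance(word, c))


if __name__ == "__main__":
    # Assumed toy vocabulary for a known, related language.
    known_words = ["melek", "bayit", "shemesh"]
    print(best_cognate("mlk", known_words))
```

In this toy vocabulary the consonant skeleton "mlk" lands on "melek" because only two vowel insertions separate them. A real system would have to learn which symbols correspond before any such comparison is possible.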

Clay tablet inscribed with Linear B script, from the Mycenaean palace of Pylos. (Sharon Mollerus/CC BY 2.0)

That means this could be a helpful tool for researchers who want to speed up their work in studying these ancient scripts. While creativity and an understanding of past cultures are undoubtedly part of translation work – and something AI isn’t able to supply yet – Big Think notes that the biggest plus of the new program is that “it can simply take a brute force approach that would be too exhausting for humans.” With the AI’s help, researchers “can attempt to translate symbols of an unknown alphabet by quickly testing it against symbols from one language after another, running them through everything that is already known.”
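The brute-force idea of testing an unknown script against one known language after another can be sketched roughly as follows. This is a simplified illustration and not the project's actual procedure. It assumes that a script's symbol frequencies, ranked from most to least common, stay roughly similar in a related language, and it uses made-up sample strings; every name in it is hypothetical.

```python
from collections import Counter

# Toy illustration only: compares ranked symbol frequencies, a crude
# stand-in for "distributional similarity". All names are hypothetical.

def rank_profile(text: str) -> list[float]:
    """Relative symbol frequencies, sorted from most to least common.
    Symbol identities are discarded, so different scripts can be compared."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)


def profile_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two ranked profiles,
    padding the shorter one with zeros."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return sum(abs(x - y) for x, y in zip(p, q))


def closest_language(unknown: str, samples: dict[str, str]) -> str:
    """Try every known sample in turn and return the best-matching name."""
    target = rank_profile(unknown)
    return min(samples,
               key=lambda name: profile_distance(target,
                                                 rank_profile(samples[name])))


if __name__ == "__main__":
    # Made-up stand-ins for an undeciphered text and two known corpora.
    samples = {"language_a": "abbccc", "language_b": "aaaaab"}
    print(closest_language("xyyzzz", samples))
```

A machine can run a comparison like this across many candidate languages in moments, which is the kind of exhaustive checking the quote describes. Deciding whether a match is meaningful still falls to human experts.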

What Can AI Translation Teach Us?

By cracking the codes of ancient languages we will be able to gain much more insight into what life was like in ancient cultures. All sorts of insight could be gained on social, political, cultural, and everyday matters.

As technology keeps improving, it makes sense that researchers want to take advantage of it. Why spend hours upon hours painstakingly trying to compare the letters of the most distant past with something more recognizable today when a machine can accomplish the same task in much less time (and with far less frustration)?

Artificial intelligence may be able to accomplish the same task in much less time, and with far less frustration. (christian42/Adobe Stock)

In December 2018, BBC reported that Émilie Pagé-Perron, a researcher in Assyriology at the University of Toronto, was “coordinating a project to machine translate 69,000 Mesopotamian administrative records from the 21st Century BC.”

As Pagé-Perron explained, even though we have garnered much information through archaeological digs and analysis, there’s still a missing element that can be filled in by translating ancient texts:

“We have information about so many different aspects of the lives of Mesopotamian people, and we can’t really profit from the expertise of people in different fields like economics or politics, who if they had access to the sources, could help us tremendously to understand those societies better.”

Jacob Dahl, a professor of Assyriology at the University of Oxford, says that “We have more sources from Mesopotamia than we have from Greece, Rome and ancient Egypt together.” But only 10% of the thousands of examples of tablets and seals have been deciphered – the problem is not a lack of texts to work with, but finding enough experts who can read them.

The BBC report states that the Pagé-Perron team are “training algorithms on a sample of 4,000 ancient administrative texts from a digitised database. Each records transactions or deliveries of sheep, reed bundles or beer to a temple or an individual.” And while Pagé-Perron admits that the standalone texts are less than exciting, she believes “they’re extremely interesting if you take them as groups of texts.”

Irving Finkel, curator of the 130,000 cuneiform tablets in the British Museum’s collection, provides even more of a reason for making these translations of Sumerian writing possible:

“Sumerian is probably the last member of what must have been a large family of languages that goes back thousands and thousands of years. Writing appeared in the world just in time to rescue Sumerian… We’re just lucky that we had some ‘microphone’ that picked it up before it went away with all the others […] It’s actually rather astonishing how interesting it is when you find a human mind across millennia, where it is like talking to them on the telephone. It’s the most exciting thing in the world when you meet one of these people.”

“Sumerian is probably the last member of what must have been a large family of languages that goes back thousands and thousands of years.” (Andrea Izzotti/Adobe Stock)

Finkel’s excitement is easily applied to other ancient cultures as well. Any ancient text translation may be the next big breakthrough in uncovering the biggest secrets in humanity’s past.

But that’s not to say that AI is anywhere near ready to take over from the good old-fashioned human creativity and social understanding that are often needed to make the mental leaps involved in making sense of old writing. For now, it’s just another amazing tool that can help researchers on the quest to make sense of what our ancient ancestors thought important enough to jot down or laboriously carve into tablets.

By Alicia McDermott


That would be great, minus the fact that some human would definitely have had their input to 'guide' this AI, to be sure. I'm not a real fan of the Rosetta stone for many reasons regarding the subjectivity surrounding the interpretations. For example, if I'm a Christian, and I'm, in the past, translating a work in a land that speaks about "GODS" and Supreme Beings, etc... Do I place that in the work, or do I believe it will offend my God, so I make an ever so slight change in wording (which does major damage) and not give a true representation of meaning? Absolutely, I find that to be the case in almost everything; especially in archaeology because they want nice timelines with nice stories.

Most of my beliefs are based upon bits of information and my own personal views via discovery.

Neat! I'd love to see some of those really ancient written languages that we haven't been able to make heads or tails of - like Linear A, Rongorongo, the Olmec writing, etc., translated. I'm sure they'd give us amazing insight on those cultures. There's only so much that we can glean and guess from even extensive archaeological work, since we're looking at things from a modern perspective and a limited knowledge of their knowledge.

