Ancient languages hold a treasure trove of information about the culture, politics and commerce of millennia past. Yet reconstructing them to reveal clues about human history can require decades of painstaking work. Now, scientists at the University of California, Berkeley, have created an automated “time machine,” of sorts, that will greatly accelerate and improve the process of reconstructing hundreds of ancestral languages.
Proto-languages are the linguistic ancestors that gave rise to modern languages. These forebears include Proto-Indo-European, Proto-Afroasiatic and Proto-Austronesian. Reconstructing them by hand typically takes linguists many years.
The new software uses probabilistic reasoning, which draws on logic and statistics, to perform its reconstructive work. Focusing on 637 modern Austronesian languages, it analyzed a database of more than 140,000 words and produced a reconstruction of Proto-Austronesian that matched the work of human linguists with 85 percent accuracy, though far more quickly.
Indeed, the researchers posit that a large-scale reconstruction could be performed in a matter of days or even hours in this way.
The computer program is based on the linguistic theory that words evolve along lines resembling a family tree. That is, traces of proto-languages remain in the “roots” of languages even as they evolve over time.
Using an algorithm known as a Markov chain Monte Carlo sampler, the software sorted through sets of words in the modern Austronesian languages that share a common sound, history and origin. From there, it determined whether the words descended from a common mother language, in this case Proto-Austronesian.
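For readers curious about what a Markov chain Monte Carlo sampler does, the following is a minimal Python sketch of the general idea. It is not the Berkeley team's software. The word forms are invented toy strings rather than real Austronesian data, the scoring rule (an ancestral form is more plausible if it needs fewer letter changes to reach its modern descendants) is a simplifying assumption, and the names edit_distance, log_score and metropolis_sample are hypothetical. The sampler repeatedly proposes a candidate ancestral form and accepts or rejects it at random, so that plausible candidates end up being visited most often.

```python
import math
import random
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-letter edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a letter
                            curr[j - 1] + 1,      # insert a letter
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def log_score(candidate: str, modern_forms: list[str]) -> float:
    """Assumed toy model: each letter change costs one unit of log-probability."""
    return -float(sum(edit_distance(candidate, w) for w in modern_forms))


def metropolis_sample(candidates, modern_forms, steps=5000, seed=0):
    """Metropolis sampler over a fixed list of candidate ancestral forms."""
    rng = random.Random(seed)
    current = rng.choice(candidates)
    current_score = log_score(current, modern_forms)
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: pick any candidate uniformly at random.
        proposal = rng.choice(candidates)
        proposal_score = log_score(proposal, modern_forms)
        diff = proposal_score - current_score
        # Always accept improvements; accept worse forms with prob. exp(diff).
        if diff >= 0 or rng.random() < math.exp(diff):
            current, current_score = proposal, proposal_score
        visits[current] += 1
    return visits


if __name__ == "__main__":
    # Invented toy "cognates" and candidate ancestors, for illustration only.
    modern = ["taki", "tagi", "taki", "daki"]
    candidates = ["taki", "tagi", "daki", "tako"]
    visits = metropolis_sample(candidates, modern)
    for form, count in visits.most_common():
        print(form, count / sum(visits.values()))
```

In this toy setting, the candidate that requires the fewest total changes is visited most often, and the visit frequencies approximate how plausible each candidate is under the assumed model. A realistic system would have to reason over sound changes along an entire family tree rather than a short fixed list of candidates.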
“What excites me about this system is that it takes so many of the great ideas that linguists have had about historical reconstruction, and it automates them at a new scale: more data, more words, more languages, but less time,” said Dan Klein, an associate professor of computer science at UC Berkeley and co-author of a paper on the subject which was published in the journal Proceedings of the National Academy of Sciences.
In addition to reaching into the past, the researchers note their software can also predict the future evolution of words, providing clues as to how languages will change over time.
Source: UC Berkeley