Computers may soon be able to interpret novel uses of human language, according to a recent study published by researchers from UC Berkeley, the University of Toronto and Lehigh University.
Barbara Malt, director of Lehigh University’s cognitive science program, helped develop the computational models that were tested in the study. Malt explained that different words, such as “face,” have accumulated different meanings, or “senses,” over the course of history.
“ ‘Face’ started with the human head,” Malt said, describing how the word is now also used to represent other frontal surfaces or emotional expressions. “The word has gained these senses over time, but yet they’re interrelated.”
Researchers used “The Historical Thesaurus of English” — a historical database of English words — to “time stamp” when new meanings of words had arisen, then used different algorithmic models to predict the historical order in which different word meanings emerged.
The study also featured the contributions of UC Berkeley junior Christian Ramiro, a cognitive science major, who coded multiple algorithms for the project.
The most successful model, named the “nearest-neighbor chaining algorithm,” predicts a chain of word meanings based on the highest semantic similarity between them. The success of this model suggests that throughout history, when new meanings arise in language, they connect to the next-closest meaning of the word, regardless of how old or new that meaning is, according to Ramiro.
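To give a rough sense of the idea, the sketch below shows one way nearest-neighbor chaining could work in code. It is not the researchers’ implementation: the sense names, vectors and cosine-similarity measure are all illustrative assumptions. Starting from a word’s earliest sense, each remaining sense is attached to whichever already-attached sense it most resembles.

```python
# Illustrative sketch of nearest-neighbor chaining over word senses.
# All sense labels and vectors below are invented for demonstration only.
import numpy as np

def cosine(a, b):
    # Cosine similarity between two sense vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbor_chain(senses, root):
    """Greedily attach each remaining sense to its most similar attached sense."""
    attached = {root}
    remaining = set(senses) - {root}
    links = []
    while remaining:
        # Pick the (attached, unattached) pair with the highest similarity.
        parent, child = max(
            ((a, r) for a in attached for r in remaining),
            key=lambda pair: cosine(senses[pair[0]], senses[pair[1]]),
        )
        links.append((parent, child))
        attached.add(child)
        remaining.remove(child)
    return links

# Hypothetical sense vectors for "face" (numbers are made up).
senses = {
    "human head (front)": np.array([1.0, 0.1, 0.0]),
    "facial expression":  np.array([0.9, 0.4, 0.1]),
    "front of a clock":   np.array([0.6, 0.0, 0.8]),
    "surface of a cliff": np.array([0.4, 0.0, 0.9]),
}

for parent, child in nearest_neighbor_chain(senses, "human head (front)"):
    print(f"{child!r} attaches to {parent!r}")
```

In this toy example, each new sense links to whichever existing sense it is closest to, so later meanings can branch off recently added ones rather than always tracing back to the original sense, which is the pattern the model favors.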
“We’re interested in how words come to have the families of related senses that they have,” Malt said. “In particular … whether (the) development of additional senses (is) constrained by principles of what is (the) most efficient way to use words.”
The paper explores maximally efficient ways of connecting word meanings, according to Malt, and the models are all ways of creating families of those connected meanings. The paper also defines efficiency in language as minimizing the costs of generating, interpreting and learning words.
In the future, researchers hope to extend the study to other languages, Ramiro said. Malt added that if there were comparable historical databases for other languages, the researchers would expect to find the same results.
According to Malt and Ramiro, the study also has the potential to improve natural language processing — a branch of artificial intelligence focused on helping computers interpret and manipulate human language.
“Our study is a stepping stone toward understanding the time-varying (or creative) properties of the lexicon in computational terms,” said Yang Xu, a University of Toronto computational linguist and former postdoctoral researcher at UC Berkeley who spearheaded the study, in an email. “An exciting area in the future is to explore how this line of work may intersect with natural language understanding and learning, in children and machines.”