An artificial intelligence can decode words and sentences from brain activity with surprising – but still limited – accuracy. Using only a few seconds of brain activity data, the AI guesses what a person has heard. It lists the correct answer among its top 10 possibilities up to 73 percent of the time, researchers found in a preliminary study.
The AI’s “performance was beyond what many people thought was possible at this stage,” says Giovanni Di Liberto, a computer scientist at Trinity College Dublin who was not involved in the research.
Developed at Meta, the parent company of Facebook, the AI could eventually be used to help thousands of people around the world who are unable to communicate through speech, writing or gestures, researchers report Aug. 25 on arXiv.org. That includes many patients in minimally conscious, locked-in or “vegetative” states, the last of which is now generally known as unresponsive wakefulness syndrome (SN: 2/8/19).
Most existing technologies to help such patients communicate require risky brain surgery to implant electrodes. This new approach “could provide a viable way to help patients with communication deficits … without the use of invasive methods,” says neuroscientist Jean-Rémi King, a Meta AI researcher currently at the École Normale Supérieure in Paris.
King and his colleagues trained a computational tool on 56,000 hours of speech recordings from 53 languages to detect words and phrases. The tool, also known as a language model, learned to recognize specific features of language both at a fine-grained level (think letters or syllables) and at a broader level, such as a word or a sentence.
The team applied an AI with this language model to databases from four institutions that included brain activity from 169 volunteers. In these databases, participants listened to various stories and sentences from, for example, Ernest Hemingway’s The Old Man and the Sea and Lewis Carroll’s Alice’s Adventures in Wonderland while their brains were scanned using either magnetoencephalography or electroencephalography. Those techniques measure the magnetic or electrical component of brain signals.
Then, using a computational method that helps account for physical differences among individual brains, the team tried to decode what participants had heard using just three seconds of brain activity data from each person. The team instructed the AI to align the speech sounds from the story recordings with the patterns of brain activity that the AI computed as corresponding to what people were hearing. It then made predictions about what the person might have been hearing during that short time, given more than 1,000 possibilities.
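To make the scoring concrete, here is a toy Python sketch of this kind of ranking test. It is not the Meta team's code: the function names (`cosine`, `rank_of_true`, `top_k_accuracy`, `chance_top_k`), the choice of cosine similarity and the random synthetic vectors standing in for brain and speech representations are all assumptions made for illustration. Each simulated brain vector is compared with every candidate speech segment, and a guess counts as correct if the true segment lands in the top 10.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_of_true(brain_vec, candidates, true_index):
    """1-based rank of the true candidate when all candidates are sorted by similarity."""
    true_score = cosine(brain_vec, candidates[true_index])
    # Count the candidates that score strictly higher than the true one.
    higher = sum(
        1
        for i, cand in enumerate(candidates)
        if i != true_index and cosine(brain_vec, cand) > true_score
    )
    return higher + 1


def top_k_accuracy(brain_vecs, candidates, k=10):
    """Fraction of trials whose true candidate ranks in the top k.

    Assumes brain_vecs[i] was recorded while candidates[i] was playing.
    """
    if not brain_vecs:
        raise ValueError("need at least one trial")
    hits = sum(
        1 for i, vec in enumerate(brain_vecs) if rank_of_true(vec, candidates, i) <= k
    )
    return hits / len(brain_vecs)


def chance_top_k(n_candidates, k=10):
    """Top-k accuracy expected from uniformly random guessing."""
    return min(k, n_candidates) / n_candidates


if __name__ == "__main__":
    # Synthetic stand-ins: each "brain" vector is its speech vector plus noise.
    # The sizes and noise level are arbitrary, not values from the study.
    rng = random.Random(0)
    n_candidates, dim, noise = 200, 16, 1.0
    candidates = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_candidates)]
    brain_vecs = [[x + rng.gauss(0, noise) for x in cand] for cand in candidates]
    print("top-10 accuracy:", top_k_accuracy(brain_vecs, candidates, k=10))
    print("chance level:", chance_top_k(n_candidates, k=10))
```

The `chance_top_k` helper gives a rough baseline for reading the reported numbers: with 1,000 equally likely candidates, random guessing would put the right segment in the top 10 only 1 percent of the time.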
With magnetoencephalography, or MEG, the correct answer was among the AI’s top 10 guesses up to 73 percent of the time, the researchers found. With electroencephalography, that value dropped to no more than 30 percent. “[That MEG] performance is very good,” Di Liberto says, but he is less optimistic about its practical use. “What can we do with it? Nothing. Absolutely nothing.”
The reason, he says, is that MEG requires a bulky and expensive machine. Bringing this technology to clinics will require scientific innovations that make the machines cheaper and easier to use.
It’s also important to understand what “decoding” really means in this study, says Jonathan Brennan, a linguist at the University of Michigan in Ann Arbor. The word is often used to describe the process of deciphering information directly from a source, in this case speech from brain activity. But the AI could do this only because it was given a finite list of possible correct answers to choose from.
“With language, that’s not going to cut it if we want to scale to practical use, because language is infinite,” Brennan says.
What’s more, Di Liberto says, the AI decoded information from participants who were passively listening to audio, which is not directly relevant to nonverbal patients. For it to become a meaningful communication tool, scientists will need to learn how to decipher from brain activity what these patients intend to say, including expressions of hunger, discomfort or a simple “yes” or “no.”
The new study is “decoding speech perception, not production,” King agrees. Although speech production is the ultimate goal, “we’re pretty far off for now.”