A computer that aims to translate thoughts into natural-sounding speech has been hailed by its developers as an “exhilarating” breakthrough.
The researchers published their paper in the scientific journal “Nature” on Wednesday.
The device uses a brain-computer interface (BCI), which infers a person’s intended speech by matching brain signals to the physical movements those signals would normally trigger in the vocal tract – the larynx, jaw, lips and tongue. A computer then translates that data into spoken words. The same technique has been used to generate limb movement in people with paralysis.
Previous BCI systems for speech facilitation have focused on typing, generally allowing people to type a maximum of 10 words per minute – massively lagging behind the average speaking speed of around 150 words per minute.
Scientists worked with five volunteers whose brain activity was being monitored as part of a treatment for epilepsy. The researchers recorded activity in a language-producing region of the brain as the volunteers read several hundred sentences aloud.
Researchers working on the project claimed their computer system would not only restore speech, but could eventually reproduce the “musicality” of the human voice that conveys a speaker’s emotions and personality.
“For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual’s brain activity,” Edward Chang, professor of neurological surgery and the study’s senior author, said in a press release. “This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss.”
Gopala Anumanchipalli, a speech scientist who led the research, said the breakthrough came from linking brain activity to movements of the mouth and throat during speech, rather than associating brain signals with acoustics and sounds.
“We reasoned that if these speech centers in the brain are encoding movements rather than sounds, we should try to do the same in decoding those signals,” he said in the press release.
Up to 69% of the words generated by the computer were accurately identified by people asked to transcribe the computer’s voice. Researchers said this was a significantly better rate than had been achieved in previous studies.
“We still have a way to go to perfectly mimic spoken language,” said Josh Chartier, a bioengineering graduate student who worked on the research. “We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.”