5in5 Hearing: IBM Research Is Poised to Give Caretakers Insight Into the Sounds Babies Make
Illustration courtesy of IBM
Most people take hearing for granted—birds chirping, leaves rustling in the wind and the gentle gurgling of a brook. But listening is much more than a restful way to spend a quiet afternoon. According to IBM Master Inventor Dimitri Kanevsky, hearing is a critical part of our everyday lives.
And he should know. Deaf since early childhood, Kanevsky has spent much of his professional career understanding sound and how it can be used in critical ways, such as determining why a child is crying or if a mudslide is about to happen. And as he describes, he even envisions phones that can self-mute during an important conference call.
Q. Could you describe the patent you and your colleagues recently received involving translating baby sounds to, in essence, give babies a voice? How does that technology work?
A. One aspect of the invention is to identify possible reasons for a baby’s cry. Is the baby hungry; is he or she in some pain or scared? This can be done by creating a large database of baby sounds that are labeled by experts. These labels mark sound segments with the reasons behind them, such as hunger, pain and fright. One can also use sensors—such as those that read brain signals, measure humidity, pulse, temperature, breath, et cetera—that allow us to objectively identify babies’ physical conditions and relate them to babies’ sounds. A machine-learning system generalizes from this training database, learning to map audio segments to the most probable causes of the sounds.
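The pipeline Kanevsky describes—expert-labeled sound segments feeding a classifier—can be illustrated with a deliberately minimal sketch. The feature vectors (pitch, loudness, cry-burst rate) and the labeled examples below are hypothetical stand-ins for what a real acoustic front end and expert annotators would produce; a nearest-centroid rule stands in for the actual machine-learning model, which the patent does not specify.

```python
# Toy sketch: map hypothetical acoustic features of a cry to the most
# probable cause, using a nearest-centroid classifier trained on
# invented expert-labeled examples. Not IBM's actual method.
import math

# Hypothetical expert-labeled training data:
# (pitch_hz, loudness_0to1, bursts_per_sec) -> expert label
TRAINING = [
    ((420.0, 0.90, 5.0), "pain"),
    ((430.0, 0.80, 4.5), "pain"),
    ((300.0, 0.60, 2.0), "hunger"),
    ((310.0, 0.50, 2.2), "hunger"),
    ((500.0, 0.70, 6.0), "fright"),
    ((510.0, 0.75, 6.2), "fright"),
]

def centroids(samples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def classify(features, cents):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda label: math.dist(features, cents[label]))

cents = centroids(TRAINING)
print(classify((305.0, 0.55, 2.1), cents))  # a hunger-like cry -> "hunger"
```

In practice the feature extraction and the model would be far richer (and, as Kanevsky notes, fused with sensor readings), but the structure is the same: labeled examples in, a learned mapping from sound to cause out.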
Another aspect of this invention is to translate toddler sound phrases into meaningful sentences. Such systems can work similarly to speech-to-speech translation. One needs to create a large database of typical toddler expressions that are translated by experts. Using these parallel corpora, one can train a translation system for toddlers much as translation systems for different languages are developed. This can be combined with other modalities, like video or other biometrics, to more accurately classify or translate sounds.
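The parallel-corpus idea can be shown at its absolute simplest: a phrase table that pairs toddler utterances with expert translations. The entries below are invented for illustration; a real system would learn statistical alignments from a large corpus rather than doing exact lookup.

```python
# Toy sketch of a "translation model" built from a hypothetical
# expert-translated parallel corpus of toddler expressions.
# A real system would learn alignments, not do exact lookup.
PARALLEL_CORPUS = [
    ("wawa", "I want water."),
    ("nana", "I want a banana."),
    ("uppy", "Pick me up."),
]

phrase_table = dict(PARALLEL_CORPUS)

def translate(utterance):
    """Look the toddler phrase up; fall back to an 'unknown' marker."""
    return phrase_table.get(utterance, "(unrecognized utterance)")

print(translate("wawa"))  # I want water.
print(translate("gaga"))  # (unrecognized utterance)
```

The design mirrors machine translation between adult languages: the parallel corpus plays the role of aligned bilingual text, and coverage grows with the corpus rather than with hand-written rules.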
This can be useful in hundreds of scenarios. Consider, for example, the following situation: Imagine a baby who cries a lot at night. Parents have been told not to come to the child immediately each time she cries. It’s bad for the baby—it spoils the child. So when the parents hear their baby crying in another room, they may wonder if they should go to the baby. Our system can understand why babies cry and help parents make informed decisions about what they should do. It can analyze baby sounds and tell them, for example, nothing has happened to the child and there’s no need to go to her now. But in another situation, the system may understand that the baby has a stomachache. She needs help. She needs a warm hand. Or our system could decide that the child is afraid of something and you should go calm her down.
Consider another situation: A new babysitter is with a baby, the child is crying and his parents aren’t at home. What should the babysitter do? Our system may advise that the child is thirsty; give him a drink. Or the system can send a message to the parents’ smartphone.
Q. You’ve written that sensors already help with traffic congestion and reducing water usage. How can these same sensors work in relation to sound?
A. We use a distributed system of sensors to reduce traffic congestion and water usage. This means we have several sensors that provide audio information from many sources to a central system that can analyze it and make decisions. These sensors can measure sound signals or pressure. A system of sensors can operate as a layered network that sends well-represented data to a central server, which forms patterns to be used for analyses. Audio information can be combined with other media from sensors, like video.
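The layered-network idea—edge nodes summarizing raw readings and a central server acting on the summaries—can be sketched as follows. The node names, sound-level values, and alert threshold are all illustrative assumptions, not details from IBM's systems.

```python
# Hedged sketch of a layered sensor network: edge nodes compress raw
# sound-level samples into compact reports; a central server aggregates
# the reports and makes a decision. All values are invented.
from statistics import mean

class EdgeNode:
    """Summarizes raw sound-level samples (dB) into one compact record."""
    def __init__(self, node_id, samples):
        self.node_id = node_id
        self.samples = samples

    def report(self):
        return {"node": self.node_id,
                "avg_db": mean(self.samples),
                "peak_db": max(self.samples)}

def central_decision(reports, alert_db=85.0):
    """Flag the network if any node's peak level exceeds the threshold."""
    loud = [r["node"] for r in reports if r["peak_db"] > alert_db]
    return {"alert": bool(loud), "loud_nodes": loud}

nodes = [EdgeNode("street-1", [60.2, 61.0, 59.8]),
         EdgeNode("street-2", [70.1, 92.3, 68.0])]
reports = [node.report() for node in nodes]
print(central_decision(reports))  # street-2's 92.3 dB peak trips the alert
```

Sending summaries instead of raw audio is the point of the layered design: the central server sees "well-represented data" small enough to analyze at scale, and the same structure accommodates other modalities, such as video, alongside sound.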