How movie-watching marathons will help machines improve the lives of people with hearing loss

Siri and the Amazon Echo are already pretty good at understanding speech, but new machine-learning techniques mean our devices will be much smarter at recognising other everyday sounds. How will they do this? By binge-watching videos.

Machine learning advances match sounds to images

Image: applications of facial recognition technology are entering the mainstream and offer huge benefits to disabled people

Thanks to the wealth of labelled data available online, computers are already pretty smart when it comes to recognising pictures. Both Google and Facebook can distinguish many hundreds of individual objects in images, including specific people, pets, cars and foods. They use this capability to ‘auto-tag’ your photos and videos.
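
To make that concrete, here's a minimal sketch of image auto-tagging using a freely available pretrained classifier (PyTorch's torchvision ResNet-50). The photo filename is a placeholder, and real services use far larger proprietary models, but the principle is the same:

```python
# A minimal sketch of photo 'auto-tagging' with a pretrained classifier.
# "holiday_photo.jpg" is a placeholder path.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

image = Image.open("holiday_photo.jpg")
batch = weights.transforms()(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)[0]

# Keep the five most confident labels as the photo's tags
top5 = torch.topk(probs, 5)
for p, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx]}: {p:.2%}")
```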

As for sounds, the likes of Siri and the Amazon Echo might be great at understanding language and responding to many thousands of spoken commands. But when it comes to other everyday sounds, such as a doorbell, police siren or dog’s bark, the tech isn’t nearly so advanced.

Using software to recognise laughter

That’s soon set to change. A team at the Massachusetts Institute of Technology (MIT) has been using advances in machine learning to detect what’s happening in a video and match it with the associated sound. So when someone laughs, for example, facial recognition algorithms spot the expression and ‘learn’ that the accompanying sound is laughter.
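
In spirit this is ‘teacher-student’ training across the senses: a vision network that already recognises expressions supervises an audio network listening to the same clip. The sketch below is my own illustration of that idea in PyTorch, not the MIT team’s actual code - AudioNet and the tensor shapes are assumptions:

```python
# A simplified sketch of cross-modal 'teacher-student' learning: a
# pretrained vision network labels the video frames, and an audio
# network learns to predict the same labels from the soundtrack alone.
import torch
import torch.nn as nn
from torchvision import models

teacher = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
teacher.eval()  # frozen: it already knows what a laughing face looks like

class AudioNet(nn.Module):
    """Tiny 1-D CNN over raw waveforms; a stand-in for a real audio model."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.features(x).squeeze(-1))

student = AudioNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
kl = nn.KLDivLoss(reduction="batchmean")

def train_step(video_frames, audio_clips):
    """video_frames: (B, 3, 224, 224); audio_clips: (B, 1, samples)."""
    with torch.no_grad():
        target = torch.softmax(teacher(video_frames), dim=1)
    pred = torch.log_softmax(student(audio_clips), dim=1)
    loss = kl(pred, target)  # make the audio net agree with the vision net
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```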

Yusuf Aytar, who was part of the research team at MIT, told New Scientist: “We thought, ‘We can actually transfer this visual knowledge that’s been learned by machines to another domain where we don’t [currently] have any data, but [where] we do have this natural synchronisation between images and sounds.’”

You might think that we should already be able to teach computers to recognise someone laughing. But, just as it took a huge number of pet pics uploaded to Facebook feeds to teach the software to distinguish a boxer from a bulldog, it takes hundreds, or even thousands, of samples of chuckles and guffaws brought together into a massive data set for it to become proficient at recognising laughter. After all, computers have to cope with many different types of laughter across a wide variety of aural environments.

Separating speech from surrounding sounds

Image: our busy lives require us to check our phones in all sorts of environments

When the software can easily recognise sounds, it can also more effectively ignore them. This will really come in handy when virtual assistants like Siri or the Echo are trying to understand speech when there are a lot of other noises going on.

Like me, you might have tried to talk to your phone on a busy street and almost ended up sending someone a garbled text or setting a timer for four and a half minutes.

As a result of this machine learning, our virtual assistants should more effectively ‘tune out’ that noise - so we’ll see a rise in recognition and a fall in frustration.
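
A crude way to picture this: classify short windows of audio, then mute the ones recognised as background noise before handing the rest to the speech recogniser. In this sketch, classify_window and recognise_speech are hypothetical stand-ins for trained models - a real system would mask the spectrogram rather than zero the waveform:

```python
# An illustrative sketch of 'tuning out' recognised background noise
# before speech recognition. classify_window() and recognise_speech()
# are hypothetical placeholders for a sound classifier and ASR engine.
import numpy as np

NOISE_LABELS = {"siren", "traffic", "dog_bark", "doorbell"}
WINDOW = 16000  # one second of 16 kHz audio

def suppress_noise(waveform, classify_window):
    """Zero out one-second windows the classifier tags as non-speech noise."""
    cleaned = waveform.copy()
    for start in range(0, len(waveform) - WINDOW + 1, WINDOW):
        window = waveform[start:start + WINDOW]
        if classify_window(window) in NOISE_LABELS:
            cleaned[start:start + WINDOW] = 0.0
    return cleaned

# Usage, with hypothetical models:
# text = recognise_speech(suppress_noise(mic_audio, sound_classifier))
```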

Sound support for deaf people

For people who are deaf or have hearing loss, the ability to be alerted to everyday noises may be helpful or, in some cases, crucial. Being informed of a fire alarm by urgent vibrations on your smartphone, or of a warning car horn by a cascade of taps from the smartwatch on your wrist, could be a life-saver.
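
The plumbing for such alerts could be very simple once sound recognition is reliable. Here’s a hedged sketch in which a detected sound label triggers a mapped vibration pattern; the detector and the vibrate function are hypothetical placeholders for a platform’s real haptics API:

```python
# A sketch of recognised sounds driving haptic alerts on a wearable.
# The vibration patterns are lists of burst durations in milliseconds.
ALERTS = {
    "fire_alarm": [500, 100] * 5,   # long, urgent vibration bursts
    "car_horn":   [80] * 6,         # quick cascade of taps
    "doorbell":   [200, 200, 200],
}

def on_sound_detected(label, confidence, vibrate):
    """Trigger the vibration pattern mapped to a confidently detected sound."""
    if confidence > 0.8 and label in ALERTS:
        vibrate(ALERTS[label])
```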

In the not-too-distant future, subtitles on television and online videos may well be generated ‘on the go’ by speech recognition. This will be assisted in no small part by the software’s ability to recognise, and then filter out, noises.

Moreover, the myriad sounds in each movie will be automatically recognised and flagged on-screen as additional subtitles.
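
As an illustration, detected sound events with timestamps could be converted straight into caption cues. The sketch below emits the standard WebVTT subtitle format; the event list is made-up example data, not output from a real model:

```python
# A sketch of turning detected sound events into extra subtitle cues
# in WebVTT format. The events would come from a sound-recognition
# model; the values below are invented for illustration.
def vtt_time(seconds):
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02}:{int(m):02}:{s:06.3f}"

def sound_events_to_vtt(events):
    """events: list of (start_sec, end_sec, label) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, label in events:
        lines += [f"{vtt_time(start)} --> {vtt_time(end)}", f"[{label}]", ""]
    return "\n".join(lines)

print(sound_events_to_vtt([(12.0, 13.5, "dog barking"),
                           (40.2, 44.0, "police siren")]))
```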

Safe and sound with added security

Soon, security systems will be able to listen out for sounds such as breaking glass or splintering wood and automatically alert the owner or a security agency for a quick response.
Again, there are obvious benefits for people with hearing loss. From baby monitors to security systems, smart sound-recognition will help provide vital information in a noisy world.

Making sounds searchable

Using Google, Bing or a virtual assistant such as Siri, we all search for phrases on the internet every day. You may also search for images or video - although what you’re really doing is searching for text labels that have almost invariably been supplied by humans. Soon, images and videos will be routinely scanned for objects, and videos and audio-recordings for sounds. When this happens, you’ll be able to do a search for a specific sound within any audio or video, or across the entire internet.

In the not-too-distant future it will feel completely normal to search for ‘DeLorean lightning’, for example, and be taken to the exact point in Back to the Future where lightning strikes just as Marty McFly’s DeLorean hits 88mph: ‘DeLorean’ matched to the image, and ‘lightning’ matched to the sound.
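
Under the hood, such a search could be as simple as two timestamped indexes - one of objects seen, one of sounds heard - queried together. This toy sketch (all data illustrative, including the timestamps) shows the idea:

```python
# A toy multimodal search index: object detections and sound detections
# stored with timestamps, so a query can require a match in both.
from collections import defaultdict

index = defaultdict(list)  # (video_id, label) -> [timestamps]

def add_detection(video_id, label, timestamp):
    index[(video_id, label)].append(timestamp)

def search(video_id, visual_term, sound_term, tolerance=5.0):
    """Return moments where the object and the sound occur within a few seconds."""
    hits = []
    for t_img in index[(video_id, visual_term)]:
        for t_snd in index[(video_id, sound_term)]:
            if abs(t_img - t_snd) <= tolerance:
                hits.append(min(t_img, t_snd))
    return sorted(hits)

# Illustrative detections for the Back to the Future scene:
add_detection("bttf", "delorean", 5401.0)   # seen on screen
add_detection("bttf", "lightning", 5403.5)  # heard on the soundtrack
print(search("bttf", "delorean", "lightning"))  # -> [5401.0]
```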

It’s not just people with a hearing impairment who will find sound-recognition beneficial. As a blind person who prefers audio to video, I find the idea of a universal sound search pretty exciting - although smarter object recognition will undoubtedly help me no end in fathoming the very visual world that is today’s internet.

Regardless of your disability, the future’s coming fast - and it sounds like it’s going to be good.

More information