More than 500 hours of YouTube footage is uploaded every minute, but almost every second of it is impossible for me and millions of other visually impaired people (and Google) to understand. The good news is that we're entering a new era for intelligent image recognition. We're seeing new innovations coming onstream from some of the biggest names in tech and I predict that by the end of the decade we'll see the kinds of advances that make a real difference to the experience of millions of users worldwide.
Here come the robots
In my last blog, I talked about Twitter now including an alt text option for images, so that tweeted pictures can be more easily described by those tweeting them, and thus understood by audiences including blind users, visually impaired people, and search engines.
While humans are still best placed to interpret what's in an image, I suspect very few people will take the time to use this new hidden Twitter feature. However, the latest advances in artificial intelligence could see robots able to interpret images and add enough information to make sense of an ever-increasing visual multimedia world.
Facebook's and Microsoft's moves to offer automated interpretation of image contents aren't as good as humans yet, but are improving all the time. In fact, I believe that by 2020 such tech will be good enough at pulling out the salient features of an image that it will be indistinguishable from humans in the summaries it produces. It'll be a sort of Turing test for those times when friends are talking you through their holiday snaps.
That's right, my prediction is that by 2020 you won't be able to tell the difference between your friend's description of an image and what a robot tells you it can 'see'.
Microsoft Azure advancements
It's been announced that Microsoft's Azure Media Services is now offering a suite of functionality that promises improved understanding of images and videos for blind users and search engines alike. Azure is the platform for all the company's cloud-based services and this recent addition is a welcome advance in a very exciting (although being blind I might be biased!) area.
According to TechCrunch, one feature of this new offering from Microsoft is the ability to analyse a video and automatically select certain snippets to create a representative summary of the entire video. It also offers improved Optical Character Recognition (OCR) of text within videos.
This makes it much easier for blind users to access the parts of a video that contain written text - such as the opening credits of a film, subtitles, or a video of a slideshow or webinar presentation. It is also useful for search engines, which must currently rely on image filenames, titles or alt text (if present) to identify relevant images.
Emotions in motion
A couple of other interesting features in the recent Microsoft announcement are face and emotion detection, which are now available for videos, and movement detection, which automatically identifies when there's been activity in a video so that more significant image analysis can be triggered.
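To give a flavour of how movement detection might act as a trigger, here's a minimal sketch in Python. It is my own illustration and not Microsoft's method: the function names, the frame format (rows of greyscale pixel values from 0 to 255) and the threshold are all assumptions. The idea is simply to compare each frame with the one before it and flag only the frames where enough has changed to be worth a closer look.

```python
from typing import List, Sequence

# Assumption for this sketch: a frame is a grid of greyscale pixel values (0-255).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average absolute pixel change between two frames of the same size."""
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def frames_with_motion(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return the indexes of frames that differ noticeably from the previous frame.

    The threshold is a hypothetical tuning value, not one taken from any real service.
    """
    active = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            active.append(i)
    return active


# Tiny 2x2 example: nothing happens, then one pixel brightens sharply.
still = [[0, 0], [0, 0]]
moved = [[0, 200], [0, 0]]
print(frames_with_motion([still, still, moved, moved]))  # prints [2]
```

Only the flagged frames would then be handed to the slower, more expensive analysis, such as face or emotion detection, rather than every frame in the video.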
I mentioned last time how this software still has a long way to go. But with tech giants such as Google and Microsoft racing to deliver truly intelligent solutions, I'm very hopeful that these accelerating advancements will deliver a more inclusive multimedia future for everyone.