This story was included in our innovation showcase for our 10th anniversary celebrations in 2016. 

Teaching computers to listen, comprehend and learn

Have you ever instructed your car to take you home without taking your hands off the wheel? Or asked your mobile phone to dial your mother or to set a three-minute timer for your boiled egg?

If so, chances are that you have benefited from technology developed at the University of Cambridge.

Machines have been capable of recording voices for many years, but only recently have they been able to interpret and ‘understand’ spoken instructions. Now they can even answer back conversationally and engage in dialogue with people, an important enabler of the much-hyped Internet of Things.

“The use of speech to interact with machines has reached a tipping point,” said Steve Young, Professor of Information Engineering at the University of Cambridge. “Without smart conversational interfaces which can adapt to suit the user, the Internet of Things cannot flourish.”

He is well qualified to comment: Young has spent the best part of 20 years creating software that helps people communicate with computers. He was behind two successful University spin-outs: VocalIQ, a Cambridge Enterprise portfolio company he co-founded, which was bought by Apple Inc. in 2015, and Entropic, which he co-founded and which Microsoft acquired in 1999.

In fact, Young was responsible for some of the earliest tools used for speech recognition research. In 1989, working in what was then the Speech Vision and Robotics Group of the Engineering Department, he developed the first version of the Hidden Markov Model Toolkit. Used worldwide, this set of tools still has a loyal following in the speech research community and beyond, with applications extending to speech synthesis, character recognition and even DNA sequencing.
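At the heart of toolkits like this is the hidden Markov model, which scores how likely a sequence of observations is under a model of unseen states. The sketch below is purely illustrative (the states and probabilities are invented for this example, and it is not code from the toolkit itself); it shows the classic forward algorithm that such tools build on.

```python
# Toy forward algorithm for a hidden Markov model.
# Illustrative only: the states and probabilities below are invented,
# not taken from the Hidden Markov Model Toolkit.

def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) by summing over all hidden state paths."""
    # alpha[s] = probability of the observation prefix ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

states = ("vowel", "consonant")
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {"vowel": {"vowel": 0.3, "consonant": 0.7},
           "consonant": {"vowel": 0.6, "consonant": 0.4}}
emit_p = {"vowel": {"a": 0.8, "t": 0.2},
          "consonant": {"a": 0.1, "t": 0.9}}

print(forward(("a", "t", "a"), states, start_p, trans_p, emit_p))
```

Real speech recognisers apply the same idea at far larger scale, with acoustic features as observations and phone states as the hidden sequence.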

First steps towards commercialisation

The most recent application of Young’s research began to take shape in 2010. Young and Blaise Thomson, a Research Fellow in the Engineering Department’s Dialogue Systems Group, approached Cambridge Enterprise with questions about commercialisation prospects for some new work. “When Blaise and Steve first approached CE in 2010, it was clear they had developed something very exciting”, said Gillian Davis, Technology Manager at Cambridge Enterprise. “Their novel algorithms, which enabled two-way dialogue between machine and human, had the potential to dramatically increase the adoption of voice recognition.”

The Technology Transfer office worked with Thomson and Young, helping to prepare their intellectual property asset for investment. The software was based on more than a decade of research, some of it funded by a consortium of academic institutions and commercial organisations. Before the software could be licensed or sold, it was essential to pin down ownership rights and ensure that any third-party rights were respected.

VocalIQ is created

Given the founders’ aim to start a new company to develop the work commercially, the Seed Funds team also got involved, supporting the new company financially from the University’s early-stage investment funds. In March 2011 VocalIQ was formed, with Young as Chairman and co-founder Thomson as CEO. In 2014 Cambridge Enterprise seeded the young company with £375k, with further financing from technology investor Amadeus Capital Partners.

VocalIQ offered transformative new software that used a fresh approach to solve the challenges presented by previous voice recognition systems. The software allowed dialogue between human and machine, providing a seamless interface between people and their mobile devices, televisions and cars.

Previous voice recognition interfaces relied heavily on predefined commands and were limited by the computing power available. VocalIQ’s software offered users the ability to speak naturally to their smart devices. Instead of merely recognising speech, the technology was able to understand and interpret dialogue. It could also learn online: when it made a mistake, it learned from it and avoided repeating it in future. “There are no commands for the user to learn,” said Thomson, an expert in machine learning and dialogue system design. “It’s about having a conversation.”

Rather than merely digesting the user’s request “Find a restaurant,” the software learned to grasp nuances. A typical exchange might sound something like this, as a user tells his mobile: “I need to find a nice restaurant for my girlfriend”. The software might then respond: “There’s a really nice place to eat, it has good reviews, and it’s a 10-minute walk from your location. Are you OK with that?” The user might then ask: “Is it suitable for vegetarians?” The software might respond: “Yes, there are vegetarian options available.”
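What makes an exchange like this possible is that the system keeps a persistent state across turns, so a follow-up such as “Is it suitable for vegetarians?” is interpreted in the context of the restaurant already under discussion. The toy sketch below illustrates that idea only; it is a crude keyword-based stand-in, not VocalIQ’s statistical approach, and all names in it are invented.

```python
# Illustrative dialogue state tracking (invented example, not VocalIQ's code).
# Each user turn updates a persistent belief about what the user wants,
# so follow-up questions are answered in context.

def update_state(state, user_turn):
    """Very crude keyword-based state update for a restaurant dialogue."""
    state = dict(state)  # copy so each turn's state is independent
    if "restaurant" in user_turn:
        state["task"] = "find_restaurant"
    if "vegetarian" in user_turn:
        state["constraint"] = "vegetarian"
    return state

def respond(state):
    """Pick a canned reply based on the tracked state."""
    if state.get("constraint") == "vegetarian":
        return "Yes, there are vegetarian options available."
    if state.get("task") == "find_restaurant":
        return "There's a well-reviewed place a 10-minute walk away. Are you OK with that?"
    return "How can I help?"

state = {}
for turn in ["I need to find a nice restaurant",
             "Is it suitable for vegetarians?"]:
    state = update_state(state, turn)
    print(respond(state))
```

A real system replaces the keyword rules with statistical models learned from data, which is precisely what let VocalIQ’s software improve from its mistakes online.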

“For all of the many devices we use, we want to find a way to get what we need, in the easiest, safest way possible”, Thomson said. “That’s where voice comes in.”

Industry pundits share his view, with many agreeing that Artificial Intelligence (AI), powered by voice, is the Next Big Thing. All the major players—in Silicon Valley and elsewhere—are vying for supremacy in the AI/voice recognition space, and Cambridge technologists are behind many of the most significant developments.

Steve Young now holds a joint appointment between Cambridge University and Apple Inc., where he is a member of the Siri Development Team.

Catherine Aman
Author