Voice Control Your Browser with Stëmm


Blog, March 2022

Back in January, we supported Hack Cambridge, a 24-hour student hackathon. The team behind Stëmm wanted to bring voice control to one of the world's most widely used applications: Google Chrome. I sat down with Benedek Der, Bianca Sandu, Julius Weisser, and Siddharth Srivastava to ask them about their project.

The members of the Stëmm team all study Computer Science at the University of Warwick. They are friends, and most of them are also flatmates. Hack Cambridge was their first in-person hackathon, but not their first hackathon: at Hack Duke in October 2021, they built a Chrome extension that identified COVID facts on a webpage.

Most of the team met up a week before Hack Cambridge to start brainstorming ideas, unaware that themes would be announced on the morning of the event. They marched down to the venue, electronics kit in hand, and realized as soon as the opening ceremony began that they would need to rethink their game plan.

The Project

Fortunately, some of the team saw our live demo at the event, which showed how easy it is to get started with Deepgram's Speech Recognition API in the browser. While they still had to decide which sponsored challenge categories to incorporate into their project, the team "instantly recognized the vast potential the Deepgram API gives developers by allowing us to use speech recognition in innovative ways within our projects," says Sid.

After bouncing around ideas, they chose to build on what they had learned at October's event. They landed on what would become Stëmm: a browser interface for users with motor disabilities. The team combined the Deepgram API with Chrome's extension APIs to build a Chrome extension that, once installed and granted microphone access, lets users control Chrome hands-free with voice commands like "chrome, open tab," "chrome, search for recipes," and "chrome, add bookmark."
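To make the command format concrete, here is a minimal TypeScript sketch of how transcripts like these might be turned into structured commands. It is not Stëmm's actual algorithm: `parseCommand`, the `Command` type, and the three-command vocabulary are hypothetical, chosen only to mirror the example commands above.

```typescript
// Hypothetical command parser (not Stëmm's real implementation).
// It recognizes "chrome, <command>" transcripts for three example commands.

type Command =
  | { kind: "openTab" }
  | { kind: "search"; query: string }
  | { kind: "addBookmark" };

// Wake word every command must start with (an assumption for this sketch).
const WAKE_WORD = "chrome";

// Lowercase, turn anything that is not an ASCII letter, digit, or whitespace
// into a space, then collapse runs of whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function parseCommand(transcript: string): Command | null {
  const words = normalize(transcript);

  // Ignore speech that is not addressed to the browser.
  if (!words.startsWith(WAKE_WORD + " ")) return null;
  const rest = words.slice(WAKE_WORD.length + 1);

  if (/^open (a )?(new )?tab$/.test(rest)) return { kind: "openTab" };
  if (/^add (a )?bookmark$/.test(rest)) return { kind: "addBookmark" };

  // Everything after "search for" becomes the query.
  const match = /^search for (.+)$/.exec(rest);
  if (match) return { kind: "search", query: match[1] };

  return null; // Addressed to Chrome, but not a known command.
}
```

Requiring the wake word at the start means ordinary speech picked up by the microphone is ignored unless it is addressed to the browser.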

Command and Control

This category of use case is very familiar to us at Deepgram. We call it "command and control": using voice to control a system. Using Deepgram's keywords and search features, along with some custom processing, you can build something similar in just a few lines of code.
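As a rough illustration of the keywords and search features, the sketch below builds a Deepgram streaming URL that boosts a command vocabulary and requests search hits for command phrases. The `keywords` parameter (in `word:intensifier` form) and the `search` parameter are Deepgram /v1/listen query parameters; the helper name `buildListenUrl`, the terms, and the boost values are assumptions for this example. Authentication is omitted, and parameter support can vary by model and endpoint, so check the current API reference before relying on it.

```typescript
// Hypothetical helper: builds a Deepgram streaming URL for command and control.
// Only the URL is constructed here; opening the socket and authenticating
// are left out of this sketch.

const DEEPGRAM_LISTEN = "wss://api.deepgram.com/v1/listen";

function buildListenUrl(
  keywords: Record<string, number>, // word -> intensifier (assumed values)
  searchTerms: string[],            // phrases to look for in the audio
): string {
  const url = new URL(DEEPGRAM_LISTEN);

  // One "keywords" parameter per boosted word, formatted "word:intensifier".
  for (const [word, boost] of Object.entries(keywords)) {
    url.searchParams.append("keywords", `${word}:${boost}`);
  }

  // One "search" parameter per phrase.
  for (const term of searchTerms) {
    url.searchParams.append("search", term);
  }

  return url.toString();
}

// Example: boost the wake word and command nouns.
const listenUrl = buildListenUrl({ chrome: 2, bookmark: 1 }, ["open tab"]);
console.log(listenUrl);
```

Boosting the wake word is one plausible way to make it transcribe reliably; the transcript can then be handed to custom processing such as a command parser.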

We've seen it used in web applications, as an interface for games, and in dedicated devices.

The Hours Tick By

As you might imagine, Google enforces a strict set of security provisions for extensions, and these became the team's main challenge during the hackathon. I remember having multiple conversations with the Stëmm team over several hours and wondering whether they'd be able to get past the blockers and get their project working, especially given the vague error messages they were battling. Thankfully, they eventually found the configuration that let their extension run.

Once the extension could access the user's microphone and receive transcripts from Deepgram, a language processing algorithm built by Benedek and Bianca identified the commands in the transcribed text. By integrating these commands with the Chrome developer tools, the extension then executes them to control the browser.
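Once a command has been identified, executing it is essentially a dispatch step. The sketch below is an illustration, not the Stëmm team's implementation: it assumes a hypothetical `BrowserActions` interface so the logic can run and be tested outside Chrome. Inside an extension, those methods could plausibly be backed by Chrome extension APIs such as `chrome.tabs.create`, `chrome.search.query`, and `chrome.bookmarks.create`, which need the corresponding permissions in the extension manifest.

```typescript
// Hypothetical dispatcher from parsed commands to browser actions.
// The Command shape and the BrowserActions interface are assumptions.

type Command =
  | { kind: "openTab" }
  | { kind: "search"; query: string }
  | { kind: "addBookmark" };

// Abstracts the browser so the dispatch logic does not depend on `chrome.*`.
interface BrowserActions {
  openTab(): Promise<void>;
  search(query: string): Promise<void>;
  bookmarkCurrentTab(): Promise<void>;
}

async function executeCommand(cmd: Command, browser: BrowserActions): Promise<void> {
  switch (cmd.kind) {
    case "openTab":
      return browser.openTab();
    case "search":
      return browser.search(cmd.query);
    case "addBookmark":
      return browser.bookmarkCurrentTab();
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // Command kind is added without a matching case.
      const unreachable: never = cmd;
      throw new Error(`Unhandled command: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` check in the `default` branch makes TypeScript flag any new command kind that is added to `Command` without a matching case, which helps as the command set grows.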

The extension still supports only a limited set of commands, but the team welcomes contributions to their project repository to add new ones. You can find setup and contribution guidelines on GitHub.