How Deepgram Works

Last updated 09/14/2021

Deepgram is the only truly end-to-end deep learning automatic speech recognition (ASR) system that delivers real-time results as words are spoken and is built to scale for the enterprise. Deepgram's API lets you add audio transcription to your web, mobile, and voice applications using either our cloud-hosted platform or an on-premises deployment when confidential, regulated, or otherwise sensitive audio data is involved.

Deepgram's Engine

Deepgram's backend speech stack replaces hand-engineered pipelines of heuristics and statistics-based processing with fully end-to-end AI processing, using hybrid models trained on computers equipped with powerful GPUs. Each custom model is trained from the ground up and can consume audio in multiple formats and from many sources, ranging from phone calls and podcasts to recorded meetings and videos. Deepgram's models automatically pick up characteristics such as microphone noise profiles, background noise, audio encodings, transmission protocols, accents, valence (i.e., energy), sentiment, topics of conversation, rates of speech, product names, and languages.

Accuracy

Deepgram processes speech and stores it in what's called a “deep representation index,” which groups sounds by phonetics rather than by words. As a result, customers can search for words by the way they sound, and Deepgram can find them even when the search term is misspelled. When paired with our trained custom models, this approach can boost speech recognition accuracy above 90%.
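Deepgram's deep representation index is a learned internal structure, and this page does not describe how it is built. To illustrate only the general idea of matching words by sound instead of spelling, the sketch below swaps in Soundex, a classic rule-based phonetic code that is not Deepgram's method: words that sound alike map to the same four-character key, so a misspelled query can still find its target. The `soundex` and `phonetic_search` helpers are hypothetical names chosen for this example.

```python
from typing import List

# Soundex digit for each consonant group; vowels, y, h, and w get no code.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the American Soundex key of a word (a stand-in for a phonetic index)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
        if len(key) == 4:
            break
    return key.ljust(4, "0")


def phonetic_search(query: str, words: List[str]) -> List[str]:
    """Return the words whose Soundex key matches the query's key."""
    target = soundex(query)
    return [w for w in words if soundex(w) == target]


# A misspelled query still finds the word it sounds like.
print(phonetic_search("Dipgrem", ["Deepgram", "transcribes", "audio"]))  # ['Deepgram']
```

Soundex is coarse and English-centric (for example, "Robert" and "Rupert" both map to R163), so treat it purely as an illustration of searching by sound.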

Speed

In addition to improving accuracy, Deepgram speeds up transcription of both pre-recorded and streaming audio, all while handling thousands of simultaneous audio streams.
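Streaming transcription works on audio as it arrives rather than on a finished file, so a client typically sends it in short, fixed-duration chunks. Below is a minimal sketch of the chunk-size arithmetic, assuming raw interleaved linear PCM input; `chunk_pcm` is a hypothetical helper, and its defaults (16 kHz, 16-bit, mono, 100 ms chunks) are illustrative assumptions rather than values this page specifies. Sending each chunk (for example, over a WebSocket connection) is left out.

```python
from typing import Iterator


def chunk_pcm(audio: bytes, sample_rate: int = 16000, sample_width: int = 2,
              channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM audio, each chunk_ms long (the last may be shorter).

    Assumes interleaved linear PCM; the defaults are illustrative, not API requirements.
    """
    frame_size = sample_width * channels  # bytes per sample across all channels
    # bytes per chunk = frames per second * bytes per frame * duration in seconds
    chunk_size = sample_rate * frame_size * chunk_ms // 1000
    chunk_size -= chunk_size % frame_size  # never split a frame across two chunks
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too short for this audio format")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# One second of 16 kHz, 16-bit mono silence is 32,000 bytes, i.e. ten 3,200-byte chunks.
one_second = bytes(16000 * 2)
chunks = list(chunk_pcm(one_second))
print(len(chunks), len(chunks[0]))  # 10 3200
```

At real-time pace, a client would send one chunk roughly every `chunk_ms` milliseconds; smaller chunks lower latency at the cost of sending more messages.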

Features

While accuracy and speed are key to the user experience, easy customization is also essential to making Deepgram a full-featured, comprehensive ASR platform. Deepgram provides:

Implementing Deepgram

Deepgram provides a library of off-the-shelf speech recognition models that you can use to start unlocking your audio data immediately, but you can markedly increase accuracy and handle complex use cases by training custom models.

Have a complex use case, or prefer not to do the training yourself? Deepgram AI experts can train an expert model tailored to your needs and deliver it in weeks.

To start testing speech recognition on your data, use our interactive API Reference (no code) or build with one of the official Deepgram SDKs (code).
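For readers who want to see the shape of a raw API call before picking an SDK, here is a minimal sketch using only the Python standard library. It assumes the pre-recorded endpoint is `POST /v1/listen` on `https://api.deepgram.com`, that authentication uses an `Authorization: Token <key>` header, that the JSON body `{"url": ...}` points at hosted audio, and that the transcript sits at `results.channels[0].alternatives[0].transcript` in the response; verify all of these against the current API Reference. `build_transcription_request`, `extract_transcript`, and `transcribe` are hypothetical helper names, and for an on-premises deployment `base_url` would point at your own host.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed hosted endpoint; an on-premises deployment would use its own base URL.
DEFAULT_BASE_URL = "https://api.deepgram.com"


def build_transcription_request(api_key: str, audio_url: str,
                                base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request to transcribe audio hosted at audio_url."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/listen",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response: Dict[str, Any]) -> str:
    """Return the top transcript from a parsed response (assumed JSON layout)."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(api_key: str, audio_url: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Send the request and return the transcript (needs network access and a valid key)."""
    request = build_transcription_request(api_key, audio_url, base_url)
    with urllib.request.urlopen(request) as resp:
        return extract_transcript(json.load(resp))


# Usage (not run here): replace the placeholders with a real API key and audio URL.
# print(transcribe("YOUR_API_KEY", "https://example.com/audio.wav"))
```

Keeping request construction separate from sending makes the pieces testable offline; in practice, the official SDKs would normally handle these details for you.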

There’s much more to creating a full ASR implementation, and our documentation walks you through other possibilities. But first, why not see Deepgram in action?

Full Ecosystem

To ease your interaction with Deepgram's products, we provide:

  • A full-featured console that allows you to organize data, request API keys, and monitor usage.
  • SDKs for two languages (Node.js and Python), with more on the way.
  • Use case guides that walk you through various implementations of Deepgram, including real-time meeting transcription, talk-time analytics, and transcribing Twilio voice calls.