Integrate Deepgram with Zoom
Zoom is a widely-used cloud-based video communications tool that lets you host virtual one-on-one or team meetings, webinars, and live chats and provides audio, video, screen-sharing, and other collaboration features. Zoom offers enhanced Real-Time Messaging Protocol (RTMP) support, which allows you to extract the audio from your content and stream it to Deepgram to get real-time automatic speech recognition for all of your Zoom calls.
To help you integrate Zoom with Deepgram, we provide an example streaming Python script (stream.py) that you can use as a starting point for your own implementation.
In this guide, the audio from a Zoom conference call will be streamed to a local server. We will fork the stream to our Python script, which will send the audio to Deepgram, then receive and print transcriptions to the screen. In a real implementation, you will likely want to modify the script to provide a callback URL to which transcriptions can be sent.
Before You Begin
Create a Deepgram Account
Before you can use Deepgram products, you'll need to create a Deepgram account. Signup is free and includes:
$200 in credit, which gives you access to:
- all base models
- pre-recorded and streaming functionality
- all features
Create a Deepgram API Key
To access Deepgram’s API, you'll need to create a Deepgram API Key. Make note of your API Key; you will need it later.
Create a Zoom Pro Account
Before you can use Zoom’s live streaming, you'll need to create a Zoom Pro account and enable livestreaming for meetings and webinars. Make sure to allow streaming to a Custom Live Streaming Service.
To use this solution, you will need to either set up a publicly-available hosted environment with a service like Amazon Web Services or expose a local server to the world with ngrok.
In this guide, we use the RTMP-HLS Docker image, which creates a video streaming server that supports Real-Time Messaging Protocol (RTMP), HTTP Live Streaming (HLS), and Dynamic Adaptive Streaming over HTTP (DASH) streams. This host uses RTMP proper, which works on top of Transmission Control Protocol (TCP) and uses port 1935 by default. Make sure your firewall exposes this port. Also, if you're running ngrok, make sure you're running it over TCP.
Install Environment Dependencies
Because Zoom supports Real-Time Messaging Protocol (RTMP) streaming, we use RTMPDump, a toolkit for RTMP streams, in our hosted environment.
We provide sample scripts in Python and assume you have already configured a Python (3.6 or greater) development environment.
Install Development Dependencies
Real-time streaming uses WebSockets, a communications protocol that enables full-duplex communication, which means that you can stream new audio to Deepgram at the same time the latest transcription results are streaming back to you. Using WebSockets is further eased by the wide variety of third-party client libraries that have been written to support a range of languages and production environments.
For Python, we use websockets.
Additional dependencies for Python include scipy (a scientific library we will use to handle WAV files), streamlink (a command-line utility that extracts streams from various services and pipes them into a chosen video player), and requests (a simple HTTP library).
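To make the full-duplex idea concrete, here is a minimal sketch of the pattern: one coroutine sends audio chunks while a second coroutine receives results at the same time. It deliberately uses a hypothetical in-memory FakeSocket instead of a real connection so that it runs without network access or an API Key. A connection object from the websockets library exposes the same send and recv coroutines, so the two tasks could be pointed at a real connection, but how stream.py actually structures this is not shown here. Note that asyncio.run requires Python 3.7 or greater.

```python
import asyncio


class FakeSocket:
    """Hypothetical in-memory stand-in for a WebSocket connection."""

    def __init__(self):
        self._results = asyncio.Queue()

    async def send(self, chunk):
        # Pretend the server answers every audio chunk with one result.
        await self._results.put(f"result for {len(chunk)} bytes")

    async def recv(self):
        return await self._results.get()


async def sender(sock, chunks):
    """Push audio chunks to the server."""
    for chunk in chunks:
        await sock.send(chunk)
        await asyncio.sleep(0)  # yield so the receiver can run between sends


async def receiver(sock, expected, received):
    """Collect results while the sender is still working."""
    for _ in range(expected):
        received.append(await sock.recv())


async def stream(chunks):
    sock = FakeSocket()
    received = []
    # Run both directions concurrently: this is the full-duplex pattern.
    await asyncio.gather(
        sender(sock, chunks),
        receiver(sock, len(chunks), received),
    )
    return received


def run_demo():
    return asyncio.run(stream([b"\x00" * 1024, b"\x00" * 512]))
```

Calling run_demo() returns one result per chunk, in the order the chunks were sent.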
Start the RTMP Server
We recommend running the RTMP server in a Docker container. For this guide, we pulled the RTMP-HLS Docker image, which creates a video streaming server that supports RTMP, HLS, and DASH streams. The server listens for RTMP connections on port 1935, so make sure your firewall exposes this port.
docker run -d -p 1935:1935 -p 8080:8080 alqutami/rtmp-hls
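Once the container is running, it can help to confirm that the RTMP port accepts TCP connections before you involve Zoom. The following is a small sketch using only the Python standard library; the host 127.0.0.1 assumes the container runs on the same machine, so substitute your server's address or ngrok endpoint as needed.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the RTMP container is running on this machine.
    print("RTMP port reachable:", is_port_open("127.0.0.1", 1935))
```

A successful connection only shows that something is listening on the port; it does not verify that the listener speaks RTMP.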
Download and Configure the Streaming Script
Next, download our example streaming script (stream.py) and configure it.
Configure Deepgram Authentication
Before running the script, you must supply your Deepgram API Key for authentication.
On line 17 of stream.py, replace YOUR_DEEPGRAM_API_KEY with the API Key you created earlier in this tutorial:
17 'Authorization': 'Token YOUR_DEEPGRAM_API_KEY'
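If you would rather not store the key in the script itself, one option is to read it from an environment variable and build the header from that. This is a sketch, not part of stream.py: the variable name DEEPGRAM_API_KEY and both helper functions are hypothetical names chosen for illustration.

```python
import os


def build_auth_header(api_key):
    """Build the Authorization header in the 'Token <key>' form shown above."""
    if not api_key or api_key == "YOUR_DEEPGRAM_API_KEY":
        raise ValueError("Set a real Deepgram API Key before running the script")
    return {"Authorization": f"Token {api_key}"}


def auth_header_from_env(env=os.environ):
    # DEEPGRAM_API_KEY is a hypothetical variable name chosen for this sketch.
    return build_auth_header(env.get("DEEPGRAM_API_KEY", ""))
```

Rejecting the unmodified placeholder makes a forgotten configuration step fail immediately instead of surfacing later as an authentication error.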
Set Up Your Zoom Conference Call
Next, you will need to start your Zoom meeting and configure your Zoom live-streaming service:
Start your Zoom meeting and join the meeting with computer audio.
Select More… and then Live on Custom Live Streaming Service.
Configure streaming, and select Go Live!.
|Streaming URL||URL or IP address to which you would like to send audio, plus |
|Streaming key||Unique identifier for your meeting instance. You can supply any unique ID. Make note of the value you enter because you will need it again later.|
|Live streaming page URL||URL to a front-end where users can view the live stream. In this example, we intend to use our |
Send Streaming Results to Deepgram
In this example, to fork audio to Deepgram using our example script (stream.py), we use RTMPDump. To see how we do this, download the sample shell script (stream_rtmp.sh) and configure it.
Configure the Zoom Streaming Key
Because you could be streaming multiple instances of Zoom at the same time, the script needs to know from which Zoom instance it should get results.
Let's look at the sample shell script more closely. It contains one line:
rtmpdump -r "rtmp://0.0.0.0:1935/live/"$1 --live -o - | python3 stream.py
rtmpdump makes a connection to a specific stream on the specified RTMP server and directs the media content of the stream to our example streaming script (stream.py) for display in your terminal. Parameters include:
|-r url||URL of the server and media content. Should be in the form |
|-v (--live)||Specifies that the media is a live stream. You may not resume or seek in live streams.|
|-o output||Specifies the output file name. In this case, the output is piped to our example streaming script (|
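For readers who prefer to launch the same pipeline from Python rather than a shell script, the sketch below builds the identical rtmpdump command and pipes its output into stream.py. The function names are hypothetical, and the server URL default simply mirrors the one in the shell script. Passing the command as an argument list avoids shell quoting problems if a streaming key contains unusual characters.

```python
import subprocess
import sys


def build_rtmpdump_command(stream_key, server="rtmp://0.0.0.0:1935/live/"):
    """Return the rtmpdump argument list used by the sample shell script."""
    return ["rtmpdump", "-r", server + stream_key, "--live", "-o", "-"]


def run_pipeline(stream_key, script="stream.py"):
    """Pipe rtmpdump's output into the streaming script, like `a | b` in a shell."""
    dump = subprocess.Popen(
        build_rtmpdump_command(stream_key),
        stdout=subprocess.PIPE,
    )
    try:
        # The script reads the media stream from its standard input.
        return subprocess.run([sys.executable, script], stdin=dump.stdout).returncode
    finally:
        # Stop rtmpdump once the script exits.
        dump.stdout.close()
        dump.terminate()
        dump.wait()
```

run_pipeline requires rtmpdump to be installed and stream.py to be in the working directory; build_rtmpdump_command can be inspected on its own without either.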
Run the Script
To run the script, from the command line, use the following command:
source stream_rtmp.sh keyname
Replace keyname with the Streaming key you entered in Zoom.
After a brief delay, you should see results of the audio transcription of your livestreaming Zoom call start to appear on your screen.
The speed of returned results depends on both Deepgram and Zoom availability and on the setup of your hosting environment.
When analyzing results, red text represents interim transcripts, while green text represents final transcripts. To learn more about interim and final transcripts, see Interim Results.
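The color coding can be sketched as follows. This is an illustration rather than the code in stream.py: it assumes each message from Deepgram is a JSON object with an is_final flag and the transcript text under channel.alternatives, and it uses standard ANSI escape codes for the terminal colors. The function name format_result is hypothetical.

```python
import json

# Standard ANSI escape codes for terminal colors.
RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"


def format_result(raw_message):
    """Return a colored transcript line, or None if the message has no text.

    Assumed message shape (not confirmed by this guide):
    {"is_final": bool, "channel": {"alternatives": [{"transcript": str}]}}
    """
    message = json.loads(raw_message)
    alternatives = message.get("channel", {}).get("alternatives", [])
    transcript = alternatives[0].get("transcript", "") if alternatives else ""
    if not transcript:
        return None
    # Final transcripts are green; interim transcripts are red.
    color = GREEN if message.get("is_final") else RED
    return f"{color}{transcript}{RESET}"
```

Skipping empty transcripts keeps silence in the call from producing blank lines in the terminal.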