Measuring Streaming Latency

In general terms, real-time streaming latency is the time delay between when a transfer of data begins and when a system begins processing it.

In mathematical terms, it is the difference between the audio cursor (the number of seconds of audio you have currently submitted; we’ll call this X) and the latest transcript cursor (start + duration; we’ll call this Y). Latency is X-Y.

Causes of Latency

Causes of latency include:

Network/transmission latency: Physical distance and network infrastructure can add significant delays to your messages (both to and from Deepgram).
Network stack: How long it takes a message to be routed through the operating system and network driver to the network card itself. When an application wants to send a message to another machine, this must happen, and it can be influenced by numerous factors, including computer load, firewalls, and network traffic. Similarly, the receiving machine’s network card must push the message up through the operating system and into the receiving application.
Serialization/deserialization: How long it takes to interpret a message and convert it into a usable form. To send or receive a message, your client must do this, just as Deepgram's servers must.
Transcription latency: How long it takes to process audio. As powerful as they are, Deepgram's servers still require time to process audio and convert it into usable analytics and transcription results.
Buffer size: The amount of audio sent in each streaming message. Too large of a buffer adds built-in delay to getting results while the audio buffers. Too small of a buffer adds a lot of tiny packets, degrading network performance. Streaming buffer sizes should be between 20 milliseconds and 250 milliseconds of audio, with 100 milliseconds often striking a good balance.

Typically, transcription latency is the largest contributor, but non-transcription latencies often compose 10-30% of total latency costs, which is significant when you need immediate transcripts.

Measuring Non-transcription Latency

Measuring non-transcription legacy is a complicated process that relies heavily on your testing computer, its current load, the client-side programming language, network configuration, and physical location. The most traditional tool of the trade is ping, which sends a special message to a remote server asking it to echo back.

For security purposes, Deepgram blocks pings; instead, for a more realistic measurement, you can measure the time it takes to connect to Deepgram's servers:

curl -sSf -w "latency: %{time_connect}\n" -so /dev/null https://api.deepgram.com
# latency: {number in decimal seconds}

In this example, we see that it takes 111 milliseconds to establish a TCP connection to Deepgram.

Measuring Transcription Latency

Transcription latency cannot be measured directly. To calculate transcription latency, you must first calculate total latency and then subtract non-transcription latency.

Calculating Total Latency

To calculate total latency:

Measure amount of audio submitted. Track the number of seconds of audio you’ve submitted to Deepgram. This represents the audio cursor; we’ll call this X.
Measure the amount of audio processed. Every time you receive an interim transcript (i.e., containing "is_final": false) from Deepgram, record the start + duration value. This is the transcript cursor; we’ll call this Y.
Find total latency: Subtract Y from X to calculate total latency (X-Y).

ℹ️
To learn more about why you should only include interim transcripts in this calculation, see Understand Interim Transcripts: How do I measure latency with interim results?.

Calculating Transcription Latency

To calculate transcription latency, subtract non-transcription latency from total latency:

Transcription Latency = Total Latency - Non-Transcription Latency

Let’s look at an example. Download our latency Python example script, prepare an audio file (or use our sample WAV file), and run the example code. (If you're measuring latency for an on-premise Deepgram installation, you'll need to edit the API endpoint on line 43 to point to your installation's URL.)

python3 latency.py -k 'YOUR_DEEPGRAM_API_KEY' /PATH/TO/AUDIO.wav

When run, the script submits the identified audio file to Deepgram in real time and prints the total latency to the screen:

Channels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes
Measuring... Audio cursor = 2.580, Transcript cursor = 0.220
Measuring... Audio cursor = 2.580, Transcript cursor = 0.420
Measuring... Audio cursor = 2.580, Transcript cursor = 0.620
Measuring... Audio cursor = 2.580, Transcript cursor = 0.840
...
Min latency: 0.080
Avg latency: 0.674
Max latency: 1.180

In this example, total latency averages 674 milliseconds, while previously we measured non-transcription latency at approximately 111 milliseconds. So in this example, transcription latency is 674-111=563 ms.