Measuring Streaming Latency
In general terms, real-time streaming latency is the time delay between when a transfer of data begins and when a system begins processing it.
In mathematical terms, it is the difference between the audio cursor (the number of seconds of audio you have currently submitted; we’ll call this X) and the latest transcript cursor (
duration; we’ll call this Y). Latency is X-Y.
Causes of Latency
Causes of latency include:
- Network/transmission latency: Physical distance and network infrastructure can add significant delays to your messages (both to and from Deepgram).
- Network stack: How long it takes a message to be routed through the operating system and network driver to the network card itself. When an application wants to send a message to another machine, this must happen, and it can be influenced by numerous factors, including computer load, firewalls, and network traffic. Similarly, the receiving machine’s network card must push the message up through the operating system and into the receiving application.
- Serialization/deserialization: How long it takes to interpret a message and convert it into a usable form. To send or receive a message, your client must do this, just as Deepgram’s servers must.
- Transcription latency: How long it takes to process audio. As powerful as they are, Deepgram’s servers still require time to process audio and convert it into usable analytics and transcription results.
Typically, transcription latency is the largest contributor, but non-transcription latencies often compose 10-30% of total latency costs, which is significant when you need immediate transcripts.
Measuring Non-transcription Latency
Measuring non-transcription legacy is a complicated process that relies heavily on your testing computer, its current load, the client-side programming language, network configuration, and physical location. The most traditional tool of the trade is
ping, which sends a special message to a remote server asking it to
For security purposes, Deepgram blocks pings; instead, for a more realistic measurement, you can measure the time it takes to connect to Deepgram’s servers:
In this example, we see that it takes
111 milliseconds to establish a TCP connection to Deepgram.
Measuring Transcription Latency
Transcription latency cannot be measured directly. To calculate transcription latency, you must first calculate total latency and then subtract non-transcription latency.
Calculating Total Latency
To calculate total latency:
- Measure amount of audio submitted. Track the number of seconds of audio you’ve submitted to Deepgram. This represents the audio cursor; we’ll call this X.
- Measure the amount of audio processed. Every time you receive an interim transcript (i.e., containing
"is_final": false) from Deepgram, record the
durationvalue. This is the transcript cursor; we’ll call this Y.
- Find total latency: Subtract Y from X to calculate total latency (X-Y).
To learn more about why you should only include interim transcripts in this calculation, see Understand Interim Transcripts: How do I measure latency with interim results?.
Calculating Transcription Latency
To calculate transcription latency, subtract non-transcription latency from total latency:
Transcription Latency = Total Latency - Non-Transcription Latency
Let’s look at an example. Download our latency Python example script, prepare an audio file (or use our sample WAV file), and run the example code. (If you’re measuring latency for an on-premise Deepgram installation, you’ll need to edit the API endpoint on line 43 to point to your installation’s URL.)
When run, the script submits the identified audio file to Deepgram in real time and prints the total latency to the screen:
Channels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes Measuring... Audio cursor = 2.580, Transcript cursor = 0.220 Measuring... Audio cursor = 2.580, Transcript cursor = 0.420 Measuring... Audio cursor = 2.580, Transcript cursor = 0.620 Measuring... Audio cursor = 2.580, Transcript cursor = 0.840 ... Min latency: 0.080 Avg latency: 0.674 Max latency: 1.180
In this example, total latency averages
674 milliseconds, while previously we measured non-transcription latency at approximately
111 milliseconds. So in this example, transcription latency is