In general terms, real-time streaming latency is the time delay between when a transfer of data begins and when a system begins processing it.
In mathematical terms, it is the difference between the audio cursor (the number of seconds of audio you have currently submitted; we’ll call this X) and the latest transcript cursor (
duration; we’ll call this Y). Latency is X-Y.
Causes of latency include:
Typically, transcription latency is the largest contributor, but non-transcription latencies often compose 10-30% of total latency costs, which is significant when you need immediate transcripts.
Measuring non-transcription legacy is a complicated process that relies heavily on your testing computer, its current load, the client-side programming language, network configuration, and physical location. The most traditional tool of the trade is
ping, which sends a special message to a remote server asking it to
For security purposes, Deepgram blocks pings; instead, for a more realistic measurement, you can measure the time it takes to connect to Deepgram's servers:
In this example, we see that it takes
111 milliseconds to establish a TCP connection to Deepgram.
Transcription latency cannot be measured directly. To calculate transcription latency, you must first calculate total latency and then subtract non-transcription latency.
To calculate total latency:
"is_final": false) from Deepgram, record the
durationvalue. This is the transcript cursor; we’ll call this Y.
To learn more about why you should only include interim transcripts in this calculation, see Understand Interim Transcripts: How do I measure latency with interim results?.
To calculate transcription latency, subtract non-transcription latency from total latency:
Transcription Latency = Total Latency - Non-Transcription Latency
python$ python latency.py -u 'USER:PASS' /PATH/TO/AUDIO.wav
When run, the script submits the identified audio file to Deepgram in real time and prints the total latency to the screen:
jsonChannels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes Measuring... Audio cursor = 2.580, Transcript cursor = 0.220 Measuring... Audio cursor = 2.580, Transcript cursor = 0.420 Measuring... Audio cursor = 2.580, Transcript cursor = 0.620 Measuring... Audio cursor = 2.580, Transcript cursor = 0.840 ... Min latency: 0.080 Avg latency: 0.674 Max latency: 1.180
In this example, total latency averages
674 milliseconds, while previously we measured non-transcription latency at approximately
111 milliseconds. So in this example, transcription latency is