In general terms, real-time streaming latency is the time delay between when a transfer of data begins and when a system begins processing it.
In mathematical terms, it is the difference between the audio cursor (the number of seconds of audio you have currently submitted; we’ll call this X) and the latest transcript cursor (start + duration; we’ll call this Y). Latency is X-Y.
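As a minimal sketch of this definition, the following Python helper computes latency from an audio cursor and a transcript's start and duration values. The function name and the sample numbers are illustrative assumptions, not part of any Deepgram SDK.

```python
def streaming_latency(audio_cursor: float, start: float, duration: float) -> float:
    """Return latency in seconds: X (audio submitted) minus Y (start + duration)."""
    transcript_cursor = start + duration  # Y
    return audio_cursor - transcript_cursor  # X - Y


# Hypothetical values: 3.0 s of audio submitted; the latest transcript
# starts at 0.5 s and lasts 2.0 s, so it covers audio up to 2.5 s.
print(f"{streaming_latency(3.0, 0.5, 2.0):.3f}")
```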
Causes of latency include:
Typically, transcription latency is the largest contributor, but non-transcription latency often accounts for 10-30% of total latency, which is significant when you need immediate transcripts.
Measuring non-transcription latency is a complicated process that relies heavily on your testing computer, its current load, the client-side programming language, network configuration, and physical location. The most traditional tool of the trade is ping, which sends a special message to a remote server asking it to echo back.
For security purposes, Deepgram blocks pings; instead, for a more realistic measurement, you can measure the time it takes to connect to Deepgram's servers:
$ curl -sSf -o /dev/null -w '%{time_connect}\n' brain.deepgram.com
0.111473
In this example, we see that it takes 111 milliseconds to establish a TCP connection to Deepgram.
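If you would rather take the same kind of measurement from client code, a rough Python equivalent is sketched below. It uses only the standard library; the function name is hypothetical. Like curl's time_connect, the timing covers both the DNS lookup and the TCP handshake, so results will vary with the same factors listed above.

```python
import socket
import time


def tcp_connect_time(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds taken to resolve `host` and open a TCP connection."""
    started = time.perf_counter()
    # create_connection performs the DNS lookup and the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        # Stop the clock before the socket is closed on exit from the block.
        return time.perf_counter() - started
```

For example, calling tcp_connect_time("brain.deepgram.com", 80) would approximate the curl command above, assuming port 80 because that command uses plain HTTP. Taking several samples and averaging them gives a steadier figure than a single call.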
Transcription latency cannot be measured directly. To calculate transcription latency, you must first calculate total latency and then subtract non-transcription latency.
To calculate total latency:
While streaming audio to Deepgram, keep track of the audio cursor, X.
Each time you receive an interim transcript ("is_final": false) from Deepgram, record the start + duration value. This is the transcript cursor; we’ll call this Y.
Total latency at that moment is X-Y.
To learn more about why you should only include interim transcripts in this calculation, see Understand Interim Transcripts: How do I measure latency with interim results?.
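The total-latency calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions: the helper name is hypothetical, the messages are assumed to be JSON objects with top-level is_final, start, and duration fields, and the recorded session is made-up data rather than real Deepgram output.

```python
import json
from statistics import mean


def total_latencies(events):
    """Yield X - Y for each interim result.

    `events` is an iterable of (audio_cursor, raw_message) pairs, where
    audio_cursor is the seconds of audio sent when the message arrived.
    """
    for audio_cursor, raw_message in events:
        message = json.loads(raw_message)
        # Only interim transcripts ("is_final": false) are measured;
        # final results and messages without the field are skipped.
        if message.get("is_final") is not False:
            continue
        transcript_cursor = message["start"] + message["duration"]  # Y
        yield audio_cursor - transcript_cursor  # X - Y


# Hypothetical recorded session, not real Deepgram output.
events = [
    (1.0, '{"is_final": false, "start": 0.0, "duration": 0.5}'),
    (2.0, '{"is_final": false, "start": 0.0, "duration": 1.25}'),
    (2.5, '{"is_final": true, "start": 0.0, "duration": 2.0}'),
]
samples = list(total_latencies(events))
print(f"Min latency: {min(samples):.3f}")
print(f"Avg latency: {mean(samples):.3f}")
print(f"Max latency: {max(samples):.3f}")
```

Pairing each message with the audio cursor at the moment it arrived keeps the calculation independent of how the audio is actually sent, so the same function could be fed from a live connection or from a saved log.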
To calculate transcription latency, subtract non-transcription latency from total latency:
Transcription Latency = Total Latency - Non-Transcription Latency
Let’s look at an example. Download our latency Python example script, prepare an audio file (or use our sample WAV file), and run the example code:
$ python latency.py -u 'USER:PASS' /PATH/TO/AUDIO.wav
When run, the script submits the identified audio file to Deepgram in real time and prints the total latency to the screen:
Channels = 2, Sample Rate = 48000 Hz, Sample width = 2 bytes, Size = 18540124 bytes
Measuring... Audio cursor = 2.580, Transcript cursor = 0.220
Measuring... Audio cursor = 2.580, Transcript cursor = 0.420
Measuring... Audio cursor = 2.580, Transcript cursor = 0.620
Measuring... Audio cursor = 2.580, Transcript cursor = 0.840
...
Min latency: 0.080
Avg latency: 0.674
Max latency: 1.180
In this example, total latency averages 674 milliseconds, while previously we measured non-transcription latency at approximately 111 milliseconds. So in this example, transcription latency is 674 - 111 = 563 ms.