Voice Activity Detection (VAD)
Deepgram’s Voice Activity Detection (VAD) feature monitors incoming audio and detects when a sufficiently long pause occurs. By default, the pause length Deepgram uses for VAD is 10 ms, but you can configure this value using the vad_turnoff parameter.
VAD is used when the Endpointing feature is enabled for streaming audio.
Some examples of use cases for VAD include:
- Audio with elderly speakers who pause longer while speaking than the average speaker.
- Audio with differently-abled speakers who pause longer while speaking than the average speaker.
- Audio with speakers who speak with shorter pauses than the average speaker.
To enable VAD, add a vad_turnoff parameter to the query string when you call Deepgram’s API, and set it to the length of silence (in milliseconds) after which Deepgram will decide that a speaker has finished speaking. The default value is 10 ms.
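As a rough illustration of the query string described above, the following sketch builds a streaming URL with vad_turnoff set to 500 ms. The base URL and the helper name build_streaming_url are assumptions made for this example and are not taken from this page; substitute the endpoint you actually use. Endpointing still has to be enabled separately for VAD to take effect.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint, used here only for illustration.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_streaming_url(vad_turnoff_ms: int, base_url: str = BASE_URL) -> str:
    """Return base_url with vad_turnoff (in milliseconds) in the query string.

    build_streaming_url is a hypothetical helper, not part of any Deepgram SDK.
    """
    if vad_turnoff_ms <= 0:
        raise ValueError("vad_turnoff must be a positive number of milliseconds")
    # urlencode converts the integer to its string form and escapes it if needed.
    query = urlencode({"vad_turnoff": vad_turnoff_ms})
    return f"{base_url}?{query}"


# A longer pause threshold, e.g. for speakers who pause longer than average.
print(build_streaming_url(500))
```

A larger value suits speakers who pause longer than average, while a value closer to the 10 ms default suits speakers with short pauses.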
For an example of audio streaming, see Getting Started with Streaming Audio.
To learn about the results, see Endpointing.