Speech Started

Speech Started sends a message when the start of speech is detected in live streaming audio.

vad_events (boolean)

Deepgram's Speech Started feature detects the start of speech while live streaming audio is being transcribed.

Speech Started uses the Voice Activity Detector (VAD) to detect when speech begins after a period of silence. By analyzing the tonal characteristics of human speech, the VAD distinguishes silent from non-silent audio segments and reports speech as soon as it is detected.

When this feature is enabled, Deepgram sends a SpeechStarted message when the onset of speech is detected.

Enable Feature

To enable the SpeechStarted event, include the parameter vad_events=true in your request:

vad_events=true

You'll then receive a SpeechStarted message when the start of speech is detected.
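As a minimal sketch of adding the parameter to a request, the snippet below appends vad_events=true to a streaming URL's query string. The base URL, the helper name build_listen_url, and the extra encoding and sample_rate values are assumptions made for illustration; this page only specifies the vad_events=true parameter itself.

```python
from urllib.parse import urlencode

# Assumed live streaming endpoint; confirm the URL for your own setup.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(base_url: str = BASE_URL, **params: str) -> str:
    """Return base_url with vad_events=true plus any extra query parameters.

    Hypothetical helper, not part of any SDK.
    """
    query = {"vad_events": "true", **params}
    return f"{base_url}?{urlencode(query)}"


# The extra parameters here are illustrative values only.
url = build_listen_url(encoding="linear16", sample_rate="16000")
print(url)
```

Building the query string with urlencode rather than by hand keeps the parameter correctly escaped if other options are added later.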

Results

The JSON message sent when the start of speech is detected looks similar to this:

{
  "type": "SpeechStarted",
  "channel": [
    0,
    1
  ],
  "timestamp": 9.54
}

  • The type field is always SpeechStarted for this event.
  • The channel field is interpreted as [A, B], where A is the channel index and B is the total number of channels. The example above is channel 0 of single-channel audio.
  • The timestamp field is the time at which speech was first detected.
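
The fields described above can be read from an incoming message roughly as follows. This is a sketch that assumes the message arrives as a JSON text frame; the SpeechStarted dataclass and the parse_speech_started function are hypothetical names introduced here, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechStarted:
    channel_index: int   # A in the [A, B] channel field
    channel_count: int   # B in the [A, B] channel field
    timestamp: float     # time at which speech was first detected


def parse_speech_started(raw: str) -> Optional[SpeechStarted]:
    """Return a SpeechStarted event, or None for any other message type."""
    message = json.loads(raw)
    if not isinstance(message, dict) or message.get("type") != "SpeechStarted":
        return None
    channel_index, channel_count = message["channel"]
    return SpeechStarted(channel_index, channel_count, float(message["timestamp"]))


# The example message from this page, written on one line.
raw = '{"type": "SpeechStarted", "channel": [0, 1], "timestamp": 9.54}'
event = parse_speech_started(raw)
if event is not None:
    print(f"Speech started on channel {event.channel_index} "
          f"of {event.channel_count} at {event.timestamp}")
```

Returning None for other message types lets the same receive loop pass transcript messages on to a separate handler.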

Note: The timestamp is not intended to match the first word's timestamp in the subsequent transcript precisely, because the ASR and word-timing models are separate from the VAD speech detection.
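
Because the two timestamps come from separate models, client code that relates them should allow for a gap rather than test for equality. The sketch below shows one way to do that; the function names, the 0.5 tolerance, and the first-word start time of 9.82 are all illustrative assumptions, not values from Deepgram.

```python
def onset_offset(speech_started_ts: float, first_word_start: float) -> float:
    """Return the first word's start time minus the SpeechStarted timestamp."""
    return first_word_start - speech_started_ts


def is_plausible_match(speech_started_ts: float,
                       first_word_start: float,
                       tolerance: float = 0.5) -> bool:
    """Treat the two timestamps as the same onset if they differ by at most tolerance.

    The default tolerance is an arbitrary illustrative value; tune it for your audio.
    """
    return abs(onset_offset(speech_started_ts, first_word_start)) <= tolerance


# 9.54 is the SpeechStarted timestamp from the example message;
# 9.82 is a made-up first-word start time used only for illustration.
print(is_plausible_match(9.54, 9.82))
```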