Utterances
Utterances segments speech into meaningful semantic units.
utterances: boolean. Default: false
Deepgram’s Utterances feature allows the chosen model to interact more naturally and effectively with speakers’ spontaneous speech patterns. For example, when humans speak to each other conversationally, they often pause mid-sentence to reformulate their thoughts, or stop and restart a badly worded sentence.
Enable Feature
To enable utterances, use the following parameter in the query string when you call Deepgram’s /listen endpoint:
utterances=true
To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
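For readers who prefer a script to curl, here is a minimal Python sketch of the same kind of request, using only the standard library. The base URL, the "Token" authorization scheme, and the audio/wav content type are assumptions not stated in the text above, and the function names are hypothetical; check them against your own account before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed base URL for the /listen endpoint (not given in the text above).
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_utterances_request(audio: bytes, api_key: str,
                             content_type: str = "audio/wav") -> urllib.request.Request:
    """Prepare a POST to /listen with utterances=true in the query string."""
    query = urlencode({"utterances": "true"})
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            # Assumed auth scheme: "Token <key>".
            "Authorization": f"Token {api_key}",
            # Assumed content type; adjust to match your audio file.
            "Content-Type": content_type,
        },
    )


def transcribe(path: str, api_key: str) -> dict:
    """Send a local audio file and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_utterances_request(audio_file.read(), api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling transcribe("your_audio.wav", "YOUR_DEEPGRAM_API_KEY") performs the network request; build_utterances_request on its own only prepares it, which makes the URL and headers easy to inspect first.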
Analyze Response
When the file is finished processing, you’ll receive a JSON response. Let’s look more closely at the utterances object:
In this response, we see that each utterance contains:
start: Start time (in seconds) from the beginning of the audio stream.
end: End time (in seconds) from the beginning of the audio stream.
confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
channel: Audio channel to which the utterance belongs. When using multichannel audio, utterances are chronologically ordered by channel.
transcript: Transcript for the audio segment being processed.
words: Array of objects, one per word in the transcript, each with the word's start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
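The fields listed above can be read with a few lines of Python. This is a sketch under two assumptions: that the utterances array sits under a top-level results key, and that each word object stores its text under a word key. The sample data is invented purely to show the shape and is not real API output.

```python
from typing import Dict, List

# Illustrative data only: these values are invented, not real API output.
sample_response = {
    "results": {  # assumed location of the utterances array
        "utterances": [
            {
                "start": 0.0,
                "end": 1.5,
                "confidence": 0.98,
                "channel": 0,
                "transcript": "hello there",
                "words": [
                    {"word": "hello", "start": 0.0, "end": 0.6, "confidence": 0.99},
                    {"word": "there", "start": 0.7, "end": 1.5, "confidence": 0.97},
                ],
            },
        ]
    }
}


def format_utterances(response: Dict) -> List[str]:
    """Return one summary line per utterance: channel, time span, confidence, text."""
    lines = []
    for utterance in response.get("results", {}).get("utterances", []):
        lines.append(
            f"[ch {utterance['channel']}] "
            f"{utterance['start']:.2f}-{utterance['end']:.2f}s "
            f"({utterance['confidence']:.2f}) {utterance['transcript']}"
        )
    return lines


for line in format_utterances(sample_response):
    print(line)
```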
Advanced Processing
You may also want to enable punctuation, as well as speaker diarization, which detects and identifies the speakers for utterances.
To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
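Enabling several features at once only changes the query string. The sketch below builds such a URL, assuming the diarization and punctuation parameters are named diarize and punctuate and that the endpoint lives at the base URL shown; neither is stated in the text here, and listen_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL for the /listen endpoint.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def listen_url(**features: bool) -> str:
    """Build a /listen URL, rendering each boolean feature as true or false."""
    query = urlencode({name: str(enabled).lower() for name, enabled in features.items()})
    return f"{LISTEN_URL}?{query}"


# Parameter names "diarize" and "punctuate" are assumptions.
advanced_url = listen_url(utterances=True, diarize=True, punctuate=True)
print(advanced_url)
```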
When the file is finished processing, you’ll receive a JSON response that has the same basic structure as before. Let’s take a closer look at the new utterances object:
In this response, notice that the content of transcript in each utterance is now punctuated, and each word object in the words array contains two new parameters:
speaker: Integer indicating the speaker who is saying the word being processed.
punctuated_word: Word being processed with added punctuation, if any.
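One way to use these two fields is to rebuild a speaker-labelled transcript from the words array. The sketch below groups consecutive words by speaker and joins their punctuated forms. The sample utterance is invented for illustration, and the word key used as a fallback is an assumption about the word object.

```python
from itertools import groupby
from typing import Dict, List


def speaker_turns(utterance: Dict) -> List[str]:
    """Group consecutive words by speaker and join their punctuated forms."""
    turns = []
    for speaker, words in groupby(utterance["words"], key=lambda w: w["speaker"]):
        # Fall back to the raw word (assumed key "word") if no punctuated form exists.
        text = " ".join(w.get("punctuated_word", w.get("word", "")) for w in words)
        turns.append(f"Speaker {speaker}: {text}")
    return turns


# Illustrative data only: invented values, not real API output.
sample_utterance = {
    "words": [
        {"word": "hi", "punctuated_word": "Hi.", "speaker": 0},
        {"word": "hello", "punctuated_word": "Hello,", "speaker": 1},
        {"word": "there", "punctuated_word": "there.", "speaker": 1},
    ]
}

for turn in speaker_turns(sample_utterance):
    print(turn)
```

Because groupby only merges adjacent items, a speaker who talks, is interrupted, and talks again produces two separate turns, which keeps the output in chronological order.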
To improve readability, you can use a JSON processor to parse the JSON. In this example, we use jq.
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
When the file is finished processing, you’ll receive the following response: