Diarization
Diarize recognizes speaker changes and assigns a speaker to each word in the transcript.
diarize (boolean). Default: false
Try this feature out in our API Playground!
Enable Feature
To enable Diarization, use the following parameter in the query string when you call Deepgram’s /listen endpoint:
diarize=true
To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.
curl \
--request POST \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'Content-Type: audio/wav' \
--data-binary @youraudio.wav \
--url 'https://api.deepgram.com/v1/listen?diarize=true'
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
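The same request can also be sketched in Python with only the standard library. This is a minimal illustration rather than an official SDK example: the helper name `build_diarize_request` is hypothetical, and the endpoint, headers, and `diarize=true` parameter are copied from the cURL command above.

```python
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_diarize_request(api_key, audio_bytes, content_type="audio/wav"):
    """Build (but do not send) a POST request with diarization enabled.

    Hypothetical helper for illustration; it mirrors the cURL command above.
    """
    query = urllib.parse.urlencode({"diarize": "true"})
    return urllib.request.Request(
        url=f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# To send the request (requires network access and a valid key):
#   import json
#   with open("youraudio.wav", "rb") as audio:
#       request = build_diarize_request("YOUR_DEEPGRAM_API_KEY", audio.read())
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any audio leaves your machine.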
Analyze Response
For this example, we use an MP3 audio file that contains the beginning of a customer call with Premier Phone Services. If you would like to follow along, you can download it.
When the file is finished processing, you’ll receive a JSON response. Let's look more closely at the words array within the alternatives array of this response.
Pre-Recorded
When using diarization for pre-recorded audio, both speaker and speaker_confidence values will be returned:
...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0,
        "speaker_confidence":0.5853265
      },
      ...
    ]
  }
]
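Because every word carries a speaker value, consecutive words from the same speaker can be collapsed into turns. The following is a minimal sketch of one way to do that; the function name `speaker_turns` and the sample values are made up for illustration, and only the field names (`word`, `start`, `end`, `speaker`) come from the response shown above.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],   # start time of the first word
            "end": group[-1]["end"],      # end time of the last word
            "text": " ".join(w["word"] for w in group),
        })
    return turns


# Sample data shaped like the "words" array above (values are made up).
sample_words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "there", "start": 0.5, "end": 0.9, "speaker": 0},
    {"word": "hi", "start": 1.2, "end": 1.4, "speaker": 1},
]

for turn in speaker_turns(sample_words):
    print(f"[Speaker:{turn['speaker']}] {turn['text']}")
# [Speaker:0] hello there
# [Speaker:1] hi
```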
Live Streaming
When using diarization for live streaming audio, only the speaker value will be returned:
...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0
      },
      ...
    ]
  }
]
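Code that handles both pre-recorded and live streaming responses should treat speaker_confidence as optional. The sketch below shows one way to do that, assuming the word objects look like the examples above; the function name, the 0.5 threshold, and the second sample word are arbitrary choices for illustration, not Deepgram recommendations.

```python
def uncertain_speaker_words(words, threshold=0.5):
    """Return words whose speaker_confidence is present and below threshold.

    Words without speaker_confidence (as in live streaming responses)
    are skipped rather than treated as errors.
    """
    flagged = []
    for w in words:
        conf = w.get("speaker_confidence")  # None when the field is absent
        if conf is not None and conf < threshold:
            flagged.append(w["word"])
    return flagged


# Sample data (made up, except the first word, which follows the example above).
pre_recorded = [
    {"word": "hello", "speaker": 0, "speaker_confidence": 0.5853265},
    {"word": "um", "speaker": 1, "speaker_confidence": 0.31},
]
streaming = [{"word": "hello", "speaker": 0}]

print(uncertain_speaker_words(pre_recorded))  # ['um']
print(uncertain_speaker_words(streaming))     # []
```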
Use the API reference or the API Playground to view the detailed response.
Format Response
To improve readability, you can use a JSON processor to parse the JSON. In this example, we use jq and further improve readability by turning on Deepgram’s punctuation and utterances features:
curl \
--request POST \
--url 'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true&utterances=true' \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'content-type: audio/mp3' \
--data-binary @Premier_broken-phone_numbers.mp3 | jq -r ".results.utterances[] | \"[Speaker:\(.speaker)] \(.transcript)\""
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
When the file is finished processing, you’ll receive the following response:
[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...
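If jq is not available, the same formatting can be approximated in Python. This is a minimal sketch that assumes the response has already been parsed into a dictionary with the `results.utterances` structure used by the jq filter above; the function name `format_utterances` is hypothetical, and the sample dictionary is trimmed to the two fields the filter reads.

```python
def format_utterances(response):
    """Return one '[Speaker:N] transcript' line per utterance."""
    return [
        f"[Speaker:{u['speaker']}] {u['transcript']}"
        for u in response["results"]["utterances"]
    ]


# Trimmed sample shaped like a response with utterances=true.
sample_response = {
    "results": {
        "utterances": [
            {"speaker": 0, "transcript": "How are you doing?"},
            {"speaker": 1, "transcript": "Not too bad. How are you today?"},
        ]
    }
}

for line in format_utterances(sample_response):
    print(line)
# [Speaker:0] How are you doing?
# [Speaker:1] Not too bad. How are you today?
```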
To learn more about when to use Deepgram's Diarization or Multichannel feature, see When to Use the Multichannel and Diarization Features.