Diarization
Diarization recognizes speaker changes and assigns a speaker number to each word in the transcript.
To learn more about diarization and multichannel audio, including when to use Deepgram's Diarization feature and when to use its Multichannel feature, see Understanding when to Use the Multichannel and Diarization Features.
Enable Feature
To enable Diarization, add the following parameter to the query string when you call Deepgram's /listen endpoint:
diarize=true
To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.
curl \
--request POST \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'Content-Type: audio/wav' \
--data-binary @youraudio.wav \
--url 'https://api.deepgram.com/v1/listen?diarize=true'
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
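If you prefer to call the endpoint from code rather than cURL, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the function name build_listen_request is hypothetical, it only builds the request (sending it needs network access and a valid key), and it assumes boolean parameters should be sent as lowercase true, matching the query string above.

```python
import urllib.parse
import urllib.request

# Pre-recorded /listen endpoint, as used in the cURL example.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str,
                         content_type: str = "audio/wav",
                         **params: object) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL command.

    build_listen_request is a hypothetical helper for illustration only.
    """
    # Python booleans render as "True"/"False"; the examples use lowercase "true".
    query = urllib.parse.urlencode(
        {k: str(v).lower() if isinstance(v, bool) else str(v)
         for k, v in params.items()}
    )
    url = f"{LISTEN_URL}?{query}" if query else LISTEN_URL
    return urllib.request.Request(
        url,
        data=audio,  # raw audio bytes, like --data-binary
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# Usage (requires a real audio file, network access, and a valid key):
#   with open("youraudio.wav", "rb") as f:
#       req = build_listen_request(f.read(), "YOUR_DEEPGRAM_API_KEY", diarize=True)
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Passing extra keyword arguments (for example punctuate=True, utterances=True) appends them to the query string in the order given.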
Analyze Response
For this example, we use an MP3 audio file that contains the beginning of a customer call with Premier Phone Services. If you would like to follow along, you can download it.
When the file has finished processing, you'll receive a JSON response. Let's look more closely at the words array inside the alternatives array of this response.
Pre-Recorded
When using diarization for pre-recorded audio, both speaker and speaker_confidence values are returned:
...
"alternatives":[
{
...
"words": [
{
"word":"hello",
"start":15.259043,
"end":15.338787,
"confidence":0.9721591,
"speaker":0,
"speaker_confidence":0.5853265
},
...
]
}
]
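Once the response is parsed, the per-word speaker fields can be aggregated directly. The sketch below counts words per speaker and averages their speaker_confidence values. It assumes you have already extracted the words list (for example with json.loads on the response body and indexing into the first alternative); the function name speaker_stats is hypothetical, and the sample data is made up apart from the "hello" entry shown above.

```python
from collections import defaultdict
from statistics import mean


def speaker_stats(words: list[dict]) -> dict[int, dict[str, float]]:
    """Count words per speaker and average their speaker_confidence values.

    Hypothetical helper; expects pre-recorded word objects, which carry
    both "speaker" and "speaker_confidence".
    """
    confidences: dict[int, list[float]] = defaultdict(list)
    for w in words:
        confidences[w["speaker"]].append(w["speaker_confidence"])
    # Sort by speaker number so the output order is stable.
    return {
        spk: {"words": len(vals), "mean_speaker_confidence": mean(vals)}
        for spk, vals in sorted(confidences.items())
    }


# Illustrative words list; only the "hello" entry comes from the example response.
words = [
    {"word": "hello", "speaker": 0, "speaker_confidence": 0.5853265},
    {"word": "hi", "speaker": 1, "speaker_confidence": 0.9},
    {"word": "there", "speaker": 1, "speaker_confidence": 0.7},
]
stats = speaker_stats(words)
```

A low average speaker_confidence for a speaker can be a hint to review that speaker's segments manually.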
Live Streaming
When using diarization for live streaming audio, only the speaker value is returned:
...
"alternatives":[
{
...
"words": [
{
"word":"hello",
"start":15.259043,
"end":15.338787,
"confidence":0.9721591,
"speaker":0
},
...
]
}
]
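Because streaming responses omit speaker_confidence, code that should handle both cases can rely on the speaker field alone. The sketch below merges consecutive words that share a speaker number into speaker turns; it reads only "word" and "speaker", so it accepts either response shape. The function name speaker_turns and the sample words are hypothetical, and the sketch is meant for the words of a single response or message.

```python
def speaker_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Merge consecutive words with the same speaker into (speaker, text) turns.

    Hypothetical helper; uses only "word" and "speaker", which appear in
    both pre-recorded and live streaming word objects.
    """
    turns: list[tuple[int, list[str]]] = []
    for w in words:
        spk = w["speaker"]
        if turns and turns[-1][0] == spk:
            # Same speaker as the previous word: extend the current turn.
            turns[-1][1].append(w["word"])
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append((spk, [w["word"]]))
    return [(spk, " ".join(ws)) for spk, ws in turns]


# Illustrative streaming-style words (no speaker_confidence field).
words = [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
    {"word": "thanks", "speaker": 0},
]
turns = speaker_turns(words)
```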
Use the API reference or the API Playground to view the detailed response.
Format Response
To make the output easier to read, you can parse the JSON with a command-line JSON processor. In this example, we use jq and also turn on Deepgram's punctuation and utterances features:
curl \
--request POST \
--url 'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true&utterances=true' \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'content-type: audio/mp3' \
--data-binary @Premier_broken-phone_numbers.mp3 | jq -r ".results.utterances[] | \"[Speaker:\(.speaker)] \(.transcript)\""
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
When the file has finished processing, you'll see output like the following:
[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...
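If jq is not available, the same formatting can be done in a few lines of Python. This sketch mirrors the jq filter above, producing one "[Speaker:N] transcript" line per entry in results.utterances; the function name format_utterances is hypothetical, and the sample is a trimmed, illustrative response built from two lines of the output above rather than a real API payload.

```python
import json


def format_utterances(response: dict) -> list[str]:
    """Mirror the jq filter: one "[Speaker:N] transcript" line per utterance.

    Hypothetical helper; assumes the request used utterances=true so that
    results.utterances is present.
    """
    return [
        f"[Speaker:{u['speaker']}] {u['transcript']}"
        for u in response["results"]["utterances"]
    ]


# Trimmed, illustrative response; a real one contains many more fields.
sample = json.loads("""
{"results": {"utterances": [
  {"speaker": 0, "transcript": "My name is Beth, and I will be assisting you today. How are you doing?"},
  {"speaker": 1, "transcript": "Not too bad. How are you today?"}
]}}
""")

for line in format_utterances(sample):
    print(line)
```

To process a live response instead, you could pipe the cURL output into a script that calls json.load(sys.stdin) and passes the result to format_utterances.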
Use Cases
Use cases for Diarization include:
- Customers who have audio with multiple speakers and want to separate the transcript by speaker.
By default, Deepgram applies its general AI model, a general-purpose model suited to everyday situations. To learn more about the customization options available in Deepgram's API, check out the Deepgram API Reference.