Speaker Diarization

Try this feature out in our API Playground.

diarize (boolean). Default: false

Supported for pre-recorded and streaming audio, in all available languages.

Enable Feature

To enable Diarization, add the following parameter to the query string when you call Deepgram’s /listen endpoint:

diarize=true

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

cURL

$ curl \
>   --request POST \
>   --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
>   --header 'Content-Type: audio/wav' \
>   --data-binary @youraudio.wav \
>   --url 'https://api.deepgram.com/v1/listen?diarize=true'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
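
For readers who prefer a script to a terminal command, the same request can be assembled with Python's standard library. This is a minimal sketch that assumes only the endpoint, headers, and query parameter shown in the cURL command; the helper name build_listen_request is hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio_bytes, api_key, content_type="audio/wav", **params):
    """Build (but do not send) a POST request to /listen with diarize=true.

    Hypothetical helper: extra keyword arguments are appended to the query string.
    """
    query = {"diarize": "true", **params}
    url = f"{DEEPGRAM_LISTEN_URL}?{urlencode(query)}"
    return Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# Empty bytes stand in for the contents of youraudio.wav.
request = build_listen_request(b"", "YOUR_DEEPGRAM_API_KEY")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body of the response.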

Analyze Response

For this example, we use an MP3 audio file that contains the beginning of a customer call with Premier Phone Services. If you would like to follow along, you can download it.

When the file has finished processing, you’ll receive a JSON response. Let’s look more closely at the words array inside each entry of the alternatives array in this response.

Pre-Recorded

When you use diarization with pre-recorded audio, each word is returned with both a speaker value and a speaker_confidence value:

JSON

...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0,
        "speaker_confidence":0.5853265
      },
    ...
    ]
  }
]
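
Per-word speaker values are often easier to work with once consecutive words from the same speaker are merged into turns. The following is a rough sketch of one way to do that, not a Deepgram-provided utility: the function name speaker_turns and the averaged mean_speaker_confidence field are hypothetical, and only the "hello" entry comes from the response above (the other two words are invented to show a speaker change).

```python
# Sample `words` entries shaped like the pre-recorded response above.
# Only "hello" comes from the documentation; the other two words and
# their values are hypothetical, added to illustrate a speaker change.
words = [
    {"word": "hello", "start": 15.259043, "end": 15.338787,
     "confidence": 0.9721591, "speaker": 0, "speaker_confidence": 0.5853265},
    {"word": "there", "start": 15.40, "end": 15.70,
     "confidence": 0.98, "speaker": 0, "speaker_confidence": 0.60},
    {"word": "hi", "start": 16.10, "end": 16.30,
     "confidence": 0.97, "speaker": 1, "speaker_confidence": 0.70},
]


def speaker_turns(words):
    """Merge consecutive words from the same speaker into turns."""
    grouped = []
    for word in words:
        if grouped and grouped[-1][0]["speaker"] == word["speaker"]:
            grouped[-1].append(word)  # same speaker: extend the current turn
        else:
            grouped.append([word])    # speaker changed: start a new turn
    return [
        {
            "speaker": group[0]["speaker"],
            "text": " ".join(w["word"] for w in group),
            "mean_speaker_confidence":
                sum(w["speaker_confidence"] for w in group) / len(group),
        }
        for group in grouped
    ]


turns = speaker_turns(words)
for turn in turns:
    print(f"[Speaker:{turn['speaker']}] {turn['text']}")
```

Averaging speaker_confidence per turn is only one possible summary; a threshold on individual words may suit some applications better.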

Live Streaming

When you use diarization with live streaming audio, each word is returned with a speaker value only:

JSON

...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0
      },
    ...
    ]
  }
]
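
Because speaker_confidence is present for pre-recorded audio but absent for live streaming, code that handles both kinds of response should treat the field as optional. A minimal sketch, assuming the word shapes shown above; the function name speaker_label and its label format are hypothetical.

```python
def speaker_label(word):
    """Return a display label for a word from either response shape.

    speaker_confidence is read with .get() because streaming words omit it.
    """
    confidence = word.get("speaker_confidence")
    if confidence is None:
        return f"Speaker {word['speaker']}"
    return f"Speaker {word['speaker']} ({confidence:.2f})"


streaming_word = {"word": "hello", "start": 15.259043, "end": 15.338787,
                  "confidence": 0.9721591, "speaker": 0}
prerecorded_word = {**streaming_word, "speaker_confidence": 0.5853265}

print(speaker_label(streaming_word))
print(speaker_label(prerecorded_word))
```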

Format Response

To improve readability, you can use a JSON processor to parse the JSON response. In this example, we use jq and further improve readability by turning on Deepgram’s punctuation and utterances features:

cURL

$ curl \
>   --request POST \
>   --url 'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true&utterances=true' \
>   --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
>   --header 'content-type: audio/mp3' \
>   --data-binary @Premier_broken-phone_numbers.mp3 | jq -r ".results.utterances[] | \"[Speaker:\(.speaker)] \(.transcript)\""

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.

When the file has finished processing, the command prints the following output:

[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...
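
The jq filter above reads the speaker and transcript fields of each entry in results.utterances. If the response has already been parsed in Python, the same formatting might look like the sketch below. The function name format_utterances is hypothetical, and the sample response keeps only the two fields the filter uses (real utterance objects carry more).

```python
def format_utterances(response):
    """Render each utterance as '[Speaker:N] transcript', like the jq filter."""
    return [
        f"[Speaker:{utterance['speaker']}] {utterance['transcript']}"
        for utterance in response["results"]["utterances"]
    ]


# Trimmed-down sample built from two lines of the output shown above.
response = {
    "results": {
        "utterances": [
            {"speaker": 1, "transcript": "Not too bad. How are you today?"},
            {"speaker": 0, "transcript": "I'm doing well. Thank you. May I please have your name?"},
        ]
    }
}

lines = format_utterances(response)
print("\n".join(lines))
```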

To learn more about when to use Deepgram’s Diarization or Multichannel feature, see When to Use the Multichannel and Diarization Features.
