Diarization

Diarization recognizes speaker changes and assigns a speaker label to each word in the transcript.

diarize boolean. Default: false

To learn more about diarization and multichannel audio, including when to use each feature, see Understanding when to Use the Multichannel and Diarization Features.

Enable Feature

To enable Diarization, add the following parameter to the query string when you call Deepgram’s /listen endpoint:

diarize=true

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?diarize=true'

🚧

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
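
For readers who prefer a script to cURL, the same request can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: build_request and send are hypothetical helper names, and the request simply mirrors the cURL command above (same endpoint, same headers, same diarize=true query parameter).

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio_bytes, api_key, content_type="audio/wav", **params):
    """Build (but do not send) a POST request to the /listen endpoint.

    Hypothetical helper: keyword arguments become query-string parameters.
    """
    # Lowercase booleans so that diarize=True is sent as diarize=true.
    query = urllib.parse.urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

To try it, read youraudio.wav in binary mode, pass the bytes and your API key to build_request along with diarize=True, and hand the result to send.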

Analyze Response

📘

For this example, we use an MP3 audio file that contains the beginning of a customer call with Premier Phone Services. If you would like to follow along, you can download it.

When the file is finished processing, you’ll receive a JSON response. Let's look more closely at the words array inside each entry of the alternatives array in this response.

Pre-Recorded

When you use diarization with pre-recorded audio, each word includes both a speaker value and a speaker_confidence value:

...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0,
        "speaker_confidence":0.5853265
      },
    ...
    ]
  }
]
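
One way to use speaker_confidence is to summarize it per speaker. The sketch below averages the value across each speaker's words. It assumes you have already extracted the words list from the response; mean_speaker_confidence is a hypothetical helper name, and treating the average as a rough quality signal is an assumption, not documented behavior.

```python
from collections import defaultdict


def mean_speaker_confidence(words):
    """Average speaker_confidence per speaker label (pre-recorded responses)."""
    totals = defaultdict(lambda: [0.0, 0])  # speaker -> [sum, count]
    for w in words:
        entry = totals[w["speaker"]]
        entry[0] += w["speaker_confidence"]
        entry[1] += 1
    return {speaker: total / count for speaker, (total, count) in totals.items()}


# The single word shown in the sample response above.
words = [
    {"word": "hello", "start": 15.259043, "end": 15.338787,
     "confidence": 0.9721591, "speaker": 0, "speaker_confidence": 0.5853265},
]
print(mean_speaker_confidence(words))
```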

Live Streaming

When you use diarization with live streaming audio, each word includes only the speaker value:

...
"alternatives":[
  {
    ...
    "words": [
      {
        "word":"hello",
        "start":15.259043,
        "end":15.338787,
        "confidence":0.9721591,
        "speaker":0
      },
    ...
    ]
  }
]
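
Because every word carries a speaker value in both modes, consecutive words can be merged into speaker turns without relying on speaker_confidence. The following is a minimal sketch: speaker_turns is a hypothetical helper name, and every sample word other than "hello" is invented for illustration.

```python
def speaker_turns(words):
    """Merge consecutive words that share a speaker into a single turn."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": w["speaker"],
                "start": w["start"],
                "end": w["end"],
                "text": w["word"],
            })
    return turns


# Illustrative data only; timings and the words after "hello" are made up.
words = [
    {"word": "hello", "start": 0.0, "end": 0.5, "speaker": 0},
    {"word": "there", "start": 0.5, "end": 1.0, "speaker": 0},
    {"word": "hi", "start": 1.5, "end": 2.0, "speaker": 1},
]
for turn in speaker_turns(words):
    print(f"[Speaker:{turn['speaker']}] {turn['text']}")
```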

Use the API reference or the API Playground to view the detailed response.

Format Response

To improve readability, you can use a JSON processor to parse the response. In this example, we use jq and further improve readability by turning on Deepgram’s punctuation and utterances features:

curl \
  --request POST \
  --url 'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true&utterances=true' \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/mp3' \
  --data-binary @Premier_broken-phone_numbers.mp3 | jq -r ".results.utterances[] | \"[Speaker:\(.speaker)] \(.transcript)\""

When the file is finished processing, jq prints one line per utterance:

[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...
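
If jq is not available, the same formatting can be approximated in Python once the response has been parsed into a dictionary. This sketch reads the same fields as the jq filter above (results.utterances, speaker, transcript); format_utterances is a hypothetical helper name, and the sample dictionary contains only the fields needed here.

```python
def format_utterances(response):
    """Return one '[Speaker:N] text' line per utterance in the response."""
    return [
        f"[Speaker:{u['speaker']}] {u['transcript']}"
        for u in response["results"]["utterances"]
    ]


# Trimmed-down response holding two of the utterances shown above.
response = {
    "results": {
        "utterances": [
            {"speaker": 1, "transcript": "Not too bad. How are you today?"},
            {"speaker": 0, "transcript": "I'm doing well. Thank you. May I please have your name?"},
        ]
    }
}
print("\n".join(format_utterances(response)))
```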

Use Cases

One example use case for Diarization:

  • Customers who have audio with multiple speakers and want to separate audio by speaker.
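
As a rough illustration of this use case, the word timings in a diarized response can be turned into per-speaker time ranges, which an audio tool of your choice could then use to cut the recording. This is a sketch under assumptions: segments_by_speaker is a hypothetical helper name, the input is the words list from the response, and the sample timings are invented.

```python
from collections import defaultdict


def segments_by_speaker(words):
    """Return {speaker: [(start, end), ...]} built from diarized words."""
    segments = defaultdict(list)
    previous = None
    for w in words:
        speaker = w["speaker"]
        if speaker == previous:
            # Same speaker keeps talking: stretch the current range.
            start, _ = segments[speaker][-1]
            segments[speaker][-1] = (start, w["end"])
        else:
            # Speaker changed: open a new range for this speaker.
            segments[speaker].append((w["start"], w["end"]))
        previous = speaker
    return dict(segments)


# Invented timings, in seconds, for illustration only.
words = [
    {"word": "hello", "start": 0.0, "end": 0.5, "speaker": 0},
    {"word": "there", "start": 0.5, "end": 1.0, "speaker": 0},
    {"word": "hi", "start": 1.0, "end": 1.5, "speaker": 1},
    {"word": "welcome", "start": 1.5, "end": 2.0, "speaker": 0},
]
print(segments_by_speaker(words))
```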

ℹ️

By default, Deepgram applies its general AI model, a general-purpose model suited to everyday situations. To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.