Diarization

Deepgram’s Diarize feature recognizes speaker changes and assigns a speaker to each word in the transcript.

To learn more about diarization and multichannel audio, and to learn when to use Deepgram's Diarization or Multichannel feature, see Understanding when to Use the Multichannel and Diarization Features.

Use Cases

Example use cases for Diarization include:

Customers who use audio with multiple speakers and want transcripts to appear in a more readable format.

Enable Feature

To enable Diarization, add a diarize parameter set to true to the query string when you call Deepgram’s API:

diarize=true

⚠️

We've recently released an improved version of Diarization (PRE-RECORDED only). Its advanced speaker separation accurately identifies speakers in complex audio streams, reducing errors where two speakers are identified as one. Additionally, the new diarizer can identify and count speakers more accurately, reducing instances where one speaker is split between two labels. The result is more readable transcripts.

With the release of improved diarization, we are deprecating the diarize_version parameter and will retire the old diarizer on April 3, 2023.

Currently, you can call the old diarizer using the following URL: https://api.deepgram.com/v1/listen?tier=enhanced&diarize=true&diarize_version=2021-07-14.0

To access the latest diarizer, all you need to do is add diarize=true to your URL: https://api.deepgram.com/v1/listen?diarize=true

We encourage you to switch to the improved Diarizer as soon as possible to ensure that you are taking advantage of the latest advancements in our technology.
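If request URLs are assembled in code, the switch described above can be sketched in Python as follows. This is a minimal illustration, not an official migration tool: the helper name drop_diarize_version is hypothetical, and it assumes every other query parameter should be left untouched.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_diarize_version(url):
    """Return url without diarize_version, making sure diarize=true is present.

    Hypothetical helper: all other query parameters are kept as they are.
    """
    parts = urlsplit(url)
    # Keep every parameter except the deprecated diarize_version.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "diarize_version"
    ]
    # Add diarize=true only if the URL did not already request diarization.
    if not any(key == "diarize" for key, _ in params):
        params.append(("diarize", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Applied to the old-diarizer URL shown above, the helper removes only the diarize_version parameter and keeps the rest of the query string.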

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

ℹ️

Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?diarize=true'
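For readers who prefer Python to cURL, the same request can be sketched with the standard library alone. The function name build_diarize_request and its parameters are assumptions made for illustration; the endpoint, headers, and diarize=true parameter mirror the cURL command above. The sketch only builds the request so that it can be inspected before anything is sent.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the cURL example above.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_diarize_request(audio_bytes, api_key, content_type="audio/wav"):
    """Build (but do not send) a POST request with diarize=true in the query string.

    Hypothetical helper: audio_bytes is the raw content of the audio file.
    """
    query = urllib.parse.urlencode({"diarize": "true"})
    return urllib.request.Request(
        url=f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the body with json.load; the audio bytes can be read from a file opened in binary mode.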

Analyze Response

⚠️

For this example, we use an MP3 audio file that contains the beginning of a customer call with Premier Phone Services. If you would like to follow along, you can download it.

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives":[]
      }
    ]
  }
}

Let's look more closely at the alternatives array:

...
"alternatives":[
  {
    "transcript": "hello and thank you for calling premier phone service please be aware that this call may be recorded for quality and training purposes my name is beth and i will be assisting you today how are you doing not too bad how are you today i'm doing well thank you may i please have your name my name is blake...",
    "confidence": 0.9917229,
    "words": [
      {"word":"hello","start":15.259043,"end":15.338787,"confidence":0.9721591,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"and","start":15.418532,"end":15.617893,"confidence":0.99877876,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"thank","start":15.617893,"end":15.777383,"confidence":0.99722916,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"you","start":15.777383,"end":15.9368725,"confidence":0.9976786,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"for","start":15.9368725,"end":16.096361,"confidence":0.9996244,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"calling","start":16.096361,"end":16.495085,"confidence":0.9992551,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"premier","start":16.495085,"end":16.73432,"confidence":0.916555,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"phone","start":16.73432,"end":16.973553,"confidence":0.89162886,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"service","start":16.973553,"end":17.372276,"confidence":0.985968,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"please","start":17.571638,"end":17.731129,"confidence":0.99930406,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"be","start":17.731129,"end":17.930489,"confidence":0.9928549,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"aware","start":17.930489,"end":18.08998,"confidence":0.9999629,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"that","start":18.08998,"end":18.209597,"confidence":0.99647444,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"this","start":18.209597,"end":18.408958,"confidence":0.999246,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"call","start":18.408958,"end":18.608318,"confidence":0.999718,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"may","start":18.608318,"end":18.727936,"confidence":0.91868997,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"be","start":18.727936,"end":19.007042,"confidence":0.99454945,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"recorded","start":19.007042,"end":19.286148,"confidence":0.99981517,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"for","start":19.286148,"end":19.445639,"confidence":0.9999294,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"quality","start":19.445639,"end":19.684872,"confidence":0.9993426,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"and","start":19.684872,"end":19.844362,"confidence":0.88343453,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"training","start":19.844362,"end":20.083595,"confidence":0.9999572,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"purposes","start":20.083595,"end":20.562063,"confidence":0.9995696,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"my","start":21.040531,"end":21.160149,"confidence":0.9997398,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"name","start":21.160149,"end":21.319637,"confidence":0.9984106,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"is","start":21.319637,"end":21.399384,"confidence":0.99928075,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"beth","start":21.558872,"end":21.67849,"confidence":0.99869114,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"and","start":21.798107,"end":21.917725,"confidence":0.99258536,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"i","start":21.957596,"end":22.077213,"confidence":0.99329174,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"will","start":22.077213,"end":22.236702,"confidence":0.9709169,"speaker":0,"speaker_confidence":0.5853265},
      {"word":"be","start":22.236702,"end":22.436064,"confidence":0.99613035,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"assisting","start":22.436064,"end":22.755043,"confidence":0.99984825,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"you","start":22.755043,"end":22.994278,"confidence":0.9999144,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"today","start":22.994278,"end":23.113894,"confidence":0.99716824,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"how","start":23.472744,"end":23.55249,"confidence":0.9945734,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"are","start":23.55249,"end":23.672108,"confidence":0.9951912,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"you","start":23.672108,"end":23.791723,"confidence":0.99860495,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"doing","start":23.791723,"end":24.030956,"confidence":0.9998969,"speaker":0,"speaker_confidence":0.45310462},
      {"word":"not","start":25.283688,"end":25.563297,"confidence":0.6391793,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"too","start":25.563297,"end":25.842907,"confidence":0.66280407,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"bad","start":25.842907,"end":26.322235,"confidence":0.8838786,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"how","start":26.482012,"end":26.601845,"confidence":0.998323,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"are","start":26.601845,"end":26.721678,"confidence":0.9984762,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"you","start":26.721678,"end":27.04123,"confidence":0.99331033,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"today","start":27.04123,"end":27.121119,"confidence":0.998047,"speaker":1,"speaker_confidence":0.57565314},
      {"word":"i'm","start":28.079777,"end":28.2795,"confidence":0.9888536,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"doing","start":28.2795,"end":28.519163,"confidence":0.99951184,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"well","start":28.519163,"end":28.599052,"confidence":0.99951184,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"thank","start":28.758827,"end":28.95855,"confidence":0.99407357,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"you","start":28.95855,"end":29.45855,"confidence":0.95705205,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"may","start":29.677544,"end":29.717487,"confidence":0.99993396,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"i","start":29.797375,"end":29.997097,"confidence":0.9842502,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"please","start":29.997097,"end":30.156872,"confidence":0.99816424,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"have","start":30.156872,"end":30.31665,"confidence":0.9994549,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"your","start":30.31665,"end":30.476425,"confidence":0.99891937,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"name","start":30.476425,"end":30.596258,"confidence":0.9955912,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"my","start":31.72929,"end":31.888773,"confidence":0.9984237,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"name","start":31.888773,"end":32.048256,"confidence":0.998847,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"is","start":32.048256,"end":32.28748,"confidence":0.996455,"speaker":0,"speaker_confidence":0.35385597},
      {"word":"blake","start":32.446964,"end":32.686188,"confidence":0.9848967,"speaker":0,"speaker_confidence":0.35385597},
    ...
    ]
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Array of objects, one per word in the transcript. Each object contains the word, its start time and end time (in seconds) from the beginning of the audio stream, a word confidence value, a speaker identifier, and a speaker confidence value.
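Because every entry in words carries a speaker identifier, consecutive words from the same speaker can be merged into turns. The following Python sketch shows one way to do that; the function name speaker_turns is hypothetical, and it assumes each entry has the word and speaker keys shown in the response above.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words with the same speaker into (speaker, text) turns.

    Hypothetical helper: words is the list found under alternatives[0]["words"].
    """
    return [
        (speaker, " ".join(entry["word"] for entry in group))
        for speaker, group in groupby(words, key=lambda entry: entry["speaker"])
    ]
```

A new turn starts whenever the speaker value changes from one word to the next, so a speaker who talks twice with someone else in between produces two separate turns.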

ℹ️

By default, Deepgram applies its general AI model, a good general-purpose model for everyday situations. To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.

Format Response

To improve readability, you can use a JSON processor to parse the JSON. In this example, we use jq and further improve readability by turning on Deepgram’s punctuation and utterances features:

ℹ️

Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --url 'https://api.deepgram.com/v1/listen?diarize=true&punctuate=true&utterances=true' \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'content-type: audio/mp3' \
  --data-binary @Premier_broken-phone_numbers.mp3 | jq -r '.results.utterances[] | "[Speaker:\(.speaker)] \(.transcript)"'

When the file is finished processing, the command prints the following output:

[Speaker:0] Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes.
[Speaker:0] My name is Beth, and I will be assisting you today. How are you doing?
[Speaker:1] Not too bad. How are you today?
[Speaker:0] I'm doing well. Thank you. May I please have your name?
[Speaker:1] My name is Blake...
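The jq filter above can also be reproduced in Python once the JSON response has been parsed into a dictionary. This is a minimal sketch: the function name format_utterances is hypothetical, and it relies only on the results.utterances, speaker, and transcript fields that the jq filter reads.

```python
def format_utterances(response):
    """Return one "[Speaker:N] text" line per utterance in a parsed response.

    Hypothetical helper: response is the decoded JSON from a request made
    with utterances=true.
    """
    return [
        f"[Speaker:{utterance['speaker']}] {utterance['transcript']}"
        for utterance in response["results"]["utterances"]
    ]
```

Joining the returned list with newlines gives the same layout as the jq output shown above.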