1. Documentation
  2. Features
  3. Paragraphs

Paragraphs

PRE-RECORDED

Deepgram's Paragraphs feature splits audio into paragraphs to improve transcript readability. When Paragraphs is enabled, at least one of the following features must also be enabled: Punctuation, Diarization, Multichannel. When the Diarization feature is enabled and multiple speakers are present, paragraphs breaks are influenced by speaker changes. When the Multichannel feature is enabled, paragraphs breaks are influenced by channel changes.

Use Cases

Examples of use cases for Paragraphs include:

  • Customers who require easy transcript formatting.
  • Customers who would like to be able to format speaker summaries more effortlessly.

Enable Feature

To enable Paragraphs, when you call Deepgram’s API, add a paragraphs parameter in the querystring and set it to true:

When Paragraphs is enabled, at least one of the following features must also be enabled: Punctuation, Diarization, Multichannel. In this example, we have chosen Punctuation.

paragraphs=true&punctuate=true

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?paragraphs=true&punctuate=true'

When Paragraphs is enabled, either the Punctuation, Diarization, or Multichannel feature must also be enabled.

Analyze Response

For this example, we use a WAV audio file that contains an interview about speech analytics. If you would like to follow along, you can download it.

In our terminal, we run the following cURL command:

curl
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'content-type: application/json' \
  --data '{"url":"https://developers.deepgram.com/data/audio/interview_speech-analytics.wav"}' \
  --url 'https://api.deepgram.com/v1/listen?paragraphs=true&punctuate=true'

If you're following along, be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata":{
    "transaction_key":"string",
    "request_id":"string",
    "sha256":"string",
    "created":"string",
    "duration":0,
    "channels":0,
    "models":[],
    "model_info": {}
  },
  "results": {
    "channels": [
      {
        "alternatives":[]
      }
    ]
  }
}

Let's look more closely at the alternatives object:

...
"alternatives": [
  {
    "transcript":"another big problem in the speech analytics space. When customers first bring the software on. Is that they they are blown away by the fact that an engine can monitor hundreds of Kpis. Right? Everything from my new compliance issues to, you know, human human interaction, empathy measurements to upsell aptitude to closing aptitude, They're hundreds literally of Kpis that one to look at. And the speech analytics companies have typically gone to the customer and really bang that trump. We'll get all of these things that we're gonna help you keep an eye on. The reality, however, is that a company even a contact center manager They can't keep track in their brain even if they have a report in front of them. Of that many Kpis. Mh. And frankly, it's overwhelming. So what successful companies do is they bite off no more than they chew at any given time. The reality is is you can only train a call center agent on a maximum of three skills at any given day. Right? And by focusing on focusing on problem areas, for a week for a month depending on how bad things are. And then once you've mastered that skill to take a baseline of of your performance and move on to the next worst skill. Right? Is the way that companies succeed using this product?",
    "confidence":0.9926758,
    "words":[
      {
        "word": "another",
        "start": 0.33959904,
        "end": 0.839599,
        "confidence": 0.9995117,
        "punctuated_word": "another"
      },
      {
        "word": "big",
        "start": 0.89893866,
        "end": 1.3989387,
        "confidence": 0.99902344,
        "punctuated_word": "big"
      },
      {
        "word": "problem",
        "start": 1.6580424,
        "end": 2.1580424,
        "confidence": 0.99853516,
        "punctuated_word": "problem"
      },
      {
        "word": "in",
        "start": 2.257335,
        "end": 2.4171462,
        "confidence": 0.99902344,
        "punctuated_word": "in"
      },
      {
        "word": "the",
        "start": 2.4171462,
        "end": 2.6568632,
        "confidence": 0.8642578,
        "punctuated_word": "the"
      },
      {
        "word": "speech",
        "start": 2.6568632,
        "end": 3.0963442,
        "confidence": 1,
        "punctuated_word": "speech"
      },
      {
        "word": "analytics",
        "start": 3.0963442,
        "end": 3.4559197,
        "confidence": 0.92871094,
        "punctuated_word": "analytics"
      },
      {
        "word": "space",
        "start": 3.4559197,
        "end": 3.7755425,
        "confidence": 0.99853516,
        "punctuated_word": "space."
      },
      ...
    ],
    "paragraphs":{
      "transcript":"\nanother big problem in the speech analytics space. When customers first bring the software on. Is that they they are blown away by the fact that an engine can monitor hundreds of Kpis. Right? Everything from my new compliance issues to, you know, human human interaction, empathy measurements to upsell aptitude to closing aptitude, They're hundreds literally of Kpis that one to look at.\n\nAnd the speech analytics companies have typically gone to the customer and really bang that trump. We'll get all of these things that we're gonna help you keep an eye on. The reality, however, is that a company even a contact center manager They can't keep track in their brain even if they have a report in front of them. Of that many Kpis. Mh.\n\nAnd frankly, it's overwhelming. So what successful companies do is they bite off no more than they chew at any given time. The reality is is you can only train a call center agent on a maximum of three skills at any given day. Right? And by focusing on focusing on problem areas, for a week for a month depending on how bad things are.\n\nAnd then once you've mastered that skill to take a baseline of of your performance and move on to the next worst skill. Right? Is the way that companies succeed using this product?",
      "paragraphs":[
        {
          "sentences": [
            {
              "text": "another big problem in the speech analytics space.",
              "start": 0.33959904,
              "end": 3.7755425
            },
            {
              "text": "When customers first bring the software on.",
              "start": 4.614552,
              "end": 6.7720046
            },
            {
              "text": "Is that they they are blown away by the fact that an engine can monitor hundreds of Kpis.",
              "start": 7.091627,
              "end": 14.648386
            },
            {
              "text": "Right?",
              "start": 15.446236,
              "end": 15.526021
            },
            {
              "text": "Everything from my new compliance issues to, you know, human human interaction, empathy measurements to upsell aptitude to closing aptitude, They're hundreds literally of Kpis that one to look at.",
              "start": 16.26,
              "end": 32.06585
            }
          ],
          "num_words": 64,
          "start": 0.33959904,
          "end": 32.06585
        },
        {
          "sentences": [
            {
              "text": "And the speech analytics companies have typically gone to the customer and really bang that trump.",
              "start": 33.15874,
              "end": 39.021103
            },
            {
              "text": "We'll get all of these things that we're gonna help you keep an eye on.",
              "start": 39.260384,
              "end": 42.09186
            },
            {
              "text": "The reality, however, is that a company even a contact center manager They can't keep track in their brain even if they have a report in front of them.",
              "start": 42.864708,
              "end": 52.005383
            },
            {
              "text": "Of that many Kpis.",
              "start": 52.619083,
              "end": 53.797592
            },
            {
              "text": "Mh.",
              "start": 54.215576,
              "end": 54.415134
            }
          ],
          "num_words": 65,
          "start": 33.15874,
          "end": 54.415134
        },
        {
          "sentences": [
            {
              "text": "And frankly, it's overwhelming.",
              "start": 54.534874,
              "end": 56.272152
            },
            {
              "text": "So what successful companies do is they bite off no more than they chew at any given time.",
              "start": 56.889698,
              "end": 63.072376
            },
            {
              "text": "The reality is is you can only train a call center agent on a maximum of three skills at any given day.",
              "start": 63.370712,
              "end": 70.67546
            },
            {
              "text": "Right?",
              "start": 71.20884,
              "end": 71.70884
            },
            {
              "text": "And by focusing on focusing on problem areas, for a week for a month depending on how bad things are.",
              "start": 73.201065,
              "end": 83.075005
            }
          ],
          "num_words": 65,
          "start": 54.534874,
          "end": 83.075005
        },
        {
          "sentences": [
            {
              "text": "And then once you've mastered that skill to take a baseline of of your performance and move on to the next worst skill.",
              "start": 83.87501,
              "end": 92.01001
            },
            {
              "text": "Right?",
              "start": 92.450005,
              "end": 92.89001
            },
            {
              "text": "Is the way that companies succeed using this product?",
              "start": 93.130005,
              "end": 95.810005
            }
          ],
          "num_words": 33,
          "start": 83.87501,
          "end": 83.075005
        }
      ]
    }
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
  • paragraphs: Object containing the information about paragraph divisions for the audio being processed.

And we see that each paragraphs object contains:

  • transcript: Transcript for the audio being processed, including line breaks where the transcript is divided into paragraphs.
  • paragraphs: Object containing sentences in the paragraph. Each nested paragraphs object contains:
    • sentences: Object containing each sentence in the paragraph, along with a count of the number of words in the paragraph, and the start and end times for each paragraph.
    • num_words: Count of the number of words in the paragraph.
    • start: Number of seconds into the audio stream that the paragraph starts.
    • end: Number of seconds into the audio stream that the paragraph ends.

Finally, we see that each sentence object contains:

  • text: Text contained in the sentence.
  • start: Number of seconds into the audio stream that the sentence starts.
  • end: Number of seconds into the audio stream that the sentence ends.

By default, Deepgram applies its Base tier, general AI model, which is a good, general-purpose model for everyday situations. To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.

FEEDBACK