Summarization V1 (deprecated)

Deprecated Pre-recorded Streaming

📘
V2 of our Summarization feature produces summaries with better quality, content, and readability. For the best results going forward, we recommend using V2; see the API documentation for Summarization V2.

Deepgram’s Summarization feature summarizes sections of content in submitted audio and returns these summaries in the JSON response.

When Summarization is enabled, the Punctuation feature will be enabled by default.

Use Cases

Some examples of uses for Summarization include:

Customers who want to reduce manual effort by automatically generating call notes and meeting summaries.
Customers who need to navigate through a large number of calls and analyze important conversations through generated summaries.
Podcast listeners who want to identify interesting conversations through auto-generated, meaningful podcast previews.

Enable Feature

To enable Summarization, add a summarize parameter set to true to the query string when you call Deepgram’s API:

summarize=true&punctuate=true
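
As a minimal sketch, the same query string can be assembled programmatically instead of being typed by hand. The helper name build_listen_url is hypothetical and not part of any Deepgram SDK; the endpoint is the one used in the curl command in this guide.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example in this guide.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(**features):
    """Return the listen URL with the given features in the query string.

    Hypothetical helper: booleans are rendered as lowercase "true"/"false"
    so the result matches the query string shown above.
    """
    params = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in features.items()
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(summarize=True, punctuate=True)
print(url)
```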

To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

ℹ️
Be sure to replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key. You can create an API Key in the Deepgram Console.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?summarize=true&punctuate=true'
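
For readers who prefer a script to curl, the request above could be sketched with Python's standard library as follows. The function names build_request and transcribe are illustrative assumptions, not part of a Deepgram SDK; the URL and headers mirror the curl command.

```python
import json
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen?summarize=true&punctuate=true"


def build_request(api_key: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command."""
    return urllib.request.Request(
        LISTEN_URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def transcribe(api_key: str, path: str) -> dict:
    """Send the audio file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    with urllib.request.urlopen(build_request(api_key, audio)) as response:
        return json.load(response)


# Example call (requires a real API key and audio file):
# result = transcribe("YOUR_DEEPGRAM_API_KEY", "youraudio.wav")
```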

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": []
      }
    ]
  }
}
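
One way to walk this structure, sketched under the assumption that the response has already been parsed into a Python dict (for example with json.loads), is shown below. The helper name iter_alternatives is hypothetical.

```python
def iter_alternatives(response):
    """Yield (channel_index, alternative) pairs from a parsed response.

    Missing keys are treated as empty so that a partial response does not
    raise an exception.
    """
    channels = response.get("results", {}).get("channels", [])
    for channel_index, channel in enumerate(channels):
        for alternative in channel.get("alternatives", []):
            yield channel_index, alternative


# Tiny hand-written sample that follows the basic structure above.
sample = {
    "metadata": {"request_id": "string", "duration": 0, "channels": 1},
    "results": {"channels": [{"alternatives": [{"transcript": "hello"}]}]},
}

for index, alternative in iter_alternatives(sample):
    print(index, alternative["transcript"])
```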

Let's look more closely at the alternatives array:

"alternatives":[
  {
    "transcript": "This episode is brought to you by levels. Very excited about...",
    "confidence": 0.99107355,
    "words": [],
    "summaries": [
      {
        "summary": "This episode is brought to you by levels. With levels you can see how different foods affect your health with real time feedback. The levels app interprets your glucose data and provides a simple score after you eat a meal.",
        "start_word": 0,
        "end_word": 623
      },
      {
        "summary": "Dr. Ferris is joined by Dr. K, a professor of laboratory medicine and pathology at the University of Washington School of Medicine. Matt K is the founding director of the healthy aging and longevity research Institute.",
        "start_word": 623,
        "end_word": 1227
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

transcript: Transcript for the audio being processed.
confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
words: Array containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
summaries: Array containing the summaries generated for the audio being processed.

Each object in the summaries array contains:

summary: Summary of the audio section being summarized.
start_word: Index of the first word in the section of audio being summarized.
end_word: Index of the last word in the section of audio being summarized.
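
The fields above can be combined to find roughly where each summarized section falls in the audio. The following is a sketch only: the function name describe_summaries is hypothetical, and the "start" and "end" keys on each word entry are an assumption, since the sample response elides the contents of words. Because adjacent sections in the sample share a boundary index (623), indices are clamped to the last available word rather than trusted to be in range.

```python
def describe_summaries(alternative):
    """Return each summary with approximate start and end times in seconds."""
    words = alternative.get("words", [])
    sections = []
    for item in alternative.get("summaries", []):
        section = {
            "summary": item["summary"],
            "start_word": item["start_word"],
            "end_word": item["end_word"],
        }
        if words:
            last = len(words) - 1
            # Clamp both indices so an index just past the final word still
            # resolves to the final word. Assumes "start"/"end" word keys.
            section["start_time"] = words[min(item["start_word"], last)]["start"]
            section["end_time"] = words[min(item["end_word"], last)]["end"]
        sections.append(section)
    return sections


# Hand-written sample data for illustration only.
sample_alternative = {
    "words": [
        {"word": "a", "start": 0.0, "end": 0.5},
        {"word": "b", "start": 0.5, "end": 1.0},
        {"word": "c", "start": 1.0, "end": 1.5},
        {"word": "d", "start": 1.5, "end": 2.0},
    ],
    "summaries": [
        {"summary": "First part.", "start_word": 0, "end_word": 2},
        {"summary": "Second part.", "start_word": 2, "end_word": 4},
    ],
}

sections = describe_summaries(sample_alternative)
for section in sections:
    print(section["start_time"], section["end_time"], section["summary"])
```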

ℹ️
By default, Deepgram applies its Base-tier general model, a good general-purpose choice for everyday situations. To learn more about the customization possible with Deepgram's API, check out the Deepgram API Reference.