Getting Started
An introduction to using Deepgram’s audio intelligence features to analyze audio using Deepgram SDKs.
In this guide, you’ll learn how to analyze audio using Deepgram’s intelligence features: Summarization, Topic Detection, Intent Recognition, and Sentiment Analysis. The code examples use Deepgram’s SDKs.
Before you start, you’ll need to follow the steps in the Make Your First API Request guide to obtain a Deepgram API key and to configure your environment if you choose to use a Deepgram SDK.
What is Audio Intelligence?
Deepgram’s Audio Intelligence API lets you send an audio source to Deepgram, and Deepgram will perform one or more of four types of analysis on the content of the audio after it has been transcribed. Read about each feature in its individual feature guide:
API Playground
First, quickly explore Deepgram Audio Intelligence in our API Playground.
Make the Request
A request made with any of the audio intelligence features follows the same form, so this guide walks you through making a single request; simply enable whichever feature(s) you want to use (Summarization, Topic Detection, Intent Recognition, or Sentiment Analysis).
If you have already made a request to transcribe prerecorded audio with Deepgram’s API, then you already know how to make an audio intelligence request: they are made in exactly the same way!
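Concretely, each feature is enabled by a query parameter on the same `/v1/listen` endpoint used for prerecorded transcription. The sketch below builds such a request URL with only Python’s standard library; when you use a Deepgram SDK, it sets these parameters for you.

```python
from urllib.parse import urlencode

# Each Audio Intelligence feature is switched on by a query parameter on the
# same /v1/listen endpoint used for prerecorded transcription.
features = {
    "summarize": "v2",    # Summarization
    "topics": "true",     # Topic Detection
    "intents": "true",    # Intent Recognition
    "sentiment": "true",  # Sentiment Analysis
}

request_url = "https://api.deepgram.com/v1/listen?" + urlencode(features)
print(request_url)
```

You can enable any subset of these parameters; the transcription itself always happens first.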
Choose Your Audio Source
An audio source can be sent to Deepgram as a local audio file or as a URL pointing to a hosted audio file (such as https://YOUR_FILE_URL.txt). These are referred to as a local file request and a remote file request, respectively.
Local File Request
This example shows how to analyze a local audio file as your audio source.
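For example, here is a minimal sketch using the Deepgram Python SDK (v3). The filename `spacewalk.wav`, the `nova-2` model, and the `DEEPGRAM_API_KEY` environment variable are assumptions; adjust them for your project.

```python
import os

# Hypothetical local file; replace with your own audio.
AUDIO_FILE = "spacewalk.wav"

def analyze_local_file():
    # Imported here so the sketch reads standalone; install the SDK with
    # `pip install deepgram-sdk`.
    from deepgram import DeepgramClient, PrerecordedOptions

    deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

    # Read the local file into memory and send it as the request body.
    with open(AUDIO_FILE, "rb") as audio:
        payload = {"buffer": audio.read()}

    # Enable any combination of the four intelligence features.
    options = PrerecordedOptions(
        model="nova-2",
        summarize="v2",
        topics=True,
        intents=True,
        sentiment=True,
    )

    response = deepgram.listen.prerecorded.v("1").transcribe_file(payload, options)
    print(response.to_json(indent=4))
```

Call `analyze_local_file()` to send the request; the JSON it prints is the response analyzed later in this guide.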
Remote File Request
This example shows how to analyze a remote audio file (a URL that hosts your audio file).
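A hedged sketch of the same request with a hosted file, again assuming the Deepgram Python SDK (v3); the URL shown is a placeholder for your own hosted audio.

```python
import os

# Hypothetical hosted file; replace with the URL of your own audio.
AUDIO_URL = {"url": "https://dpgr.am/spacewalk.wav"}

def analyze_remote_file():
    from deepgram import DeepgramClient, PrerecordedOptions  # pip install deepgram-sdk

    deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

    options = PrerecordedOptions(
        model="nova-2",
        summarize="v2",
        topics=True,
        intents=True,
        sentiment=True,
    )

    # No upload needed: Deepgram fetches the audio from the URL itself.
    response = deepgram.listen.prerecorded.v("1").transcribe_url(AUDIO_URL, options)
    print(response.to_json(indent=4))
```

The only difference from the local file request is `transcribe_url` with a `{"url": ...}` payload instead of `transcribe_file` with a file buffer.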
Start the Application
Run your application from the terminal.
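Assuming you saved the sketch above as `main.py` (a hypothetical filename) and are using Python, this might look like:

```shell
# Hypothetical filenames and key; adjust to match your own project.
export DEEPGRAM_API_KEY="YOUR_API_KEY"
python main.py
```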
See Results
Your results will appear in your shell.
Analyze the Response
When the file is finished processing (often after only a few seconds), you’ll receive a JSON response. (Note that some sections are omitted in order to demonstrate relevant properties.)
Following are explanations of each example response. Be sure to click the tabs in the code block above to view the example response for each audio intelligence feature.
All Audio Intelligence Features
Because Deepgram transcribes the audio before conducting the analysis, you will always see properties related to the transcription response, such as the following.
In the `results` object, we see:

- `transcript`: the transcript for the audio segment being processed.
- `confidence`: a floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
- `words`: an object containing each `word` in the transcript, along with its `start` time and `end` time (in seconds) from the beginning of the audio stream, and a `confidence` value.
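As an illustration, here is a hand-written, abbreviated fragment of a parsed response (the values are made up, and in a full prerecorded response these fields sit under `channels` and `alternatives`) and how to read the properties above from it:

```python
# Abbreviated, illustrative transcription portion of a response;
# values are hand-written, not real API output.
response = {
    "results": {
        "channels": [{
            "alternatives": [{
                "transcript": "and we are off",
                "confidence": 0.99,
                "words": [
                    {"word": "and", "start": 0.08, "end": 0.32, "confidence": 0.99},
                    {"word": "we", "start": 0.32, "end": 0.48, "confidence": 0.98},
                ],
            }]
        }]
    }
}

alt = response["results"]["channels"][0]["alternatives"][0]
print(alt["transcript"])      # transcript for the audio segment
print(alt["confidence"])      # overall reliability, between 0 and 1
for w in alt["words"]:
    print(w["word"], w["start"], w["end"], w["confidence"])  # per-word timings in seconds
```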
Summarization
In the `metadata` object, we see:

- `summary_info`: information about the model used and the input/output tokens. Summarization pricing is based on the number of input and output tokens. Read more at deepgram.com/pricing.

In the `results` object, we see:

- `summary`: the `short` property in this object gives you the summary of the audio you requested to be analyzed.
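For instance, given a hand-written, abbreviated response containing only the Summarization fields (the token counts and summary text are illustrative), the `short` summary can be read like this:

```python
# Abbreviated, illustrative Summarization portion of a response.
response = {
    "metadata": {
        "summary_info": {"input_tokens": 133, "output_tokens": 57},
    },
    "results": {
        "summary": {
            "result": "success",
            "short": "Speakers discuss preparations for an upcoming spacewalk.",
        }
    },
}

# The short summary of the analyzed audio.
print(response["results"]["summary"]["short"])
```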
Topic Detection
In the `metadata` object, we see:

- `topics_info`: information about the model used and the input/output tokens. Topic Detection pricing is based on the number of input and output tokens. Read more at deepgram.com/pricing.

In the `results` object, we see:

- `topics` (object): contains the data about Topic Detection.
- `segments`: each segment object contains a span of `text` taken from the transcript; this text segment is analyzed for its topic.
- `topics` (array): a list of topic objects, each containing the `topic` and a `confidence_score`.
  - `topic`: Deepgram analyzes the segmented transcript to identify the main topic of each.
  - `confidence_score`: a floating point value between 0 and 1 indicating the overall reliability of the analysis.
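To make that shape concrete, here is a hand-written, abbreviated Topic Detection fragment (segment text, topic, and score are illustrative) and a loop that reads each segment’s topics:

```python
# Abbreviated, illustrative Topic Detection portion of a response.
response = {
    "results": {
        "topics": {
            "segments": [{
                "text": "can you tell me how to upgrade my plan",
                "topics": [{"topic": "Plan upgrade", "confidence_score": 0.91}],
            }]
        }
    }
}

# Each segment carries the span of transcript text plus the topics found in it.
for segment in response["results"]["topics"]["segments"]:
    for found in segment["topics"]:
        print(found["topic"], found["confidence_score"])
```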
Intent Recognition
In the `metadata` object, we see:

- `intents_info`: information about the model used and the input/output tokens. Intent Recognition pricing is based on the number of input and output tokens. Read more at deepgram.com/pricing.

In the `results` object, we see:

- `intents` (object): contains the data about Intent Recognition.
- `segments`: each segment object contains a span of `text` taken from the transcript; this text segment is analyzed for its intent.
- `intents` (array): a list of intent objects, each containing the `intent` and a `confidence_score`.
  - `intent`: Deepgram analyzes the segmented transcript to identify the intent of each.
  - `confidence_score`: a floating point value between 0 and 1 indicating the overall reliability of the analysis.
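The same pattern applies to intents; here is a hand-written, abbreviated fragment (segment text, intent label, and score are illustrative):

```python
# Abbreviated, illustrative Intent Recognition portion of a response.
response = {
    "results": {
        "intents": {
            "segments": [{
                "text": "I want to cancel my subscription",
                "intents": [{"intent": "Cancel subscription", "confidence_score": 0.88}],
            }]
        }
    }
}

# Each segment carries the span of transcript text plus the intents found in it.
for segment in response["results"]["intents"]["segments"]:
    for found in segment["intents"]:
        print(found["intent"], found["confidence_score"])
```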
Sentiment Analysis
In the `metadata` object, we see:

- `sentiment_info`: information about the model used and the input/output tokens. Sentiment Analysis pricing is based on the number of input and output tokens. Read more at deepgram.com/pricing.

In the `results` object, we see:

- `sentiments` (object): contains the data about Sentiment Analysis.
- `segments`: each segment object contains a span of text taken from the transcript; these segments show where the sentiment shifts throughout the text, and each one is analyzed for its sentiment.
  - `sentiment`: can be `positive`, `negative`, or `neutral`.
  - `sentiment_score`: a floating point value between -1 and 1 representing the sentiment of the associated span of text, with -1 being the most negative sentiment and 1 being the most positive sentiment.
- `average`: the average sentiment for the entire transcript.
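Finally, a hand-written, abbreviated Sentiment Analysis fragment (segment text and scores are illustrative) showing the per-segment values and the transcript-wide average:

```python
# Abbreviated, illustrative Sentiment Analysis portion of a response.
response = {
    "results": {
        "sentiments": {
            "segments": [{
                "text": "Thank you so much, that was really helpful.",
                "sentiment": "positive",
                "sentiment_score": 0.81,
            }],
            "average": {"sentiment": "positive", "sentiment_score": 0.47},
        }
    }
}

# Each segment marks a span of text where the sentiment holds steady.
for seg in response["results"]["sentiments"]["segments"]:
    print(seg["sentiment"], seg["sentiment_score"], seg["text"])

# The average covers the entire transcript.
print(response["results"]["sentiments"]["average"])
```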
Limits
Language
At this time, audio analysis features only work for English language transcriptions.
Token Limit
The input token limit is 150K tokens. When that limit is exceeded, the API will return a `400` error.