Entity Detection
Entity Detection identifies and extracts key entities from content in submitted audio.
detect_entities
boolean. Default: false
When Entity Detection is enabled, the Punctuation feature will be enabled by default.
Enable Feature
To enable Entity Detection, add a detect_entities parameter set to true in the query string when you call Deepgram's API:
detect_entities=true&punctuate=true
To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.
curl \
--request POST \
--header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
--header 'Content-Type: audio/wav' \
--data-binary @youraudio.wav \
--url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
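If you prefer to make the request from code, the sketch below shows the same call using only the Python standard library. It is an illustrative sketch, not official Deepgram SDK code; the API key and file path are placeholders you must replace.

```python
# Illustrative sketch (not official Deepgram SDK code): POST a local WAV
# file to the /v1/listen endpoint with Entity Detection enabled.
import urllib.request
from urllib.parse import urlencode

API_KEY = "YOUR_DEEPGRAM_API_KEY"  # placeholder: your Deepgram API Key

# Same query parameters as the curl example above.
params = {"detect_entities": "true", "punctuate": "true"}
url = "https://api.deepgram.com/v1/listen?" + urlencode(params)

def transcribe(path: str) -> bytes:
    """Send the raw audio bytes and return the JSON response body."""
    with open(path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        url,
        data=audio,
        headers={
            "Authorization": f"Token {API_KEY}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Calling `transcribe("youraudio.wav")` mirrors the curl command above; the returned bytes decode to the JSON response described in the next section.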
Analyze Response
When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:
{
"metadata": {
"transaction_key": "string",
"request_id": "string",
"sha256": "string",
"created": "string",
"duration": 0,
"channels": 0
},
"results": {
"channels": [
{
"alternatives":[],
}
]
}
Let's look more closely at the alternatives object:
"alternatives":[
{
"transcript":"Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
"confidence":0.9816771,
"words": [...],
"entities":[
{
"label":"NAME",
"value":" Scott Stephenson",
"confidence":0.9999924,
"start_word":6,
"end_word":8
},
{
"label":"ORG",
"value":" Deepgram",
"confidence":0.9999757,
"start_word":10,
"end_word":11
},
{
"label": "CARDINAL_NUM",
"value": "one",
"confidence": 1,
"start_word": 186,
"end_word": 187
},
...
]
}
]
In this response, we see that each alternative contains:
- transcript: Transcript for the audio being processed.
- confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
- words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
- entities: Object containing the information about entities for the audio being processed.
And we see that each entities object contains:
- label: Type of entity identified.
- value: Text of the entity identified.
- confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
- start_word: Location of the first character of the first word in the section of audio being inspected for entities.
- end_word: Location of the first character of the last word in the section of audio being inspected for entities.
All entities are available in English.
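As a sketch of how these fields fit together, the snippet below walks a parsed response and collects each entity's label and value. The `response` dict here is a trimmed stand-in shaped like the sample above, not real API output.

```python
# Illustrative sketch: pulling entities out of a parsed /v1/listen response.
# `response` is a trimmed stand-in shaped like the sample response above.
response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Welcome to the Ai show. ...",
                        "entities": [
                            {"label": "NAME", "value": " Scott Stephenson",
                             "confidence": 0.9999924,
                             "start_word": 6, "end_word": 8},
                            {"label": "ORG", "value": " Deepgram",
                             "confidence": 0.9999757,
                             "start_word": 10, "end_word": 11},
                        ],
                    }
                ]
            }
        ]
    }
}

def list_entities(resp):
    """Yield (label, value) pairs for every entity across all channels
    and alternatives, stripping the leading space in each value."""
    for channel in resp["results"]["channels"]:
        for alt in channel["alternatives"]:
            for ent in alt.get("entities", []):
                yield ent["label"], ent["value"].strip()

found = list(list_entities(response))
```

The `start_word` and `end_word` indices can likewise be used to locate the matching entries in the `words` array, for example to recover entity timestamps.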
Identifiable Entities
View all options here: Supported Entity Types
Use Cases
Some examples of uses for Entity Detection include:
- Customers who want to improve Conversational AI and Voice Assistant by triggering particular workflows and responses based on identified name, address, location, and other key entities.
- Customers who want to enhance customer service and user experience by extracting meaningful and relevant information about key entities such as a person, organization, email, and phone number.
- Customers who want to derive meaningful and actionable insights from the audio data based on identified entities in conversations.