Entity Detection

Entity Detection identifies and extracts key entities from content in submitted audio.

detect_entities boolean. Default: false

When Entity Detection is enabled, the Punctuation feature will be enabled by default.

Enable Feature

To enable Entity Detection, add a detect_entities parameter set to true in the query string when you call Deepgram’s API:

detect_entities=true&punctuate=true

To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
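If you are assembling the request in code rather than on the command line, the same query string can be built with any URL library. Below is a minimal Python sketch; listen_url is an illustrative helper, not part of any Deepgram SDK:

```python
from urllib.parse import urlencode

# Illustrative helper (not a Deepgram SDK function): build the /v1/listen
# URL with Entity Detection and Punctuation enabled via the query string.
def listen_url(base="https://api.deepgram.com/v1/listen", **params):
    return f"{base}?{urlencode(params)}"

url = listen_url(detect_entities="true", punctuate="true")
print(url)
# https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true
```

You would then send your audio as the POST body to this URL, with your API key in the Authorization header as shown in the cURL example above.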

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

JSON
{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": []
      }
    ]
  }
}

Let’s look more closely at the alternatives object:

JSON
"alternatives": [
  {
    "transcript": "Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
    "confidence": 0.9816771,
    "words": [...],
    "entities": [
      {
        "label": "NAME",
        "value": " Scott Stephenson",
        "confidence": 0.9999924,
        "start_word": 6,
        "end_word": 8
      },
      {
        "label": "ORG",
        "value": " Deepgram",
        "confidence": 0.9999757,
        "start_word": 10,
        "end_word": 11
      },
      {
        "label": "CARDINAL_NUM",
        "value": "one",
        "confidence": 1,
        "start_word": 186,
        "end_word": 187
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Array of objects, one per word in the transcript, each including the word's start time and end time (in seconds from the beginning of the audio stream) and a confidence value.
  • entities: Array of objects containing information about each entity detected in the audio being processed.

And we see that each entities object contains:

  • label: Type of entity identified.
  • value: Text of entity identified.
  • confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
  • start_word: Index of the entity's first word within the transcript's words array.
  • end_word: Index of the word following the entity's last word within the transcript's words array (for example, the NAME entity above spans words 6 and 7).
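Putting these pieces together, you can walk a response and collect the detected entities. The sketch below hard-codes a trimmed copy of the sample response shown above purely for illustration; in practice you would parse the JSON returned by the API:

```python
# Trimmed response mirroring the sample JSON above (illustration only).
response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
                        "confidence": 0.9816771,
                        "entities": [
                            {"label": "NAME", "value": " Scott Stephenson",
                             "confidence": 0.9999924, "start_word": 6, "end_word": 8},
                            {"label": "ORG", "value": " Deepgram",
                             "confidence": 0.9999757, "start_word": 10, "end_word": 11},
                        ],
                    }
                ]
            }
        ]
    }
}

# Collect (label, value) pairs from every channel and alternative.
found = [
    (entity["label"], entity["value"].strip())
    for channel in response["results"]["channels"]
    for alternative in channel["alternatives"]
    for entity in alternative.get("entities", [])
]
print(found)
# [('NAME', 'Scott Stephenson'), ('ORG', 'Deepgram')]
```

Note the .get("entities", []) guard: alternatives carry an entities array only when Entity Detection is enabled on the request.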

All entity types are available in English.

Identifiable Entities

View all options here: Supported Entity Types

Use Cases

Some examples of uses for Entity Detection include:

  • Customers who want to improve conversational AI and voice assistants by triggering particular workflows and responses based on identified names, addresses, locations, and other key entities.
  • Customers who want to enhance customer service and user experience by extracting meaningful, relevant information about key entities such as people, organizations, email addresses, and phone numbers.
  • Customers who want to derive meaningful and actionable insights from audio data based on the entities identified in conversations.