Entity Detection

When Entity Detection is enabled, the Punctuation feature will be enabled by default.

Enable Feature

To enable Entity Detection, add the detect_entities parameter, set to true, to the query string when you call Deepgram’s API:

detect_entities=true&punctuate=true


To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'

🚧
Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
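
The same request can also be assembled with Python's standard library. The sketch below only builds the request object and does not send it; build_listen_request is a hypothetical helper name, while the endpoint, headers, and query parameters are copied from the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api.deepgram.com/v1/listen"  # endpoint from the curl example


def build_listen_request(audio: bytes, api_key: str) -> Request:
    """Build (but do not send) a POST request with Entity Detection enabled."""
    query = urlencode({"detect_entities": "true", "punctuate": "true"})
    return Request(
        url=f"{API_URL}?{query}",
        data=audio,  # raw audio bytes, like curl's --data-binary
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


# To send it (requires a valid key and a real WAV file):
# import json
# from urllib.request import urlopen
# with open("youraudio.wav", "rb") as f:
#     request = build_listen_request(f.read(), "YOUR_DEEPGRAM_API_KEY")
# with urlopen(request) as response:
#     body = json.load(response)
```

Keeping the build step separate from the send step makes it easy to inspect the URL and headers before any audio leaves your machine.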

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives": []
      }
    ]
  }
}
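
To reach the alternatives, walk from results to a channel and then to that channel's alternatives list. A minimal Python sketch, assuming the response has already been parsed into a dictionary; first_alternative is a hypothetical helper, not part of any Deepgram SDK, and the sample dictionary is made up for illustration.

```python
from typing import Any, Optional


def first_alternative(response: dict[str, Any], channel: int = 0) -> Optional[dict[str, Any]]:
    """Return the first alternative of the given channel, or None if it is absent."""
    channels = response.get("results", {}).get("channels", [])
    if channel >= len(channels):
        return None
    alternatives = channels[channel].get("alternatives", [])
    return alternatives[0] if alternatives else None


# Made-up response that follows the structure shown above.
sample = {"results": {"channels": [{"alternatives": [{"transcript": "hello"}]}]}}
print(first_alternative(sample))
```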

Let's look more closely at the alternatives array:

"alternatives":[
  {
    "transcript":"Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
    "confidence":0.9816771,
    "words": [...],
    "entities":[
      {
        "label":"NAME",
        "value":" Scott Stephenson",
        "confidence":0.9999924,
        "start_word":6,
        "end_word":8
      },
      {
        "label":"ORG",
        "value":" Deepgram",
        "confidence":0.9999757,
        "start_word":10,
        "end_word":11
      },
      {
        "label": "CARDINAL_NUM",
        "value": "one",
        "confidence": 1,
        "start_word": 186,
        "end_word": 187
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

transcript: Transcript for the audio being processed.
confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
words: Array of objects, one per word in the transcript, each with the word's start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.
entities: Array of objects describing the entities detected in the audio being processed.

And we see that each object in entities contains:

label: Type of entity identified.
value: Text of entity identified.
confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
start_word: Position in the words array of the first word of the entity, counting from 0 (6 for "Scott Stephenson" in the example above).
end_word: Position in the words array just past the last word of the entity, judging by the example above (8 for "Scott Stephenson", which spans words 6 and 7).
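
In the sample response, start_word and end_word line up with positions in the words list, with end_word pointing one past the entity's last word (6 to 8 covers "Scott Stephenson"). Treating that as an assumption drawn from the example rather than a documented guarantee, a sketch for filtering entities by confidence and recovering their words might look like this. The helper name confident_entities, the per-word "word" key, the 0.9 threshold, and the lowered ORG confidence in the sample data are all illustrative assumptions.

```python
from typing import Any


def confident_entities(alternative: dict[str, Any], min_confidence: float = 0.9) -> list[dict[str, Any]]:
    """Return entities at or above min_confidence, each with its matching words attached."""
    words = alternative.get("words", [])
    selected = []
    for entity in alternative.get("entities", []):
        if entity["confidence"] < min_confidence:
            continue
        # Assumption: start_word is inclusive and end_word is exclusive.
        span = words[entity["start_word"]:entity["end_word"]]
        selected.append({
            "label": entity["label"],
            "value": entity["value"].strip(),  # sample values carry a leading space
            "words": [w["word"] for w in span],  # the "word" key is assumed
        })
    return selected


# Made-up data modeled on the sample response; the ORG confidence is lowered on purpose.
transcript = "welcome to the ai show i'm scott stephenson cofounder of deepgram"
alternative = {
    "words": [{"word": w} for w in transcript.split()],
    "entities": [
        {"label": "NAME", "value": " Scott Stephenson", "confidence": 0.9999924,
         "start_word": 6, "end_word": 8},
        {"label": "ORG", "value": " Deepgram", "confidence": 0.5,
         "start_word": 10, "end_word": 11},
    ],
}
print(confident_entities(alternative))
```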

📘
All entities are available in English.

Identifiable Entities

Entities that our API can identify include the following:

Label	Description	Spoken Example	Written Example
NAME	Person names, including fictional characters	Henry Cavill	Henry Cavill
ORG	Organizations, including companies, government, and non-profit organizations	Bank of Switzerland	Bank of Switzerland
ORG_PROD	Products of organizations	iPhone	iPhone
LOCATION	Geographic locations, including famous buildings and places	Statue of Liberty	Statue of Liberty
LANGUAGE	Human-spoken languages	Spanish	Spanish
GENDER	Genders	female	female
EMAIL	Email strings	duygu at deepgram dot com	duygu@deepgram.com
URL	URL strings	www dot deepgram dot com	www.deepgram.com
CARDINAL_NUM	Cardinal numbers, including floating points	twenty five	25
ORDINAL_NUM	Ordinal numbers	twenty fifth	25th
DATE	Dates, including weekdays, months, and years	after fifth May	after 5th May
TIME	Times of the day	two pm	2:00 pm
MONEY	Money entities	five thousand euro	5000 Euro
QUANTITY	Amounts with units	five grams	5 g
PERCENT	Percentages	twenty five percent	25%
US_PHONE_NUM	US phone numbers, both international and domestic	five five five three oh seven two	555-3072
PHONE_NUM	General phone numbers	oh five three oh seven two one	0 530 721
LOCATION_ADDRESS	Open address strings	one hundred chestnut street, Chicago, Illinois six oh six one one	100 Chestnut Street, Chicago, IL 60611
LOCATION_CITY	City names	Chicago	Chicago
LOCATION_STATE	State names	Illinois	Illinois
LOCATION_TOWN	Town names	Worcester	Worcester
LOCATION_ZIP	ZIP strings	six six one one one	66111
LOCATION_COUNTRY	Country names	Cambodia	Cambodia
ACCOUNT_NUM	Account numbers	one two eight oh eight six	128086
CARD_NUM	Credit card numbers	four one one one one one one one one one one one one one one one	4111111111111111
CVV_NUM	Credit card verification values	five five five	555
CARD_EXPIRY_DATE	Credit card expiration dates	eleven twenty five	11/25
SSN_NUM	US social security numbers	one one one two two oh oh oh oh	111-22-0000
COMMS_CHAN	Radio channel names	channel eleven	channel 11
NATO_ALPHA	Spelling descriptions with letters and words	t as in Texas	t as in Texas
MULTIPLICATIVE	Multiplicative expressions	three times	three times
SEQUENCE	Sequences of numbers that don't fall into other numeric categories	five five seven	5 5 7
EMOTION	Names of emotions	happiest customer ever	happiest customer ever
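
Because every entity carries one of the labels listed above, a quick tally by label gives a rough picture of what a transcript contains. A small sketch using collections.Counter; the entity list is made up for illustration.

```python
from collections import Counter

# Made-up entities using labels from the table above.
entities = [
    {"label": "NAME", "value": "Henry Cavill"},
    {"label": "LOCATION_CITY", "value": "Chicago"},
    {"label": "NAME", "value": "Scott Stephenson"},
]

# Count how many entities were found for each label.
label_counts = Counter(entity["label"] for entity in entities)
print(label_counts.most_common())
```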

Use Cases

Some examples of uses for Entity Detection include:

Customers who want to improve conversational AI and voice assistants by triggering particular workflows and responses based on identified names, addresses, locations, and other key entities.
Customers who want to enhance customer service and user experience by extracting meaningful and relevant information about key entities such as people, organizations, email addresses, and phone numbers.
Customers who want to derive meaningful and actionable insights from audio data based on the entities identified in conversations.
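
The first use case, triggering workflows from identified entities, can be sketched as a lookup from label to handler. Everything here is hypothetical: the handler functions, their return strings, and the choice of labels are placeholders for whatever your application actually does.

```python
from typing import Callable


def handle_phone(value: str) -> str:
    # Placeholder action for phone-number entities.
    return f"offer callback to {value}"


def handle_address(value: str) -> str:
    # Placeholder action for address entities.
    return f"verify address {value}"


# Map entity labels to the workflow each one should trigger.
HANDLERS: dict[str, Callable[[str], str]] = {
    "US_PHONE_NUM": handle_phone,
    "PHONE_NUM": handle_phone,
    "LOCATION_ADDRESS": handle_address,
}


def route_entities(entities: list[dict]) -> list[str]:
    """Run the handler registered for each entity's label; skip unregistered labels."""
    actions = []
    for entity in entities:
        handler = HANDLERS.get(entity["label"])
        if handler is not None:
            actions.append(handler(entity["value"].strip()))
    return actions
```

A dictionary keeps the routing table in one place, so supporting another label means adding one entry rather than another branch.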