Entity Detection

Entity Detection identifies and extracts key entities from content in submitted audio.

detect_entities boolean. Default: false

Enable Feature

To enable Entity Detection, add the detect_entities parameter to the query string of your request to Deepgram's API and set it to true:

detect_entities=true&punctuate=true
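If you assemble the request in code instead of typing the query string by hand, the following minimal Python sketch (standard library only) shows one way to do it. The helper name build_listen_url is hypothetical; only the endpoint and the detect_entities and punctuate parameter names come from this page. Python booleans are lowercased so the query string matches the documented true/false form.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example on this page.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(**features: bool) -> str:
    """Return the listen URL with boolean features as query parameters.

    Hypothetical helper: True/False are lowercased to match the
    documented detect_entities=true form.
    """
    params = {name: str(value).lower() for name, value in features.items()}
    return f"{LISTEN_URL}?{urlencode(params)}"


url = build_listen_url(detect_entities=True, punctuate=True)
print(url)
# https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true
```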

📘

When Entity Detection is enabled, Punctuation will also be enabled by default.

To transcribe audio from a file on your computer, run the following curl command in a terminal or your favorite API client.

curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true'

🚧

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
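The same request can be made from Python with the standard library. This is a sketch, not an official client: build_request and transcribe_file are hypothetical helper names, and reading the key from a DEEPGRAM_API_KEY environment variable is an assumption made to keep the key out of source code. The endpoint, headers, and file name mirror the curl command above.

```python
import json
import os
import urllib.request

# Same endpoint and query string as the curl command above.
URL = "https://api.deepgram.com/v1/listen?detect_entities=true&punctuate=true"


def build_request(api_key: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def transcribe_file(path: str) -> dict:
    """Send a local WAV file and return the parsed JSON response (needs network)."""
    api_key = os.environ["DEEPGRAM_API_KEY"]  # assumed variable name
    with open(path, "rb") as f:
        req = build_request(api_key, f.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

build_request only constructs the request, so you can inspect it without network access; transcribe_file("youraudio.wav") actually sends it.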

Analyze Response

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "alternatives":[]
      }
    ]
  }
}
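To work with this structure in code, parse the JSON and index into results, then channels, then alternatives. The minimal Python sketch below assumes you want the first alternative of the first channel; the sample response and its values are made up and trimmed to the fields shown above, and first_alternative is a hypothetical helper name.

```python
import json

# Made-up response trimmed to the basic structure shown above.
raw = """
{
  "metadata": {"request_id": "example", "duration": 12.5, "channels": 1},
  "results": {
    "channels": [
      {
        "alternatives": [
          {"transcript": "Hello.", "confidence": 0.98, "words": [], "entities": []}
        ]
      }
    ]
  }
}
"""


def first_alternative(response: dict) -> dict:
    """Return the first alternative of the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]


response = json.loads(raw)
print(response["metadata"]["duration"])           # 12.5
print(first_alternative(response)["transcript"])  # Hello.
```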

Let's look more closely at the alternatives array:

"alternatives":[
  {
    "transcript":"Welcome to the Ai show. I'm Scott Stephenson, cofounder of Deepgram...",
    "confidence":0.9816771,
    "words": [...],
    "entities":[
      {
        "label":"NAME",
        "value":" Scott Stephenson",
        "confidence":0.9999924,
        "start_word":6,
        "end_word":8
      },
      {
        "label":"ORG",
        "value":" Deepgram",
        "confidence":0.9999757,
        "start_word":10,
        "end_word":11
      },
      {
        "label": "CARDINAL_NUM",
        "value": "one",
        "confidence": 1,
        "start_word": 186,
        "end_word": 187
      },
      ...
    ]
  }
]

In this response, we see that each alternative contains:

  • transcript: Transcript for the audio being processed.
  • confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.
  • words: Array of objects, one per word in the transcript, each with the word's start time and end time (in seconds) from the beginning of the audio stream and a confidence value.
  • entities: Array of objects, one per entity detected in the transcript.
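If you prefer typed objects over raw dictionaries, one option is to load each alternative into Python dataclasses. This sketch assumes each entry in words carries word, start, end, and confidence keys (the description above names the start time, end time, and confidence but not the exact key names), and it keeps entities as plain dictionaries. The class and function names are hypothetical, and the 0.8 threshold used to flag uncertain words is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    word: str
    start: float       # seconds from the beginning of the audio
    end: float         # seconds from the beginning of the audio
    confidence: float  # 0 to 1, higher means more confident


@dataclass
class Alternative:
    transcript: str
    confidence: float
    words: list[Word] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)  # kept as raw dicts


def parse_alternative(alt: dict) -> Alternative:
    """Convert one alternatives entry into typed objects.

    Assumes the per-word keys are word, start, end, and confidence.
    """
    words = [
        Word(w["word"], w["start"], w["end"], w["confidence"])
        for w in alt.get("words", [])
    ]
    return Alternative(alt["transcript"], alt["confidence"], words, alt.get("entities", []))


# Made-up sample data for illustration.
alt = parse_alternative({
    "transcript": "Welcome to Deepgram.",
    "confidence": 0.98,
    "words": [
        {"word": "welcome", "start": 0.0, "end": 0.4, "confidence": 0.99},
        {"word": "to", "start": 0.4, "end": 0.5, "confidence": 0.97},
        {"word": "deepgram", "start": 0.5, "end": 1.1, "confidence": 0.62},
    ],
})
uncertain = [w.word for w in alt.words if w.confidence < 0.8]  # arbitrary cutoff
print(uncertain)  # ['deepgram']
```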

Each object in the entities array contains:

  • label: Type of entity identified.
  • value: Text of entity identified.
  • confidence: Floating point value between 0 and 1 that indicates entity reliability. Larger values indicate higher confidence.
  • start_word: Index (counting from 0) of the entity's first word in the words array.
  • end_word: Index of the word immediately after the entity's last word, so the entity covers words start_word through end_word - 1. For example, " Scott Stephenson" above has start_word 6 and end_word 8 and covers words 6 and 7.
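Because start_word and end_word are word positions, you can recover the words behind an entity by slicing the words array. The Python sketch below assumes end_word is exclusive, which is what the example response implies (" Scott Stephenson" runs from 6 to 8 and " Deepgram" from 10 to 11). For brevity the words are plain strings reconstructed from the example transcript; in a real response each entry is an object, so you would first pull out its word text. entity_words is a hypothetical helper name.

```python
def entity_words(words: list[str], entity: dict) -> list[str]:
    """Return the words an entity covers.

    Assumes start_word is inclusive and end_word is exclusive, as the
    example response suggests.
    """
    return words[entity["start_word"]:entity["end_word"]]


# Words from the example transcript, lowercased and without punctuation (assumption).
words = "welcome to the ai show i'm scott stephenson cofounder of deepgram".split()
entities = [
    {"label": "NAME", "value": " Scott Stephenson", "start_word": 6, "end_word": 8},
    {"label": "ORG", "value": " Deepgram", "start_word": 10, "end_word": 11},
]

for ent in entities:
    # value carries a leading space in the example, so strip it for display.
    print(ent["label"], ent["value"].strip(), entity_words(words, ent))
# NAME Scott Stephenson ['scott', 'stephenson']
# ORG Deepgram ['deepgram']
```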

📘

All entity types listed below are available for English.

Identifiable Entities

Entities that our API can identify include the following:

| Label | Description | Spoken example | Written example |
|---|---|---|---|
| NAME | Person names, including fictional characters | Henry Cavill | Henry Cavill |
| ORG | Organizations, including companies, government bodies, and non-profit organizations | Bank of Switzerland | Bank of Switzerland |
| ORG_PROD | Products of organizations | iPhone | iPhone |
| LOCATION | Geographic locations, including famous buildings and places | Statue of Liberty | Statue of Liberty |
| LANGUAGE | Spoken human languages | Spanish | Spanish |
| GENDER | Genders | female | female |
| EMAIL | Email addresses | duygu at deepgram dot com | duygu@deepgram.com |
| URL | URLs | www dot deepgram dot com | www.deepgram.com |
| CARDINAL_NUM | Cardinal numbers, including floating-point numbers | twenty five | 25 |
| ORDINAL_NUM | Ordinal numbers | twenty fifth | 25th |
| DATE | Dates, including weekdays, months, and years | after fifth May | after 5th May |
| TIME | Times of day | two pm | 14:00 |
| MONEY | Monetary amounts | five thousand euro | 5000 Euro |
| QUANTITY | Amounts with units | five grams | 5 g |
| PERCENT | Percentages | twenty five percent | 25% |
| US_PHONE_NUM | US phone numbers, both international and domestic | five five five three oh seven two | 555-3072 |
| PHONE_NUM | General phone numbers | oh five three oh seven two one | 0 530 721 |
| LOCATION_ADDRESS | Address strings | one hundred chestnut street, Chicago, Illinois six oh six one one | 100 Chestnut Street, Chicago, IL 60611 |
| LOCATION_CITY | City names | Chicago | Chicago |
| LOCATION_STATE | State names | Illinois | Illinois |
| LOCATION_TOWN | Town names | Worcester | Worcester |
| LOCATION_ZIP | ZIP codes | six six one one one | 66111 |
| LOCATION_COUNTRY | Country names | Cambodia | Cambodia |
| ACCOUNT_NUM | Account numbers | one two eight oh eight six | 128086 |
| CARD_NUM | Credit card numbers | four one one one one one one one one one one one one one one one | 4111111111111111 |
| CVV_NUM | Credit card verification values | five five five | 555 |
| CARD_EXPIRY_DATE | Credit card expiration dates | eleven twenty five | 11/25 |
| SSN_NUM | US Social Security numbers | one one one two two oh oh oh oh | 111-22-0000 |
| COMMS_CHAN | Radio channel names | channel eleven | channel 11 |
| NATO_ALPHA | Spellings that describe a letter with a word | t as in Texas | t as in Texas |
| MULTIPLICATIVE | Multiplicative expressions | three times | three times |
| SEQUENCE | Sequences of numbers that don't fall into other numeric categories | five five seven | 5 5 7 |
| EMOTION | Expressions of emotion | happiest customer ever | happiest customer ever |
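To collect what was detected for each label in the table above, you can group the entities array by label. This is a minimal Python sketch; group_by_label and its optional labels filter are hypothetical, and the sample entities are illustrative rather than real API output.

```python
from collections import defaultdict
from typing import Iterable, Optional


def group_by_label(entities: Iterable[dict], labels: Optional[set[str]] = None) -> dict[str, list[str]]:
    """Collect entity values per label, optionally keeping only some labels."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        if labels is None or ent["label"] in labels:
            # Values may carry a leading space, so strip them.
            grouped[ent["label"]].append(ent["value"].strip())
    return dict(grouped)


entities = [
    {"label": "NAME", "value": " Scott Stephenson"},
    {"label": "ORG", "value": " Deepgram"},
    {"label": "CARDINAL_NUM", "value": "one"},
    {"label": "ORG", "value": " Bank of Switzerland"},
]
print(group_by_label(entities))
# {'NAME': ['Scott Stephenson'], 'ORG': ['Deepgram', 'Bank of Switzerland'], 'CARDINAL_NUM': ['one']}
print(group_by_label(entities, labels={"ORG"}))
# {'ORG': ['Deepgram', 'Bank of Switzerland']}
```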

Use Cases

Example uses of Entity Detection include:

  • Improving conversational AI and voice assistants by triggering specific workflows and responses based on identified names, addresses, locations, and other key entities.
  • Enhancing customer service and user experience by extracting relevant information about key entities such as people, organizations, email addresses, and phone numbers.
  • Deriving meaningful, actionable insights from audio data based on the entities identified in conversations.
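As an illustration of the first use case, the Python sketch below routes detected entities to handlers keyed by label. The handler table, the messages it returns, the route function, and the 0.9 confidence cutoff are all hypothetical placeholders for your own workflow logic; only the label, value, and confidence fields come from the response format described above.

```python
from typing import Callable

# Hypothetical handlers keyed by entity label; replace the bodies with
# your own workflow logic (CRM lookups, ticket creation, and so on).
HANDLERS: dict[str, Callable[[str], str]] = {
    "EMAIL": lambda value: f"send confirmation to {value}",
    "LOCATION_ADDRESS": lambda value: f"check delivery options for {value}",
    "PHONE_NUM": lambda value: f"schedule a callback at {value}",
}


def route(entities: list[dict], min_confidence: float = 0.9) -> list[str]:
    """Run the handler for each sufficiently confident entity whose label has one."""
    actions = []
    for ent in entities:
        handler = HANDLERS.get(ent["label"])
        if handler is not None and ent["confidence"] >= min_confidence:
            actions.append(handler(ent["value"].strip()))
    return actions


entities = [
    {"label": "EMAIL", "value": " duygu@deepgram.com", "confidence": 0.99},
    {"label": "NAME", "value": " Henry Cavill", "confidence": 0.99},       # no handler
    {"label": "PHONE_NUM", "value": "0 530 721", "confidence": 0.45},      # below cutoff
]
print(route(entities))  # ['send confirmation to duygu@deepgram.com']
```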