Search

Search searches for terms or phrases in submitted audio.
search: string

Pre-recorded | Streaming: Nova | Streaming: Flux | All available languages

Deepgram’s Search feature finds terms or phrases by matching acoustic patterns in the audio, which we have found to be more accurate than matching text patterns in transcripts, and returns the results in the response JSON object. In effect, Deepgram “hears” whether the phrase was uttered, rather than looking for sufficiently close matches in the text transcript.

Because the search feature is looking for phonetic matches, it works best on longer, multisyllabic terms, or even on short to medium-length phrases.

Enable Feature

To enable Search, add a search parameter to the query string when you call Deepgram’s API, and set it to your chosen search term or phrase:

search=TERM_OR_PHRASE

You can include up to 50 search terms per request.

To transcribe audio from a file on your computer, run the following cURL command in a terminal or your favorite API client.

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @youraudio.wav \
  --url 'https://api.deepgram.com/v1/listen?search=TERM_OR_PHRASE'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.
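
If you would rather issue the request from code, the following is a minimal Python sketch of the same call using only the standard library. It mirrors the cURL command above (same endpoint, headers, and query parameter); the function name build_search_request is hypothetical, and the sketch builds the request without sending it so you can inspect it first.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the cURL example above.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_search_request(api_key: str, audio: bytes, term: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL command.

    Hypothetical helper for illustration; it is not part of any Deepgram SDK.
    """
    # quote() percent-encodes spaces as %20, so phrases are safe to pass in.
    query = "search=" + urllib.parse.quote(term, safe="")
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


# Usage (performs a network call, so it is left commented out):
# with open("youraudio.wav", "rb") as f:
#     request = build_search_request("YOUR_DEEPGRAM_API_KEY", f.read(), "TERM_OR_PHRASE")
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```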

Search for a Term

To search for a single term, send one instance of the search parameter in the query string when calling the API:

search=epistemology

Search for Multiple Terms

You can search for multiple terms individually:

search=epistemology&search=warwick

Search for a Phrase

You can search for a phrase. URL-encode the phrase when submitting it.

search=social%20epistemology
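
As a rough illustration of the three cases above, this Python sketch assembles a query string with one search parameter per term or phrase and URL-encodes spaces as %20. The helper name build_search_query is hypothetical, and the client-side check for the 50-term limit is an assumption added for convenience; the API's own behavior when the limit is exceeded is not described here.

```python
from urllib.parse import quote, urlencode

MAX_SEARCH_TERMS = 50  # limit stated in the Enable Feature section


def build_search_query(terms: list[str]) -> str:
    """Return a query string with one `search` parameter per term or phrase."""
    if not terms:
        raise ValueError("at least one search term is required")
    if len(terms) > MAX_SEARCH_TERMS:
        raise ValueError(f"at most {MAX_SEARCH_TERMS} search terms per request")
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return urlencode([("search", term) for term in terms], quote_via=quote)


print(build_search_query(["social epistemology", "warwick"]))
# search=social%20epistemology&search=warwick
```

Append the result to https://api.deepgram.com/v1/listen? to form the request URL.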

Analyze Response

The term “epistemology” in this audio file is sufficiently technical that our model may not transcribe it accurately, but our phonetic search will be able to find it.

In our terminal, we run the following cURL command:

cURL
curl \
  --request POST \
  --header 'Authorization: Token YOUR_DEEPGRAM_API_KEY' \
  --header 'Content-Type: audio/wav' \
  --data-binary @epistemology.wav \
  --url 'https://api.deepgram.com/v1/listen?search=epistemology'

Replace YOUR_DEEPGRAM_API_KEY with your Deepgram API Key.

When the file is finished processing (often after only a few seconds), you’ll receive a JSON response that has the following basic structure:

JSON
{
  "metadata": {
    "transaction_key": "string",
    "request_id": "string",
    "sha256": "string",
    "created": "string",
    "duration": 0,
    "channels": 0
  },
  "results": {
    "channels": [
      {
        "search": [],
        "alternatives": []
      }
    ]
  }
}
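
To reach the search results in this structure from code, a sketch along the following lines may help. It walks results.channels and collects each channel's search array. The function name is hypothetical, and falling back to an empty list when a key is missing is an assumption made for robustness, not documented API behavior.

```python
import json


def search_results_by_channel(response: dict) -> list[list[dict]]:
    """Return the `search` array of every channel, in channel order."""
    channels = response.get("results", {}).get("channels", [])
    # Assumption: treat a missing `search` key as "no search results".
    return [channel.get("search", []) for channel in channels]


# Parse a trimmed-down version of the skeleton response shown above.
body = json.loads('{"results": {"channels": [{"search": [], "alternatives": []}]}}')
print(search_results_by_channel(body))  # [[]]
```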

Let’s look more closely at the alternatives object:

JSON
...
"alternatives": [
  {
    "transcript": "hello this is steve fuller i'm a professor of social epi at the university of warwick and the question before us today is what is a and why is it important epi is the branch philosophy that is concerned with the nature of knowledge",
    "confidence": 0.9773828,
    "words": [
      {"word":"hello","start":1.2560788,"end":1.3358299,"confidence":0.9822957},
      ...
    ]
  }
]

The audio file contains multiple occurrences of the word “epistemology”. Nova 2 transcribes each instance correctly.

Now, let’s take a look at the search object:

JSON
...
"search": [
  {
    "query": "epistemology",
    "hits": [
      {"confidence":1,"start":3.9676142,"end":4.7651243,"snippet":"social epistemology"},
      {"confidence":1,"start":13.805,"end":14.645,"snippet":"epistemology"},
      {"confidence":0.9782986,"start":10.605,"end":11.565,"snippet":"is epistemology"},
      {"confidence":0.21875,"start":8.513423,"end":9.625,"snippet":"before us today"},
      {"confidence":0.074074,"start":15.245,"end":16.005,"snippet":"branch of philosophy"},
      {"confidence":0.045138836,"start":5.1240044,"end":5.6822615,"snippet":"university of"},
      {"confidence":0.0234375,"start":17.165,"end":18.115,"snippet":"nature of knowledge"},
      {"confidence":0,"start":1.1763278,"end":1.8940871,"snippet":"hello this is steve"},
      {"confidence":0,"start":5.6025105,"end":7.3969088,"snippet":"of warwick and"}
    ]
  }
]
...

In this response, we see that each entry in search contains:

  • query: Term or phrase that was requested.
  • hits: Array of candidate matches for the search term in the audio. Each hit has a start time and end time (in seconds, measured from the beginning of the audio stream), a confidence value, and a snippet of the transcript at that position.

In this part of the response, notice:

  • we have received multiple hits for the word. The first two have a confidence of 1 and the third has a confidence of about 0.978.
  • the results contain the start and end times for each place in the audio where the model heard the word.
  • after the first three hits, there’s a steep decline in the model’s confidence that it heard the requested word, and none of the remaining hits actually corresponds to the requested word.
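
Given that drop-off, one way to use the results is to keep only hits above a confidence cutoff and order them by where they occur in the audio. The Python sketch below does this for the first few hits from the response above. The function name confident_hits and the 0.9 cutoff are assumptions chosen to fit this example, not recommended values; pick a threshold that suits your own audio.

```python
def confident_hits(search_entry: dict, min_confidence: float = 0.9) -> list[dict]:
    """Keep hits at or above the cutoff, ordered by start time."""
    hits = [h for h in search_entry["hits"] if h["confidence"] >= min_confidence]
    return sorted(hits, key=lambda h: h["start"])


# First five hits copied from the sample response above.
entry = {
    "query": "epistemology",
    "hits": [
        {"confidence": 1, "start": 3.9676142, "end": 4.7651243, "snippet": "social epistemology"},
        {"confidence": 1, "start": 13.805, "end": 14.645, "snippet": "epistemology"},
        {"confidence": 0.9782986, "start": 10.605, "end": 11.565, "snippet": "is epistemology"},
        {"confidence": 0.21875, "start": 8.513423, "end": 9.625, "snippet": "before us today"},
        {"confidence": 0.074074, "start": 15.245, "end": 16.005, "snippet": "branch of philosophy"},
    ],
}

# Print the start time, end time, and snippet of each hit that survives the cutoff.
for hit in confident_hits(entry):
    print(hit["start"], hit["end"], hit["snippet"])
```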