JavaScript SDK
Framework-agnostic client for connecting to Deepgram’s Voice Agent API from any browser JavaScript environment.
This page covers the core JavaScript SDK, which works with vanilla JS, Vue, Svelte, Angular, and any other framework. If you are using React, see React Hooks and Provider. For a drop-in embeddable solution, see Widget.
Installation
Usage
Connect to a Deepgram voice agent, capture microphone audio, and play back the agent’s speech:
Replace YOUR_AGENT_ID with a Reusable Agent Configuration UUID, or pass an inline agent config object instead. See Agent Configuration for both patterns.
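The following is a minimal sketch of the connect-then-capture ordering. It assumes only the documented connect(), disconnect(), start(), and stop() methods. SessionLike, MicLike, startConversation, and endConversation are hypothetical names used for illustration. In an application you would pass a real AgentSession and AgentMicrophone.

```typescript
// Hypothetical structural stand-ins that mirror the documented method signatures.
// In an application, pass a real AgentSession and AgentMicrophone instead.
interface SessionLike {
  connect(): Promise<void>;
  disconnect(): void;
}

interface MicLike {
  start(): Promise<void>;
  stop(): void;
}

// Open the agent connection first, then request the microphone.
// If microphone permission is denied, close the socket so it does not linger.
async function startConversation(session: SessionLike, mic: MicLike): Promise<void> {
  await session.connect();
  try {
    await mic.start();
  } catch (err) {
    session.disconnect();
    throw err;
  }
}

// Release the microphone before closing the connection.
function endConversation(session: SessionLike, mic: MicLike): void {
  mic.stop();
  session.disconnect();
}
```

You can start the microphone immediately after connect() resolves without losing early speech. sendAudio queues frames until the server applies the settings.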
AgentSession
The core class that manages the WebSocket connection to the Voice Agent API. It handles authentication, automatic reconnection with exponential backoff, keep-alive pings, and audio buffering before the server acknowledges settings.
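This page does not state the SDK's backoff formula. The reconnection options configure it. The sketch below is therefore only a generic illustration of exponential backoff with "full jitter". The function name, the baseMs and maxMs defaults, and the jitter strategy are all assumptions, not the SDK's values.

```typescript
// Generic exponential backoff with full jitter; NOT the SDK's internal implementation.
// baseMs and maxMs are illustrative defaults chosen for this sketch.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 10_000,
  random: () => number = Math.random,
): number {
  // Exponential ceiling: base, 2*base, 4*base, ... capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, ceiling) so many clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}
```

Randomizing the delay spreads out reconnection attempts when many clients lose their connection at the same time.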
Configuration
Auth options:
Agent options:
The agent field accepts either a string agent ID from the Deepgram Console or an inline AgentSettingsObject with listen, think, and speak configuration.
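The sketch below shows how the string-or-object union can be narrowed. InlineAgentSettings is a hypothetical, simplified stand-in for AgentSettingsObject. The real listen, think, and speak sections have their own fields, which are not modeled here.

```typescript
// Hypothetical, simplified shape; the SDK's AgentSettingsObject has richer section types.
interface InlineAgentSettings {
  listen?: Record<string, unknown>;
  think?: Record<string, unknown>;
  speak?: Record<string, unknown>;
}

type AgentOption = string | InlineAgentSettings;

// A string is a Reusable Agent Configuration ID; anything else is an inline settings object.
function describeAgent(agent: AgentOption): string {
  if (typeof agent === "string") {
    return `reusable configuration ${agent}`;
  }
  const inline: InlineAgentSettings = agent;
  // Report which of the three sections the inline object provides.
  const sections = (["listen", "think", "speak"] as const).filter((key) => inline[key] !== undefined);
  return `inline configuration (${sections.join(", ")})`;
}
```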
Audio options:
Reconnection options:
Connection States
The state property reflects the current connection lifecycle:
idle — session created but connect() has not been called.
connecting — WebSocket connection attempt in progress.
connected — WebSocket is open and the session is active.
reconnecting — connection lost; attempting to reconnect with backoff.
disconnected — connection closed and no further reconnection attempts will be made.
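One way to consume the state property is an exhaustive switch over the five documented values. The sketch assumes the states are exposed as these string literals. The ConnectionState type name and both helper functions are hypothetical.

```typescript
// The five documented states as a string-literal union (type name is an assumption).
type ConnectionState = "idle" | "connecting" | "connected" | "reconnecting" | "disconnected";

// Map each state to a UI label; the `never` check makes the compiler flag an unhandled state.
function statusLabel(state: ConnectionState): string {
  switch (state) {
    case "idle":
      return "Not started";
    case "connecting":
      return "Connecting...";
    case "connected":
      return "Live";
    case "reconnecting":
      return "Reconnecting...";
    case "disconnected":
      return "Disconnected";
    default: {
      const unreachable: never = state;
      return unreachable;
    }
  }
}

// Only "disconnected" is terminal: no further reconnection attempts follow it.
function isTerminal(state: ConnectionState): boolean {
  return state === "disconnected";
}
```

With this pattern, adding a new state becomes a compile-time error until it is handled. This is safer than a lookup table that silently returns undefined.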
Methods
connect(): Promise<void> — establish the WebSocket connection to the agent. Resolves when the socket is open. The SDK automatically sends a Settings message after receiving the Welcome event.
disconnect(): void — close the connection and cancel any pending reconnection attempts.
sendAudio(data: ArrayBuffer): void — send a PCM audio frame to the agent. Frames sent before the server fires settings-applied are queued internally and flushed automatically once the agent is ready. This prevents dropped frames during connection setup.
injectUserMessage(content: string): void — inject a text message into the conversation as the user.
injectAgentMessage(message: string): void — inject a text message into the conversation as the agent.
updatePrompt(prompt: string): void — change the agent’s system prompt mid-session.
updateSpeak(settings): void — change TTS settings mid-session. Accepts a single SpeakSettings object or an array of SpeakSettings objects.
updateThink(settings): void — change LLM settings mid-session. Accepts a single ThinkSettings object or an array of ThinkSettings objects.
sendFunctionCallResponse(id, name, content): void — respond to a function call request from the agent. See Function Calls for a complete example.
getId(): string | null — returns the session ID assigned by the server. Available once the welcome event has fired; returns null before that.
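The pre-ready buffering that sendAudio performs can be pictured as a small queue. This is a sketch of the idea, not the SDK's code. PendingAudioQueue, send, and markReady are hypothetical names, and transmit stands in for writing a frame to the WebSocket.

```typescript
// Holds frames until the agent is ready, then flushes them in their original order.
// Hypothetical illustration of the behavior described for sendAudio().
class PendingAudioQueue {
  private readonly transmit: (frame: ArrayBuffer) => void;
  private pending: ArrayBuffer[] = [];
  private ready = false;

  constructor(transmit: (frame: ArrayBuffer) => void) {
    this.transmit = transmit;
  }

  send(frame: ArrayBuffer): void {
    if (this.ready) {
      this.transmit(frame);
    } else {
      this.pending.push(frame);
    }
  }

  // Call when the server confirms the settings (the settings-applied event).
  markReady(): void {
    // Drain before flipping the flag so a frame sent during the flush still goes out in order.
    while (this.pending.length > 0) {
      this.transmit(this.pending.shift() as ArrayBuffer);
    }
    this.ready = true;
  }

  get queuedFrames(): number {
    return this.pending.length;
  }
}
```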
Events
Subscribe with session.on(event, callback) and unsubscribe with session.off(event, callback).
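Every event uses the same on/off pair, so a one-shot wait can be built on top of it. The sketch below is a hypothetical helper, not part of the SDK. EventSourceLike only mirrors the documented on/off shape. If the SDK's on() signature is more specific, a type cast may be needed.

```typescript
// Minimal structural view of the documented on/off subscription API.
interface EventSourceLike {
  on(event: string, callback: (...args: unknown[]) => void): void;
  off(event: string, callback: (...args: unknown[]) => void): void;
}

// Resolve with the first payload of `event`, or reject after timeoutMs.
// The listener is removed in both cases so it does not leak.
function waitForEvent(source: EventSourceLike, event: string, timeoutMs: number): Promise<unknown> {
  return new Promise((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const handler = (...args: unknown[]): void => {
      clearTimeout(timer);
      source.off(event, handler);
      resolve(args[0]);
    };
    source.on(event, handler);
    timer = setTimeout(() => {
      source.off(event, handler);
      reject(new Error(`Timed out after ${timeoutMs} ms waiting for "${event}"`));
    }, timeoutMs);
  });
}
```

For example, awaiting waitForEvent(session, "settings-applied", 10_000) before enabling the UI would wait for the same event that sendAudio uses to flush its queue.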
Connection lifecycle:
Agent protocol:
Audio:
Errors:
AgentMicrophone
Captures PCM audio from the user’s microphone using the Web Audio API. Optionally integrates Silero VAD for voice activity detection so audio is only transmitted during speech.
The first argument is a callback invoked with each captured audio frame as an ArrayBuffer. Pass session.sendAudio to stream audio directly to the agent.
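Each frame passed to the callback contains raw PCM. The Web Audio API produces 32-bit float samples, so a capture pipeline has to convert them before sending. The function below is a generic sketch of one common conversion, to 16-bit little-endian PCM (often called linear16). It is not taken from the SDK; the audio options determine the encoding the SDK actually sends.

```typescript
// Convert Web Audio float samples (nominally -1..1) to 16-bit little-endian PCM.
// Illustrative only; the SDK's real encoding and sample rate come from its audio options.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, sample));
    // The signed 16-bit range is asymmetric (-32768..32767), so scale each side separately.
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  });
  return buffer;
}
```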
Options
VAD options (when vad is an object):
VAD requires the optional peer dependencies @ricky0123/vad-web and onnxruntime-web. Install them separately: npm install @ricky0123/vad-web onnxruntime-web.
Methods
start(): Promise<void> — request microphone permission via getUserMedia and begin capturing audio frames.
stop(): void — stop capturing audio, disconnect the audio worklet, and release the microphone device.
mute(): void — pause audio transmission without releasing the microphone. The device remains active but frames are not forwarded.
unmute(): void — resume audio transmission after a mute() call.
muted: boolean — read-only property indicating whether the microphone is currently muted.
getInputVolume(): number — returns the current RMS input volume level as a value between 0 and 1. Call it once per animation frame to drive real-time visualizations.
getInputByteFrequencyData(): Uint8Array — returns frequency-domain data as a Uint8Array with values from 0 to 255. Use it for spectrum or waveform visualizations.
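The per-frame polling pattern can be sketched as a small loop. In the sketch, readLevel is any function returning a 0 to 1 level, such as one that calls mic.getInputVolume() or player.getOutputVolume(). FrameScheduler and startLevelMeter are hypothetical names. The scheduler is injected so the loop can also be driven manually in tests.

```typescript
// Hypothetical scheduler abstraction (requestAnimationFrame in a browser, a manual stepper in tests).
interface FrameScheduler {
  request(callback: () => void): number;
  cancel(handle: number): void;
}

// Read a 0..1 level once per frame and pass it to render. Returns a stop function.
function startLevelMeter(
  readLevel: () => number,
  render: (level: number) => void,
  scheduler: FrameScheduler,
): () => void {
  let handle = 0;
  let stopped = false;
  const tick = (): void => {
    if (stopped) return;
    // Clamp defensively before rendering.
    render(Math.min(1, Math.max(0, readLevel())));
    handle = scheduler.request(tick);
  };
  handle = scheduler.request(tick);
  return () => {
    stopped = true;
    scheduler.cancel(handle);
  };
}
```

In a browser, wrap the globals in arrow functions, for example request: (cb) => requestAnimationFrame(cb). Storing requestAnimationFrame directly as an object property and calling it as a method gives it the wrong receiver, which browsers reject with an "Illegal invocation" error.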
Events
Subscribe with mic.on(event, callback) and unsubscribe with mic.off(event, callback).
AgentPlayer
Decodes and plays PCM audio received from the agent. Maintains a playback queue with interrupt support for barge-in, and exposes volume and frequency analysis APIs for building custom visualizations.
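Barge-in means calling interrupt() when the user starts talking. The Voice Agent protocol reports this with a UserStartedSpeaking message. This page does not list the SDK's event name for it, so the sketch takes the name as a parameter. Substitute the name from the Events reference. Subscribable, Interruptible, and enableBargeIn are hypothetical names.

```typescript
// Hypothetical structural types; the real classes are AgentSession and AgentPlayer.
interface Subscribable {
  on(event: string, callback: () => void): void;
  off(event: string, callback: () => void): void;
}

interface Interruptible {
  interrupt(): void;
}

// Stop the agent's audio as soon as the "user started speaking" event fires.
// Returns an unsubscribe function that removes exactly this listener.
function enableBargeIn(session: Subscribable, player: Interruptible, userSpeechEvent: string): () => void {
  const onUserSpeech = (): void => player.interrupt();
  session.on(userSpeechEvent, onUserSpeech);
  return () => session.off(userSpeechEvent, onUserSpeech);
}
```

Keeping a reference to onUserSpeech is what allows off() to remove the listener later. An inline arrow passed directly to on() could not be unsubscribed.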
Options
Methods
queue(data: ArrayBuffer): void — decode a PCM audio buffer and add it to the playback queue. Buffers are played in order with sample-accurate scheduling.
interrupt(): void — immediately stop playback and discard all queued audio. Use when the user starts speaking to enable barge-in.
getRemainingPlaybackTime(): number — returns the number of seconds of audio still queued for playback. Returns 0 when idle. Useful for delaying mode transitions until the agent finishes speaking.
mute(): void — stop playback and discard queued audio. Audio passed to queue() afterwards is silently dropped until unmute() is called.
unmute(): void — resume accepting and playing queued audio.
muted: boolean — read-only property indicating whether the player is currently muted.
setVolume(volume: number): void — set the playback volume as a value between 0 and 1. Default: 1.
volume: number — read-only property returning the current playback volume (0 to 1).
getOutputVolume(): number — returns the current RMS output volume level as a value between 0 and 1. Call it once per animation frame to drive real-time visualizations.
getOutputByteFrequencyData(): Uint8Array — returns frequency-domain data as a Uint8Array with values from 0 to 255. Use it for spectrum visualizations of the agent’s speech.
dispose(): void — close the underlying AudioContext and free resources. Call when the session is no longer needed.
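getRemainingPlaybackTime() is described as useful for delaying transitions until the agent finishes speaking. One way to use it is a small polling helper. PlaybackClock and waitForPlaybackToDrain are hypothetical names, and the sketch relies only on that one documented method.

```typescript
// Minimal view of the documented method: seconds of audio still queued.
interface PlaybackClock {
  getRemainingPlaybackTime(): number;
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolve once nothing is left to play. The remaining time is re-read after each wait,
// because more agent audio may have been queued in the meantime.
async function waitForPlaybackToDrain(player: PlaybackClock): Promise<void> {
  let remaining = player.getRemainingPlaybackTime();
  while (remaining > 0) {
    await sleep(remaining * 1000);
    remaining = player.getRemainingPlaybackTime();
  }
}
```

A typical use is awaiting this before dispose() when ending a session, so the agent's last sentence is not cut off. If the agent keeps speaking, the loop keeps waiting. Add a deadline if your flow needs one.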
Function Calls
The agent can request client-side function calls during a conversation. Listen for the function-call-request event, execute the requested function, and respond with the result using sendFunctionCallResponse.
Each item in the functions array contains:
The agent pauses until it receives a response for each requested function call.
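The following sketch handles a function call request on the client. The field names (id, name, and a JSON-encoded arguments string) are assumptions based on the Voice Agent API's FunctionCallRequest message. Check them against the SDK's payload. The registry, handleFunctionCalls, and the error format are hypothetical. Because the agent pauses until every call is answered, the handler responds even when a function is unknown or throws.

```typescript
// Assumed shape of each item in the request's functions array.
interface FunctionCallItem {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments (assumption)
}

interface FunctionCallResponder {
  sendFunctionCallResponse(id: string, name: string, content: string): void;
}

// Hypothetical registry of client-side implementations, keyed by function name.
type ClientFunctions = Record<string, (args: Record<string, unknown>) => unknown>;

// Execute every requested call and always respond, since the agent waits for each one.
async function handleFunctionCalls(
  session: FunctionCallResponder,
  functions: FunctionCallItem[],
  registry: ClientFunctions,
): Promise<void> {
  for (const call of functions) {
    let content: string;
    try {
      const impl = registry[call.name];
      if (!impl) throw new Error(`No client-side implementation for "${call.name}"`);
      const args = call.arguments ? (JSON.parse(call.arguments) as Record<string, unknown>) : {};
      // Await so async implementations work; `?? null` keeps the JSON valid for void results.
      content = JSON.stringify((await impl(args)) ?? null);
    } catch (err) {
      content = JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
    }
    session.sendFunctionCallResponse(call.id, call.name, content);
  }
}
```

You would call this from a function-call-request listener, passing the event's functions array.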
Token Caching
The SDK caches authentication tokens internally. When you provide a tokenFactory in the auth configuration, the AgentSession wraps it in a caching layer that avoids requesting a new token on every reconnect. Tokens are cached for 4 minutes by default, which is safe for Deepgram’s 5-minute short-lived keys. The cache is automatically invalidated before each reconnection attempt to ensure a fresh token is available.
No additional setup is required. The caching behavior is automatic and transparent.
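The mechanism can be pictured as a time-to-live cache around the token factory, plus an invalidate() hook like the one described as running before reconnection attempts. This is an illustration, not the SDK's implementation. cacheTokenFactory, getToken, invalidate, and the now parameter are hypothetical, and the factory is assumed to return a Promise<string>. The 4-minute default leaves a one-minute margin before a 5-minute key expires.

```typescript
// Illustrative TTL cache around a token factory; not the SDK's internal code.
type TokenFactory = () => Promise<string>;

interface CachedTokenSource {
  getToken(): Promise<string>;
  invalidate(): void;
}

function cacheTokenFactory(
  factory: TokenFactory,
  ttlMs = 4 * 60 * 1000, // 4 minutes, as described above
  now: () => number = Date.now,
): CachedTokenSource {
  let cached: { token: string; expiresAt: number } | null = null;
  return {
    async getToken() {
      // Reuse the token while it is inside its TTL window.
      if (cached && now() < cached.expiresAt) return cached.token;
      const token = await factory();
      cached = { token, expiresAt: now() + ttlMs };
      return token;
    },
    invalidate() {
      // Force the next getToken() call to ask the factory for a fresh token.
      cached = null;
    },
  };
}
```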