JavaScript SDK
Framework-agnostic client for connecting to Deepgram’s Voice Agent API from any browser JavaScript environment.
This page covers the core JavaScript SDK, which works with vanilla JS, Vue, Svelte, Angular, and any other framework. If you are using React, see React Hooks and Provider. For a drop-in embeddable solution, see Widget.
Connect to a Deepgram voice agent, capture microphone audio, and play back the agent’s speech:
Replace YOUR_AGENT_ID with a Reusable Agent Configuration UUID, or pass an inline agent config object instead. See Agent Configuration for both patterns.
The core class that manages the WebSocket connection to the Voice Agent API. It handles authentication, automatic reconnection with exponential backoff, keep-alive pings, and audio buffering before the server acknowledges settings.
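As a rough illustration of what "exponential backoff" means for reconnection, the sketch below doubles the wait after each failed attempt and caps it at a maximum. The helper name `backoffDelayMs` and the base and cap values are assumptions chosen for illustration; the SDK's actual delays and any jitter it applies are not documented on this page.

```javascript
// Hypothetical helper: illustrates exponential backoff in general,
// not the SDK's actual reconnection schedule.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  // attempt is 0-based: 0 -> baseMs, 1 -> 2 * baseMs, 2 -> 4 * baseMs, ...
  const delay = baseMs * 2 ** attempt;
  // Cap the delay so repeated failures do not wait indefinitely long.
  return Math.min(delay, maxMs);
}

// Delays for the first eight attempts with the illustrative defaults.
const delays = Array.from({ length: 8 }, (_, attempt) => backoffDelayMs(attempt));
console.log(delays);
```

Capping the delay keeps a long outage from pushing the next retry arbitrarily far into the future.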
Auth options:
Agent options:
The agent field accepts either a string agent ID from the Deepgram Console or an inline AgentSettingsObject with listen, think, and speak configuration.
Audio options:
Reconnection options:
The state property reflects the current connection lifecycle:
idle — session created but connect() has not been called.
connecting — WebSocket connection attempt in progress.
connected — WebSocket is open and the session is active.
reconnecting — connection lost; attempting to reconnect with backoff.
disconnected — connection closed and no further reconnection attempts will be made.
connect(): Promise<void> — establish the WebSocket connection to the agent. Resolves when the socket is open. The SDK automatically sends a Settings message after receiving the Welcome event.
disconnect(): void — close the connection and cancel any pending reconnection attempts.
sendAudio(data: ArrayBuffer): void — send a PCM audio frame to the agent. Frames sent before the server fires settings-applied are queued internally and flushed automatically once the agent is ready. This prevents dropped frames during connection setup.
injectUserMessage(content: string): void — inject a text message into the conversation as the user.
injectAgentMessage(message: string): void — inject a text message into the conversation as the agent.
updatePrompt(prompt: string): void — change the agent’s system prompt mid-session.
updateSpeak(settings): void — change TTS settings mid-session. Accepts a SpeakSettings object or an array.
updateThink(settings): void — change LLM settings mid-session. Accepts a ThinkSettings object or an array.
sendFunctionCallResponse(id, name, content): void — respond to a function call request from the agent. See Function Calls for a complete example.
getId(): string | null — returns the session ID assigned by the server. Available after the welcome event fires. Returns null before connection.
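The queue-then-flush behavior described for sendAudio can be pictured with a small stand-in class. This is a sketch of the general pattern only, not the SDK's implementation: `BufferedSender`, `transmit`, and `markReady` are hypothetical names, and `markReady` plays the role of the moment the server fires settings-applied.

```javascript
// Illustrative stand-in, not the SDK's AgentSession.
class BufferedSender {
  constructor(transmit) {
    this.transmit = transmit; // function that actually sends a frame
    this.ready = false;       // becomes true once settings are applied
    this.pending = [];        // frames received before the agent is ready
  }

  sendAudio(frame) {
    if (this.ready) {
      this.transmit(frame);
    } else {
      // Hold the frame instead of dropping it.
      this.pending.push(frame);
    }
  }

  // Call when the server acknowledges settings.
  markReady() {
    this.ready = true;
    // Flush in arrival order so audio is not reordered.
    for (const frame of this.pending) this.transmit(frame);
    this.pending = [];
  }
}
```

With this pattern, callers can start streaming microphone audio as soon as the session is created without tracking whether the agent is ready yet.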
Subscribe with session.on(event, callback) and unsubscribe with session.off(event, callback).
Connection lifecycle:
Agent protocol:
Audio:
Errors:
Captures PCM audio from the user’s microphone using the Web Audio API. Optionally integrates Silero VAD for voice activity detection so audio is only transmitted during speech.
The first argument is a callback invoked with each captured audio frame as an ArrayBuffer. Pass session.sendAudio to stream audio directly to the agent.
VAD options (when vad is an object):
VAD requires the optional peer dependencies @ricky0123/vad-web and onnxruntime-web. Install them separately: npm install @ricky0123/vad-web onnxruntime-web.
start(): Promise<void> — request microphone permission via getUserMedia and begin capturing audio frames.
stop(): void — stop capturing audio, disconnect the audio worklet, and release the microphone device.
mute(): void — pause audio transmission without releasing the microphone. The device remains active but frames are not forwarded.
unmute(): void — resume audio transmission after a mute() call.
muted: boolean — read-only property indicating whether the microphone is currently muted.
getInputVolume(): number — returns the current RMS input volume level as a value between 0 and 1. Call per animation frame for real-time visualizations.
getInputByteFrequencyData(): Uint8Array — returns frequency domain data as a Uint8Array with values 0 to 255. Use for spectrum or waveform visualizations.
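For readers unfamiliar with RMS levels, the sketch below shows one common way to derive a 0 to 1 volume value from a PCM frame. It assumes 16-bit signed samples in the platform's byte order, which this page does not confirm, and `rmsLevel` is a hypothetical helper rather than part of the SDK.

```javascript
// Hypothetical helper: computes an RMS level in [0, 1] from a PCM frame.
// Assumes 16-bit signed samples; the frame's byteLength must be a multiple of 2.
function rmsLevel(frame) {
  const samples = new Int16Array(frame);
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) {
    const normalized = s / 32768; // map the sample to [-1, 1)
    sumSquares += normalized * normalized;
  }
  // Root of the mean of the squared samples.
  return Math.sqrt(sumSquares / samples.length);
}
```

A silent frame yields 0, and louder frames approach 1, which makes the value convenient to map directly onto a meter or animation.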
Subscribe with mic.on(event, callback) and unsubscribe with mic.off(event, callback).
Decodes and plays PCM audio received from the agent. Maintains a playback queue with interrupt support for barge-in, and exposes volume and frequency analysis APIs for building custom visualizations.
queue(data: ArrayBuffer): void — decode a PCM audio buffer and add it to the playback queue. Buffers are played in order with sample-accurate scheduling.
interrupt(): void — immediately stop playback and discard all queued audio. Use when the user starts speaking to enable barge-in.
getRemainingPlaybackTime(): number — returns the number of seconds of audio still queued for playback. Returns 0 when idle. Useful for delaying mode transitions until the agent finishes speaking.
mute(): void — stop playback and discard queued audio. Subsequent calls to queue() are silently dropped until unmute() is called.
unmute(): void — resume accepting and playing queued audio.
muted: boolean — read-only property indicating whether the player is currently muted.
setVolume(volume: number): void — set the playback volume as a value between 0 and 1. Default: 1.
volume: number — read-only property returning the current playback volume (0 to 1).
getOutputVolume(): number — returns the current RMS output volume level as a value between 0 and 1. Call per animation frame for real-time visualizations.
getOutputByteFrequencyData(): Uint8Array — returns frequency domain data as a Uint8Array with values 0 to 255. Use for spectrum visualizations of the agent’s speech.
dispose(): void — close the underlying AudioContext and free resources. Call when the session is no longer needed.
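The relationship between queue(), interrupt(), and getRemainingPlaybackTime() can be modeled with simple timing arithmetic. The sketch below tracks only start times and plays no audio; `PlaybackClock` and its methods are hypothetical, and the player's real internals may differ. In a browser, the injected `now` function would typically read `AudioContext.currentTime`, and each returned start time would be passed to `AudioBufferSourceNode.start()`.

```javascript
// Illustrative scheduling math only; no audio is played.
class PlaybackClock {
  constructor(now) {
    this.now = now;     // function returning the current time in seconds
    this.nextStart = 0; // time at which the next buffer should begin
  }

  // Returns the start time for a buffer lasting `durationSec` seconds.
  schedule(durationSec) {
    // Start immediately if idle, otherwise right after the previous buffer.
    const startAt = Math.max(this.now(), this.nextStart);
    this.nextStart = startAt + durationSec;
    return startAt;
  }

  // Seconds of scheduled audio that have not yet played.
  remaining() {
    return Math.max(0, this.nextStart - this.now());
  }

  // Barge-in: drop everything that has not played yet.
  interrupt() {
    this.nextStart = this.now();
  }
}
```

Scheduling each buffer at the exact end of the previous one is what avoids gaps or overlaps between consecutive chunks of agent speech.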
The agent can request client-side function calls during a conversation. Listen for the function-call-request event, execute the requested function, and respond with the result using sendFunctionCallResponse.
Each item in the functions array contains:
The agent pauses until it receives a response for each requested function call.
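Because the agent waits for a response to every requested call, a handler should answer each item even when the function is unknown or throws. The following is a minimal sketch under stated assumptions: each item in the functions array is assumed to expose `id`, `name`, and a JSON string `arguments`, and the response content is assumed to be a string. `respondToFunctionCalls` and the `handlers` map are hypothetical helpers, not part of the SDK.

```javascript
// Hypothetical dispatcher. `handlers` is a Map from function name to implementation.
// Assumes each requested item has `id`, `name`, and a JSON string `arguments`.
async function respondToFunctionCalls(session, request, handlers) {
  for (const call of request.functions) {
    let content;
    try {
      const handler = handlers.get(call.name);
      if (!handler) throw new Error(`Unknown function: ${call.name}`);
      const args = call.arguments ? JSON.parse(call.arguments) : {};
      content = JSON.stringify(await handler(args));
    } catch (err) {
      // Always answer, because the agent waits for every requested call.
      content = JSON.stringify({ error: String(err.message || err) });
    }
    session.sendFunctionCallResponse(call.id, call.name, content);
  }
}
```

Assuming the event callback receives the request object, it could be wired up as `session.on("function-call-request", (request) => respondToFunctionCalls(session, request, handlers))`.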
The SDK caches authentication tokens internally. When you provide a tokenFactory in the auth configuration, the AgentSession wraps it in a caching layer that avoids requesting a new token on every reconnect. Tokens are cached for 4 minutes by default, which is safe for Deepgram’s 5-minute short-lived keys. The cache is automatically invalidated before each reconnection attempt to ensure a fresh token is available.
No additional setup is required. The caching behavior is automatic and transparent.
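To make the caching behavior concrete, here is a minimal sketch of a time-limited cache around a token factory. It is not the SDK's internal code: `cachedTokenFactory`, its `invalidate` method, and the injectable `now` clock are hypothetical, and only the 4-minute default comes from the description above.

```javascript
// Illustrative wrapper, not the SDK's internal implementation.
function cachedTokenFactory(tokenFactory, ttlMs = 4 * 60 * 1000, now = Date.now) {
  let token = null;
  let fetchedAt = 0;

  async function getToken() {
    const expired = token === null || now() - fetchedAt >= ttlMs;
    if (expired) {
      // Only call the underlying factory when there is no usable token.
      token = await tokenFactory();
      fetchedAt = now();
    }
    return token;
  }

  // Forces the next getToken() call to fetch a fresh token.
  getToken.invalidate = () => {
    token = null;
  };

  return getToken;
}
```

Keeping the cache lifetime shorter than the key's lifetime leaves a margin so that a cached token is not handed out moments before it expires.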