Browser Agent SDK
Voice AI for the browser — from a single script tag to a fully custom React application.
The demo above requires a Deepgram account. Sign up free to try it, or keep reading to understand the architecture first.
Choose Your Approach
Start from how much control you need, not which package to install.
“I want a voice agent on my site in five minutes.” Use the Widget. One install, one function call. Pick from six layouts — sidebar, floating, inline, button, embedded, or orb. No framework required.
“I’m building a React app and want pre-built components.” Use React UI Components. Conversation view, animated orb, mic/speaker buttons, text input, and a waveform visualizer — all styled through CSS custom properties that work alongside your existing design system.
“I’m building a React app but want full control over the UI.” Use React Hooks. Provider context and focused hooks for state, conversation history, microphone control, audio playback, and client-side function calling. You build every pixel; the hooks manage every connection.
“I’m using Vue, Svelte, Angular, or vanilla JS.” Use the JavaScript SDK. The core AgentSession class is framework-agnostic. Pair it with AgentMicrophone for capture and AgentPlayer for playback. Wire the events into whatever UI layer you prefer.
Architecture
Four packages, each building on the one below it. Install only the highest layer you need; every layer below it comes along as a dependency.
Every layer shares the same connection logic, audio pipeline, and event model. The difference is how much UI you want handled for you.
Each layer pulls in the layer below as a dependency and re-exports the parts of it you need. Installing @deepgram/ui brings in @deepgram/react and @deepgram/agents automatically and re-exports the hooks, provider, and SDK types, so you import everything you need from @deepgram/ui alone. The same pattern applies one layer down: @deepgram/react brings in @deepgram/agents and re-exports its types.
What You Get at Every Layer
- Reconnection with exponential backoff and jitter. Configurable max attempts, base delay, and ceiling. Connections recover without user intervention.
- Playback-aware mode tracking. The SDK knows when audio has actually finished playing in the browser — not just when the server finished sending it. Mode transitions from “speaking” to “listening” wait for the audio queue to drain.
- Audio buffering before settings are applied. Microphone frames captured before the server acknowledges your configuration are queued and flushed automatically.
- Voice Activity Detection. Optional Silero VAD runs client-side for precise speech endpoint detection.
- KeepAlive pings. Automatic heartbeat prevents idle WebSocket disconnects. Interval is configurable.
- Typed event emitter. Every server message (Welcome, ConversationText, AgentAudioDone, FunctionCallRequest, and more) has a typed event. Subscribe to exactly what you need.
- Custom WebSocket URL support. Connect to proxied or self-hosted endpoints by overriding the default Deepgram URL.
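The reconnection bullet above can be made concrete with a small sketch. It shows one common way to compute retry delays (exponential growth, a ceiling, and "full jitter"). The option names baseDelayMs, maxDelayMs, and maxAttempts are hypothetical, and the SDK's actual jitter strategy may differ.

```typescript
// Hypothetical option names; the SDK's real configuration keys may differ.
interface BackoffOptions {
  baseDelayMs: number; // upper bound for the first retry delay
  maxDelayMs: number;  // ceiling for any single delay
  maxAttempts: number; // give up after this many retries
}

// Returns the delay before retry number `attempt` (0-based), or null once
// the attempt budget is exhausted. `random` is injectable for testing.
function reconnectDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number | null {
  if (attempt >= opts.maxAttempts) return null;
  // Exponential growth, clamped to the ceiling.
  const upperBound = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, upperBound) so clients spread out.
  return Math.floor(random() * upperBound);
}
```

With a 500 ms base and an 8000 ms ceiling, the upper bounds for attempts 0 through 4 are 500, 1000, 2000, 4000, and 8000 ms. The jitter keeps many clients from retrying in lockstep after a shared outage.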
What the React Layer Adds
- Client-side function calling scoped to React component lifecycle. Register tool handlers with useAgentClientTool; they mount and unmount with the component. Dynamic tools are checked first, then the provider falls back to onFunctionCall.
- Conversation state management. The useAgentConversation hook accumulates transcript events into a structured message array with roles, content, and IDs.
- Mode awareness. The useAgentMode hook tracks whether the agent is idle, listening, or speaking, with the playback-aware delay described above.
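The lookup order in the function-calling bullet (dynamic tools first, then the provider-level fallback) can be sketched without React. ToolRegistry and its methods are hypothetical names for illustration only; they are not the internals of useAgentClientTool.

```typescript
type ToolHandler = (args: unknown) => unknown;
type FallbackHandler = (name: string, args: unknown) => unknown;

// Hypothetical registry illustrating "dynamic tools first, then fallback".
class ToolRegistry {
  private readonly tools = new Map<string, ToolHandler>();
  private readonly fallback?: FallbackHandler;

  constructor(fallback?: FallbackHandler) {
    this.fallback = fallback;
  }

  // Returns an unregister function, which maps naturally onto the cleanup
  // a component runs when it unmounts.
  register(name: string, handler: ToolHandler): () => void {
    this.tools.set(name, handler);
    return () => {
      // Only remove the handler if it is still the one we registered.
      if (this.tools.get(name) === handler) this.tools.delete(name);
    };
  }

  dispatch(name: string, args: unknown): unknown {
    const handler = this.tools.get(name);
    if (handler) return handler(args);                   // dynamic tool wins
    if (this.fallback) return this.fallback(name, args); // provider-level fallback
    throw new Error(`No handler registered for tool "${name}"`);
  }
}
```

In a React component, the function returned by register would typically be the return value of an effect, so the tool disappears when the component does.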
What the UI Layer Adds
- 26 CSS custom properties with light-dark() adaptive defaults. Works with Tailwind, CSS Modules, plain CSS: anything that can set a custom property.
- Canvas 2D visualizations. The orb and waveform components use Canvas 2D, not WebGL. They render well on low-power devices and do not require GPU acceleration.
- data-dg-* attribute selectors instead of class names. Your existing CSS framework cannot collide with component styles.
What the Widget Adds
- Six layouts: sidebar, floating, inline, button, embedded, orb. Each adapts to its container and responds to the host page’s color scheme.
- Preact runtime bundled internally. The UMD build is self-contained: no React dependency, no build tooling, no framework assumptions. An ESM build is also available for projects with a bundler.
- Single init() call returns a teardown function. Mount and unmount cleanly in single-page applications without leaking event listeners or audio contexts.
- Full theming support. Pass a theme object to init() or use CSS custom properties from the host page. The widget inherits light-dark() behavior automatically.
Authentication
Browser applications must never expose API keys to the client. Use a token factory — a function that returns a short-lived token from your server.
The browser WebSocket constructor does not support custom headers. The SDK works around this by passing the token as a Sec-WebSocket-Protocol value, the one handshake header a page script can set (through the constructor's protocols argument). This is handled internally; you just return a token string from your factory function.
The token factory is called before every connection and reconnection attempt. Tokens stay fresh even across network interruptions.
The apiKey option exists for local development only. Never ship it in client-side code.
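The token flow described in this section can be sketched as follows. This is an illustration, not the SDK's internal code: openSocket, authProtocols, and the ["token", value] subprotocol pair are assumptions, and the socket constructor is passed in so the sketch also runs outside a browser.

```typescript
// Minimal constructor shape shared by the browser's WebSocket and test doubles.
type SocketCtor<S> = new (url: string, protocols?: string | string[]) => S;

// A token factory returns a short-lived token minted by your own server.
type TokenFactory = () => Promise<string>;

// ASSUMPTION: this subprotocol layout is illustrative. The SDK builds the
// real Sec-WebSocket-Protocol value internally.
function authProtocols(token: string): string[] {
  return ["token", token];
}

// Calls the factory on every attempt, so a reconnect never reuses a stale token.
async function openSocket<S>(
  Socket: SocketCtor<S>,
  url: string,
  getToken: TokenFactory,
): Promise<S> {
  const token = await getToken();
  // The second constructor argument becomes the Sec-WebSocket-Protocol header.
  return new Socket(url, authProtocols(token));
}
```

In a browser you would pass the global WebSocket as the first argument and a factory that fetches from your own backend, for example a hypothetical /api/token route that returns the token as text.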
Agent Configuration
Configure the agent by referencing one you created in the Deepgram Console, or define the full configuration inline.
Agent IDs let you change behavior from the console without redeploying your application. Inline configuration gives you version control and the ability to construct prompts dynamically at runtime.
You can combine both approaches — reference an agent ID for base configuration and override specific settings inline. See the individual package guides for details.
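As a rough mental model of combining the two approaches, the sketch below layers inline overrides on top of a base configuration looked up by ID. Every name here (AgentSettings, consoleAgents, resolveAgent, promptFor, and the field names) is invented for illustration. The real option shapes are in the package guides, and the real merge presumably happens inside the SDK or on the server.

```typescript
// HYPOTHETICAL shapes, invented for this sketch; not the SDK's real options.
interface AgentSettings {
  prompt: string;
  voice: string;
  greeting: string;
}

// Stand-in for configuration stored in the Deepgram Console under an agent ID.
const consoleAgents: Record<string, AgentSettings> = {
  "agent-123": { prompt: "You are a support agent.", voice: "voice-a", greeting: "Hi!" },
};

// Resolve a base configuration by ID, then layer inline overrides on top.
function resolveAgent(
  agentId: string,
  overrides: Partial<AgentSettings> = {},
): AgentSettings {
  const base = consoleAgents[agentId];
  if (!base) throw new Error(`Unknown agent ID: ${agentId}`);
  return { ...base, ...overrides }; // later keys win
}

// Inline settings can be computed at runtime, for example from the signed-in user.
function promptFor(userName: string): string {
  return `You are a support agent. Address the customer as ${userName}.`;
}

const settings = resolveAgent("agent-123", { prompt: promptFor("Ada") });
```

Here settings keeps the voice and greeting stored under the ID and replaces only the prompt, which is the trade-off the paragraphs above describe: console-managed defaults plus runtime-constructed specifics.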