Prompting Voice Agents
A good system prompt turns a Voice Agent from a generic LLM into an assistant with a clear identity, a defined job, and predictable behavior on a live call. This guide walks through how to write one and where each piece lives in your Voice Agent configuration.
Who this guide is for
This guide is for anyone writing or editing a Voice Agent prompt: conversation designers, support leads, product managers, and engineers. It assumes you have a working agent and you’re shaping how it behaves on a call.
If you don’t have one running yet, the fastest path is the Deepgram playground — paste any prompt from this guide and talk to the agent in your browser. Or stand up your own agent with the Voice Agent quickstart.
Why voice prompting is different from text prompting
Every word the model produces will be read aloud by a TTS engine, in real time, to a human who will interrupt, mishear, multitask, and hang up. That changes the rules in four ways.
No formatting. TTS reads characters literally — **important** becomes “star star important star star.” No markdown, bullet lists, headers, or code blocks.
Turns are short. One or two sentences per turn. Long responses increase time-to-first-word and get interrupted.
Data has to be written for the ear. Order IDs, phone numbers, dates, and emails all need to be spoken in a form humans can follow.
The prompt sits inside a pipeline. STT can mishear, TTS can mispronounce, the LLM can hallucinate. A strong prompt accounts for all three.
Where your prompt lives
Your prompt is the agent.think.prompt field on the Settings message. See Configure Voice Agent for the full schema and Settings for per-field reference.
Two things worth knowing up front: the prompt is capped at 25,000 characters with a managed LLM (no cap if you bring your own endpoint). Tighter prompts give you lower latency, so treat the cap as a ceiling, not a target.
You can also update the prompt mid-conversation using the Update Prompt message, so you don’t have to cram every phase of a long call into a single static prompt.
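To make the plumbing concrete, here is a minimal sketch of both messages as Python dicts. The type names and field paths follow the Settings and Update Prompt references described above; the prompt text and phase wording are illustrative, so verify the exact shapes against the current API docs.

```python
import json

# Illustrative prompt text -- the agent name and company are hypothetical.
PROMPT = "You are Ava, the scheduling assistant for a healthcare clinic."

# The prompt lives at agent.think.prompt on the Settings message.
settings = {
    "type": "Settings",
    "agent": {
        "think": {
            "prompt": PROMPT,
        },
    },
}

# Mid-call, swap the prompt instead of cramming every phase into one.
update = {
    "type": "UpdatePrompt",
    "prompt": "The caller is verified. You are now in the resolution phase.",
}

# Both are sent as JSON text frames over the agent websocket.
payload = json.dumps(settings)
```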
What the prompt can and can’t enforce
No amount of prompt engineering reliably prevents every unwanted LLM behavior. You can write “produce ONLY an empty string when inactive” and the model may still say “I’m in standby mode.” You can write “never say ‘Let me check’” and it will anyway.
Think of the prompt as policy and your server code as enforcement. For anything reliability-critical — suppressing output when the agent should be silent, gating audio during function calls, enforcing turn-taking — you need server-side controls (like output gates that drop ConversationText and audio bytes), not just prompt instructions.
Rule of thumb: if you’re writing increasingly emphatic rules (“ABSOLUTELY NEVER,” “UNDER NO CIRCUMSTANCES”) and the model still violates them, move the enforcement to server code.
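What "enforcement in server code" can look like, as a sketch: an output gate your server holds between the agent and the caller. This is your own application code, not a Deepgram API; the ConversationText event name comes from the Voice Agent message stream, and raw audio arrives as binary frames.

```python
class OutputGate:
    """Drop agent output while the agent is supposed to be silent."""

    def __init__(self):
        self.open = True

    def filter(self, message):
        """Return the message if it may reach the caller, else None."""
        if self.open:
            return message
        # While closed, drop raw audio bytes and agent text events,
        # but let other events (e.g. errors) through to your handlers.
        if isinstance(message, bytes):
            return None
        if message.get("type") == "ConversationText":
            return None
        return message
```

Your server closes the gate when the agent should be silent (say, while a function call is in flight) and reopens it when the agent may speak. No amount of "ABSOLUTELY NEVER" in the prompt is as reliable as this.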
The ten sections of a strong voice agent prompt
A good prompt is built from ten sections. Not every agent needs all ten — an agent without function calling can skip section 8 — but the ones you skip tend to show up as bugs in production.
1. Role and identity
Who the agent is, who it represents, and — critically — what a successful call looks like. Most prompts skip the third. Adding a goal statement gives the model something to measure itself against. Name the agent, name the company, state the job, state the win condition.
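For example, all four elements in three sentences (names hypothetical):

```text
You are Riley, the voice assistant for Harbor Point Credit Union. Your job
is to help members check balances and report lost cards. A call is
successful when the member's issue is resolved or they are connected to a
banker who can resolve it.
```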
2. Personality and tone
The agent’s voice and style. Define it once — don’t repeat tone instructions in other sections; it dilutes them and eats characters. Two or three adjectives plus a “this, not that” contrast is enough.
3. Environment and channel
What the agent can assume about the call: phone or web widget, expected audio quality, whether the caller is in a quiet room or a moving car. These assumptions shape behavior — a phone agent should confirm spellings more aggressively than a web-widget agent because mobile audio drops consonants. Giving the model this context up front saves you from writing dozens of individual rules for failure modes.
4. About your callers
Who typically calls, what they want, and what emotional state they’re in. This grounds the agent and prevents generic-assistant drift. A pharmacy refill line (callers skew older, sometimes frustrated by hold times) and a dev tools support line (callers are technical, impatient) should behave differently even if their scope is similar.
5. Scope — what the agent can and can’t help with
Two halves: what the agent does, and what it doesn’t do plus what to do instead. Out-of-scope without a fallback is a dead end. Don’t just say “don’t answer billing questions” — say “if asked about billing, say: I can’t help with billing from here, but I can transfer you to the billing team.” The fallback is what makes the rule work on a live call.
6. Conversational approach
How the agent moves through a call. For simple agents, a short paragraph on pacing and confirmation is enough. For complex agents, describe an explicit flow — greeting → intake → verification → resolution → wrap-up — with the goal of each state and conditions for advancing.
For branching flows, consider using the Update Prompt message to swap prompts mid-call instead of describing all branches at once. It’s cheaper on latency and easier to debug.
7. Speaking style rules
The most voice-specific section and the one that pays back the most on production quality. Cover: target sentence length, the hard ban on markdown in output, how to read numbers/dates/IDs aloud, how to handle interruptions, silence, and background noise, and when to use brief acknowledgments.
Some additional best practices:
- For normal responses, don’t use stalling phrases like “one moment” or “let me check.” The agent should write as if the information is already in front of it — latency is the stall. Function calls are the exception — see section 8.
- Don’t try to control pausing with
[pause]in the prompt — TTS reads brackets literally. See TTS Models for what TTS can and can’t do on its own.
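One reliable tactic is to convert written-form data to spoken form on the server before it ever reaches the model or TTS, rather than trusting the prompt alone. A sketch (helper names are our own, not part of any SDK):

```python
def spell_out(value):
    """Read an ID aloud one character at a time: 'AB12' -> 'A B 1 2'."""
    return " ".join(value.upper())

def speak_phone(number):
    """Group a 10-digit US number the way people say it: 3-3-4,
    with commas so TTS pauses between groups."""
    digits = [c for c in number if c.isdigit()]
    groups = ["".join(digits[:3]), "".join(digits[3:6]), "".join(digits[6:])]
    return ", ".join(" ".join(g) for g in groups)
```

Pre-formatted values like these can be passed into the prompt or into function results, so the model never sees the raw written form.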
8. Function calling
If your agent uses functions, the prompt needs to tell the model when to call them and what to do with the results. The function schemas themselves live in agent.think.functions (see Function Calling), not in the prompt.
Function descriptions matter enormously. A description like “respond naturally to the query field” can cause the model to skip the function and generate a text answer. Change it to “call this to retrieve all relevant records for the query, then respond with the combined results” and the behavior changes immediately. Use action-oriented verbs: “call this to…,” “retrieve…,” “look up…” — not “returns…” or “provides…”
Write explicit triggers in the prompt. Don’t rely on the function description alone: “When the caller gives you a confirmation code, call lookup_appointment.” This is especially important with multiple functions.
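Here is how the prompt trigger and the schema description pair up, using the lookup_appointment example above. The parameter shape is our own illustration; only the pairing pattern matters.

```python
# Explicit trigger: lives in agent.think.prompt.
PROMPT_TRIGGER = (
    "When the caller gives you a confirmation code, "
    "call lookup_appointment with that code."
)

# Action-oriented schema: lives in agent.think.functions.
lookup_appointment = {
    "name": "lookup_appointment",
    # "Call this to..." not "Returns..." -- tells the model to act.
    "description": (
        "Call this to retrieve the appointment matching a confirmation "
        "code, then confirm the date and time with the caller."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "confirmation_code": {"type": "string"},
        },
        "required": ["confirmation_code"],
    },
}
```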
Pre-function narration is a real problem. The model may generate text like “Let me check the weather systems…” before the function executes, and TTS reads it aloud. You can prompt against this (“do NOT narrate what you’re about to do — just call the function”), but server-side suppression of pre-function text output is the reliable fix.
Fill the silence during function execution. Function calls create real latency and the caller hears nothing. Use InjectAgentMessage to have your server inject filler (“One moment while I pull that up”) while the function runs. This is server-injected, not LLM-generated, so it’s reliable. Gotcha: make sure your server-injected filler phrases don’t overlap with your “never say this” list in the prompt — having “Let me check” in both makes debugging very confusing.
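A sketch of the filler pattern, with the overlap gotcha enforced in code. The InjectAgentMessage type name comes from this guide; the "content" field name is an assumption here, so check the message reference before relying on it.

```python
# Phrases the prompt forbids the LLM from generating on its own.
BANNED_STALLS = {"let me check"}

FILLER = "One moment while I pull that up."

def make_filler(text):
    """Build a server-injected filler message, refusing any phrase
    that overlaps the prompt's ban list (keeps recordings debuggable)."""
    lowered = text.lower()
    assert not any(b in lowered for b in BANNED_STALLS), \
        "filler overlaps the prompt's banned-phrase list"
    return {"type": "InjectAgentMessage", "content": text}
```

Your server sends this message when it sees the function call start, so the caller hears something deterministic while the function runs.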
Multi-function chaining is unreliable on smaller models. A prompt that says “call ALL FOUR functions” may result in a smaller model calling one and stopping. This isn’t a prompt problem — it’s a model capability limitation. Test with the specific model you plan to deploy.
Model strings on Deepgram may differ from provider docs. For example, claude-4-5-haiku-latest may return INVALID_SETTINGS - model not available while claude-haiku-4-5 works. Always test your exact model string before building around it. See LLM Models for available models.
9. Boundaries and escalation
Two parts doing different jobs. Guardrails are hard don’ts: topics to refuse, data never to invent, promises not to make. Escalation is exit ramps: handoff to a human, voicemail, abusive callers, repeated misunderstandings, and emergencies. Each exit ramp should name the trigger and the action. Keeping them separate makes each easier to test and update.
10. Terminology and pronunciation
Two sub-parts. The pronunciation guide tells TTS how to say tricky words (provider names, drugs, acronyms) with phonetic hints. See TTS Models for available providers and voices. The glossary tells the LLM how to understand domain terms callers use — synonyms, jargon, abbreviations. If callers say “the new patient form” when they mean “intake paperwork,” the model should know they’re the same thing.
Two complete examples
Example 1: Appointment scheduling (no function calling)
A full prompt for a healthcare scheduling agent, based on the public demo agent with an explicit goal, Environment block, escalation path, and pronunciation section added.
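As a stand-in for the full prompt, here is a condensed sketch of its shape. All names and wording are illustrative, not the actual demo prompt:

```text
CRITICAL SPEAKING STYLE RULES:
- Plain spoken prose only. Never use markdown, asterisks, bullets, or headers.
- Keep every turn to one or two sentences.
- Read dates and times in words: "March third at two thirty PM."

ROLE: You are Ava, the scheduling assistant for a healthcare clinic. A call
is successful when an appointment is booked, changed, or cancelled, and
confirmed back to the caller.

[environment, caller, scope, flow, and glossary sections continue in the
same plain-prose style]

ESCALATION: If the caller describes a medical emergency, tell them to hang
up and dial 911 immediately.
```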
The “CRITICAL” block at the top is the speaking style section (section 7). It’s first because TTS-breaking output is the biggest quality risk. The remaining sections map directly to the ten-section framework. This agent doesn’t use function calling, so section 8 is skipped. Note what’s not in the prompt: the greeting (agent.greeting), the LLM choice (agent.think.provider), and the voice (agent.speak.provider).
Example 2: Weather alert agent (with function calling)
A weather information agent that uses function calling. The prompt references when to call functions and what to do with results; the function schemas live in agent.think.functions.
The function definitions live in agent.think.functions:
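A hypothetical sketch of what those definitions could look like; the function names, descriptions, and fields here are our own illustration, not the actual demo configuration. Note the action-oriented descriptions.

```python
# Illustrative entries for agent.think.functions.
weather_functions = [
    {
        "name": "get_weather_alerts",
        "description": (
            "Call this to get all active weather alerts for a location, "
            "then summarize the most severe one for the caller."
        ),
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
    {
        "name": "get_current_conditions",
        "description": (
            "Call this to get the current temperature and conditions for "
            "a location, then report them in one spoken sentence."
        ),
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
]
```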
Notice the action-oriented descriptions: “Call this to get…” rather than “Returns weather data for a location.” This significantly improves function-calling reliability — passive descriptions can cause the model to generate a text response instead of calling the function.
Iterating on your prompt
- Listen to recordings, not the prompt on a page. Most production problems come from things that read fine but sound wrong — numbers read as digits, tone drift, stall words the model produced despite instructions.
- Change one section at a time. If you edit personality and scope in the same pass, you won’t know which caused the regression.
- Keep a changelog at the top of the prompt as a comment block.
- Budget characters from the start. It’s easier to grow into the 25,000-character cap than to cut back under it.
- Use Update Prompt for behavior that shifts mid-call. One prompt for intake, another for resolution, swapped when intent is confirmed.
What doesn’t belong in the prompt
The most common prompt bug is trying to control things that should be controlled from outside the prompt.
- Greeting → agent.greeting (spoken by TTS directly, not LLM-generated). See Settings.
- Conversation history → agent.context. Don't reconstruct prior turns in the system prompt.
- Function schemas → agent.think.functions. The prompt says when to call; the schema defines how. See Function Calling.
- LLM selection → agent.think.provider. See LLM Models.
- TTS voice and rate → agent.speak.provider. See TTS Models.
- Language and multilingual → config, not prompt. See Multilingual Voice Agent.
Common pitfalls
- Repeating tone guidance in every section. Define it once; trust the model to carry it.
- Writing like it's a ChatGPT prompt. Markdown and bullet lists read fine on a page and terribly through TTS. If you're writing ** or - in output templates, stop.
- Defining in-scope without out-of-scope. Missing fallbacks are the biggest source of "the agent said something weird" bug reports.
- Forgetting an escalation path. Every agent needs at least one exit ramp to a human.
- Using written-form data without a pronunciation rule. Order IDs, dates, and phone numbers all need guidance on how they should sound aloud.
- Over-specifying a state machine until it consumes half the character budget. If your flow is that complex, split it across prompts with Update Prompt.
- Treating the prompt as the only control surface. If the model keeps violating emphatic rules, move enforcement to server code.
Next steps
Voice Agent quickstart · Configure Voice Agent · Update Prompt · Function Calling