Prompt engineering for voice AI is fundamentally different from prompt engineering for text-based AI. Voice conversations happen in real time, require natural speech patterns, and must handle interruptions gracefully. This guide covers what you need to write effective AI receptionist prompts, based on official VAPI and ElevenLabs documentation.
Prompt Engineering Fundamentals
The goal of prompt engineering is to create instructions that guide the AI to produce accurate, relevant responses while maintaining natural conversation flow. Your "success rate" is the percentage of requests your agent handles from start to finish without human intervention.
Voice vs Text: Key Difference
The Design-Test-Refine Process
According to VAPI's official documentation, effective prompt engineering follows a structured iterative process:
- Design: Craft your initial prompt considering the specific task, context, and desired outcome.
- Test: Run the prompt through the AI and evaluate if responses align with expectations.
- Refine: Adjust based on test results—reword, add detail, or change phrasing to avoid ambiguity.
- Repeat: Iterate until the AI's output is accurate and relevant. Success rate should improve each cycle.
Measuring Success
Track these metrics to measure prompt effectiveness:
- First-call resolution: Calls resolved without transfer or callback
- Misroute rate: How often callers end up in the wrong place
- User churn: How often users disengage from the conversation
- Conversation repair attempts: How often the AI needs to clarify or repeat
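As a rough illustration, the four metrics above can be computed from per-call records. This is a minimal sketch: the `CallRecord` fields and the `summarize` helper are hypothetical names, and it assumes your call logs already label each call with these outcomes.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    # Hypothetical per-call labels; adapt to whatever your call logs provide.
    resolved_without_transfer: bool  # counts toward first-call resolution
    misrouted: bool                  # caller ended up in the wrong place
    abandoned: bool                  # caller disengaged mid-conversation
    repair_attempts: int             # times the AI had to clarify or repeat


def summarize(calls: list[CallRecord]) -> dict[str, float]:
    """Return each metric as a rate (or average) over all calls."""
    total = len(calls)
    if total == 0:
        return {}
    return {
        "first_call_resolution": sum(c.resolved_without_transfer for c in calls) / total,
        "misroute_rate": sum(c.misrouted for c in calls) / total,
        "churn_rate": sum(c.abandoned for c in calls) / total,
        "repairs_per_call": sum(c.repair_attempts for c in calls) / total,
    }


calls = [
    CallRecord(True, False, False, 0),
    CallRecord(False, True, False, 2),
    CallRecord(True, False, False, 1),
    CallRecord(False, False, True, 3),
]
metrics = summarize(calls)
```

Comparing these numbers before and after each prompt revision gives the Design-Test-Refine loop something concrete to measure.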
Prompt Structure & Sections
Well-structured prompts break down into clear sections, each focused on a specific aspect of the AI's behavior. This organization helps the AI understand its role and respond consistently.
Essential Prompt Sections
[Identity] — Define the AI's persona, name, and role. Example: "You are Sarah, a friendly and professional receptionist for Riverside Dental."
[Style] — Set tone, communication style, and personality traits. Be warm and professional, keep responses concise, use a calm reassuring tone.
[Response Guidelines] — Specify formatting, pacing, and structural rules. Ask one question at a time, spell out numbers, confirm important details, keep responses under 2-3 sentences.
[Task] — Outline specific objectives and conversation steps. Define the greeting, intent detection, information collection, and closing sequence.
[Error Handling] — Define fallback behavior for edge cases. What to do when the AI can't understand, when questions are outside its knowledge, when to transfer.
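If you keep each section as a separate string, a small helper can assemble them in a fixed order and catch a missing section before deployment. This is only a sketch: `SECTION_ORDER` and `build_prompt` are hypothetical names, not part of VAPI.

```python
# The five section names described above, in the order they appear in the prompt.
SECTION_ORDER = ["Identity", "Style", "Response Guidelines", "Task", "Error Handling"]


def build_prompt(sections: dict[str, str]) -> str:
    """Join the sections under bracketed headers; fail if any section is empty."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"Missing prompt sections: {', '.join(missing)}")
    return "\n\n".join(
        f"[{name}]\n{sections[name].strip()}" for name in SECTION_ORDER
    )


prompt = build_prompt({
    "Identity": "You are Sarah, a friendly and professional receptionist for Riverside Dental.",
    "Style": "Be warm and professional. Keep responses concise.",
    "Response Guidelines": "Ask one question at a time. Spell out numbers.",
    "Task": "Greet the caller, detect intent, collect details, then close.",
    "Error Handling": "If unsure, ask a clarifying question or offer a callback.",
})
```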
Complete Prompt Example
[Identity]
You are Alex, a friendly and efficient receptionist for Summit Plumbing & HVAC.
You help callers schedule service appointments, answer basic questions about
services, and ensure urgent issues are prioritized appropriately.
[Style]
- Warm, professional, and efficient
- Conversational but not overly casual
- Empathetic when callers describe problems
- Concise—this is a phone conversation, not an email
[Response Guidelines]
- Ask one question at a time, then wait for response
- Spell out times and dates clearly (say "Tuesday, January fourteenth at two PM")
- Confirm critical details by repeating them back
- Never invent information—if unsure, offer to have someone call back
- Keep responses to 1-3 sentences maximum
[Task]
1. Greet caller warmly: "Thanks for calling Summit Plumbing and HVAC, this is Alex. How can I help you today?"
2. Determine if this is: scheduling, emergency, question, or existing appointment
3. For emergencies (no heat, flooding, gas smell): immediately collect address and transfer
4. For scheduling: collect name, phone, address, service type, and preferred timing
5. Confirm all details before ending: "Just to confirm, I have you down for..."
[Error Handling]
If the caller's response is unclear, ask a clarifying question: "I want to make sure I
get this right—did you say Tuesday or Thursday?"
If asked something outside your knowledge, say: "That's a great question. Let me have
one of our technicians call you back with the details. What's the best number to reach you?"
Voice Settings (ElevenLabs)
Voice settings control how the AI sounds—not just what it says. These parameters significantly impact caller perception and trust. The following recommendations are based on ElevenLabs' official documentation.
Core Voice Parameters
Stability (0.0 - 1.0) — Controls consistency and emotional variation. Lower values (0.3-0.5) create more dynamic, expressive delivery but may occasionally sound unstable. Higher values (0.6-0.85) produce consistent but potentially monotonous output. Recommended for receptionists: 0.50-0.65
Similarity Boost (0.0 - 1.0) — Determines how closely the AI adheres to the original voice characteristics. Higher values boost clarity and consistency. Very high values may introduce distortion. Recommended: 0.75-0.80
Speed (0.7 - 1.2) — Adjusts speech rate. Natural conversations typically occur at 0.9-1.1x speed. Slow down for complex information like addresses. Recommended: 0.95-1.05
Style Exaggeration (0.0 - 1.0) — Amplifies the style of the original speaker. Consumes additional resources and may increase latency. Keep at 0 for professional receptionist roles.
Recommended Settings by Use Case
| Use Case | Stability | Similarity | Speed | Style |
|---|---|---|---|---|
| Professional Receptionist | 0.55 | 0.75 | 1.0 | 0.0 |
| Medical/Dental Office | 0.50 | 0.75 | 0.95 | 0.0 |
| Legal Services | 0.60 | 0.80 | 0.95 | 0.0 |
| Home Services (HVAC, Plumbing) | 0.55 | 0.75 | 1.0 | 0.05 |
| Restaurant/Hospitality | 0.45 | 0.70 | 1.05 | 0.10 |
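The table can be kept in code as named presets with range checks, so an out-of-range value fails fast. This is a minimal sketch: the `VoiceSettings` class, its field names, and the preset keys are illustrative assumptions and may not match the provider's API field names. The ranges are the ones quoted in this guide.

```python
from dataclasses import dataclass

# Allowed ranges as described in this guide (not read from any API).
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "speed": (0.7, 1.2),
    "style": (0.0, 1.0),
}


@dataclass(frozen=True)
class VoiceSettings:
    stability: float
    similarity_boost: float
    speed: float
    style: float

    def __post_init__(self) -> None:
        # Reject any value outside the documented range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")


# Values copied from the use-case table above.
PRESETS = {
    "professional_receptionist": VoiceSettings(0.55, 0.75, 1.0, 0.0),
    "medical_dental": VoiceSettings(0.50, 0.75, 0.95, 0.0),
    "legal": VoiceSettings(0.60, 0.80, 0.95, 0.0),
    "home_services": VoiceSettings(0.55, 0.75, 1.0, 0.05),
    "restaurant": VoiceSettings(0.45, 0.70, 1.05, 0.10),
}
```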
ElevenLabs Model Selection
Use eleven_flash_v2_5 for ultra-low latency (about 75ms). If quality matters more than speed, use eleven_multilingual_v2 for more nuanced expression.
Conversation Flow Design
Conversation flow determines how the AI guides callers through interactions. Good flow design anticipates common paths, handles unexpected inputs, and creates natural turn-taking patterns.
The Acknowledge-Confirm-Prompt Pattern
Every AI response should follow this three-step rhythm, which mirrors natural human conversation:
- Acknowledge: Show you heard them — "Got it..."
- Confirm: Reflect understanding — "...you're looking for a Tuesday appointment..."
- Prompt: Move to next step — "...is morning or afternoon better for you?"
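To make the rhythm concrete, here is a small sketch that composes one response from the three parts. The function name and its parameters are hypothetical. In practice the model produces this phrasing from the prompt, so treat the helper as a way to write test fixtures or example lines.

```python
def acknowledge_confirm_prompt(
    confirmation: str,
    next_question: str,
    acknowledgement: str = "Got it",
) -> str:
    """Compose one spoken response: acknowledge, confirm, then prompt."""
    return f"{acknowledgement}, {confirmation}. {next_question}"


line = acknowledge_confirm_prompt(
    "you're looking for a Tuesday appointment",
    "Is morning or afternoon better for you?",
)
```

The result is a single short turn that ends on one question, which also satisfies the "ask one question at a time" guideline.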
Response Timing Control
VAPI allows you to control when the agent should wait for user response before proceeding. Use the <wait for user response> directive in your conversation flow:
[Conversation Flow]
1. Greet: "Thanks for calling! How can I help you today?"
<wait for user response>
2. If scheduling appointment:
Ask: "Great, I can help with that. What's your name?"
<wait for user response>
3. Ask: "And what phone number should we use to reach you?"
<wait for user response>
4. Ask: "What day works best for you?"
<wait for user response>
5. Confirm: "Perfect. I have [name] scheduled for [day] at [time].
We'll send a confirmation text to [phone]. Is there anything else?"
<wait for user response>
6. Close: "Wonderful, you're all set. Have a great day!"
Handling Branching Logic
Real conversations branch based on caller intent. Structure your flow to handle multiple paths without creating dead ends:
[Task]
After greeting, determine caller intent:
IF caller wants to schedule:
→ Proceed to [Scheduling Flow]
IF caller has emergency (no heat, flooding, gas smell):
→ Say: "I understand this is urgent. Let me get your address
and transfer you to our emergency line immediately."
→ Collect address, then trigger transfer tool
IF caller has question about services/pricing:
→ Answer from knowledge base if available
→ If unknown: "That's a great question. Let me have someone
call you back with those details. What's your number?"
IF caller wants to cancel/reschedule:
→ Ask for name and original appointment date
→ Proceed to [Reschedule Flow]
Avoid Infinite Loops
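One way to keep a branching flow from looping is to cap the number of clarification attempts and escalate to a human once the cap is reached. The sketch below reuses the three progressive fallbacks from the No-Match Handling section of this guide. The function name and the idea of tracking a counter in application code are assumptions, since the same rule can also be written directly into the prompt.

```python
# Fallback lines in escalation order; the last one hands off to a human.
FALLBACKS = [
    "I didn't quite catch that. Could you tell me again what you're calling about?",
    "I'm having a little trouble understanding. Are you calling to schedule an "
    "appointment, ask a question, or something else?",
    "I want to make sure I help you correctly. Let me transfer you to someone "
    "who can assist. One moment please.",
]


def next_fallback(no_match_count: int) -> tuple[str, bool]:
    """Return (line to say, whether to transfer) for the nth consecutive no-match.

    no_match_count is 1-based. Once the last fallback is reached, every further
    no-match also transfers, so the flow cannot loop indefinitely.
    """
    if no_match_count < 1:
        raise ValueError("no_match_count must be at least 1")
    index = min(no_match_count, len(FALLBACKS)) - 1
    should_transfer = no_match_count >= len(FALLBACKS)
    return FALLBACKS[index], should_transfer
```

Resetting the counter whenever the caller's intent is recognized keeps the cap tied to consecutive failures rather than to the whole call.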
Error Handling & Fallbacks
Error handling is what separates amateur setups from professional implementations. Every prompt needs explicit instructions for handling confusion, unclear input, and situations outside the AI's knowledge.
Conversation Repair Techniques
When something goes wrong, use this four-step repair sequence:
- Acknowledge: "I apologize, I may have misunderstood."
- Restate: "It sounded like you said Tuesday at 3 PM."
- Clarify: "Was that correct, or did you mean something different?"
- Confirm: "Perfect, I've updated that to Wednesday at 3 PM."
No-Match Handling
When the AI can't understand or match the caller's intent, use progressive fallbacks:
[Error Handling]
First no-match:
"I didn't quite catch that. Could you tell me again what you're calling about?"
Second no-match:
"I'm having a little trouble understanding. Are you calling to
schedule an appointment, ask a question, or something else?"
Third no-match:
"I want to make sure I help you correctly. Let me transfer you
to someone who can assist. One moment please."
→ Trigger transfer to human
[Guardrails]
- Never invent information you weren't given
- If asked about pricing you don't know, offer to have someone call back
- Never confirm appointments without all required information
- If caller seems distressed, prioritize empathy over efficiency
Silent Transfers
Per VAPI documentation: If the AI determines that the user needs to be transferred, it should not send any text response back to the user. Instead, silently call the transfer tool. This creates a seamless experience.
Prompt Example for Silent Transfers
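As a hedged illustration of the same idea enforced on the application side rather than in the prompt, the sketch below drops any spoken text whenever the model's turn includes a transfer tool call. The `ModelTurn` shape and the tool name `transfer_call` are hypothetical and are not VAPI's actual API.

```python
from dataclasses import dataclass, field

TRANSFER_TOOL = "transfer_call"  # hypothetical tool name, for illustration only


@dataclass
class ModelTurn:
    # Hypothetical shape of one model turn: optional text plus tool names called.
    text: str = ""
    tool_calls: list[str] = field(default_factory=list)


def text_to_speak(turn: ModelTurn) -> str:
    """Return what should be spoken; nothing at all if a transfer was triggered."""
    if TRANSFER_TOOL in turn.tool_calls:
        return ""
    return turn.text
```

A guard like this complements the prompt instruction. The prompt tells the model to stay silent, and the guard covers the case where it speaks anyway.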
Industry-Specific Templates
Different industries have unique requirements, terminology, and caller expectations. These templates provide starting points you can customize for specific businesses.
Medical & Dental
Key considerations: HIPAA awareness—never confirm patient details without verification. Calm, reassuring tone for anxious patients. Clear emergency protocols. Insurance and new patient questions.
Sample greeting: "Thank you for calling Riverside Dental. I'm your virtual assistant. Are you calling to schedule an appointment, confirm an existing booking, or ask about our services?"
Emergency handling: "If this is a dental emergency, please say 'emergency' now and I'll connect you immediately. Otherwise, how can I help you today?"
Legal Services
Key considerations: Professional, authoritative tone. Confidentiality awareness. Clear intake process for new clients. Attorney availability and callback expectations.
Sample greeting: "Thank you for calling Smith & Associates Law Firm. How may I direct your call?"
Boundary setting: "We do not provide legal advice over the phone. I can schedule a consultation with an attorney to discuss your situation."
Home Services (HVAC, Plumbing, Electrical)
Key considerations: Emergency prioritization (no heat, flooding, gas). Service area confirmation. Scheduling flexibility for service windows. Basic troubleshooting when appropriate.
Sample greeting: "Thanks for calling Summit Plumbing. This is Alex. Are you calling about a service issue or to schedule an appointment?"
Emergency detection: "If you smell gas or have active flooding, please stay on the line for immediate assistance."
Real Estate
Key considerations: Property inquiry handling. Showing scheduling. Agent availability and callback. Lead qualification questions.
Sample greeting: "Thank you for calling Horizon Realty. I can help you with property information or connect you with an agent. Are you looking to buy, sell, or rent?"
Restaurant
Key considerations: Reservation handling with party size and time. Hours and location information. Menu and dietary restriction questions. Takeout/delivery options.
Sample greeting: "Thanks for calling Bella Trattoria. Would you like to make a reservation or do you have a question about our menu?"
Special handling: "For parties of 8 or more, I'll need to check availability with our manager. Let me get your contact information."
Testing & Optimization
Prompt engineering is an iterative process. Deploy, measure, and refine continuously based on real call data.
Testing Checklist
Greeting tests:
- Picks up within 2 rings
- States business name clearly
- Asks how to help
Intent recognition tests:
- Correctly identifies scheduling requests
- Recognizes emergencies
- Handles FAQs appropriately
Information collection tests:
- Captures name, phone, address accurately
- Confirms spelling of names
- Validates dates and times
Error recovery tests:
- Handles mumbled input gracefully
- Recovers from misunderstandings
- Knows when to transfer
Latency Optimization
Target sub-500ms end-to-end latency for natural-feeling conversations. Key optimizations:
- Use low-latency models: ElevenLabs Flash v2.5 (75ms), GPT-4o-mini (80-150ms)
- Keep maxTokens at 150-200 for voice applications
- Disable unnecessary features like format turns
- Optimize turn detection settings—default settings can add 1.5+ seconds
- Use LLM temperature of 0.3-0.5 for consistent, predictable responses
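A simple latency budget makes the sub-500ms target easier to reason about. In this sketch the text-to-speech and LLM figures are the ones quoted above. The speech-to-text, turn-detection, and network figures are placeholder assumptions that you should replace with your own measurements.

```python
TARGET_MS = 500  # end-to-end target from this guide


def total_latency(budget: dict[str, int]) -> int:
    """Sum the per-stage latencies in milliseconds."""
    return sum(budget.values())


tuned = {
    "speech_to_text": 100,   # assumption, measure your transcriber
    "llm_first_token": 150,  # upper end of the 80-150ms quoted for GPT-4o-mini
    "text_to_speech": 75,    # quoted for ElevenLabs Flash v2.5
    "turn_detection": 100,   # assumption, after tuning endpointing
    "network": 50,           # assumption
}

# Same pipeline, but with untuned turn detection adding about 1.5 seconds.
default_turn_detection = {**tuned, "turn_detection": 1500}

tuned_total = total_latency(tuned)
default_total = total_latency(default_turn_detection)
within_target = tuned_total <= TARGET_MS
```

With these placeholder numbers the tuned pipeline lands just under the target, while leaving turn detection at a 1.5 second delay outweighs every other stage combined.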
Optimization Tips
- Review call transcripts weekly—look for patterns in confusion or repeated questions
- Track where callers request human help—these are optimization opportunities
- Test with real phone calls, not just the playground—audio quality matters
- Start with your highest-volume use cases before handling edge cases
- Keep prompts as specific as possible to limit randomness
Common Mistakes to Avoid
Responses that are too long. Callers get impatient listening to long responses. Keep each response to 1-3 sentences. Say "Got it, let me check that for you" instead of "Thank you so much for providing that information. I really appreciate your patience..."
Not spelling out numbers. Text-to-speech reads "3:30" inconsistently. Some voices say "three colon thirty." Write "three thirty PM" in your prompts, not "3:30 PM".
Asking multiple questions at once. Callers can only answer one question at a time. Ask "What day works for you?" then wait. Don't ask "What day and time works for you?"
No fallback for unknown questions. Without explicit instructions, AI may invent information or go silent. Add: "If you don't know the answer, say: Let me have someone call you back with that information."
Generic voice settings. Out-of-the-box defaults (for example, stability 0.5 and similarity 0.5) aren't tuned for professional voice agents. Use stability 0.50-0.65 and similarity 0.75-0.80, in line with the recommendations in the Voice Settings section.
Not testing on actual phone calls. Playground testing doesn't reveal audio quality issues, latency, or real-world caller behavior. Call your own number as if you're a customer.
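The "spell out numbers" advice can also be applied automatically to any text that reaches the speech engine. The sketch below rewrites 12-hour clock times such as "3:30 PM" into words. The function names are hypothetical, and it deliberately handles only the `H:MM AM/PM` pattern, so dates, phone numbers, and prices would need their own handling.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# Matches 12-hour times like "3:30 PM" or "10:05 am".
TIME_PATTERN = re.compile(r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*(AM|PM)\b", re.IGNORECASE)


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _spell_time(match: re.Match) -> str:
    hour, minute = int(match.group(1)), int(match.group(2))
    period = match.group(3).upper()
    if minute == 0:
        spoken = number_to_words(hour)
    elif minute < 10:
        spoken = f"{number_to_words(hour)} oh {number_to_words(minute)}"
    else:
        spoken = f"{number_to_words(hour)} {number_to_words(minute)}"
    return f"{spoken} {period}"


def spell_out_times(text: str) -> str:
    """Replace every H:MM AM/PM time in the text with its spoken form."""
    return TIME_PATTERN.sub(_spell_time, text)


spoken = spell_out_times("Your appointment is at 3:30 PM.")
```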
Frequently Asked Questions
What are the best ElevenLabs voice settings for AI receptionists?
For professional AI receptionists, use Stability: 0.50-0.65, Similarity Boost: 0.75-0.80, Speed: 0.95-1.05, and Style Exaggeration: 0. These settings balance natural conversation flow with consistent, professional delivery. Use the eleven_flash_v2_5 model for low-latency (75ms) real-time applications.
How should I structure a VAPI voice AI prompt?
Structure VAPI prompts into five sections: [Identity] - Define the AI's persona and role; [Style] - Set tone and communication guidelines; [Response Guidelines] - Specify formatting rules like spelling out numbers and keeping responses to 1-3 sentences; [Task] - Outline conversation flow with step-by-step instructions; [Error Handling] - Define fallback behavior for unclear inputs or edge cases.
What is the ideal response latency for voice AI receptionists?
Target sub-500ms end-to-end latency for natural-feeling conversations. Achieve this by using low-latency models (ElevenLabs Flash v2.5, GPT-4o-mini), keeping maxTokens at 150-200, disabling unnecessary features, and optimizing turn detection settings. VAPI typically achieves about 800ms with defaults, which can be optimized to roughly 465ms.
How do I handle errors and fallbacks in AI receptionist prompts?
Include explicit fallback instructions. For unclear input: ask clarifying questions. After 2-3 failed attempts: offer to transfer to a human. For unknown questions: offer to have someone call back. Always include guardrails like "Never invent information you weren't given."
What temperature setting should I use for voice AI receptionists?
Use lower temperature (0.3-0.5) for consistent, predictable responses. This reduces randomness and ensures the AI follows scripts reliably. Higher temperatures add creativity but may cause unpredictable responses. For appointment booking and information collection, lower is better.
How do I make my AI receptionist sound more natural?
Keep responses to 1-3 sentences. Spell out numbers ("three thirty PM", not "3:30 PM"). Add natural speech elements, such as hesitations, to your prompts. Use the Acknowledge-Confirm-Prompt pattern. Set voice stability around 0.50-0.55 to allow some emotional variation.
Can AI receptionists handle appointment booking?
Yes. Configure your prompt with a clear conversation flow: greet, identify intent, collect name/phone/date/time, confirm details, and close. Integrate with calendar APIs (Cal.com, Google Calendar, Calendly) via VAPI function calls. Include confirmation loops to verify dates and times before booking.
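Before any calendar call is made, it helps to check that every required detail has been collected, which mirrors the guardrail "never confirm appointments without all required information." This is a minimal sketch with hypothetical field and function names. It does not call any calendar API.

```python
# Details the booking flow collects before confirming.
REQUIRED_FIELDS = ("name", "phone", "date", "time")


def missing_fields(collected: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank, in asking order."""
    return [f for f in REQUIRED_FIELDS if not collected.get(f, "").strip()]


def ready_to_book(collected: dict[str, str]) -> bool:
    """True only when every required field has a non-empty value."""
    return not missing_fields(collected)


partial = {"name": "Dana Lee", "phone": "555-0100", "date": "Tuesday"}
still_needed = missing_fields(partial)
```

Asking for the first item in `still_needed`, one at a time, keeps the flow consistent with the one-question-per-turn guideline.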
What's the difference between VAPI and Retell AI for voice agents?
VAPI is an orchestration layer that lets you combine any transcriber (Deepgram, AssemblyAI), LLM (GPT-4, Claude, Llama), and voice provider (ElevenLabs, PlayHT). It focuses on low-latency optimization and provides advanced features like smart endpointing and backchanneling. Both platforms support similar use cases, but VAPI offers more provider flexibility.
Quick Reference: Optimal Settings
| Parameter | Recommended Value |
|---|---|
| ElevenLabs Stability | 0.50 - 0.65 |
| ElevenLabs Similarity | 0.75 - 0.80 |
| Target Latency | < 500ms |
| Max Response Tokens | 150 - 200 |
| LLM Temperature | 0.3 - 0.5 |
| Speech Speed | 0.95 - 1.05x |
| Response Length | 1-3 sentences |
| Style Exaggeration | 0.0 |
This guide is based on official documentation from VAPI and ElevenLabs.