What Is the Best AI Voice Agent? The Honest Decision Framework for 2026

The best AI voice agent depends on your buyer type. A decision framework with real cost math at 200-5,000 calls, demo red flags, and compliance facts.

Alex Morgan, Co-founder, MapleVoice · Jun 12, 2026 · 33 min read

The best AI voice agent is the one that matches your buyer type: small businesses that want the phone handled without managing software do best with a fully managed, done-for-you service; hands-on operators do best with a no-code builder like Synthflow or Goodcall; engineering teams do best with a developer API like Vapi or Retell; and large contact centers do best with an enterprise platform like PolyAI, Sierra, or Replicant. There is no single best agent. There is a best agent for your call volume, your technical resources, and your tolerance for maintenance work.

Every page ranking for this question is a ranked listicle written by a vendor or an affiliate, and all of them bury that answer under ten product reviews. This guide is the part that should come before any list: a 30-second decision tree, real total-cost math at 200, 1,000, and 5,000 calls a month, an industry-by-industry fit map, the compliance facts every listicle skips, a demo red-flags checklist, and the questions that make vendors sweat. If you want a ranked list, we maintain one at /alternatives/best-ai-voice-agent — read this first, then use it.

One promise before we start: every external number in this article is attributed to the page that published it, every example conversation is labeled as illustrative, and we say plainly when a competitor — or a human — is the better choice. The market has enough hype.

The 30-second answer, by buyer type

Four buyer types, four different right answers. This section is the whole article compressed: a verdict table mapping each buyer type to its best-fit category, then a five-question decision tree — answer the questions in order and stop at the first yes. Everything after this section is the evidence and the math.

Two cost realities frame the verdict. At the top end, vellum.ai reports that enterprise platforms like PolyAI and Sierra typically start around $150,000 a year. At the bottom, lindy.ai's hands-on testing found that a developer stack like Vapi advertises $0.05 a minute but lands at roughly $0.15 to $0.30 a minute once you add the language model, voice synthesis, and telephony you must bring yourself. Everything in between is where most businesses actually live — and where the market is weakest.

And the stakes are not abstract. getvoip.com opens its review with two numbers — flagged with source markers the page never actually names, so treat the precision with care — that 85 percent of customers who reach voicemail will not call back, and that a single lost call in home services can represent up to $1,200 in lost revenue. Whatever the exact figures, the direction is right: the real cost of this decision is not the subscription. It is every call that goes unanswered while you deliberate.

  1. Do you have engineers who will own this in production? Yes: shortlist developer APIs (Vapi for maximum stack control, Retell for fast deployment with strong post-call analytics, Bland for high-volume outbound). No: keep going.
  2. Are you a contact center with 50-plus seats and a six-figure budget? Yes: shortlist PolyAI, Sierra, Replicant, or Cognigy, and plan for an implementation measured in weeks to months. No: keep going.
  3. Do you genuinely want to build, tune, and monitor the agent yourself, including the unglamorous hours after launch? Yes: shortlist no-code platforms like Synthflow or Goodcall. No: keep going.
  4. Do you miss or fumble more than roughly ten business calls a week? Yes: shortlist done-for-you managed services and judge them on real call recordings, not demos. No: an AI agent may not be worth buying yet; voicemail-to-text or simple call forwarding may be the honest answer.
  5. Two overrides that beat everything above: if callers will share health information, only consider vendors that will sign a HIPAA Business Associate Agreement; if you will dial outbound, only consider vendors with TCPA consent controls built in.

| Buyer type | Best-fit category | Representative options | Cost shape | Biggest risk |
| --- | --- | --- | --- | --- |
| Small business that wants the phone handled | Done-for-you managed service | MapleVoice, plus a small but growing managed category | Flat monthly fee | Lock-in: confirm your call data is exportable |
| Hands-on operator who wants to build it | No-code platform | Synthflow, Goodcall, Voiceflow | Subscription plus usage | Your own hours: tuning and monitoring never really end |
| Engineering team with developers to spare | Developer API | Vapi, Retell, Bland | Per-minute stack: platform plus LLM, TTS, telephony | All-in cost creep, and you own reliability end to end |
| Enterprise contact center | Managed enterprise platform | PolyAI, Sierra, Replicant, Cognigy | Custom contracts, typically six figures a year per vellum.ai | Months of implementation and deep vendor dependence |
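
The decision tree above can be sketched as a short function. This is a minimal illustration, not a scoring tool: the `Buyer` fields and the `shortlist` function are hypothetical names, and the cutoff of ten missed calls a week is the rough figure from question 4, not a hard rule.

```python
from dataclasses import dataclass


@dataclass
class Buyer:
    has_engineers: bool          # Q1: engineers who will own this in production
    large_contact_center: bool   # Q2: 50-plus seats and a six-figure budget
    wants_to_build: bool         # Q3: wants to build, tune, and monitor it
    missed_calls_per_week: int   # Q4: missed or fumbled business calls
    handles_health_info: bool = False   # Q5 override: HIPAA
    dials_outbound: bool = False        # Q5 override: TCPA


def shortlist(buyer: Buyer) -> dict:
    """Answer the questions in order and stop at the first yes."""
    if buyer.has_engineers:
        category = "developer API"
    elif buyer.large_contact_center:
        category = "enterprise platform"
    elif buyer.wants_to_build:
        category = "no-code platform"
    elif buyer.missed_calls_per_week > 10:  # rough threshold from question 4
        category = "done-for-you managed service"
    else:
        category = "no agent yet (voicemail-to-text or call forwarding)"

    # The overrides filter vendors inside whatever category was chosen.
    requirements = []
    if buyer.handles_health_info:
        requirements.append("vendor signs a HIPAA BAA")
    if buyer.dials_outbound:
        requirements.append("built-in TCPA consent controls")
    return {"category": category, "requirements": requirements}


# A dental office with no engineers that misses about 25 calls a week.
print(shortlist(Buyer(False, False, False, 25, handles_health_info=True)))
```

The overrides are modeled as requirements rather than categories because they narrow the vendor list inside any category instead of choosing one.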

What 'best' actually means: seven criteria that decide it

Of the pages ranking for this query, vellum.ai has the most structured evaluation matrix and lindy.ai has the most hands-on testing. Strip away the overlap and seven criteria decide which agent is best for you. Note what is missing from most vendor pages: lindy.ai calls latency the single most important spec to check and points out that most marketing pages conveniently do not mention it.

  • Latency — the pause between the caller finishing and the agent speaking. Past about half a second the conversation starts to drag, per lindy.ai; vellum.ai draws the hard line at one second.
  • Off-script handling — what happens when a caller says, wait, can you repeat that? In lindy.ai's testing, a Synthflow agent failed exactly that moment and fell back to a canned response.
  • Integrations that act, not just log — booking the actual slot, writing the CRM record, firing the confirmation text. A transcript in a dashboard is not an outcome.
  • Observability — recordings, transcripts, summaries, and outcomes on every call. You cannot improve what you cannot see; lindy.ai credits Retell's post-call visibility as the difference between having an agent and improving one.
  • Pricing transparency — the all-in cost per call at your volume, not the headline per-minute rate. The gap between those two numbers is where budgets die.
  • Compliance fit — a signed BAA for healthcare, TCPA consent workflow for outbound, recording-consent handling for your states. Logos on a pricing page are not compliance.
  • Maintenance ownership — someone must review transcripts and update the agent when your prices, hours, and services change. You, the vendor, or nobody. Nobody is the silent failure mode.

How AI voice agents work — and why latency decides everything

Every AI voice agent runs the same loop. The caller speaks, and the audio travels over the phone network. Speech-to-text converts it into words — about 100 milliseconds on fast providers, according to lindy.ai. A large language model reads the words, decides what to do, and writes a reply — roughly 200 milliseconds, usually the slowest step. Text-to-speech turns the reply into a voice — about 100 to 150 milliseconds. Then it plays back, and if the caller interrupts mid-sentence, barge-in detection stops the playback and restarts the loop with the new input.
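
To see why latency is a budget rather than a single number, add up the stages. The sketch below uses the per-stage figures lindy.ai reports and the two thresholds quoted earlier (about half a second per lindy.ai, one second per vellum.ai). The `extra_ms` parameter is an assumption standing in for network transit and endpointing delay, which the published figures do not break out, and the verdict labels are ours.

```python
# Per-stage estimates in milliseconds as (low, high), from the figures above.
STAGES_MS = {
    "speech_to_text": (100, 100),
    "language_model": (200, 200),
    "text_to_speech": (100, 150),
}


def total_latency_ms(stages, extra_ms=0):
    """Sum the low and high ends of every stage, plus any assumed overhead."""
    low = sum(lo for lo, _ in stages.values()) + extra_ms
    high = sum(hi for _, hi in stages.values()) + extra_ms
    return low, high


def verdict(latency_ms):
    """Label a latency against the half-second and one-second thresholds."""
    if latency_ms <= 500:
        return "feels natural"
    if latency_ms <= 1000:
        return "starts to drag"
    return "past the hard line"


print(total_latency_ms(STAGES_MS))                # (400, 450)
print(total_latency_ms(STAGES_MS, extra_ms=150))  # (550, 600)
print(verdict(600))                               # starts to drag
```

The three stages alone land just under the half-second mark, which leaves very little room for anything else in the path.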

Two terms worth knowing precisely, because vendors use them loosely. Endpointing is how the system decides you have finished talking — too aggressive and it talks over you, too cautious and it leaves awkward silence. Barge-in is the agent's ability to stop talking the instant you interrupt, the way a polite human would. Both matter more to how human a call feels than the prettiness of the voice.

Here is what that loop looks like on a real-shaped call. Example (illustrative, not a real recording):

  • Caller: Hi, do you have anything tomorrow afternoon?
  • Agent: We do. I can offer 1:30 or 3:15. (The agent queried the live calendar mid-sentence: that is an integration acting, not logging.)
  • Caller: Actually, wait, does that include the cleaning, or...
  • Agent: (stops speaking the instant the caller starts: that is barge-in) It does. The 3:15 is a full cleaning and exam.
  • Caller: Then 3:15 works.
  • Agent: Booked. I am texting your confirmation now. (The appointment writes back to the calendar and CRM, and the text fires, with no human touching anything.)

Every beat in that exchange (the live lookup, the interruption, the write-back) is testable on any vendor's demo line.

For reference points, getvoip.com is the only ranking page that publishes per-vendor latency estimates: roughly 400 to 500 milliseconds for Synthflow, 600 to 800 for Retell, about 800 for Bland, and 800 to 1,200 for PolyAI. It states no measurement methodology, so treat them as rough markers rather than benchmarks — but they at least put honest scale on a spec most vendors hide.

You do not need a lab to test latency yourself. Run this five-minute test on any vendor's demo line: call and ask a normal question, then silently count the gap before the reply — if you can comfortably say one-one-thousand, that is already a full second. Interrupt the agent mid-sentence and see whether it stops and adapts. Ask it to repeat what it just said. Ask one question that is not on the website. Then call again at a busy hour and compare. Five calls will tell you more than any comparison table — including ours.

The four buyer types in depth

Done-for-you managed service. You describe your business; the vendor builds the agent, connects your booking system and CRM, tunes it on real calls, and maintains it as your business changes — for a flat monthly fee. This category barely exists on review sites because the big platforms are DIY products, but it is the right answer for most small businesses, because the scarce resource is not budget — it is the owner's time. Judge any managed vendor on one thing: real, unedited call recordings.

No-code DIY. Synthflow gets you to a working agent fast — lindy.ai had a demo running in about two hours — but the same testing found off-script moments brittle and noted that full CRM write-back workflows are gated behind its enterprise tier. Goodcall, which lindy.ai notes spun out of Google's Area 120, set up in roughly ten minutes and handles structured calls well, but is billed per unique caller and struggles with nuanced conversations. Voiceflow is built for teams designing flows together, though lindy.ai found large flows turn into a debugging maze. The pattern: fast to start, but the ceiling arrives quickly and the tuning work is yours forever.

Developer API. Vapi gives engineers control of every layer — swap the speech recognition, the model, the voice — at the cost of owning error handling, retries, and reliability yourself; lindy.ai's all-in estimate is $0.15 to $0.30 a minute, not the $0.05 headline. Retell gets to production faster and pairs it with the post-call visibility lindy.ai says most voice AI tools lack — sentiment scores, failed-handoff flags, automatic issue triage — though the same review calls its customer support non-existent. Bland runs proprietary models on its own infrastructure and, per vellum.ai, claims scaling to as many as one million concurrent calls — built for outbound volume — though getvoip.com notes it is English-only by default outside enterprise contracts. And lindy.ai's build-vs-buy verdict is worth repeating to this buyer: do not build a voice agent from scratch unless voice AI is core to your product, because platforms like these are already the hybrid middle ground between buying and building.

Enterprise managed platforms. PolyAI is the conversation-quality benchmark: vellum.ai reports production call containment above 80 percent, reaching 87 percent early in some deployments, with contracts typically starting around $150,000 a year. getvoip.com adds the texture: speech recognition trained on millions of real customer calls, a 99.9 percent uptime SLA, several months from first contact to go-live, and — its own estimate — thousands of calls a month needed to justify the investment. Sierra sells brand-tone control with outcome-based pricing; getvoip.com notes you are charged when the agent completes a business objective, and that Sierra shares a co-founder with Salesforce. Replicant focuses on resolving tier-one calls end to end. Cognigy plugs into the Genesys, Avaya, and NICE estates enterprises already own, per lindy.ai. In the mid-market between no-code and these platforms, vellum.ai lists Leaping AI: go-lives in two to four weeks, from $2,500 a month per digital call-center employee, automating up to half of call-center volume by its own claim.

Three adjacent categories are worth knowing so you do not buy the wrong shape of tool — getvoip.com covers all three; the other rankings ignore them. Agent-assist platforms like Cresta coach your human agents in real time rather than replacing them, at per-seat enterprise pricing getvoip.com says may start in the tens of thousands. Outbound sales orchestrators like Regal decide which lead to call and when, weaving calls, texts, and email into one cadence. And helpdesk-native agents like Decagon live inside Zendesk or Intercom and resolve tickets end to end — getvoip.com pegs its contracts at five figures a year even for small support teams, and says it fits best where 60 percent or more of tickets are password resets, order status, and account updates. If your problem matches one of those shapes, a general-purpose voice agent is the wrong purchase.

Which agent type fits your industry

Review sites name-drop verticals; none of them maps industry to a decision. Yet your industry usually constrains the choice more than any feature list, because it dictates the call types you must handle and the compliance you cannot skip. Use this as a shortlist filter — it is guidance, not law, and your actual mix of call types matters more than your label.

  • HVAC, plumbing, and home services — after-hours emergencies, estimate requests, booking. The expensive failure is the missed 2 a.m. call; getvoip.com puts a lost home-services call at up to $1,200. Best fit: done-for-you or no-code, with strict emergency-triage rules and on-call routing.
  • Dental — appointment booking, recall, and insurance questions. Patient information means HIPAA: a signed Business Associate Agreement before launch, full stop. Best fit: a managed service or platform that signs BAAs and has dental call flows already built.
  • Healthcare and med spas — scheduling plus treatment and prescription questions. Nearly every call touches protected health information, so a BAA plus recording-consent discipline is the entry ticket. Best fit: managed with healthcare experience, or an enterprise platform in regulated deployments.
  • Legal — client intake and urgent matters. Privilege-sensitive details demand conservative scripts and all-party recording-consent caution; the agent should capture intake and transfer fast, never advise. Best fit: done-for-you with immediate warm-transfer paths.
  • Restaurants — reservations and phone orders at peak hour, exactly when staff cannot answer. POS integration is the whole game, and PCI DSS applies if cards are taken by phone. Best fit: done-for-you or no-code with true POS write-back.
  • Real estate and mortgage — speed-to-lead callbacks and outbound follow-up. Outbound means TCPA consent capture before the first dial. Best fit: any category, but only vendors with built-in consent workflows.
  • Auto repair and dealerships — service booking, status updates, recall outreach. The agent needs write-back into your scheduler, not just a transcript in a dashboard. Best fit: no-code or managed with a real scheduler integration.
  • Property management — maintenance triage around the clock, with genuine emergencies routed to on-call staff and everything else ticketed for morning. Best fit: done-for-you with tiered triage rules; this vertical is the warm-transfer test case.
  • E-commerce and retail support — order status, returns, account updates. getvoip.com's benchmark for Decagon applies here: helpdesk-native AI fits when 60 percent or more of tickets are tier-one. Best fit: helpdesk-native or no-code tied to your order system.
  • Banking, insurance, and hospitality contact centers — high volume, multilingual, heavily regulated. This is the territory getvoip.com assigns to PolyAI and its peers. Best fit: enterprise managed platforms, six-figure budgets assumed.

The middle path the whole market misses

Look at the market the review sites describe and you will notice it is a barbell. On one end: $29 to $249 a month DIY subscriptions you configure, prompt-tune, and babysit yourself. On the other: $150,000-a-year enterprise contracts with professional services teams. The middle — someone who builds, tunes, and maintains the agent for you at small-business pricing — is structurally missing from every ranking article on page one.

The review sites themselves admit why the middle matters, without drawing the conclusion. getvoip.com warns that a set-it-and-forget-it approach fails and that you must review transcripts and adjust continually — its exact framing is that these systems are a constant work in progress. lindy.ai's reviewer needed real prompt tuning before trusting an agent with live traffic, and watched a no-code agent fumble a simple repeat-that request. Those are labor costs. If you bill your own time at anything above zero, the true price of a $99 DIY plan is $99 plus every hour you spend in the dashboard.
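
That labor math is simple enough to write down. A minimal sketch: the function name is ours, and the six hours and the $50 hourly value in the example are hypothetical placeholders, so substitute your own.

```python
def true_monthly_cost(plan_fee, hours_in_dashboard, hourly_value):
    """Subscription fee plus the value of the time spent tuning and monitoring."""
    return plan_fee + hours_in_dashboard * hourly_value


# Hypothetical: a $99 plan, 6 hours a month of tuning, time valued at $50/hour.
print(true_monthly_cost(99, 6, 50))  # 399
```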

Honest counterpoint: if you have the time and genuinely enjoy the building, DIY is cheaper in cash and teaches you your own call patterns. And if you have engineers, an API will give you control no managed service can match. The done-for-you middle path wins specifically when the owner's hours are the scarcest resource — which, for most small businesses, they are.

What it actually costs: 200, 1,000, and 5,000 calls a month

Per-minute rates are quoted everywhere; an actual monthly bill is computed nowhere on page one. So here is the arithmetic, using a 3.5-minute average call and rates the ranking articles themselves published as of June 2026. These are calculations from public numbers, not quotes — your mileage will vary, which is exactly why you should run this same math on every vendor you shortlist.

Sources for the rates: lindy.ai's real-world all-in estimate of $0.15 to $0.30 a minute for a developer stack; getvoip.com's small-business plan range of roughly $29 to $199 a month with typical per-minute pricing of $0.08 to $0.15; vellum.ai's report that enterprise platforms typically start near $150,000 a year, which is about $12,500 a month before a single call.

  • Hidden line items getvoip.com says to watch for: setup and onboarding charges, integration or API fees, premium feature add-ons, overage penalties, and contract termination fees.
  • One more pattern, via lindy.ai: unlimited minutes is sometimes bounded by a monthly unique-caller allowance — Goodcall bills per unique caller, not per minute. Read the definition of unlimited everywhere you go.
  • getvoip.com's selection framework adds a useful budget threshold: under roughly 1,000 minutes a month, pay-as-you-go usually beats a subscription — and it recommends planning on a 12-to-24-month horizon, not this quarter's bill.
  • For scale: a full-time human receptionist costs tens of thousands of dollars a year in wages alone, and traditional human answering services bill by the minute at rates above anything in this table — context worth holding while you compare columns.
  • Whatever you shortlist, force every vendor to a single comparable number: all-in cost per answered call at your volume, including telephony, model, and voice fees.

| Monthly volume | Minutes at 3.5 min/call | DIY developer API (all-in $0.15-$0.30/min per lindy.ai) | No-code subscription (per getvoip.com ranges) | Enterprise platform (per vellum.ai) | Done-for-you managed |
| --- | --- | --- | --- | --- | --- |
| 200 calls | 700 | $105-$210, plus your build and tuning hours | Entry plans around $29-$99 usually cover it | Not economical at this volume | Flat monthly fee; no meter to watch |
| 1,000 calls | 3,500 | $525-$1,050, plus ongoing maintenance hours | Mid tiers around $99-$249, with overage risk | Still oversized for most businesses | Same flat fee |
| 5,000 calls | 17,500 | $2,625-$5,250; engineering time now matters less per call | Top tiers or custom pricing; effective rates near $0.08-$0.15/min | Begins to pencil if you also need 100-plus languages or contact-center integration | Same flat fee; confirm fair-use terms in writing |
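
The developer-API column is plain arithmetic, and it is worth rerunning with your own numbers. The sketch below uses the 3.5-minute average call, lindy.ai's $0.15 to $0.30 all-in range, and vellum.ai's $150,000 annual starting point. The function names are ours, and a real bill will add the line items listed above.

```python
AVG_CALL_MINUTES = 3.5
DEV_API_RATE_PER_MIN = (0.15, 0.30)  # lindy.ai all-in estimate, low and high
ENTERPRISE_ANNUAL_START = 150_000    # vellum.ai typical starting contract


def monthly_minutes(calls):
    return calls * AVG_CALL_MINUTES


def dev_api_monthly(calls):
    """Low and high monthly cost in dollars for a developer stack."""
    minutes = monthly_minutes(calls)
    low, high = DEV_API_RATE_PER_MIN
    return round(minutes * low, 2), round(minutes * high, 2)


def cost_per_answered_call(monthly_total, answered_calls):
    """The one comparable number to force every vendor to."""
    return monthly_total / answered_calls


for calls in (200, 1000, 5000):
    print(calls, monthly_minutes(calls), dev_api_monthly(calls))

# Enterprise floor spread across 5,000 calls a month, before any usage fees.
print(cost_per_answered_call(ENTERPRISE_ANNUAL_START / 12, 5000))  # 2.5
```

Run against a flat monthly fee, `cost_per_answered_call` gives the same comparable number for a managed service.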

Page one can't agree what these tools cost

Here is something nobody tells you: the three articles ranking at the top for this exact query quote mutually contradictory prices for the same products. All of the figures below were captured from those live pages in June 2026.

And the disagreement is not only about price — they cannot even agree what the products are. lindy.ai describes building Retell agents through a visual flow builder; getvoip.com states flatly that Retell has no visual interface or drag-and-drop builder and is built for developers. Same product, same month, same page one.

The lesson is not that any one reviewer is sloppy. It is that voice AI pricing and packaging change faster than review content, and none of these pages date their numbers. Practical rules: treat every published price as a screenshot of a moving target, date-stamp every quote you collect, trust only the number in your signed order form, and ask for 12-month price protection before you commit.

| Platform | lindy.ai says | vellum.ai says | getvoip.com says |
| --- | --- | --- | --- |
| Synthflow | Usage-based, free to start (its AI-crawler page variant says $375-$1,400/mo) | $29-$249/mo plans, around $0.08/min | $29-$1,250/mo plus roughly $0.13/min effective |
| Bland | $0.14/min to start; $299-$499/mo plans | No public pricing; enterprise focus | $0.09/min flat, active talk time only |
| Retell | $0.07-$0.31/min all-in, itemized by component | From about $0.07/min, near $0.05 at enterprise volume | From $0.07/min, plus your own language-model costs |

The compliance facts every listicle skips

Across roughly 20,000 words of top-ranking content we analyzed for this query, the word TCPA appears zero times. If you ever dial outbound, this is the section that protects you.

TCPA and the FCC. In February 2024, the FCC ruled that AI-generated voices fall under the Telephone Consumer Protection Act's restrictions on artificial and prerecorded voice calls. As of 2026, that means outbound AI calls generally require prior express consent, and AI telemarketing calls require prior express written consent. TCPA exposure accrues per call, so a misconfigured outbound campaign is not one mistake — it is thousands.

Call recording consent. The federal baseline is one-party consent, but a number of states — including California, Florida, Illinois, Maryland, Massachusetts, Pennsylvania, and Washington, as of 2026 — require all parties to consent. Since you rarely control where a caller is standing, the safe pattern is a brief recording disclosure at the start of every call, everywhere.

HIPAA. Compliance is not a logo; it is a signed Business Associate Agreement plus actual safeguards on protected health information. lindy.ai names several platforms that advertise HIPAA support, which is a fine starting point — but the only test that matters is whether the vendor will sign your BAA before patient data flows. PCI DSS applies separately if the agent takes card payments over the phone.

  • Get the BAA signed before any health information touches the system.
  • Build consent capture and a documented consent record before the first outbound dial.
  • Play a recording disclosure in all-party-consent states — or simply everywhere.
  • Disclose the AI up front. Several states have bot-disclosure laws as of 2026, and outbound AI voices already sit under the FCC's TCPA ruling — but the better reason is trust: the worst caller experience is not talking to a bot, it is feeling tricked by one.
  • Honor opt-outs immediately, and log them where the dialer actually checks.
  • Document your retention policy for recordings and transcripts, and who can access them.
  • Ask where call data is stored and whether it is used to train models. Get the answer in writing.
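
The consent and opt-out items on that list come down to one gate the dialer checks before every call. Here is a minimal sketch of that gate, assuming an in-memory ledger. `ConsentLedger` and its methods are hypothetical names, and a real system would persist these records and follow counsel's reading of the rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLedger:
    consents: dict = field(default_factory=dict)  # phone number -> consent record
    opt_outs: set = field(default_factory=set)    # numbers that asked to stop

    def record_consent(self, number, source, written=False):
        """Store a documented consent record with where and when it was given."""
        self.consents[number] = {
            "source": source,
            "written": written,
            "recorded_at": datetime.now(timezone.utc),
        }

    def record_opt_out(self, number):
        """Opt-outs take effect immediately and override any earlier consent."""
        self.opt_outs.add(number)

    def may_dial(self, number, telemarketing=False):
        """Gate every outbound dial on opt-out status and recorded consent."""
        if number in self.opt_outs:
            return False
        record = self.consents.get(number)
        if record is None:
            return False
        # Telemarketing calls need written consent, per the TCPA summary above.
        if telemarketing and not record["written"]:
            return False
        return True


ledger = ConsentLedger()
ledger.record_consent("+15550100", source="web form", written=True)
print(ledger.may_dial("+15550100", telemarketing=True))  # True
ledger.record_opt_out("+15550100")
print(ledger.may_dial("+15550100"))                      # False
```

Checking the opt-out set first is deliberate: an opt-out has to win even when a valid consent record exists.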

Human transfer and after-hours: the part demos skip

Every ranking article writes some version of escalates to a human when needed, and not one explains the mechanics. There are two kinds of transfer. A cold transfer dumps the caller into another queue, and they start over. A warm transfer briefs the human first — the agent passes the caller's name, number, reason for calling, and what has already been said, so the person picks up mid-story instead of at the beginning. When you evaluate any vendor, ask which one you are getting and what context actually reaches the human.
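
One way to make "what context reaches the human" concrete is to write down the handoff as a data structure and ask the vendor which fields they actually populate. A minimal sketch: the `HandoffBrief` fields mirror the four items named above, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HandoffBrief:
    caller_name: str
    caller_number: str
    reason_for_call: str
    conversation_summary: str  # what has already been said


def transfer_type(brief: Optional[HandoffBrief]) -> str:
    """A transfer is warm only if the human receives a complete brief."""
    if brief is None:
        return "cold"
    complete = all(getattr(brief, f.name).strip() for f in fields(brief))
    return "warm" if complete else "cold"


brief = HandoffBrief(
    caller_name="Dana",
    caller_number="+15550100",
    reason_for_call="Reschedule cleaning",
    conversation_summary="Offered 1:30 and 3:15; caller asked for a person.",
)
print(transfer_type(brief))  # warm
print(transfer_type(None))   # cold
```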

Then ask the harder question: what happens at 2 a.m. when there is no human to transfer to? Good systems degrade gracefully — they take a structured message, create a ticket or book the appointment directly, text the caller a confirmation, and schedule a callback for opening hours, with separate triage rules that route true emergencies to an on-call line. If a vendor's answer to the after-hours question is a shrug, the agent will fail at exactly the hours you bought it for.
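
Graceful degradation is easier to evaluate once it is written as explicit rules. The sketch below is one plausible ordering of the behaviors described above, not any vendor's actual logic. The function and action names are hypothetical.

```python
def route_call(staff_available, is_emergency, can_book_directly):
    """Return the ordered actions the agent takes when a caller needs a human."""
    if staff_available:
        return ["warm_transfer_to_staff"]
    if is_emergency:
        # Separate triage rule: true emergencies go to the on-call line.
        return ["route_to_on_call_line"]
    # Nobody to transfer to and nothing urgent: capture, confirm, follow up.
    actions = ["take_structured_message"]
    actions.append("book_appointment" if can_book_directly else "create_ticket")
    actions += ["text_confirmation", "schedule_callback_for_opening_hours"]
    return actions


# 2 a.m., no staff, routine request, calendar integration available.
print(route_call(staff_available=False, is_emergency=False, can_book_directly=True))
```

Asking a vendor to fill in this function for your business is a quick way to find out whether the after-hours path exists at all.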

Set expectations with the best number on the public record: per vellum.ai, PolyAI, the strongest enterprise performer on this SERP, reports call containment above 80 percent, reaching 87 percent early in some deployments. Read that as a ceiling. At 80 to 87 percent containment, even excellent agents hand somewhere between one call in five and one call in eight to a person, so the transfer path is not an edge case; it is a core feature.

Risks and limitations no listicle prints

Accuracy has a ceiling. getvoip.com puts speech understanding at roughly 90 to 95 percent under good conditions; lindy.ai's own AI-crawler page pegs structured inquiries at 80 to 90 percent. Heavy accents, background noise, and bad cell connections push those numbers down. Test with recordings of your real callers, not a quiet office demo. The exception that proves the rule, per getvoip.com: PolyAI's speech recognition is trained on millions of real customer calls, which is exactly why it handles heavy accents and dialects better than its peers — and part of why it costs six figures.

Off-script moments are still the weak point. lindy.ai watched a no-code agent fail a simple can-you-repeat-that and fall back to a canned line — the exact moment a caller realizes they are talking to a bot. And without grounding in your real knowledge base plus guardrails, language models can confidently invent answers to pricing and policy questions. The fix is restricting what the agent may claim and reviewing transcripts weekly, which is work someone must own.

Sometimes the honest answer is do not buy one. If you get a handful of calls a week, voicemail-to-text is nearly free. If your calls demand genuine human judgment and empathy — bereavement-sensitive conversations, escalated complaints, complex high-stakes negotiations — a trained human, whether on staff or at a human answering service, is simply better, and pretending otherwise burns trust. And if one mishandled call can cost you more than a year of saved labor, keep humans on the line and use AI only for overflow and after-hours. The vendors who deserve your money will tell you this themselves.

Demo red flags: a field checklist

Vendors demo the happy path. This checklist is compiled from failure patterns the ranking reviews themselves documented — use it in the room.

And before the demo, check the third-party record. vellum.ai is the only ranking page that compiles G2 review scores — Retell at 4.8 out of 5 across 612 reviews, Synthflow at 4.5 across 815 — and it is also where you discover PolyAI's perfect 5.0 rests on just 11 reviews. Read the review count before the star rating.

  • The demo only shows a scripted flow. Ask to go off script, interrupt mid-sentence, and ask something not on the website. Watch what happens.
  • Latency claims with no methodology. Sub-500 milliseconds means nothing without knowing how, when, and under what load it was measured. Run your own five-minute test.
  • Unlimited minutes with a unique-caller cap in the fine print — the pattern lindy.ai documented at Goodcall. Ask what unlimited excludes.
  • Core workflow features gated behind an enterprise tier — lindy.ai found Synthflow's CRM write-backs locked to enterprise. Price the tier you actually need.
  • Support that lives in Discord or a community forum. lindy.ai called Retell's support non-existent; when your phone line breaks, a community thread is not an SLA.
  • No public pricing for a product aimed at small businesses. Opaque pricing at SMB scale usually means the price is whatever you will pay.
  • Credit-based pricing you cannot forecast — both lindy.ai and vellum.ai flag this pattern at ElevenLabs. Ask for a worked example at your call volume.
  • No exportable recordings and transcripts. If you cannot take your call data with you, you are renting your own customer conversations.
  • The vendor will not put answer rate, latency, or uptime in the contract. Promises that cannot survive a contract are not promises.
  • Compliance answered with certification logos instead of a signed BAA and a consent workflow. Logos are marketing; signatures are compliance.

The vendor-question checklist

Send these before the demo and bring them to the call. A vendor who answers all fourteen crisply is worth shortlisting; a vendor who dodges three or more is telling you something.

  • What is my all-in cost per answered call at my volume — including telephony, model, and voice fees?
  • What happens to my bill if call volume doubles next month? What exactly triggers overage charges?
  • Who tunes and maintains the agent after launch — you or me — and is that work billed?
  • How long from signature to live on my number, and what do you need from me?
  • Can I hear real production calls from a business like mine, not demo recordings?
  • What is your measured median latency, and how was it measured?
  • What does the agent do when it does not know the answer?
  • Is your transfer warm or cold, and what context reaches my staff when it hands off?
  • What happens after hours when no one is available to take a transfer?
  • What are your data retention rules, and can I export and delete everything if I leave?
  • Where is my call data stored, and is it ever used to train models?
  • Will you sign a HIPAA Business Associate Agreement? (Only if you handle health information.)
  • How do you capture and store TCPA consent for outbound calls, and how are opt-outs honored?
  • How do you handle call-recording consent in all-party-consent states?

Outbound is a different sport: spam labels and number reputation

If you make calls instead of just answering them, the technology is the easy half. The hard half is whether anyone picks up. Carriers score the reputation of outbound numbers, and once your caller ID earns a Spam Likely label, connect rates collapse no matter how good the agent is. STIR/SHAKEN — the framework US carriers use to cryptographically attest who is calling — means properly registered, consistently used numbers get treated better than burner pools.

This entire topic gets one bullet across page one: vellum.ai's writeup of Leaping AI mentions number scrubbing to prevent spam flagging and voicemail-detection fallbacks. That is it — across three articles. The practical playbook: use legitimate, registered numbers tied to your business identity; monitor how your numbers display on major carriers; do not blast volume from a single number; call only consent-based lists, which TCPA requires anyway; and honor opt-outs instantly. Outbound AI without number-reputation management is a very fast way to talk to no one.

Run a two-week pilot before you sign anything

Whatever category you land on, do not buy from a demo. Pick one narrow use case — after-hours answering, or appointment booking, or missed-call recovery — and define success before the first call: the metrics below, with target numbers written down. Run week one in the lowest-stakes slot you have, usually after-hours only. Read every transcript daily; it takes minutes and is where every real problem shows up first. Expand to business hours in week two only if week one held. And watch how fast fixes ship: if the vendor cannot turn around a prompt or flow change within a day during a pilot — when they are trying to win you — it will be slower after you sign.

Hold the pilot to precise definitions, because vendors use these terms loosely and the differences move money. vellum.ai's own FAQ guidance says to track containment rate, resolution rate, transfer rate, customer satisfaction, average handle time, and cost per resolved interaction — a good list; the definitions below keep anyone from gaming it:

  • Containment rate — share of calls fully handled with no human involvement. High containment with angry callers is not success.
  • Deflection — calls kept away from humans by any means, including making the caller give up. Never accept deflection as a success metric without resolution beside it.
  • Resolution rate — share of calls where the caller's actual need was met. The metric that matters most and is quoted least.
  • Transfer rate — share of calls escalated to a human. Plan staffing around it; per vellum.ai, even PolyAI's reported best is 80-87 percent containment, so transfers never reach zero.
  • Automation rate — share of post-call work (logging, follow-ups, booking) done without a human. Where most of the hidden labor savings live.
  • Barge-in — the agent stops speaking the instant the caller interrupts. Test it on every demo.
  • Endpointing — how the system decides the caller finished talking. Bad endpointing feels like being talked over.
  • Concurrency — how many simultaneous calls the system handles before callers queue. Ask what your tier guarantees; lindy.ai notes Retell includes 20 concurrent calls, then charges $8 a month per additional line.
  • SIP trunk — the connection that links an AI platform to your existing numbers and carrier, so you can keep your phone number while changing what answers it.
  • MOS (mean opinion score) — a 1-to-5 listener rating of voice quality. Useful, but it measures sound, not whether the agent helped.
  • BAA (Business Associate Agreement) — the signed HIPAA contract that makes a vendor accountable for protected health information. No signature, no healthcare deployment.
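The rate definitions above can be scored directly from the transcripts you read during the pilot. The following is a minimal sketch, assuming you tag each call by hand with two flags; the `CallRecord` type and its field names are hypothetical, and "contained" is simplified here to "not transferred to a human".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    transferred: bool  # the call was escalated to a human
    resolved: bool     # the caller's actual need was met, judged from the transcript


def pilot_metrics(calls):
    """Compute pilot rates as fractions of all calls."""
    total = len(calls)
    if total == 0:
        raise ValueError("no calls to score")

    contained = sum(1 for c in calls if not c.transferred)
    resolved = sum(1 for c in calls if c.resolved)
    # Calls that never reached a human AND were not resolved: the
    # "caller gave up" case that containment alone hides.
    contained_unresolved = sum(
        1 for c in calls if not c.transferred and not c.resolved
    )

    return {
        "containment_rate": contained / total,
        "transfer_rate": (total - contained) / total,
        "resolution_rate": resolved / total,
        "contained_unresolved_rate": contained_unresolved / total,
    }
```

Reporting the contained-but-unresolved share next to containment is one way to keep the number from being gamed: a vendor cannot raise containment by stranding callers without that second figure rising too.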

Where MapleVoice fits — and where it doesn't

First, where we are the wrong choice. If you have engineers who want to own the stack, use Vapi or Retell — a managed service will only frustrate them. If you are a contact center with a hundred seats and a six-figure budget, PolyAI, Sierra, Replicant, and Cognigy are built for you in ways we are not. And if you genuinely enjoy building and tuning flows yourself, a no-code platform like Synthflow will cost less cash than any managed service, including ours.

MapleVoice exists for the buyer type page one ignores: the small business that wants the phone handled, not another tool to operate. We build the agent for you, tune it for your industry (we work across 20 verticals, from dental to home services to restaurants, mapped at /industries), and maintain it as your business changes. It is typically live in about 48 hours and answers 24/7 in under two seconds. It books appointments, qualifies leads, and takes orders; connects to your booking system, CRM, or POS; and warm-transfers to your team with full context when a call needs a human. Pricing is a flat monthly fee with no per-minute meter. For qualifying healthcare customers we sign a BAA, outbound calling ships with TCPA consent controls, and every call produces a recording, transcript, summary, call reason, outcome, and next step, so you can audit us the way this article told you to audit everyone.

The next step is the same one we recommend for any vendor: listen to real, unedited calls at /call-recordings, check the flat-rate math at /pricing, and compare us against everything named in this article in the ranked companion at /alternatives/best-ai-voice-agent. If, after the decision tree and the pilot, a DIY platform fits how you operate better — pick it. The best AI voice agent is the one that matches how you actually run your business, and that was true before we had anything to sell you.

Frequently asked questions

How do AI voice agents work?

An AI voice agent converts caller speech to text, runs it through a large language model to decide a response, and speaks the reply with synthesized speech. According to lindy.ai, transcription takes about 100 milliseconds, the model about 200 milliseconds, and speech synthesis 100-150 milliseconds, or roughly 400-450 milliseconds in total, and conversations start to drag past half a second.
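The latency arithmetic is worth writing down, because it shows how little room is left. This sketch sums only the three stage figures attributed to lindy.ai above and compares them with the half-second mark; the function and constant names are illustrative, and anything else in the call path would come out of the remaining headroom.

```python
# Per-stage figures in milliseconds as (low, high), as attributed to lindy.ai above.
STAGES_MS = {
    "speech_to_text": (100, 100),
    "language_model": (200, 200),
    "speech_synthesis": (100, 150),
}
DRAG_THRESHOLD_MS = 500  # "half a second"


def latency_budget(stages=STAGES_MS, threshold=DRAG_THRESHOLD_MS):
    """Sum the low and high stage estimates and report headroom to the threshold."""
    low = sum(low_ms for low_ms, _ in stages.values())
    high = sum(high_ms for _, high_ms in stages.values())
    return {"low_ms": low, "high_ms": high, "headroom_ms": threshold - high}


print(latency_budget())
# {'low_ms': 400, 'high_ms': 450, 'headroom_ms': 50}
```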

How much do AI voice agents cost?

Most cost $0.08-$0.50 per minute or roughly $29-$199 a month on small-business plans, according to getvoip.com, while vellum.ai reports enterprise platforms like PolyAI and Sierra typically start near $150,000 a year. Done-for-you managed services charge a flat monthly fee. Always compute all-in cost per answered call at your real volume.
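All-in cost per answered call is simple enough to compute yourself. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the 3-minute average call length is a placeholder to replace with your own, and the $199 flat fee is the top of the getvoip.com range treated, purely for illustration, as covering every call.

```python
def cost_per_answered_call(answered_calls, avg_minutes_per_call,
                           per_minute_rate=0.0, monthly_fee=0.0):
    """All-in monthly cost divided by the number of answered calls."""
    if answered_calls <= 0:
        raise ValueError("answered_calls must be positive")
    usage_cost = answered_calls * avg_minutes_per_call * per_minute_rate
    return (monthly_fee + usage_cost) / answered_calls


AVG_MINUTES = 3  # assumption: replace with your measured average call length

for volume in (200, 1000, 5000):
    low = cost_per_answered_call(volume, AVG_MINUTES, per_minute_rate=0.08)
    high = cost_per_answered_call(volume, AVG_MINUTES, per_minute_rate=0.50)
    flat = cost_per_answered_call(volume, AVG_MINUTES, monthly_fee=199)
    print(f"{volume} calls: metered ${low:.2f}-${high:.2f}, flat ${flat:.2f} per call")
```

With these inputs, metered pricing costs the same per call at every volume ($0.24 at $0.08 a minute, $1.50 at $0.50 a minute), while the flat fee falls from about $1.00 per call at 200 calls to about $0.04 at 5,000. Real plans add minute caps, overages, and setup fees, so put those into `monthly_fee` before comparing.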

How accurate are AI voice agents at understanding speech?

Roughly 90-95 percent under good conditions, according to getvoip.com; lindy.ai pegs structured inquiries at 80-90 percent. Accuracy drops with heavy accents, background noise, and weak connections, so test any vendor with recordings of your real callers — not a quiet office demo — before trusting it with live traffic.

Can AI voice agents replace human agents?

No — they replace specific call types, not your team. Routine bookings, FAQs, lead qualification, and order status automate well, but vellum.ai reports even PolyAI's containment runs 80-87 percent, so humans still take the rest. Complex, sensitive, or high-stakes conversations need a clear, fast escalation path to a person.

Can AI voice agents handle inbound and outbound calls?

Yes — most modern agents handle both directions. Inbound agents answer, qualify, book, and escalate; outbound agents run reminders, follow-ups, and lead qualification. lindy.ai notes some platforms are optimized for one direction, so confirm before buying. And as of 2026, outbound AI calls fall under TCPA consent requirements, so consent capture comes first.

What is the difference between an AI voice agent and an IVR?

An IVR routes callers through fixed menus — press one, press two — and breaks on anything off-script. An AI voice agent holds an open conversation: callers speak naturally, change their minds mid-sentence, and the agent understands intent, takes actions like booking, and escalates with context. One is a phone tree; the other is a conversation.

Do AI voice agents support multiple languages?

Yes — lindy.ai reports ElevenLabs supports 70-plus languages and Cognigy more than 100, and vellum.ai notes Synthflow covers 50-plus. But listed support and production quality are different things: accents, dialects, and mid-call language switching vary widely by platform, so test your actual target languages on live calls before committing.

Are AI voice agents HIPAA compliant?

Some are — but HIPAA compliance means the vendor signs a Business Associate Agreement and actually safeguards patient information, not that a logo appears on a pricing page. lindy.ai lists several platforms advertising HIPAA support; verify each directly. MapleVoice signs BAAs for qualifying healthcare customers. No signed BAA, no healthcare deployment.

Can I integrate an AI voice agent with my CRM?

Yes, most platforms connect to major CRMs like Salesforce and HubSpot, and developer APIs can reach almost anything via webhooks. The bar to set is higher than “connects to”: the agent should write call outcomes back automatically, update contact records, and trigger follow-ups. Otherwise you have bought a transcript generator, not automation.
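To make the write-back bar concrete, here is a rough sketch of what a webhook handler might do with a finished-call event. Everything in it is hypothetical: the event fields, the action names, and the function are illustrations, not the payload or API of any real voice platform or CRM.

```python
def build_crm_update(call_event):
    """Turn a finished-call event into a list of CRM write-back actions."""
    required = ("caller_phone", "outcome", "summary")
    missing = [key for key in required if key not in call_event]
    if missing:
        raise ValueError(f"call event missing fields: {missing}")

    # Always log the call outcome against the contact record.
    actions = [{
        "action": "log_call",
        "phone": call_event["caller_phone"],
        "outcome": call_event["outcome"],
        "summary": call_event["summary"],
    }]

    # Trigger a follow-up only when the agent recorded a next step.
    next_step = call_event.get("next_step")
    if next_step:
        actions.append({
            "action": "create_follow_up",
            "phone": call_event["caller_phone"],
            "task": next_step,
        })
    return actions
```

If a vendor's integration cannot produce the equivalent of both actions without a person copying data between systems, it only "connects to" your CRM.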

What kind of data do AI voice agents collect?

Call recordings, transcripts, caller phone numbers, interaction metadata like duration and outcome, detected intents, and any personal information callers share, per getvoip.com's summary. Before signing, ask three things: how long data is retained, whether you can export and delete it if you leave, and whether it is ever used to train models.
