May 15, 2026

Home Depot Voice Agent Case Study: What Enterprise Deployment Looks Like

When Home Depot started rolling out AI voice agents for in-store customer service in May 2026, it was not a lab experiment. A Fortune 50 retailer does not put AI on live customer calls without extensive prior validation. Across more than 500 organisations, production voice agent deployments grew 340% year-over-year, and Home Depot is the most visible example.

For context, Gartner reports that 64% of enterprise CX teams ran agentic AI pilots in 2026, but only 27% reached full production. Most organisations that have tried voice AI are not yet using it in production. If you are evaluating voice AI production deployments across verticals, that gap is where the real work sits.

This article uses Home Depot as the framing evidence, Medical Data Systems and Pine Park Health for the numbers, and Forrester and Gartner market data to give you the vocabulary for making a production decision — not another pilot.

What Did Home Depot Actually Deploy — and What Does It Tell Us?

In a 50-store pilot, Home Depot’s AI voice agents identify why a customer is calling within 10 seconds and reach a resolution four times faster than traditional phone menus. Customers speak naturally instead of navigating a menu tree. The agent checks order status, confirms product availability, sends product links to a customer’s cart, and routes to a human when needed.

Call volume, containment rate, and deployment architecture are not publicly disclosed, so this is not a technical blueprint to replicate. What matters is that a $150B+ retailer made a production commitment in Q2 2026.

There is a corroborating signal worth noting. In late April 2026, 3CLogic announced its AI Agent Evaluator — automated post-call quality scoring — ahead of ServiceNow’s Knowledge26. Companies do not build enterprise monitoring infrastructure for technology that is still in pilot. The category is moving from evaluation to production instrumentation.

What Is the Difference Between an IVR and a Modern AI Voice Agent?

Traditional Interactive Voice Response (IVR) relies on DTMF (touch-tone) menus: press 1 for billing, press 2 for returns, navigate a scripted tree. There is no natural language, no contextual memory, and no capacity to take action mid-call. IVR deflects calls by routing them; whether the caller’s problem was actually resolved is a separate question entirely.

A modern AI voice agent works differently. The caller speaks naturally, the system understands intent across the whole conversation, and the agent can act mid-call — look up an order, schedule an appointment, process a payment — without sending the caller through a menu.

The metric that matters here is containment rate: the percentage of calls fully resolved by the AI without any human transfer. Not deflection rate, which counts any self-service interaction as a win regardless of outcome. A 70% containment rate is a fundamentally different cost proposition from a 40% deflection rate that still routes most calls to humans. Make sure you are measuring the right thing.
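The two metrics are easy to blur in reporting, so it helps to compute both from the same call log. The sketch below is a minimal illustration, assuming a hypothetical per-call record with three boolean fields; real platforms expose different schemas, and the deflection definition used here (used self-service, never reached a human) is one common convention rather than a standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical fields; map these from your own platform's call logs.
    used_self_service: bool  # caller interacted with the automated system
    reached_human: bool      # call was transferred to a human agent
    resolved: bool           # issue resolved (from disposition or post-call scoring)


def containment_rate(calls: Sequence[CallRecord]) -> float:
    """Share of all calls resolved by the AI with no human transfer."""
    if not calls:
        return 0.0
    contained = sum(1 for c in calls if c.resolved and not c.reached_human)
    return contained / len(calls)


def deflection_rate(calls: Sequence[CallRecord]) -> float:
    """Share of all calls that used self-service and never reached a human,
    whether or not the issue was actually resolved."""
    if not calls:
        return 0.0
    deflected = sum(1 for c in calls if c.used_self_service and not c.reached_human)
    return deflected / len(calls)


# Ten illustrative calls: 4 resolved by the AI, 3 abandoned unresolved
# in self-service, 3 transferred to a human.
calls = (
    [CallRecord(True, False, True)] * 4
    + [CallRecord(True, False, False)] * 3
    + [CallRecord(True, True, True)] * 3
)
print(f"containment={containment_rate(calls):.0%} deflection={deflection_rate(calls):.0%}")
# containment=40% deflection=70%
```

The three abandoned calls count toward deflection but not containment, which is exactly the gap between a flattering metric and a cost-relevant one.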

Why Are 64% of Enterprises Stuck in Pilot While 27% Are in Production?

Full production means the AI is handling live customer calls at scale with no human review of individual interactions. Pilot means controlled, monitored, limited-volume — conditions that make performance look better than it will under real telephony load.

Four things keep most organisations stuck:

Compliance architecture is incomplete. TCPA Prior Express Written Consent for outbound, HIPAA Business Associate Agreements with every vendor processing protected health information, PCI-scope decisions for payment handling: none of these can be deferred. Only 11% of healthcare CX programmes are in full production, compared to 38% for banking, and healthcare’s heavier compliance burden is the most plausible explanation for the gap.

Escalation paths are not production-ready. A warm transfer that preserves the full conversation context is non-trivial to build, so most pilots implement cold transfer instead. That works in a controlled test and fails under volume, when callers are left repeating themselves to a human who has none of the context.

Monitoring is not in place before launch. Traditional QA programmes sample 2–5% of calls. Without post-call scoring at 100% coverage, you are running a production system blind. 3CLogic’s AI Agent Evaluator is what production-grade monitoring looks like — automated scoring that feeds performance trends into systems of record before the first complaint arrives.

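Latency degrades under real telephony load. Response times that look fine in a sandbox stretch once concurrent calls, carrier networks, and the STT, LLM, and TTS round-trips stack up, so understand the latency requirements of retail-scale deployment before you go live.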
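One practical consequence: check latency percentiles rather than averages, on traffic that resembles production. Below is a minimal sketch using only Python’s standard library, assuming you have per-turn response times in milliseconds from a load test. The 600ms budget matches the target in this article’s validation checklist; measuring it at the 95th percentile is an assumption, not a standard.

```python
import statistics
from typing import Sequence


def latency_percentile(samples_ms: Sequence[float], pct: int = 95) -> float:
    """pct-th percentile of per-turn response latency in milliseconds.

    statistics.quantiles(n=100) returns 99 cut points; index pct - 1 is the
    pct-th percentile. Requires at least two samples.
    """
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[pct - 1]


def within_budget(samples_ms: Sequence[float], budget_ms: float = 600, pct: int = 95) -> bool:
    """True if the chosen percentile stays under the latency budget."""
    return latency_percentile(samples_ms, pct) < budget_ms


# Illustrative data: a quiet sandbox versus a loaded run where 10% of turns
# stall behind slow STT/LLM/TTS round-trips.
sandbox = [400.0] * 100
loaded = [450.0] * 90 + [900.0] * 10

print(statistics.mean(loaded))                        # 495.0: the average looks fine
print(latency_percentile(loaded))                     # 900.0: the tail breaks the budget
print(within_budget(sandbox), within_budget(loaded))  # True False
```

The loaded run would pass a mean-based check at 495ms while one caller turn in ten waits 900ms, which is why the sketch gates on a tail percentile.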

Organisations that crossed the threshold did three things: integrated the AI into the knowledge base, CRM, and order system; defaulted to warm transfer; and instrumented monitoring before the first complaint — not after.
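Defaulting to warm transfer mostly comes down to what travels with the call. A minimal sketch, assuming a hypothetical handoff schema (`HandoffContext`, `Turn`, and `warm_transfer_payload` are illustrative names, not any platform’s API): the AI packages the detected intent, the escalation reason, the transcript, and any lookups it has already made, so the human agent does not start from scratch.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    speaker: str  # "caller" or "ai"
    text: str


@dataclass
class HandoffContext:
    # Hypothetical schema for illustration; not any vendor's transfer API.
    call_id: str
    caller_id: str
    detected_intent: str
    escalation_reason: str
    transcript: List[Turn] = field(default_factory=list)
    lookups: Dict[str, str] = field(default_factory=dict)  # data the AI already fetched

    def brief(self) -> str:
        """One-line summary a human agent can read before picking up."""
        return f"{self.detected_intent} | reason: {self.escalation_reason} | {len(self.transcript)} turns"


def warm_transfer_payload(ctx: HandoffContext) -> str:
    """Serialise the full context so it travels with the transfer
    (e.g. into a CRM screen-pop) instead of being dropped as in a cold transfer."""
    return json.dumps({"brief": ctx.brief(), **asdict(ctx)})


ctx = HandoffContext(
    call_id="call-0001",
    caller_id="+15555550100",
    detected_intent="order_status",
    escalation_reason="caller_request",
    transcript=[Turn("caller", "Where is my order?"), Turn("ai", "It shipped yesterday.")],
    lookups={"order_status": "shipped"},
)
print(warm_transfer_payload(ctx))
```

How the payload reaches the agent (SIP headers, a CRM record, a contact-centre API) depends on the telephony stack and is left out deliberately.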

What Do the Quantitative Case Studies Actually Show?

Home Depot provides market signal. Medical Data Systems and Pine Park Health provide the numbers.

Medical Data Systems is a healthcare receivables and collections company. Call volume had grown to more than 3,000 calls per day, many going unanswered. Rather than add headcount, they deployed Retell AI to handle 100% of inbound calls. The outcomes: 30,000 calls per month handled by AI, a 70% containment rate, and approximately $280,000 per month in automated collections.

Why is 70% containment achievable in collections? Collections calls have defined outcomes: payment, payment plan, dispute, or transfer. The intent space is narrow, account information is pulled before the call begins, and little ambiguous judgment is required. Call types this well defined have high containment potential by design.

The cost works out like this, assuming a three-minute average call: 30,000 calls × 3 minutes × $0.07/minute = $6,300/month in platform fees. Against $280,000/month in automated collections, the decision is simple. The deployment also shows what a compliance baseline looks like in practice, which is relevant to anyone weighing TCPA exposure in outbound AI voice campaigns: Retell AI’s SOC 2 Type II, HIPAA BAA portal, GDPR, and PCI compliance were all in place before go-live.

Pine Park Health deployed Retell AI for patient scheduling and reported a 38% increase in scheduling NPS. Scheduling is the first patient touchpoint — 24/7 availability and lower wait times improve satisfaction whether or not cost reduction was the primary goal. Together these two cases make the dual ROI argument: cost reduction at scale, and experience improvement. Both at once.

How Should You Think About Voice AI ROI? The Gartner and Forrester Data

Gartner projects conversational AI will cut contact centre labour costs by $80 billion in 2026. Forrester’s Total Economic Impact study documents 331–391% three-year ROI with a median payback period of 5.4 months. Voice AI handled 19% of inbound contact centre volume in 2026, up from 6% in 2024.

The Forrester range matters. The spread between 331% and 391% comes down to implementation quality, use case selection, and containment rate. Top-quartile programmes reach payback in 2.9 months; bottom-quartile programmes share the weaknesses that keep other organisations stuck in pilot.

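Here is the per-unit comparison: ContactBabel puts the average inbound human call cost at $7.16. A Retell AI-handled call at $0.07/minute × 3 minutes costs $0.21, a 34x per-unit difference. At 100,000 calls per month with a 70% containment rate, the 70,000 contained calls cost about $14,700 and the 30,000 escalated calls about $214,800, a 68% reduction against the $716,000 all-human baseline (closer to 67% if escalated calls are also billed for their AI minutes).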
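The reduction depends on how escalated calls are costed. A minimal sketch of the blended model, using the article’s figures ($7.16 per human call, $0.07/minute, a three-minute average call) and one stated assumption: the AI answers every call first and the non-contained share escalates to a human.

```python
HUMAN_COST_PER_CALL = 7.16  # ContactBabel average inbound human call cost (article figure)
AI_RATE_PER_MIN = 0.07      # Retell AI all-in per-minute rate (article figure)
AVG_CALL_MINUTES = 3.0      # assumed average call length used in the article's arithmetic


def blended_monthly_cost(calls: int, containment: float, bill_ai_on_escalations: bool = False) -> float:
    """Monthly cost when the AI answers every call and (1 - containment) escalate to humans."""
    ai_cost_per_call = AI_RATE_PER_MIN * AVG_CALL_MINUTES  # about $0.21
    contained = calls * containment
    escalated = calls - contained
    # Escalated calls usually spend some time with the AI before the transfer;
    # billing them a full AI call gives a pessimistic bound.
    ai_billed = calls if bill_ai_on_escalations else contained
    return ai_billed * ai_cost_per_call + escalated * HUMAN_COST_PER_CALL


def cost_reduction(calls: int, containment: float, bill_ai_on_escalations: bool = False) -> float:
    """Fractional saving against an all-human baseline."""
    baseline = calls * HUMAN_COST_PER_CALL
    return 1 - blended_monthly_cost(calls, containment, bill_ai_on_escalations) / baseline


print(f"{cost_reduction(100_000, 0.70):.1%}")        # 67.9%
print(f"{cost_reduction(100_000, 0.70, True):.1%}")  # 67.1%
```

Containment is the lever here: each percentage point moves 1,000 of the 100,000 calls from roughly $7.16 to roughly $0.21.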

The lever is containment rate. Use case selection and compliance architecture are the two variables that determine where you land in that Forrester range.

What Does Enterprise Deployment Actually Cost? Managed Service vs. Developer Platform

The enterprise voice AI market has two distinct tiers. Not knowing which one fits you produces unpredictable costs.

Managed service tier (PolyAI, Cognigy): $150,000+/year with a six-week minimum. SOC 2 Type II, HIPAA, GDPR, PCI DSS, and ISO 27001 coverage is included. The price covers implementation, compliance certification, and ongoing support, costs the developer platform buyer handles separately.

Developer platform tier (Retell AI, Vapi, Bland AI): per-minute pricing, self-managed implementation. Pricing transparency matters more than the headline rate. Retell AI’s $0.07/minute all-in rate covers platform, STT, LLM inference, and TTS, so the headline rate is what you pay in production. Vapi’s $0.05/minute becomes $0.25–$0.33/minute in production once all components are added, plus a $1,000/month HIPAA add-on.

The decision is simple:

Managed service when compliance is complex, engineering capacity is limited, or the use case requires significant customisation.
Developer platform when your engineers own the implementation and per-minute cost economics matter at scale.
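The per-minute side of that decision can be roughed out before talking to vendors. A minimal sketch using the article’s list figures, with two loudly stated assumptions: the $150,000/year managed price is treated as a flat floor with no usage fees, and `annual_engineering` is a hypothetical placeholder for the implementation, compliance, and support work a developer-platform buyer funds separately.

```python
MANAGED_ANNUAL_FLOOR = 150_000  # managed-service entry price (article figure), assumed flat


def developer_platform_annual(minutes_per_month: float, rate_per_min: float,
                              monthly_addons: float = 0.0, annual_engineering: float = 0.0) -> float:
    """Annual cost of a per-minute platform. annual_engineering is a hypothetical
    placeholder for the work the managed price bundles in."""
    return 12 * (minutes_per_month * rate_per_min + monthly_addons) + annual_engineering


def break_even_minutes(rate_per_min: float, monthly_addons: float = 0.0,
                       annual_engineering: float = 0.0,
                       managed_annual: float = MANAGED_ANNUAL_FLOOR) -> float:
    """Monthly minutes at which the developer platform costs as much as the managed floor."""
    monthly_budget = (managed_annual - annual_engineering) / 12 - monthly_addons
    return max(monthly_budget, 0.0) / rate_per_min


minutes = 30_000 * 3  # e.g. 30,000 calls/month at three minutes each
print(round(developer_platform_annual(minutes, 0.07)))         # 75600  (Retell AI)
print(round(developer_platform_annual(minutes, 0.25, 1_000)))  # 282000 (Vapi, low end + HIPAA)
print(round(developer_platform_annual(minutes, 0.33, 1_000)))  # 368400 (Vapi, high end + HIPAA)
print(round(break_even_minutes(0.07)))                         # 178571
```

At this volume Retell AI’s platform fees sit well under the managed floor while Vapi’s effective production rate plus its HIPAA add-on lands above it; entering a realistic `annual_engineering` figure moves every developer-platform total up by that amount and lowers the break-even volume.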

Factor post-call scoring into total cost of ownership regardless of which tier you choose. The platform and architecture choices involved in replicating the enterprise deployment pattern are examined in full in a separate breakdown.

What Separates a Successful Production Deployment from a Stalled Pilot?

Five things separate organisations that cross the threshold from those that stay stuck.

1. Compliance architecture defined before day one. TCPA PEWC for outbound, HIPAA BAA signed with every vendor touching protected health information, PCI-scope decision documented before any payment-handling calls go live.

2. Use case scoped to high containment potential. Collections, scheduling, order status — deployable from day one. Complex billing disputes — not on day one. Password resets achieve 78% deflection; complaint handling sits at 19% regardless of model quality. Start where the AI can win.

3. Escalation path production-ready. Warm transfer with full conversation context. Eighty-two percent of consumers expect a clear and immediate path to a human. Mishandled escalation erodes the CSAT benefit of the entire programme.

4. Monitoring instrumented before launch. Post-call scoring active from day one. Hallucination-related complaints drop from 0.34% to 0.11% with retrieval-augmented grounding. You cannot fix what you are not measuring.

5. Performance validated under live call volume. Sandbox results do not predict production behaviour: latency and containment that look fine in testing can degrade under real call volume.
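Item 3 depends on the agent knowing when to hand off. This article’s validation checklist names three escalation triggers (low confidence, caller request, regulated topic); below is a minimal rule-based sketch of that decision, assuming a hypothetical intent classifier that returns a label and a confidence score. The intent names, phrases, and 0.6 threshold are illustrative placeholders to tune on pilot data, and substring matching is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical examples; each deployment defines its own lists.
REGULATED_INTENTS = {"payment_dispute", "medical_question", "legal_complaint"}
HUMAN_REQUEST_PHRASES = ("talk to a human", "speak to a person", "real person", "representative", "operator")


@dataclass
class TurnSignal:
    intent: str        # label from the intent classifier
    confidence: float  # classifier confidence in [0, 1]
    caller_text: str   # transcribed caller utterance


def escalation_reason(turn: TurnSignal, min_confidence: float = 0.6) -> Optional[str]:
    """Return why this turn should go to a human, or None to let the AI continue.

    Caller requests are checked first, so an explicit ask for a human is never
    overridden by a confident classifier.
    """
    text = turn.caller_text.lower()
    if any(phrase in text for phrase in HUMAN_REQUEST_PHRASES):
        return "caller_request"
    if turn.intent in REGULATED_INTENTS:
        return "regulated_topic"
    if turn.confidence < min_confidence:
        return "low_confidence"
    return None


print(escalation_reason(TurnSignal("order_status", 0.95, "Can I talk to a human please?")))  # caller_request
print(escalation_reason(TurnSignal("order_status", 0.90, "Where is my order?")))             # None
```

Whatever reason this returns is what a warm transfer should carry to the human agent, so the escalation decision and the handoff context stay linked.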

Crossing from pilot to production is an organisational readiness question, not a technical milestone. As Home Depot’s EVP of Customer Experience Jordan Broggi put it: “We have spent a lot of time developing guardrails to prevent outright hallucinations” — embedding their own knowledge base rather than relying on LLM providers alone.

What to Validate Before Calling a Deployment “Production”

Compliance: TCPA PEWC capture verified for outbound calling. HIPAA BAA signed with STT, LLM, and TTS vendors. PCI-scope decision documented. State-level requirements mapped: California CIPA, Florida FTSA.

Performance: Containment rate target defined and measured during pilot — not estimated. Latency validated below 600ms under live telephony conditions.

Escalation: Warm transfer with full context tested under failure scenarios. Human fallback SLA defined. Escalation triggers defined: low confidence, caller request, regulated topic.

Monitoring: Post-call scoring active from day one — 100% coverage, not 2–5%. Hallucination rate tracked. Re-contact rate within 72 hours tracked as a resolution quality indicator.

Governance: CTO-level sign-off process defined. Incident response path documented. Audit-ready logging in place for calls involving payments or regulated topics.
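Of the monitoring items above, the 72-hour re-contact rate is the easiest to compute from data most teams already hold. A minimal sketch, assuming call events can be exported as (caller ID, start time, contained-by-AI) tuples; keying on a single caller ID is a simplification, since real deployments may need to reconcile phone numbers and account IDs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Tuple

CallEvent = Tuple[str, datetime, bool]  # (caller_id, started_at, contained_by_ai)


def recontact_rate(calls: Iterable[CallEvent], window: timedelta = timedelta(hours=72)) -> float:
    """Share of AI-contained calls followed by another call from the same
    caller within `window`: a proxy for 'contained' calls that did not resolve."""
    by_caller = defaultdict(list)
    for caller_id, started_at, contained in calls:
        by_caller[caller_id].append((started_at, contained))

    contained_total = 0
    recontacted = 0
    for history in by_caller.values():
        history.sort(key=lambda event: event[0])  # chronological per caller
        for i, (started_at, contained) in enumerate(history):
            if not contained:
                continue
            contained_total += 1
            if i + 1 < len(history) and history[i + 1][0] - started_at <= window:
                recontacted += 1
    return recontacted / contained_total if contained_total else 0.0


t0 = datetime(2026, 5, 1, 9, 0)
calls = [
    ("A", t0, True), ("A", t0 + timedelta(hours=30), False),  # called back within 72h
    ("B", t0, True), ("B", t0 + timedelta(hours=100), True),  # callback outside the window
    ("C", t0, True),                                          # no callback
]
print(recontact_rate(calls))  # 0.25
```

A rising re-contact rate alongside a steady containment rate is the signature of calls the AI closed without actually resolving.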

Each checkpoint maps to a known failure mode. TCPA and HIPAA violations have real financial consequences. Callers who cannot reach a human file complaints. Seventy-one percent of CX leaders rank hallucinations as a top-three governance risk. Treat this as a failure-mode prevention map, not a bureaucratic gate.

For the full picture of the enterprise voice AI landscape, from model selection to governance, see the overview article in this series. For organisations ready to move from evaluation to architecture, the guide to replicating the enterprise deployment pattern walks through the build-out step by step.

Frequently Asked Questions

What did Home Depot actually deploy for voice AI in 2026?

Home Depot launched an AI voice agent phone system for in-store customer service in May 2026, confirmed by CX Dive and Newser. In a 50-store pilot, AI agents identify why a customer is calling within 10 seconds and resolve calls four times faster than traditional phone menus. Nationwide rollout across all U.S. stores is planned within one year.

What is the difference between containment rate and deflection rate?

Containment rate is the percentage of calls fully resolved by the AI without any human transfer; the industry median is 41.2%, and optimised deployments reach 70%. Deflection rate counts any self-service interaction as deflected regardless of whether the issue was actually resolved. A 70% containment rate means 70% of callers never needed to speak to a human.

Is voice AI ROI actually real or is it vendor hype?

Analyst data and deployment case studies suggest it is real. Forrester documents 331–391% three-year ROI and a 5.4-month median payback. Gartner projects $80 billion in contact centre labour cost savings from conversational AI in 2026. Medical Data Systems generates approximately $280,000/month in automated collections at a 70% containment rate. The ROI is real, but contingent on containment rate, use case selection, and compliance architecture.

How long does it take to deploy a voice AI agent in production?

Median time from pilot to production is 4.7 months; top-quartile programmes get there in 2.6 months. The production layer itself (compliance, escalation path, monitoring) typically takes four to twelve weeks to build, regardless of how fast the platform prototype comes together.

What does voice AI actually cost per minute in production?

Retell AI’s all-in pricing is $0.07/minute covering platform, STT, LLM inference, and TTS. Vapi’s $0.05/minute becomes $0.25–$0.33/minute in production once all components are added. Managed services like PolyAI price at $150,000+/year. At $0.07/minute on a three-minute call, 100,000 calls/month costs approximately $21,000, compared to approximately $716,000 in human agent cost at ContactBabel’s $7.16 per call.

What is the pilot-to-production gap and why does it matter?

Sixty-four percent of enterprise CX teams ran agentic AI pilots in 2026, but only 27% are in full production. Pilots are controlled and low-volume — they do not require production-grade compliance, escalation paths, or monitoring. Full production means the AI handles live interactions at scale with no human review of individual calls.

How does voice AI for outbound calls differ from inbound?

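Inbound agents handle calls initiated by customers, which carries lower compliance complexity. Outbound calling falls under the TCPA’s artificial-voice rules, because the FCC’s February 2024 declaratory ruling classified AI-generated voices as “artificial voice.” That brings in the stricter Prior Express Written Consent requirement for telemarketing calls, with prior express consent still required for informational calls to mobile numbers. Outbound requires consent infrastructure, audit trails, and state-level compliance mapping before any calls go live.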
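Consent infrastructure and an audit trail can start as a hard gate in front of the dialer. A minimal sketch, assuming a hypothetical `ConsentRecord` store and applying the conservative rule this article uses for outbound (written consent for every AI call). It does not model state-level rules such as California CIPA or Florida FTSA, or what evidence makes a written consent valid.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ConsentRecord:
    # Hypothetical record; what counts as valid written consent is a legal question.
    written: bool                    # written consent captured, with evidence stored
    captured_at: datetime
    revoked_at: Optional[datetime] = None


def may_dial(phone: str, record: Optional[ConsentRecord], audit_log: List[dict]) -> bool:
    """Gate an outbound AI call on consent and record every decision for audit."""
    if record is None:
        allowed, reason = False, "no_consent_on_file"
    elif record.revoked_at is not None:
        allowed, reason = False, "consent_revoked"
    elif not record.written:
        allowed, reason = False, "consent_not_written"
    else:
        allowed, reason = True, "ok"
    audit_log.append({
        "phone": phone,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "allowed": allowed,
        "reason": reason,
    })
    return allowed


log: List[dict] = []
print(may_dial("+15555550100", None, log))  # False
print(log[-1]["reason"])                    # no_consent_on_file
```

Logging refusals as well as approvals is the point of the design: the audit trail has to show that the gate ran, not only that calls went out.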

What is a managed voice AI service and when does it make sense over a developer platform?

A managed service (PolyAI, Cognigy) provides the full package — implementation, compliance, support — for $150,000+/year. A developer platform (Retell AI, Vapi, Bland AI) provides the infrastructure for engineers to build and manage agents at per-minute pricing. Managed services make sense when compliance is complex or engineering capacity is limited. Developer platforms make sense when engineering capacity is available and cost-per-minute economics matter.

What is post-call scoring and why does it matter for enterprise deployments?

Post-call scoring uses LLM evaluation to score every AI-handled call for resolution quality, compliance adherence, and accuracy. Traditional QA programmes sample 2–5% of calls; post-call scoring achieves 100% coverage. 3CLogic’s AI Agent Evaluator, debuted at Knowledge26 in May 2026, feeds performance trends and audit trails into systems of record like ServiceNow. For healthcare and financial services, 100% QA coverage is part of the audit infrastructure required for production.
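The gap between sampling and full coverage is simple arithmetic. A short sketch, borrowing two figures from earlier in this article purely for illustration (30,000 AI-handled calls a month, and a 0.34% rate of hallucination-related complaints) and assuming incidents occur independently per call.

```python
def expected_reviewed_incidents(calls_per_month: int, incident_rate: float, coverage: float) -> float:
    """Expected number of problem calls that QA actually sees in a month."""
    return calls_per_month * coverage * incident_rate


def prob_sample_misses_all(calls_per_month: int, incident_rate: float, coverage: float) -> float:
    """Chance a random sample contains no problem calls at all (independent-call approximation)."""
    sampled = round(calls_per_month * coverage)
    return (1 - incident_rate) ** sampled


volume, rate = 30_000, 0.0034  # illustrative figures borrowed from the article
for coverage in (0.03, 1.0):
    print(coverage, round(expected_reviewed_incidents(volume, rate, coverage), 1))
# 0.03 3.1
# 1.0 102.0
```

Under these assumptions a 3% sample surfaces about three problem calls a month out of roughly a hundred, and in about one month in twenty (`prob_sample_misses_all` gives roughly 0.047) it surfaces none at all.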

