If you were a senior engineer between 2012 and 2016, you know exactly how cloud account sprawl happened. AI agent sprawl is following the same trajectory — and it is moving faster.
Fifty-nine per cent of enterprises are now moving AI agents from pilots to production, yet only 43% have formal AI governance policies. This is a practical pre-deployment governance playbook: what agent sprawl is, what IBM did to govern agents across 280,000 employees, and what to put in place before your first agent goes live. The platforms being governed are covered in our overview of the enterprise agent platform war.
What is AI agent sprawl and why does it matter before your first deployment?
Agent sprawl is the uncontrolled proliferation of AI agents across an organisation without centralised visibility, governance, or oversight. It is Gartner's term for what is rapidly becoming a board-level problem: no single view of which agents exist, no cost transparency, and no accountability when they act.
More than three million AI agents are now operating within corporations globally, and only 47.1% are actively monitored. Uncoordinated agents produce what CIO.com calls token haemorrhaging — redundant API calls from agents that duplicate context retrieval without any awareness of each other. Shadow AI-related breaches now cost an average of $4.63 million per incident.
Here is the thing: sprawl begins before a fleet exists. Without a registry or lifecycle process in place from the very first deployment, every subsequent agent inherits an ungoverned baseline. Retrofitting governance onto a running fleet is significantly harder than building it upfront — and nobody wants to be cleaning that up a decade later.
Presidio has named this the Governance Paradox: the enterprises moving fastest on AI agent adoption are simultaneously the most governance-exposed.
Why AI agent sprawl follows the same pattern as cloud account sprawl from 2012
Remember cloud sprawl? Between 2012 and 2016, business units provisioned AWS and Azure accounts directly to bypass IT procurement. By the time IT had any visibility, there were hundreds of accounts with no tagging, no cost allocation, and no decommission process.
With AI agents, the same bypass is in play. Low-code agent builders make it trivially easy for any domain expert to deploy without IT involvement. Microsoft’s Cyber Pulse 2026 survey found 29% of employees admit to using unsanctioned AI agents at work. The shadow IT layer exists before a single governance policy has been written.
IBM CIO Matt Lyteson names the parallel directly: “A lot of CIOs like myself still have a little bit of anxiety and stress over what happened in early days of cloud computing, where everyone somehow found a way to get access to a cloud account, and now we’re 10, 15, 20 years later, still cleaning some of those things up.”
Three parallels worth naming: business units bypassing IT (discovered after the fact); costs invisible until a centralised audit surfaces them; security exposure from agents with no access log, no permission boundaries, and no clear ownership.
Cloud sprawl was tamed by enforced tagging, cost allocation accounts, and centralised visibility tooling. The agent-era equivalent is an agent registry, an acceptable-use policy, and runtime monitoring — established before the second agent is deployed.
How IBM governed AI agents across 280,000 employees — the Ask IT case study
IBM deployed Ask IT — an AI-powered IT support agent — across its 280,000-employee workforce in approximately 100 days. Its resolution rate: 81–82% without human escalation. That number comes directly from Matt Lyteson, who monitors it as IBM’s primary drift detection signal. When it drops from roughly 82% toward 75%, IBM investigates immediately.
IBM built three governance layers before deploying Ask IT.
AI Licence to Drive: A certification program requiring employees to demonstrate data privacy, information security, and system integration competency before building or deploying agents. Without it, someone builds a critical agent and then says “I don’t have the skills or resources to maintain this” — an unhappy conversation for any CIO.
AI Fusion Teams: Cross-functional groups pairing business domain experts with CIO-aligned IT engineers. Together they build and own agents end-to-end.
WatsonX Governance: Runtime monitoring across the full operational stack — token usage per agent, success rates, latency, user satisfaction, and trace-backed anomaly detection.
IBM cut AI project intake from weeks to roughly five minutes to provision a full environment. The governance infrastructure is exactly what made that speed possible.
What is the AI Licence to Drive and how do you adapt it for an SMB?
The AI Licence to Drive answers a question that applies at any company size: who is certified to deploy an agent in production — not simply who has access to the tooling to build one.
A smaller-scale equivalent has three practical components.
- A documented acceptable-use policy: What can agents access? What actions require human approval? What data is off-limits? It needs to exist and be signed off before any agent touches production data.
- A review gate before production deployment: A five-minute checklist, not a committee process. Its purpose is to create a record — owner identified, purpose documented, permissions scoped.
- Registry entry as the price of admission: No registry entry means no approval to run. This is what prevents every subsequent agent being deployed informally and without oversight.
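The three components above can be sketched in code. This is a minimal, illustrative in-memory registry with a deployment gate, assuming a simple schema (name, owner, purpose, permissions, approval flag); none of the field names come from IBM's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRegistryEntry:
    """One registry record: the price of admission for any agent."""
    name: str
    owner: str                                   # a named person, not a team
    purpose: str
    permissions: list[str] = field(default_factory=list)
    approved: bool = False                       # set by the review gate sign-off

class AgentRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, AgentRegistryEntry] = {}

    def register(self, entry: AgentRegistryEntry) -> None:
        self._entries[entry.name] = entry

    def may_deploy(self, name: str) -> bool:
        # No registry entry means no approval to run.
        entry = self._entries.get(name)
        return entry is not None and entry.approved

registry = AgentRegistry()
registry.register(AgentRegistryEntry(
    name="invoice-triage",
    owner="j.doe",
    purpose="Classify inbound invoices",
    permissions=["erp:read"],
    approved=True,
))
assert registry.may_deploy("invoice-triage")      # registered and signed off
assert not registry.may_deploy("unregistered-bot")  # never registered: no approval
```

The point of the sketch is the `may_deploy` check: approval is a property of the registry record, so an unregistered agent cannot be approved by definition.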
The AI Fusion Team model is equally portable. The minimum equivalent is a named pairing: one domain expert and one technical owner for each agent, both accountable for its governance and performance. That’s it. You do not need an enterprise programme to get the principle right.
How do you set human-in-the-loop policies before deploying AI agents?
Human-in-the-loop (HITL) governance is not a binary switch. Define which actions require human review based on actual risk, using a four-tier model.
- Low risk — Read-only data access, summarisation, internal lookups: Agent proceeds autonomously.
- Medium risk — Writing to internal systems (CRM, databases, ticketing): Human approval required before first autonomous write.
- High risk — External communications (emails, notifications, API calls to third parties): Mandatory human approval gate, always.
- Critical — Financial transactions, compliance-sensitive decisions: Human-in-the-loop non-negotiable; full audit trail required.
For each agent, document which risk tier its actions fall into and the corresponding HITL policy. That document is the agent’s permission boundary.
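The four-tier model reduces to a small routing function. This is a sketch under assumptions: the action names and the action-to-tier mapping are hypothetical examples, and a real implementation would load the mapping from each agent's documented HITL policy.

```python
from enum import Enum

class RiskTier(Enum):
    LOW = 1       # read-only lookups: autonomous
    MEDIUM = 2    # internal writes: approval before first autonomous write
    HIGH = 3      # external communications: always gated
    CRITICAL = 4  # financial / compliance: non-negotiable HITL, full audit trail

# Illustrative mapping; in practice this comes from the agent's HITL document.
ACTION_TIERS = {
    "summarise_document": RiskTier.LOW,
    "update_crm_record": RiskTier.MEDIUM,
    "send_customer_email": RiskTier.HIGH,
    "approve_payment": RiskTier.CRITICAL,
}

def requires_human_approval(action: str, first_write: bool = True) -> bool:
    # Unknown actions default to the safest tier, not the most permissive.
    tier = ACTION_TIERS.get(action, RiskTier.CRITICAL)
    if tier is RiskTier.LOW:
        return False
    if tier is RiskTier.MEDIUM:
        return first_write  # autonomous only once first writes have been approved
    return True  # HIGH and CRITICAL are always gated
```

The design choice worth copying is the default: an action the policy has never seen routes to CRITICAL, so a gap in the mapping fails closed rather than open.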
The HITL policy is policy governance — what agents should do. WatsonX Governance and equivalent monitoring tools are runtime governance — what agents are doing. Both are required. Deterministic guardrails are the technical foundation for these governance policies — and they work alongside HITL to constrain autonomous action at the code level.
What are token costs and model drift, and why do they surprise CTOs after deployment?
Token cost management is the metric Matt Lyteson specifically flags as a surprise: “I can see on a daily basis last week, what did it cost me for this specific AI use case? Why did that spike? Do we have unanticipated costs because they’re using more tokens than we thought?”
Eighty-five per cent of organisations misestimate AI costs. The mechanism is token haemorrhaging — uncoordinated agents duplicate requests and run redundant pipelines. Track Orchestration Efficiency (OE): successful multi-agent task completions versus total compute cost. Low OE means agents are competing for resources and eroding ROI.
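As a worked example, Orchestration Efficiency is just completions per unit of compute spend. The figures below are invented for illustration; only the ratio matters.

```python
def orchestration_efficiency(successful_completions: int,
                             total_compute_cost: float) -> float:
    """OE: successful multi-agent task completions per unit of compute cost."""
    if total_compute_cost <= 0:
        raise ValueError("compute cost must be positive")
    return successful_completions / total_compute_cost

# Two hypothetical fleets completing the same 120 tasks:
coordinated = orchestration_efficiency(120, 40.0)    # 3.0 tasks per cost unit
duplicating = orchestration_efficiency(120, 160.0)   # 0.75 tasks per cost unit
# Same output, four times the spend: the low-OE fleet is haemorrhaging tokens
# on redundant context retrieval.
```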
Model drift is a separate risk, and it’s a sneaky one. An agent’s outputs change over time as real-world data patterns diverge from its training baseline — without any code change. A broken API call throws an exception; an agent reasoning failure produces confident, plausible output that is completely wrong. No error. No alert. No log entry.
The minimum viable monitoring set: token cost per agent workflow, success or failure rate, and a user satisfaction signal. Three metrics. Start there.
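The three-metric set can be instrumented as a single alert check. The drift floor below mirrors IBM's practice of investigating when resolution rate falls from roughly 82% toward 75%; the cost ceiling and satisfaction floor are placeholder assumptions you would replace with your own baselines.

```python
from dataclasses import dataclass

@dataclass
class AgentMetrics:
    token_cost_per_workflow: float  # e.g. spend per completed workflow
    success_rate: float             # 0.0-1.0
    satisfaction: float             # 0.0-1.0 user satisfaction signal

def drift_alerts(m: AgentMetrics,
                 drift_floor: float = 0.75,      # modelled on IBM's 82% -> 75% trigger
                 cost_ceiling: float = 0.10,     # illustrative threshold
                 satisfaction_floor: float = 0.6) -> list[str]:
    """Return alert strings; an empty list means no investigation needed."""
    alerts = []
    if m.success_rate < drift_floor:
        alerts.append(f"success rate {m.success_rate:.0%} below drift floor")
    if m.token_cost_per_workflow > cost_ceiling:
        alerts.append("token cost per workflow above alert threshold")
    if m.satisfaction < satisfaction_floor:
        alerts.append("user satisfaction signal degraded")
    return alerts

healthy = AgentMetrics(token_cost_per_workflow=0.04, success_rate=0.82, satisfaction=0.9)
drifting = AgentMetrics(token_cost_per_workflow=0.25, success_rate=0.70, satisfaction=0.5)
assert drift_alerts(healthy) == []
assert len(drift_alerts(drifting)) == 3
```

Because an agent reasoning failure throws no exception, the success-rate check is the only one of the three that catches drift; the other two catch cost and experience regressions.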
What does your governance checklist look like before the first enterprise AI agent goes live?
Enterprise AI readiness is an operating model decision, not a technology choice. Here is the checklist.
Seven-item pre-deployment governance checklist:
- Agent registry established: Every agent has a named owner, documented purpose, and scoped permissions before it runs. Not after. Before.
- Acceptable-use policy documented: Defines what agents can and cannot access, what actions require human approval, and what constitutes a governance breach.
- HITL policy written for each agent: Specifies which risk tier each agent’s actions fall into and the corresponding approval requirement.
- Token cost monitoring instrumented: Baseline API cost per workflow measured from the first run. Alert thresholds set.
- Success-rate monitoring live: A primary performance metric defined and tracked from day one — your drift detection signal.
- Agent lifecycle owner assigned: A specific person — not a team — is accountable for monitoring, iterating, and eventually retiring this agent.
- AI Licence to Drive equivalent in place: Whoever built the agent has been reviewed against the acceptable-use policy and the deployment approved through the review gate.
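The seven items above can also run as a mechanical go-live gate. This sketch maps one boolean (or named owner) per checklist item; the field names are illustrative, not a standard.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class PreDeploymentChecklist:
    registry_entry: bool
    acceptable_use_policy: bool
    hitl_policy: bool
    token_cost_monitoring: bool
    success_rate_monitoring: bool
    lifecycle_owner: Optional[str]   # a specific person, not a team
    builder_certified: bool          # AI Licence to Drive equivalent

def unmet_gates(checklist: PreDeploymentChecklist) -> list[str]:
    """Return the names of unmet checklist items; empty means clear to deploy."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]

ready = PreDeploymentChecklist(True, True, True, True, True, "j.doe", True)
blocked = PreDeploymentChecklist(False, True, True, True, True, None, True)
assert unmet_gates(ready) == []
assert unmet_gates(blocked) == ["registry_entry", "lifecycle_owner"]
```

Encoding the owner as an optional name rather than a boolean enforces the checklist's own rule: "a specific person, not a team" means an unassigned owner blocks deployment.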
Treat each agent like a new staff member: onboard deliberately, monitor continuously, retire explicitly. IBM’s 100-day Ask IT deployment across 280,000 employees was fast precisely because the governance framework existed before it started.
Note that your governance options are constrained by your platform choice — platforms vary significantly in what monitoring, registry, and lifecycle tooling they expose. And if you are still evaluating the platforms you will be governing, the governance feature set should be a primary selection criterion. The Databricks Enterprise AI Maturity Model provides a useful self-assessment framework to gauge your organisation’s readiness before committing to production.
FAQ
What is agent sprawl and how is it different from shadow IT?
Agent sprawl is Gartner’s term for uncontrolled AI agent proliferation without centralised oversight. Shadow IT describes unsanctioned software adoption. The key difference is action: an unauthorised AI agent that makes decisions, sends external communications, or modifies records is orders of magnitude more dangerous than a passive software tool.
Do I need WatsonX Governance to govern AI agents?
No. The minimum viable equivalent is three instrumented metrics — token cost per workflow, success rate, and user satisfaction signal — plus a centralised registry. WatsonX Governance is a reference implementation, not a prerequisite.
What is the IBM AI Licence to Drive and can I use it at a smaller company?
IBM’s AI Licence to Drive requires employees to demonstrate responsible AI competency before building or deploying agents. A smaller-scale equivalent: a documented acceptable-use policy, a sign-off review before production deployment, and a registry entry as the condition of approval. The principle — the right to build is earned, not assumed — applies at any company size.
How do I know if my organisation is ready to move from AI pilots to production?
The Databricks Enterprise AI Maturity Model identifies three readiness indicators: CEO-level ownership of the AI data strategy, an operating model decision on centralised versus federated governance, and at least one production agent with instrumented monitoring. If all three are absent, production deployment will create governance debt faster than it creates value.
What actions should always require a human in the loop?
Three action types always require human review: external communications sent on behalf of the organisation, financial transactions or approvals, and compliance-sensitive decisions in regulated industries. Writing to production systems for the first time should also require human approval until the agent’s behaviour is predictable.
What is the governance paradox in enterprise AI adoption?
The governance paradox, named by Presidio: the enterprises moving fastest on AI agent adoption are simultaneously the most governance-exposed. Treat governance as an enabler of sustained speed. IBM’s 100-day Ask IT deployment is the evidence that governed deployment is not slower deployment.
What is an AI agent registry and do I actually need one before my first deployment?
An AI agent registry documents every agent in production: name, owner, purpose, data access permissions, and monitoring status. Establishing it as the price of admission before the first deployment prevents every subsequent agent being deployed without it.
How do I measure ROI from enterprise AI agent deployments?
Track three categories: resolution rate (IBM’s Ask IT benchmark is 81–82%); Orchestration Efficiency (successful multi-agent task completions versus total compute cost); and human hours redirected from tier one and tier two support to higher-value work.
What is model drift and how quickly does it affect production AI agents?
Model drift is the gradual degradation of an AI agent’s outputs as real-world data patterns diverge from its training baseline — without any code change. IBM monitors Ask IT’s resolution rate as the leading indicator; a drop from 82% toward 75% triggers investigation.
Is AI governance only for large enterprises, or does it apply to SMB companies?
Governance applies at any scale, but tooling and process complexity should be proportional. Three agents need a registry, an acceptable-use policy, and instrumented monitoring — not an enterprise governance platform. The IBM case study is a reference architecture, not a copy-paste template. Return to the enterprise agent platform war overview to see how governance fits within the broader landscape.