Your AI project is stalling. The model is fine. The data pipeline is fine. But the agents keep failing in ways nobody can fully explain, and every debugging session leads back to the same murky territory: the API estate underneath everything.
That murky territory has a name: API sprawl, the uncontrolled proliferation of APIs — redundant endpoints, inconsistent contracts, untracked integrations — quietly accumulating as a by-product of team autonomy and growth. With AI agents discovering and invoking APIs at runtime, uncontrolled sprawl stops being a management headache and starts being a deployment blocker.
This article is part of our AI-ready API architecture overview series. It gives you a diagnostic framework and a vendor-neutral self-audit checklist you can complete in one working day, without buying any tooling, whether your company has 50 employees or 500. By the end you will have a score and a remediation sequence — not just a problem statement.
What is API sprawl and why does it form in every growing tech company?
API sprawl is a structural outcome of growth. It emerges from the way microservices architecture, agile delivery, and team autonomy interact — not from bad individual decisions.
The mechanics are straightforward. Microservices architecture rewards small, independently deployable services, and each one adds its own API. Agile rewards shipping speed, so documentation lags. Existing APIs are hard to find, so teams solve local problems with new local APIs and reinvent the wheel. Third-party integrations pile up. If you cannot easily answer “how many APIs do we have?” or “are there duplicates?” — you have sprawl. For context: only 10–20% of APIs in the average large organisation meet a documentation gold standard.
Kin Lane, founder of Naftiko, coined “API sediment” to describe what happens in growing organisations. Layers of WSDL, Swagger/OpenAPI, GraphQL, and now MCP accumulate as technology evolves, each leaving undocumented artefacts behind. You do not migrate from one integration standard to another — you add another layer on top.
Three mechanics turn normal growth into sprawl: API versioning debt, where multiple live versions run with no deprecation enforcement; undocumented testing or integration endpoints that were never removed; and services added without central registration during a fast acquisition or a rushed product launch.
The shadow IT analogy applies directly. Shadow IT emerges when developers bypass slow procurement. Shadow APIs emerge when they bypass slow API governance. The incentive structure produces the outcome.
Why was API sprawl tolerable before AI agents — and why is it a blocker now?
AI agents invoke APIs at runtime using semantic cues, not institutional memory. They find everything reachable — including everything unmanaged. Human developers compensate for bad documentation by calling a colleague. Agents have no colleagues to call.
This creates three categories of damage.
Discovery failure. Without a centralised catalogue, agents operate with insufficient context — making tool choices based on naming and availability rather than governance status or deprecation state.
Security exposure. Shadow APIs bypass central gateways, authentication enforcement, and rate limiting — exposing tokens through interfaces that lack the controls required for GDPR or SOC 2. At agent invocation speed, the exposure accumulates fast.
Reliability failure. Probabilistic models send requests that deviate slightly from what a backend expects, leading to JSON mismatches and silent data pollution. Zombie APIs make this worse — malformed responses trigger retry loops that nobody notices until the damage is done.
The trust consequence is the most damaging. When agents fail because of bad APIs, the failure looks like an unreliable AI product. The board sees an expensive initiative that does not deliver. The real cause stays invisible.
In 2002, Jeff Bezos required all Amazon teams to expose data only through externalizable service interfaces — no direct database access, no back-channels. The mandate was about building a clean, governable runtime surface. That is exactly what AI agents now require.
What are shadow APIs and zombie APIs, and why do they become critical when AI agents are involved?
Precise definitions matter here because the failure modes are different.
Shadow APIs are endpoints created outside formal IT or security oversight — typically for testing or quick integrations — that were never registered, reviewed for security, or connected to central gateways or authentication systems.
Zombie APIs are deprecated or retired API versions that were never formally shut down — still running, often on unpatched infrastructure, carrying old vulnerabilities and outdated libraries.
Both fall under OWASP API9:2023 — Improper Asset Management — the classification that translates this risk into language your board and auditors will recognise.
The agent-specific failure sequences are what make these dangerous at scale. For shadow APIs: an agent discovers an unregistered endpoint, invokes it, bypasses authentication and rate limiting — the session is unmonitored. Across thousands of requests, exposure accumulates silently.
For zombie APIs: an agent navigates to a deprecated endpoint that still resolves, receives a malformed response, and retries. Each retry triggers side effects — database writes, notifications, charges, downstream calls. The task fails without anyone knowing why.
The consequences are real and documented. The Optus breach traced back to a shadow API with no authentication — nearly 10 million customer records scraped. Stripe’s deprecated /v1/sources endpoint, missing modern fraud detection and rate limiting, was used to validate stolen card data across dozens of retailers. T-Mobile’s $31.5 million settlement in 2024 involved 76 million records exposed via an unmanaged API. At agent scale and speed, these exposures become proportionally harder to detect and contain.
For a deeper look at how sprawl amplifies authorisation risk, see the next article in this series.
How do I audit my company’s API estate for AI-readiness today?
One person with access to your version control system, infrastructure documentation, and API gateway configuration can complete this in a single working day with no vendor tooling. Each section scores 0 to 5. Maximum total is 25. Three thresholds: 20–25 is AI-ready; 10–19 is remediable in 90 days; below 10 means structural remediation is required before deploying AI agents.
Section 1 — Inventory (0–5)
Do you have a count of all APIs your organisation operates? (1 point). Are all APIs registered in a central location — a spreadsheet, a wiki, or a catalogue tool? (1 point). Does the registration include internal, partner, and external APIs? (1 point). Were any APIs created in the last 12 months without a registration record? (deduct 1 point per “yes”). Do you know which APIs are in production versus staging versus development? (1 point).
Section 2 — Documentation (0–5)
Does each API have an OpenAPI Specification file? (1 point). Are descriptions sufficient for someone unfamiliar with the codebase to understand the API’s purpose? (1 point). Do endpoint descriptions include example requests and responses? (1 point). Are error codes documented with semantic meaning — not just HTTP status codes? (1 point). Was documentation updated in the last 6 months? (1 point).
Section 3 — Ownership (0–5)
Does every API have a named owner — a person, not a team? (1 point). Is the owner reachable and currently employed at the organisation? (1 point). Is there a documented deprecation process? (1 point). Has the deprecation process been executed at least once in the last 12 months? (1 point). Are ownership records stored in a central, queryable location? (1 point).
Section 4 — Authentication Coverage (0–5)
Are there any APIs accessible without authentication? (deduct 2 points if yes). Are API keys the only authentication mechanism — no OAuth, no JWT? (deduct 1 point if yes). Are API keys rotated on a defined schedule? (1 point). Are authentication requirements documented in the OpenAPI spec? (1 point). Are there any APIs accessible from the public internet without rate limiting? (deduct 1 point if yes).
Section 5 — Lifecycle Status (0–5)
How many APIs have not been updated in over 12 months? (deduct 1 point per cohort of 10%). Are there APIs with no active consumers in the last 90 days? (deduct 1 point if yes). Is there a formal sunset date process for deprecated versions? (1 point). Are active API consumers notified before a version is sunset? (1 point). Is lifecycle status tracked centrally? (1 point).
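To make the arithmetic concrete, here is a minimal sketch of the tally in Python, assuming you record a raw score for each section; the section keys, the example numbers, and the clamping choice are illustrative, not part of the audit itself.

```python
# Hypothetical tally for the five-section self-audit described above.
SECTIONS = ["inventory", "documentation", "ownership", "authentication", "lifecycle"]

def classify(scores: dict[str, int]) -> str:
    # Deductions can push a raw section score below zero; clamp each section to 0-5.
    total = sum(max(0, min(5, scores[s])) for s in SECTIONS)
    if total >= 20:
        return f"{total}/25: AI-ready"
    if total >= 10:
        return f"{total}/25: remediable within 90 days"
    return f"{total}/25: structural remediation required before deploying agents"

print(classify({"inventory": 2, "documentation": 1, "ownership": 3,
                "authentication": 4, "lifecycle": 1}))
# 11/25: remediable within 90 days
```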
Below 10 means an AI agent operating over this estate will fail unpredictably. A score of 20 or above means the estate is safe for AI agent deployment.
The five dimensions also tell you where to focus. Low scores on Section 4 indicate shadow API risk — your most immediate security liability. Low scores on Section 5 indicate zombie API accumulation. Low scores on Sections 2 and 3 mean that dynamic tool discovery and MCP integration will fail even if you get agents deployed.
Use the score in your next leadership conversation: not “we have 127 APIs” but “our estate scores 11/25 and here is the 90-day path to 18/25.”
Where do you start when remediating API sprawl — and how do you do it without halting product delivery?
The remediation hierarchy has four steps, and the order matters. Each step enables the next.
Step 1 — Catalogue first. Your starting action, regardless of score. A catalogue requires no architectural changes, no code refactoring, and no platform team. It can begin as a structured spreadsheet: API name, owner, URL pattern, auth type, OpenAPI spec link, lifecycle status. That spreadsheet is immediately more valuable than no central record.
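As a sketch of what that starting spreadsheet can look like in structured form, here is a minimal Python version of a catalogue record with a completeness check; the field names mirror the columns suggested above, and the example entry is hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class ApiRecord:
    name: str           # e.g. "billing-v2" (hypothetical)
    owner: str          # a named person, not a team
    url_pattern: str
    auth_type: str      # e.g. "oauth2", "jwt", "api-key", "none"
    openapi_spec: str   # link to the spec; empty string if missing
    lifecycle: str      # e.g. "production", "deprecated", "development"

def gaps(record: ApiRecord) -> list[str]:
    """List the catalogue fields that are still empty for this API."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]

entry = ApiRecord("billing-v2", "j.doe", "/billing/v2/*", "oauth2", "", "production")
print(gaps(entry))  # ['openapi_spec'] -- registered, but no spec link yet
```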
Spotify’s Backstage is the canonical open-source catalogue implementation. Every API gets a catalog-info.yaml file alongside the code — name, description, owner, lifecycle status. Backstage implements Golden Paths: opinionated, pre-approved routes for building services that embed governance by default. The honest caveat: adoption outside Spotify is often low — around 10% — because teams burn out on maintenance. Roadie, the managed Backstage offering, removes that overhead. For teams with fewer than 50 APIs, a structured Notion template is a perfectly valid starting point. The right catalogue is the one your team will actually use. Step 1 requires one afternoon from one engineering lead. Not a sprint.
Step 2 — Governance process. Every new API must be registered before it ships; every deprecation must be announced with a sunset date; an owner must be named at creation. One governance document, one Slack channel, two to four weeks to propagate into active practice.
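One way to make the register-before-ship rule mechanical rather than purely cultural is a small pre-deployment check. The sketch below assumes the catalogue is exported as a CSV with a name column; the file name, column name, and invocation are illustrative.

```python
import csv
import sys

def is_registered(api_name: str, catalogue_path: str = "catalogue.csv") -> bool:
    # True if the API already has a row in the exported catalogue.
    with open(catalogue_path, newline="") as f:
        return any(row.get("name") == api_name for row in csv.DictReader(f))

if __name__ == "__main__":
    api = sys.argv[1]  # the API being shipped, passed in by the CI job
    if not is_registered(api):
        sys.exit(f"{api} is not in the API catalogue: register it before shipping.")
```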
Step 3 — Shift-left enforcement. OpenAPI spec linting in CI/CD catches documentation gaps before they reach production. Schema validation prevents contract drift. A one-sprint effort once you have selected your linting tooling.
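Dedicated linters cover this with far richer rulesets; purely to show the shape of the gate, here is a minimal check, assuming specs are YAML files and that any operation without a description should fail the build.

```python
import sys
import yaml  # PyYAML

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head"}

def undocumented_operations(spec_path: str) -> list[str]:
    # Collect every operation in the OpenAPI document that has no description.
    with open(spec_path) as f:
        spec = yaml.safe_load(f)
    missing = []
    for path, item in (spec.get("paths") or {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS and not (op or {}).get("description"):
                missing.append(f"{method.upper()} {path}")
    return missing

if __name__ == "__main__":
    problems = undocumented_operations(sys.argv[1])
    if problems:
        sys.exit("Undocumented operations:\n" + "\n".join(problems))
```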
Step 4 — API Scoring in CI/CD. Automated scoring gates in CI/CD prevent APIs that do not meet the readiness threshold from being promoted to production. Another one to two sprints.
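A minimal sketch of such a gate, assuming an earlier pipeline step writes the readiness score to a file; the threshold, file name, and exit behaviour are illustrative choices rather than a prescribed implementation.

```python
import sys

READINESS_THRESHOLD = 20  # out of 25, matching the AI-ready band above

def gate(score_file: str = "score.txt") -> None:
    with open(score_file) as f:
        score = int(f.read().strip())
    if score < READINESS_THRESHOLD:
        sys.exit(f"Readiness score {score}/25 is below {READINESS_THRESHOLD}: promotion blocked.")
    print(f"Readiness score {score}/25: promotion allowed.")

if __name__ == "__main__":
    gate()
```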
A team that completes only Step 1 is materially safer than one that has not started.
API-first strategy is the structural cure that prevents sprawl from recurring — and cataloguing is the gateway to it. The full strategic treatment is a separate article in this series.
What does an AI-ready API estate look like — and how do you know when you’ve reached it?
An API estate is AI-ready when five criteria are met: every API is fully inventoried in a central catalogue; documented with a machine-readable OpenAPI Specification sufficient for an LLM to understand purpose and usage without human assistance; assigned to a named, reachable owner with an active deprecation process; protected by at least OAuth or JWT — no unauthenticated public endpoints; and in a known lifecycle state with a defined sunset process.
That is the benchmark the self-audit score targets. A score of 20–25 corresponds to meeting all five criteria.
Concretely: agents can discover authorised APIs through the catalogue, interpret their contracts without human guidance, invoke them safely within defined auth boundaries, and fail gracefully when an endpoint is sunsetted. No shadow endpoints, no zombie calls, no silent retry loops.
Strategically: a governed API estate enables MCP integration, dynamic tool discovery, and eventually an Agent Management Platform. Without it, every AI initiative requires a human to manage the discovery and invocation layer manually — which removes most of the operational value of deploying agents at all.
A 200-person SaaS company with a functioning engineering team can reach 18–20/25 on the AI-readiness audit within 60–90 days without a dedicated platform team.
How does API sprawl connect to the rest of the AI-ready API architecture challenge?
API sprawl remediation is the pre-condition for every other architectural decision in this series. The dependency chain is direct.
Your team cannot implement MCP reliably without a governed API estate — MCP is agent-first, which means it risks bypassing the governance and security controls your organisation has spent years building. You cannot close the authorisation gap without knowing which APIs exist and what authentication mechanisms they use. You cannot build an API-first strategy on top of an ungoverned estate.
The five audit dimensions map directly to the articles in this cluster. Section 2 below 3? Start with why poorly described APIs break MCP at scale. Section 4 below 3? Start with how sprawl amplifies authorisation risk. Ready to prevent recurrence? Move to the API-first approach that prevents sprawl from recurring. For the full picture, the API agent-era series covers every layer.
One thing worth acknowledging: the hardest part of sprawl remediation is not technical. Getting teams to register APIs, maintain documentation, and honour sunset processes requires governance culture, not just tooling. Organisations that treat their API catalogue as a product consistently report the productivity gains that justify the investment.
The goal is an API estate a machine can navigate as reliably as a human — fully inventoried, thoroughly documented, securely authenticated, and governed through its lifecycle. Your self-audit score tells you exactly where to start.
Frequently Asked Questions
What is the difference between an API catalogue and an API gateway?
An API catalogue is a visibility and governance tool: a searchable registry of all APIs including ownership, documentation, and lifecycle status. It does not route traffic. An API gateway is a runtime infrastructure component: it manages traffic routing, rate limiting, and authentication enforcement. It does not provide discovery or governance. Both are required for an AI-ready estate, but the catalogue must come first.
What is the difference between an API catalogue and an API registry?
An API registry is narrower: a machine-readable record of API locations, versions, and schemas, optimised for service discovery. An API catalogue is broader: it adds human-readable documentation, ownership records, lifecycle status, and metadata for governance and AI agent discovery. For AI-readiness purposes, a catalogue is the correct target.
Does fixing API sprawl require shutting down live APIs?
Not immediately. The remediation hierarchy begins with inventory, not decommissioning. Shadow APIs and zombie APIs can be quarantined — removed from agent-discoverable endpoints, flagged for deprecation — without immediate shutdown. The goal is not to reduce API count but to bring every API under governance.
How long does a typical API estate remediation take for a 200-person SaaS company?
Step 1 takes one afternoon to initiate and one to two weeks to populate. Step 2 takes two to four weeks to propagate. Step 3 is a one-sprint effort. Step 4 adds one to two more sprints. A 200-person SaaS company can reach 18–20/25 on the AI-readiness audit within 60–90 days without a dedicated platform team.
What tools are available for API catalogue management for small teams without platform engineering resources?
Spotify Backstage is the most capable open-source option, but self-hosting requires ongoing maintenance. Roadie removes that overhead. SwaggerHub provides commercial catalogue management at lower setup cost. For teams with fewer than 50 APIs, a structured Notion or Confluence template is a viable starting point. The right starting tool is the one your team will actually use.
What is OWASP API9 and why is it relevant to API sprawl?
OWASP API9:2023 — Improper Asset Management covers security risks from untracked, undocumented, or improperly retired API endpoints — shadow APIs and zombie APIs both. This classification lets you translate API sprawl risk into language meaningful to boards and auditors. Remediating OWASP API9 is a direct output of completing the inventory and lifecycle dimensions of the self-audit.
Why do AI agents invoke shadow APIs and zombie APIs rather than avoiding them?
Agents discover APIs using semantic cues — endpoint naming, documentation, discoverability signals — not institutional knowledge. They cannot tell that an endpoint is unregistered or deprecated unless that status is encoded in the catalogue. Shadow APIs surface through developer documentation that was never removed. Zombie APIs are discoverable because deprecated does not mean unreachable — the endpoint still resolves, and the agent treats it as valid.
What is API-first design and how does it prevent sprawl from recurring?
API-first design means defining the OpenAPI Specification before writing implementation code. Governance requirements — documentation, ownership, auth, lifecycle — get encoded into the spec and enforced before a line of code is written. Teams cannot accidentally create shadow APIs because specification review requires registration before development starts. The full strategic treatment is in a separate article.
What happens when an AI agent hits a zombie API endpoint mid-task?
The agent sends a request to a deprecated endpoint that still resolves, gets a malformed response or empty body, and retries — because the error is unexpected. Each retry may trigger side effects: database writes, notifications, charges, downstream API calls, compounding with each iteration. The failure is invisible to stakeholders who observe only that the agent did not complete the task.
Can I fix API sprawl incrementally or do I need a dedicated project?
Incrementally. Step 1 can be started by one person with a spreadsheet and a free afternoon. Steps 2 through 4 are each bounded efforts that fit into normal sprint cycles. The key is sequencing — Step 1 before Step 2 before Step 3 before Step 4. Completing Step 1 alone puts you in a meaningfully better position than doing nothing.
How do I know if what I have is API sprawl or just a large API portfolio?
Size is not the defining criterion. A large, well-governed API portfolio is not sprawl. A 20-API estate with no documentation, no ownership records, and no deprecation process is. The defining characteristics: APIs discoverable but not registered; owners who cannot be identified; API versions live but with unknown consumers; documentation that does not exist or is not maintained. The self-audit rubric is the diagnostic — below 15/25 indicates sprawl regardless of total API count.