If your team adopted the Model Context Protocol, you made the right call. MCP is the first open standard purpose-built for connecting AI agents to APIs and external tools, and it solves that connection problem cleanly. Feel good about that work.
But here’s the thing — MCP was built to solve the connection problem, not the scale problem. And once you’re managing 50-plus APIs, inconsistent documentation, or agents deployed across multiple teams, you’re going to hit the same three issues every time: context explosion, discovery ambiguity, and governance gaps. These aren’t edge cases. They’re predictable. They happen to everyone.
This article validates MCP’s genuine role, names the three mechanisms that cause it to fail at scale, and gives you a practical decision framework for when to move beyond it. It’s part of the full API agent-era architecture series — the pillar page has the broader picture if you need it. No prior familiarity with Arazzo or Agent Experience (AX) assumed. Working knowledge of MCP is.
What does MCP actually do and why is it the right first step for enterprise AI?
The Model Context Protocol is an open standard published by Anthropic in November 2024. It defines a standardised interface through which LLM-based agents interact with external tools, resources, and APIs — so each agent integration doesn’t have to reinvent how it calls external services.
Before MCP, every AI agent implementation had its own proprietary mechanism for discovering and calling tools. This created an M×N integration problem: M models times N tools means M×N custom integrations. MCP turns that into M+N. Each model implements MCP once. Each tool implements it once. The combinatorial explosion disappears.
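To make that arithmetic concrete, a tiny sketch with hypothetical numbers:

```python
# Hypothetical estate: 4 agent frameworks and 30 tools.
models, tools = 4, 30

without_mcp = models * tools   # every pairing needs its own custom integration: 120
with_mcp = models + tools      # each side implements the protocol once: 34
```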
By December 2025, MCP had over 5,800 servers and 300 clients. OpenAI adopted it across the Agents SDK and ChatGPT Desktop in March 2025. Google DeepMind confirmed support in April 2025. In December 2025, Anthropic donated the protocol to the Linux Foundation’s Agentic AI Foundation to keep it open and community-governed. The official specification is at modelcontextprotocol.io.
MCP is well-supported across LangChain, the OpenAI Agents SDK, and most custom implementations. If you built on MCP, you built on the right foundation. What follows is a critique of raw MCP exposure at scale — not a critique of the protocol itself.
Why does MCP break at enterprise scale — and what are the three failure modes?
Nordic APIs put it plainly: “You cannot simply expose a large environment of resources and tools to your LLM and expect that to work well. It’s not MCP’s fault, because it was never intended to solve that class of problems.”
Three distinct failure modes emerge as an API estate grows. They’re predictable, not exceptional.
Failure Mode 1: Context explosion
Anthropic’s own data shows that 50 MCP tools require approximately 72,000 tokens just for definitions, and roughly 77,000 tokens in total before any task execution begins. Five MCP servers with 20 tools each give you 100 tools. Every incoming task loads all 100 definitions before doing anything useful. At that scale the context window stops being an abundant resource and becomes the bottleneck.
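A back-of-the-envelope sketch of how quickly that ceiling arrives, using the per-definition cost implied by those figures (72,000 tokens across 50 tools is roughly 1,440 tokens each) and an assumed 200,000-token context window:

```python
# Rough cost of statically loading every tool definition at startup.
TOKENS_PER_DEFINITION = 72_000 / 50   # ~1,440 tokens, implied by the figures above
CONTEXT_WINDOW = 200_000              # assumption: a 200k-token context window

for tool_count in (20, 50, 100, 200):
    overhead = tool_count * TOKENS_PER_DEFINITION
    print(f"{tool_count:>3} tools -> ~{overhead:,.0f} tokens "
          f"({overhead / CONTEXT_WINDOW:.0%} of the window) before the task starts")
```

At 200 tools the definitions alone no longer fit, and well before that they crowd out the room the agent needs for actual reasoning.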
Failure Mode 2: Discovery ambiguity
When an agent must select the right tool from dozens or hundreds of options, LLM non-determinism becomes a reliability problem. Anthropic’s evaluation data shows tool selection accuracy degrades once models see more than 30 to 50 tools. The same task can invoke different tools on different runs. Agents hallucinate API calls or misroute requests when tool descriptions are ambiguous, incomplete, or overlapping.
Failure Mode 3: Governance absence
Raw MCP exposure creates no natural chokepoint for security policy, cost controls, or access governance. Every tool in an MCP server is equally visible and equally callable. Research by Equixly found 43% of public MCP server implementations contain command injection flaws. MCP tool poisoning attacks achieve 84.2% success rates in controlled testing when agents have auto-approval enabled.
Full governance treatment is in the governance layer your scaled AI tooling requires. The point here: the absence of a policy layer compounds with every tool you add.
The MCP fulcrum
These three failure modes are the practical signals that you’ve reached the MCP fulcrum — the point at which a growing AI deployment has outgrown raw MCP exposure. It’s not a sign MCP was a mistake. It’s a sign your deployment has grown to the point where it needs architecture around MCP, not instead of it.
What is dynamic tool discovery and how does it change the scaling equation?
Dynamic tool discovery is the runtime mechanism by which AI agents select and load only the tools required for a specific task, rather than loading all definitions at initialisation.
Anthropic calls this “progressive tool disclosure” internally — load what the current task needs, defer the rest. The token reduction is structural, not incremental. Moving from static loading (77,000 tokens for 50-plus tools) to dynamic selection drops consumption to approximately 8,700 tokens — an 89% reduction in tool-related token overhead. (Source: Lunar Dev analysis.) The upfront semantic search costs roughly 500 tokens; a typical search returns 3 to 5 relevant tools at around 3,000 tokens total.
The mechanism is straightforward. When a task arrives, the agent sends the task description to a semantic search layer operating against the full API catalogue. The search returns the most relevant tool subset. The agent reasons only over that subset. Everything else is invisible for that task.
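A minimal sketch of that selection step, using cosine similarity over tool descriptions. None of this is part of the MCP protocol itself; the toy embedding, the catalogue shape, and the top-k cut-off are assumptions standing in for whatever search infrastructure you actually run.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash words into a fixed-size vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    return vec

def discover_tools(task: str, catalogue: list[dict], top_k: int = 5) -> list[dict]:
    """Return only the tools most relevant to this task; the rest stay out of context."""
    task_vec = embed(task)
    scored = []
    for tool in catalogue:
        tool_vec = embed(tool["description"])
        denom = (np.linalg.norm(task_vec) * np.linalg.norm(tool_vec)) or 1.0
        scored.append((float(task_vec @ tool_vec) / denom, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]   # the auditable shortlist
```

Only the shortlisted definitions are then loaded into the agent's context; the rest of the catalogue stays invisible for that task.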
There’s a governance benefit too: dynamic discovery produces an explicit shortlist that can be logged and audited. Tools not in the shortlist cannot be called accidentally. Contextual least-privilege, arriving naturally from the architecture.
The hard requirement: dynamic discovery needs high-quality tool descriptions to work. An agent cannot dynamically discover a tool it cannot understand from its description. That’s the AX quality dependency — which the next section addresses.
What are workflow-aligned tools and why do they add a determinism layer that raw MCP cannot provide?
Workflow-aligned tools are purpose-built, deterministic tools that abstract complex multi-API interactions into a single agent-callable interface.
Here’s a concrete example. A credit eligibility check might require six separate API calls: check customer record, verify account standing, retrieve credit score, query risk model, apply discount rules, return eligibility result. Raw MCP exposure of those six APIs requires the agent to orchestrate the sequence correctly on every run. LLM non-determinism means it may take different paths, make different intermediate calls, or stop early.
A workflow-aligned tool wraps that sequence into a single callable. The agent calls one tool. The workflow executes predictably regardless of LLM variability. Deterministic execution also means inputs can be tested, sanitised, and errors handled in actual code rather than relying on the LLM to format API requests correctly.
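A sketch of what that wrapping looks like in code. The internal client calls are stubbed and hypothetical; the point is that sequencing, input validation, and error handling live in ordinary code rather than in the prompt.

```python
from dataclasses import dataclass

# Hypothetical internal clients, stubbed so the sketch runs standalone.
def get_customer(customer_id): return {"id": customer_id, "segment": "retail"}
def in_good_standing(customer_id): return True
def get_credit_score(customer_id): return 712
def evaluate_risk(score, amount): return {"approved": score > 650 and amount < 50_000, "reason": "within policy"}
def discount_for(customer, risk): return 0.02 if risk["approved"] else 0.0

@dataclass
class EligibilityResult:
    eligible: bool
    reason: str
    discount_rate: float = 0.0

def check_credit_eligibility(customer_id: str, requested_amount: float) -> EligibilityResult:
    """One agent-callable tool wrapping the six-step sequence deterministically."""
    if requested_amount <= 0:                                       # validate inputs in code,
        return EligibilityResult(False, "amount must be positive")  # not via the LLM
    customer = get_customer(customer_id)                            # 1. check customer record
    if not in_good_standing(customer_id):                           # 2. verify account standing
        return EligibilityResult(False, "account not in good standing")
    score = get_credit_score(customer_id)                           # 3. retrieve credit score
    risk = evaluate_risk(score, requested_amount)                   # 4. query risk model
    discount = discount_for(customer, risk)                         # 5. apply discount rules
    return EligibilityResult(risk["approved"], risk["reason"], discount)  # 6. return result
```

The agent sees a single tool; the six underlying calls, their order, and their failure handling never enter its context.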
Nordic APIs describes it as analogous to API gateways from 15 years ago — a single controlled access point that abstracts backend complexity and enforces the correct interaction pattern. Workflow-aligned tools also shrink the discovery space: wrapping a six-call sequence into one callable removes five tools from the semantic search pool.
A note on the Arazzo Specification: Arazzo is the emerging standard for defining workflow-aligned tools — see the FAQ below.
MCP versus workflow-aligned tools versus direct REST invocation: how do I decide?
The primary variable is API estate size, with description quality and governance as modifiers. The sketch after the tiers below shows one way to encode this as a starting point.
- Fewer than 50 APIs, descriptions complete, governance in place: MCP is sufficient. Maintain AX quality and monitor estate growth.
- 50–500 APIs, or inconsistent or sparse descriptions: Graduate to enriched documentation and workflow-aligned tools. Dynamic tool discovery is your target architecture.
- More than 500 APIs, multiple teams consuming agents, or production AI in critical paths: Implement full dynamic discovery and a policy layer. Evaluate AMP tooling. See the governance article.
- Any scale with shadow APIs or undescribed endpoints: Do not expose via MCP. Catalogue and remediate first. See diagnosing the API estate quality problems that cause MCP to fail.
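One way to encode those tiers as a first-pass check, for teams that want the logic executable (a minimal sketch; the thresholds mirror the tiers above and the inputs are simplifications):

```python
def recommend_architecture(api_count: int,
                           descriptions_complete: bool,
                           has_shadow_apis: bool,
                           multi_team_agents: bool,
                           ai_in_critical_path: bool) -> str:
    """Encode the starting-point tiers; treat the output as a prompt for discussion, not a verdict."""
    if has_shadow_apis:
        return "Do not expose via MCP yet: catalogue and remediate the estate first"
    if api_count > 500 or multi_team_agents or ai_in_critical_path:
        return "Full dynamic discovery plus a policy layer; evaluate AMP tooling"
    if api_count >= 50 or not descriptions_complete:
        return "Enrich documentation, build workflow-aligned tools, target dynamic discovery"
    return "MCP is sufficient: maintain AX quality and monitor estate growth"
```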
A few things the tiers alone can’t capture:
MCP versus direct REST invocation. MCP is appropriate when the agent needs to discover tools at runtime or the tool ecosystem is expected to change. Direct REST invocation is appropriate for stable, well-defined single-endpoint calls. If the agent always calls the same three endpoints in sequence, a workflow-aligned tool is better than either.
The framework question. LangChain and the OpenAI Agents SDK support MCP tool integration but neither resolves context explosion, discovery ambiguity, or governance absence. These are API estate architecture problems, not framework problems. Framework selection is orthogonal to the MCP scaling decision.
Use this as a starting point, not a hard rule. A 40-API estate with poor descriptions is in worse shape than a 60-API estate with complete documentation.
What is agent experience (AX) and what do your APIs need to provide for agents to use them well?
Agent experience (AX) is the quality dimension of APIs as consumed by AI agents rather than human developers. A human developer reads documentation, infers intent from examples, and asks questions when unclear. An AI agent reads the machine-readable tool description, selects tools by semantic matching, and invokes them based on that description alone — no gap-filling intuition, no ability to ask for clarification.
What AX-quality API documentation needs in practice (a worked example follows the list):
- Tool name: Verb-noun format describing the action and object — what the tool retrieves and from where, not a version-tagged internal label.
- Description: One clear sentence of what the tool does and when to use it versus adjacent tools.
- Parameters: Named for semantic meaning (not internal variable conventions), with type, required/optional status, and a concrete example value.
- Error responses: Structured, machine-readable error codes the agent can act on — not HTTP status codes alone.
- Usage examples: At least one complete request/response example per tool.
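Concretely, a description that clears this bar might look like the sketch below: a hypothetical billing tool, written as a plain Python dict rather than any particular schema.

```python
customer_invoices_tool = {
    "name": "get_customer_invoices",   # verb-noun: the action and the object
    "description": "Retrieves paid and outstanding invoices for one customer. "
                   "Use this for billing history; use get_customer_orders for unshipped orders.",
    "parameters": [
        {"name": "customer_id", "type": "string", "required": True,
         "description": "Customer identifier", "example": "cus_8812"},
        {"name": "status", "type": "string", "required": False,
         "description": "Filter: 'paid', 'outstanding' or 'all' (default)", "example": "outstanding"},
    ],
    "error_codes": [
        {"code": "CUSTOMER_NOT_FOUND", "meaning": "No customer matches this ID; do not retry unchanged"},
    ],
    "examples": [
        {"request": {"customer_id": "cus_8812", "status": "paid"},
         "response": {"invoices": [{"id": "inv_991", "amount": 120.0, "status": "paid"}]}},
    ],
}
```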
AX quality is the prerequisite for dynamic discovery to function. Perfect dynamic discovery infrastructure still produces wrong tool selections if your descriptions are thin.
API scoring — automated quality evaluation of tool descriptions for AX fitness — can be integrated into CI/CD as a quality gate. See API-first design producing the agent experience qualities MCP requires for a full treatment.
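What such a gate actually checks is up to you. Here is a minimal sketch against the same hypothetical catalogue shape as the example above; the rules and the pass threshold are assumptions, not an established scoring standard.

```python
import sys

def score_tool(tool: dict) -> tuple[int, list[str]]:
    """Score one tool description against the AX checklist (out of 5) and list the failures."""
    failures = []
    if len(tool.get("name", "").split("_")) < 2:
        failures.append("name is not in verb_noun form")
    if not tool.get("description", "").strip():
        failures.append("missing description")
    if not all("type" in p and "example" in p for p in tool.get("parameters", [])):
        failures.append("a parameter lacks a type or an example value")
    if not tool.get("error_codes"):
        failures.append("no structured error codes")
    if not tool.get("examples"):
        failures.append("no request/response example")
    return 5 - len(failures), failures

def ci_gate(catalogue: list[dict], minimum: int = 4) -> int:
    """Return a non-zero exit code if any tool scores below the threshold."""
    exit_code = 0
    for tool in catalogue:
        score, failures = score_tool(tool)
        if score < minimum:
            print(f"{tool.get('name', '<unnamed>')}: {score}/5 ({'; '.join(failures)})")
            exit_code = 1
    return exit_code

if __name__ == "__main__":
    sys.exit(ci_gate([]))   # replace [] with your exported tool catalogue
```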
What is the five-step AI enablement ladder and where does MCP fit within it?
The five-step AI enablement ladder is a progressive model for scaling an organisation’s AI tooling, drawn from Nordic APIs’ “Beyond MCP” analysis.
Step 1 — More Tools via MCP: Connect agent infrastructure to APIs using MCP. The right starting point for most organisations.
Step 2 — Better Tool Descriptions (AX quality): Systematically enrich API documentation for agent readability. Apply API scoring in CI/CD as a quality gate. Runs in parallel with Step 1; prerequisite for Step 3.
Step 3 — Dynamic Tool Discovery: Move from static tool loading to runtime tool selection via semantic search. Required at 50-plus APIs. Breaks the context explosion ceiling.
Step 4 — Workflow-Aligned Tools: Abstract frequent multi-API interaction patterns into single deterministic callables. Required for production AI in critical paths. Replaces probabilistic chaining with predictable execution.
Step 5 — Safe/Sandboxed Tool Usage: Run agents against mocked or isolated environments before production deployment. Surfaces unexpected interaction patterns and validates workflow-aligned tools without production risk.
Beyond Step 5 — Full AMP Tooling: Gartner has named the AI Agent Management Platform as the enterprise destination for organisations operating AI agents at scale. The AMP category provides the full governance infrastructure: security gateway, curated agent library, centralised tooling management, dashboard and registry, marketplace for third-party agents, and full observability across the agent lifecycle.
The market projections signal where enterprise investment is heading. Gartner projects $15 billion in AMP spend by 2029, up from under $5 million today. By 2027, 75% of enterprises will consider their agent monitoring methodology their most important AI tool, up from 1% today.
Steps 1 and 2 run in parallel. Step 5 can begin as soon as Step 1 is in place. The steps matter; the exact sequence is approximate.
The five-step ladder sits within a broader architectural picture. If you’re deciding which layer to tackle first across your entire API estate, the AI-ready API architecture guide maps how MCP scaling, authorisation, API-first strategy, and supervised execution connect as a unified programme.
Frequently Asked Questions
Do I need to throw away my MCP implementation if I want to scale?
No. MCP stays in place at every step of the five-step ladder. What changes is what sits around it: enriched tool descriptions at Step 2, a dynamic discovery layer at Step 3, and workflow-aligned tools at Step 4. You’re not replacing the MCP server — you’re governing it better. Your investment is preserved and extended, not discarded.
What is the Arazzo Specification and do I need to learn it?
Arazzo is an OpenAPI-adjacent specification for describing multi-step API workflow sequences in a machine-readable format — effectively the definition schema for workflow-aligned tools. Whether you implement it directly depends on how you build those tools. Teams already using OpenAPI tooling will find it familiar; teams building workflow tools in code may prefer their own abstraction. Either way, it’s worth knowing as the emerging standard for this problem space.
What is the difference between MCP and an AI gateway?
MCP is a protocol — it defines how an agent communicates with tool providers. It does not enforce policy, manage costs, or provide observability. An AI gateway is an infrastructure component that enforces rate limiting, semantic caching, prompt guards, PII sanitisation, cost controls, and routing. Both can coexist: MCP handles tool communication; an AI gateway handles traffic governance.
Is MCP enough for enterprise AI or do I need something more?
MCP is sufficient for the connection problem at small API estates — fewer than 50 APIs with complete descriptions. It is not sufficient for the scale problem. Context explosion, discovery ambiguity, and governance absence are predictable failure modes that every organisation running production AI at scale will encounter. The decision framework in this article is the tool for answering that question for your specific estate.
How do AI agents consume APIs differently from human developers?
Human developers read documentation, infer intent, ask questions, and adapt over time. AI agents read the machine-readable tool description, select tools by semantic matching, and invoke them based on that description alone — no gap-filling, no ability to ask for clarification. API quality standards that are “good enough” for developers are often insufficient for agents.
What is the “MCP fulcrum” and how do I know if I’ve reached it?
The MCP fulcrum is the point at which a growing AI deployment has outgrown raw MCP exposure. Practical signals: agents invoking incorrect tools despite correct task descriptions; token overhead consuming a significant fraction of your context window; different runs of the same task producing inconsistent API call sequences; sensitive or deprecated APIs being called by agents. At 50-plus tools with static loading, context explosion becomes practically inevitable.
What is dynamic tool discovery versus static tool loading?
Static tool loading loads all MCP tool definitions into the agent’s context at initialisation — simple to implement, fails at scale due to context explosion. Dynamic tool discovery loads only the tools the agent needs for a specific task, selected at runtime via semantic search. The upfront search costs approximately 500 tokens; a typical search returns 3 to 5 relevant tools — 8,700 tokens total versus 77,000 for static loading at 50-plus tools.
How do LangChain and the OpenAI Agents SDK relate to MCP scaling?
Both support MCP tool integration and abstract much of the orchestration complexity. Neither solves context explosion, discovery ambiguity, or governance absence — these are API estate architecture problems, not framework problems. Dynamic discovery, AX quality, and workflow-aligned tool abstraction must be implemented at the architecture level regardless of which framework is used.
Where is the official MCP specification published?
The official MCP specification is published by Anthropic at modelcontextprotocol.io. The specification has been updated multiple times since November 2024. In December 2025, it was donated to the Linux Foundation’s Agentic AI Foundation to ensure it remains open and community-driven.
Where can I find Gartner’s research on AI Agent Management Platforms?
Gartner’s AMP research is a subscription-gated report. An accessible summary of the key projections is available via Gravitee’s AMP breakdown article. For the full Gartner report, access requires a Gartner subscription or your organisation’s existing research contract.
What are the security risks of raw MCP exposure at scale?
At scale, raw MCP exposure creates three governance risk vectors: undescribed or shadow APIs become agent-discoverable and callable without oversight; MCP server implementations of variable quality introduce command injection vulnerabilities (43% susceptibility per Equixly, March 2025; tool poisoning attacks achieve 84.2% success rates when agents have auto-approval enabled); and no native cost control exists at the MCP layer, meaning agents can generate runaway API call volumes. Full treatment belongs in the governance layer your scaled AI tooling requires.
Where to go from here
If the decision framework points you toward dynamic tool discovery or workflow-aligned tools, the next concrete step is the governance layer your scaled AI tooling requires — the supervised execution architecture that provides the policy layer MCP alone cannot supply.
If you’re starting further back — auditing the estate, remediating shadow APIs, building documentation for AX fitness — begin with diagnosing the API estate quality problems that cause MCP to fail.
For a complete overview of all six dimensions of AI-ready API architecture — from sprawl to revenue — see the full API agent-era architecture series.