Business

SaaS

Technology

•

Jun 22, 2026

Microsoft Foundry and the Bet That Enterprise AI Battles Are Won on Reliability, Not Capability

The enterprise AI conversation has been dominated by a single question for two years: which model is the most capable? The data from 2026 points elsewhere. Only 12% of enterprise AI agent pilots ever reach production. Seventy-four per cent of enterprises have already rolled agents back after going live. Gartner now forecasts that 40% of enterprises will demote or decommission autonomous AI agents by 2027, not because the models are not smart enough, but because the governance, observability, and operational infrastructure was not in place before deployment.

Microsoft’s response at Build 2026, a platform called Foundry, is not a model launch. It is a bet that the next phase of enterprise AI competition will be won on reliability, not benchmark scores. This pillar page frames the argument, examines the evidence, and routes you to the deep-dive articles that explore each dimension in detail across this four-part series.

In This Series

Why Most Enterprise AI Agents Never Reach Production covers the adoption statistics, the 7-gap production stack, and what separates the agents that succeed from those that do not.

Microsoft Foundry and the Infrastructure Bet Behind Enterprise AI unpacks Foundry’s architecture: the agent service, vendor-agnostic model marketplace, and CI/CD tooling.

The Governance Infrastructure Enterprise AI Agents Actually Need covers MCP security, ACS runtime enforcement, ASSERT evaluation, and the regulatory compliance layer.

How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure is the decision-support synthesis: Foundry versus AWS Bedrock versus Google Vertex AI, single-model versus multi-model strategy, and the build-vs-adopt framework.

Why is reliability more important than raw model capability for enterprise AI adoption in 2026?

Model capability, how well an LLM performs on benchmarks, has ceased to be the binding constraint on enterprise AI adoption. The constraint is now reliability: whether an agent can run in production without silent failures, cost overruns, governance breaches, or outputs that cannot be audited. Gartner’s May 2026 forecast identifies governance gaps discovered only after deployment, not model performance, as the trigger. The infrastructure that surrounds the model has become the product.

Models are commoditising rapidly. Anthropic‘s Claude, OpenAI‘s GPT, and Microsoft’s own MAI models all deliver frontier performance, and the differences between them shrink with each release. The strategic question has therefore shifted: not “which model is best?” but “which infrastructure keeps the model safe, auditable, and cost-controlled in production?” The 74% rollback rate and the 88% pilot failure rate are data points that demonstrate capability is not the bottleneck — and the full statistical breakdown confirms the pattern is structural, not transient. Fifty-nine per cent of companies are now meaningfully adopting generative AI, yet only 43% have formal AI governance policies in place. The gap is operational, not algorithmic.

Models change every quarter; governance, identity, observability, and CI/CD pipelines persist across model generations. Foundry’s announcement at Build 2026, releasing ACS, ASSERT, Guided Guardrails, and the vendor-agnostic marketplace, represents a bet that the enterprise customer wants infrastructure portability and operational dependability, not model lock-in. As one industry analysis put it, the orchestration layer could become the moat, not the model. That is the reliability-first thesis, and it is the central argument every article in this cluster examines from a different angle.

Why Most Enterprise AI Agents Never Reach Production digs into the evidence behind this shift, including the statistics, the structural gaps, and what production-readiness actually means.

What are the actual enterprise AI agent adoption statistics for 2026?

The headline statistics tell the story. Only 12% of enterprise AI agent pilots convert to production within 12 months. Seventy-four per cent of enterprises have rolled agents back after going live. IDC and McKinsey data place the enterprise AI agent market at roughly $10.9 to $12 billion, yet the pilot-to-production conversion rate has not materially improved since 2024. The gap is rooted in infrastructure and operational maturity, and it affects every industry.

The spread between the 80% of applications that now embed an AI agent, per Gartner, and the 31% of organisations actually running one in production is a 49-point gap. That is where most enterprise AI dollars are being spent, and also where most of the year’s quiet write-offs are happening. Gartner’s CIO survey puts deployed-agent adoption at just 17% in 2026, while the Cisco AI Readiness Index shows 83% of organisations plan to deploy autonomous agents but only one in three says their infrastructure is ready. The investment is high. The outcomes are not matching.

These numbers are a structural signal, not a transient phase. The consistency of the pattern across industries, regions, and model providers suggests the issue is not model quality but operational readiness. Gartner’s 40% decommissioning forecast is the forward-looking indicator: the wave of rollbacks visible today will become a wave of formal decommissioning within 18 months. Gartner’s analysis identifies the root causes explicitly: absence of autonomous agent lifecycle management, uniform-governance failures where identical controls are applied to high-risk and low-risk agents, and lack of named agent ownership. Each of those is an operational gap, not a model capability gap.

Why Most Enterprise AI Agents Never Reach Production provides the full statistical picture, including adoption rates by industry and region and the production-ready agent profile.

What is the 7-gap production stack that prevents AI agents from reaching production?

The 7-gap production stack is a diagnostic framework identifying the seven operational layers that must exist before an AI agent can run reliably in production: (1) observability and tracing, (2) evaluation and testing, (3) non-deterministic output management, (4) governance and compliance, (5) tool safety and MCP governance, (6) identity and access, and (7) cost and ROI measurement. Each missing layer compounds the risk of silent, expensive, or irreversible failure. The production readiness gap represented by the 88% pilot failure rate is not a single-gap problem. It is the cumulative weight of multiple missing layers.

Here is what each gap means in practice. Gap 1, observability: can you trace every agent decision to its source? Gap 2, evaluation: do you run automated test suites on every prompt and model change? Gap 3, non-deterministic outputs: how do you handle same-input-different-output behaviour in regulated workflows? Gap 4, governance: do you have runtime policy enforcement, or only pre-deployment checklists? Gap 5, tool safety: is every MCP server connection vetted, authenticated, and monitored? Gap 6, identity: do agents operate as scoped workload identities or user proxies? Gap 7, cost: can you attribute and cap spend per agent, per task, per model?

These gaps compound. Each individual gap is a 1.5x to 3x risk multiplier, and they stack. An agent with no evaluation coverage, no governance enforcement, and no cost attribution can silently run up thousands of dollars in a single weekend loop while producing outputs that cannot be audited. The numbers bear this out. Only 38% of production agents have automated evaluations running on every prompt change.

Why Most Enterprise AI Agents Never Reach Production walks through each gap in detail and provides the maturity model for self-assessment.

What is Microsoft Foundry and how does it differ from Azure AI Studio?

Microsoft Foundry is a production infrastructure layer for deploying, governing, and operating AI agents at enterprise scale. It is not an experimentation environment. Azure AI Studio, its predecessor, was built for model exploration, fine-tuning, and prototyping. Foundry is built for running agents in production: it provides managed agent hosting with memory persistence, CI/CD pipelines with evaluation gates, a vendor-agnostic model marketplace, integrated governance through ACS and ASSERT, and identity management via Microsoft Entra ID. The analogy is the difference between a development environment and a Kubernetes cluster.

Foundry is Microsoft’s response to the production crisis documented in the adoption statistics. Azure AI Studio answered the question “can we build an AI agent?” Foundry answers the question “can we run that agent safely, audibly, and cost-effectively across an enterprise?” The Build 2026 launch positioned Foundry as an AI app and agent factory, standing alongside Microsoft 365 and Microsoft Fabric as the third pillar of Microsoft’s enterprise platform. Foundry consolidates several previous Azure AI services: the Assistants API is replaced by the Responses API, and a unified project client replaces multiple packages.

The strategic stakes matter for enterprise decision-making. If you evaluate Foundry as a model experimentation platform, you miss the point entirely. If you evaluate it as a production operations layer that treats AI agents like real software systems, with CI/CD, canary deployments, rollback, cost management, and runtime governance, you see the strategic bet. Foundry’s production architecture assumes models will change and builds the governance and operations layers to persist across model generations. It inherits Azure’s certification stack, the broadest in the industry for regulated verticals, with private endpoints, customer-managed keys, and data residency all standard.

Microsoft Foundry and the Infrastructure Bet Behind Enterprise AI provides the full architectural walkthrough and the enterprise deployment evidence.

What does Microsoft Foundry’s agent service actually provide for production deployments?

Foundry’s Agent Service provides a managed runtime for deploying AI agents with persistent memory, automated scaling, and integrated governance. It offers two agent types: hosted agents, which are stateful with procedural, user, and session memory for complex production workloads, and prompt agents, which are stateless and lightweight for simpler, single-turn tasks. The three memory types each serve a distinct function: procedural memory retains how the agent should work across runs, user memory stores preferences across sessions, and session memory holds context within a single conversation thread. The service also includes Foundry IQ for knowledge grounding against enterprise data, the Responses API for model-agnostic routing, CI/CD blueprints with evaluation gates, and identity integration through Microsoft Entra ID where agents authenticate as scoped workload identities.

Hosted agents are the fully managed, stateful option. Each agent session runs in its own isolated execution sandbox with no shared state between sessions, no cross-session data leakage, and strong compute boundaries. They support VNET isolation and zero idle cost, meaning you pay only for active execution. Foundry IQ reimagines RAG as a dynamic reasoning process, powered by Azure AI Search and centralising RAG workflows into a single grounding API.

The operational surface is what makes Foundry a production platform rather than a model playground. The VS Code Foundry Toolkit and GitHub Copilot SDK integration provide the developer experience. CI/CD blueprints include evaluation gates that run automated test suites on every prompt, model, or tool change, along with canary deployments and rollback mechanisms. The enterprise deployments validate the scale. KPMG deployed Agent 365 across 276,000 professionals. Deloitte deployed Claude to over 470,000 employees. Standard Chartered, SoftBank, and Nasdaq are running production workloads on the platform as well. These are not pilot programmes. They are production operations running on the Foundry stack.

Microsoft Foundry and the Infrastructure Bet Behind Enterprise AI covers the agent service architecture in detail, including the hosted-versus-prompt agent decision framework and the full memory architecture.

What is Microsoft’s vendor-agnostic model marketplace strategy and why does it matter?

Foundry’s model marketplace hosts frontier models from Anthropic, OpenAI, Fireworks AI, DeepSeek, Meta, Mistral, and xAI alongside Microsoft’s own MAI models, all behind a single Responses API that abstracts model selection from agent logic. The strategic logic is that models are commoditising, so the durable enterprise product is the infrastructure layer that persists across them: governance, identity, routing, and cost management. The marketplace is the architectural expression of the reliability-first thesis.

The marketplace is an architectural bet on infrastructure portability. When an enterprise builds agents on Foundry, the governance layer, ACS, ASSERT, Entra ID, Azure API Management, applies regardless of which model runs underneath. If Anthropic releases a breakthrough model next quarter, you can route to it without rebuilding your governance stack. If a model provider changes pricing or deprecates a version, the Responses API absorbs the change. The infrastructure, not the model, is the enduring investment. Over 10,000 customers have used more than one model on Foundry, and the number using both Anthropic and OpenAI models increased 2x quarter over quarter.

Foundry’s catalog includes over 11,000 models, the broadest selection on any cloud. Competitors’ frontier models are first-class citizens, not second-tier integrations. This signals that Microsoft’s bet is on the platform layer, not on winning the model war. Thirty-seven per cent of enterprises already use five or more models in production. The marketplace architecture makes that operationally viable at scale. For the full competitive context, see how Foundry’s model strategy stacks up against AWS Bedrock and Google Vertex AI.

Microsoft Foundry and the Infrastructure Bet Behind Enterprise AI provides the full model roster and the competitive comparison against AWS Bedrock and Google Vertex AI Model Garden.

What is the Model Context Protocol and why does it need enterprise governance?

The Model Context Protocol, or MCP, is Anthropic’s open standard that lets AI agents connect to external tools and data sources. It has over 150 million package downloads and thousands of community-built servers. But alongside that adoption comes a significant security surface. Researchers have documented STDIO command injection, tool poisoning, and cross-server tool shadowing attacks. Over 1,800 MCP servers are exposed to the internet without authentication. Every MCP server connection is a potential data exfiltration vector, privilege escalation path, or compliance violation.

MCP was designed for capability, not security. The specification leaves authentication optional and tool description trust implicit. Tool descriptions are the interface, loaded directly into the AI model’s reasoning. An attacker who controls a tool description controls the model’s behaviour. The Cloud Security Alliance’s MAESTRO threat modelling framework has catalogued over 200,000 vulnerable MCP instances and at least seven confirmed high-severity CVEs spanning MCP Inspector, LiteLLM, Cursor IDE, and LibreChat. Nation-state actors have already weaponised MCP: Anthropic disclosed a Chinese state-sponsored campaign that used Claude Code plus MCP tools to run AI-orchestrated intrusions against roughly 30 organisations. Only 8% of MCP servers support OAuth, and nearly half of those have material implementation flaws.

What enterprise governance of MCP requires is a four-layer security model: transport security with TLS and mTLS, authentication and authorisation, input/output validation and content filtering, and behavioural monitoring. Foundry’s Toolboxes provide the management construct, grouping MCP tools, applying policies at the group level, and managing the full lifecycle from registration through vetting to deprecation. The unifying principle is simple: treat tool descriptions as code. Code gets reviewed, versioned, tested, and monitored. MCP tool descriptions need the same rigour.

The Governance Infrastructure Enterprise AI Agents Actually Need provides the complete MCP governance assessment framework and the secure-by-default strategy.

What is the Agent Control Specification and how does it enforce safety at runtime?

The Agent Control Specification, ACS, is an open industry specification and part of Microsoft’s Agent Governance Toolkit that enforces safety controls at five checkpoints in an agent’s execution lifecycle: input, LLM processing, state changes, tool execution, and output. Unlike pre-deployment testing, which catches known failure modes, ACS enforces policy continuously at runtime, blocking, allowing, or escalating agent actions based on deterministic YAML contracts. It is designed to work across any agent framework and is endorsed by KPMG, IBM, Arize AI, CrewAI, and Zscaler.

Each of the five checkpoints addresses a specific failure point. The input checkpoint checks whether the user prompt contains injection attempts or policy-violating instructions. The LLM checkpoint verifies that the model’s internal reasoning complies with content safety and domain constraints. The state checkpoint catches attempts to modify state the agent should not access. The tool execution checkpoint ensures the proposed tool call is authorised, rate-limited, and within scope. The output checkpoint checks for PII, toxic content, or policy violations before the response reaches the user. Each checkpoint produces an attestation, cryptographic proof that creates an audit trail for regulatory compliance.

ACS is the runtime complement to pre-deployment evaluation. Static testing, through ASSERT, Rubric, or manual QA, catches known failure modes before deployment. ACS catches novel failure modes at runtime: the unexpected tool combination, the adversarial prompt variant, the edge case that no test suite anticipated. Both are necessary; neither is sufficient alone. ASSERT converts written governance documents into executable evaluations that feed into ACS checkpoints, creating the evaluate, control, re-evaluate closed loop. ACS has been released as open source under the MIT license and is intended to remain vendor-neutral and community-driven. As Michael Bargury, co-creator of ACS, put it: “Governance cannot rely on soft guardrails or wishful system prompts.”

The Governance Infrastructure Enterprise AI Agents Actually Need provides the full five-checkpoint architectural detail and the ACS-versus-static-testing comparison.

What governance framework should enterprises put in place before deploying autonomous agents?

A production-grade governance framework needs six integrated layers: (1) agent identity, where agents authenticate as scoped workload identities via Microsoft Entra, not user proxies; (2) content safety, covering input filtering, output filtering, and abuse monitoring; (3) runtime enforcement, with ACS checkpoints that block policy-violating actions at execution time; (4) provider caps and kill switches, meaning rate limiting per agent, per model, per tool, plus emergency halts at the infrastructure level; (5) attestation and audit, providing cryptographic proof of compliance at each checkpoint; and (6) regulatory alignment, addressing EU AI Act high-risk system requirements and NIST AI RMF alignment.

Governance is an operational concern, not a compliance checkbox. Each of the six layers addresses a specific failure mode. Identity prevents privilege escalation. Content safety prevents toxic or PII-leaking outputs. ACS prevents policy violations at runtime. Caps and kill switches prevent cost runaway and large-scale failures. Attestation provides the evidence chain for auditors and regulators. Regulatory alignment ensures the framework meets the EU AI Act’s requirements for transparency, human oversight, accuracy, and robustness, with high-risk provisions effective August 2026. Non-compliance can result in fines up to 35 million euros or 7% of global annual turnover. The operational governance framework that addresses these requirements is covered in depth in the governance article.

Governance controls should be calibrated to agent risk. A document summarisation agent and a transaction-executing agent carry different risk profiles and should not be governed identically. Gartner’s four-level autonomy classification, Observe, Advise, Act with Approval, and Fully Autonomous, provides the framework for assigning governance controls proportionate to risk. Every agent should be classified by autonomy level before deployment, with circuit breakers in place and a named human owner assigned to each agent’s authority boundary. Yet only 21% of organisations have a mature governance model for autonomous AI agents, and 83% of security leaders say business units are deploying agents faster than security teams can assess them.

The Governance Infrastructure Enterprise AI Agents Actually Need provides the governance maturity self-assessment and the full six-layer framework with regulatory mapping.

How should enterprises evaluate whether to adopt Foundry versus building their own agent infrastructure?

The build-versus-adopt decision turns on five factors: agent fleet size and diversity, governance requirements, model strategy, platform team capacity, and cloud lock-in tolerance. A handful of homogenous agents with minimal governance requirements can be managed with custom infrastructure built on the Anthropic Agent SDK or OpenAI Agents SDK. Hundreds of agents across dozens of teams, with EU AI Act compliance requirements, multi-model strategies, and auditable attestation trails, demand a platform. Foundry’s integrated governance stack reduces the integration burden that custom builds must assemble from open-source and vendor components.

Each factor deserves honest evaluation. Agent fleet size: are you managing five agents or five hundred? Governance requirements: do you need EU AI Act-compliant attestation trails, or is internal policy sufficient? Model strategy: are you committed to a single model provider, or do you need optionality across Anthropic, OpenAI, and others? Platform team capacity: do you have the engineering resources to maintain the orchestration layer, governance layer, observability layer, and CI/CD pipeline yourself? Cloud lock-in tolerance: does adopting Foundry, which runs on Azure, align with or constrain your cloud strategy?

What “building your own” actually means in practice is worth understanding. You will need to assemble an MCP governance layer equivalent to Toolboxes, a runtime enforcement layer equivalent to ACS, a policy-to-evaluation compiler equivalent to ASSERT, an identity layer equivalent to Entra ID, a gateway layer equivalent to Azure API Management, and a model routing layer equivalent to the Responses API, likely using LiteLLM or Portkey. Each of these integrates with the others. Each new model provider, tool, or deployment adds integration debt. The build-versus-adopt comparison is a bet on whether your organisation’s integration capacity exceeds the platform’s integration surface. Teams underestimate time-to-value by six to twelve months on custom builds and face ongoing maintenance burden.

How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure provides the complete evaluation framework, including the competitive comparison against AWS Bedrock and Google Vertex AI.

Single-model strategy vs vendor-agnostic multi-model approach — which is the better enterprise bet?

A single-model strategy, committing to OpenAI, Anthropic, or another provider, simplifies development, testing, and cost management but creates vendor lock-in and limits your ability to adopt new models as frontier capabilities shift. A multi-model strategy, using Foundry’s marketplace or a routing layer like LiteLLM, preserves optionality, allows workload-specific model selection, and insulates against deprecation or pricing changes, but adds complexity in evaluation, routing, and cost management. The better bet depends on workload diversity and how much you value model portability over integration simplicity.

Both strategies come with their costs. Single-model: you optimise deeply around one provider’s APIs, evaluation tooling, and cost structure. Your integration surface is smaller. But you are betting that your chosen provider will remain the frontier leader, that their pricing will stay competitive, and that their model deprecation schedule will not disrupt your production agents. Companies that picked a single model provider in 2023 have started regretting the dependency. When GPT-4 Turbo underperformed on specific tasks, they had no fallback. When pricing structures changed, they had no leverage. Multi-model: you build on an abstraction layer that lets you route workloads to the best model for each task, Claude for reasoning-heavy work, GPT for code generation, and so on. Your optionality is higher, but so is your evaluation and governance overhead.

The decision is a bet on how the model market will evolve. If models continue commoditising at the current pace, with Anthropic, OpenAI, Meta, DeepSeek, and others all delivering frontier performance, the multi-model bet looks stronger because differentiation shifts to the infrastructure layer. If one provider pulls decisively ahead and sustains the lead, the single-model bet looks stronger because deep integration with the winner compounds. Foundry’s marketplace is the architectural expression of Microsoft’s bet that multi-model is the enterprise future. Not every task requires the most advanced reasoning engine, and routing simpler workloads to efficient models can materially reduce cost at scale. For the full analysis, the decision-support framework covers the economic and strategic trade-offs in detail.

How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure provides the full economic and strategic analysis, including the MAI-versus-Claude-versus-GPT model comparison.

Resource Hub: Enterprise AI Infrastructure Deep Dives

Diagnosing the Production Crisis

Why Most Enterprise AI Agents Never Reach Production covers the full statistical picture: adoption rates by industry and region, the 7-gap production stack with the integrated maturity model, and the production-ready agent profile that distinguishes the 12% that succeed. It covers Gartner’s 40% decommissioning forecast, the rollback rate differential between agents with and without evaluation coverage, and the evaluation, observability, and ROI measurement frameworks engineering leaders need.

Platform and Governance Architecture

Microsoft Foundry and the Infrastructure Bet Behind Enterprise AI covers Foundry’s complete architecture: the Agent Service with hosted and prompt agents, the three memory types, Foundry IQ for knowledge grounding, the Responses API, CI/CD blueprints, VS Code tooling, Entra ID identity integration, and the vendor-agnostic model marketplace hosting Anthropic, OpenAI, and seven other providers. Enterprise deployment evidence at scale: KPMG with 276,000 professionals and Deloitte with 470,000-plus employees.

The Governance Infrastructure Enterprise AI Agents Actually Need covers the governance layer in depth: MCP’s security surface and the four-layer assessment model, ACS’s five-checkpoint runtime enforcement architecture, ASSERT’s policy-to-evaluation compilation, Toolboxes for MCP lifecycle management, and the full six-layer governance framework. Regulatory context: EU AI Act high-risk provisions effective August 2026, NIST AI RMF alignment, and the governance maturity self-assessment.

Strategic Decision-Making

How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure covers the complete evaluation framework: Foundry versus AWS Bedrock versus Google Vertex AI compared across model breadth, agent service maturity, governance depth, and CI/CD tooling. Single-model versus multi-model strategy analysed with economic and strategic trade-offs. Build-versus-adopt decision criteria covering fleet size, governance requirements, platform team capacity, model strategy, and cloud lock-in tolerance.

Suggested reading order: Start with the production crisis article to understand the problem. Then read the Foundry platform article to understand Microsoft’s response. Follow with the governance article to understand the operational layer that makes production viable. Finish with the decision-support article to evaluate your own path.

Frequently Asked Questions

What does Gartner predict about enterprise AI agent decommissioning rates through 2027?

Gartner’s May 2026 forecast predicts that 40% of enterprises will demote or decommission autonomous AI agents by 2027, not because of model capability failures, but because of governance gaps discovered only after production deployment. The forecast identifies the absence of autonomous agent lifecycle management, uniform-governance failures where identical controls are applied to high-risk and low-risk agents, and the lack of named agent ownership as the root causes. This transforms the current 74% rollback data from a temporary statistic into a structural trend. For detailed analysis, see Why Most Enterprise AI Agents Never Reach Production.

What criteria should organisations use to assess whether an AI agent is production-ready?

A production-ready agent must satisfy five conditions: automated evaluation coverage on every prompt, model, and tool change (the strongest predictive indicator, with a 47% rollback rate without it and 9% with it); scoped workload identity with least-privilege access; runtime governance enforcement through ACS or an equivalent; cost attribution and per-task budgeting; and a defined human-in-the-loop pattern appropriate to the agent’s autonomy tier. Only 12% of enterprise agents meet all five. Without them, you are running an experiment in production. The full production-ready profile is detailed in Why Most Enterprise AI Agents Never Reach Production.

How does Foundry’s governance stack compare to AWS Bedrock Guardrails?

Foundry provides a deeper governance stack: ACS offers five-checkpoint runtime enforcement with attestation, ASSERT compiles written policies into executable evaluations, and Toolboxes manage the full MCP server lifecycle. Bedrock Guardrails provides content filtering for PII, toxicity, and prompt injection plus topic denial, but lacks runtime enforcement checkpoints, policy-to-evaluation compilation, and MCP server lifecycle management. Both platforms integrate with their respective cloud identity providers. The full governance infrastructure comparison is covered in The Governance Infrastructure Enterprise AI Agents Actually Need and evaluated in the competitive context in How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure.

Is vendor lock-in a risk with Foundry, given it runs on Azure?

Foundry introduces Azure as the runtime dependency, which is a form of cloud platform lock-in. However, Foundry’s vendor-agnostic model marketplace, hosting Anthropic, OpenAI, Fireworks AI, DeepSeek, Meta, Mistral, and xAI alongside Microsoft’s own MAI models, is explicitly designed to prevent model-level lock-in. The Responses API abstracts model selection, meaning you can switch model providers without rewriting agent logic. The trade-off is cloud platform lock-in on Azure in exchange for model portability across vendors. Whether that trade-off works for your organisation depends on your existing cloud commitments and model strategy. See How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure for the full build-versus-adopt evaluation.

What is ASSERT and how does it convert policies into executable agent evaluations?

ASSERT, Adaptive Spec-driven Scoring for Evaluation and Regression Testing, is Microsoft’s open-source framework that converts natural-language or structured policy documents into executable evaluation scenarios. It generates targeted test cases from policy statements, scores agent behaviour against those policies, and produces inspectable judge rationale, the reasoning behind each pass or fail. It works across LangChain, CrewAI, LiteLLM, and OpenAI, making it framework-agnostic. ASSERT feeds its results into ACS checkpoints, creating the closed loop where policy violations detected in evaluation trigger runtime controls. The full architecture is covered in The Governance Infrastructure Enterprise AI Agents Actually Need.

Can enterprises use the MCP standard without adopting Foundry?

Yes. MCP is an open specification developed by Anthropic, and you can use it independently with any agent framework that supports the protocol. However, using MCP without a governance layer, whether Foundry’s Toolboxes and ACS or a custom-built equivalent, means accepting the protocol’s documented security risks: unauthenticated tool connections, tool poisoning vulnerabilities, and cross-server attack surfaces. Only 8% of MCP servers support OAuth, and over 1,800 are exposed to the internet without authentication. The question is not whether you can use MCP without Foundry; it is whether you have built the governance infrastructure to use it safely. The four-layer MCP security assessment model in The Governance Infrastructure Enterprise AI Agents Actually Need provides the framework for evaluating your readiness.

How should enterprises measure ROI on AI agent deployments beyond task completion rates?

Task completion rate is a necessary but insufficient metric. Enterprise AI ROI should be measured across four dimensions: throughput improvement, time-to-resolution reduction versus the human baseline; accuracy uplift, error-rate improvement specifically for non-deterministic outputs; cost efficiency, cost-per-task versus human cost-per-task including infrastructure, evaluation, and governance overhead; and capability expansion, new workflows enabled that were previously infeasible. Deloitte’s Return-on-Autonomy framework extends this further to capture decision quality and customer outcome improvement, dimensions that cost-based ROI models miss. Only 44% of organisations have adopted AI FinOps practices to track these metrics systematically. For the full ROI framework, see Why Most Enterprise AI Agents Never Reach Production.

Microsoft Foundry vs AWS Bedrock vs Google Vertex AI — which is best for enterprise AI agent deployment in 2026?

There is no single best platform. The answer is conditional on your context. Foundry leads on governance depth with ACS and ASSERT plus Toolboxes, vendor-agnostic model breadth, and developer tooling including the VS Code Toolkit and CI/CD blueprints, but requires Azure commitment. Bedrock leads on maturity, AWS ecosystem integration, and FedRAMP compliance posture. Vertex AI leads on model training and MLOps integration, European regulatory posture, and Gemma 4 economics at $0.13 per million tokens. If governance is your binding constraint, Foundry has the deepest stack. If AWS is your cloud, Bedrock is the natural choice. If model training is central to your workflow, Vertex AI may lead. The full three-way comparison across model marketplace, agent service, governance, and CI/CD dimensions is in How to Choose Between Microsoft Foundry and Building Your Own Agent Infrastructure.

Enterprise AI is at an inflection point. The model capability conversation that dominated 2023 and 2024 has run its course. The question now is not whether a model can do the work. It is whether you can run that model safely, audibly, and cost-effectively at scale, across hundreds of agents, under real regulatory obligations, without becoming another entry in the 74% rollback statistic.

The four articles in this cluster walk you through that question from every angle. If your agents are not making it to production, start with the production crisis article. If you are evaluating platforms, start with the Foundry explainer and the build-versus-adopt comparison. If governance is keeping you awake at night, the governance article is where you go first. The problem is well-defined, the data is clear, and the frameworks exist. The only remaining variable is whether you put them to work before the decommissioning wave reaches your deployment.