Insights Business| SaaS| Technology SAFE-MCP: The Security Framework That Defines the Enterprise MCP Adoption Baseline
Business
|
SaaS
|
Technology
Apr 17, 2026

SAFE-MCP: The Security Framework That Defines the Enterprise MCP Adoption Baseline

AUTHOR

James A. Wondrasek James A. Wondrasek
Graphic representation of SAFE-MCP enterprise security framework for the Model Context Protocol

Gartner projects AI cybersecurity spending to grow more than 90% in 2026. That money is chasing a real and rapidly expanding attack surface — and MCP deployments are one of the fastest-growing contributors to it.

Here’s the thing: MCP introduces a categorically different security risk profile than standard API integrations. Organisations that treat it like a conventional API gateway will leave gaps. And attackers are already exploiting those gaps. Tool poisoning attacks achieve 84.2% success rates in controlled testing when agents have auto-approval enabled.

The SAFE-MCP framework — developed by the Coalition for Secure AI (CoSAI) and published under OASIS Open — gives you the structured threat taxonomy and enterprise adoption baseline you need to make a governed “yes with controls” decision on MCP. This article decodes the four SAFE attack codes (SAFE-T1001, SAFE-T1007, SAFE-T1102, SAFE-T1111), defines the enterprise security baseline, and gives you a structured CTO decision framework to work with. If you need to understand what MCP is and why it matters before reading this analysis, start there.


Why does MCP create a different class of security risk than standard API integrations?

In a traditional API integration, the application decides exactly what data the model sees. In MCP, the agent autonomously retrieves and acts on external content — and that entire data pipeline is attacker-addressable. That’s the core difference, and it matters enormously.

Four structural properties produce security risks with no equivalent in traditional API threat models.

Dynamic tool discovery means agents load tool metadata at runtime, not build time. A compromised MCP server can serve manipulated tool definitions to any connecting agent without touching a single line of application code.

Autonomous context retrieval means the agent — not the application — decides what external content to fetch and process. Every document retrieved, every API response processed, every resource read is a potential injection surface that bypasses input-layer filters entirely.

Delegated OAuth permissions means MCP servers commonly authenticate using broad service identities rather than user-bound scoped tokens. That’s the confused deputy problem: a compromised tool can act with the full permissions of the MCP server, not just the requesting user.

Chained agent execution means context poisoning propagates. In multi-agent pipelines, a compromised upstream MCP client passes false context into shared state, and the downstream agent treats it as trusted — without any interaction with the original user.

This is why existing API gateways aren’t enough. MCP security risks emerge in the LLM‘s context window after the gateway has already passed the content through. CoSAI has catalogued more than 80 attack techniques across 14 tactic categories that have no equivalent in traditional API threat models.

For a detailed explanation of MCP’s host/client/server architecture and how each component maps to this threat surface, see the architecture explainer.


What is SAFE-MCP and how does it adapt MITRE ATT&CK for agent tooling?

SAFE-MCP is a structured threat taxonomy covering 14 tactic categories and 80+ attack techniques specific to MCP deployments. It’s hosted by the Linux Foundation and supported by the OpenID Foundation. Not a checklist — a living catalogue of tactics, techniques, and procedures (TTPs) for MCP-based systems.

It adapts MITRE ATT&CK — designed for network and endpoint environments — to the LLM-plus-tool surface, where the “endpoint” is an AI agent’s context window and tool call stack.

CoSAI published the foundational MCP Security white paper under OASIS Open on 27 January 2026. Co-creators Frederick Kautz, Arjun Subedi, and Bishnu Bista lead a community with contributors from eBay, Okta, Red Hat, Intel, American Express, and Premier Sponsors including Google, IBM, Meta, and Microsoft. That breadth signals real production deployment scenarios, not theoretical risk.

SAFE-MCP’s regulatory alignment spans CISA, ENISA, and NIST AI security guidance — which means your organisation can demonstrate compliance with emerging requirements without custom mapping work. If you are still evaluating whether MCP is the right protocol choice, our MCP overview covers the full architecture and adoption context before you commit to the security baseline work below.


What are the four MCP attack vectors: SAFE-T1001, SAFE-T1007, SAFE-T1102, and SAFE-T1111?

The four SAFE codes target distinct layers of the MCP stack. A control that addresses one does not necessarily address the others. You need to understand each on its own terms.

SAFE-T1001 — Tool Poisoning: Malicious instructions in tool metadata or response payload redirect the agent to exfiltrate credentials or execute unauthorised actions. It persists across all sessions using the compromised tool — simply loading the tool description into the agent’s context can trigger the attack. Real-world incidents have included compromised GitHub repositories and SSH credentials across major AI platforms. Mitigation: treat tool metadata as untrusted; cryptographic signing; re-approval on updates.

SAFE-T1007 — OAuth Consent Abuse: This is the highest-surprise risk for enterprises with existing OAuth infrastructure, because your current setup does not transfer safely to MCP. A user who grants “read calendar” permissions may be served by an MCP server holding broader service scopes — a poisoned tool call can trigger calendar-write or email-send actions using those broader permissions. Mitigation: per-user scoped tokens; PKCE mandatory; audience restriction.

SAFE-T1102 — Indirect Prompt Injection: Malicious instructions embedded in a document or web page retrieved by the agent override legitimate user instructions. Because MCP agents retrieve far more external content than standalone LLMs, every retrieval is a potential injection surface. Mitigation: segregate untrusted content; sanitise before processing; deny tool invocations triggered by external content instructions.

SAFE-T1111 — Agent CLI Weaponisation: Almost always a second-stage attack, requiring a preceding T1001 or T1102 event. CVE-2025-53967 demonstrated this concretely, allowing remote code execution through command injection in Figma‘s MCP server. Mitigation: sandboxed execution environments; no production infrastructure access; allowlist-only network egress.

One distinction worth being clear about: T1001 attacks via what the tool returns. T1102 attacks via external data the agent retrieves from other sources. Related attack classes, different layers of the stack.


What is the non-negotiable MCP security baseline for enterprise adoption?

If you’re deploying MCP at scale, there are five controls you need before going to production. These are the minimum — a baseline, not a complete security programme.

The Non-Negotiable Enterprise MCP Security Baseline:

  1. OAuth 2.1 with user-bound scoped tokens (addresses SAFE-T1007)
  2. MCP gateway for policy enforcement and audit logging (addresses SAFE-T1001, T1102, T1111)
  3. Tool allowlisting and capability scoping (addresses SAFE-T1001, T1111)
  4. Sandboxed CLI and filesystem access (addresses SAFE-T1111)
  5. Human-in-the-loop approval gates for high-impact actions (addresses all four codes)

Control 1 — OAuth 2.1 with user-bound scoped tokens

The MCP specification mandates five OAuth 2.1 authorisation patterns: per-user scoped tokens, Proof Key for Code Exchange (PKCE), resource indicators (RFC 8707), Protected Resource Metadata (RFC 9728), and token expiry tied to task completion. Token lifespans of 15–60 minutes, with refresh token rotation.

These are requirements, not recommendations. Every agent-to-tool interaction must use a token scoped to the minimum permissions required for that specific task. Broad service identities must go.

Control 2 — MCP gateway for policy enforcement and audit logging

An MCP gateway provides a governed entry point for all agent-to-tool traffic that enforces policy, logs every tool invocation, and applies rate limiting. Audit logs must capture the full chain: user prompt → tool selection → tool input → tool output → downstream API calls. Without end-to-end correlation, forensic investigation becomes effectively impossible.

SOC Prime‘s Uncoder AI is the concrete reference here — strong identity chains, tool allowlists with input/output validation, and centralised logging tying prompts to tool calls and downstream actions. Enterprise-grade production deployment with these controls is achievable, not theoretical.

Control 3 — Tool allowlisting and capability scoping

Define explicitly which tools each agent role is permitted to call. Deny-by-default for unlisted tools. Scope permissions to the minimum required — least-privilege reduces blast radius from both misconfiguration and active attacks. Treat tool metadata as untrusted and require cryptographic signing of approved tool versions.

Control 4 — Sandboxed CLI and filesystem access

CLI and filesystem tools must operate in isolated execution environments with no access to production infrastructure, credentials stores, or network egress outside an allowlist. Sandboxing is required for any agent with shell or file tool access. Without it, a single second-stage attack can reach production infrastructure.

Control 5 — Human-in-the-loop approval gates

Any tool action classified as high-impact — create, modify, delete, pay, escalate privileges, external communication — must require explicit human approval before execution. Approval gates are an architectural safeguard, not a UX friction choice. They’re the last line of defence against a successful injection or poisoning event reaching production consequences.

For guidance on operationalising these controls and building governance playbooks, see how to govern and operationalise MCP server infrastructure across an engineering organisation.


What does SAFE-MCP give you that building your own MCP security controls from scratch does not?

Four things that custom controls can’t replicate: a shared vocabulary for cross-team communication, a complete TTP catalogue for structured red team exercises, documented regulatory alignment, and institutional credibility with auditors and procurement teams.

Shared vocabulary. When a security lead tells a CTO “we need to address T1007 before go-live,” both parties are referencing an externally published, institutionally governed standard. Internal documentation doesn’t carry that weight in a compliance audit.

TTP catalogue for red team exercises. Custom controls address the risks the team already knows about. SAFE-MCP’s 80+ techniques catalogue attacker goals and enabling conditions that most development teams haven’t yet encountered. Without it, red teams scope exercises against generic API threat models that miss the MCP-specific attack surface entirely.

Regulatory alignment without custom mapping work. SAFE-MCP’s governance context spans CISA, ENISA, and NIST AI security guidance — your organisation demonstrates compliance without building custom mappings from scratch.

Institutional maintenance. CoSAI updates the taxonomy as new techniques are discovered. An in-house framework requires dedicated security engineering to maintain the same currency. SAFE-MCP amortises that cost across the entire ecosystem.

Custom controls may be technically equivalent, but they carry higher audit risk, higher maintenance cost, and higher knowledge concentration risk. When the engineers who built them leave, so does the institutional knowledge of why. SAFE-MCP’s open governance model eliminates that single point of failure.


Is MCP secure enough for enterprise use?

Yes, with controls. The right question isn’t “is MCP safe?” It’s “have we implemented the controls that make MCP safe in our environment?” SAFE-MCP gives you the checklist that answers that objectively.

The CTO decision matrix:

Green light — proceed to production:

Amber — hold, remediate before production:

Red — not yet, security review required:

Gartner’s projection of more than 90% AI cybersecurity spending growth in 2026 reflects market acknowledgement that the answer to MCP adoption is “yes with investment in controls.” The implementation path is documented, the framework is governed, and the regulatory alignment is established.

For the broader context for MCP adoption decisions, see our MCP guide. For governance and operationalisation playbooks that build on the five-control security baseline, see the guide to operationalising MCP server infrastructure across an engineering organisation.


Frequently Asked Questions

What is SAFE-MCP and who created it?

SAFE-MCP is an open-source security specification for identifying and mitigating attack vectors in MCP-based AI systems. It covers 14 tactic categories and 80+ attack techniques, each with mitigation and detection guidance. CoSAI published the foundational white paper under OASIS Open on 27 January 2026. Co-led by Frederick Kautz, Arjun Subedi, and Bishnu Bista, with contributors from eBay, Okta, Red Hat, Intel, and American Express, and Premier Sponsors including Google, IBM, Meta, and Microsoft.

Does SAFE-MCP apply to all MCP deployments or only large enterprise ones?

Any production MCP deployment, not just large enterprises. The threat vectors exist at any scale. Smaller organisations benefit most from the shared vocabulary and TTP catalogue precisely because they lack dedicated security engineering to build equivalent frameworks from scratch. The five-control baseline is implementable at any scale; the controls aren’t complex, they’re discipline.

Is prompt injection in MCP the same as prompt injection in direct LLM use?

No. Direct prompt injection involves a user crafting malicious inputs within the prompt itself. Indirect prompt injection (SAFE-T1102) involves malicious instructions embedded in external data the agent retrieves autonomously — a document, a web page, an API response. MCP dramatically amplifies this attack class because agents retrieve far more external content than standalone LLM deployments, and those instructions can trigger real tool actions, not just generate incorrect outputs.

What is the difference between tool poisoning (SAFE-T1001) and indirect prompt injection (SAFE-T1102)?

Tool poisoning attacks via what the tool returns in its response payload — the attack is delivered through the tool’s own output. Indirect prompt injection attacks via external data the agent retrieves from other sources. Key distinction: tool poisoning persists across all sessions and can trigger without the tool being invoked — loading the tool description into the agent’s context is sufficient.

What is the confused deputy problem in MCP and how does it differ from standard OAuth misuse?

The confused deputy problem (SAFE-T1007) occurs when an MCP server uses a broad service identity to authenticate to downstream services on behalf of a user, but that identity holds permissions far exceeding what the task requires. Standard OAuth misuse typically involves a single application overreaching. In MCP, the agent chains multiple tool calls — each inheriting the broad service identity — so a single confused deputy event can cascade across an entire workflow.

What OAuth 2.1 patterns does the MCP specification require?

Five: per-user scoped tokens, Proof Key for Code Exchange (PKCE), Resource Indicators (RFC 8707), Protected Resource Metadata (RFC 9728), and token expiry tied to task completion. Token lifespans of 15–60 minutes are recommended. These are requirements, not recommendations — every agent-to-tool interaction must use a token scoped to the minimum permissions required for that specific task.

What audit logging does an enterprise MCP deployment need?

Capture the complete chain: authenticated identity, user prompt, tool selection decision, tool input parameters, tool output, downstream API calls, and policy decisions. Correlate everything with a shared ID so a single forensic query can reconstruct prompt to consequence. Retain logs to satisfy applicable compliance requirements (SOC 2, ISO 27001, HIPAA as relevant).

How does multi-agent chaining change the MCP security threat model?

In single-agent deployments, a successful T1102 injection affects one agent’s context window. In multi-agent pipelines, a compromised upstream MCP client passes false context into shared state — each downstream agent treats that corrupted context as trusted and acts on it. This context poisoning propagation requires inter-agent context validation controls, not just per-agent controls.

Is MCP safe for regulated industries — financial services, healthcare, government?

Yes, with controls. SAFE-MCP’s governance lineage — OASIS Open, with CISA/ENISA/NIST alignment — and the five-control baseline provide a documented foundation for compliance arguments. Regulated industries typically add sector-specific controls on top: data residency requirements, extended audit log retention, and approval gate documentation for regulatory examination.

What is the SAFE-MCP GitHub repository and how can teams contribute?

The SAFE-MCP taxonomy is maintained as an open repository under Linux Foundation / OpenID Foundation governance. Contribute attack technique documentation, mitigation patterns, and red team exercise playbooks through standard open-source processes. It provides access to the evolving threat landscape before techniques are widely published — and before they appear in the next production incident.

Why can’t an existing API gateway substitute for MCP-native security controls?

Traditional API gateways validate what is sent and received, but not what an LLM does with the response content. MCP security risks emerge in the LLM’s context window after the gateway has already passed the content through. Tool poisoning, indirect prompt injection, and confused deputy abuse all occur downstream of the API gateway boundary. An MCP gateway must operate with awareness of agent context, tool chains, and semantic content.

What does “zero trust for agentic AI” mean in the context of MCP deployments?

No agent-to-tool request is trusted by default — every tool call must be authenticated, authorised against the minimum required scope, and logged, regardless of prior behaviour in the same session. Every tool invocation is a new authentication event; no implicit trust accumulates; all tool capabilities are deny-by-default unless explicitly allowlisted. Short-lived scoped tokens and refresh token rotation operationalise this at the token level.

AUTHOR

James A. Wondrasek James A. Wondrasek

SHARE ARTICLE

Share
Copy Link

Related Articles

Need a reliable team to help achieve your software goals?

Drop us a line! We'd love to discuss your project.

Offices Dots
Offices

BUSINESS HOURS

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Sydney

SYDNEY

55 Pyrmont Bridge Road
Pyrmont, NSW, 2009
Australia

55 Pyrmont Bridge Road, Pyrmont, NSW, 2009, Australia

+61 2-8123-0997

Yogyakarta

YOGYAKARTA

Unit A & B
Jl. Prof. Herman Yohanes No.1125, Terban, Gondokusuman, Yogyakarta,
Daerah Istimewa Yogyakarta 55223
Indonesia

Unit A & B Jl. Prof. Herman Yohanes No.1125, Yogyakarta, Daerah Istimewa Yogyakarta 55223, Indonesia

+62 274-4539660
Bandung

BANDUNG

JL. Banda No. 30
Bandung 40115
Indonesia

JL. Banda No. 30, Bandung 40115, Indonesia

+62 858-6514-9577

Subscribe to our newsletter