More than 80% of Fortune 500 companies are already running active AI agents, and Gartner estimates that 40% of enterprise applications will integrate task-specific agents by the end of 2026. If you are building on a SaaS platform, processing financial transactions, or handling health data, there is a reasonable chance AI agents are already operating inside your systems — and a meaningful chance some of them are not sanctioned by your IT team.
Unlike APIs and service accounts, AI agents reason, plan, chain access across systems, and execute multi-step tasks without human approval at each step. The same capabilities that make them valuable make them a new category of security risk — one that spans six distinct domains, each requiring controls that traditional security tools were not designed to provide.
Each section below answers one core question and links to the full cluster article for any reader ready to go deeper.
Where does your concern sit right now?
Use this table to route directly to the relevant article if you have a specific concern.
| Your current concern | Start here |
| --- | --- |
| You are worried about what damage an AI agent with broad permissions can do | AI agents are the new insider threat |
| You need to restrict what credentials AI agents can hold and when | Zero standing privilege for AI agent identities |
| You want to understand how malicious content can hijack your agent’s actions | Prompt injection at tool-chain scale |
| Your agents use MCP or external tool integrations | AI supply chain attacks and MCP security |
| You need to isolate AI-generated code execution from your production systems | AI agent sandboxing beyond Docker |
| You need to build observability and governance for agents already in production | AI agent governance and monitoring programme |
What is the AI agent attack surface and why is it different from traditional software security?
The AI agent attack surface spans six distinct domains — insider threat, identity, prompt injection, supply chain, sandboxing, and governance — each a class of risk that traditional security controls were not designed to address. Unlike a web application with fixed inputs and outputs, AI agents dynamically retrieve content, call tools, maintain memory across sessions, and act on behalf of users at runtime. Each of those capabilities is an attack vector in its own right.
AI agent security boundaries are defined in natural language — any text the agent retrieves could, in principle, attempt to override its instructions. This property has no equivalent in conventional application security. The six domains are interconnected layers: a failure in any one amplifies exposure in others, and no single control is sufficient. The OWASP AI Agent Security Cheat Sheet is the authoritative cross-domain taxonomy and the most useful starting reference for teams mapping their own exposure.
Why are AI agents described as the new insider threat in 2026?
Palo Alto Networks named AI agents the biggest insider threat of 2026 because they share the defining characteristics of a malicious insider: authorised access to sensitive systems, autonomous operation, and the ability to cause serious harm before detection. An agent granted broad, persistent permissions across your CRM, database, email, and code repositories is the functional equivalent of a privileged employee with no audit trail and no work hours.
An agent’s access footprint is dynamic — it expands based on what the agent needs at runtime, giving adversaries an insider capable of silently executing trades, deleting backups, or exfiltrating databases. The accidental pathway is just as real: a developer who gives a coding agent access to production secrets without scoping that access has created an insider-threat-class exposure through ordinary behaviour. Microsoft Cyber Pulse data shows 29% of employees already use unsanctioned AI agents — ungoverned identities with inherited credentials and no IT visibility.
For the full treatment — including the Non-Human Identity (NHI) framework and a response model for teams without a dedicated security engineer — read why AI agents are the new insider threat and the mental model for AI agent risk that underpins every domain in this guide.
Why do AI agents require a fundamentally different identity model than human users or service accounts?
AI agents are a new category of non-human identity (NHI) — not users with predictable work patterns, not service accounts with fixed role assignments, and not API keys with defined scopes. They operate continuously, chain access across systems within a single session, and request resources dynamically. Legacy Privileged Access Management (PAM) was designed for humans and breaks down when applied to agents that never sleep and never have a fixed set of needs.
Zero Standing Privilege (ZSP) is the evolved standard: no persistent permissions; access granted when needed and revoked immediately after. This differs from “least privilege,” which minimises access but still allows it to persist. The CrowdStrike acquisition of SGNL for $740 million in early 2026 signals that this shift has reached board-level priority. Cloud-native primitives — AWS IAM condition keys, Azure Managed Identities, Okta Workflows — are a proportionate starting point for teams without a dedicated identity platform.
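To make the JIT pattern concrete, here is a minimal Python sketch using boto3 and AWS STS: a credential broker mints short-lived, task-scoped credentials for a single agent run instead of handing the agent a standing key. The role ARN, session policy, and DynamoDB scope are placeholders for whatever your agent actually needs, not a prescribed configuration.

```python
import json
import boto3

def issue_agent_credentials(task_id: str, table_arn: str) -> dict:
    """Mint short-lived, task-scoped credentials for one agent run (Zero Standing Privilege)."""
    sts = boto3.client("sts")
    # Inline session policy narrows the assumed role to the single resource this task needs.
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,
        }],
    }
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/agent-worker",  # hypothetical role
        RoleSessionName=f"agent-{task_id}",
        DurationSeconds=900,                # credentials expire after 15 minutes
        Policy=json.dumps(session_policy),  # scope is decided at issue time, not stored on the agent
    )
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```

The same shape works with any provider that supports time-boxed, policy-scoped credential issuance; the point is that the agent itself never holds anything that outlives the task.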
Read the full guide to zero standing privilege and identity security for AI agents for Just-In-Time (JIT) access flows, the Shared Signals Framework (SSF), and a decision framework showing how dynamic authorisation replaces legacy PAM for agents at any scale.
How do prompt injection attacks work against AI agents, and why are they harder to stop than chat-level jailbreaks?
In an agentic system, prompt injection targets workflows, not chat interfaces — malicious instructions embedded in a document, email, web page, or API response are retrieved by the agent and interpreted as legitimate directives, without the attacker ever touching the agent directly. Every retrieval source is a potential attack vector: one injected instruction in one retrieved file can trigger a cascade of real-world tool calls.
Data exfiltration via tool calls is the primary outcome: a hijacked agent embeds credentials in a normal API parameter, moving secrets through channels that look like ordinary traffic. Multi-agent architectures amplify the blast radius: a compromised agent can propagate poisoned instructions to other agents through shared memory. Input sanitisation alone is not sufficient; layered defences are required.
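One of those layers can sit directly in the tool-execution path. The sketch below is an illustrative gate, not a complete defence, and the allowlisted host and secret patterns are placeholders: it inspects a proposed tool call before execution and refuses calls whose arguments point at unapproved hosts or carry credential-shaped strings.

```python
import re
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.internal.example.com"}  # hypothetical egress allowlist
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")

def vet_tool_call(tool_name: str, arguments: dict) -> None:
    """Run between the model's proposed tool call and its execution; raise to block."""
    flat = " ".join(str(v) for v in arguments.values())
    # Block exfiltration to hosts outside the approved set.
    for url in re.findall(r"https?://\S+", flat):
        host = urlparse(url).hostname or ""
        if host not in ALLOWED_HOSTS:
            raise PermissionError(f"{tool_name}: egress to {host} is not allowlisted")
    # Block arguments that look like they carry credential material.
    if SECRET_PATTERN.search(flat):
        raise PermissionError(f"{tool_name}: credential-shaped content in arguments; escalate to human review")
```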
Read how prompt injection operates at tool-chain scale and what to do about it for the full attack taxonomy and a practical defensive playbook — including why the prompt injection threat in multi-agent systems is categorically different from anything in conventional application security.
What are agentic tool chain attacks and why can’t static code analysis detect them?
Tool poisoning, tool shadowing, and rugpull attacks are a new class of supply chain threat specific to AI agent architectures. All three operate through the Model Context Protocol (MCP) and exploit the reasoning layer, not the code layer — because the attack payload lives in natural language tool description metadata or in dynamic runtime behaviour, no static analysis tool can detect them.
Tool poisoning embeds malicious instructions in an MCP tool’s description metadata; tool shadowing shapes how the agent constructs parameters for a separate, legitimate tool at inference time; rugpull attacks introduce hidden exfiltration behaviour into a clean MCP server after integration — analogous to an npm package being compromised after your dependency audit. One compromised server affects every agent that trusts it.
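Because the payload lives in natural language metadata rather than code, one practical complement to review is to fingerprint the tool metadata you approved at integration time and refuse to start if it later changes. The sketch below assumes the tool list is available as plain dictionaries carrying name, description, and inputSchema fields, roughly the shape an MCP tools/list response provides; treat the field names and fail-closed handling as illustrative.

```python
import hashlib
import json

def fingerprint_tools(tools: list[dict]) -> dict[str, str]:
    """Hash each tool's description and input schema as reviewed at integration time."""
    return {
        t["name"]: hashlib.sha256(
            json.dumps(
                {"description": t.get("description", ""), "schema": t.get("inputSchema", {})},
                sort_keys=True,
            ).encode()
        ).hexdigest()
        for t in tools
    }

def verify_tools(current: list[dict], pinned: dict[str, str]) -> None:
    """Fail closed if any tool's metadata no longer matches the reviewed fingerprint."""
    live = fingerprint_tools(current)
    for name, digest in live.items():
        if name not in pinned:
            raise RuntimeError(f"unreviewed tool appeared: {name}")
        if digest != pinned[name]:
            # A changed description or schema is the signature of a poisoning or rugpull update.
            raise RuntimeError(f"tool metadata changed since review: {name}")
```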
Version pinning offers the highest protection for the lowest cost: never allow tools to update automatically. Read the complete breakdown of tool poisoning, tool shadowing, and rugpull attacks explained for the full attack taxonomy, and why these are the AI supply chain attacks no code review can catch.
Why is Docker containerisation not sufficient isolation for AI agents executing untrusted code?
Docker containers share the host kernel. A successful container escape — achievable through known kernel vulnerabilities or LLM-generated exploit code — gives an attacker access to the host and every workload on it. Docker’s isolation model was designed to separate application workloads, not to contain adversarially influenced code, and for AI agents that execute instructions derived from untrusted content, that distinction is critical.
The Langflow RCE and Cursor MCP-based RCE are production evidence: real tools used by sophisticated developers were compromised through insufficient isolation. Three technologies provide stronger guarantees: Firecracker microVMs, gVisor, and Kata Containers each address what Docker cannot through hardware-level or user-space kernel isolation. Northflank and Cloudflare Moltworker offer managed VM-grade isolation for teams that don’t want to run their own sandbox infrastructure.
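For teams already running agents in Docker, the smallest incremental step is often swapping the runtime rather than the whole stack. The sketch below uses the Docker Python SDK to run agent-generated code under gVisor's runsc runtime; it assumes runsc is installed and registered with the Docker daemon, and the image and resource limits are placeholders rather than recommendations.

```python
import docker

def run_untrusted_snippet(code: str) -> str:
    """Execute agent-generated code under gVisor instead of directly on the shared host kernel."""
    client = docker.from_env()
    output = client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", code],
        runtime="runsc",          # gVisor user-space kernel, assuming it is registered with Docker
        network_disabled=True,    # no egress from the sandbox
        read_only=True,
        mem_limit="256m",
        pids_limit=64,
        remove=True,
    )
    return output.decode()
```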
Read the detailed guide to AI agent sandboxing explained — why Docker is not enough for AI agent sandboxing for concrete trade-offs, the Kubernetes Agent Sandbox project, a framework for choosing between Firecracker, gVisor, and Kata Containers, and an SMB decision framework.
What does it mean to “govern” AI agents, and how is governance different from security?
Microsoft’s framing is the clearest: governance defines who owns risk; security enforces the controls. Governance is the cross-functional process of establishing ownership, acceptable use policies, accountability structures, and audit requirements for every AI agent running in your organisation. You cannot enforce access policy on agents you cannot see, which makes an agent registry — a centralised inventory including shadow deployments — the prerequisite for everything else.
Shadow AI is the governance entry point — see the insider threat section above for the scope of that problem. The Microsoft five-capability observability framework provides the structural backbone; observability must come before enforcement. Human-in-the-Loop (HITL) controls are the governance mechanism for high-risk agent actions. For SMB teams: this model requires five clearly assigned roles, not five separate departments — in a 50-person company, two or three people can hold all of them.
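To make the registry and HITL ideas concrete, here is a minimal sketch of what one registry entry and a threshold check could look like; the fields, action names, and escalation rule are illustrative, not a schema any of the cluster articles prescribe.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    """One row in the agent registry: the minimum viable governance control."""
    agent_id: str
    owner: str                    # named human accountable for this agent
    purpose: str
    credential_scope: list[str]   # systems the agent is allowed to touch
    sanctioned: bool = True       # False marks a shadow deployment awaiting review
    hitl_actions: set[str] = field(default_factory=lambda: {"delete", "payment", "deploy"})

REGISTRY: dict[str, AgentRecord] = {}

def requires_human_approval(agent_id: str, action: str) -> bool:
    """Route high-risk actions to a human; unregistered or shadow agents always escalate."""
    record = REGISTRY.get(agent_id)
    if record is None or not record.sanctioned:
        return True
    return action in record.hitl_actions
```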
The operational programme — registry design, HITL threshold-setting, and cross-functional accountability — is covered in how to build an AI agent governance and monitoring programme, including a practical guide to governing and monitoring AI agents at scale without a dedicated SOC team.
Resource Hub: AI Agent Security Library
Threat Awareness and Mental Models
- AI Agents Are the New Insider Threat and Here Is the Mental Model You Need: The conceptual foundation for AI agent risk. Covers the superuser problem, non-human identity framing, shadow AI, and the agent registry as the minimum viable governance control. Read first.
- How Prompt Injection Now Operates at Tool-Chain Scale and What to Do About It: Covers indirect prompt injection, data exfiltration via tool calls, memory poisoning, goal hijacking, and the layered defence model. Includes the Claude Code case studies.
- Tool Poisoning, Tool Shadowing, and Rugpull Attacks — The AI Supply Chain No One Is Auditing: The attack taxonomy specific to MCP-connected agent architectures. Explains why code review cannot detect these threats and provides a prioritised mitigation stack.
Technical Controls and Infrastructure
- Zero Standing Privilege Is the New Standard for Securing AI Agent Identities: Covers ZSP, JIT access, context-aware authorisation, SSF, CAEP, and the CrowdStrike–SGNL acquisition. Includes a decision framework for SMB teams evaluating identity platforms.
- AI Agent Sandboxing Explained — Why Docker Is Not Enough and What Actually Works: Covers Firecracker, gVisor, and Kata Containers with concrete trade-offs. Compares Kubernetes Agent Sandbox, Cloudflare Moltworker, and Northflank for SMB deployment decisions.
Operational Governance
- How to Build an AI Agent Governance and Monitoring Programme from Scratch: The operational maturity article. Covers the Microsoft five-capability framework, HITL threshold design, anomaly detection without sensitive log storage, the dual-use paradox, and cross-functional ownership at SMB scale.
Frequently Asked Questions
What is the biggest security risk when you give an AI agent access to your company’s systems?
Excessive standing access. An agent with broad, persistent permissions across multiple systems is the functional equivalent of a privileged insider with no oversight. Zero Standing Privilege addresses this: access granted dynamically when needed, revoked immediately after.
See: AI agents are the new insider threat and Zero standing privilege for AI agent identities.
Can an AI agent be hacked or manipulated into stealing data?
Yes — prompt injection via retrieved documents or emails can redirect an agent to exfiltrate credentials through seemingly normal tool calls without the attacker touching the codebase. This has occurred in production: the Claude Code secrets-leak incident demonstrated that agents can read off-limits files despite exclusion controls.
See: How prompt injection operates at tool-chain scale.
What are the three core security domains that matter most for teams just starting out?
For teams at the beginning of their AI agent security journey, the highest-priority starting domains are: (1) insider threat and agent visibility — you cannot govern or secure agents you cannot see, so an agent registry is the foundation; (2) identity — review every agent’s credential scope and remove any access not required for its specific task; (3) sandboxing — if your agents execute code, Docker containers are not sufficient. These three domains address the most immediate blast-radius risks before you invest in a comprehensive programme.
Is AI agent security different from what we already do for APIs and microservices?
Your existing controls are a prerequisite, not a substitute. Traditional APIs have fixed, predictable behaviour; AI agents reason at runtime, ingest retrieved content as context, and make autonomous tool-call decisions — introducing indirect prompt injection, tool chain attacks, and reasoning-layer manipulation that API security was not designed to address.
What does the CrowdStrike acquisition of SGNL signal about AI agent security?
Enterprise-scale buyers are already investing in dedicated NHI security infrastructure. The $740 million acquisition signals that dynamic, identity-level access enforcement is becoming a platform requirement rather than a specialist add-on — CrowdStrike’s president framed it as ensuring identities hold privileges only for the time required.
See: Zero standing privilege and identity security for AI agents.
Do I need a security team to run AI agents safely or can developers handle it?
Developers can implement the foundational controls — agent registry, scoped credentials, version pinning for MCP dependencies, and basic HITL thresholds. Governance is where cross-functional input is needed: acceptable use policy and audit trail requirements involve Legal, HR, and Compliance as much as engineering.
See: How to build an AI agent governance and monitoring programme from scratch.
What is shadow AI and why does it matter for security?
Shadow AI is employees deploying agents outside sanctioned channels — 29% of employees already do this, per Microsoft Cyber Pulse. Each unsanctioned agent is an ungoverned NHI with inherited credentials and no monitoring. Without a registry to detect them, you cannot enforce any access policy on them.
See: AI agents are the new insider threat and How to build an AI agent governance programme.
Defence-in-Depth Is the Only Model That Works
The six domains in this cluster are not parallel alternatives to choose between. They are complementary layers that depend on each other.
Identity controls (ZSP) constrain what a sandboxed agent can access if containment fails. Sandboxing limits the damage when prompt injection bypasses input sanitisation. Tool chain mitigations — version pinning, signed manifests, mTLS — prevent supply chain compromise from delivering a payload that other controls then have to catch. Governance and observability form the detection layer for the cases preventive controls miss: the agent that is manipulated despite every layer in front of it.
The goal is not to solve each domain independently but to ensure that a failure in any one layer does not produce an uncontained breach. Use the navigation guide at the top of this page to find the article that matches your current concern, or start with AI Agents Are the New Insider Threat if you are building the mental model from scratch.