Insights Business| SaaS| Technology OWASP AI Testing Guide: What the Standard Requires
Business
|
SaaS
|
Technology
Jun 10, 2026

OWASP AI Testing Guide: What the Standard Requires

AUTHOR

James A. Wondrasek James A. Wondrasek
Graphic representation of the topic OWASP AI Testing Guide: What the Standard Requires

Search for “OWASP AI testing guide” and you land on a sprawling family of documents, not a single PDF. It’s worth clearing that up before your team implements controls from the wrong publication — or misses the one that actually applies to what you’re building.

The OWASP GenAI Security Project is home to all current AI security guidance from OWASP. And for teams with EU exposure, the EU AI Act’s enforcement of high-risk AI provisions starting August 2026 turns this guidance from advisory into something auditors will be checking for. This article maps what each publication requires, what’s new in the agentic list, and how it translates to an audit checklist. It is part of a broader series on the AI code security requirements engineering teams now face.

What is the OWASP AI Testing Guide and what does it actually require?

There’s no single document with that title. The OWASP AI Security and Privacy Guide is where searches often land, but it’s a foundational overview page, not testing guidance. The actual testing guidance lives across the OWASP GenAI Security Project’s publications at genai.owasp.org.

What the project publishes includes the OWASP Top 10 for LLM Applications, the OWASP Top 10 for Agentic Applications 2026, the Secure MCP Server Development Guide, the AIBOM Generator, and the Q2 2026 Red Teaming Landscape. Which ones apply depends on what you’re building. Standalone LLM integrations follow the LLM Top 10. Agentic systems follow the Agentic Top 10. MCP integrations need the MCP guide on top of that.

The OWASP AI Use Case Framework AIUC-1 (May 2026) is the decision-tree crosswalk that maps your AI use case to the applicable control sets. Start there.

How does the OWASP GenAI Security Project structure its publications?

The project publishes across four document categories. Risk catalogues (LLM Top 10, Agentic Top 10) define what can go wrong. Implementation guides (Secure MCP Server Development Guide) define how to prevent it. Tooling documentation (AIBOM Generator) defines how to produce compliance artefacts. Testing guidance (Red Teaming Landscape) defines how to verify your controls are actually working.

The LLM Top 10 and Agentic Top 10 are not version increments of each other — they address categorically different threat models. Teams building agentic systems need both. The OWASP AI Agent Security Cheat Sheet supplements the Agentic Top 10 with concise developer-level controls — handy when you need something to hand directly to your developers. Treat the Q1–Q2 2026 publications as the current baseline. OWASP ran four dedicated GenAI sessions at RSAC 2026 — for context on OWASP’s four GenAI sessions at RSAC 2026 and the standards community’s response, see the conference analysis.

So that’s the map. Now for the part that changes what your team actually needs to do.

What is new in the OWASP Top 10 for Agentic Applications that is not in the LLM Top 10?

The OWASP Top 10 for LLM Applications covers risks at the model interface: prompt injection, sensitive information disclosure, supply chain issues, data and model poisoning, improper output handling.

The OWASP Top 10 for Agentic Applications 2026 uses ASI risk codes and addresses what emerges when a system can plan, use tools, and act on its own. In an agentic system, a prompt injection doesn’t just produce a bad output — it hijacks the agent’s planning process, selects different tools, runs them with inherited user privileges, persists malicious instructions in memory, and propagates tainted instructions to peer agents.

Two new categories your team needs to understand: Excessive Agency (agents granted more authority than their task requires) and Goal Hijacking (ASI01) — an attacker tricking the agent into changing its objective via malicious content in tool outputs or retrieved documents. The mitigation for both is OWASP’s Least Agency Principle: agents get only the minimum scope of authority, tool access, and data access needed for their specific task. Short-lived tokens, per-task scoping, per-agent identity.

The practical question is whether your current threat model covers what your agents can now do. According to industry analysis, only 11% of organisations are running agentic AI in full production, and security governance gaps are the primary barrier.

Why does OWASP have a dedicated Secure MCP Server Development Guide?

The Model Context Protocol (MCP) was introduced by Anthropic in November 2024. It standardises how AI applications connect to external tools, data sources, and services — a universal interface layer that replaces fragmented custom integrations.

MCP servers are a new attack surface because, unlike traditional APIs where developers control every call, MCP lets the LLM decide which tools to invoke, when, and with what parameters. The OWASP Practical Guide for Secure MCP Server Development (February 2026) covers the four controls that matter: authentication on every server (no anonymous access), network isolation, permission scoping per tool, and logging of every tool invocation.

The attack scenario OWASP is most concerned about: a malicious MCP server or compromised tool response causes the agent to take unauthorised actions. Malicious tool descriptions can achieve attack success rates up to 72.8% against production agents. Any team using Claude, Cursor, or building custom MCP integrations needs this guide alongside the Agentic Top 10.

What will the Q2 2026 GenAI Agentic Red Teaming Guide add to existing testing guidance?

The OWASP AI Security Solutions Landscape for AI and Agentic Red Teaming Q2 2026 (April 2026) pushes back on treating red teaming as a pre-release gate. By the time you finish a testing cycle on an agentic system, a tool it relies on has updated its behaviour, a new MCP server has been added, or a downstream agent has been modified. A static eval suite run quarterly is red teaming a version of the agent that no longer exists.

The Q2 2026 guidance covers agentic-specific scenarios the LLM-era playbooks don’t: multi-step goal hijacking, tool misuse chains, privilege escalation across agent handoffs, and persistent state manipulation. For teams without a dedicated red team, the message is simple — AI security testing needs to become a repeatable automated step in your release process, not a manual review. The guide provides the scenario library to build those automated tests from.

How do you generate an AI-BOM that meets OWASP and EU AI Act expectations?

An AI Bill of Materials (AI-BOM — also written AIBOM, AI SBOM, or ML-BOM) is a structured inventory of all components in an AI system: base models, fine-tuned weights, training datasets, third-party API dependencies, and inference infrastructure. OWASP treats AI-BOM generation as a compliance requirement, not an optional transparency measure.

The OWASP AIBOM Generator (December 2025) produces output in CycloneDX format — the machine-readable standard formalised as ECMA-424. EU AI Act Article 53 requires high-risk AI system providers to maintain technical documentation covering component provenance, training methodologies, and distribution channels. CycloneDX AIBOM output satisfies this. Generate it during CI/CD and publish one machine-readable AIBOM per release.

How do OWASP requirements map to your audit checklist?

An EU AI Act conformity assessment auditor reviewing a high-risk AI system looks for evidence across six control domains: system documentation (AI-BOM, architecture records), access control (Least Agency Principle applied), adversarial testing records, supply chain verification (model provenance), monitoring (anomaly detection for agent behaviour), and incident response procedures.

OWASP Agentic Top 10 controls map to EU AI Act Article 9, Article 53, and Article 72, but the mapping isn’t something OWASP publishes directly. Your team needs to construct it. The OWASP guide also grounds the DevSecEng lifecycle this standard requires — mapping controls to your development process is the first step before building audit evidence.

Excessive Agency (AC-1) — tool scope definitions and authorisation records.

Agentic Prompt Injection (PI-1) — input validation logs and tool response sanitisation records.

Goal Hijacking (GH-1) — agent behaviour monitoring logs and anomaly detection alerts.

Supply Chain Security (SC-1) — AI-BOM in CycloneDX format with model provenance records (direct Article 53 evidence).

MCP Server Security (MCP-1) — authentication logs and authorisation scope records for every MCP connection.

Red Teaming (RT-1) — continuous red teaming logs and a test scenario library.

Tools like Apiiro provide OWASP-aligned scanning that can generate audit-ready artefacts for some of these control areas. For a full evaluation of OWASP-aligned vendor tools and how they map to the control domains above, see the vendor comparison. EU AI Act high-risk provisions take effect August 2, 2026, with penalties reaching €15 million or 3% of global annual turnover. Q2 2026 is the practical deadline for your conformity assessment documentation. For the full picture of the AI code security requirements engineering teams now face — covering scanning gaps, organisational change, autonomous pipelines, and vendor selection — see the series overview.

Frequently Asked Questions

Is there a single OWASP document called the “AI Testing Guide”?

No. “OWASP AI Testing Guide” resolves to the OWASP GenAI Security Project’s publication family: the Top 10 for LLM Applications, the Top 10 for Agentic Applications, the Secure MCP Server Development Guide, and the Red Teaming Landscape. The OWASP AI Security and Privacy Guide is often the search landing page but is not the same as testing guidance.

Where can I download the OWASP Top 10 for Agentic Applications 2026?

The OWASP Top 10 for Agentic Applications 2026 is published at genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/ as a PDF and versioned GitHub repository.

What is the difference between the OWASP LLM Top 10 and the Agentic Top 10?

The LLM Top 10 (LLMxx risk IDs) addresses model-interface risks. The Agentic Top 10 (ASIxx risk IDs) addresses risks that only emerge when an AI system acts autonomously: Excessive Agency, Goal Hijacking, tool misuse, privilege escalation across multi-step agent chains. Teams building agentic systems need both.

Do I need an AI Bill of Materials to comply with the EU AI Act?

For high-risk AI systems under EU AI Act Article 53, technical documentation including component provenance is required. An AI-BOM in CycloneDX format is the practical mechanism for satisfying this — OWASP treats it as a compliance requirement, not an optional best practice.

What is the OWASP Secure MCP Server Development Guide and who needs it?

Security requirements for implementing Model Context Protocol servers — the interface layer allowing AI agents to call external tools and data sources. Any team using Claude, Cursor, or building custom MCP integrations needs this guide alongside the Agentic Top 10.

What is the Least Agency Principle in OWASP’s 2026 guidance?

OWASP’s design mandate for agentic AI systems: agents get only the minimum scope of authority, tool access, and data access needed for their specific task. The agentic equivalent of least privilege, and the primary control for preventing Excessive Agency.

What is Excessive Agency in the OWASP Agentic Top 10?

Excessive Agency is when an AI agent is granted more authority, tool access, or data scope than it needs. Analogous to an over-privileged service account. OWASP’s mitigation is the Least Agency Principle: define and enforce minimum required scope for each agent at design time.

Is the OWASP AI Security and Privacy Guide the same as the AI Testing Guide?

No. It’s a foundational overview page. The actual testing guidance lives across the OWASP GenAI Security Project publications, particularly the Agentic Top 10 and the Red Teaming Landscape.

How does continuous red teaming differ from traditional pre-release security testing for AI?

Traditional pre-release testing is a point-in-time gate. Continuous red teaming integrates adversarial testing into deployment pipelines and production monitoring. Agentic systems require this because their attack surface evolves at runtime — tool outputs, retrieved documents, and inter-agent communications introduce new adversarial inputs after deployment.

What is AIUC-1 and how does it help me choose the right OWASP document?

AIUC-1 is the OWASP AI Use Case Framework crosswalk (May 2026). It maps AI use cases by system type, deployment context, and risk level to applicable OWASP control sets — the recommended starting point for teams overwhelmed by the volume of OWASP AI publications.

Can OWASP alignment help with EU AI Act conformity assessment?

Yes. OWASP Agentic Top 10 controls map to EU AI Act Article 9, Article 53, and Article 72. OWASP alignment doesn’t guarantee conformity assessment approval, but it provides a widely-recognised control framework that auditors are increasingly citing as a reference standard.

What role does the OWASP AIBOM Generator play in supply chain security?

The OWASP AIBOM Generator produces AI Bill of Materials output in CycloneDX format, capturing model provenance, dataset lineage, fine-tuning history, and dependency chains. It supports supply chain security review under the Agentic Top 10 and satisfies EU AI Act documentation requirements for high-risk AI system components.

AUTHOR

James A. Wondrasek James A. Wondrasek

SHARE ARTICLE

Share
Copy Link

Related Articles

Need a reliable team to help achieve your software goals?

Drop us a line! We'd love to discuss your project.

Offices Dots
Offices

BUSINESS HOURS

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Sydney

SYDNEY

55 Pyrmont Bridge Road
Pyrmont, NSW, 2009
Australia

55 Pyrmont Bridge Road, Pyrmont, NSW, 2009, Australia

+61 2-8123-0997

Yogyakarta

YOGYAKARTA

Unit A & B
Jl. Prof. Herman Yohanes No.1125, Terban, Gondokusuman, Yogyakarta,
Daerah Istimewa Yogyakarta 55223
Indonesia

Unit A & B Jl. Prof. Herman Yohanes No.1125, Yogyakarta, Daerah Istimewa Yogyakarta 55223, Indonesia

+62 274-4539660
Bandung

BANDUNG

JL. Banda No. 30
Bandung 40115
Indonesia

JL. Banda No. 30, Bandung 40115, Indonesia

+62 858-6514-9577

Subscribe to our newsletter