Insights Business| SaaS| Technology The 50 Percent Success Rate — Why Current Defences Are Not Enough
Business
|
SaaS
|
Technology
May 27, 2026

The 50 Percent Success Rate — Why Current Defences Are Not Enough

AUTHOR

James A. Wondrasek James A. Wondrasek
Graphic representation of layered defences and the 50 percent prompt injection success rate

According to the International AI Safety Report 2026 (UK DSIT), cited by Vectra AI, a persistent attacker attempting prompt injection against systems with safeguards in place still succeeds 50% of the time. Not against unprotected systems. Against systems where some defences are already active. The word is “some.” For the full picture, see the full industrial-scale prompt injection threat in production 2026.

What Does a 50% Prompt Injection Bypass Rate Actually Mean for Organisations With Some Defences Already Deployed?

The 50% figure comes from International AI Safety Report 2026 (UK DSIT), Figure 14. Vectra AI cites it in the context of production GenAI deployments with safeguards active — it’s Vectra AI’s reported figure citing the DSIT report, not independently verified primary data from Vectra AI’s own research. The DSIT policymakers summary does independently confirm that “although developers have made it harder to bypass safeguards, attackers still succeed at a moderately high rate.”

The “over multiple attempts” qualifier is the most important part of that statistic. This isn’t a per-attempt coin flip. It’s a cumulative persistence rate — attackers iterate payloads until one gets through. A layered defence stack can reduce attack success from 73.2% to under 10%. But for organisations running partial deployment — input filter active, output filtering absent, no retrieval isolation — the 50% rate is a conservative baseline. Partial deployment leaves that gap wide open.

What Does the Siemba ROAR 4 and HouYi Research Add Beyond Vectra’s Single Statistic?

A single statistic from one source, with methodological caveats, is a weak foundation for anything. Independent corroboration is what changes the picture.

Siemba’s ROAR 4 report (Q1 2026, practical red team testing) found 1 in 3 AI-integrated applications has directly exploitable LLM vulnerabilities. “Directly exploitable” means testers extracted system prompts without advanced techniques. Their conclusion: “System prompts are not security controls. Security lives in scoped permissions and validated outputs.”

The HouYi Framework academic research adds a third data point: 86% of 36 real-world LLM applications tested were vulnerable. Cisco’s State of AI Security 2026 adds a fourth: prompt injection weaknesses in 73% of audited production AI deployments.

Government safety report. Commercial red team. Academic framework testing. Independent security audit. Different methods, same conclusion. The OWASP classification of what still gets through current defences explains why the same pattern keeps showing up across all of them.

Why Do Single-Layer Prompt Injection Defences Fail Against a Multi-Class Attack Taxonomy?

Single-layer defences fail because the attack surface is bigger than any single control point can cover.

Digital Applied’s security analysis — from approximately 200 production audits — identifies ten distinct attack classes by delivery vector: direct user input, indirect via content, tool outputs, memory, RAG sources, collaborative agents, document attachments, email bodies, API responses, and shared user sessions. Input filtering addresses class 1. Nine of those ten classes arrive through trusted channels an input filter never even sees.

In 2026, indirect prompt injection accounts for over 55% of observed attacks, with 20–30% higher success rates than direct injection. The structural root cause: LLMs process system instructions and user inputs in the same text format with no enforced execution boundary. AWS explicitly acknowledges this for Bedrock Guardrails: “There is no single control that can remediate indirect prompt injections.”

How Does Multi-Tenant SaaS Architecture Amplify the Risk That Single-Layer Defences Leave Unaddressed?

In multi-tenant SaaS deployments, shared inference infrastructure creates cross-tenant attack pathways that simply don’t exist in single-tenant environments. A Cybersecurity Journal analysis found 12 of 18 LLM vulnerabilities are amplified in multi-tenant versus single-tenant deployments. An attacker working at a 50% bypass rate against a shared platform doesn’t need to target any single tenant — they can automate attempts across the whole system.

Gravitee’s 2026 survey found 80.9% of organisations have agents in active testing or live production, yet only 14.4% have full security approval. Tenant isolation controls (Secure Multi-Tenant Architecture / burn-after-use session patterns) achieve a 92% defence success rate against cross-tenant leakage — at a 15–30% throughput cost that many operators keep deferring. For context on how these attack patterns have scaled, see how injection attacks have industrialised in 2026.

What Does Defence-in-Depth for Prompt Injection Actually Look Like When Grounded in Evidence?

Defence-in-depth is a documented production architecture, not a theoretical concept. The Digital Applied four-layer framework looks like this:

  1. Input Sanitisation: addresses direct injection (attack class 1 of 10)
  2. Tool Restriction: limits blast radius from agentic execution
  3. Output Validation: catches exfiltration before data leaves the system
  4. Human Review: checkpoints for irreversible actions

Each layer addresses a different attack surface. Commercial tools like Lakera and open-source tools like LLM Guard implement specific layers within this framework. They’re not substitutes for the framework itself.

For budget-constrained organisations, the minimum viable baseline is three layers: input validation (direct injection), output filtering (exfiltration), and retrieval isolation (indirect injection via RAG). Open-source tools — Garak, PyRIT, Promptfoo, LLM Guard — reduce cost per layer. Agentic systems need a behavioural monitoring layer on top of that.

What Does the 80%/10% Governance Gap Mean for Organisations That Have Procured AI Security Products?

The real question for organisations that have already bought AI security products: does “we have something” mean “we have coverage”?

IBM Institute for Business Value found only 24% of ongoing GenAI projects consider security, despite 82% saying secure AI is crucial. The gap between procurement threshold (a product is deployed) and architecture threshold (deployed controls actually cover the full attack surface) is precisely where the 50% bypass rate lives.

OWASP LLM01:2025, MITRE ATLAS AML.T0051, and NIST AI 600-1 classify prompt injection at the highest severity level across every major security taxonomy. MITRE ATLAS distinguishes direct injection (AML.T0051.000) and indirect injection (AML.T0051.001) as requiring different controls — a single guardrail addressing only one vector leaves the other unaddressed by design.

Addressing the gap means mapping deployed controls against the four attack channels and working out which ones have no coverage. Red team tools — Garak, PyRIT, Promptfoo — provide empirical measurement where gap analysis falls short. For the specific product gaps the 50% success rate exposes, see the specific product gaps the 50% success rate exposes. For a complete view of how these attack patterns have industrialised and what organisations are doing about them across the full stack, see the full industrial-scale prompt injection threat in production 2026.

Frequently Asked Questions

Is a 50% prompt injection success rate normal even with AI safety tools deployed?

Yes. The International AI Safety Report 2026 (UK DSIT) confirms “attackers still succeed at a moderately high rate” even with safeguards deployed. The “over multiple attempts” qualifier means this is a cumulative persistence rate — an attacker iterating payloads needs only one success.

What exactly does Vectra AI’s 50% bypass rate statistic measure?

Vectra AI cites DSIT’s International AI Safety Report 2026 Figure 14 — prompt injection attack success rates by model release date. “Over multiple attempts” means cumulative attacker persistence. Treat it as a reported figure, not independently verified primary data from Vectra AI’s own research.

What is the difference between a guardrail and defence-in-depth?

A guardrail is a single vendor product layer at specific input and output points. AWS explicitly states no single control can remediate indirect prompt injections. Defence-in-depth applies the Digital Applied four-layer framework across multiple independent controls at different system points. One product is one layer of a four-layer minimum.

What is indirect prompt injection and why do input filters miss it?

Indirect prompt injection embeds malicious instructions in external content the model retrieves — documents, emails, tool outputs, database records. Input filters monitor the user input channel (attack class 1 of 10). Indirect injection arrives via the other nine classes, all through trusted channels the filter doesn’t inspect.

Is my company’s AI assistant safe to use with sensitive data?

That depends on the defence architecture deployed, not the model. The 50% bypass rate, Siemba’s 1-in-3 directly exploitable finding, and HouYi’s 86% vulnerability rate collectively indicate most production AI deployments have material exposure. The question is whether the deployment has coverage at input, retrieval, output, and (for agentic systems) behavioural layers.

How does multi-tenant SaaS amplify prompt injection risk?

In multi-tenant deployments, successful injection in one tenant context can affect others via shared infrastructure. Tenant isolation controls achieve a 92% defence success rate against cross-tenant leakage, at a 15–30% throughput cost many operators defer.

What is a minimum viable multi-layer defence for a budget-constrained organisation?

Three layers: input validation (direct injection), output filtering (exfiltration), and retrieval isolation (indirect injection via RAG). Open-source tools — Garak, PyRIT, Promptfoo, LLM Guard — reduce cost per layer. Agentic systems require an additional behavioural monitoring layer.

What does the 80% deploy / 10% strategy governance gap mean for AI security?

Gravitee’s 2026 survey found 80.9% of organisations have agents in production, yet only 14.4% have full security approval. IBM found only 24% of ongoing GenAI projects consider security despite 82% saying it’s crucial. The gap between “we have a filter” and “we have a defence strategy” is where the 50% bypass rate lives.

Does using a newer LLM model reduce prompt injection risk significantly?

The DSIT report notes developers have made safeguards harder to bypass — but attackers still succeed at a moderately high rate. Model selection is not a substitute for architectural controls.

What does OWASP LLM01:2025 classify as the top AI application risk?

Prompt injection is ranked LLM01:2025 — the top LLM application risk. Unlike SQL injection, “no equivalent guaranteed defence exists.” MITRE ATLAS distinguishes direct (AML.T0051.000) and indirect (AML.T0051.001) injection as requiring different controls. NIST AI 600-1 corroborates the top-severity classification.

How do you determine whether current AI defences are adequate?

Map deployed controls against the Digital Applied four attack channels: direct input, indirect retrieval, output exfiltration, and (for agentic deployments) autonomous behaviour. Identify which channels have no control. “Adequate” means coverage at multiple independent layers, not presence of a single product.

AUTHOR

James A. Wondrasek James A. Wondrasek

SHARE ARTICLE

Share
Copy Link

Related Articles

Need a reliable team to help achieve your software goals?

Drop us a line! We'd love to discuss your project.

Offices Dots
Offices

BUSINESS HOURS

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Sydney

SYDNEY

55 Pyrmont Bridge Road
Pyrmont, NSW, 2009
Australia

55 Pyrmont Bridge Road, Pyrmont, NSW, 2009, Australia

+61 2-8123-0997

Yogyakarta

YOGYAKARTA

Unit A & B
Jl. Prof. Herman Yohanes No.1125, Terban, Gondokusuman, Yogyakarta,
Daerah Istimewa Yogyakarta 55223
Indonesia

Unit A & B Jl. Prof. Herman Yohanes No.1125, Yogyakarta, Daerah Istimewa Yogyakarta 55223, Indonesia

+62 274-4539660
Bandung

BANDUNG

JL. Banda No. 30
Bandung 40115
Indonesia

JL. Banda No. 30, Bandung 40115, Indonesia

+62 858-6514-9577

Subscribe to our newsletter