You would never hand a junior contractor your logged-in laptop and tell them to “just handle it” — no scope, no oversight, full access to email, cloud storage, and every system you’re authenticated to. Yet that is precisely the trust model underlying every agentic browser currently shipping to enterprise users.
The vulnerability that makes this dangerous is not a browser engine flaw. It is the fact that web pages can contain sentences that the agent reads as instructions — and acts on.
Within days of launching ChatGPT Atlas, OpenAI’s own CISO, Dane Stuckey, publicly called prompt injection “a frontier, unsolved security problem.” That is not product humility. It is an accurate description of where the field sits.
This article covers the full attack taxonomy: what prompt injection is, why indirect injection is the dominant threat vector, why Same-Origin Policy and CORS are irrelevant here, and what the confused deputy problem reveals about the root cause. For broader context, see the browser-agent security landscape. The key variable is architecture — not filtering.
What actually makes an agentic browser different from a browser with a chatbot?
An agentic browser is not a browser with an AI sidebar bolted on. A conventional AI feature — “summarise this page,” “answer a question about this PDF” — operates in a read-only, single-origin context. It can describe what it sees. It cannot act on what it sees, and it cannot carry your credentials across domains.
An agentic browser removes both constraints simultaneously. The LLM agent autonomously navigates sites, fills forms, reads emails, and completes multi-step workflows across every domain you’re authenticated to — in a single session, using your full credentials. It is not answering questions. It is taking actions.
When Brave‘s security team tested browsers in this category, they found ChatGPT Atlas, Perplexity Comet, Fellou, and Opera Neon all vulnerable to prompt injection. The hCaptcha Threat Analysis Group found Atlas completed 16 of 19 malicious abuse scenarios with no jailbreaking required.
The architectural root cause is what Noma Security describes as the collapse of three historically separate security layers: browsing, assistance, and action. Traditional browsers kept those layers separate by design. Agentic browsers eliminate those separations — that is literally the value proposition. Risk concentrates where layers merge.
What is prompt injection, and why does it matter for browser agents specifically?
Prompt injection is a security attack where an adversary places malicious instructions in locations an LLM is likely to trust, overriding the model’s intended behaviour. The core vulnerability is structural: developer system prompts and attacker input share the same format — natural-language text. The model has no syntactic mechanism to distinguish them. This is the semantic gap.
Unlike SQL injection — where OR '1'='1' is structurally distinguishable from legitimate input — a prompt injection payload looks identical to a legitimate user request. That is why OWASP lists it as LLM01: the number-one risk in the OWASP Top 10 for LLMs 2025.
For a chatbot, a successful injection produces wrong answers. For an agentic browser, it produces data breaches — forwarding sensitive emails, sending money, deleting cloud files, all using your credentials.
That is why Dane Stuckey framed it as he did: “Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agent fall for these attacks.”
For how OWASP and compliance frameworks formally classify these attacks, see how OWASP classifies these attack types.
What is indirect prompt injection, and why is it harder to stop than direct injection?
Direct prompt injection — also called jailbreaking — is when the attacker types malicious instructions directly into the model’s input. They need to control the user interface to do it.
Indirect prompt injection is categorically different. The attacker embeds malicious instructions in external content — a web page, an email, a calendar invite, an image — that the agent later retrieves and processes. The agent executes those instructions as if they were legitimate user directives. The user never sees the malicious content.
For agentic browsers, indirect injection is the dominant threat vector because the entire purpose of the browser is to retrieve and process external web content. Every page the agent visits is a potential injection surface.
Brave’s security team confirmed this is systemic: “Indirect prompt injection is not an isolated issue, but a systemic challenge facing the entire category of AI-powered browsers.” Their research demonstrated invisible injection in practice: malicious instructions hidden in near-invisible text — unreadable to a human, fully readable by OCR and passed to the LLM as trusted content.
Johann Rehberger’s term “offensive context engineering” captures the sophistication: attackers shape the entire context window the agent sees, not just a single injected instruction. Input sanitisation cannot solve this. The “input” is an ordinary web page the user asked the agent to visit. This is why these attacks are demonstrated in real exploits across production systems.
What is link-mediated exfiltration — and how does it turn a web page into a data leak?
Link-mediated exfiltration converts a prompt injection from instruction-override into active data theft. The injected prompt instructs the agent to: locate sensitive data — emails, calendar entries, credentials; encode that data into a URL; and trigger an HTTP request to an attacker-controlled server. No browser engine vulnerability required. No sandbox escape. The agent is following instructions from a channel it trusts.
Miggo‘s Google Gemini calendar exploit illustrates this concretely. A malicious calendar invite contained a dormant payload. When the user made a routine schedule query, it activated: Gemini was instructed to summarise all private meetings, create a new calendar event with that summary in its description, and respond to the user with “it’s a free time slot.” The attacker-created event was visible through shared enterprise calendar configurations. Google had deployed a dedicated language model to detect malicious prompts. The attack succeeded anyway.
The attack surface is no longer code. It is content.
Zero-click exfiltration is the most severe variant: AI agents in messaging platforms can be directed via indirect injection to generate a data-leaking URL, which the platform’s link preview fetches automatically — no user action required. The case studies of prompt injection in production systems show these mechanisms across documented incidents.
Why do Same-Origin Policy and CORS provide no protection against these attacks?
Same-Origin Policy prevents scripts on one origin from reading data from another origin. CORS allows servers to selectively relax those restrictions. Both assume the browser enforces origin boundaries by keeping code sandboxed to its originating context.
Agentic browsers operate with the user’s full authenticated authority across all origins simultaneously — a condition SOP was never designed to address.
Brave’s security team stated this directly: agentic browser assistants “can be prompt-injected by untrusted webpage content, rendering protections such as the same-origin policy irrelevant because the assistant executes with the user’s authenticated privileges. This lets simple natural-language instructions on websites trigger cross-domain actions that reach banks, healthcare provider sites, corporate systems, email hosts, and cloud storage.”
SOP assumes a script is the attacker. In an agentic browser, the attacker is manipulating the agent — which already has legitimate cross-domain authority. The distinction worth holding is SOP irrelevance, not SOP bypass. The entire threat category has moved outside what SOP was built to handle.
Your SOP and CORS configurations are intact. They are the wrong tool for this threat.
What is the confused deputy problem, and why does it explain everything?
The SOP analysis points to something more fundamental. The confused deputy problem.
A privileged program — the “deputy” — is tricked by a less-privileged party into misusing its authority. The deputy has legitimate access. The attacker has no direct access. The attacker exploits the deputy’s legitimacy to achieve what they cannot achieve directly.
Applied to an agentic browser: the AI agent holds the user’s full authentication privileges across all domains. Attacker-controlled web content has no privileges. By injecting instructions into content the agent processes, the attacker manipulates the privileged deputy into taking actions on systems it cannot otherwise reach.
MIT Professor Srini Devadas put it precisely: “The challenge is that if you want the AI assistant to be useful, you need to give it access to your data and your privileges, and if attackers can trick the AI assistant, it is as if you were tricked.”
CloudFactory calls this the “trust paradox.” To be useful, AI agents need access to systems, data, and privileges. To be secure, they require isolation and restricted access. Those requirements are in direct conflict — which is why enterprise AI deployments stay confined to low-risk pilots.
The confused deputy framing is also why making the model smarter does not resolve the problem. The vulnerability is not in the model’s reasoning — it is in the architecture of delegation. For how MITRE ATLAS taxonomises these attack types as architectural risks, this framing is the foundation. For an overview of agentic browser risks across the full threat surface, this architectural distinction is a consistent thread.
Is prompt injection actually unsolved, or is that just vendor hype?
It is genuinely unsolved at the architectural level. This is the consensus of security researchers, standards bodies, and the vendors with the greatest commercial incentive to say otherwise.
OpenAI’s December 2025 blog: “Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved’… Prompt injection remains an open challenge for agent security, and one we expect to continue working on for years to come.”
Simon Willison: “in application security, 99% is a failing grade.” His concern: “it worries me that it’s setting up a false sense of security — if it’s harder but still possible someone is going to get through.”
Johann Rehberger concluded: “Since there is no deterministic solution for prompt injection, it is important to highlight and document security guarantees applications can make… The message remains: Trust No AI.”
The Gemini calendar exploit illustrates why. Google deployed a language model specifically to detect malicious prompts. The attack succeeded anyway — because the payload was semantically plausible. That is the semantic indistinguishability problem: instructions and data share the same format, so there is no syntactic signature to detect.
Mitigations exist and improve continuously. They do not eliminate the vulnerability class.
Probabilistic defences vs. deterministic controls: what is the practical difference?
Probabilistic defences detect and block malicious prompts using pattern recognition, classifiers, and model tuning. They reduce the frequency of successful attacks. They cannot reduce it to zero.
Current examples in production: Spotlighting (Microsoft’s technique using randomised delimiters and encoding to distinguish untrusted content from trusted instructions), Microsoft Prompt Shields (classifier-based detection), adversarial training, Watch Mode (Atlas alerting on sensitive sites), and Logged-Out Mode (operating without authenticated credentials). All work at the level of detection or avoidance — not structural prevention.
Each is bypassable. Atlas completed content filter evasion by decoding Base64-encoded content — the encoding mechanism was the bypass, not the defence.
Deterministic controls provide architectural guarantees that certain attack classes cannot succeed regardless of model behaviour.
FIDES (Microsoft Research) is the primary example. It applies information-flow control to agentic systems. Rather than detecting whether a prompt is malicious, FIDES tracks how data propagates and enforces hard constraints on where it can flow. If data from an untrusted source cannot reach an exfiltration channel by construction, the attack fails regardless of what the model does.
Human-in-the-loop (HITL) architecture: explicit human approval before sensitive actions creates a hard gate that prompt injection cannot bypass.
Principle of least privilege applied to agent permissions reduces the blast radius when an injection succeeds.
Never rely solely on probabilistic defences. Pair them with at least one deterministic control. For governance controls that address these risks in enterprise deployments, the architecture question is where safe adoption frameworks begin.
FAQ
Can a website hack my AI browser without me clicking anything?
Yes, through indirect prompt injection. A malicious page can contain hidden instructions that the agent processes automatically when it navigates there — no click required. In messaging contexts, zero-click exfiltration can trigger when the agent processes a received message and the platform’s link preview automatically fetches an attacker-controlled URL.
Is it safe to let my AI browser log in to my company accounts?
No, not for sensitive accounts. OpenAI explicitly warns against using Atlas with “regulated, confidential, or production data” and positions enterprise access as a beta for “low-risk data evaluation” only. Enterprise access is OFF by default. Use Logged-Out Mode for general browsing, and logged-in sessions only for low-sensitivity tasks.
How bad is the prompt injection problem — will vendors fix it soon?
OpenAI’s own CISO called it “a frontier, unsolved security problem,” and the company’s December 2025 blog post stated it “is unlikely to ever be fully solved.” The architectural root cause — semantic indistinguishability — means the vulnerability class will persist. Expect mitigation, not elimination.
What does it mean for an AI agent to operate with “authenticated privileges”?
When an agentic browser operates in logged-in mode, it has access to every site the user is authenticated to — email, cloud storage, banking, corporate systems. Any action the agent takes is indistinguishable from the user taking that action themselves.
Indirect vs. direct prompt injection: which is harder to defend against?
Indirect injection is significantly harder. The malicious instructions come from external content the agent was designed to process — web pages, emails, documents. Every page the agent visits is a potential injection surface, and the payload can be invisible to the user while remaining fully processable by the agent.
How does the attack surface of an agentic browser compare to a conventional browser?
A conventional browser’s attack surface is the browser engine itself — vulnerabilities require memory corruption, sandbox escapes, or similar technical exploits. An agentic browser adds a parallel attack surface: any web content the agent processes can contain instructions that manipulate its behaviour. No browser engine vulnerability is required.
What is the difference between a prompt injection and a jailbreak?
A jailbreak (direct prompt injection) targets the model’s safety guardrails to produce restricted content. Indirect prompt injection targets the model’s behaviour to take actions the user did not authorise — exfiltrating data, sending emails, modifying files. Jailbreaking is about what the model says; indirect injection is about what the model does.
What is “offensive context engineering” and how does it relate to prompt injection?
Offensive context engineering is Johann Rehberger’s term for carefully crafted web content designed to trick AI agents into taking attacker-directed actions. The attacker shapes the entire context window the agent sees — not just a single injected instruction, but the full informational environment the model uses to decide its next action.
Why do security experts say pattern-matching defences cannot solve prompt injection?
Because prompt injection payloads are written in ordinary natural language indistinguishable from legitimate requests. The Miggo Gemini calendar exploit payload was syntactically innocuous — plausible as a user request — yet semantically harmful when executed with model tool permissions. Google deployed a separate detection model. The attack succeeded anyway.
Should I wait for vendors to solve prompt injection before evaluating agentic browsers?
No, but evaluate with clear boundaries. Use agentic browsers for low-sensitivity tasks with Logged-Out Mode, apply deterministic controls (human approval gates, least privilege permissions) for sensitive use cases, and architect for the assumption that prompt injection will eventually succeed — then ensure the blast radius is contained.