Insights Business| SaaS| Technology From Page Fetcher to Execution Environment — The Architectural Shift
Business
|
SaaS
|
Technology
May 19, 2026

From Page Fetcher to Execution Environment — The Architectural Shift

AUTHOR

James A. Wondrasek James A. Wondrasek
Graphic representation of the topic From Page Fetcher to Execution Environment — The Architectural Shift

Palo Alto Networks chose their words carefully: “The enterprise browser is transforming from a viewing tool into an execution environment.” That’s a category change, not a feature upgrade. A viewing tool renders content for a human to act on. An execution environment takes real-world actions autonomously on that human’s behalf.

AI-native browsers and agentic frameworks have crossed from experimental to enterprise-deployed. The vocabulary needed to reason about them has not kept pace — and imprecise vocabulary produces imprecise policy.

This article defines the terms you need: execution environment, DOM access, authenticated session, semantic work graph, permission ladder, and agentic attack surface. For the full landscape view, start with the AI browser agents — the complete security guide.

What does “execution environment” actually mean — and why does it change everything?

A traditional browser is a viewing tool. It retrieves and renders content, and every consequential action — clicking, submitting, purchasing — is performed by the human at the keyboard. An execution environment reverses this. The AI agent perceives page state, reasons about the goal, and takes action. The browser is now a runtime, not a display surface.

When the browser is a viewing tool, your security model is about what data the human is allowed to see. When the browser is an execution environment, your security model is about what actions the agent is allowed to take. Those are two completely different problems.

The distinction from AI-assisted browsers is easy to miss. A ChatGPT sidebar extension, a browser copilot, a Grammarly overlay — these assist the human but do not replace human-driven navigation. The human still initiates and confirms every consequential action. The AI in an execution environment is not the assistant. It is the actor.

This also separates agentic browsers from Robotic Process Automation (RPA). RPA executes fixed, pre-scripted action sequences. An agentic browser reasons about the goal — “process this refund,” “file this compliance exception” — and dynamically constructs its action sequence to reach that outcome. Audit frameworks built for deterministic RPA scripts do not map onto agents that reason dynamically and adapt to unexpected page states.

How does DOM access turn a browser from passive to active?

The Document Object Model (DOM) is the live, structured representation of a webpage — its form fields, buttons, links, navigation, and dynamic content. Without DOM access, an AI can read a page’s visible text but cannot interact with it. With DOM access, it can fill fields, click buttons, submit forms, and trigger JavaScript events programmatically. DOM access is the bridge: it transforms “I can read this page” into “I can act on this page.”

The perception-action loop that runs on top of DOM access is what security practitioners call the agentic loop: the agent reads DOM state, reasons about the next action, writes a DOM action, and repeats until the goal is achieved. Each cycle is an opportunity to take a real-world action — and an opportunity for an attacker to influence what action is taken.

This is where indirect prompt injection enters. Attackers embed malicious instructions in DOM content using techniques invisible to human perception — white text on white backgrounds, hidden HTML comments, text buried in image metadata. The agent processes it as a legitimate instruction, and traditional security tooling has no interception point between the malicious prompt and the resulting action. For the full exploitation chain, see how zero-click attacks exploit the execution environment.

Why does an agent inheriting your login state create enterprise-level risk?

When an agent runs inside the user’s browser context, it inherits every active authenticated session — banking, SaaS CRM, HR systems, email, code repositories — without needing to steal or request credentials. This is authenticated session inheritance: the agent does not bypass authentication; it operates inside it. From the application’s perspective, it is indistinguishable from the human user.

As Netwrix puts it, agents “do not create pre-existing privilege sprawl, but they can exercise it at a speed and scale that a human user never would.” OAuth and SAML have no mechanism to distinguish an autonomous agent from the human whose session it inherited.

Traditional perimeter security operates at the wrong layer. DLP and CASB tools monitor network egress and API boundaries — they’re architecturally blind to actions taken inside an authenticated session. An agent completing a data exfiltration task does so through actions that register as entirely normal to those tools.

The audit trail consequence compounds the problem. In most real deployments, agents run using borrowed human credentials — making it impossible to distinguish “the user accessed the database” from “the user’s agent accessed the database.” Palo Alto Networks is direct: “in a world of autonomous clicks, the audit trail vanishes.” For the full taxonomy of what attackers can do with these structural weaknesses, see the full taxonomy of attack categories this architecture enables.

What is the semantic work graph and why does it signal a platform shift?

The semantic work graph is Perplexity‘s framing for what an agentic browser builds on top of execution-environment access: a durable, cross-domain work context that persists across sessions. As MindStudio observed, Comet gives Perplexity a persistent view of your calendar, GitHub, email, and internal tools — “that’s not a search product, that’s a work graph product.”

The graph is composed of semantic work primitives: meaningful units of work — not button clicks, but actual goals: a refund, a reschedule, a payment authorisation. That is how the browser becomes a coordination layer across all authenticated applications, not an interface to any single one.

The correct design response is the permission ladder: a staged privilege model with five rungs:

  1. Read-only — observe and report; no actions
  2. Suggest — propose actions for the human to initiate
  3. Draft — prepare actions that the human reviews and sends
  4. Act with confirmation — execute with explicit human approval at each step
  5. Act autonomously — execute without human review

Risk concentrates at the jump from rung four to rung five. Human-in-the-Loop (HITL) oversight is the practical implementation of rung four — the approval gate that maintains human accountability before full autonomy is granted. Skip straight to rung five and the agent inherits full session scope from the first action, with no checkpoints.

How does a single-tab action become a multi-site autonomous workflow?

The agentic loop is not bounded to one tab or one origin. Consider a concrete example: an agent instructed to process a vendor invoice reads it from email, validates line items against the ERP system, routes for approval via the internal approval application, and confirms processing back to the vendor’s portal — all in a single goal-directed sequence, across four separate authenticated contexts.

This is persistent session state: the durable, cross-site authenticated context that enables multi-site workflows. The end-state is autonomous task completion: a goal goes in, a completed real-world outcome comes out.

Model Context Protocol (MCP) enables the next layer: multi-agent coordination. One browser agent can delegate sub-tasks to specialised agents, each operating in their own execution environment, all coordinating via MCP. The cross-agent coordination is both the productivity multiplier and the attack surface multiplier. A poisoned MCP tool can manipulate an agent into delegating data exfiltration to a malicious external agent, which renders a fake success confirmation — the user sees “task completed successfully” while data is being transferred out.

Why is the agentic attack surface “broad, undocumented, and expanding”?

The agentic attack surface is every system, application, and service reachable by a browser agent operating inside a user’s authenticated session. Zero Networks defines it precisely: “In most enterprises today, that surface is broad, largely undocumented, and expanding faster than it is being governed.”

The Same-Origin Policy (SOP) — which prevents a script on bank.com from reading data from email.com — fails structurally in the agentic context. The agent is the cross-site bridge that SOP was designed to prevent. It legitimately navigates between both origins as part of its workflow, and SOP cannot isolate what it was not designed to isolate.

Trail of Bits models the threat around four trust zones — Chat Context, Third-Party Servers, Browsing Origins, and External Network — and four violation classes describing improper data flow across them. A complete exfiltration attack chains three violations: INJECTION (attacker payload enters the agent’s context), CTX_IN (agent reads sensitive authenticated data), and CTX_OUT (that data sent externally) — all logged as the human user’s actions. Trail of Bits exploited these isolation failures in multiple real agentic browsers. The web security community learned these lessons the hard way; agentic browsers are repeating the same journey.

Shadow AI expands the surface beyond IT visibility. Nearly two-thirds of organisations lack the policies to detect shadow AI (Zero Networks) — meaning an unsanctioned agentic browser is an invisible gap sitting entirely outside any governance tooling you’ve deployed. The full scope of what that gap means for your organisation is covered in the agentic browser security and governance overview.

What is MCP and why does it create a structural vulnerability layer?

Model Context Protocol (MCP) is an open protocol developed by Anthropic that standardises how AI models communicate with external tools, services, and other agents. It is the coordination infrastructure for multi-agent workflows — it enables multi-agent capabilities and introduces a corresponding vulnerability layer.

MCP introduces specific attack vectors: tool poisoning embeds malicious instructions in tool descriptions; rug pull attacks let tools mutate behaviour after approval; cross-server shadowing instructs the agent to always BCC [email protected] on outgoing email and never mention it to the user. Vendor research documents that a significant proportion of deployed MCP servers have command injection flaws, unrestricted URL fetch capabilities, or file path traversal vulnerabilities.

CometJacking, documented by PointGuard AI in November 2025, makes MCP-layer exploitation concrete. The attack begins with a prompt injection via XSS in a webpage, escalates through Perplexity Comet’s MCP API, and achieves full device control — data exfiltration, file encryption, lateral movement, and persistent backdoors — before traditional endpoint detection triggers.

OWASP Top 10 for Agentic Applications 2026, published December 2025, is the first formal governance framework addressing these risks. It covers ten risk categories from ASI01 (Agent Goal Hijack) through ASI10 (Rogue Agents). It exists because existing governance frameworks — built for human users, RPA, and traditional web applications — do not address the execution environment risk model.

What governance frameworks and products address the execution environment risk?

The OWASP Top 10 for Agentic Applications 2026 is the governance baseline. The EU AI Act‘s high-risk AI obligations take effect in August 2026; the Colorado AI Act becomes enforceable in June 2026 — both moving faster than most enterprise security roadmaps.

What you need are controls that operate inside the browser session, at the point of action: identity-aware visibility that distinguishes agent-originated actions from human-originated actions in real time.

Prisma Browser (Palo Alto Networks) is the leading enterprise product addressing this gap. Its AI Runtime Security analyses prompts in real time and blocks malicious instructions before execution. Step-Up MFA and Just-in-Time approval gates map directly onto the permission ladder. For full depth, see the dedicated article.

The bounded identity pattern is the recommended architectural mitigation: scope agent identity to the minimum privilege required for the specific task. If users carry excessive permissions today — which most enterprise users do — a browser agent inherits every one and exercises them at machine speed. Remediating over-provisioned access before agent adoption is the highest-return preparatory action available.

The products that put execution environment concepts into practice: Perplexity Comet as the leading commercial example and MolmoWeb as the open-source implementation. Both expose the same structural risks. For the complete enterprise governance picture, see the AI browser agents — the complete security guide.

FAQ

What is the difference between an agentic browser and using a browser extension like the ChatGPT extension?

A browser extension assists the human — who still initiates and confirms every consequential action. An agentic browser places the AI as the primary actor: it perceives page state, decides actions, and executes them autonomously, inheriting every logged-in service and every permission the user holds.

What does “executing in an authenticated session” mean in practice?

Your browser holds session cookies and tokens that tell logged-in services “this is the authorised user.” An agent inherits those tokens — it can take actions in every logged-in service as if it were you, without your password, and without those applications knowing the actor has changed.

Is the “execution environment” concept specific to Perplexity Comet?

No — it is the category, not the product. MolmoWeb implements the same architecture open-source. The trust zone failures Trail of Bits documented apply across all agentic browser implementations. Palo Alto Networks coined the framing precisely because it describes the category shift.

What is the OWASP Top 10 for Agentic Applications?

Published December 2025 — the first dedicated governance framework for agentic AI, analogous to the classic OWASP Top 10 for web applications. It covers ASI01 (Agent Goal Hijack) through ASI10 (Rogue Agents). Available at genai.owasp.org.

What is the Same-Origin Policy and why does it fail for agentic browsers?

SOP prevents a script on bank.com from reading data from email.com. In the agentic context it fails structurally: the agent is a cross-site actor that legitimately navigates between origins as part of its workflow — it can read authenticated data from one origin and use it in another, defeating the isolation SOP was built to provide.

How is an agentic browser different from RPA (Robotic Process Automation)?

RPA executes fixed, pre-scripted action sequences — change the page layout and the bot fails. An agentic browser reasons about the goal and adapts dynamically. From a governance perspective, RPA’s fixed scripts are auditable in advance; an agent’s dynamic action sequence is not.

What is the permission ladder and why does it matter?

The permission ladder is a staged privilege model: read-only → suggest → draft → act with human confirmation → act fully autonomously. Risk concentrates at the jump from rung four to rung five — where the agent prepares and the human authorises, versus the agent acting alone with full inherited session scope.

Why can’t my existing DLP and CASB tools see what an AI browser agent is doing?

DLP monitors network egress; agents act within the browser runtime before data crosses any boundary. CASBs monitor SaaS API boundaries; agents move data at the DOM level, not through APIs. Both are blind to the execution layer — identity-aware visibility, as provided by Prisma Browser, is what distinguishes agent actions from human actions.

What is CometJacking?

CometJacking, documented by PointGuard AI in November 2025, begins with a prompt injection via XSS in a webpage, escalates through the Comet browser’s MCP API, and achieves full device control — ransomware, data theft, and persistent backdoors — before traditional endpoint detection triggers. It is the most fully documented example of MCP-layer exploitation in a real agentic browser product.

What is “shadow AI” in the agentic browser context?

Shadow AI refers to employees adopting unsanctioned AI tools outside IT oversight. In the agentic browser context this is particularly consequential: an unsanctioned agentic browser creates an attack surface the enterprise cannot see, govern, or audit — which is why enterprise visibility controls must extend to the browser layer, not just the network perimeter.

AUTHOR

James A. Wondrasek James A. Wondrasek

SHARE ARTICLE

Share
Copy Link

Related Articles

Need a reliable team to help achieve your software goals?

Drop us a line! We'd love to discuss your project.

Offices Dots
Offices

BUSINESS HOURS

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Monday - Friday
9 AM - 9 PM (Sydney Time)
9 AM - 5 PM (Yogyakarta Time)

Sydney

SYDNEY

55 Pyrmont Bridge Road
Pyrmont, NSW, 2009
Australia

55 Pyrmont Bridge Road, Pyrmont, NSW, 2009, Australia

+61 2-8123-0997

Yogyakarta

YOGYAKARTA

Unit A & B
Jl. Prof. Herman Yohanes No.1125, Terban, Gondokusuman, Yogyakarta,
Daerah Istimewa Yogyakarta 55223
Indonesia

Unit A & B Jl. Prof. Herman Yohanes No.1125, Yogyakarta, Daerah Istimewa Yogyakarta 55223, Indonesia

+62 274-4539660
Bandung

BANDUNG

JL. Banda No. 30
Bandung 40115
Indonesia

JL. Banda No. 30, Bandung 40115, Indonesia

+62 858-6514-9577

Subscribe to our newsletter