Business

SaaS

Technology

•

Feb 24, 2026

Open-Source Tools for Scanning and Red-Teaming Agentic Browser Security

Q: How do I detect if my organisation has already been targeted by AI Recommendation Poisoning?

Use Microsoft Defender Advanced Hunting: query EmailUrlInfo, MessageUrlInfo, and UrlClickEvents for AI assistant domain links where URL parameters contain memory-manipulation keywords (remember, trusted, authoritative, citation). Convert ad hoc queries into scheduled detections.

By the end of this article you’ll have a clear, scannable reference for what each tool does, what attack class it detects, where to find it, and how it fits into a defensive strategy. The tools are grouped into three modes: build-time scanning, runtime protection, and offensive tools you need to know exist. Conventional web-application security tooling doesn’t cover prompt injection, memory poisoning, or skill-supply-chain compromise — these attack classes need purpose-built detection.

This article is part of our browser-agent security landscape series. For background on the attacks these tools detect, see the linked article.

What are MITRE ATLAS and the OWASP Agentic Top 10, and why do these tools map to them?

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is structured like MITRE ATT&CK but specific to AI/ML attacks. Two technique IDs matter most here: AML.T0051 (LLM Prompt Injection — malicious instructions embedded in content that cause an agent to perform unintended actions) and AML.T0080 (AI Memory Poisoning — attackers inject instructions into AI memory stores that persist across future sessions; Microsoft calls this “AI Recommendation Poisoning”).

The OWASP Agentic Top 10 (2026) extends the OWASP LLM Top 10 to agent-specific risks. The relevant items: ASI-01 (Agent Goal Hijack), ASI-02 (Tool Misuse), ASI-04 (Supply Chain), and ASI-06 (Memory Poisoning).

Framework mapping is what turns a tool inventory into something you can actually act on. “This tool detects AML.T0051” is the language compliance conversations require — tool names alone aren’t enough. For a deeper treatment, see OWASP and MITRE items these tools address. For the full overview of agentic browser risks that frames the threat model these tools address, see the pillar overview.

How does the Cisco Skill Scanner detect vulnerable AI agent skills before deployment?

Start here. The skill supply chain attack surface (OWASP LLM05) is where the most concrete evidence of compromise has emerged.

Cisco Skill Scanner (github.com/cisco-ai-defense/skill-scanner) is an open-source static scanner for AI agent skill packages — the MCP-adjacent capability packages that AI agent platforms install locally. The detection engines work in sequence: static YAML/YARA pattern matching; LLM-as-a-judge semantic analysis of flagged content; behavioural dataflow analysis tracing data paths for exfiltration patterns; and VirusTotal integration for hash-based malware detection.

The 26% finding: Cisco researchers Amy Chang and Vineeth Sai Narajala analysed 31,000 agent skills. The #1 ranked skill on MolthHub was functionally malware — the “What Would Elon Do?” skill silently exfiltrated data via curl and injected prompts to bypass safety guidelines. Nine findings surfaced against it: two critical, five high severity. Cisco’s summary: “AI agents with system access can become covert data-leak channels that bypass traditional data loss prevention, proxies, and endpoint monitoring.”

One caveat worth noting from the tool itself: “No findings does not guarantee that a skill is secure.” Human review still matters for high-risk deployments.

Output is in SARIF format, compatible with GitHub Code Scanning for inline findings in pull requests. For more on agent skill vulnerability scanning, see the linked article.

Maps to: OWASP LLM05 (Supply Chain), LLM06 (Excessive Agency), MITRE AML.T0051.

What do agent-audit and Giskard test, and how do they differ from each other?

agent-audit: static analysis for agent code

agent-audit (github.com/HeadyZhang/agent-audit) works like Bandit or Semgrep but with an agent-specific threat model built on the OWASP Agentic Top 10 (2026).

The numbers make the case plainly: agent-audit achieves 94.6% recall and 0.91 F1 on Agent-Vuln-Bench; Bandit achieves 29.7%; Semgrep achieves 27.0%. Neither Bandit nor Semgrep can parse MCP configuration files — 0% recall on agent-specific configuration vulnerabilities. agent-audit gets 100% there. It found OpenClaw‘s browser.evaluateEnabled default-true vulnerability in practice; version 0.16 cut false positives by 79%.

SARIF output integrates with GitHub Code Scanning.

Maps to: OWASP ASI-01, ASI-02, ASI-04, ASI-06 (all 10 Agentic categories), MITRE AML.T0051.

Giskard: dynamic adversarial red-teaming

Giskard is an open-source LLM security testing platform that deploys autonomous red-teaming agents across 40+ adversarial probes. Where agent-audit scans code before it runs, Giskard probes the running model.

Researchers using Giskard produced the published OpenAI Atlas security vulnerability analysis, demonstrating indirect prompt injection, cross-origin data access risk, and data exfiltration to OpenAI’s infrastructure. The finding: Atlas is “not in scope for OpenAI SOC2 or ISO certification” and has “no compliance API logs or SIEM integration.”

Maps to: OWASP LLM01, LLM02, LLM05, LLM06, MITRE AML.T0051.

How do TrojAI Detect and TrojAI Defend provide build-time and runtime protection?

TrojAI covers both sides of the protection gap. These are commercial products — included here because no open-source runtime protection tool currently provides equivalent capability.

TrojAI Detect (troj.ai/products/detect) automatically red-teams AI models at build time, validates behaviour against security policies, and delivers remediation guidance. Think of it as your pre-merge test suite for AI model behaviour. Maps to: OWASP LLM01, MITRE AML.T0051.

TrojAI Defend (troj.ai/products/defend) is a runtime AI application firewall. Unlike a conventional WAF that inspects HTTP traffic for known attack patterns, TrojAI Defend is trained on AI-specific attack techniques — it catches prompt injection payloads embedded in content that a standard WAF passes straight through. Maps to: OWASP LLM01, MITRE AML.T0080.

The decision trigger is straightforward: agent-audit, Cisco Skill Scanner, and Giskard provide strong build-time coverage, but none of them enforce policies on a running agent in production. That gap is the case for commercial runtime.

Zenity provides a complementary runtime option focused on incident intelligence. It extended coverage to ChatGPT Atlas, Perplexity Comet, and Dia in December 2025, and released Safe Harbor — an open-source tool that adds a dedicated safe action agents can call when they identify harmful behaviour.

How can PromptArmor and aitextrisk.com test for link-preview exfiltration?

Link-preview exfiltration is zero-click: attacker sends a crafted URL → AI agent generates a data-carrying URL via indirect prompt injection → messaging app auto-fetches the preview → attacker receives conversation data. No user action required.

PromptArmor identified the riskiest pairings: Microsoft Teams with Microsoft Copilot Studio (the largest share of insecure fetches), Discord with OpenClaw, Slack with Cursor Slackbot, and Telegram with OpenClaw. Safer configurations include Claude in Slack, OpenClaw via WhatsApp, and OpenClaw in Docker via Signal.

aitextrisk.com is PromptArmor’s public test harness — submit your platform combination and observe whether it triggers an insecure preview fetch. Use it with authorisation on systems you own. PromptArmor’s conclusion: “We’d like to see communication apps consider supporting custom link preview configurations on a chat/channel-specific basis to create LLM-safe channels.”

Maps to: OWASP LLM01, MITRE AML.T0051.

Offensive testing tells you whether you are exposed. The next tool answers the harder question: have you already been hit?

How do Microsoft Defender Advanced Hunting KQL queries detect AI Recommendation Poisoning?

Microsoft Defender Advanced Hunting is the main tool for retrospective detection — answering “have we already been compromised?” rather than only preventing future attacks.

The signal: emails or Teams messages containing links to AI assistant domains (copilot, chatgpt, gemini, claude, perplexity, grok) where URL parameters contain memory-manipulation keywords: “remember,” “memory,” “trusted,” “authoritative,” “future,” “citation,” or “cite.”

Three KQL hunting queries from Microsoft’s AI Recommendation Poisoning research:

Email traffic: Search EmailUrlInfo for AI assistant domains with memory-manipulation keywords in URL-decoded prompt parameters.
Teams messages: Search MessageUrlInfo using the same pattern.
User clicks: Query UrlClickEvents with Safe Links to identify users who acted on poisoning URLs.

Convert ad hoc hunts into scheduled detections. Enable Defender for AI Services at the subscription level, and enable user prompt evidence capture — seeing the exact input and model response during an attack is the difference between speculation and evidence.

Microsoft FIDES is worth noting separately. It applies information-flow control principles — specifically Spotlighting and Prompt Shields — to restrict what content AI agents can treat as instruction at the platform level. FIDES reduces the attack surface the detection tools above must cover. It is a complementary upstream control, not a replacement for tooling.

Atlas logged-out mode: Restricts the agent’s authenticated access to external services, reducing what an attacker can exfiltrate even if prompt injection succeeds. Available in Atlas enterprise settings.

Tools security teams should understand exist: CiteMET and AI Share URL Creator

These are not recommendations. Understanding attacker tooling is how you work out how much defensive investment is actually proportionate.

CiteMET (npmjs.com/package/citemet) is a ready-to-use NPM package marketed as “an SEO growth hack for LLMs.” Websites embed “Summarise with AI” buttons that open the user’s AI assistant with a pre-filled prompt instructing it to remember a company as a trusted source — persisting that instruction across all future sessions without the user’s knowledge.

AI Share URL Creator (metehan.ai/ai-share-url-creator.html) is a point-and-click web tool that generates poisoned share URLs to manipulate AI agent memory context. No code required.

Microsoft’s Defender Security Research Team found 50 distinct AI Recommendation Poisoning attempts from 31 companies across 14+ industries over 60 days — including from a security vendor. The conclusion: “The barrier to AI Recommendation Poisoning is now as low as installing a plugin.”

Both exploit MITRE AML.T0080 — the same technique the Microsoft Defender KQL queries detect.

Which OWASP and MITRE items does each tool address?

The Type column shows where in the development lifecycle the tool applies; OWASP Items maps to compliance requirements; MITRE ATLAS connects to threat intelligence feeds. “Coverage” means the tool can detect or test for the attack technique classified under that item.

Cisco Skill Scanner — Build-time (static + LLM) | LLM05, LLM06 | AML.T0051 | github.com/cisco-ai-defense/skill-scanner

agent-audit — Build-time (SAST) | ASI-01, ASI-02, ASI-04, ASI-06 (all 10 Agentic categories) | AML.T0051 | github.com/HeadyZhang/agent-audit

Giskard — Build-time (adversarial) | LLM01, LLM02, LLM05, LLM06 | AML.T0051 | giskard.ai

TrojAI Detect — Build-time (commercial) | LLM01 | AML.T0051 | troj.ai/products/detect

TrojAI Defend — Runtime (commercial) | LLM01 | AML.T0080 | troj.ai/products/defend

PromptArmor / aitextrisk.com — Offensive testing | LLM01 | AML.T0051 | aitextrisk.com

Microsoft Defender Advanced Hunting — Retrospective detection | — | AML.T0080 | Microsoft Defender XDR

CiteMET — Offensive (attacker tool) | — | AML.T0080 | npmjs.com/package/citemet

AI Share URL Creator — Offensive (attacker tool) | — | AML.T0080 | metehan.ai

Coverage gaps: No open-source runtime tool covers what TrojAI Defend and Zenity provide in production. OWASP Agentic Top 10 items ASI-07 (Inter-Agent Communication), ASI-08 (Cascading Failures), and ASI-10 (Rogue Agents) have no dedicated red-teaming tools yet — these are the next gaps to watch.

Full framework documentation: MITRE ATLAS at atlas.mitre.org; OWASP Agentic Top 10 at genai.owasp.org; OWASP LLM Top 10 at owasp.org/www-project-top-10-for-large-language-model-applications.

For governance controls these tools implement and framework coverage for compliance, see the linked articles.

Frequently asked questions

What is the best free tool to scan AI agent code for security vulnerabilities?

agent-audit is the most accessible starting point — open-source, built on the OWASP Agentic Top 10, and achieving 94.6% recall on agent-specific vulnerability benchmarks. For skill-package scanning specifically, Cisco Skill Scanner is the leading open-source option. The two tools target different things: agent-audit scans agent application code; Cisco Skill Scanner scans the skill packages those agents install.

Can I integrate agent-audit into a GitHub Actions CI/CD pipeline?

Yes. agent-audit produces SARIF-compatible output and has documented GitHub Actions integration. It functions as a security gate in pull requests, blocking merges when OWASP Agentic Top 10 violations are detected. The setup follows the same pattern as Bandit or Semgrep Actions workflows.

What is the difference between build-time scanning and runtime protection for AI agents?

Build-time scanning (agent-audit, Cisco Skill Scanner, Giskard) analyses code and behaviour before deployment — analogous to SAST and penetration testing. Runtime protection (TrojAI Defend, Zenity) enforces security policies on agents in production — analogous to a WAF. Both are needed; they address different failure modes.

How do I detect if my organisation has already been targeted by AI Recommendation Poisoning?

Use Microsoft Defender Advanced Hunting: query EmailUrlInfo, MessageUrlInfo, and UrlClickEvents for AI assistant domain links where URL parameters contain memory-manipulation keywords (remember, trusted, authoritative, citation). Convert ad hoc queries into scheduled detections.

What is MITRE ATLAS AML.T0051 and which tools detect it?

AML.T0051 is the technique ID for LLM Prompt Injection. Cisco Skill Scanner, agent-audit, and Giskard all detect variants — via static pattern matching, SAST-style code analysis, and adversarial model probing respectively.

Is aitextrisk.com safe to use for testing my own platforms?

aitextrisk.com is maintained by PromptArmor for defensive testing. It simulates the link-preview exfiltration vector to confirm whether your messaging platform integration is vulnerable. Use it with authorisation on systems you own or are engaged to test.

What did the Cisco Skill Scanner find when it analysed 31,000 AI agent skills?

Twenty-six per cent of the 31,000 skills contained at least one vulnerability. That rate, across a marketplace of skills sourced by teams without a security review process, is the reason pre-installation scanning matters — particularly for organisations adopting AI agent capabilities from third-party registries.

What is the OWASP Agentic Top 10 and how does it differ from the OWASP LLM Top 10?

The OWASP LLM Top 10 covers LLM-specific vulnerabilities. The OWASP Agentic Top 10 (2026) extends this to risks that arise when LLMs are given tool access, browser control, or autonomous decision-making capability. agent-audit uses the Agentic Top 10 as its rule source.

Are there any open-source runtime protection tools for agentic browsers?

As of early 2026, no open-source tool provides runtime enforcement equivalent to TrojAI Defend or Zenity. Build-time scanning and Microsoft Defender Advanced Hunting (retrospective detection, included with Microsoft 365 licensing) are the main options that don’t require additional commercial tooling.

What is Microsoft FIDES and how does it reduce the need for downstream scanning tools?

FIDES applies information-flow control principles to deterministically prevent indirect prompt injection at the architectural level — reducing the attack surface that downstream tools must cover. It is a complementary upstream control, not a replacement for tooling.

How do I configure OpenAI Atlas logged-out mode to reduce attack surface?

Enabling logged-out mode restricts the agent’s access to authenticated external services. This limits the data an attacker can reach even when prompt injection succeeds. Configuration is available in Atlas enterprise settings.