Your cyber insurance almost certainly does not cover deepfake fraud losses. There is a clause called the voluntary parting exclusion buried in your policy — and you have probably never heard of it until you are filing a claim you are about to lose.
Arup lost $25 million when an employee joined a deepfake video call with an AI-generated CFO. Standard Social Engineering Fraud Endorsement sublimits cap at $100,000–$250,000. That is a 100:1 gap between what your insurer will pay and what a real attack can cost.
This article explains why standard policies fail, where the Coalition Deepfake Response Endorsement fits (and where it does not), and exactly what language to demand from your broker. This article is part of our comprehensive guide on why deepfake fraud defences keep falling behind policy — start there for the full picture, then come back here for the insurance procurement decision.
Standard cyber insurance is built for data breaches, ransomware, and system outages — technical events where something in your infrastructure was compromised.
Deepfake-induced wire transfers are a different animal entirely. When an employee is deceived by an AI-generated video call into authorising a $25 million transfer, no system was compromised. The employee made a decision — a voluntary one, in the insurer’s view — based on fabricated authenticity.
The PLUS Guide (Kennedys Law, February 2026) puts it plainly: deepfake fraud is a crime and fraud event, not a cyber event. Insurers slot these losses into crime and fidelity frameworks. Traditional controls — firewalls, endpoint detection, encryption — do nothing to stop a convincing deepfake call. Swiss Re‘s SONAR 2025 report said it clearly: “deepfakes may increasingly be used in sophisticated cyberattacks and drive cyber insurance losses.” When the reinsurance market issues systemic risk warnings, underwriters tighten exclusions. Not loosen them.
The voluntary parting exclusion is the clause your insurer will use to deny the claim.
Plain-language version: coverage does not apply when your company voluntarily transferred funds — even when that transfer was induced by fraud.
In practice: your finance manager receives a video call from what appears to be your CEO — actually an AI-generated synthetic. They follow procedure, believe the request is legitimate, and complete the transfer. The exclusion applies because your employee clicked the button. The sophistication of the deception is legally irrelevant under standard policy language.
Jones Walker LLP‘s January 2026 analysis spells it out: the exclusion applies because the policyholder’s agent “willingly parted with” the funds. Think of it like a software licence that excludes liability for user error — except the insurer defines “user error” to include being deceived by a perfect deepfake, because the user still initiated the transaction. The deception is your problem. Unless you have the right coverage in place.
In December 2025, Coalition launched the Coalition Deepfake Response Endorsement — widely described as the first purpose-built deepfake insurance product, available across eight markets including Australia, the US, the UK, and Canada.
It covers forensic analysis, legal takedown support, and crisis communications for reputational harm from synthetic media. That last part — reputational harm — is the key phrase. It does not cover wire transfer losses. If your finance manager was deceived into wiring $25 million, the Coalition endorsement pays out nothing.
Coalition’s Head of Cyber Portfolio Underwriting confirmed that “deepfake-enabled fraud leading to fraudulent transfers” was already covered through existing social engineering fraud coverage. The December endorsement expanded into a different risk category — reputational harm — not the wire fraud scenario most finance teams are worried about. Understanding the mechanics of how deepfake fraud works helps clarify why these distinctions matter so much for coverage decisions.
Worth having. Just not the solution to the voluntary parting exclusion problem.
The sublimit is the maximum your insurer will pay on a social engineering fraud claim. Per IRMI, the standard range is $100,000 to $250,000 — roughly one percent of the Arup loss. To understand the full scale of what organisations actually lose, see the Arup $25M loss and MSUFCU avoided exposure figures — the gap between standard sublimits and real-world losses is the core procurement problem. Orion, a Luxembourg-based supplier, disclosed approximately $60 million in losses from a social engineering wire fraud attack. Deloitte projects generative AI-related fraud losses in the US reaching $40 billion by 2027.
The sublimit is the highest-priority procurement action. The best endorsement language delivers nothing if a $5 million wire fraud loss is covered to $250,000.
How to size it:
Insurers are not enthusiastic about raising social engineering fraud sublimits right now. But it is the only number that determines whether your coverage is real or performative.
Six provisions. Why each matters, and what happens without it.
1. Social Engineering Fraud Endorsement with explicit deepfake and synthetic media language. The endorsement must reference AI-generated audio, video, and text-based deception — not just “impersonation” or “pretexting.” Without synthetic media specificity, an insurer can argue a deepfake event is not covered under existing definitions.
2. Explicit exception to the voluntary parting exclusion for deepfake-induced payments. Negotiate explicit language stating the exclusion does not apply to payments induced by deepfake impersonation. Without this, the endorsement may exist and the claim will still fail.
3. Sublimit adequate to maximum single-transaction exposure. Not the IRMI default. Negotiate based on your largest plausible single transaction.
4. Removal of verification clause, or a deepfake-specific carve-out. Ask your broker directly: does this policy contain a verification clause? Jack Keilty at New Dawn Risk advises clients to “steer clear of policies bearing this wording.” If it cannot be removed, negotiate carve-out language for scenarios where verification procedures were followed but defeated by AI-quality impersonation. The clause that was reasonable in 2019 can void legitimate coverage in 2026.
5. Coverage triggers that include both “social engineering fraud” and “funds transfer fraud” terminology. Some policies cover one, not both. This gap matters especially for FinTech companies where both attack vectors are plausible.
6. Crime policy placement, not cyber-only. If the endorsement is attached only to a cyber policy, the underwriting classification mismatch puts the claim at risk — even if the endorsement language appears to cover it.
Get specific policy language in writing before signing. Verbal broker assurances are not coverage.
Getting the right endorsement and sublimit in place is the procurement side. Making sure a claim holds up is the other — and that depends on documentation.
The PLUS Guide (Kennedys Law, February 2026) notes that “post-incident scrutiny focuses on what procedures were in place to prevent an incident from occurring.” Underwriters assess your governance, not just the event.
Three control categories that improve claims defensibility:
Deepfake detection tools, deployed and documented. Log evidence of detection capability matters. An organisation that deployed tooling — even if it did not catch a sophisticated attack — is in a stronger position than one with nothing in place. For the full comparison of what documented detection and provenance controls you should have in place, see our architectural guide — the controls you choose directly affect the strength of your coverage position.
Incident response plan covering deepfake-specific scenarios. A generic IRP does not demonstrate preparedness for synthetic media attacks. Address deepfake impersonation specifically: who authorises unusual transfer requests, what independent verification is required, when to escalate before funds move.
C2PA implementation across organisational media workflows. Jones Walker links C2PA (Coalition for Content Provenance and Authenticity) implementation to legal reasonableness benchmarks — and notes that organisations failing to implement available authentication technologies are “increasingly vulnerable to negligence claims” as industry standards emerge.
Documented controls do not guarantee a payout. They shift the burden. For whether regulatory compliance creates any audit trail that aids coverage claims — including EU AI Act audit trails — see our regulatory analysis. Compliance documentation and coverage defensibility are directly connected.
Will my insurer pay out if an employee was tricked by a deepfake CFO video into wiring money? Under standard policies, probably not. The voluntary parting exclusion applies because the employee authorised the transfer voluntarily, even though they were deceived. You need a Social Engineering Fraud Endorsement with explicit deepfake language and an exception to the voluntary parting exclusion.
What is the voluntary parting exclusion in insurance? A policy clause that denies coverage when the insured voluntarily transferred funds, even if the transfer was induced by fraud. In deepfake scenarios, the employee is considered to have acted voluntarily regardless of the AI-generated deception that prompted it.
How much deepfake fraud coverage should my company have? Standard sublimits of $100,000–$250,000 are inadequate. SMBs with $10M–$50M revenue should target at least $500,000–$1M. Companies with high-value wire transfer activity should target $1M–$5M minimum, benchmarked against maximum single-transaction exposure.
What is a Social Engineering Fraud Endorsement? An optional rider to a crime or cyber policy that extends coverage to losses from deception-based fraud, including deepfake impersonation attacks that cause employees to authorise fraudulent wire transfers. It is the primary coverage instrument for deepfake wire fraud.
What is a verification clause in an insurance policy? A provision that may deny a social engineering fraud claim if you failed to verify the requestor’s identity through an independent channel before authorising a transfer. Some endorsements include this clause, potentially voiding coverage even when the Social Engineering Fraud Endorsement exists.
Does implementing C2PA help with insurance claims? Jones Walker identifies C2PA implementation as a legal reasonableness benchmark. Organisations that demonstrate they implemented content provenance standards strengthen their argument that losses occurred despite reasonable precautions.
Is the Coalition Deepfake Response Endorsement the same as social engineering fraud coverage? No. The Coalition endorsement covers reputational harm — forensic analysis, legal takedown, crisis communications. For wire transfer fraud caused by deepfake impersonation, you need the Social Engineering Fraud Endorsement, not the Coalition product.
Deepfake Detection vs Content Provenance — Choosing the Right Defence ArchitectureDeepfake fraud capability is advancing faster than the defences designed to stop it. That’s not a temporary lag you can wait out — it’s a structural problem baked into how reactive detection works. The standard response is to buy a detection tool. That response is losing.
Lab accuracy for deepfake detection sits at around 96%. In production, that number collapses to 50–65%. The gap isn’t a vendor problem you can shop your way out of. It’s built into the architecture itself.
Three defensively distinct approaches exist: reactive detection, proactive content provenance, and proof-of-humanness verification. They’re not points on a spectrum. They’re different architectural choices with different trade-offs, different attack surfaces, and different implementation requirements.
This article puts all three in a single framework — with concrete vendor examples, production evidence, implementation requirements, and a decision matrix. By the end, you’ll know which one to tackle first based on your threat exposure, your existing architecture, and your compliance obligations.
For context on the gap between deepfake fraud capability and institutional defence, read the pillar article first.
Detection fails in production because attackers optimise their generators against known detection classifiers before they deploy. The 96% lab accuracy figure is measured against yesterday’s generation techniques. By the time a detection model ships, the generators it was trained on have already moved on.
State-of-the-art systems drop 45–50% in performance against deepfakes actually circulating online. Under targeted adversarial attack — where attackers test their content against known detection tools — accuracy can fall below 1% of its original baseline. BRSide puts it plainly: detection alone cannot be your primary defence strategy.
DeepStrike frames the structural dynamic as an asymmetric arms race where the defence is constantly playing catch up. An expert survey rated detection tools’ effectiveness at 3.4 out of 7 — the lowest of all mitigation strategies tested. Generative capability simply advances faster than detection methods can keep pace with.
There’s a compounding problem too: the liar’s dividend. When synthetic media becomes pervasive, even genuine evidence can be dismissed as fake. So the erosion of trust runs in both directions.
If detection is structurally inadequate as a sole architecture, the question becomes which paradigm you use instead — and when you layer detection back in as a risk-scoring complement, not a primary control. This structural problem is part of a larger pattern explored in our series on why defences keep falling behind.
Each of the three paradigms addresses a different layer of the problem. They’re not interchangeable, and mixing them is valid — but each requires distinct infrastructure investment.
Reactive Detection analyses content for synthetic artefacts after it’s presented. It’s the current industry default and the paradigm most continuously arms-raced. Examples include Pindrop (passive voice scoring) and Modulate‘s Velma (real-time voice fraud detection). Detection makes sense for inbound content you don’t control, where provenance metadata isn’t available.
Proactive Provenance embeds authentication at creation time. Rather than asking “is this real?” at receipt, it creates a cryptographic chain that answers “where did this come from and what has happened to it?” C2PA uses cryptographic metadata signing. SynthID (Google DeepMind) uses pixel-level watermarks embedded at generation time. Two different implementations, same paradigm.
Proof-of-Humanness bypasses the fake/real binary entirely. Instead of asking whether content is authentic, it asks whether a real person with a specific physical device is behind the interaction. FIDO2/passkeys use hardware-backed cryptographic device binding. Tools for Humanity uses decentralised iris-based uniqueness verification.
The attack surface distinction is the thing to hold onto:
Each one warrants a closer look — starting with reactive detection, which remains the industry default despite its structural limitations.
Deepfake attacks bypassing KYC liveness checks increased 704% in 2023. Active liveness gates — blink detection, skin texture analysis, micro-movement tracking — provide the illusion of a security check that advanced generators have already learned to defeat.
Passive scoring works differently. The caller doesn’t know which signals are being measured, so they can’t optimise against them. Pindrop analyses hundreds of audio characteristics in the background of a call and produces a continuous risk score rather than a binary real/fake determination.
The MSUFCU deployment gives you real-world numbers to work with. An $8.3B credit union deployed Pindrop across its contact centre. Results after one year: $2.57M in prevented fraud exposure, a 10-point NPS improvement, and 58 seconds saved per call. One in every 106 calls was identified as synthetic. For a full breakdown of the MSUFCU passive scoring deployment and the financial case it makes for this approach, see the companion article on deepfake fraud costs.
Modulate’s Velma takes an adaptive AI approach for real-time voice fraud detection. Their enterprise survey found 91% of respondents plan to increase voice fraud spending in the next 12 months — which tells you something about where the industry thinks this is heading.
Reactive detection makes sense when you face inbound content you don’t control — call centres, user-submitted media — and when provenance metadata isn’t available. Treat it as a risk-scoring complement to a multi-paradigm architecture. Not a primary control on its own.
C2PA (Coalition for Content Provenance and Authenticity) is a cryptographic standard that creates a verifiable chain of custody for digital content from creation through distribution. At every step — creation, editing, transmission, publication — machine-readable metadata is signed and attached. Anyone can verify the chain at any point.
The standard is backed by Adobe, Microsoft, Google, and OpenAI. ISO standardisation is advancing. The Content Authenticity Initiative (CAI) drives C2PA implementation with 6,000+ members and runs the Conformance Programme — your procurement evaluation tool for identifying compliant products.
Jones Walker identifies C2PA as the emerging legal reasonableness benchmark — a status explained in more detail in our article on the regulatory landscape. Organisations that fail to implement available authentication technologies face negligence exposure, particularly where industry standards have emerged and peers have adopted them. In other words: if your competitors are doing it and you’re not, that’s a problem.
Add to that EU AI Act Article 50, which takes effect on August 2, 2026. AI-generated content must be marked in a machine-readable format and detectable as artificially generated. Penalties run up to €15 million or 3% of global turnover. If you generate or distribute AI content and you have EU customers, C2PA compliance isn’t optional. It’s the mechanism for demonstrating you’ve met a recognised standard.
Production adoption confirms the standard is leaving early-adopter territory. Google Pixel 10 launched with C2PA Content Credentials built in. Sony’s PXW-Z300 ships with Content Credentials at capture. The CAI director noted 2025 as the turning point: “Content Credentials are no longer theoretical.”
SynthID (Google DeepMind) is a complementary but distinct approach. Where C2PA signs content at creation and tracks it through distribution, SynthID embeds pixel-level watermarks at generation time — provenance-by-generation rather than provenance-by-signing. SynthID has watermarked over 10 billion pieces of content. They do different things; they work together.
Implementation requires C2PA SDK integration, Conformance Programme registration, surfacing credentials to end users, and maintaining chain of custody through content distribution. CAI’s developer education is at learn.contentauthenticity.org.
The detection paradigm asks: “Is this content real?” Proof-of-humanness asks: “Is there a real person with a specific physical device behind this interaction?” The first question gets harder to answer as generation quality improves. The second is answered by cryptographic proof.
Adrian Ludwig at Tools for Humanity frames it directly: if detecting the fake is failing, the smarter approach is proving the real.
Two implementation strands exist.
FIDO2/passkeys use hardware-backed cryptographic keys bound to a physical device. A passkey generates a cryptographic signature using a private key that physically cannot leave the device — it lives inside a secure hardware element and cannot be exported, copied, or transmitted. A deepfake can impersonate someone visually. It cannot produce the cryptographic key stored on their physical device. There’s no password to phish. No OTP to intercept. No biometric data travelling over a network to deepfake. The constraint isn’t computational. It’s physical.
Tools for Humanity / World ID takes the decentralised identity route. It verifies a real, unique person is behind an interaction using iris-based biometric uniqueness — without storing biometric data centrally. It applies anywhere a human needs to prove presence without relying on document checks that deepfakes can defeat.
The distinction from KYC is architectural. Traditional KYC relies on document checks, live selfie matching, and video verification calls — all of which deepfakes can defeat. It also accumulates biometric data that can be leaked or stolen, which directly accelerates the ability to impersonate real people. Proof-of-humanness avoids both problems.
Vendor due diligence is not optional. Jones Walker and the EU AI Act create enforceable liability for organisations that fail to audit their AI tool vendors. There are five contractual provisions you need.
1. Prohibited-use lists. Explicit contractual restrictions on synthetic content generation for fraud, impersonation, or deception. Not a general-purpose acceptable use policy — specific enumerated prohibited use cases with enforcement mechanisms.
2. Watermarking commitments. The vendor must embed provenance metadata — C2PA or equivalent — in all AI-generated content. This is your primary evidence the vendor takes synthetic content governance seriously. No watermarking commitment means they can’t be treated as a compliant supplier under EU AI Act Article 50.
3. Audit rights. Contractual right to audit the vendor’s synthetic content generation capabilities and usage logs. Without this, the prohibited-use list is unenforceable.
4. Takedown cooperation. The vendor must cooperate with removal of fraudulent content within defined SLAs. The TAKE IT DOWN Act (signed May 2025) requires covered platforms to remove non-consensual intimate deepfakes within 48 hours. Your vendor contracts need to align with that.
5. Indemnities for misuse. Vendor liability for damages caused by synthetic content generated through their tools when safeguards fail. This matters most when you’re facing regulatory penalties or civil claims.
If your procurement leverage is limited, watermarking commitments and prohibited-use lists carry the most weight with the least effort. The remaining three require active vendor cooperation to enforce. Understanding how documented detection controls affect insurance coverage is also worth reviewing — documented vendor due diligence strengthens your position when claiming on a deepfake fraud endorsement.
Three questions determine your starting point.
Question 1: Is your primary threat inbound or outbound? Inbound means content you receive and must evaluate — call centre calls, user-submitted media. Outbound means content you generate and distribute — marketing, product content. Inbound exposure points toward detection first. Outbound exposure points toward provenance.
Question 2: Do you have an existing authentication layer to harden? If you’re using passwords, SMS OTP, or video verification for any high-value transactions, you have an immediate attack surface. Replacing those with FIDO2/passkeys is the highest-impact near-term change regardless of your other threat exposure.
Question 3: What is your compliance exposure? If you generate or distribute AI content and have EU customers, you have a hard deadline: August 2, 2026. C2PA provenance implementation needs to be on your roadmap now.
Content platforms and media companies. Start with provenance (C2PA). You generate and distribute content; the legal reasonableness benchmark applies directly, and EU AI Act Article 50 creates a compliance deadline you can’t ignore.
Financial services and call centres. Start with detection (passive scoring). Your primary threat is inbound — callers and transactions you don’t control. The MSUFCU deployment validates the production approach at financial-services scale. Complement it with passkeys for high-value transactions.
SaaS with user accounts and identity-sensitive products. Start with proof-of-humanness (passkeys/FIDO2). The authentication layer is your primary attack surface. Passkeys eliminate the shared-secret vulnerability that deepfake fraud exploits.
AI tool vendors and content generation platforms. You need all three. C2PA for provenance output, vendor due diligence compliance for your supply chain, and detection for abuse monitoring.
First: Passkeys/FIDO2. Lowest cost, fastest deployment, highest immediate risk reduction. Start with high-value transaction flows.
Second: C2PA provenance. EU AI Act Article 50 creates urgency for content platforms and AI tool vendors. Higher implementation effort, but the compliance deadline makes it non-negotiable for relevant organisations.
Third: Detection. Layer passive scoring for residual inbound risk. Detection tools carry ongoing licensing costs and require continuous model updates — they’re not a set-and-forget solution.
No single paradigm provides complete protection. The architecture should layer all three over time — sequenced by risk exposure, not by what vendors are pushing hardest this quarter.
C2PA embeds verifiable provenance metadata into digital content at creation time. Jones Walker identifies it as the emerging legal reasonableness benchmark — failure to implement available provenance standards may create negligence exposure. It’s backed by Adobe, Microsoft, Google, and OpenAI. EU AI Act Article 50 creates a hard compliance deadline of August 2, 2026.
Detection accuracy drops from 96% in lab conditions to 50–65% in real-world deployment. Passive scoring (Pindrop, Modulate) performs better than binary gates because signals are measured without the user’s knowledge. Expert survey data rates detection at 3.4 out of 7 — viable as a risk-scoring layer, not a sole defence.
Proof-of-humanness verifies a real person controls a real device using cryptographic methods (FIDO2 passkeys) or decentralised identity (World ID). Unlike KYC — which relies on document checks and video calls that deepfakes can defeat — proof-of-humanness bypasses the spoofable biometric layer. It proves device possession, not visual identity.
Passkeys use hardware-backed cryptographic keys bound to a physical device. They eliminate the shared-secret layer (passwords, SMS OTP, video verification) that deepfake fraud exploits. A deepfake can impersonate someone visually but cannot produce the cryptographic key stored on their device. The constraint is physical, not computational.
Passive voice scoring (Pindrop) analyses hundreds of audio signals in the background without alerting the caller, producing a risk score rather than a binary determination. Combine it with out-of-band verification for high-value transactions — no tool provides 100% certainty.
The liar’s dividend is the state where synthetic media is so prevalent that genuine evidence can be dismissed as fake. Provenance becomes relevant not just for fraud detection but for preserving the evidentiary value of authentic content — the second strategic argument for implementing C2PA.
SynthID embeds pixel-level watermarks at generation time — provenance-by-generation. C2PA creates a cryptographic chain recording creation, editing, and distribution — provenance-by-signing. SynthID works on content from specific AI models; C2PA works on any signed content. They’re complementary.
Five provisions: (1) prohibited-use lists for fraud and impersonation, (2) watermarking commitments embedding provenance metadata in all AI-generated output, (3) audit rights over generation capabilities and usage logs, (4) takedown cooperation aligned to TAKE IT DOWN Act timelines, (5) indemnities for damages when safeguards fail. If leverage is limited, watermarking and prohibited-use lists carry the most weight.
The Content Authenticity Initiative (CAI) is the Adobe-led coalition with 6,000+ members driving C2PA adoption. C2PA is the technical standard; CAI publishes implementation guidance, runs the Conformance Programme, and maintains the Conformance Explorer. Start at learn.contentauthenticity.org.
No single paradigm provides complete protection. Start with the paradigm that addresses your most exposed attack surface, then layer the others over time. For most companies: passkeys/FIDO2 first, then C2PA provenance, then passive detection for residual inbound risk.
Open-source models achieve 61–69% accuracy on real-world datasets. Commercial tools achieve 82–98% — a 30–37% performance gap. Commercial tools invest in continuous retraining; open-source models lag the adversarial arms race. For production deployment, commercial passive scoring tools are more defensible.
Why Deepfake Laws Are Multiplying While the Fraud Keeps Getting WorseSince 2022, US state legislatures alone have enacted 169 deepfake laws across 46 states. The EU AI Act has an enforcement deadline of August 2026. The UK government launched what it called a “world-first” deepfake detection framework in February 2026.
And yet deepfake-enabled fraud losses are going up, not down.
Here is the situation: compliance is necessary — the penalties are real and the deadlines are coming — but compliance alone will not stop active fraud against your organisation. Regulation always addresses yesterday’s threat while attackers have already moved on. That gap has a name: policy response lag.
This article breaks down what each major framework actually requires, why cybersecurity experts say the frameworks will not work as fraud prevention, and how to build a compliance matrix for a multi-jurisdiction SaaS or FinTech product. If you want the broader picture on how deepfake fraud is outpacing policy response, the pillar guide covers that in full.
Jones Walker‘s January 2026 analysis documents 169 deepfake laws enacted across 46 US states since 2022, with 146 bills introduced in 2025 alone.
The problem is what those laws actually target. The overwhelming majority address non-consensual intimate imagery (NCII), election manipulation, and right of publicity. Texas §255.004 criminalises creating deepfake videos within 30 days of elections. Minnesota §609.771 escalates penalties for repeat electoral interference offences. Virginia §18.2-386.2 covers deepfake pornography. Tennessee’s ELVIS Act was the first law protecting voice as a right of publicity in the AI context.
Notice what is missing: enterprise fraud. CFO impersonation. Synthetic job candidates passing live video interviews. Wire transfer fraud. None of these are the primary target of any major deepfake law. Most legislation treats deepfakes as a content-moderation problem rather than organised criminal infrastructure. A content-removal framework cannot solve a fraud problem.
This creates a real paradox: more laws produce more compliance obligations without actually reducing the attack surface. A company that fully complied with every applicable state law in 2025 would still have been exposed to the Arup incident — the $25 million wire transfer fraud where an employee authorised 15 transactions to a deepfaked CFO on a video call. Nothing in the existing regulatory stack addresses that scenario. To understand what deepfake incidents actually cost organisations and why compliance investment needs to be benchmarked against real financial exposure, see the financial case analysis.
On February 5, 2026, Home Secretary Liz Kendall and Safeguarding Minister Jess Phillips announced a “world-first” deepfake detection framework.
Here is what it actually does: establishes an evaluation methodology for deepfake detection technologies — the Deepfake Detection Challenge, with more than 350 participants including INTERPOL, Five Eyes members, Microsoft, and academic institutions. The result is industry benchmarks for assessing detection tools against real-world threats.
Here is what it does not do: impose compliance obligations on businesses. There is no requirement to adopt specific tools, no penalties for failing to detect deepfakes, and no coverage of the generation side. It is useful for vendor assessment. It is not a regulatory mandate.
The UK’s criminal enforcement layer is separate and already in force — legislation making it illegal to create deepfake intimate images of adults without consent. That criminal provision and the detection framework operate independently.
Dr. Ilia Kolochenko, CEO of ImmuniWeb — a Swiss cybersecurity firm specialising in AI-driven security testing — was blunt in his assessment to The Register. The plan “will quite unlikely make any systemic improvements in the near future.”
The structural problem is this: detection technologies are evaluated against a fixed snapshot of generation capability. But generation capability evolves continuously. By the time a benchmark is validated and adopted, the generation methods it evaluates have already been superseded. This is speed asymmetry — not a one-time gap but a compounding feature of generative AI development.
There is also the metadata stripping problem. Legitimate AI generators may comply with C2PA watermarking standards — backed by Adobe, Microsoft, Google, and OpenAI. But bad actors strip those markers before deploying deepfakes. The EU Code of Practice acknowledges this openly: its labelling framework “applies only to lawful deepfakes.” A criminal generating a deepfake for a wire transfer scheme is entirely outside the labelling regime.
Kolochenko’s conclusion is worth quoting directly: “We need a systemic and global amendment of legislation — not just legally unenforceable code of conduct or best practices.” Compliance frameworks establish accountability norms. They do not stop active fraud.
EU AI Act Article 50 is the broadest synthetic media disclosure framework currently in force. Enforcement date: August 2, 2026. If your product generates or manipulates synthetic media, this is the deadline that matters.
What it requires: providers of AI systems that generate or manipulate synthetic media must ensure content is “marked in a machine-readable format and detectable as artificially generated.” Deployers must disclose synthetic content to users no later than at first interaction.
The EU Code of Practice — with its final version anticipated in June 2026 — mandates a multilayered approach: visible disclosures combined with machine-readable markers, including metadata, watermarking, and content provenance signals. Interim visible markers are two-letter acronyms: “AI”, “KI”, “IA” depending on language.
On penalties: transparency violations under Article 50 carry a ceiling of €7.5 million or 1.5% of global annual turnover. High-risk system non-compliance rises to €15 million or 3% of global turnover. Article 72 extends penalty exposure to governance record-keeping, not just implementation failures.
The extraterritorial reach is the critical point for global SaaS operators. Article 50 applies to any company whose AI output reaches EU users, regardless of where the company is headquartered. Jakarta, Sydney, Singapore — if you serve EU customers, you are in scope. The Digital Services Act adds complementary obligations for platforms hosting user-generated content.
And here is the blind spot: Article 50 is a disclosure obligation for legitimate providers. A criminal impersonating your CFO on a video call is not going to watermark the deepfake. Compliance avoids penalties. It does not protect you from being defrauded.
The TAKE IT DOWN Act was signed by President Trump on May 19, 2025. It was bipartisan legislation championed by Senators Ted Cruz and Amy Klobuchar — and it does almost nothing to address enterprise deepfake fraud.
Its scope is NCII. Covered platforms must establish a takedown process within one year and remove flagged content within 48 hours. The FTC enforces it with civil penalties up to $53,088 per violation. CFO impersonation, synthetic job candidates, wire transfer fraud — none of it is in scope. When enterprise deepfake fraud occurs, prosecutors fall back on wire fraud, identity theft, and extortion statutes.
The compliance patchwork remains. There is no federal preemption. Jones Walker documents 169 laws across 46 states, each with different categories and penalty structures. A SaaS platform with users in multiple states needs to work out which state laws apply. US state laws and EU AI Act requirements are not alternatives — both apply simultaneously depending on where your users are.
One more thing worth flagging: even full regulatory compliance does not guarantee insurance coverage when deepfake fraud occurs. Standard crime policies typically include a “voluntary parting” exclusion — when a deceived employee authorises a fraudulent transaction, coverage often does not apply. Understanding how regulatory audit trails affect insurance coverage claims is critical — compliance is not a substitute for understanding your insurance exposure.
The organising principle here is straightforward: compliance obligations are determined by where your users are located, not where you are headquartered.
EU AI Act: The broadest compliance trigger. Any AI system whose output reaches EU users triggers Article 50 regardless of company domicile. EU customers, AI-generated media — you are in scope for August 2026.
United Kingdom: Two distinct instruments. The detection framework is not a compliance mandate. UK criminal law applies to fraud targeting UK-based entities and is separate from EU AI Act obligations.
United States: No single federal standard for enterprise deepfake fraud. State laws apply based on where affected individuals or operations are located. The TAKE IT DOWN Act applies to covered platforms hosting NCII.
Southeast Asia: No country in the region currently has comprehensive deepfake-specific legislation. Indonesia’s PDP Law and Singapore’s PDPA provide partial data protection coverage but were not designed for synthetic media. Compliance obligations flow from EU and UK requirements — not local law.
A single global policy built on the least restrictive regime will fail. Domestic regulatory silence does not provide protection — a company in Jakarta serving EU customers is subject to Article 50 from August 2026 regardless of what Indonesian law requires. For the full scope of the deepfake threat that drives these compliance obligations, the pillar overview covers the complete landscape.
Map applicable laws to specific business functions rather than attempting a single global policy. That is Jones Walker’s methodology and it is the right approach.
Start with an AI use-case inventory. You cannot build a compliance matrix if you do not know what AI tools your organisation is using and what they generate. The NCUA AI Compliance Plan — developed for US credit unions — requires a centralised registry of all AI tools deployed and what they output. It is not FinTech-specific. It is a governance foundation that any organisation deploying AI should establish.
From that inventory, identify which operations involve synthetic media or AI-generated content: content generation, user verification, financial transactions, HR screening, customer communications. Each function is a potential compliance trigger wherever it reaches users.
The matrix maps each business function against the relevant jurisdiction, applicable law, specific requirement, effective date, penalty, and responsible owner. EU AI Act Article 50 requires labelling from August 2026. UK obligations flow from criminal law. US obligations vary by state. Southeast Asian obligations at present flow through from EU and UK requirements.
The matrix is a living document. The EU Code of Practice finalises in June 2026. US state legislatures introduced 146 bills in 2025 alone. Assign an owner and schedule quarterly reviews.
Building this matrix demonstrates governance maturity and satisfies regulatory obligations. But it does not stop a deepfake attack in progress. A fully compliant organisation can still be defrauded if it lacks independent verification procedures, live detection capability, and incident response protocols. Compliance is the floor. Defence requires more. See the guide on building a compliance matrix as part of a practical defence roadmap for a phased, actionable approach to implementing what the matrix documents.
Yes. The EU AI Act has extraterritorial reach — it applies to any company deploying AI systems whose output reaches EU users, regardless of where the company is headquartered. If your SaaS product serves EU customers and uses AI that generates or manipulates synthetic media, Article 50 applies to you. August 2, 2026 is your compliance deadline.
Transparency violations under Article 50 reach up to €7.5 million or 1.5% of global annual turnover. High-risk system non-compliance reaches €15 million or 3% of global turnover. Article 72 establishes documentation requirements as evidence of compliance — penalty exposure extends to governance record-keeping.
The TAKE IT DOWN Act (signed May 2025) targets non-consensual intimate imagery. Covered platforms must establish a takedown process within one year and remove flagged NCII content within 48 hours. FTC enforcement with civil penalties up to $53,088 per violation. Enterprise deepfake fraud is outside its scope entirely.
No. The UK Home Office framework is a detection evaluation methodology — it establishes benchmarks but does not impose compliance mandates. Creating deepfakes for fraud is separately criminalised under UK law. The framework and criminal provisions are distinct instruments.
As of early 2026, 46 US states have enacted deepfake legislation, with 169 total laws documented by Jones Walker since 2022. Different categories — elections, NCII, right of publicity, fraud — with no federal preemption. Companies in multiple states must comply with each applicable state law individually.
Deepfake technology is dual-use. The same capabilities used for fraud power legitimate applications in entertainment, accessibility, and education. A blanket ban would criminalise legitimate uses and would be unenforceable across borders. The LSE analysis identifies the core problem: treating deepfakes as content to be banned rather than criminal infrastructure to be disrupted. The governance approach is mismatched to the threat.
Detection tools are trained on known generation techniques, but generation methods evolve continuously. Each new model produces outputs existing detectors were not trained to recognise. Bad actors also strip C2PA markers before deploying deepfakes — rendering upstream compliance irrelevant at the point of attack. As ImmuniWeb’s Dr. Ilia Kolochenko says: “We need a systemic and global amendment of legislation — not just legally unenforceable code of conduct or best practices.” Speed asymmetry is structural.
A compliance matrix maps applicable deepfake laws to your business functions — content generation, identity verification, financial transactions, HR screening — across each jurisdiction where you serve customers. Start with an AI use-case inventory: a registry of all AI tools your organisation deploys and what they generate. Then map each use case against applicable laws, documenting what is required, when, and what the penalty is.
No. Compliance avoids penalties and demonstrates governance maturity, but it does not prevent active attacks. Standard crime policies typically include a “voluntary parting” exclusion that voids coverage when an employee authorises a fraudulent transaction, even under deepfake deception. Compliance is necessary but insufficient; pair it with technical detection, operational verification, and incident response capability.
Southeast Asian countries currently lack comprehensive deepfake-specific legislation. Indonesia’s PDP Law and Singapore’s PDPA provide partial data protection coverage but were not designed for synthetic media. Companies in Southeast Asia with EU or UK customers inherit compliance obligations from those jurisdictions — your compliance matrix must account for these flow-through requirements.
The NCUA AI Compliance Plan requires US credit unions to maintain a centralised AI use-case inventory — a registry of all AI tools deployed. While sector-specific, any company deploying AI can adapt the approach: you cannot build a compliance matrix if you do not know what AI tools your organisation uses. It is the governance foundation that compliance depends on.
What Deepfake Fraud Actually Costs and the Financial Case for Better DefencesDeepfake fraud is no longer a theoretical risk. It’s a line item on someone’s loss statement right now. Group-IB documented $347 million in verified losses in a single quarter. The FTC reported $12.5 billion in total US consumer fraud for 2024. Deloitte projects $40 billion in US generative AI-enabled fraud by 2027. These aren’t competing figures — they’re measuring different things — but together they describe a threat that is accelerating faster than most organisations’ defences.
Two cases tell you everything you need to know about the stakes. In January 2024, a finance employee at Arup‘s Hong Kong office watched what looked like a video conference with the company CFO and several colleagues, then authorised 15 wire transfers totalling $25.6 million. Every person on that call except the victim was a deepfake. The money has never been recovered. On the other side of the ledger: Michigan State University Federal Credit Union deployed Pindrop‘s voice fraud detection platform in August 2024 and documented $2.57 million in avoided fraud over 14 months — plus a 10% NPS improvement and 58 seconds saved on every authentication call.
The difference between those two outcomes is whether a working defence was in place. This article lays out the real financial data, explains why financial services companies cop the worst of it, and walks you through a per-incident cost model you can apply to your own organisation. For broader context, see this overview of how deepfake fraud operates and why the policy response has lagged.
The headline numbers are real, but they’re measuring different things. Worth unpacking before you use any of them in a board presentation.
Group-IB’s $347 million is verified quarterly losses from confirmed, investigated incidents — cloned executives, fake video calls, documented dollar amounts. It’s the most conservative figure because it only counts what was actually investigated and attributed to deepfake fraud.
The FTC’s $12.5 billion is total US consumer fraud across all types for 2024. Deepfakes are an accelerating slice of that, not the whole thing. Investment scams were the largest deepfake-specific category at $900 million, or 57% of all deepfake losses tracked by Surfshark.
Deloitte’s $40 billion is a forward projection for all US generative AI-enabled fraud growing from $12.3 billion in 2023 at roughly 32% compound annual growth. Use it as a planning anchor, not a current figure.
The acceleration is the sharpest data point of the lot. Surfshark tracked cumulative deepfake losses growing from $130 million across 2019 to 2023, to $400 million in 2024, to $1.56 billion in 2025. Losses tripled year-on-year. The WEF confirmed more than $200 million in Q1 2025 alone — more in a single quarter than the preceding four years combined.
For company-level planning, the most useful figure comes from DeepStrike: nearly $500,000 average per-incident cost in 2024. That’s the basis for the exposure modelling later in this article. And the tooling driving these attacks is cheap — voice cloning runs $0.01–$0.20 per minute and needs only three seconds of source audio to get started.
Three structural factors make financial services the preferred hunting ground.
KYC reliance. Know Your Customer workflows are built on the assumption that identity verification works — that a voice or a face can confirm who someone is. Deepfakes attack that assumption directly.
Call centre volume. Financial institutions handle millions of authentication calls every year. Pindrop documented a 1,300% year-on-year increase in deepfake calls targeting financial institutions in 2024 — with 1 in every 106 calls being machine-generated for Pindrop customers by late 2024.
Wire transfer workflows. A single authorised transaction can move millions of dollars. The Arup case demonstrates exactly what that looks like in practice — and Arup is an engineering consultancy, not a bank. Any company with wire transfer authority is exposed to the same risk.
Credit unions and community banks get specifically targeted because fraudsters know the technology gap exists. Frank McKenna of Point Predictive puts it plainly: “Fraudsters are targeting credit unions and smaller community banks because they know that they have not invested in the sophisticated technology that the bigger banks have.” As larger banks harden their defences, attackers move down-market.
The Modulate “State of Voice-Based Fraud 2026” survey found 91% of enterprises plan to increase voice fraud spending, but nearly half aren’t confident in their current detection capabilities. And 44% cite friction as the top consequence of adding security — so even organisations that do invest are under pressure to water it down.
The same tooling that targets a credit union can target your customer verification workflow. For context, see our analysis of the Deepfake-as-a-Service ecosystem.
In January 2024, a finance employee at Arup’s Hong Kong office received a spear-phishing email from a purported CFO, asking them to join a confidential video call. Every other participant on that call was a real-time deepfake built from publicly available footage — LinkedIn videos, conference recordings. Fifteen wire transfers totalling $25.6 million (HK$200 million) were authorised and executed in a single day. As of early 2025: no arrests, no recovered funds.
DeepStrike’s assessment cuts right to it: “The $25 million Arup fraud was not a failure of an employee’s detection skills; it was a failure of organisational process.” A mandatory out-of-band verification protocol — confirm all large transfer requests via a pre-registered phone number — would have stopped it regardless of how convincing the deepfakes were.
Arup is not a bank. It is a global engineering consultancy. Any company with wire transfer authority has the same exposure. That is the upper bound of what the cautionary scenario looks like.
For context on why standard cyber insurance often doesn’t cover these losses, see our analysis of the deepfake fraud coverage gap.
MSUFCU — Michigan State University Federal Credit Union, with $8.26 billion in assets and 367,000 members — deployed Pindrop’s Passport and Protect products in August 2024. The goal was straightforward: reduce fraud without adding friction, increase efficiency, improve member experience. Over 14 months, it delivered on all three.
$2.57 million in avoided fraud exposure — deepfake calls blocked before they reached agents. “Avoided fraud exposure” is the probability-weighted value of fraudulent transactions stopped before execution.
10% NPS improvement — Net Promoter Score moved from 57 to 63 immediately after go-live. Colleen Cole, VP of MSUFCU’s member service centre: “There was an immediate jump, and then it’s been maintained and sustained since then.”
58 seconds saved per authentication call — Pindrop’s passive scoring eliminates manual security questions at the start of each call. Less friction for legitimate members; more scrutiny for the suspicious ones.
The mechanism is important here. Pindrop analyses calls in real time between connection and agent pickup, without the caller doing anything extra. Security is invisible to legitimate members — which is why NPS went up, not down. For a deeper look at the passive call fraud scoring approach Pindrop uses and how it compares to other defensive architectures, see our analysis of detection versus content provenance strategies.
McKenna again: “$2.57 million on the bottom line is a significant amount for them. I would expect it to grow year over year because these AI attacks are going to be far more frequent.”
Start with the DeepStrike anchor: $500,000 average per-incident cost in 2024. For voice fraud specifically, Modulate puts the typical range at $5,000 to $25,000, with 20% of organisations reporting $25,000 to $100,000. The expected loss calculation is straightforward:
Compare those numbers against the annual cost of detection tooling and you’ve got your business case.
Here’s the counterintuitive part: the SMB exposure gap is larger than the enterprise gap. Enterprise companies have security operations centres and dedicated fraud teams. A 200-person company has a CFO, a finance team of two or three, and often a single person with wire transfer authority. CEO fraud now targets at least 400 companies per day. More than half of business leaders admit employees have received zero deepfake training.
For SMBs that can’t yet justify detection investment, process controls are the starting point — and they cost nothing.
Voice clone fraud is the higher-volume, higher-frequency risk. Pindrop’s 1,300% year-on-year increase in deepfake calls is the ongoing daily attack surface. Voice cloning requires three seconds of source audio and costs fractions of a cent per minute. It goes after high-volume routine processes: customer authentication, account verification, call-centre identity checks.
Human detection is structurally unreliable here. McKenna: “Overwhelmingly, 80% of the people identify the deepfake voice clone as my real voice. People cannot tell a deepfake apart.” Humans correctly identify high-quality deepfakes only 24.5% of the time in controlled studies. That’s not a training problem. That’s a structural limitation.
Video deepfake fraud is the lower-volume, higher-value risk. The Arup case required real-time deepfaking of multiple participants — rarer, but getting more accessible fast. Gartner predicts that by 2026, 30% of enterprises will no longer consider standalone identity verification reliable because of deepfakes.
Voice clone fraud needs AI-powered passive detection. Video deepfake fraud needs procedural controls that a convincing deepfake simply can’t satisfy. One is your daily exposure; the other is your catastrophic tail risk.
The MSUFCU deployment gives you a three-stream ROI framework that shifts the conversation from “security is important” to “here is the expected financial return.”
Stream 1: Direct fraud avoidance. Per-incident cost × annual incident probability × detection improvement rate. MSUFCU documented $2.57 million over 14 months. For an SMB, the equivalent is probably $25,000–$100,000 annually.
Stream 2: Operational efficiency. Pindrop saved 58 seconds per call. For a contact centre handling 100,000 calls per year at $0.50/minute agent cost, that’s approximately $48,000 in annual efficiency gains.
Stream 3: Customer experience uplift. MSUFCU’s NPS moved from 57 to 63. If your business has a measured relationship between NPS and churn, this stream becomes a real number you can put in a spreadsheet.
The cost-of-inaction argument is simple: $25.6 million lost with no defences versus $2.57 million avoided with detection deployed. Experian calls 2026 a “tipping point” — 72% of business leaders rank AI-enabled fraud as a top operational challenge. The investment window is before your first incident, not after. For a broader view of the regulatory response, see this overview of how deepfake fraud operates and the policy response lag.
DeepStrike documented a $500,000 average per-incident cost in 2024, rising to $680,000 for large enterprises. For voice fraud specifically, Modulate found the typical range is $5,000 to $25,000. The high-end case is Arup at $25.6 million from a single video conference attack.
Three structural factors: KYC reliance (which deepfakes directly undermine), high call centre volume, and wire transfer workflows. Credit unions are specifically targeted because, as Frank McKenna of Point Predictive documents, fraudsters know they haven’t invested in the detection technology larger banks have deployed.
Process-based defences cost nothing: out-of-band verification, dual approval for large transfers, 24-hour holds for unusual transactions — free to implement, and they would have stopped the Arup attack. Technology-based detection is priced for enterprise scale. For SMBs: process controls first, then technology when the cost modelling justifies it.
Passive call fraud scoring analyses calls in real time before connecting the caller to an agent — no additional steps required from the caller. Pindrop examines voice characteristics, call metadata, and behavioural patterns, then presents a risk score to the agent. The “passive” part is why MSUFCU’s NPS improved rather than declining.
They measure entirely different things. Group-IB’s $347M is verified quarterly losses from documented deepfake incidents globally. The FTC’s $12.5B is total US consumer fraud across all types in 2024. Deloitte’s $40B is a 2027 projection for all US generative AI-enabled fraud combined. Complementary, not contradictory.
Yes. Pindrop documented a 1,300% year-on-year increase in deepfake calls targeting financial institutions. McKenna demonstrated that 80% of conference attendees misidentified a deepfake voice as real. Humans correctly identify high-quality deepfakes only 24.5–50% of the time. Voice cloning costs as little as $0.01 per minute and needs just three seconds of source audio.
Surfshark tracked losses growing from $130 million over four years (2019–2023) to $400 million in 2024, then to $1.56 billion in 2025 — tripling year-on-year. Deloitte projects growth from $12.3 billion in 2023 to $40 billion by 2027 at roughly 32% compound annual growth.
Because humans are structurally unable to reliably detect high-quality synthetic media — only 24.5% accuracy on high-quality deepfake video. Even when explicitly warned, 33% of participants in a 2025 study still shared sensitive information with a synthetic voice bot. AI-to-AI defence is the only scalable response.
Five controls at zero cost: (1) never call back on the caller’s number — use a pre-stored known number; (2) require dual approval for transfers above a defined threshold; (3) implement a mandatory 24-hour hold for large or unusual transactions; (4) treat any urgency to bypass verification with immediate suspicion; (5) confirm all video call participants through a separate channel before executing any financial instruction.
$2.57 million in avoided fraud exposure over 14 months, operational savings from 58 seconds saved per authentication call, and a 10% NPS improvement. The subscription cost hasn’t been publicly disclosed, but the three-stream ROI framework provides the structure to calculate net return against any known price.
For most SMBs, in-house is impractical. Detection requires continuous model retraining, dedicated ML engineering, and large training datasets — third-party solutions have multi-year head starts. Implement free process controls now, then evaluate tooling when the per-incident cost modelling justifies it.
How Deepfake Fraud Became a Five Dollar Subscription ServiceDeepfake fraud is not a new threat. It is a supply-chain maturation event. And if you have been in software long enough to remember the early 2000s, you have seen this pattern play out before.
In 2003, exploiting a SQL injection vulnerability required real technical skill. By 2006, packaged exploit kits had reduced that barrier to a point-and-click exercise available on any forum. By the end of the decade, SQL injection attacks were automated, commoditised, and sold to non-technical operators on subscription. The specialists who built the tools made their money. Everyone else just bought access.
That same cycle has now completed for deepfake fraud. According to Group-IB‘s January 2026 research, a synthetic identity kit — AI-generated face, cloned voice sample, supporting documentation — sells for approximately $5 on dark web markets. A Dark LLM subscription runs around $30 per month. The tooling required to impersonate your CFO on a live video call is now priced below a Netflix subscription.
The consequences are measurable. Group-IB documents $347 million in verified deepfake fraud losses in a single quarter. Pindrop‘s 2025 report records an 880% surge in deepfake attacks in 2024.
This article maps the Deepfakes-as-a-Service (DaaS) market: what it sells, what it costs, how attacks work in practice, and why static defences are structurally inadequate against a commodity threat. For the broader context on how policy is struggling to keep pace, see our series overview on deepfake fraud and the policy response lag.
The shift is not primarily technical. It is economic.
Creating a convincing deepfake in 2019 required machine learning expertise, significant compute, and hours of model training. The skills were specialist, the barrier to entry kept the threat confined to well-resourced actors. That world no longer exists.
DaaS is the latest phase of Cybercrime-as-a-Service (CaaS) — the broader criminal services economy that previously commoditised ransomware, phishing kits, and credential-theft tooling. It is not a standalone AI phenomenon. It is the newest mature sub-market in a criminal services economy that has been running this playbook for over two decades.
The commoditisation of SQL injection toolkits, phishing kits, and now synthetic identity generation all follow the same trajectory: specialist capability packaged into turnkey tooling, distributed through criminal marketplaces, priced for volume. Group-IB describes AI as “the plumbing of modern cybercrime, quietly turning skills that once took time and talent into services that anyone with a credit card and a Telegram account can rent.”
The adoption curve data confirms the timing. AI-related mentions on dark web forums have grown 371% since 2019, with threads generating more than 23,000 new posts in 2025 alone. Identity fraud attempts using deepfakes surged 3,000% in 2023 as the technology crossed from niche curiosity to mainstream criminal tool.
The $5 price point matters because of what it represents: complete cost-barrier collapse for identity fraud. The remaining constraint is not access to tools or technical knowledge. It is willingness to commit fraud.
Deepfakes-as-a-Service (DaaS) is a subscription-based criminal market that sells pre-packaged deepfake generation tools on dark web platforms, without requiring technical skills from the buyer. It is a mature sub-market of the broader CaaS economy, with tiered pricing, customer support channels, and product update cycles.
Here is how the pricing breaks down.
Entry-level synthetic identity kits sell for approximately $5 per package. That gets you a generated face image, a cloned voice sample, and fabricated supporting credentials. Everything required to construct a fraudulent identity for KYC bypass or social engineering.
Dark LLM subscriptions are the second tier. Language models with safety restrictions removed, available from documented vendors for between $30 and $200 per month, with over 1,000 active subscribers. The Register described the pricing as comparable to a Netflix subscription — which is accurate, and that is exactly the point.
For higher-volume operations, real-time deepfake video platforms sit at a premium tier between $1,000 and $10,000. These are the tools capable of an Arup-class attack — live, interactive, multi-participant video impersonation. Group-IB recorded 8,065 deepfake-enabled fraud attempts at a single financial institution over eight months.
The aggregate impact: $347 million in verified deepfake fraud losses per quarter. Deloitte projects generative AI fraud losses in the US will climb from $12.3 billion in 2023 to $40 billion by 2027.
The financial consequences for organisations are covered in depth in the financial losses that result from this commoditised threat.
Dark LLMs are the scripting layer that makes deepfake fraud scalable.
A Dark LLM is a large language model with safety guardrails deliberately removed, sold on dark web platforms to assist criminal operations. Unlike ChatGPT or Claude, which refuse to generate phishing content or social engineering scripts, Dark LLMs are purpose-built for exactly those tasks. Products like WormGPT and FraudGPT produce personalised spear-phishing content and social engineering scripts on demand — no refusals.
Group-IB documents 1,000+ active subscribers and a continuous development cycle. These are actively maintained software products. They receive updates. They respond to support requests. The operational model mirrors legitimate SaaS.
Within the DaaS ecosystem, Dark LLMs fill the written communication layer. A synthetic identity kit provides the face and the voice. A Dark LLM provides the email impersonating the CFO, the follow-up establishing urgency, the social engineering script for the call that precedes the video conference. A single operator with a $5 identity kit and a $30/month Dark LLM subscription has access to all three attack components.
Voice cloning is currently the highest-frequency deepfake attack vector — cheaper, faster to deploy, and requiring less compute than real-time video deepfakes.
The technical barrier is lower than most defenders appreciate. Scammers need as little as three seconds of source audio to produce a voice clone with an 85% match to the original speaker. Conference talks, earnings calls, LinkedIn videos, corporate webinars — for most public-facing executives at SMB companies, sufficient voice samples are already publicly available.
Pindrop’s data is direct: deepfake attacks grew 880% in 2024, followed by a 1,210% surge by December 2025. Their Voice Intelligence and Security Report documents a 1,300% year-on-year increase in deepfake voice calls. US call centres saw a 173% increase in synthetic voice calls between Q1 and Q4 2024 alone.
The attack pattern is consistent. The attacker acquires voice samples from public sources, generates a real-time or pre-recorded clone, and places a call. Often a synthetic voice bot makes initial contact to probe IVR systems and validate credentials before the social engineering phase begins. When the voice-cloned call arrives, the target hears what sounds like their CFO applying urgency framing: “I’ll explain later” and “I need you to take care of this right now.”
Modulate‘s CTO Carter Huffman puts it plainly: “human ears and human eyes are just not enough — they’re rendered ineffective at determining what’s real.” Modulate’s January 2026 survey found 91% of enterprises planning to increase spending on voice fraud prevention over the next 12 months. That tells you everything about how the current defences are performing.
In January 2024, an Arup employee in Hong Kong authorised 15 wire transfers totalling HKD 200 million (approximately USD 25.6 million) in a single day. Every visible and audible participant on the video conference call — including the company’s CFO — was an AI-generated deepfake. Hong Kong Police Force reported the incident in February 2024.
The attack chain is instructive.
Initial access came via spear-phishing: an email impersonating the CFO, establishing a pretext around a confidential financial matter. The execution phase was the video conference — not a single deepfake, but multiple deepfakes representing the CFO and several colleagues, constructing a multi-participant call that provided both authority and social consensus. Urgency framing was applied. Explicit requests for confidentiality were made. All of it was standard social engineering, delivered through a technically convincing synthetic media layer. Source material came from publicly available content: LinkedIn videos, company conference recordings, corporate media appearances.
The employee had no technical means to distinguish the call from reality. Human accuracy in identifying high-quality deepfake video falls to 24.5% in controlled studies. A 2025 iProov study found only 0.1% of participants could correctly identify all fake and real media presented to them. This is not a negligence problem. It is a technology problem.
The fraud was discovered when the employee contacted actual headquarters to discuss the “secret transaction.” No arrests have been announced. The funds remain unrecovered.
Traditional fraud defences were designed for an era when fraud required skill, investment, and time. DaaS commoditisation has collapsed all three barriers.
The result is speed asymmetry: DaaS tooling iterates on subscription-funded development cycles — decentralised, competitive, market-driven — while enterprise fraud-detection systems update on compliance-driven cycles that are slow and reactive.
Here is how legacy defences fail against DaaS-class attacks specifically.
Knowledge-based authentication (KBAs) are defeated by synthetic identity kits pre-loaded with fabricated answers. Separately, 60% of organisations already report fraudsters using compromised PII from data breaches to bypass KBAs. The identity kit makes this trivially accessible.
One-time passwords (OTPs) are bypassed not by defeating the authentication mechanism, but by defeating the human before verification is reached. By the time an employee is asked to confirm a transaction, the social engineering has already succeeded. The OTP confirms an action the attacker has already authorised the human to take.
Rule-based fraud detection flags anomalies against historical baselines. DaaS tooling generates novel attack patterns with each update cycle. A rule set calibrated against yesterday’s signatures is obsolete before it is deployed. Gartner projects that by 2026, 30% of enterprises will no longer consider standalone identity verification reliable in isolation.
The full argument for why the arms race between generation and detection consistently favours the attacker is developed in our architecture guide. For the broader deepfake threat landscape — including the regulatory and compliance dimensions — the series overview provides the full context.
Current DaaS attacks still require a human operator to place the call, manage the wire transfer instructions. The human is both a capability and a constraint.
Experian‘s 2026 Future of Fraud Forecast identifies the removal of that constraint as the top emerging threat. Agentic AI fraud describes fully automated, machine-to-machine fraud that executes the entire attack chain — target identification, social engineering, financial extraction — without human operators. The scaling constraint shifts from operator availability to computational capacity.
The progression is already partially visible. Pindrop documents a major US healthcare provider facing over $40 million in account exposure from automated AI bot calls in 2025. The FBI has documented North Korean operatives using deepfake identities to secure IT employment positions at US companies and divert salaries back to the regime — an application of DaaS capabilities well beyond conventional financial fraud.
The FTC documented $12.5 billion in US consumer fraud losses in 2024 — a 25% increase despite fraud report volumes remaining stable. That increase reflects scams becoming more effective, not more numerous. Agentic automation removes the human bottleneck. The trajectory points toward $40 billion in generative AI fraud losses by 2027.
The insurance and liability exposure from losses at this scale is examined in our guide to why fraud losses at this scale expose significant insurance gaps.
What is Deepfakes-as-a-Service?
Deepfakes-as-a-Service (DaaS) is a subscription-based criminal market on dark web platforms that sells pre-packaged deepfake generation tools — synthetic faces, cloned voices, and fabricated identity documents — at commodity pricing. Entry-level synthetic identity kits cost approximately $5; Dark LLM subscriptions run around $30 per month. (Source: Group-IB, January 2026)
How much does it cost to buy a deepfake voice clone?
A synthetic identity kit including a cloned voice sample, AI-generated face, and supporting documentation costs approximately $5 on dark web markets. Scammers can produce an 85% voice match from as little as three seconds of source audio. Dark LLM subscriptions for generating social engineering scripts run around $30 per month.
What is a Dark LLM and is it legal?
A Dark LLM is a large language model with safety guardrails removed, sold on dark web platforms to assist criminal operations. WormGPT and FraudGPT are documented examples. Possessing or distributing these tools is illegal in most jurisdictions under computer fraud legislation, though enforcement is limited by the pseudonymous dark web structure.
How do criminals use AI to commit fraud?
DaaS platforms combine three components: synthetic identity kits (~$5) for fake faces and documents; voice cloning tools for replicating executive voices; and Dark LLMs (~$30/month) for personalised phishing and social engineering scripts. Together, they enable non-technical operators to conduct sophisticated identity fraud and wire transfer scams at scale.
Can you really lose millions to a deepfake video call?
Yes. In January 2024, Arup lost HKD 200 million (approximately USD 25.6 million) when an employee in Hong Kong authorised 15 wire transfers after a video conference where all participants — including the company’s CFO — were AI-generated deepfakes. The Hong Kong Police Force confirmed the incident.
How fast are deepfake fraud attacks growing?
Pindrop documented 880% deepfake attack growth in 2024, followed by a 1,210% surge by December 2025, and a 1,300% year-on-year increase in deepfake voice calls. Group-IB reports AI-related dark web forum mentions grew 371% since 2019, with deepfake fraud losses reaching $347 million per quarter.
What is the difference between deepfake fraud and a regular social engineering attack?
Traditional social engineering relies on text-based deception. Deepfake fraud adds a synthetic media layer — AI-generated video, cloned voices, and fabricated documents — that defeats the visual and auditory verification humans naturally rely on. The Arup case showed a live, multi-participant deepfake video call can deceive trained employees in ways text-based phishing cannot.
What is a deepfake scam and how does it work?
A deepfake scam uses AI-generated synthetic media — typically a cloned voice or fabricated video — to impersonate a trusted individual and deceive a target into authorising a financial transaction. The attacker acquires voice or video samples from public sources, generates a real-time deepfake, and uses it in a phone call or video conference.
Why is employee awareness training not enough to stop deepfake fraud?
Awareness training assumes employees can detect deception through vigilance. Deepfake technology defeats the sensory cues that vigilance relies on. The Arup employee saw and heard the CFO on a live video call with no technical means to detect the deepfake. Human accuracy in identifying high-quality deepfake video falls to 24.5% in controlled studies. Detection requires technological countermeasures, not human perceptiveness.
What is agentic AI fraud?
Agentic AI fraud, identified by Experian as the top emerging threat for 2026, describes fully automated fraud where AI systems execute the entire attack chain — target identification, social engineering, financial extraction — without a human operator. It is the next evolution beyond current DaaS models, which still require a human to operate the tools.
How does DaaS compare to earlier cybercrime tooling like phishing kits?
DaaS follows the same commoditisation trajectory as SQL injection toolkits (2003–2008) and phishing kits (2010s): specialist capability packaged into turnkey tooling, sold at subscription pricing, distributed to non-technical operators via criminal marketplaces. Supply-chain maturation within the broader Cybercrime-as-a-Service economy.
Is voice deepfake fraud more common than video deepfake fraud?
Currently, yes. Voice cloning is the highest-frequency deepfake attack vector. Pindrop documents a 1,300% year-on-year increase in deepfake voice calls. Voice cloning requires less compute and less training data — as little as three seconds of audio — making it cheaper and faster to deploy at scale. The $5 synthetic identity kit includes a cloned voice sample for exactly this reason.
How Browser Agents Are Rewriting the Rules of Web SecurityYour team is probably already using an agentic browser. Cyberhaven’s research found that 27.7% of enterprises had at least one employee with ChatGPT Atlas installed within a week of its launch — before IT knew it existed. These are autonomous agents that navigate pages, submit forms, read email, and execute workflows using your logged-in credentials. The security frameworks for managing them are months behind the deployments. What follows maps six distinct threat surfaces, each linking to the article that covers it in depth.
A conventional browser displays content. An agentic browser acts on it — it is a second user operating under your account, with your authority, across every domain at once. This creates the confused deputy problem: the agent holds legitimate cross-domain authority, so attackers redirect it using content it processes. Same-Origin Policy and CORS provide zero protection because the agent’s cross-domain access is intentional and authorised by design.
That architectural gap is why prompt injection remains unsolved. Malicious instructions embedded in a webpage, image, or calendar invite are indistinguishable from legitimate content at the model level. Indirect prompt injection requires no user action and cannot be blocked by input sanitisation. OpenAI’s own CISO has called it “a frontier, unsolved security problem”. Microsoft’s FIDES project proposes deterministic architectural controls — enforced separation of trust rather than filters that can be bypassed.
Read more: Why Prompt Injection Is the Unsolved Problem Inside Every Agentic Browser
Between August 2025 and February 2026, researcher-attributed vulnerabilities were disclosed in Perplexity Comet, Opera Neon, Fellou, and ChatGPT Atlas — with Atlas vulnerable within 24 hours of its beta launch. The OpenClaw incident saw credential exfiltration via a single crafted email in about five minutes. Miggo Security‘s calendar-invite semantic attack against Google Gemini showed that even Google’s own detection model failed when the payload was indistinguishable from a legitimate entry. The pattern: vendors cannot patch the ambient web content their agents process. Each of the six demonstrated exploits in the incident record shares one root cause — the architectural problem inside every agentic browser described above.
Read more: Six Demonstrated Exploits That Prove Agentic Browser Security Is Not Theoretical
The exploits above target the browser interface. The less-visible attack surfaces are AI skill registries and messaging app link previews. Cisco analysed 31,000 AI agent skills and found 26% contained at least one vulnerability — the top-ranked skill in MolthHub was functionally malware, exfiltrating data to an external server via a silent curl command.
The Model Context Protocol (MCP) creates a second supply chain surface: malicious MCP servers can be installed like any other tool and execute code with elevated permissions. Zero-click link preview exfiltration through Teams, Telegram, and Slack requires no user action beyond having the agent deployed. The full picture of the AI skill supply chain and zero-click messaging exploits represents the attack surface most security reviews miss entirely.
Blocking deployment is not sustainable — 69% of enterprises are already piloting or running early production agent deployments, and employees are self-adopting consumer-grade agents without IT oversight. The practical approach: assess current shadow AI exposure, run a sandboxed pilot with non-production data, implement governance controls before company-wide access, and integrate agent telemetry into existing SIEM/XDR platforms without requiring new tooling. A governance framework for rollout — covering acceptable use policy, least-privilege configuration, and an incident response playbook for browser-agent compromise — is the practical complement to the threat model laid out above.
Read more: A Safe Adoption Playbook for Agentic Browsers Before Company-Wide Rollout
Several open-source options exist. Cisco’s Skill Scanner (GitHub: cisco-ai-defense/skill-scanner) combines YARA signature detection, LLM-as-judge semantic analysis, and VirusTotal correlation. Perplexity’s BrowseSafe achieves F1 ~0.91 on prompt injection benchmarks. Giskard offers LLM red teaming with 40+ adversarial probes. TrojAI covers build-time testing (Detect) and runtime protection (Defend). PromptArmor’s aitextrisk.com tests specifically for link-preview exfiltration. These open-source scanning and red-teaming tools give security teams practical options regardless of budget.
Read more: Open-Source Tools for Scanning and Red-Teaming Agentic Browser Security
The open-source tools above map directly to recognised risk classifications. OWASP LLM Top 10 covers the core risks under LLM01 (Prompt Injection), LLM02 (Sensitive Information Disclosure), LLM05 (Improper Output Handling), and LLM06 (Excessive Agency). The OWASP Top 10 for Agentic Applications 2026 extends this to cascading failures, orchestration vulnerabilities, and memory poisoning.
The liability question matters: when an agent manipulated by prompt injection sends PII externally, the data controller — the company deploying the agent — bears the compliance obligation, not the vendor. OpenAI’s documentation explicitly states Atlas is not in scope for SOC 2 or ISO attestations. For board-ready risk language that maps these incidents to GDPR, HIPAA, and SOC 2 obligations, the full framework analysis is the place to start.
Read more: What OWASP, MITRE, and Compliance Frameworks Say About Agentic Browser Risk
Start with the threat model, not the tooling. Read the prompt injection mechanics first — it explains why the problem is architectural, not a configuration issue you can patch. Then assess what your team has already deployed with a shadow AI audit.
Three time-sensitive actions: (1) audit for self-deployed agents already in use, (2) check whether any agentic browser has authenticated access to regulated data, and (3) confirm your SIEM can ingest agent action logs. The attack surface — the ambient web — is not patchable. The appropriate posture is governed deployment with monitoring, not block-or-allow.
Suggested reading order: Prompt injection mechanics → The evidence base → Attack surfaces you may be missing → Governance playbook → Tooling → Compliance language for boards and auditors
A regular browser renders content for a human to read and interact with. An agentic browser uses an embedded AI agent to interpret content and take actions autonomously — navigating, form-filling, clicking, reading email, and executing multi-step workflows — on behalf of the user, using the user’s authenticated session. The security distinction is that the agent acts, where the browser only displays.
Yes. Indirect prompt injection embeds malicious instructions in web content the agent processes — a page it navigates to, an email it reads, an image it OCR-parses. PromptArmor documented zero-click exfiltration through messaging app link previews in February 2026, requiring no user interaction beyond having the agent deployed.
See also: Six Demonstrated Exploits That Prove Agentic Browser Security Is Not Theoretical
These tools intercept known attack signatures and enforce data policies at the file or endpoint level. Agentic browser threats operate at the semantic layer — the attack is a natural-language instruction indistinguishable from legitimate content. DLP cannot inspect content inside an agent prompt. CASB sees an authenticated session and no policy violation. EDR sees no malware.
See also: The Agentic Browser Attack Surface You Are Not Thinking About
Shadow AI refers to employee-adopted AI agents deployed without IT oversight. OpenClaw went viral with 60,000+ GitHub stars in 48 hours, with developers using it to manage email, Slack, and calendars before IT knew it existed — acting with full user credentials, outside any monitoring. See also: The Agentic Browser Attack Surface You Are Not Thinking About
OpenAI’s own documentation explicitly states: “Do not use Atlas with regulated, confidential, or production data.” When an agent exfiltrates PII via a prompt injection triggered by a third-party website, the company deploying the agent bears data controller obligations under GDPR, including the 72-hour breach notification requirement. See also: What OWASP, MITRE, and Compliance Frameworks Say About Agentic Browser Risk
The OWASP LLM Top 10 covers language model risks broadly; the OWASP Top 10 for Agentic Applications 2026 is a dedicated extension that addresses the additional risk surface introduced when an LLM is given the ability to take autonomous actions — covering cascading failures, orchestration vulnerabilities, memory poisoning, and excessive agency in ways the LLM Top 10 does not.
ChatGPT Atlas was found to be vulnerable to prompt injection within 24 hours of its beta launch in November 2025. The attack surface is the ambient web, not the product code — researchers do not need to reverse-engineer the product, they need to construct an injection payload that the agent will process during normal browsing.
The Agentic Browser Attack Surface You Are Not Thinking About: AI Skills and Zero-Click Messaging ExploitsYou run npm audit before every deploy. You pin your dependency versions, watch for CVEs, and think hard before touching the lockfile. You have a mental model for supply chain risk, and you actually apply it.
Your AI agents have a dependency tree too. It is called a skill registry. And when Cisco researchers analysed 31,000 agent skills in January 2026, 26% of them contained at least one security vulnerability. The number-one ranked skill on MolthHub — the primary skill registry for OpenClaw — was, under scrutiny, functionally malware.
At the same time, PromptArmor documented a second attack class that requires no user interaction at all. AI agents integrated into messaging platforms like Microsoft Teams or Telegram can be tricked into exfiltrating sensitive data through automatic link preview fetches. No click required. No awareness required.
These are not theoretical. This article maps both threat classes to frameworks you already know, walks through the attack chains, and gives you a practical starting point for finding out what your team has already deployed. For the broader agentic threat landscape, the pillar article covers the full risk picture. For the underlying injection mechanics, see indirect prompt injection as the enabling mechanism.
The structural analogy is pretty direct. An npm package is third-party code installed and executed inside your application’s trust boundary. An AI skill is third-party instructions, scripts, and resources installed and executed inside your AI agent’s trust boundary.
The registries — ClawHub for OpenClaw, Claude Skills from Anthropic, OpenAI Codex Skills — work like open registries where anyone can publish. Popularity signals serve as loose proxies for legitimacy. There is no systematic pre-publication security review. That is npm circa 2015, before the tooling matured.
Here is the critical difference: what “code” means in this context. An AI skill is often a markdown file. In an agent ecosystem, markdown is an installer. As 1Password‘s Jason Meller put it: “Markdown isn’t ‘content’ in an agent ecosystem. Markdown is an installer — ‘run this command,’ ‘paste this in Terminal.'” Your SCA scanner has no visibility into a natural-language instruction that tells an AI agent to silently exfiltrate your credentials.
The blast radius is also bigger than a typical npm package. A skill runs with the agent’s full permissions: local file system access, shell command execution, environment config files storing API keys and database credentials, and messaging integrations across WhatsApp, Slack, Teams, Telegram, and Discord.
If your team has installed 15 agent skills, statistical expectation puts three to five of them containing at least one vulnerability. There is no skill lockfile, no version pinning, no provenance signing. The Cisco Skill Scanner for supply chain assessment is the closest thing to npm audit now available.
In January 2026, Cisco researchers Amy Chang and Vineeth Sai Narajala pulled the top-ranked skill from MolthHub and ran it through their Skill Scanner. The skill was called “What Would Elon Do?” and its popularity had been artificially inflated.
The scanner returned nine security findings: two critical, five high severity. The skill was functionally malware. It instructed the bot to execute a curl command sending data to an external server controlled by the skill author — silently, without user awareness — and ran a direct prompt injection to force the assistant to bypass its safety guidelines. High severity findings included command injection via embedded bash commands and tool poisoning with malicious payloads in the skill files.
This is the AI skill equivalent of a typosquatted npm package that runs malicious postinstall scripts. The only difference is that the “script” is a natural-language instruction the agent interprets as legitimate, which means your existing tooling cannot catch it.
AuthMind documented 230 malicious skills in the OpenClaw/ClawHub ecosystem since January 27, 2026. A 1Password analysis found a “Twitter” skill — top downloaded at the time — that appeared normal but delivered macOS infostealing malware targeting browser sessions, saved credentials, developer tokens, SSH keys, and cloud credentials. The ClickFix technique runs through many of these: skills display fake UI prompts instructing users to paste commands in Terminal. For the full documented incident record, see the OpenClaw and MolthHub incident record.
In February 2026, PromptArmor documented an attack class that requires no user interaction beyond having an AI agent deployed in a messaging channel. Here is the attack chain using Microsoft Teams and Copilot Studio:
The attack is zero-click because the only trigger is the AI agent processing the message. “Don’t click suspicious links” is categorically irrelevant when the link is never presented to a human.
EchoLeak was patched by Microsoft in May 2025. It bypassed four defences in sequence: Microsoft’s XPIA prompt-injection classifier, Copilot’s link redaction mechanisms, image auto-fetch Content Security Policy, and CSP domain controls via the Teams proxy allowlist. Each layer was individually insufficient. The patch addressed that specific chain; the structural pattern persists across other platforms. PromptArmor operates aitextrisk.com to test specific combinations.
The risk is combinatorial — it depends on the messaging platform’s link preview behaviour combined with the AI agent’s response handling. Neither alone determines the outcome.
Microsoft Teams with Copilot Studio is the highest-volume risk, accounting for the largest share of link preview fetches in PromptArmor’s data because of enterprise deployment scale. EchoLeak is patched for that specific chain, but the structural pattern persists for other agent integrations. Telegram with OpenClaw is the highest-risk-per-fetch option: link previews are enabled by default and require a manual config change to mitigate — one most users will never make. Discord with OpenClaw and Slack with Cursor Slackbot are both documented at-risk. Slack is especially relevant given how embedded it is in developer toolchains.
On the safer end: the Claude App in Slack and OpenClaw on WhatsApp are reported lower-risk in PromptArmor’s current data. Signal with a containerised Docker deployment of OpenClaw is reported safe but requires infrastructure most teams will not maintain.
“Safer” is not “certified safe” — those designations reflect current PromptArmor data, not permanent guarantees. The real question is which combinations your team is actually running. Use aitextrisk.com to check your specific setup.
Here is a specific pathway rather than an abstract risk description.
A backend developer downloads OpenClaw for personal productivity. It is free, open-source, and well-reviewed on GitHub. They connect it to their work Slack account and work email. They install four community skills from ClawHub. One contains malicious code — and they had no way to know that.
OpenClaw now has access to every Slack message in every channel their account can read, the ability to send messages as that user, their entire email history, their local file system including AWS credentials and database passwords, their GitHub personal access token, and persistent memory of all interactions. OpenClaw has a Telegram interface with default link preview behaviour enabled. Zero-click exfiltration is possible. Your IT team has zero visibility.
AuthMind documented this exact scenario as a representative case study. Bitsight observed more than 30,000 distinct OpenClaw instances exposed online between January 27 and February 8, 2026, appearing in healthcare, finance, government, and insurance environments.
Here is the critical distinction from shadow IT: a developer using Dropbox creates a data residency risk. A developer running OpenClaw creates an autonomous agent that can send messages, execute shell commands, and interact with external services without per-action approval. In a 150-person company with 30 developers, three running OpenClaw means three unvetted agents with full access to corporate channels. For shadow AI governance in the adoption playbook, ART003 covers the response.
Start with discovery before governance. You cannot govern what you do not know exists.
Network footprint. Look for rapid sequential API calls, off-hours access spikes, and new OAuth grants that do not correspond to IT-approved integrations. AI agents have identifiable network signatures.
Credential exposure surface. Map where developer credentials live on your endpoints — environment config files, personal access tokens, cloud credentials, and broad-scope OAuth tokens. These are what an unvetted skill targets first.
Messaging platform integrations. Review Slack and Teams app integrations and bot connections for anything not provisioned through IT. OpenClaw connects across WhatsApp, Slack, Teams, Telegram, and Discord.
Direct conversation with your team. Ask what AI tools people are using, framed as enablement rather than enforcement. You will get honest answers if it does not feel like an interrogation.
Skill inventory. For any discovered deployments, identify installed skills and run each through the Cisco Skill Scanner. It combines static analysis, behavioural analysis, LLM-assisted semantic analysis, and VirusTotal integration — the closest equivalent to npm audit for AI skills. See open-source tooling for skill vetting for a full treatment.
After discovery: establish an AI tool allowlist, make skill scanning a gate before production use, assess platform combinations using aitextrisk.com, and be clear on what is patched versus structural. EchoLeak is patched. The skill supply chain risk and shadow AI entry pathway require ongoing governance.
One misconception worth addressing: MCP (Model Context Protocol) does not provide a security sandbox. It is a communication protocol — a doorbell, not a door lock. Skills do not need to use MCP at all and can bundle executable scripts that run outside the MCP tool boundary. The goal is safe adoption, not prohibition. For responding to employee-adopted agent tools and a complete browser-agent security overview, those resources cover the broader response.
An AI skill is an organised folder of instructions, scripts, and resources an agent loads dynamically to perform specialised tasks. Unlike browser extensions that operate in a sandboxed environment with per-use approval, AI skills are invoked autonomously at runtime with the agent’s full permissions — file system, shell commands, and messaging integrations. There is no meaningful boundary between “reading the skill” and “executing the skill.”
XPIA is Microsoft’s term for indirect prompt injection that crosses context boundaries. The attacker embeds malicious instructions in external content — a document, webpage, or message — that the AI agent processes as data. Unlike direct prompt injection, where the attacker controls user input directly, XPIA exploits the agent’s retrieval of untrusted external content. The victim sees nothing unusual.
Yes. If an AI agent is integrated into a messaging platform with link previews enabled, an attacker can craft a message that tricks the agent into generating a URL containing sensitive data on an attacker-controlled domain. The platform’s link preview automatically fetches that URL, sending data to the attacker’s server with zero user interaction. PromptArmor documented this chain in February 2026 across multiple platform and agent combinations.
OpenClaw’s own documentation states: “There is no ‘perfectly secure’ setup.” It runs shell commands, reads and writes files, and executes scripts. It has been documented to leak plaintext API keys. If an employee connects it to work accounts, corporate data becomes accessible to an unvetted agent running unscanned skills. 1Password’s guidance: “If you have already run OpenClaw on a work device, treat it as a potential incident and engage your security team immediately.”
No. MCP is a communication protocol for connecting AI agents to external tools and data sources. It does not sandbox skill execution, validate skill content, or prevent malicious behaviour. Skills do not need to use MCP at all and can bundle executable scripts that run outside the MCP tool boundary.
PromptArmor operates aitextrisk.com, where you can check whether your specific AI agent and messaging platform combination triggers insecure link previews. It provides empirical data on which combinations are currently safe versus vulnerable.
A patched vulnerability has a specific fix — EchoLeak (CVE-2025-32711) was patched in May 2025; the Telegram/OpenClaw issue can be fixed with a config change. A structural vulnerability has no single fix. The AI skill supply chain has no certification standard, no lockfile equivalent, no provenance system. Structural vulnerabilities require ongoing governance, not one-time patches.
AuthMind documented 230 malicious skills in the OpenClaw/ClawHub ecosystem since January 27, 2026. Cisco found 26% of 31,000 agent skills analysed contained at least one vulnerability. The #1 ranked skill on MolthHub — scanned by Cisco researchers Amy Chang and Vineeth Sai Narajala — returned nine security findings including two critical and five high severity.
Cisco’s open-source Skill Scanner (github.com/cisco-ai-defense/skill-scanner) combines static analysis, behavioural analysis, LLM-assisted semantic analysis, and VirusTotal integration. It provides severity ratings, file locations, and actionable guidance — the closest equivalent to npm audit for AI skills.
Audit for existing shadow AI deployments first. Establish an AI tool allowlist with procurement review criteria. Require Cisco Skill Scanner analysis before any skill is approved. Assess messaging platform and agent combinations using PromptArmor’s aitextrisk.com. Configure link preview settings where platforms allow. And ensure everyone understands that AI agents act autonomously — they read, write, send, and execute on behalf of whoever connected them.
CiteMET and tools like the AI Share URL Creator give attackers no-code access to link-based memory manipulation — the ability to inject content into an AI agent’s context without custom exploit code. Their existence means the zero-click exfiltration and supply chain techniques in this article are accessible to a broader attacker population than the underlying mechanics might suggest.
What OWASP MITRE and Compliance Frameworks Say About Agentic Browser RiskWhen you’re explaining agentic browser risk to your board or legal team, a technically accurate description of how prompt injection works won’t get you very far. These audiences aren’t asking what can go wrong. They’re asking which framework classifies it, what regulation it violates, and what your notification obligation is.
This article gives you the framework-to-risk mapping and compliance vocabulary you need to turn a security concern into a governance action item. Three major frameworks now classify agentic browser risk: OWASP LLM Top 10, OWASP Top 10 for Agentic Applications 2026, and MITRE ATLAS. For broader context, see the full browser-agent risk landscape.
Boards, auditors, and cyber insurers think in terms of regulatory exposure and control gaps — not attack vectors. Telling a board that “prompt injection can hijack a browser agent” is accurate, but it doesn’t answer their questions. Which framework classifies this? What’s our notification obligation? What does our insurer require?
Recognised frameworks carry weight because they’re independently maintained, peer-reviewed, and already embedded in audit and insurance evaluation processes. The vocabulary shift is what makes it actionable. “Prompt injection” becomes OWASP LLM01. “Too many permissions” becomes OWASP LLM06 Excessive Agency. “AI attack technique” becomes a MITRE ATLAS technique ID. Some cyber insurers are already asking for evidence of controls aligned to ISO/IEC 42001 or NIST AI RMF before covering agentic workflows. For the technical detail behind those classifications, see how these frameworks classify the attack mechanics.
OWASP LLM Top 10 is the most widely referenced security taxonomy for LLM-based systems. Four entries map out a complete attack chain from injection through to data exfiltration. Giskard’s analysis of OpenAI Atlas shows how they fit together:
LLM01 — Prompt Injection is the entry vector. Indirect prompt injection from a third-party webpage hijacks the agent’s goal and redirects its actions. This is how most agentic browser compromises begin.
LLM02 — Sensitive Information Disclosure is the data risk. Browser agents process content from every page they visit — authenticated email, CRM, internal tools — and transmit it to cloud inference engines, often without a Data Processing Agreement in place.
LLM05 — Improper Output Handling is the exfiltration mechanism. Agent-generated URLs and link construction can contain malicious payloads that execute in the browser context.
LLM06 — Excessive Agency is the amplifier. In agent mode, the system makes real-time decisions about form submissions, clicks, and navigation across authenticated sessions. A single injected instruction can propagate across multiple domains before anyone notices. For documented incidents, see real-world examples of LLM01 in practice.
The OWASP Top 10 for Agentic Applications 2026, published in December 2025 by more than 100 industry experts, is a dedicated framework for autonomous AI systems. It addresses risks the LLM Top 10 simply wasn’t designed for.
The LLM Top 10 evaluates individual model risks. The Agentic Top 10 evaluates system-level risks: multi-step workflows, tool use, autonomous decision-making. For audit purposes, the Agentic Top 10 is the more applicable reference for your risk register.
ASI01 — Agent Goal Hijack formally names the browser-agent prompt injection attack. ASI02 — Tool Misuse and Exploitation covers what happens when the hijacked agent turns browser capabilities — email, form submission, API calls — against the employee’s own session. The Arize compliance guide maps each ASI entry to the audit trail artefacts you’ll need.
MITRE ATT&CK covers human-initiated cyberattack tactics. MITRE ATLAS extends that taxonomy for AI systems. Agentic browser risk sits firmly in ATLAS territory.
In the first MITRE ATLAS update of 2026, Zenity contributed browser-agent-specific techniques:
MITRE ATLAS carries more weight with boards, insurers, and auditors than vendor-defined risk categories because MITRE is independently maintained and maps to existing ATT&CK-based threat intelligence workflows.
Frameworks classify risk. But organisations also need to know what architectural approaches satisfy those framework requirements. That’s where Microsoft FIDES comes in.
FIDES is a Microsoft Research framework that uses information-flow control to address indirect prompt injection. Microsoft’s MSRC blog describes it as an approach for “deterministically preventing indirect prompt injection in agentic systems.”
The distinction matters for compliance. Content filtering and prompt shields detect most attacks most of the time. FIDES provides hard architectural guarantees — certain attacks cannot succeed regardless of model behaviour. Probabilistic controls are harder to attest to. Deterministic controls are auditable.
FIDES draws a hard line between trusted user instructions and what a webpage tells the agent to do. Deployments that implement FIDES-style trust boundaries can document a control addressing OWASP LLM01 and ASI01 with architectural mitigation — exactly the type SOC 2 auditors can evaluate.
This is not legal advice, but the exposure under each framework is concrete.
When a browser agent exfiltrates personal data from an authenticated session, that maps to a “personal data breach” under GDPR. Article 33 requires the controller to notify the supervisory authority within 72 hours of becoming aware. The organisation deploying the browser agent is the data controller — a third-party attacker does not transfer that responsibility. If the agent service provider has no Data Processing Agreement in place, controller exposure is direct.
OpenAI’s own FAQ is explicit: “Can we use Atlas with regulated data such as PHI or payment card data? No.” There is no Business Associate Agreement option for Atlas. PHI processed without a BAA creates per-incident HIPAA violation exposure.
OpenAI’s enterprise documentation states Atlas is “not currently in scope for OpenAI SOC 2 or ISO attestations.” Using a product excluded from the vendor’s own certification in a SOC 2-audited environment creates a control gap auditors will flag. Agent actions must be logged, attributable, and reviewable.
If a browser agent processes, transmits, or stores payment card data during automated transactions, it’s an in-scope system component under PCI-DSS. Giskard’s conclusion is direct: treat Atlas as out of scope for any systems processing regulated data. For practical controls for GDPR and SOC 2, the governance playbook covers implementation.
Here’s the scenario: an employee uses an agentic browser for work. It navigates to a site containing indirect prompt injection, reads sensitive data from another open tab, and sends a data-exfiltrating email using the employee’s credentials. Who bears legal responsibility?
Under GDPR, the answer is the organisation deploying the browser agent. Article 32 requires controllers to implement “appropriate technical and organisational measures.” Deploying a browser agent without adequate controls for a foreseeable attack vector may constitute a failure of those obligations. The 72-hour notification clock starts on awareness — and the method of breach doesn’t change that obligation.
Obrela frames this as the “confused deputy” problem: a compromised agent executes unauthorised business logic on your behalf. In one documented incident, a malicious webpage caused an agent to read data from open tabs, encode it to evade DLP, and exfiltrate it while endpoint tools saw nothing but standard browser behaviour.
This is not legal advice. But the direction from Shumaker is clear: without documented controls aligned to recognised frameworks, you’ll struggle to demonstrate compliance diligence when a breach occurs. For the incidents these frameworks were validated against, the incident record makes for instructive reading.
The framework-to-control mapping is the most useful compliance artefact — it directly answers the auditor’s question.
Framework-to-control mapping: records which OWASP LLM Top 10, OWASP ASI Top 10, and MITRE ATLAS entries apply to your deployment and which controls address each.
Regulatory exposure assessment: records GDPR, HIPAA, SOC 2, and PCI-DSS applicability based on the data types your browser agents can access.
Vendor compliance gap analysis: records the vendor’s own compliance scope and your compensating controls. Acuvity notes that Atlas is enabled by default for Business tier customers without administrative approval workflows.
Agent permission inventory: records what actions your browser agents can perform and what authentication tokens they hold — maps to OWASP LLM06 and Zero Trust Architecture least-privilege principles.
Incident response procedures: records how your organisation detects, responds to, and reports a browser-agent-mediated breach — including the 72-hour GDPR notification pathway. NIST AI RMF and ISO/IEC 42001 are useful complementary references here.
For the governance playbook that satisfies these compliance requirements and for tools that address specific OWASP items, the companion articles cover implementation. The browser-agent security overview provides the full landscape.
The LLM Top 10 addresses risks in individual LLM deployments — prompt injection, data disclosure, excessive agency. The Agentic Top 10, published December 2025, addresses system-level risks specific to autonomous AI agents — goal hijacking (ASI01) and tool misuse (ASI02). For agentic browser deployments, the Agentic Top 10 is the more applicable reference: it’s built for multi-step agent workflows and autonomous decision-making.
The 2026 ATLAS update includes browser-agent-specific techniques contributed by Zenity: AML.T0098 (AI Agent Tool Credential Harvesting), AML.T0099 (AI Agent Tool Data Poisoning), AML.T0100 (AI Agent Clickbait), and AML.T0101 (Data Destruction via AI Agent Tool Invocation). These technique IDs are what auditors and insurers expect to see in board-level risk presentations.
No. OpenAI’s enterprise documentation states Atlas is “not currently in scope for OpenAI SOC 2 or ISO attestations” and “Do not use Atlas with regulated, confidential, or production data.” Atlas also lacks compliance API logs, SIEM integration, SSO enforcement, and IP allowlists.
Under GDPR, the organisation deploying the browser agent is the data controller and bears primary responsibility for implementing appropriate technical and organisational measures (Article 32). A prompt-injection-driven exfiltration constitutes a personal data breach, potentially triggering 72-hour notification obligations under Article 33.
FIDES uses information-flow control to create deterministic architectural guarantees against indirect prompt injection. Unlike probabilistic defences (content filtering, prompt shields), FIDES ensures certain attacks cannot succeed regardless of model behaviour. That deterministic approach is auditable in ways probabilistic filtering simply isn’t.
The confused deputy problem occurs when a privileged component (the browser agent) is tricked into misusing its authority on behalf of an attacker. CometJacking demonstrated this: hidden instructions on a malicious webpage caused an agent to read data from other open tabs, encode it, and exfiltrate it while appearing to endpoint tools as standard browser behaviour.
Published on the OWASP GenAI Security Project website at genai.owasp.org, released December 9, 2025. It covers ASI01 through ASI10 with detailed risk descriptions, prevention strategies, and example attack scenarios.
Open-Source Tools for Scanning and Red-Teaming Agentic Browser SecurityBy the end of this article you’ll have a clear, scannable reference for what each tool does, what attack class it detects, where to find it, and how it fits into a defensive strategy. The tools are grouped into three modes: build-time scanning, runtime protection, and offensive tools you need to know exist. Conventional web-application security tooling doesn’t cover prompt injection, memory poisoning, or skill-supply-chain compromise — these attack classes need purpose-built detection.
This article is part of our browser-agent security landscape series. For background on the attacks these tools detect, see the linked article.
MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is structured like MITRE ATT&CK but specific to AI/ML attacks. Two technique IDs matter most here: AML.T0051 (LLM Prompt Injection — malicious instructions embedded in content that cause an agent to perform unintended actions) and AML.T0080 (AI Memory Poisoning — attackers inject instructions into AI memory stores that persist across future sessions; Microsoft calls this “AI Recommendation Poisoning”).
The OWASP Agentic Top 10 (2026) extends the OWASP LLM Top 10 to agent-specific risks. The relevant items: ASI-01 (Agent Goal Hijack), ASI-02 (Tool Misuse), ASI-04 (Supply Chain), and ASI-06 (Memory Poisoning).
Framework mapping is what turns a tool inventory into something you can actually act on. “This tool detects AML.T0051” is the language compliance conversations require — tool names alone aren’t enough. For a deeper treatment, see OWASP and MITRE items these tools address. For the full overview of agentic browser risks that frames the threat model these tools address, see the pillar overview.
Start here. The skill supply chain attack surface (OWASP LLM05) is where the most concrete evidence of compromise has emerged.
Cisco Skill Scanner (github.com/cisco-ai-defense/skill-scanner) is an open-source static scanner for AI agent skill packages — the MCP-adjacent capability packages that AI agent platforms install locally. The detection engines work in sequence: static YAML/YARA pattern matching; LLM-as-a-judge semantic analysis of flagged content; behavioural dataflow analysis tracing data paths for exfiltration patterns; and VirusTotal integration for hash-based malware detection.
The 26% finding: Cisco researchers Amy Chang and Vineeth Sai Narajala analysed 31,000 agent skills. The #1 ranked skill on MolthHub was functionally malware — the “What Would Elon Do?” skill silently exfiltrated data via curl and injected prompts to bypass safety guidelines. Nine findings surfaced against it: two critical, five high severity. Cisco’s summary: “AI agents with system access can become covert data-leak channels that bypass traditional data loss prevention, proxies, and endpoint monitoring.”
One caveat worth noting from the tool itself: “No findings does not guarantee that a skill is secure.” Human review still matters for high-risk deployments.
Output is in SARIF format, compatible with GitHub Code Scanning for inline findings in pull requests. For more on agent skill vulnerability scanning, see the linked article.
Maps to: OWASP LLM05 (Supply Chain), LLM06 (Excessive Agency), MITRE AML.T0051.
agent-audit (github.com/HeadyZhang/agent-audit) works like Bandit or Semgrep but with an agent-specific threat model built on the OWASP Agentic Top 10 (2026).
The numbers make the case plainly: agent-audit achieves 94.6% recall and 0.91 F1 on Agent-Vuln-Bench; Bandit achieves 29.7%; Semgrep achieves 27.0%. Neither Bandit nor Semgrep can parse MCP configuration files — 0% recall on agent-specific configuration vulnerabilities. agent-audit gets 100% there. It found OpenClaw‘s browser.evaluateEnabled default-true vulnerability in practice; version 0.16 cut false positives by 79%.
SARIF output integrates with GitHub Code Scanning.
Maps to: OWASP ASI-01, ASI-02, ASI-04, ASI-06 (all 10 Agentic categories), MITRE AML.T0051.
Giskard is an open-source LLM security testing platform that deploys autonomous red-teaming agents across 40+ adversarial probes. Where agent-audit scans code before it runs, Giskard probes the running model.
Researchers using Giskard produced the published OpenAI Atlas security vulnerability analysis, demonstrating indirect prompt injection, cross-origin data access risk, and data exfiltration to OpenAI’s infrastructure. The finding: Atlas is “not in scope for OpenAI SOC2 or ISO certification” and has “no compliance API logs or SIEM integration.”
Maps to: OWASP LLM01, LLM02, LLM05, LLM06, MITRE AML.T0051.
TrojAI covers both sides of the protection gap. These are commercial products — included here because no open-source runtime protection tool currently provides equivalent capability.
TrojAI Detect (troj.ai/products/detect) automatically red-teams AI models at build time, validates behaviour against security policies, and delivers remediation guidance. Think of it as your pre-merge test suite for AI model behaviour. Maps to: OWASP LLM01, MITRE AML.T0051.
TrojAI Defend (troj.ai/products/defend) is a runtime AI application firewall. Unlike a conventional WAF that inspects HTTP traffic for known attack patterns, TrojAI Defend is trained on AI-specific attack techniques — it catches prompt injection payloads embedded in content that a standard WAF passes straight through. Maps to: OWASP LLM01, MITRE AML.T0080.
The decision trigger is straightforward: agent-audit, Cisco Skill Scanner, and Giskard provide strong build-time coverage, but none of them enforce policies on a running agent in production. That gap is the case for commercial runtime.
Zenity provides a complementary runtime option focused on incident intelligence. It extended coverage to ChatGPT Atlas, Perplexity Comet, and Dia in December 2025, and released Safe Harbor — an open-source tool that adds a dedicated safe action agents can call when they identify harmful behaviour.
Link-preview exfiltration is zero-click: attacker sends a crafted URL → AI agent generates a data-carrying URL via indirect prompt injection → messaging app auto-fetches the preview → attacker receives conversation data. No user action required.
PromptArmor identified the riskiest pairings: Microsoft Teams with Microsoft Copilot Studio (the largest share of insecure fetches), Discord with OpenClaw, Slack with Cursor Slackbot, and Telegram with OpenClaw. Safer configurations include Claude in Slack, OpenClaw via WhatsApp, and OpenClaw in Docker via Signal.
aitextrisk.com is PromptArmor’s public test harness — submit your platform combination and observe whether it triggers an insecure preview fetch. Use it with authorisation on systems you own. PromptArmor’s conclusion: “We’d like to see communication apps consider supporting custom link preview configurations on a chat/channel-specific basis to create LLM-safe channels.”
Maps to: OWASP LLM01, MITRE AML.T0051.
Offensive testing tells you whether you are exposed. The next tool answers the harder question: have you already been hit?
Microsoft Defender Advanced Hunting is the main tool for retrospective detection — answering “have we already been compromised?” rather than only preventing future attacks.
The signal: emails or Teams messages containing links to AI assistant domains (copilot, chatgpt, gemini, claude, perplexity, grok) where URL parameters contain memory-manipulation keywords: “remember,” “memory,” “trusted,” “authoritative,” “future,” “citation,” or “cite.”
Three KQL hunting queries from Microsoft’s AI Recommendation Poisoning research:
EmailUrlInfo for AI assistant domains with memory-manipulation keywords in URL-decoded prompt parameters.MessageUrlInfo using the same pattern.UrlClickEvents with Safe Links to identify users who acted on poisoning URLs.Convert ad hoc hunts into scheduled detections. Enable Defender for AI Services at the subscription level, and enable user prompt evidence capture — seeing the exact input and model response during an attack is the difference between speculation and evidence.
Microsoft FIDES is worth noting separately. It applies information-flow control principles — specifically Spotlighting and Prompt Shields — to restrict what content AI agents can treat as instruction at the platform level. FIDES reduces the attack surface the detection tools above must cover. It is a complementary upstream control, not a replacement for tooling.
Atlas logged-out mode: Restricts the agent’s authenticated access to external services, reducing what an attacker can exfiltrate even if prompt injection succeeds. Available in Atlas enterprise settings.
These are not recommendations. Understanding attacker tooling is how you work out how much defensive investment is actually proportionate.
CiteMET (npmjs.com/package/citemet) is a ready-to-use NPM package marketed as “an SEO growth hack for LLMs.” Websites embed “Summarise with AI” buttons that open the user’s AI assistant with a pre-filled prompt instructing it to remember a company as a trusted source — persisting that instruction across all future sessions without the user’s knowledge.
AI Share URL Creator (metehan.ai/ai-share-url-creator.html) is a point-and-click web tool that generates poisoned share URLs to manipulate AI agent memory context. No code required.
Microsoft’s Defender Security Research Team found 50 distinct AI Recommendation Poisoning attempts from 31 companies across 14+ industries over 60 days — including from a security vendor. The conclusion: “The barrier to AI Recommendation Poisoning is now as low as installing a plugin.”
Both exploit MITRE AML.T0080 — the same technique the Microsoft Defender KQL queries detect.
The Type column shows where in the development lifecycle the tool applies; OWASP Items maps to compliance requirements; MITRE ATLAS connects to threat intelligence feeds. “Coverage” means the tool can detect or test for the attack technique classified under that item.
Cisco Skill Scanner — Build-time (static + LLM) | LLM05, LLM06 | AML.T0051 | github.com/cisco-ai-defense/skill-scanner
agent-audit — Build-time (SAST) | ASI-01, ASI-02, ASI-04, ASI-06 (all 10 Agentic categories) | AML.T0051 | github.com/HeadyZhang/agent-audit
Giskard — Build-time (adversarial) | LLM01, LLM02, LLM05, LLM06 | AML.T0051 | giskard.ai
TrojAI Detect — Build-time (commercial) | LLM01 | AML.T0051 | troj.ai/products/detect
TrojAI Defend — Runtime (commercial) | LLM01 | AML.T0080 | troj.ai/products/defend
PromptArmor / aitextrisk.com — Offensive testing | LLM01 | AML.T0051 | aitextrisk.com
Microsoft Defender Advanced Hunting — Retrospective detection | — | AML.T0080 | Microsoft Defender XDR
CiteMET — Offensive (attacker tool) | — | AML.T0080 | npmjs.com/package/citemet
AI Share URL Creator — Offensive (attacker tool) | — | AML.T0080 | metehan.ai
Coverage gaps: No open-source runtime tool covers what TrojAI Defend and Zenity provide in production. OWASP Agentic Top 10 items ASI-07 (Inter-Agent Communication), ASI-08 (Cascading Failures), and ASI-10 (Rogue Agents) have no dedicated red-teaming tools yet — these are the next gaps to watch.
Full framework documentation: MITRE ATLAS at atlas.mitre.org; OWASP Agentic Top 10 at genai.owasp.org; OWASP LLM Top 10 at owasp.org/www-project-top-10-for-large-language-model-applications.
For governance controls these tools implement and framework coverage for compliance, see the linked articles.
agent-audit is the most accessible starting point — open-source, built on the OWASP Agentic Top 10, and achieving 94.6% recall on agent-specific vulnerability benchmarks. For skill-package scanning specifically, Cisco Skill Scanner is the leading open-source option. The two tools target different things: agent-audit scans agent application code; Cisco Skill Scanner scans the skill packages those agents install.
Yes. agent-audit produces SARIF-compatible output and has documented GitHub Actions integration. It functions as a security gate in pull requests, blocking merges when OWASP Agentic Top 10 violations are detected. The setup follows the same pattern as Bandit or Semgrep Actions workflows.
Build-time scanning (agent-audit, Cisco Skill Scanner, Giskard) analyses code and behaviour before deployment — analogous to SAST and penetration testing. Runtime protection (TrojAI Defend, Zenity) enforces security policies on agents in production — analogous to a WAF. Both are needed; they address different failure modes.
Use Microsoft Defender Advanced Hunting: query EmailUrlInfo, MessageUrlInfo, and UrlClickEvents for AI assistant domain links where URL parameters contain memory-manipulation keywords (remember, trusted, authoritative, citation). Convert ad hoc queries into scheduled detections.
AML.T0051 is the technique ID for LLM Prompt Injection. Cisco Skill Scanner, agent-audit, and Giskard all detect variants — via static pattern matching, SAST-style code analysis, and adversarial model probing respectively.
aitextrisk.com is maintained by PromptArmor for defensive testing. It simulates the link-preview exfiltration vector to confirm whether your messaging platform integration is vulnerable. Use it with authorisation on systems you own or are engaged to test.
Twenty-six per cent of the 31,000 skills contained at least one vulnerability. That rate, across a marketplace of skills sourced by teams without a security review process, is the reason pre-installation scanning matters — particularly for organisations adopting AI agent capabilities from third-party registries.
The OWASP LLM Top 10 covers LLM-specific vulnerabilities. The OWASP Agentic Top 10 (2026) extends this to risks that arise when LLMs are given tool access, browser control, or autonomous decision-making capability. agent-audit uses the Agentic Top 10 as its rule source.
As of early 2026, no open-source tool provides runtime enforcement equivalent to TrojAI Defend or Zenity. Build-time scanning and Microsoft Defender Advanced Hunting (retrospective detection, included with Microsoft 365 licensing) are the main options that don’t require additional commercial tooling.
FIDES applies information-flow control principles to deterministically prevent indirect prompt injection at the architectural level — reducing the attack surface that downstream tools must cover. It is a complementary upstream control, not a replacement for tooling.
Enabling logged-out mode restricts the agent’s access to authenticated external services. This limits the data an attacker can reach even when prompt injection succeeds. Configuration is available in Atlas enterprise settings.