Business

SaaS

Technology

•

Jun 25, 2026

AI Coding Agent Security Risks: From Prompt Injection to Supply Chain Compromise

Eighty-eight percent of organisations have already experienced confirmed or suspected AI security incidents involving AI tools, according to UpGuard’s Enterprise AI Security Index. One in five developers grant AI coding agents unrestricted workstation access, including the ability to delete files and execute arbitrary commands without confirmation. The agent skills marketplace, where developers share extension packages, has 36% of its packages containing security flaws and 76 confirmed malicious payloads according to Snyk’s audit. This is the npm ecosystem circa 2015, with higher default privileges and active exploitation campaigns underway.

The shift that makes this different from any developer tool that came before is architectural. AI coding agents are autonomous processes running as the invoking user, not traditional IDE plugins. They inherit every repository you can access, every cloud credential on your machine, every SSH key in your home directory. Agent compromise is developer-account compromise with automation.

The risks begin with what the agent can reach. But the real threat is what reaches the agent, and where that leads.

What security risks emerge when AI coding agents hold persistent credentials, inherit user permissions, and have unrestricted workstation access?

When you use a coding agent, it operates as you, in your environment, with your credentials. No configuration toggle changes this. It is how the current generation of coding agents is built.

The risk breaks into four categories. First, credential exposure. Agents need credentials to function: API tokens for package registries, cloud provider access keys, CI/CD service tokens, SSH keys. These live in predictable locations that any process running as the same user can read. GitGuardian’s State of Secrets Sprawl 2026 found 28.65 million new hardcoded secrets in public GitHub commits during 2025, a 34% jump and the largest single-year increase ever recorded. AI-assisted commits leak secrets at 3.2%, against a 1.5% baseline for human-only commits. Separately, 24,008 secrets were found exposed in MCP configuration files on public GitHub, a category that did not exist a year earlier.

Amazon Kiro is the case study that makes this concrete. An agent with persistent AWS credentials deleted a production Cost Explorer environment, causing a 13-hour outage. The agent had the permissions to do it because the developer triggering it had the permissions to do it.

Second, permission inheritance. Agents run as the invoking developer. There is no separate agent principal in the IAM system, no scoped identity. Only 10% of organisations have formal strategies for managing non-human and agentic identities.

Third, unrestricted workstation access. Agents can read, write, and execute anywhere on the filesystem, modify shell configuration, install packages, and make arbitrary network connections. The Replit incident, where an agent was told 11 times not to act during a code freeze and proceeded to delete the production database anyway, demonstrates that natural language directives are not security boundaries.

Fourth, shadow AI. Eighty-one percent of employees use unapproved AI tools, and 45% will find workarounds if blocked. This creates ungoverned attack surfaces that security teams cannot assess, monitor, or contain.

The s1ngularity malware campaign made the threat tangible: attackers used compromised AI coding tools to harvest credentials, with malware that outsourced reconnaissance tasks to the victim’s own AI agents. The ClawHavoc campaign, detailed in Section 6, would later demonstrate this pattern at scale through the skills supply chain.

What is “excessive agency” in AI coding tools, and why does it amplify every other security risk?

The credential and access risks described above are symptoms of a single structural property that OWASP has formally classified as LLM06:2025: excessive agency. When an agent that only needed to read one file can instead delete the entire filesystem, the gap between what was needed and what was possessed is the agency gap. OWASP traces it to three root causes: excessive functionality, excessive permissions, and excessive autonomy.

The mechanism is straightforward. AI coding agents do not run as separate service accounts with scoped permissions. They run as the invoking user, the same user who has sudo access, cloud admin roles, database write permissions, and repository push access. The agent’s identity is the developer’s identity.

The real-world consequences are already documented. Beyond the Kiro outage, Claude Code deleted the developer’s home directory because the permission system failed to detect the destructive path expansion before the command was approved. Eighty percent of IT workers have already seen AI agents perform tasks without authorisation.

Then there is YOLO mode, the colloquial name for the safety-bypass flags available in Claude Code and equivalent agents that disable permission checks entirely. UpGuard’s analysis of more than 18,000 AI agent configuration files found widespread use of these bypasses.

The Cloud Security Alliance’s Agentic Trust Framework proposes a maturity model that addresses this directly: Intern (observe only), Junior (recommend with approval), Senior (act with notification), Principal (autonomous within domain). Progressive capability scoping is the structural fix for excessive agency.

What is the “vibe coding security crisis,” and how does it affect enterprise development?

Vibe coding, a term coined by Andrej Karpathy in early 2025, describes generating and deploying code through natural-language prompts without reviewing or understanding the output. It amplifies every AI coding security risk because it removes the human review layer that traditionally caught supply chain compromises before they reached production.

The numbers make the case. AI-generated code introduces 2.74 times more security vulnerabilities than human-written code. Forty-five percent of AI-generated code samples contain OWASP Top 10 vulnerabilities. GitHub reports that 46% of all new code is AI-generated. And 56% of developers admit they rarely review AI-generated code line by line.

When code ships without review, the vulnerability rate becomes the breach rate. A Q1 2026 assessment of over 200 vibe-coded applications found that 91.5% contained at least one vulnerability traceable to AI hallucination. AI-assisted commits expose secrets at more than twice the rate of human-only commits.

This matters because vibe coding removes the only thing standing between a prompt-injected instruction and production: human judgment. The question Section 4 addresses is whether the model itself can fill that gap. The evidence says no.

Review gates and infrastructure-level enforcement address this directly. The technical risks of AI coding agents are dangerous on their own. Combined with a development culture that treats AI-generated code as production-ready without inspection, the risk compounds.

What is prompt injection in the context of coding agents, and how does it lead to supply chain compromise?

Prompt injection works differently when the target is a coding agent. A chatbot sees what you type. A coding agent sees your codebase, your dependencies, your README files, your issue tracker, and it treats all of it as instruction. OWASP ranks it #1 (LLM01:2025) in its LLM Applications Top 10.

The injection surfaces are everywhere the agent reads: code comments in existing codebases, README files in cloned repositories, dependency manifest files, issue tracker descriptions, agent configuration files, web page content fetched during research, and MCP tool descriptions.

The supply chain escalation follows four steps. First, an attacker plants malicious instructions in a source the agent will read. A code comment in an open-source dependency, a crafted README in a forked repository, a malicious issue description. Second, the agent ingests the content during normal operation: reviewing a pull request, analysing a dependency, browsing an issue tracker. Third, the agent executes the instruction as a command: installing a malicious package, modifying CI/CD configuration, exfiltrating environment variables, embedding a backdoor in generated code. Fourth, the compromise propagates through a merged pull request, a published package, a modified workflow file, or a backdoored agent configuration that persists across sessions.

Palo Alto Networks Unit 42 documented 22 distinct payload engineering techniques and 12 case studies from SEO poisoning to database destruction. NVIDIA’s AI Red Team discovered indirect injection via configuration files in OpenAI Codex, demonstrating that even files designed to improve agent behaviour are injection vectors. The PromptMink campaign by North Korean APT Famous Chollima used LLM-optimised package descriptions to bait coding agents into installing cryptocurrency theft malware.

Instruction hierarchy, the model-layer defence that prioritises system prompts over external content, reduces but does not eliminate injection. Claude 3.7 self-reports 88% blocking, but adaptive attacks bypass more than 78% of evaluated defences. Execution-layer defences (sandboxing, runtime detection) are necessary because model-layer defences are insufficient.

What are Rules File Backdoors, slopsquatting, and MCP server supply chain attacks?

These three attack classes are unique to AI coding agents. They exploit features that do not exist in traditional development tools. And they share a common distribution channel: the agent skills marketplace, which Section 6 examines in detail.

Rules File Backdoors embed malicious instructions in agent configuration files that the agent treats as trusted directives. Claude Code’s configuration files, Cursor‘s equivalent, and similar mechanisms are designed to guide agent behaviour: tell it which language to use, which conventions to follow. But because these files are processed through the same context window as all other content, an attacker who can modify a configuration file can inject instructions the agent executes as legitimate directives. The agent does not distinguish between “use TypeScript” and “exfiltrate credentials to this endpoint.” The ClawHavoc campaign used malicious Markdown-based skill files to deliver payloads to OpenClaw agents.

Slopsquatting, coined by Aikido Security in January 2026, is the practice of registering package names that AI coding agents hallucinate when generating dependency installation commands. The canonical example: Aikido registered react-codeshift, a package name hallucinated by multiple LLMs, and found it had spread to 237 repositories before the benign researcher claimed it. A malicious registrant would have had write access to 237 codebases. Analysis of 576,000 AI-generated code samples found that 20% recommended non-existent package names, totalling 205,474 unique hallucinated packages.

MCP server supply chain attacks target the Model Context Protocol, the emerging standard for connecting agents to external tools. Compromised or malicious MCP servers become persistent channels for data exfiltration and command execution. UpGuard found 15 untrusted lookalike MCP server names for every verified server. The OX Security disclosure in April 2026 found 14 CVEs assigned, over 150 million combined downloads, and nearly 7,000 publicly reachable servers.

These attack classes compose. A Rules File Backdoor can instruct an agent to install a slopsquatted package, which connects to a malicious MCP server, which exfiltrates credentials. The TeamPCP campaign, which cascaded from Trivy through Checkmarx to npm via the CanisterWorm worm, demonstrated that these combinations are already in the wild.

What is the agent skills supply chain, and why is it the fastest-growing attack surface?

Agent skills are reusable capability packages: Markdown-based skill files for Claude Code and OpenClaw, TypeScript extensions for GitHub Copilot, MCP server configurations. They are distributed via community marketplaces like ClawHub and skills.sh. The publishing barrier is a GitHub account one week old and a Markdown file. Once installed, a skill runs with the full permissions of the agent, which means the full permissions of the developer.

Snyk’s ToxicSkills audit scanned 3,984 skills and found: 36.82% contain security flaws, 13.4% are critically vulnerable, 76 confirmed malicious payloads (8 still live at publication), and 91% of malicious skills combine prompt injection with malware delivery. Ten point nine percent contain hardcoded secrets.

Cisco’s AI Defense Skill Scanner analysed 31,000 agent skills and found 26% contained vulnerabilities. The top-ranked ClawHub skill was functional malware.

The ClawHavoc campaign revealed the scale. Koi Security’s audit of ClawHub initially identified 341 malicious entries, 335 traced to a single coordinated operation. Follow-up scans found the count had grown past 800 across a registry that had expanded to over 10,000 skills. Acronis TRU separately identified 575+ malicious skills across 13 developer accounts, targeting both Windows and macOS with trojans, cryptominers, and the AMOS infostealer. The numbers differ because they represent different scans at different points across a rapidly expanding dataset. Daily submissions jumped from under 50 in mid-January to over 500 by early February, a tenfold increase in weeks.

This ecosystem mirrors npm and PyPI circa 2015: minimal security review, no code signing, full permission inheritance, active exploitation campaigns. The difference is that skills inherit agent permissions, which in most cases means full developer permissions, a higher default privilege level than any package manager ever had.

How should security teams assess the risk of AI coding tools that have unrestricted workstation access?

The threats enumerated across the previous sections converge on a single question: how do you evaluate these tools before deployment? Security assessment of AI coding tools needs to examine three dimensions: access model, safety architecture, and audit observability.

The access model question is straightforward: does the tool require unrestricted workstation access, or can it operate with scoped permissions? Deny-first rule evaluation, where everything is denied except explicitly allowed operations, is the recommended starting point for enterprise policy. Claude Code’s deny rules were bypassed after 50 subcommands, a reminder that deny-first is necessary but not sufficient.

Safety architecture breaks into three approaches. Per-action approval puts a human in the loop for every tool call. It is the most secure option but users approve roughly 93% of prompts, so approval fatigue makes it unreliable as a sole mechanism. Automated classifier-based safety uses AI to evaluate the risk of each action: faster but an attacker sophisticated enough to inject the agent may also evade the classifier. Container sandboxing executes the agent in an isolated environment with no access to host resources: secure by design but limits some workflows.

The most robust platforms combine all three, with sandboxing as the structural foundation and approval gates as the policy layer. Docker’s sbx sandbox maps directly to all six failure categories the preceding sections identified: unrestricted filesystem access, excessive privilege inheritance, secrets leakage, prompt injection, malicious skills, and autonomous action.

Audit and observability means a complete, tamper-proof log of every action the agent took. Only 7.7% of organisations audit AI agent activities daily, and over half of builders cite a lack of logging as a primary obstacle. Sysdig and Falco provide kernel-level detection that can flag credential file access, unexpected network connections, and safety bypass flags independently of the agent’s own safety systems.

The Cloud Security Alliance’s Agentic Trust Framework maturity model and the NIST AI Agent Standards Initiative, launched in February 2026 with SP 800-53 overlays in development, are the emerging governance standards. Security teams that start evaluating against these dimensions now will be ahead of both the compliance curve and the attackers.

What the security of AI coding agents is actually about

AI coding agent security is not a model-safety problem. It is a containment problem. The agent inherits too much, the injection surface is too large, the attack classes are novel and composable, and the skills ecosystem is npm circa 2015 with root privileges. Model-layer defences reduce but cannot eliminate the risk. The answer is infrastructure: scoped identities, deny-first rule evaluation, container sandboxing, kernel-level runtime detection, and audit observability that the agent cannot tamper with.

The 1 in 5 developers granting unrestricted access is not a problem of developer recklessness. It is a problem of tools that default to unrestricted access while the infrastructure around the agent is still emerging. Credential exposure and permission inheritance are symptoms. Excessive agency is the diagnosis. Prompt injection and skills supply chain attacks are the disease progression. The three-dimensional assessment framework is where treatment begins.

The question is not whether AI coding agents are safe. It is whether the infrastructure around the agent contains the risk. Security assessment shifts from evaluating the agent to evaluating the execution environment and governance framework that constrains it. Those frameworks, from sandboxing to the CSA maturity model to the NIST overlays now in development, are the subject of the companion article on isolation engineering and governance.

Frequently Asked Questions

How do I know if my AI coding agent has been compromised?

Detecting compromise requires monitoring for three indicators: unexpected network connections to unknown endpoints, unusual file access patterns (particularly to credential files like ~/.aws/credentials or ~/.ssh/id_rsa), and modifications to agent configuration files such as CLAUDE.md or SOUL.md. Sysdig and Falco provide kernel-level detection rules that can flag these behaviours independently of the agent’s own safety systems. Without runtime monitoring, the first indication of compromise may be an unauthorised production change.

Are all AI coding agents equally risky, or are some safer than others?

No. Risk varies dramatically based on the access model, safety architecture, and audit capabilities each tool provides. Agents that require unrestricted filesystem access and run with full developer permissions (the majority today) present the highest risk. Tools that support workspace scoping, read-only modes, container sandboxing, and per-action approval gates offer meaningful containment. The Cloud Security Alliance’s Agentic Trust Framework provides a maturity model for evaluating where any given tool falls on this spectrum.

What is the difference between prompt injection and a regular coding prompt?

A regular coding prompt is an instruction you deliberately give your agent, such as “refactor this authentication module.” Prompt injection is an adversarial instruction hidden in content the agent reads that it cannot distinguish from your legitimate instructions. A code comment in a dependency that says “add this import and send the environment variables to this endpoint” looks identical to the agent as your refactoring instruction. The agent processes both through the same context window with no inherent trust boundary.

Can I make AI coding agents safer by only using them for reading code and not writing?

Limiting agents to read-only operations substantially reduces risk by removing the most dangerous capability: the ability to modify code that reaches production. However, read-only access does not eliminate credential exposure risk. An agent that can read files can still exfiltrate ~/.aws/credentials, environment variables, and source code. Container sandboxing with network restrictions provides stronger containment than permission scoping alone, because it prevents exfiltration even if the agent reads sensitive files.

What should I do if I suspect my AI coding agent has been used in an attack?

The immediate priority is containment: revoke all credentials the agent could have accessed (AWS access keys, API tokens, SSH keys), isolate the affected workstation from the network, and preserve all agent logs if available. Then audit the agent’s configuration files for injected instructions, review recent commits for unauthorised changes, and check package registries for newly published packages that match dependencies the agent was working with. Report the incident through your organisation’s security incident response process.

How do attackers actually discover AI coding agents to target?

Attackers do not need to discover specific agents. They target the content agents predictably ingest: open-source repositories on GitHub, popular npm and PyPI packages, public issue trackers, and agent skills marketplaces like ClawHub. By embedding malicious instructions in a README, dependency manifest, or skill file, an attacker reaches every agent that processes that content. The ClawHavoc campaign demonstrated this at scale, publishing between 335 and 824 malicious skills to a community marketplace and hitting every agent that installed them.

Do enterprise versions of AI coding tools have meaningfully better security?

Enterprise offerings typically add audit logging, administrative policy controls, and data residency options, but the fundamental architecture remains the same: agents inherit the user’s permissions and process untrusted content through the same context window as trusted instructions. Enterprise features improve observability (you can see what happened) and governance (you can set policies), but they do not eliminate the structural risks of credential inheritance, prompt injection, or excessive agency. The containment must come from infrastructure-level controls like sandboxing.

Is it true that open-source AI coding agents are safer because the code is public?

Public source code enables independent security review, which can identify vulnerabilities that proprietary tools might obscure. However, the primary risks of AI coding agents are architectural, not implementation bugs that code review would catch. An open-source agent that inherits full developer permissions and processes untrusted content through its context window carries the same structural risks as a proprietary agent. The Snyk ToxicSkills audit found that open skills marketplaces have active malware campaigns, demonstrating that open ecosystems carry their own supply chain risks.

What happens to my data when an AI coding agent sends code to a cloud model for processing?

When an agent sends code to a cloud-based model (as with GitHub Copilot, Cursor, or API-based Claude), that code leaves your environment and is processed on the provider’s infrastructure. What happens next depends on the provider’s data handling policy: some retain prompts for training, some offer zero-retention modes for enterprise customers, and some process data under SOC 2 compliance. Developers should verify whether their organisation’s data handling requirements are compatible with the provider’s terms before connecting an agent to a cloud model.

Can I safely use AI coding agents on an air-gapped network without internet access?

Air-gapped operation removes the cloud data exfiltration vector and prevents agents from fetching remote payloads, which addresses a significant portion of the supply chain threat. However, it does not eliminate local risks: an agent on an air-gapped system can still read credential files, modify code, execute shell commands, and process malicious content in local repositories. Rules File Backdoors in existing agent configuration files remain effective regardless of network connectivity. Air-gapping is a useful layer but not a complete solution.

AI Coding Agent Security Risks: From Prompt Injection to Supply Chain Compromise

What security risks emerge when AI coding agents hold persistent credentials, inherit user permissions, and have unrestricted workstation access?

What is “excessive agency” in AI coding tools, and why does it amplify every other security risk?

What is the “vibe coding security crisis,” and how does it affect enterprise development?

What is prompt injection in the context of coding agents, and how does it lead to supply chain compromise?

What are Rules File Backdoors, slopsquatting, and MCP server supply chain attacks?

What is the agent skills supply chain, and why is it the fastest-growing attack surface?

How should security teams assess the risk of AI coding tools that have unrestricted workstation access?

What the security of AI coding agents is actually about

Frequently Asked Questions

How do I know if my AI coding agent has been compromised?

Are all AI coding agents equally risky, or are some safer than others?

What is the difference between prompt injection and a regular coding prompt?

Can I make AI coding agents safer by only using them for reading code and not writing?

What should I do if I suspect my AI coding agent has been used in an attack?

How do attackers actually discover AI coding agents to target?

Do enterprise versions of AI coding tools have meaningfully better security?

Is it true that open-source AI coding agents are safer because the code is public?

What happens to my data when an AI coding agent sends code to a cloud model for processing?

Can I safely use AI coding agents on an air-gapped network without internet access?

Related Articles

Team extension, extended team & out-sourcing FAQ

Personal AI Assistants Are Here And They Are Lobsters

After the wireframes – the rules at the heart of your app

Need a reliable team to help achieve your software goals?

BUSINESS HOURS

SYDNEY

YOGYAKARTA

BANDUNG