In March 2026, the Bank for International Settlements issued a formal warning about AI infrastructure financing. Their Quarterly Review — written by Egemen Eren, Ingomar Krohn, and Karamfil Todorov — found that hyperscalers had quietly moved more than $120 billion in AI compute debt off their balance sheets in about 18 months. They did it using structured financial vehicles designed to keep that debt out of the headline numbers investors and counterparties actually look at.
If you’re an enterprise signing multi-year AI cloud compute contracts, this is not some abstract macro-financial story you can ignore. It’s a counterparty risk question: what happens to your AI workloads if your compute provider runs into serious financial trouble? This article translates what the BIS found into practical terms — what an SPV is, why residual value guarantees matter, and what five questions you should be asking before you commit to a long-term compute contract. The warning exists because the AI infrastructure arms race has pushed capital deployment well beyond what traditional on-balance-sheet financing can absorb.
The BIS coined the term “shadow borrowing” to describe financial obligations that function like debt but don’t show up on corporate balance sheets. It’s not accounting fraud — it’s the deliberate, legal use of off-balance-sheet structures that, when you add them all up, represent substantial hidden financial exposure.
The March 2026 Quarterly Review identified nine risk categories, including GPU collateral depreciation risk, private credit concentration, and contagion via securitisation. The headline finding: more than $120 billion in AI compute debt moved off hyperscaler balance sheets in roughly 18 months. That’s a structural shift in how AI infrastructure gets financed — not a handful of isolated deals.
Two other institutions have lined up alongside the BIS. The Chicago Fed found that banks’ AI-adjacent exposure averaged 0.8% of total assets through indirect private credit channels. And in January 2026, four U.S. Senators wrote formally to flag “complex and opaque debt markets” and the risk of “a broader financial crisis.” Three major institutions raising similar concerns in six months is a pretty reliable signal that regulatory action is coming. The question for you isn’t whether regulators will act — it’s what effect that will have on compute pricing and provider stability when they do. The AI infrastructure spending environment encompassing $725 billion in capex is what makes structured finance unavoidable: there’s simply too much capital at stake for standard on-balance-sheet borrowing to handle.
A Special Purpose Vehicle (SPV) is a legally separate, bankruptcy-remote entity set up to hold one asset and raise one pool of debt. The “bankruptcy-remote” part is the key bit: if the SPV fails, the failure is — in theory — contained, and doesn’t automatically pull the parent company into default with it.
In AI infrastructure, it works like this. A hyperscaler creates or co-owns an SPV that builds a data centre. The SPV raises debt from private credit investors. The hyperscaler then leases back compute capacity from the SPV, turning capital expenditure into operating expense. The capex stays off the balance sheet; the compute access stays on.
Meta’s Hyperion data centre in Louisiana is the clearest example of this in practice. It’s a $30 billion project financed through the Beignet Investor SPV — 80% owned by Blue Owl Capital, 20% by Meta — with private credit investors including Apollo, Pimco, BlackRock, and Morgan Stanley holding the SPV debt. Meta signed a compute offtake agreement as the SPV’s revenue stream.
Here’s the disclosure that makes this a BIS-level concern: Meta’s residual value guarantee (RVG) of up to $28 billion appears only in the footnotes of its financial statements. That’s a $28 billion contingent liability that activates if GPU assets held by the SPV depreciate below a threshold — something that could happen fast if a new GPU generation triggers a sudden market-wide repricing. Oracle’s Stargate SPVs follow a similar pattern, with Oracle issuing $18 billion in bonds in a single day as part of its Stargate financing programme. Blue Owl’s $7 billion digital infrastructure fund, which holds positions in both structures, includes capital from New York and Pennsylvania state pension funds. The risk has already left hyperscaler balance sheets and landed with institutional investors. The same structured finance logic underpins the Amazon-Anthropic and Google-Anthropic investment commitments — large-scale capital deployments that reshape compute access while moving financial exposure through structured vehicles rather than direct balance-sheet borrowing.
GPU-collateralised lending is structured finance where GPU clusters serve as the collateral — a lender provides capital against the assessed value of a GPU fleet, expecting to recover its money from compute revenue or, if things go wrong, by selling the GPUs.
The critical risk is depreciation. GPUs have an estimated useful life of 2.5 to 3 years — which means the asset securing a five-year loan is worth 60–70% less by mid-term. That creates what’s sometimes called an impairment cliff: depreciation looks smooth on paper until a new GPU generation launches and the market suddenly reprices the older generation downward, all at once.
CoreWeave‘s $7.5 billion GPU-collateralised debt facility, arranged by Blackstone, is structured as GPU-backed Asset-Backed Securities (ABS). Fitch rated CoreWeave Compute Acquisition Co. VIII loans at A-sf — an investment-grade rating whose quality depends entirely on GPU residual value assumptions holding. If GPU market prices drop sharply, the collateral value falls below debt thresholds and you’re looking at margin call risk.
Why does any of this matter to you as an enterprise buyer? Because if your production AI workloads run on CoreWeave or a similar neocloud, you have counterparty exposure to the CoreWeave GPU-collateralised debt structure sitting underneath your compute access. Neoclouds have thinner financial reserves than hyperscalers and no diversified revenue to buffer against GPU market stress. That stress translates directly into neocloud financial stress in a way it simply doesn’t for a hyperscaler.
A take-or-pay contract — called a compute offtake agreement in AI infrastructure — is a commitment to pay for a defined volume of compute capacity whether or not you actually use it. For the SPV or neocloud, it’s the revenue stream servicing its debt. For you, it’s a fixed cost regardless of actual consumption.
The buyer-side risk that most coverage doesn’t address: what actually happens to your enterprise AI workloads when a compute provider hits financial stress? Here are three scenarios, based on how analogous distress events have played out in energy and telecoms infrastructure:
Scenario A — Managed distress: The provider seeks refinancing or a buyer. Workloads continue, but SLAs may degrade and pricing terms may be renegotiated under duress.
Scenario B — Disorderly failure: Lenders exercise step-in rights over data centre assets. Whether your compute access continues depends on the lender’s willingness to maintain service continuity — which is not guaranteed unless you’ve explicitly negotiated it into the contract.
Scenario C — Bankruptcy: Your data and workloads are frozen pending court proceedings. Non-portable fine-tuned models and proprietary data pipelines may be inaccessible for weeks or months.
The trap runs both ways. If you try to exit a multi-year compute contract to reduce counterparty risk, you may face take-or-pay litigation — paying for capacity you’re no longer using because the provider hasn’t technically failed yet. The variable that determines how bad Scenarios B and C get is workload portability: how easily your models, data pipelines, and inference endpoints can move to another provider. Low-portability workloads amplify every distress scenario.
The securitisation cascade is how AI infrastructure debt, originated at the SPV level, gets repackaged into rated securities and distributed through the financial system until it reaches pension funds — and ultimately retail savers.
The chain goes like this: hyperscaler or neocloud creates SPV → SPV raises private credit debt → debt is securitised into ABS (Asset-Backed Securities) or CMBS (Commercial Mortgage-Backed Securities) → ABS and CMBS are sold to insurance companies, pension funds, and private credit funds → pension exposure reaches retail savers.
Blue Owl’s $7 billion digital infrastructure fund, holding positions in both Meta Hyperion and Oracle Stargate SPVs, includes capital from New York and Pennsylvania state pension funds. Morgan Stanley projects $800 billion in private credit data centre financing over two years — and at that scale, securitisation distributes AI infrastructure risk across the entire financial system. The $7 trillion scale of potential systemic exposure is what the BIS is treating as a financial stability concern, not just a sector financing story.
For enterprises, the securitisation cascade creates an indirect regulatory risk. The breadth of institutional exposure creates political pressure for regulatory intervention. If SPV disclosure requirements tighten, the cost and structure of hyperscaler AI financing changes — and that flows through to compute pricing and provider financial health.
The BIS warning is not a reason to avoid multi-year compute contracts. It’s a reason to negotiate them more carefully. These questions work in an RFP or in contract negotiations. Neoclouds are more likely to carry SPV and ABS structures, but the Meta Hyperion example shows the questions are relevant at the hyperscaler tier too — both are turning to structured finance to fund capacity growth in the broader $725 billion AI infrastructure spending environment.
Question 1: Is this compute capacity owned on your balance sheet, or financed through an SPV or third-party structure? If the vendor can’t answer clearly, treat that as a risk signal. Opacity about financing structure is itself a counterparty risk indicator.
Question 2: Does your capacity agreement contain take-or-pay obligations — and what happens to those if you enter financial distress or are acquired? You need to know whether the payment obligation survives a restructuring or change of control, and whether you could end up paying a new operator for capacity you didn’t choose.
Question 3: What step-in and workload continuity provisions apply if a lender acquires your data centre assets? Get explicit contractual commitment that any step-in party must honour your service agreement for a defined continuity period — 90 to 180 days is a reasonable starting position.
Question 4: Can I migrate my models, data pipelines, and inference endpoints to another provider within 30 days? Include migration assistance as a contractual obligation, not a verbal assurance. Test portability with a small-scale migration exercise before you sign anything.
Question 5: Are your GPU assets subject to residual value guarantees or ABS issuance, and can you disclose those terms? A new GPU generation launch could simultaneously collapse your vendor’s collateral value and trigger a crisis in their structured finance programme — with your production workloads right in the middle of it.
These five questions won’t eliminate counterparty risk, but they will surface it before you sign rather than after. For the full picture of how structured finance fits into the $725 billion AI infrastructure buildout — from neocloud compute economics to sovereign geography — see our AI infrastructure arms race resource hub.
The BIS March 2026 Quarterly Review — by Egemen Eren, Ingomar Krohn, and Karamfil Todorov — identified AI infrastructure financing as a systemic risk concern, coining “shadow borrowing” for debt-equivalent obligations moved off balance sheets via structured vehicles. The key finding: more than $120 billion in AI compute debt was moved off hyperscaler balance sheets in approximately 18 months. The report flagged nine risk categories including GPU collateral depreciation, private credit concentration, and ABS/CMBS contagion channels. The BIS — the Bank for International Settlements — is a Switzerland-based institution that serves as a bank for central banks and whose research carries significant weight with financial regulators worldwide.
A Special Purpose Vehicle is a legally separate, bankruptcy-remote entity created to own one asset and raise one pool of debt. Tech companies use SPVs for AI infrastructure to convert large capital expenditures into multi-year operating expenses, keeping the associated debt off the corporate balance sheet while retaining compute access. SPVs are legal and widely used in infrastructure, real estate, and energy financing. The BIS concern is about their scale and opacity in AI, not their existence.
Shadow borrowing is the BIS-coined term for financial obligations that function economically like debt but don’t appear on corporate balance sheets. In AI infrastructure, it arises from SPV leases, compute offtake guarantees, and residual value guarantees that add up to debt-equivalent obligations only partially visible in standard financial statements. “Shadow” doesn’t mean illegal — it means obscured from standard balance sheet analysis.
A residual value guarantee is a commitment by a hyperscaler or neocloud to guarantee the minimum resale value of GPU assets to an SPV’s lenders if those assets depreciate below a threshold. Meta’s RVG of up to $28 billion on the Hyperion deal appeared only in footnotes of Meta’s financial statements — not in headline debt figures. It represents a contingent liability that could be triggered by a GPU market price collapse, which itself could be triggered by a new GPU generation launch.
A neocloud is a GPU-focused cloud provider — CoreWeave and Lambda Labs are examples — specialising in AI compute, typically with GPU-backed structured finance and thinner financial reserves than hyperscalers. Neoclouds are more exposed to GPU market stress because AI compute is their entire business. Hyperscalers have diversified revenue — cloud services, advertising, software — that buffers against GPU collateral value declines. That doesn’t make hyperscalers risk-free, but their financial depth provides a larger buffer before enterprise service continuity is threatened.
Workload portability is the degree to which your enterprise AI workloads — models, data pipelines, inference endpoints — can be migrated from one compute provider to another without significant technical or commercial cost. Low-portability workloads, such as non-portable fine-tuned models or proprietary pipelines locked to a provider’s tooling, amplify the damage of any provider financial stress event. Test portability with a small-scale migration exercise before signing a long-term compute contract, and include migration assistance obligations in the contract itself.
Yes, under a standard compute offtake agreement. Take-or-pay contracts require payment for committed capacity whether or not it’s used, and the obligation typically survives restructuring, sale, and in some configurations, bankruptcy. You may owe payments to a new operator or lender you didn’t choose. To protect against this, negotiate termination-for-cause provisions that include provider financial distress as a trigger for contract exit without penalty, and have legal counsel review take-or-pay obligations before signing.
Step-in rights are contractual provisions allowing a lender or designated third party to assume operational control of infrastructure assets if the operator defaults. They’re dual-natured: they can protect your compute access if the step-in party is committed to maintaining service, or they can threaten it if that party’s priority is asset recovery. Negotiate explicit language requiring any step-in party to honour existing enterprise service agreements for a defined continuity period — 90 to 180 days is a reasonable starting position.
How Akamai’s $1.8 Billion AI Deal Reveals a Third Path Beyond Hyperscalers and NeocloudsOn 7 May 2026, Akamai’s stock jumped 26% in a single session — its best single-day performance in 22 years. The trigger was a $1.8 billion AI infrastructure commitment. That kind of market reaction doesn’t come from a revenue upgrade. It comes from investors fundamentally rethinking what a business is.
Most AI infrastructure conversations get stuck in a binary. You pick a hyperscaler — AWS, Azure, or Google Cloud — or you go GPU-first with a neocloud like CoreWeave or Lambda Labs. That is usually the whole menu. The Akamai deal argues there is a third option, and it is different enough from both alternatives to warrant its own decision framework. This article is part of our comprehensive series on the AI infrastructure arms race — where the $725 billion in 2026 capex is going and what it means for computing, finance, and enterprise strategy.
One quick caveat: multiple credible sources report Anthropic as the counterparty. Akamai’s official announcement says only “a leading frontier model provider.” This article treats Anthropic as the reported counterparty — not the confirmed one.
Akamai is the world’s most geographically distributed compute platform: 4,300+ locations, 700 cities, 130 countries. It was built for content delivery, but the geographic logic maps directly onto AI inference.
The product that makes this work is the Akamai Inference Cloud — the industry’s first global-scale implementation of the NVIDIA AI Grid reference architecture. It runs NVIDIA RTX PRO 6000 Blackwell GPUs with BlueField DPUs across three tiers: far-edge (4,400+ locations) for real-time responsiveness, metro edge for scale, and core clusters for large models. An intelligent orchestrator routes each request based on what the workload actually needs.
The metric that matters is time-to-first-token (TTFT).
💡 Time-to-first-token (TTFT) is the AI inference equivalent of page load time: it measures how quickly a model begins responding, which for real-time applications is often more important than total response throughput.
According to a large-scale latency study, edge infrastructure provides lower latency than cloud locations for 92–97% of end-users globally. That gap is the entire argument for edge inference.
Akamai announced a seven-year, $1.8 billion cloud infrastructure commitment — the largest customer deal in company history. Multiple outlets, including TIKR, point to Anthropic as the counterparty. CEO Tom Leighton declined to name the customer in either the earnings call or the CNBC interview that followed.
It’s worth noting that Akamai’s Cloud Infrastructure Services segment had already grown 40% year-on-year in Q1 2026 before this deal landed. Investors weren’t surprised by the trajectory. They were surprised by a frontier AI lab committing at this scale to distributed edge rather than centralised cloud. “I think we’ve been undervalued for a while, and investors have been looking for some real validation that our different approach is going to pay off,” Leighton said.
For more context on the macro AI infrastructure spending picture — including the full Q1 2026 earnings breakdown across all four hyperscalers — see our analysis of what those numbers actually reveal.
Edge inference means running AI model computations at distributed nodes close to end users, rather than routing requests to a centralised data centre and waiting for the round trip.
The alternative is what Akamai’s COO Adam Karon calls the “AI factory” — centralised, GPU-dense clusters built for training. In Karon’s framing, AI factories will keep delivering the best economics for training runs, but real-time and highly concurrent personalised experiences demand inference at the point of contact.
The Akamai AI Grid handles this with workload-aware orchestration — routing each request to the most efficient resource available based on latency and cost. Akamai calls this “tokenomics” — cost-per-token / TTFT optimisation. Nothing to do with cryptocurrency.
Edge inference also leans on semantic caching.
💡 Semantic caching stores inference outputs for queries that are semantically equivalent — even if worded differently — so the model does not need to recompute the same answer, reducing both cost and latency.
And edge inference doesn’t replace centralised compute. The same request pipeline can use edge nodes for real-time responses and core clusters for heavier processing. It’s about routing, not replacing.
Centralised data centres typically introduce 80–200ms round-trip latency for users in Southeast Asia, Latin America, or sub-Saharan Africa. Akamai’s edge network targets sub-50ms TTFT for users within range of its 4,300+ locations. Here is where that matters.
FinTech fraud detection. Real-time fraud models have to respond within the transaction window. Financial fraud detection requires low-latency data streaming to flag suspicious transactions in real time, and a US-hosted model simply cannot meet that SLA for a user in Jakarta or São Paulo.
HealthTech real-time inference. Clinical decision support and remote patient monitoring need low-latency inference at the point of care. Privacy and data sovereignty requirements push toward edge — data that never reaches a centralised cloud reduces compliance surface area significantly.
EdTech personalisation. Adaptive learning systems break down when perceptible delays interrupt the responsiveness that makes personalised tutoring actually work.
Agentic AI workloads. Autonomous agents compound latency across every step. For agentic AI systems coordinating across workflows, each round trip becomes state propagation delay. Build this into your planning from the start.
To be clear: edge inference is wrong for large-scale training, batch processing, and frontier model fine-tuning. All of those belong on centralised infrastructure. Edge inference is for real-time, user-facing workloads where TTFT is a first-class requirement.
Here is the decision framework the Akamai deal makes possible.
Path 1 — Hyperscalers (AWS, Azure, Google Cloud). Broadest service portfolio. Deepest enterprise tooling integration. Established compliance certifications. They carry the same centralised latency constraints as neoclouds. Best fit: teams where service breadth matters more than inference latency.
Path 2 — Neoclouds (CoreWeave, IREN, Lambda Labs). GPU-first infrastructure, no distractions.
💡 Neoclouds are a category that emerged in late 2024 for GPU-specialised cloud providers like CoreWeave and Lambda Labs — they offer hyperscaler-alternative infrastructure focused exclusively on AI compute.
Neocloud revenues exceeded $25 billion in 2025, up 223% year-on-year. GPU compute runs significantly cheaper here than on hyperscalers — but the architecture is still centralised. Best fit: AI-native teams that need large GPU clusters for training or high-throughput inference.
Path 3 — Distributed Edge (Akamai and altscaler peers). Geographically distributed, targeting sub-50ms TTFT. Futuriom groups Akamai, Fastly, and Cloudflare under the term “altscaler.” Akamai’s differentiator is scale: 4,300+ locations versus Cloudflare’s 300+. Best fit: real-time, user-facing AI workloads where TTFT is a first-class SLA.
The question here is workload routing, not picking a winner. Most teams will span at least two paths. The long runway that justifies infrastructure investment at this scale — the $7 trillion projection stress-tested against realistic return assumptions — means all three paths will scale simultaneously.
This deal is part of a bigger pattern. AI infrastructure demand is reshaping companies across the entire stack — not just expanding the hyperscaler market.
TeraWulf transitioned from bitcoin mining to AI infrastructure. IREN moved from GPU bare metal to fully managed cloud services. Akamai pivoted from CDN to distributed AI inference. The altscaler category has real substance: Fastly’s AI Accelerator deploys the same semantic caching approach Akamai is building at scale. Cloudflare’s Workers AI supports over 50 models across 200+ cities. The CDN-to-inference transition isn’t a branding exercise. It’s a category move.
The three-path framework above isn’t static either. As edge inference matures and pricing becomes public, the total cost of ownership calculus will shift. Keep an eye on this category — not just the hyperscaler roadmaps.
The broader $725 billion capex story makes Akamai-scale bets rational. The $1.8 billion is a fraction of that — but it represents the piece that was hardest to see coming: distributed inference capacity for frontier model workloads. For a complete overview of all dimensions of this spending cycle, see our full AI infrastructure arms race guide.
Akamai announced a seven-year, $1.8 billion cloud infrastructure commitment — the largest in company history. Multiple credible sources report Anthropic as the counterparty; Akamai’s official announcement says only “a leading U.S.-based frontier model provider.”
Multiple reports — including TIKR and SiliconAngle — point to Anthropic. Akamai has not confirmed this. Treat Anthropic as the reported counterparty: likely, but not officially confirmed.
A neocloud is a GPU-specialised cloud provider — CoreWeave, IREN, Lambda Labs — focused exclusively on AI compute rather than a broad service portfolio. Hyperscalers offer hundreds of services; neoclouds offer GPU compute at significantly lower cost.
Edge inference runs AI model computations at distributed nodes close to end users rather than centralised data centres. It shares CDN’s geographic distribution logic but involves active GPU-accelerated compute rather than passive content caching.
The 26% single-session gain — AKAM’s best since 2003 — reflected a fundamental business model revaluation. The deal validated that Akamai’s distributed edge network is a premium AI infrastructure asset, not a legacy CDN in structural decline.
TTFT measures how quickly an AI model begins returning its response. For real-time, user-facing applications it determines whether an interaction feels instantaneous or delayed. Edge inference targets sub-50ms TTFT by shortening the physical distance between the inference node and the end user.
Real-time, user-facing workloads where TTFT is a first-class SLA: FinTech fraud detection, HealthTech clinical decision support, EdTech adaptive learning, agentic AI. Training, batch processing, and throughput-dominated workloads belong on centralised infrastructure.
Three-tier compute continuum: far-edge (4,400+ locations), metro edge (regional clusters), and core (dedicated GPU clusters). An intelligent orchestrator routes each inference request across tiers based on latency and cost — what Akamai calls “tokenomics” optimisation. Hardware is NVIDIA RTX PRO 6000 Blackwell GPUs with BlueField DPUs.
All three are CDN-origin providers pivoting toward AI inference compute; Futuriom calls them “altscalers.” Akamai’s differentiator is scale: 4,300+ locations versus Cloudflare‘s 300+ and Fastly’s smaller footprint, plus the first global-scale implementation of NVIDIA’s AI Grid.
Amazon’s $25 Billion Anthropic Bet and What Hyperscaler Vertical Integration Actually Means for Enterprise AIAmazon has committed $33 billion in equity to Anthropic — $8 billion up front, then another $25 billion in April 2026. On top of that, there’s a separate agreement where Anthropic will spend up to $100 billion on AWS infrastructure over the next decade. Google turned around almost immediately and announced a parallel $40 billion in cash and compute. At that scale, these are not portfolio bets. The big cloud providers are locking in their preferred AI model vendors at every layer of the stack.
This is part of the AI infrastructure arms race that’s reshaping how enterprises think about technology procurement in 2026. If you’re evaluating your AI vendor strategy, the question that actually matters isn’t whether these deals are good for Anthropic. It’s what they mean for your vendor neutrality — and what you should do about it.
Vertical integration in the traditional sense means a company controlling multiple stages of its own production process — a car maker owning its own steel mill. In the AI-cloud context it works a bit differently.
A cloud provider takes a major equity stake in an AI model lab, then becomes its primary compute supplier, then integrates the lab’s models into its managed platform. That’s three layers of entanglement: financial (equity), commercial (compute pre-commitment), and technical (governance bundling).
The thing that separates this from a simple partnership is the equity stake. Amazon now has a financial incentive to route your enterprise AI workloads toward Claude. Anthropic has a financial incentive to deepen its integration with AWS. Both are pushing toward the same outcome — and your organisation is inside the resulting architecture whether you’ve thought about it or not.
This pattern is now running across Amazon ($33B equity, 100Bcompute), Google(40B cash and compute), and the original template: Microsoft and OpenAI. “AI vendor choice” is now a structural infrastructure choice at exactly the same time.
The two numbers — $33 billion in equity and $100 billion in compute — are different instruments with very different lock-in effects. The equity aligns incentives. The compute commitment is the operative lock-in mechanism.
Under the compute agreement, Anthropic has contractually committed to running its model training and inference on AWS infrastructure, specifically on Amazon’s custom Trainium chips — Trainium 2, 3, and 4, with nearly one gigawatt of capacity expected by the end of 2026. That ties Anthropic’s model architecture and training pipelines to AWS’s custom silicon roadmap for a decade.
Project Rainier makes the depth of this concrete. It’s a joint Amazon-Anthropic compute cluster running over one million Trainium2 chips — one of the largest AI training clusters in the world. This is not standard cloud tenancy. It’s a co-engineered infrastructure project where the integration runs below the API layer to the hardware itself.
The “up to” qualifiers do matter. Neither figure is a guaranteed payment schedule; both are capacity ceilings that depend on Anthropic’s growth trajectory. With $30 billion in run-rate revenue and a $350 billion valuation, this is strategic alignment between two fast-growing parties, not a rescue operation. More than 100,000 organisations are already running Claude via Amazon Bedrock.
The scale of these commitments also draws scrutiny beyond the boardroom. The financing structures the BIS flagged in its March 2026 review include investment commitments of exactly this type — equity stakes and compute pre-commitments that create off-balance-sheet exposure at hyperscaler scale.
Google’s $40 billion commitment — $10 billion initial cash, up to $30 billion more contingent on compute milestones — is structured similarly to Amazon’s. The operative mechanism is again the compute commitment, with Broadcom co-designing custom TPU silicon: 3.5 gigawatts of capacity confirmed for Anthropic beginning in 2027. If you’re on Google Cloud Platform, the governance integration happens through Google Cloud Vertex AI, where Claude runs alongside Gemini with GCP identity management and compliance frameworks inherited automatically.
Amazon and Google both hold major Anthropic equity. Neither has exclusivity over Claude’s distribution. But “available on all three clouds” is not the same as “equally integrated on all three.” Organisations on Azure get Claude via Azure Foundry, but without the hardware-layer entanglement that characterises Bedrock and Vertex AI.
If you’re looking for a compute-layer alternative to the hyperscaler model altogether, it’s worth looking at the neocloud model that providers like CoreWeave represent.
Anthropic frames Claude as “the only frontier AI model available on all three of the world’s largest cloud platforms.” That’s a real differentiator. But it obscures an infrastructure-layer concentration that matters more for enterprise risk assessment.
The Microsoft-OpenAI comparison makes the point. OpenAI’s infrastructure is deeply integrated with Azure — GPT models are not natively available on AWS or GCP. That’s the highest-lock-in reference case: enterprises that standardised on Azure are now locked in at the model, infrastructure, and governance layers simultaneously.
Claude being accessible on all three clouds is structurally different. But when you’re moving away from the AWS or GCP integration layers, the switching costs include all the compliance infrastructure you’ve built around them — not just the model API. Access layer multi-cloud does not equal infrastructure layer neutrality. That’s the distinction worth keeping in mind.
The most important and least-discussed mechanism here is governance integration. When your organisation deploys Claude through AWS Bedrock, workloads automatically inherit your existing AWS IAM policies, data controls, compliance certifications, and regional availability settings. Anthropic calls this “same account, same controls, same billing.”
For regulated industries, that’s genuinely valuable. Bedrock carries SOC, GDPR, HIPAA, and FedRAMP High certifications. If your team has spent years building AWS compliance frameworks, those controls extend to AI workloads automatically — no need to rebuild them for a standalone Anthropic relationship.
But that same convenience is also architectural lock-in. Once you’re building AI agents inside the AWS governance layer — IAM policies, VPC configurations, compliance logging tied to Bedrock AgentCore — the switching cost is not just the model API. Agent definitions, memory, and tool integrations are stored in AWS-native services. There’s no cross-cloud portability.
Three lock-in types activate at once: API dependency, agent framework capture, and ecosystem entanglement (billing, monitoring, audit logging all in your AWS accounts). Custom silicon adds a fourth layer. Gartner projects 40% of enterprise applications will include task-specific agents by 2026. Organisations building those on AgentCore are embedding their architecture into AWS’s stack in ways that compound over time.
The vertical integration dynamic will deepen, not reverse. Accepting some lock-in in exchange for governance simplicity is a rational choice. The question is whether you’re making it deliberately, with architectural guardrails in place, or just letting it happen by default.
Here are four things you can do now:
1. Isolate AI model calls behind an internal abstraction layer. A thin API wrapper means swapping the underlying model requires configuration changes, not code rewrites. It’s the most effective architectural defence against lock-in — and you don’t have to give up any of Bedrock’s governance benefits to do it.
2. Avoid building agent orchestration logic directly into cloud-native frameworks. Bedrock Agents and Vertex AI Agent Builder are where the deepest framework capture occurs. Cloud-neutral tools like open-source LangChain and LlamaIndex offer comparable capabilities with cross-cloud portability. The orchestration layer is the highest-switching-cost component, so it’s where you want the most flexibility.
3. Negotiate data portability before signing enterprise agreements. Insist on data export in open formats — training data, fine-tuned weights, inference logs — and service continuity terms if the vendor fails. This is negotiable now, in the pre-IPO window.
4. Use that window. An Anthropic IPO is viable in 2026. Multi-year agreements negotiated now may lock in pricing stability and capacity guarantees before public market revenue pressure shifts the terms.
The outcome to avoid is the Microsoft-OpenAI model — locked in at the model, infrastructure, and governance layers at the same time. Amazon-Anthropic and Google-Anthropic are building toward an equivalent outcome. Define your agent architecture strategy before production workloads are running on AgentCore or Vertex AI Agent Builder. Enterprises that haven’t defined their agent architecture strategy are already making a lock-in decision — just not a conscious one.
For the ROI implications of this investment scale and whether the returns justify the spend, the enterprise AI payback analysis covers that in detail. For context on $725 billion in hyperscaler capex and the full scope of what these deals are part of, the arms race overview covers the complete picture.
Yes. The investments are compute and equity commitments, not exclusive partnerships. Anthropic maintains model access across AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry as a deliberate differentiator from OpenAI’s near-exclusive Azure relationship. Neither investor holds exclusivity over model distribution.
The $25 billion (part of $33B total) is equity — a financial stake that gives Amazon a return if Anthropic’s valuation rises. The $100 billion compute commitment is a separate agreement where Anthropic commits to spending that amount on AWS over 10 years. The compute commitment is the operative lock-in mechanism; the equity stake aligns incentives.
Project Rainier is a jointly developed compute cluster running on AWS Trainium2 chips — over one million chips, one of the largest AI training clusters in the world. This is not standard cloud tenancy but co-engineered infrastructure: the integration goes below the API layer to the hardware itself.
When Claude is deployed through AWS Bedrock, AI workloads automatically inherit the enterprise’s existing AWS IAM policies, compliance certifications, data residency, and audit logging — no separate vendor negotiation required. For organisations with mature AWS compliance frameworks, this reduces deployment overhead significantly, and deepens dependency on the AWS governance layer.
Frontier AI training consumes compute at a scale no single vendor can guarantee. Relationships with Amazon (Trainium), Google (TPUs), and CoreWeave (Nvidia GPUs) hedge against supply disruption. Supply-side diversification coexists with the deep financial and governance integration on the demand side.
No. The compute commitment is a capacity pre-commitment: Anthropic secures guaranteed infrastructure supply, Amazon secures Anthropic’s training and inference traffic. The equity investment and the compute spend are separate instruments, not a recycling of the same capital.
Microsoft’s OpenAI relationship is near-exclusive: OpenAI’s infrastructure is deeply integrated with Azure, and GPT models are not natively available on AWS or GCP. Claude remains available on all three major clouds — structurally different. But switching costs are highest on AWS Bedrock and Google Cloud Vertex AI, where custom silicon and platform-level governance are both bundled in.
Custom silicon means chips built specifically for AI workloads: Amazon’s Trainium, Google’s TPUs, Microsoft’s Maia. Unlike Nvidia GPUs, custom silicon is optimised for a specific platform’s architecture. Anthropic’s commitment to Trainium 2-4 and Google TPU capacity means its model architecture will be tuned to those chips over time — hardware-layer dependency stacked on top of model-layer and governance-layer lock-in.
Anthropic’s valuation reached $350 billion in April 2026, with run-rate revenue surpassing $30 billion. An IPO is possible in 2026. Multi-year agreements negotiated before then may lock in terms that become less favourable once public market revenue pressure kicks in.
Neither extreme is optimal. Bedrock reduces governance overhead and consolidates billing — rational if you have mature AWS compliance frameworks. A direct Anthropic API relationship preserves optionality and creates a fallback path. The sensible approach: production workloads through Bedrock where governance justifies it, direct API access maintained for others, and abstraction layers built in so switching costs stay manageable.
CoreWeave and the $21 Billion Meta Deal — The Rise of the Neocloud and What It Means for AI ComputeCoreWeave started life as a crypto miner out of a New Jersey garage. By April 2026, it had signed the largest AI cloud deal in history — $21 billion from Meta — plus a separate multi-year deal with Anthropic, both within 48 hours of each other. The neocloud market grew 223% year-over-year in Q4 2025, generating over $25 billion annually and heading toward $400 billion by 2031.
Then, one week after the Meta announcement, CoreWeave’s Q2 2026 guidance came in below Wall Street consensus. The stock fell roughly 10%. A new question entered the room: counterparty risk.
This article is part of the AI infrastructure arms race cluster. The focus here is narrower — what a neocloud actually is, why Meta chose one over its own infrastructure, and how to think through the neocloud versus hyperscaler decision, including the risk signals most coverage ignores.
A neocloud is a purpose-built GPU cloud provider. It rents high-performance GPU compute — almost exclusively — to AI companies. It’s not smaller than a hyperscaler like AWS, Google Cloud, or Azure. It’s narrower. One thing, done well: GPU compute at high density, low latency, and materially lower cost.
The category exists because hyperscalers virtualise their GPU resources. They add a software layer so multiple tenants can share the same hardware. That works fine for plenty of workloads. But for distributed AI training at frontier scale — where thousands of GPUs need to exchange gradient updates with sub-microsecond synchronisation — that virtualisation overhead is the enemy.
Neoclouds give you bare-metal access: your workloads run directly on physical GPU hardware with no hypervisor in the way. CoreWeave runs an InfiniBand networking fabric that connects GPUs at cluster scale, and NVIDIA SHARP to accelerate gradient synchronisation during distributed model training. Hyperscalers don’t offer this on comparable terms.
The pricing difference is real and documented. On-demand H100 pricing as of May 2026: AWS at $6.88 per GPU-hour, Azure at $6.98, Google Cloud at $11.01. Neocloud alternatives: Lambda Labs at $3.99, Crusoe Cloud at $3.90, Nebius at $2.95. Lock in reserved capacity for twelve months or longer and neocloud savings push 40–60% below on-demand hyperscaler rates.
CoreWeave isn’t the only game in town either. It leads the market, but FluidStack, Lambda Labs, and Crusoe make up a solid second tier. Multiple viable providers, multiple price points.
The CoreWeave-Meta agreement, announced April 9, 2026, commits Meta to roughly $21 billion in GPU cloud spend through December 2032. Largest single AI cloud deal in history at the time of signing.
The structure matters here. This is a take-or-pay contract — Meta commits to a minimum level of compute spend whether or not it actually uses the full capacity. If AI training workloads come in lighter than projected in a given quarter, CoreWeave still gets paid.
The hardware covers initial deployments of the NVIDIA Vera Rubin platform. The 2032 end date spans multiple Nvidia hardware generations — Meta is buying capacity continuity, not a snapshot of today’s silicon.
This deal sits inside CoreWeave’s $99.4 billion total revenue backlog — up from $66.8 billion the prior quarter — alongside a $22.4 billion OpenAI commitment and the Anthropic multi-year deal signed less than 24 hours after the Meta announcement.
One important thing to understand: the $21 billion doesn’t replace Meta’s own infrastructure investment. The two approaches serve different workload profiles. Which is exactly the point.
Meta is one of the world’s largest infrastructure operators. It’s also the neocloud sector’s largest single customer. That’s not a contradiction — it’s workload segmentation.
Meta builds and owns data centres for inference: running its LLaMA models in production, where its own hardware economics are compelling. It uses CoreWeave for the GPU-dense burst capacity needed for frontier model training, where owned infrastructure simply can’t ramp fast enough. GPU procurement lead times run six to eighteen months. Neoclouds with pre-purchased inventory can provision training capacity in hours or days. That speed is what you’re actually paying for.
What Q1 2026 earnings revealed about AI capex shows Meta isn’t alone in this. The CoreWeave deal sits alongside separate Meta commitments to Nebius (up to $27 billion for European capacity) and Amazon. European data residency requirements push Meta toward European-region providers even for structurally similar workloads.
Nvidia’s CUDA software ecosystem is the structural moat here — not the hardware itself. Switching to non-CUDA hardware means rewriting kernels, retraining developers, and abandoning NVIDIA tooling. That migration is measured in months. Nvidia holds an equity position in CoreWeave, which compounds into a real supply chain advantage: priority H100 inventory in 2023, first deployment of GB200 NVL72 in 2024, first-shipment Vera Rubin positioning. Hyperscalers can’t replicate that.
This isn’t a binary choice. It’s a decision across workload profile, time horizon, and risk tolerance.
Choose a neocloud when:
Choose a hyperscaler when:
The hybrid model is what Meta, Anthropic, and most AI organisations at scale actually run — neoclouds for GPU-intensive training, hyperscalers for inference and managed services. If your engineering team is above around 50 people, it’s worth evaluating this split explicitly.
One more thing to watch: egress costs. Most pricing comparisons understate them. Hyperscalers charge per-gigabyte egress fees that compound quickly for large training pipelines. Verify neocloud egress terms before you sign any committed capacity deal.
And if your team is building on Anthropic’s Claude — available on AWS Bedrock, Google Cloud Vertex AI, and Azure — or if workloads target AWS Trainium or Google TPUs, the managed integration case for hyperscalers gets considerably stronger.
On May 8, 2026, CoreWeave disclosed Q2 guidance of 2.45–2.6 billion against a consensus of $2.69 billion. The stock dropped roughly 10%.
Every investor article frames this as an investor concern. It’s your concern too.
What the guidance miss actually signals: CoreWeave is raising its capex floor faster than new bookings are arriving. The backlog is real and contracted. But new bookings are slowing. A large backlog and a slowing booking rate tell different stories about the same business, and you need to hold both of those stories in your head at once.
Customer concentration is the structural vulnerability. Microsoft alone represented an estimated 62% of CoreWeave’s 2025 revenue. Even after adding Meta and Anthropic, the top three customers likely account for more than 80% of 2026 revenue. A meaningful demand reduction from any anchor customer could stress CoreWeave’s ability to service its debt.
The question isn’t whether CoreWeave will fail. The question is: if your neocloud provider enters financial stress, what happens to your workloads? If CoreWeave can’t provision contracted capacity, you have a legal claim — but potentially no running infrastructure.
If you’re evaluating a multi-year neocloud commitment, here’s your counterparty risk checklist:
The guidance miss doesn’t invalidate the neocloud model. The signal is that counterparty due diligence is now a real practice, not a theoretical concern. The GPU-collateralised debt risk and the BIS concern about neocloud financing goes into this in detail.
CoreWeave financed its GPU fleet through an $8.5 billion Blackstone-led facility where NVIDIA H100 and H200 hardware serves as collateral — the first GPU-collateralised facility to achieve investment-grade ratings. It matures in March 2032.
Think of it like aircraft-backed financing: the asset depreciates; if the borrower defaults, lenders recover against hardware whose market value has dropped. Nvidia GPU rental rates have already declined 70–90% from their 2023 peak.
If CoreWeave’s revenue trajectory weakens and debt service becomes stressed, Blackstone has first claim on the GPU hardware running your workloads. That’s the structural risk. Understand it before you sign a multi-year take-or-pay commitment.
Neoclouds offer real advantages, and the AI infrastructure arms race shows no sign of slowing. But signing a multi-year neocloud commitment in 2026 means taking a position on your vendor’s debt structure. Informed due diligence — not avoidance — is the right response.
It’s a cloud provider focused almost entirely on renting high-performance GPU hardware for AI workloads. Unlike AWS or Google Cloud, it doesn’t offer databases, identity management, or broad application services — just GPU compute at high density, low latency, and lower cost. Bare-metal access is the primary technical differentiator.
The $21 billion CoreWeave-Meta agreement was the largest AI cloud deal in history at the time of signing. It sits alongside a $22.4 billion OpenAI deal and an undisclosed Anthropic multi-year deal inside CoreWeave’s $99.4 billion contracted backlog.
The stock dropped on weak Q2 2026 guidance (2.45B–2.6B vs. $2.69B consensus), not because of the backlog. Forward booking pace was slowing while capex requirements were rising — two indicators telling different stories about the same business.
No. Lambda Labs (3.99/GPU − hr), CrusoeCloud(3.90/GPU-hr, with green-energy differentiation), and Nebius ($2.95/GPU-hr, strong European footprint) are all active alternatives with meaningfully lower pricing at sub-CoreWeave scale.
You commit to pay for a minimum level of compute whether or not you use it, in exchange for significant pricing discounts. Only sign one when you have at least eighteen months of demand forecasting confidence and have verified the provider’s financial stability — debt service coverage ratio and customer concentration specifically.
Your workloads run directly on physical GPU hardware without a virtualisation layer. Hyperscalers share hardware across multiple tenants using virtualisation, which introduces latency and reduces GPU utilisation efficiency. Bare-metal access is the primary technical reason neoclouds outperform hyperscalers for frontier model training.
All major AI training frameworks — PyTorch, TensorFlow, JAX — optimise for CUDA first. Switching to non-CUDA hardware requires rewriting kernels and retraining developers, measured in months. Both neoclouds and hyperscalers run CUDA-compatible hardware, but CUDA lock-in advantages providers like CoreWeave with preferential Nvidia supply relationships.
Secured lending that uses physical GPU hardware as collateral. CoreWeave’s $8.5 billion Blackstone facility is the first to achieve investment-grade ratings this way. The key risk: GPU useful life is two to four years, but the facility matures March 2032 — collateral value at maturity will be substantially lower than at origination.
Microsoft was roughly 62% of CoreWeave’s 2025 revenue. Even after Meta and Anthropic, the top three customers likely account for over 80% of 2026 revenue. A meaningful pullback by any anchor customer creates service delivery risk for all customers.
Probably not as a primary infrastructure provider. Use a hyperscaler for general infrastructure, with selective neocloud usage for GPU-intensive workloads — model fine-tuning, batch inference at scale — where cost differences are significant and demand is predictable enough to justify reserved capacity.
Revenue backlog (99.4billion)isallcontractedfuturecommitments.Quarterlyrevenue(2.1 billion in Q1 2026) is what was actually invoiced. A large backlog with moderating quarterly growth means new contract signings are decelerating even as existing contracts execute — which is exactly what the Q2 guidance miss reflected.
The BIS flagged GPU-collateralised debt and take-or-pay contracts as systemic risks. CoreWeave’s Blackstone facility and the Meta take-or-pay contract are specific instances. The BIS concern about neocloud financing and the $725 billion AI capex picture provide further context.
What the Q1 2026 Hyperscaler Earnings Actually Say About the $725 Billion AI Infrastructure BetThe $725 billion headline is striking. But the number that actually tells you something is 77% — the year-on-year growth rate in AI infrastructure spending. That one figure tells you whether this is a sustained cycle or a one-year spike. Right now, it says: still accelerating.
The $725 billion is the combined capital expenditure guidance from four hyperscalers — the small group of companies that operate cloud computing at planetary scale: Microsoft, Alphabet, Amazon, and Meta, with Oracle sometimes added to reach the upper bound. Capex is money spent on long-term infrastructure like data centres and servers, not day-to-day operations.
No single earnings release tells the full story. So in this article we’re going to synthesise all four Q1 2026 reports: what the number actually means, why investor reactions to nearly identical announcements diverged sharply, and where the money is actually going — which is very different from what most coverage suggests. This article is part of our comprehensive series on the AI infrastructure arms race, where we explore every dimension of the $725 billion buildout from compute architecture to geopolitical consequence.
The $725 billion is not a single disclosed total. It’s the upper-bound consensus from combining full-year 2026 capex guidance across four hyperscalers, as compiled by the Financial Times from Q1 2026 earnings releases.
Amazon announced approximately $200 billion. Alphabet guided to $180–190 billion. Microsoft raised its figure to roughly $190 billion. Meta set a range of $125–145 billion. Without Oracle, you’re looking at $650–700 billion. The $725 billion figure is the upper bound, Oracle included.
Combined 2025 capex for these four was $381 billion. GMO now estimates hyperscaler capex at approximately 1.6% of US GDP for 2026 — approaching the annual economic output of Sweden. The Goldman Sachs “Tracking Trillions” research series benchmarks this buildout against prior technology infrastructure booms, and that is exactly the right lens for understanding what this cycle means at a macro level.
Because 77% growth from $410 billion in 2025 tells you the cycle is still expanding — not moderating.
Goldman Sachs provides useful context here. Consensus capex estimates were proved too low for two consecutive years: analysts projected roughly 20% growth at the start of both 2024 and 2025, and actual growth exceeded 50% in both years. The forecasters have consistently underestimated the pace.
Goldman’s GDP benchmark is the right anchor. AI capex currently sits at roughly 0.8% of US GDP, versus peak levels of 1.5% or more during prior technology infrastructure booms. Goldman analyst Ryan Hammond noted that AI hyperscaler capex would need to reach around $700 billion in 2026 to match the late 1990s telecom peak — meaning we are only now reaching historical precedent, not exceeding it. The dot-com fibre overbuild gets cited as the cautionary parallel, but on the Goldman GDP lens the cycle has not yet reached those overextension levels. The question is not “when will this stop?” It is “what would cause it to slow?” For the complete AI infrastructure arms race overview, including how compute geography, financing risk, and returns modelling fit into this picture, see our full resource hub.
Here is what each hyperscaler actually disclosed — the synthesis you won’t get from a single source.
Google / Alphabet
Google Cloud delivered $20 billion in quarterly revenue, growing 63% year-over-year — the fastest rate of the four. Alphabet raised full-year capex guidance to $180–190 billion. Google Cloud’s backlog — committed future revenue already under contract — nearly doubled to over $460 billion, more than twice trailing twelve-month cloud revenue.
CEO Sundar Pichai put it plainly: “We are compute constrained in the near term. Our cloud revenue would have been higher if we were able to meet the demand.” Compute constrained means capacity is the limiting factor, not customer appetite. Shares rose 7%.
Amazon / AWS
AWS grew 28% year-over-year — the slowest percentage rate among the four, but the fastest AWS pace in 15 quarters. Free cash flow collapsed from $26 billion in Q1 2025 to $1.2 billion. Amazon is projecting negative free cash flow of up to $28 billion for the full year. CEO Andy Jassy on the ROI question: “We have high confidence this will be monetized well, as we already have customer commitments for a substantial portion of it.”
Microsoft / Azure
Azure grew approximately 40% year-over-year. Microsoft’s AI business is running at a $37 billion annualised revenue rate, up 123% year-over-year. An $18 billion commitment to AI infrastructure in Australia signals that this buildout is not US-centric.
Meta Platforms
Meta raised its 2026 capex range to $125–145 billion. Revenue grew 33% to $56.3 billion, beating estimates. Shares fell approximately 6% after the capex announcement.
Both Google and Meta announced large capex increases in the same earnings cycle. One stock rose 7%. The other fell 6%.
It is not the size of the commitment that matters — it is the narrative connecting spending to revenue.
Alphabet paired its capex with 63% cloud growth, “compute constrained” language, and a $460 billion backlog. That is demand-led spending with a clear story. Meta’s capex came with open-ended guidance, no ceiling, and no cloud revenue story to validate it. Meta does not operate a third-party cloud service — the payback is internal and hard to verify quarter-by-quarter. One asset manager called it a “capital-intensive incinerator.”
Goldman Sachs noted that stock price correlation across AI hyperscalers dropped from 80% to 20% between June 2025 and the Q1 2026 earnings period. Investors are no longer moving in a pack. See the BIS warning about shadow borrowing for the financing risk dimension.
Here is the counterintuitive part: 60–70% of total AI compute demand in 2026 is inference — running existing models to serve users — not training. Most coverage frames the AI buildout as training-driven. It is not.
Training is a one-time or periodic event: weeks on large GPU clusters, then done. Inference is continuous: every user request, every AI feature in production adds to the load.
Token costs have fallen 280-fold in two years, and yet total spending keeps rising. Cheaper tokens drove adoption at scale; more adoption means more inference queries; more queries mean more compute demand — even as each individual query costs less. Volume growth has outpaced cost reduction by a wide margin.
Inference and training also have different hardware requirements, so the $725 billion is not going to identical equipment. And the Nvidia GPU depreciation cycle — roughly 2.5–3 years — is the ROI clock for the entire buildout. Goldman Sachs “Tracking Trillions” identifies silicon useful life as the most influential single variable in cumulative AI capex projections. For more on the returns question, see the $7 trillion by 2030 projection.
The $725 billion buildout is not a neutral backdrop. It creates specific consequences for any organisation using cloud infrastructure, and there are three things you should be factoring into your planning right now.
Capacity availability. When hyperscalers are compute-constrained, enterprise customers queue behind their biggest contracts. That means potential delays, waitlists for specific instance types, and premium pricing during peak demand.
Cost trajectory. As AI features move from prototype to production, inference costs replace training costs as the dominant line item. Token deflation is real — but volume growth is faster. Budget against usage growth, not per-token price.
Vendor pricing incentives. Hyperscalers are financing this buildout with debt — Alphabet has issued a 100-year bond, Amazon is projected at negative free cash flow for 2026, and Morgan Stanley expects total hyperscaler borrowing to exceed $400 billion this year. Companies carrying that debt load have a structural incentive to keep cloud prices high enough to service it. Do not assume prices will fall.
McKinsey and the World Economic Forum project cumulative AI infrastructure investment reaching approximately $7 trillion by 2030. Full sourcing and stress-testing of those assumptions is in the $7 trillion by 2030 projection. The short version: 2026 looks like one phase of a multi-year cycle. For a complete overview of all dimensions of $725 billion in AI infrastructure spending — from financing structures to national compute geography — see our AI infrastructure arms race resource hub. Plan accordingly.
GMO estimates hyperscaler capex at approximately 1.6% of US GDP for 2026 — approaching the 1.5%+ peak of the 1990s telecom boom. The figure rivals Sweden’s annual economic output. Goldman Sachs “Tracking Trillions” is the primary analytical framework for placing this in historical context.
A hyperscaler operates cloud computing infrastructure at planetary scale. The four primary ones are Microsoft, Google/Alphabet, Amazon, and Meta. Oracle is sometimes included to reach the upper-bound $725 billion estimate.
Training is building a model — a one-time or periodic investment on large GPU clusters. Inference is running the model to serve users — continuous and scaling with every user request. In 2026, inference accounts for approximately 60–70% of total AI compute demand. Spending grows with usage, not just model development.
Open-ended capex guidance with no ceiling and no cloud revenue story to validate it. Google announced a similar capex increase on the same day but paired it with 63% cloud growth and “compute constrained” language. Meta lacks a third-party cloud service, making its AI ROI harder to verify quarter-by-quarter.
AWS operates on a far larger base — $37.6 billion in Q1 versus Google Cloud’s $20 billion. Faster percentage growth from a smaller base is standard. Both report capacity-constrained conditions. The divergence reflects base effects and go-to-market differences.
It means cloud revenue was limited by available server capacity, not by customer demand. When Sundar Pichai said Google Cloud was “compute constrained,” he meant revenue would have been higher if Alphabet had built more infrastructure sooner. Paradoxically, it is a positive signal: demand exceeds supply.
Operating cash flow plus debt. Alphabet has issued a 100-year bond. Amazon is projected at negative free cash flow of up to $28 billion for 2026. Morgan Stanley expects total hyperscaler borrowing to exceed $400 billion in 2026.
Not yet, on the Goldman Sachs GDP benchmark. AI capex is approaching — but not yet at — the 1.5% of GDP level that characterised the late 1990s telecom peak. The cycle has not reached historical overextension levels.
Volume growth has outpaced cost reduction. Cheaper tokens drove adoption; more adoption means more inference queries; more queries mean more compute demand even as each individual query costs less.
McKinsey and the World Economic Forum project cumulative AI infrastructure investment reaching approximately $7 trillion by 2030. Full sourcing and analysis are in the $7 trillion by 2030 projection.
Backlog is committed future revenue already under contract — customers have signed agreements to purchase cloud services but the revenue has not yet been recognised. A large and growing backlog is a forward indicator of demand-led growth. Google Cloud’s backlog nearly doubled to over $460 billion in Q1 2026.
Nvidia GPUs are the dominant compute hardware inside AI data centres. Their useful life — roughly 2.5–3 years given the pace of hardware advancement — is the ROI clock for the entire buildout. Goldman Sachs “Tracking Trillions” identifies AI silicon useful life as the single most influential variable in cumulative AI capex projections.
The 50 Percent Success Rate — Why Current Defences Are Not EnoughAccording to the International AI Safety Report 2026 (UK DSIT), cited by Vectra AI, a persistent attacker attempting prompt injection against systems with safeguards in place still succeeds 50% of the time. Not against unprotected systems. Against systems where some defences are already active. The word is “some.” For the full picture, see the full industrial-scale prompt injection threat in production 2026.
The 50% figure comes from International AI Safety Report 2026 (UK DSIT), Figure 14. Vectra AI cites it in the context of production GenAI deployments with safeguards active — it’s Vectra AI’s reported figure citing the DSIT report, not independently verified primary data from Vectra AI’s own research. The DSIT policymakers summary does independently confirm that “although developers have made it harder to bypass safeguards, attackers still succeed at a moderately high rate.”
The “over multiple attempts” qualifier is the most important part of that statistic. This isn’t a per-attempt coin flip. It’s a cumulative persistence rate — attackers iterate payloads until one gets through. A layered defence stack can reduce attack success from 73.2% to under 10%. But for organisations running partial deployment — input filter active, output filtering absent, no retrieval isolation — the 50% rate is a conservative baseline. Partial deployment leaves that gap wide open.
A single statistic from one source, with methodological caveats, is a weak foundation for anything. Independent corroboration is what changes the picture.
Siemba’s ROAR 4 report (Q1 2026, practical red team testing) found 1 in 3 AI-integrated applications has directly exploitable LLM vulnerabilities. “Directly exploitable” means testers extracted system prompts without advanced techniques. Their conclusion: “System prompts are not security controls. Security lives in scoped permissions and validated outputs.”
The HouYi Framework academic research adds a third data point: 86% of 36 real-world LLM applications tested were vulnerable. Cisco’s State of AI Security 2026 adds a fourth: prompt injection weaknesses in 73% of audited production AI deployments.
Government safety report. Commercial red team. Academic framework testing. Independent security audit. Different methods, same conclusion. The OWASP classification of what still gets through current defences explains why the same pattern keeps showing up across all of them.
Single-layer defences fail because the attack surface is bigger than any single control point can cover.
Digital Applied’s security analysis — from approximately 200 production audits — identifies ten distinct attack classes by delivery vector: direct user input, indirect via content, tool outputs, memory, RAG sources, collaborative agents, document attachments, email bodies, API responses, and shared user sessions. Input filtering addresses class 1. Nine of those ten classes arrive through trusted channels an input filter never even sees.
In 2026, indirect prompt injection accounts for over 55% of observed attacks, with 20–30% higher success rates than direct injection. The structural root cause: LLMs process system instructions and user inputs in the same text format with no enforced execution boundary. AWS explicitly acknowledges this for Bedrock Guardrails: “There is no single control that can remediate indirect prompt injections.”
In multi-tenant SaaS deployments, shared inference infrastructure creates cross-tenant attack pathways that simply don’t exist in single-tenant environments. A Cybersecurity Journal analysis found 12 of 18 LLM vulnerabilities are amplified in multi-tenant versus single-tenant deployments. An attacker working at a 50% bypass rate against a shared platform doesn’t need to target any single tenant — they can automate attempts across the whole system.
Gravitee’s 2026 survey found 80.9% of organisations have agents in active testing or live production, yet only 14.4% have full security approval. Tenant isolation controls (Secure Multi-Tenant Architecture / burn-after-use session patterns) achieve a 92% defence success rate against cross-tenant leakage — at a 15–30% throughput cost that many operators keep deferring. For context on how these attack patterns have scaled, see how injection attacks have industrialised in 2026.
Defence-in-depth is a documented production architecture, not a theoretical concept. The Digital Applied four-layer framework looks like this:
Each layer addresses a different attack surface. Commercial tools like Lakera and open-source tools like LLM Guard implement specific layers within this framework. They’re not substitutes for the framework itself.
For budget-constrained organisations, the minimum viable baseline is three layers: input validation (direct injection), output filtering (exfiltration), and retrieval isolation (indirect injection via RAG). Open-source tools — Garak, PyRIT, Promptfoo, LLM Guard — reduce cost per layer. Agentic systems need a behavioural monitoring layer on top of that.
The real question for organisations that have already bought AI security products: does “we have something” mean “we have coverage”?
IBM Institute for Business Value found only 24% of ongoing GenAI projects consider security, despite 82% saying secure AI is crucial. The gap between procurement threshold (a product is deployed) and architecture threshold (deployed controls actually cover the full attack surface) is precisely where the 50% bypass rate lives.
OWASP LLM01:2025, MITRE ATLAS AML.T0051, and NIST AI 600-1 classify prompt injection at the highest severity level across every major security taxonomy. MITRE ATLAS distinguishes direct injection (AML.T0051.000) and indirect injection (AML.T0051.001) as requiring different controls — a single guardrail addressing only one vector leaves the other unaddressed by design.
Addressing the gap means mapping deployed controls against the four attack channels and working out which ones have no coverage. Red team tools — Garak, PyRIT, Promptfoo — provide empirical measurement where gap analysis falls short. For the specific product gaps the 50% success rate exposes, see the specific product gaps the 50% success rate exposes. For a complete view of how these attack patterns have industrialised and what organisations are doing about them across the full stack, see the full industrial-scale prompt injection threat in production 2026.
Yes. The International AI Safety Report 2026 (UK DSIT) confirms “attackers still succeed at a moderately high rate” even with safeguards deployed. The “over multiple attempts” qualifier means this is a cumulative persistence rate — an attacker iterating payloads needs only one success.
Vectra AI cites DSIT’s International AI Safety Report 2026 Figure 14 — prompt injection attack success rates by model release date. “Over multiple attempts” means cumulative attacker persistence. Treat it as a reported figure, not independently verified primary data from Vectra AI’s own research.
A guardrail is a single vendor product layer at specific input and output points. AWS explicitly states no single control can remediate indirect prompt injections. Defence-in-depth applies the Digital Applied four-layer framework across multiple independent controls at different system points. One product is one layer of a four-layer minimum.
Indirect prompt injection embeds malicious instructions in external content the model retrieves — documents, emails, tool outputs, database records. Input filters monitor the user input channel (attack class 1 of 10). Indirect injection arrives via the other nine classes, all through trusted channels the filter doesn’t inspect.
That depends on the defence architecture deployed, not the model. The 50% bypass rate, Siemba’s 1-in-3 directly exploitable finding, and HouYi’s 86% vulnerability rate collectively indicate most production AI deployments have material exposure. The question is whether the deployment has coverage at input, retrieval, output, and (for agentic systems) behavioural layers.
In multi-tenant deployments, successful injection in one tenant context can affect others via shared infrastructure. Tenant isolation controls achieve a 92% defence success rate against cross-tenant leakage, at a 15–30% throughput cost many operators defer.
Three layers: input validation (direct injection), output filtering (exfiltration), and retrieval isolation (indirect injection via RAG). Open-source tools — Garak, PyRIT, Promptfoo, LLM Guard — reduce cost per layer. Agentic systems require an additional behavioural monitoring layer.
Gravitee’s 2026 survey found 80.9% of organisations have agents in production, yet only 14.4% have full security approval. IBM found only 24% of ongoing GenAI projects consider security despite 82% saying it’s crucial. The gap between “we have a filter” and “we have a defence strategy” is where the 50% bypass rate lives.
The DSIT report notes developers have made safeguards harder to bypass — but attackers still succeed at a moderately high rate. Model selection is not a substitute for architectural controls.
Prompt injection is ranked LLM01:2025 — the top LLM application risk. Unlike SQL injection, “no equivalent guaranteed defence exists.” MITRE ATLAS distinguishes direct (AML.T0051.000) and indirect (AML.T0051.001) injection as requiring different controls. NIST AI 600-1 corroborates the top-severity classification.
Map deployed controls against the Digital Applied four attack channels: direct input, indirect retrieval, output exfiltration, and (for agentic deployments) autonomous behaviour. Identify which channels have no control. “Adequate” means coverage at multiple independent layers, not presence of a single product.
Enterprise Defence Ships — Microsoft Entra, Google Workspace, and What They Actually BlockUntil early 2026, if you were a security procurement team asking “what enterprise product defends against prompt injection?”, you had no meaningful answer. Then three months changed everything: Google Workspace mitigation shipped April 2, Unit 42 Frontier AI Defence launched April 21, and Microsoft Entra prompt injection protection followed April 27 — three enterprise platforms in a seven-week window. This article covers what each platform actually does, where each one stops, and what remains unprotected without additional controls. For broader threat context, see the 2026 production injection threat landscape these products respond to.
Before 2026, prompt injection was treated as a developer’s problem. You were expected to harden your own prompts, not buy a product for it. The OWASP LLM Top 10 had ranked prompt injection (LLM01) at number one for two consecutive editions, but vendor timelines lagged attacker timelines by years.
What changed was the agent deployment wave. When AI systems gained the ability to take irreversible actions — sending email, deleting records, executing code — injection stopped being a data leakage nuisance and became an infrastructure risk. Unit 42 documented real-world cases of attackers entering environments and immediately querying internal LLMs for reconnaissance data.
The governance context makes it worse: 80% of Fortune 500 organisations now deploy active AI agents, but only 10% have a clear strategy to govern them. Products arrived into that governance vacuum, which shapes what they can realistically enforce.
Microsoft shipped two distinct prompt injection tools that people routinely mix up.
Microsoft Entra prompt injection protection works at the network and identity layers via the Entra AI Gateway. It enforces universal network-level policies — blocking adversarial prompts and jailbreak attempts before they reach AI models, across any device or browser. It ships with built-in detection profiles for all major LLM providers.
Microsoft Prompt Shields is a separate product — Azure AI Content Safety — that scans inputs and outputs for injection payloads. It’s an API-based content safety service, closer to an inline filter than an identity control. Both products benefit from spotlighting, Microsoft’s technique for marking trusted system instructions as distinct from untrusted retrieved content before the model sees them — more on that below.
Entra Agent ID is a third distinct layer: it gives AI agents a cryptographic identity in the Entra directory, so the same authorisation policies you apply to human users apply to agents too. It’s one of four hyperscaler agent identity frameworks now on the market, alongside AAuth (Google), Google Agent Identity, and AWS AgentCore.
Google Workspace is best understood as a continuous mitigation effort rather than a single product launch. Google’s April 2026 publication described a layered defence for Gemini in Workspace: content classifiers that filter documents and emails containing malicious instructions; security thought reinforcement reminding the LLM to disregard adversarial content; and markdown sanitisation plus suspicious URL redaction to prevent data exfiltration. The GeminiJack attack — hidden commands in Google Docs or Calendar invites causing Gemini’s RAG pipeline to exfiltrate data via image tags — is exactly what this mitigation is designed to stop.
One editorial note: the precise coverage scope of the April 2 mitigation is not fully documented in available primary sources. If you’re relying on Google Workspace as a primary control, get direct vendor documentation before you make your procurement decision.
Unit 42 Frontier AI Defence launched April 21, converting Palo Alto’s in-the-wild IPI research directly into a commercial product. It monitors agent traffic and tool calls for injection signatures at the platform and network layer. Two partnerships provide model-level telemetry: Anthropic’s Project Glasswing and OpenAI’s Trusted Access for Cyber.
The enterprise platform launches address injection at the platform, network, and identity layers. There’s a separate detection layer that operates at the application level — deployed in the code that calls the LLM rather than as a standalone enterprise product.
Lakera Guard (acquired by Check Point in 2025) is the commercial benchmark: 98%+ accuracy, sub-50ms latency, 100+ language coverage, API-based. LLM Guard (open-source, from Protect AI) provides 15 input scanners including a dedicated PromptInjection scanner — self-hosted, so the cost is infrastructure rather than licence fees.
NVIDIA NeMo Guardrails takes a different approach entirely. Rather than detecting injection payloads, it implements safety policies in Colang — a domain-specific language that specifies what the model should and should not do. That is runtime policy enforcement, not payload detection. Lakera and LLM Guard detect and block injection attempts; NeMo Guardrails constrains what the model can be instructed to do regardless of injection. They’re complementary layers, not substitutes.
Three pre-deployment testing tools also come up in procurement conversations: Garak (NVIDIA; 37+ probe modules), Promptfoo (157 plugins mapped to OWASP LLM Top 10), and PyRIT (Microsoft; crescendo attacks; Tree of Attacks with Pruning). These find vulnerabilities before deployment. They are not runtime protection.
Spotlighting addresses the core problem that makes prompt injection possible: LLMs process developer instructions and retrieved content in the same context window without any cryptographic separation.
The mechanism marks trusted content (system instructions) and untrusted content (retrieved documents, user input) with explicit delimiters, so the model can apply different levels of trust to different parts of its context. Instruction hierarchy extends this structurally: system instructions are positioned above retrieved content at context assembly time, reducing the weight adversarial instructions can carry.
Both are patterns you can implement without buying anything — one of the few concrete, cost-free mitigation layers available regardless of which enterprise product you choose.
The limitations are real, though. Spotlighting is not cryptographically enforced. Sophisticated payloads, multi-turn crescendo attacks, and tool output injections can all bypass it. Researchers demonstrate bypasses within weeks of new guardrails being deployed — the asymmetry favours attackers, who need only find one technique that works.
No Q1–Q2 2026 enterprise product — individually or combined — provides complete prompt injection defence. The gaps between layers remain exploitable. Here’s where you’re still exposed.
Tool output injection is the leading gap. All four enterprise platforms focus primarily on document and user-input injection. When an agent calls an external API and the response contains adversarial instructions, that output is processed in a more trusted context than direct user input. Digital Applied‘s 2026 taxonomy identifies tool output injection as the fastest-growing attack class — 9 of 10 attack classes arrive through trusted channels, not direct input.
Human-in-the-loop (HITL) controls are absent from every enterprise product in this cohort. Requiring human approval before an agent takes an irreversible action is the most reliable defence against injection-driven agent misuse. It’s a process control, not a product feature, and you have to build it at the application level.
Multi-turn crescendo attacks evade per-message detection. When each individual message is benign but injection emerges from accumulated context, products evaluating messages in isolation provide limited protection.
The agent governance gap is structural. Products can only enforce policies that exist. Where no policy defines what an agent is authorised to do, there is nothing to enforce.
The upshot: map products to specific layers within a known architecture and don’t rely on any single product as a complete solution. For what happens when injection succeeds against an agent with excessive permissions, see what unblocked injection enables — from data leakage to remote code execution. For the quantified residual exposure even defended systems carry, see why even defended systems retain significant exposure. This series is part of our coverage of the 2026 production injection threat landscape these products respond to.
Is Microsoft Prompt Shields the same as Entra prompt injection protection? No. Prompt Shields is Azure AI Content Safety — input/output scanning for injection payloads. Entra prompt injection protection is a network-level policy layer blocking prompts before they reach AI models. Entra Agent ID is a third distinct layer governing agent identity. All three operate at different layers.
What did Unit 42 Frontier AI Defence launch with in April 2026? AI-specific attack surface assessment and detection of IPI techniques from Palo Alto’s in-the-wild research, with model-level telemetry from Anthropic (Project Glasswing) and OpenAI (Trusted Access for Cyber) partnerships. It operates at the platform/network inspection layer, not inside models.
What does Google Workspace’s April 2026 mitigation actually protect? Content classifier filtering for malicious instructions in emails and documents; security thought reinforcement; markdown sanitisation; suspicious URL redaction. The precise coverage boundary is not fully documented publicly — get direct vendor documentation before making procurement decisions.
What is Lakera Guard and how does it compare to LLM Guard? Lakera Guard (Check Point, 2025 acquisition): commercial API, 98%+ accuracy, sub-50ms latency, 100+ languages. LLM Guard (Protect AI): open-source, 15 input scanners, self-hosted. Your choice depends on whether you want a commercial SLA or self-hosted cost control.
Does spotlighting actually prevent prompt injection? Spotlighting reduces injection success rates by marking trusted vs. untrusted content for the model. It is probabilistic, not cryptographically enforced — researchers demonstrate bypasses regularly. Multi-turn attacks and tool output injections can get around it.
What is the agent governance gap? 80% of Fortune 500 deploy AI agents; only 10% have a clear governance strategy. Defence products enforce policies — where no policy exists, nothing can be enforced. An agent with excessive permissions and no documented scope gives you no basis for detecting violations.
Why isn’t HITL covered by current enterprise products? Human-in-the-loop is a process control requiring human approval before irreversible agent actions. None of the Q1–Q2 2026 enterprise products includes built-in HITL gates — it requires architectural decisions at the application level.
How does Entra Agent ID fit into enterprise AI security architecture? It gives AI agents a cryptographic first-class identity in the Entra directory, subject to the same authorisation policies as human users. It’s one of four hyperscaler agent identity frameworks alongside AAuth (Google), Google Agent Identity, and AWS AgentCore.
What is NeMo Guardrails and how is it different from Lakera? NeMo Guardrails defines behavioural boundaries via Colang policy — what the model is permitted to do. Lakera detects injection attempts. They operate at different layers and can be combined.
What pre-deployment testing tools cover prompt injection? Garak (NVIDIA; 37+ probe modules), Promptfoo (157 plugins; OWASP LLM Top 10 mapped), and PyRIT (Microsoft; crescendo and Tree of Attacks with Pruning). These are testing tools — not runtime protection.
What is a defence-in-depth architecture for an LLM application? Multiple independent control layers: input sanitisation, tool restriction (least-privilege), output validation (PII detection, destination allowlisting), and human review (HITL for irreversible actions). No single enterprise product covers all four layers.
Supply Chain Vector — How Developer Tooling Became an Injection Delivery SystemPrompt injection is no longer a chatbot problem. In Q1 2026, two incidents confirmed that injection vectors have moved upstream into the tools engineers use to build software — the IDE plugins, AI coding agents, and open-source proxy libraries most teams added to increase velocity, not threat surface.
The OpenClaw/Cline GitHub issues attack (Clinejection) showed that a single maliciously crafted GitHub issue title could kick off a full supply chain compromise. The LiteLLM/Mercor breach showed that a trusted AI infrastructure library with 3.4 million daily downloads could be weaponised in forty minutes, hitting over a thousand SaaS environments before PyPI could quarantine the package.
Together, these incidents mark the developer tooling layer as a structural attack surface for prompt injection. If you want the full scope of the 2026 injection attack surface, the picture is broader still.
This article covers how both attacks worked, how they map to OWASP’s formal classification, and the single preventive control that would have stopped the LiteLLM breach.
The defining pattern of 2026 supply chain injection is that the attack surface has moved into the developer tooling stack.
These tools ingest external content — repository issues, package dependencies, cached build artefacts — automatically. When an AI coding assistant reads GitHub issues for context, every issue is a potential instruction surface. The attack surface spans three categories most engineering teams adopted without treating as security perimeter components: IDE extensions such as Cline, AI proxy libraries such as LiteLLM, and the CI/CD automation connecting them.
The Clinejection attack chain was named by security researcher Adnan Khan — private GHSA submitted 1 January 2026, public disclosure 9 February after no vendor response, patched within one hour. Advisory: GHSA-9ppg-jx86-fqw7.
The entry point was Cline’s automated issue-triage workflow, built on anthropics/claude-code-action@v1 with allowed_non_write_users: "*" — any GitHub user could trigger Claude with Bash, Write, and Edit permissions just by submitting a public issue.
Here’s what the attack chain looked like:
publish-nightly.yml) consumed the poisoned cache, exposing VSCE_PAT, OVSX_PAT, and NPM_RELEASE_TOKEN.[email protected], whose postinstall script silently installed OpenClaw on an estimated 4,000+ developer machines in an approximately 8-hour window.OpenClaw — formerly Clawdbot/Moltbot — is not inherently malicious, but its architecture makes it a high-value payload: a persistent background daemon with broad system permissions and CVE-2026-25253 (operator privilege escalation without authentication). StepSecurity detected the anomaly within 14 minutes via the absence of OIDC provenance attestations.
The LiteLLM/Mercor breach took a different path but followed the same structural pattern: a trusted AI infrastructure component became the initial access point.
TeamPCP exploited a pull_request_target vulnerability in Trivy — a vulnerability scanner sitting in LiteLLM’s CI/CD pipeline — to steal maintainer credentials. On 24 March 2026, malicious LiteLLM versions 1.82.7 and 1.82.8 hit PyPI via the stolen PYPI_PUBLISH token. PyPI quarantined the package roughly 40 minutes later, though cache propagation extended effective exposure to three hours in some environments. LiteLLM is present in an estimated 36% of all cloud environments (Wiz Research). Mandiant at RSAC 2026 put cascading effects across 1,000+ SaaS environments.
The breach (CVE-2026-33634) cascaded to Mercor — an AI training data startup sourcing proprietary data for OpenAI, Anthropic, and Meta — via TeamPCP’s collaboration with [Lapsus](https : //en.wikipedia.org/wiki/Lapsus), which claimed 4 TB of stolen data. Meta paused all Mercor contracts on 3 April 2026.
LiteLLM held SOC 2 and ISO 27001 certifications from Delve Technologies. Neither detected the unpinned CI/CD dependency — compliance certifications don’t require cryptographic dependency pinning, and the breach made that gap visible at scale.
The same dynamic extends to the Model Context Protocol (MCP), which introduces two distinct supply chain threat classes.
Class D1 — Malicious MCP Servers: an adversary-controlled server injects adversarial instructions via tool outputs — analogous to a poisoned npm package, but targeted at agent tool-call chains rather than build pipelines.
Class D2 — Rug Pull Mutations: a post-installation attack where an MCP server’s behaviour changes server-side after trust is established — tool definitions silently redefined without triggering re-authorisation. Catalogued in the OWASP MCP Security Cheat Sheet.
D1 is detectable at install time through provenance verification. D2 requires runtime monitoring — a control most current MCP implementations simply don’t have. Flowise CVE-2025-59528 illustrates the intersection: an attacker-controlled MCP server serving as both injection delivery and RCE vector (Flowise CVE-2025-59528 analysis).
OWASP LLM03:2025 (Supply Chain Vulnerabilities) covers vulnerabilities from dependencies, plugins, pre-trained models, and deployment infrastructure for LLMs. Both incidents map to it: both attacks entered through trusted supply chain components, not direct model interfaces.
LLM03 is distinct from LLM01:2025 (Prompt Injection): LLM01 covers attacks on the model’s input interface; LLM03 covers attacks on the infrastructure that reaches it. Clinejection sits across both — indirect injection delivered through the developer tooling layer. Where the OWASP LLM03 classification fits within the broader LLM taxonomy is covered in the companion OWASP LLM01 article.
Formal CVE assignments confirm institutional recognition: CVE-2026-33634 (LiteLLM), CVE-2026-25253 (OpenClaw), GHSA-9ppg-jx86-fqw7 (Clinejection). Tracked at the same level as traditional software vulnerabilities.
Dependency pinning — locking packages to specific, cryptographically-hashed versions via lockfiles — would have stopped the LiteLLM/Mercor breach outright. Organisations with LiteLLM pinned got an automatic hash mismatch rejection on 1.82.7 and 1.82.8 before any code executed. It’s a CI/CD configuration change any team can make without a security specialist.
It doesn’t cover everything, though. Clinejection exploited CI/CD configuration and AI agent permissions, not a dependency update path. MCP rug pull mutations are invisible to it. Direct credential compromise before pinning is established remains an open exposure.
Complementary controls fill the gaps: OIDC/Trusted Publishing catches anomalous publication events; SBOM and AIBOM provide inventory visibility; AI agent permission scoping limits blast radius. Any unpinned AI infrastructure dependency is an open exposure in the same class as the one that hit Mercor.
For how injection expanded beyond chatbot interfaces into the full scope of the 2026 injection attack surface, the series hub covers the complete landscape. For enterprise defence product mitigations, see the IDE-level and supply chain mitigations available in enterprise defence products.
What is Clinejection and who discovered it?
Clinejection is security researcher Adnan Khan’s name for the attack targeting Cline’s AI issue-triage workflow — indirect prompt injection via a GitHub issue title, chained through cache poisoning to steal publication credentials and ship OpenClaw. Private GHSA submitted 1 January 2026, public disclosure 9 February after no vendor response, patched within one hour. Advisory: GHSA-9ppg-jx86-fqw7.
What is indirect prompt injection and how is it different from direct prompt injection?
Indirect prompt injection is when attacker-controlled content in an external source — a GitHub issue, a document, a web page — is ingested by an AI agent and executed as instructions, without the attacker touching the AI directly. Direct injection means the attacker crafted the input themselves. Clinejection is the clearest documented example of indirect injection chained into a full supply chain compromise.
How did GitHub Actions cache poisoning work in the Clinejection attack?
Injected instructions directed Claude to deploy Cacheract, flooding the GitHub Actions cache past the 10 GB LRU eviction limit and replacing legitimate entries with poisoned content, consumed by the higher-privilege release workflow to gain access to publication secrets.
Why was OpenClaw the payload, and is it inherently malicious software?
OpenClaw installs a persistent daemon with broad system permissions. Not inherently malicious, but CVE-2026-25253 (CVSS 8.8) lets an attacker gain full operator-level access via a crafted WebSocket handshake. It survives reboots and persists after Cline is removed.
What is CVE-2026-33634 and which versions of LiteLLM are affected?
CVE-2026-33634 covers LiteLLM versions 1.82.7 and 1.82.8 (published 24 March 2026 via a compromised Trivy GitHub Action). Organisations with lockfile pinning were protected regardless of specified versions.
Who is TeamPCP and what is their relationship to Lapsus$?
TeamPCP (PCPcat/ShellForce/DeadCatx3) ran the Trivy → LiteLLM → Mercor chain, collaborating with Lapsus$, which claimed the 4 TB Mercor theft. Mandiant cited cascading effects across 1,000+ SaaS environments at RSAC 2026.
What data was stolen in the Mercor breach?
Approximately 4 TB: AI training source code, contractor PII, and video interviews. Mercor sourced data for OpenAI, Anthropic, and Meta. Meta paused all Mercor contracts on 3 April 2026.
What is the difference between a malicious MCP server and an MCP rug pull mutation?
A malicious MCP server (Class D1) injects adversarial instructions from the point of first installation. An MCP rug pull mutation (Class D2) changes a legitimate server’s behaviour server-side after trust is established — tool definitions silently redefined without re-authorisation. D1 is detectable at install time; D2 requires runtime monitoring.
Does a SOC 2 or ISO 27001 certification protect against supply chain injection?
No. LiteLLM held both; neither detected the unpinned CI/CD dependency. Treat compliance as a baseline, not supply chain security assurance.
How quickly can a malicious npm or PyPI package reach developer machines?
[email protected] hit an estimated 4,000+ developers in roughly 8 hours. Malicious LiteLLM versions were live for about 40 minutes before quarantine. With auto-update behaviour and no cooldown checks, propagation outpaces human incident response.
What is OWASP LLM03:2025 and how does it relate to these incidents?
OWASP LLM03:2025 (Supply Chain Vulnerabilities) covers vulnerabilities from dependencies, plugins, and deployment infrastructure in LLM systems. Both incidents map to it — entered through trusted supply chain components, not direct model interfaces. It gives you the vocabulary to brief boards and compliance teams.
What is an SBOM or AIBOM and why is it relevant after these breaches?
An SBOM is a machine-readable inventory of all software components and dependencies. An AIBOM extends this to AI-specific components — models, datasets, proxies, agent frameworks. CycloneDX and SPDX 3.0 have been extended post-breach to cover AI infrastructure, giving you the dependency visibility to spot exposure before a breach notification arrives.
From Prompt to Shell — How Injection Escalates to Remote Code ExecutionFor the first three years of the generative AI era, prompt injection sat in the “data leakage” column of most risk registers. Attackers could manipulate outputs, extract information the model had been told to withhold, cause some embarrassing behaviour. Real damage, but bounded. The attacker could not own the system.
CVE-2026-26030 changed that. The vulnerability — a flaw in Microsoft’s Semantic Kernel Python SDK — documents the full path from an injected text string to remote code execution on the host running the agent. Flowise CVE-2025-59528 followed with the same outcome via a different mechanism. The risk category has shifted: prompt injection isn’t a content problem anymore, it’s a system compromise pathway. This article is part of our series covering how prompt injection moved from research to production in 2026.
This article looks at how the escalation works, why multi-tenant SaaS makes every stage worse, and why the instinct to “sanitise the input” doesn’t apply here.
Editorial note: Primary Microsoft source documentation for CVE-2026-26030 is available via the Microsoft Security Blog and GitHub Advisory GHSA-xjw9-4gw8-4rqx. The mechanism described here draws on those sources and corroborating secondary coverage.
CVE-2026-26030 reclassified prompt injection from an output integrity problem to an execution problem. The vulnerability affects Microsoft Semantic Kernel — 27,000+ GitHub stars, used widely across enterprise deployments. CVSS 9.9, Critical. This is not a marginal edge case.
The structural finding matters more than any single CVE: the LLM is not a security boundary. It parses language into tool parameters exactly as designed. In the vulnerable path, a model-generated filter expression gets interpolated into a Python lambda expression. The model generates the filter value; the filter value runs as code.
OWASP LLM01 (Prompt Injection) and LLM06 (Excessive Agency) are the two halves of the problem. LLM01 is the entry point: attacker-controlled text reaches the model’s context. LLM06 is the amplifier: the agent has permissions that turn a manipulated string into a meaningful execution event. Together, they produce RCE. OWASP has ranked prompt injection the number one LLM risk for three consecutive years — not because the vulnerability is new, but because agents now have the tool access to make it consequential.
In Microsoft’s own proof-of-concept — a hotel finder agent — a single adversarial hotel listing was enough. The agent retrieved the listing during normal operation. It contained a crafted filter value exploiting Python’s class hierarchy traversal to reach the built-in import mechanism and execute arbitrary code. Result: calc.exe launched on the host device. No browser exploit, no malicious attachment, no memory corruption. The agent did exactly what it was designed to do.
A companion vulnerability, CVE-2026-25592, demonstrates a second path. In the .NET SDK’s SessionsPythonPlugin, the DownloadFileAsync method was accidentally decorated as a callable tool — officially advertising it to the model. Inject a prompt that creates a payload inside an Azure Container Apps sandbox, invoke the download helper, write to the Windows Startup folder, achieve host compromise on next sign-in. The container boundary existed. The problem was that a trusted bridge across it had been handed to the model.
Both paths share the same structural logic: trust in model-generated parameters propagates to execution without validation.
This isn’t a Semantic Kernel quirk. In February 2026, researcher johnstawinski showed that Anthropic’s Claude Code Action had an equivalent path: a malicious pull request title enters Claude’s prompt, the injection overwrites the bun executable with an attacker payload. Every repository using it in default configuration was exposed. As the researcher put it: “If you give an LLM access to a knife, then anyone who influences that LLM controls the knife.”
Flowise is an open-source LLM application builder. CVE-2025-59528 documents RCE via CustomMCP configuration — different mechanism from CVE-2026-26030, same structural logic. Flowise had an allowlist protecting certain commands: python, npm, npx. OX Security researchers bypassed it by injecting commands through allowed commands’ arguments. At discovery, 12,000–15,000 Flowise instances were exposed online; active exploitation was reported April 7, 2026.
The Model Context Protocol (MCP), originally developed by Anthropic, is the standard for connecting AI agents to external tools and services. Every MCP connection is a trust boundary. In April 2026, Ox Security disclosed a STDIO command-injection flaw across MCP SDKs affecting 7,000+ servers and 150 million package downloads. The same month, CVEs landed across LiteLLM, Agent Zero, Windsurf, DocsGPT, Upsonic, and Flowise — all the same pattern: unsanitised input reaching execution sinks.
Two separate codebases, two separate mechanisms, same injection-to-RCE path. This is a consequence of how agent frameworks get built, not one vendor’s mistake.
SQL injection was solved at the architectural level: parameterised queries enforce a hard boundary between code and data at the protocol level. The database engine never parses attacker-controlled text as SQL syntax. Structural, deterministic. That same fix is not available for LLMs.
SQL injection is syntactic — the fix enforces structural separation at the protocol level. Prompt injection is semantic. The model must process instructions and data in the same natural language format to function. There is no protocol-level separation to enforce. Block “ignore all previous instructions” and the attacker uses a different phrasing, Unicode characters, or a role-play scenario. Natural language has infinite variation. You can’t maintain a blocklist.
Microsoft’s “Spotlighting” technique marks untrusted content with structural delimiters — the closest analogy to parameterisation. Researchers have demonstrated up to 100% evasion success against Azure Prompt Shield and Meta’s Prompt Guard. Instruction hierarchy research from OpenAI, Anthropic, and Google improves resistance similarly. Neither provides a structural guarantee.
The implication: “sanitise the input” is a category error. Input filtering addresses what the agent reads. The real defence addresses what the agent can execute. OWASP LLM01 has stayed top-ranked for three years precisely because the fix is architectural — minimal tool exposure, capability gating, human-in-the-loop for irreversible actions — not textual. The documented attack patterns that precede escalation in production environments are in the Unit 42 analysis.
In a single-tenant deployment, a successful injection compromises one customer’s agent. In multi-tenant SaaS, the blast radius is qualitatively different. Research found 12 of 18 LLM vulnerabilities are amplified in multi-tenant versus single-tenant deployments — cross-tenant data exfiltration and knowledge base poisoning show the highest amplification.
RAG poisoning makes this concrete: PoisonedRAG demonstrates a 97% attack success rate with only 5 malicious documents in a million-document shared knowledge base. A single bad actor tenant can corrupt responses served to every other tenant on the platform. Shared RAG indices are incompatible with multi-tenant security at this threat level.
PROMPTPEEK is a KV-cache timing side-channel attack that exploits standard multi-tenant performance optimisations to reconstruct other tenants’ system prompts and proprietary instructions — without any access to those accounts. The mitigation cuts throughput by 15–30%.
When an agent achieves code execution in a multi-tenant environment, it may reach shared credentials, databases, and network paths spanning the entire customer estate. If you’ve classified prompt injection as a low-priority “data leakage” item, this is a different threat model — and the full scope of the industrial injection threat makes clear why.
Injection achieves RCE only when the agent has the permissions to execute code, write files, or invoke privileged APIs. OWASP LLM06 (Excessive Agency) is the structural amplifier: agents can reach tools beyond task scope, those tools run with broader privileges than necessary, and high-impact actions proceed without human confirmation. The principle is direct: the tools you expose to an agent define the maximum blast radius of any injection that succeeds.
The Vertex AI Double Agent case (March–April 2026) illustrates this in production. Agents in Vertex AI inherited excessive default permissions through Google-managed service accounts. Exploiting injection in them enabled credential extraction and privilege escalation across Google Cloud. The default permissions were the real vulnerability; injection was the mechanism for reaching them.
The Mexican Government Breach (late December 2025 through January 2026) demonstrated the same dynamic at national scale. A single attacker used Claude Code and GPT-4.1 to compromise nine Mexican government agencies, exfiltrating approximately 150 GB of sensitive data including 195 million taxpayer records. The agents’ excessive data access converted injection-enabled entry into a multi-agency breach — the first documented nation-state-adjacent AI workflow attack.
Patch CVE-2026-26030 — upgrade to Semantic Kernel Python SDK 1.39.4 and .NET SDK 1.71.0 now if you haven’t. That closes the specific eval() and DownloadFileAsync paths, but not the underlying architectural condition. The mitigation guidance and patching options now available are in the enterprise defence article. Durable defence is architectural: minimal tool exposure, capability gating, per-tenant isolation, human confirmation for irreversible actions. The injection entry point is hard to eliminate by design. The blast radius is not. For the full picture of the 2026 state of production AI attacks, including how this escalation pathway fits within how prompt injection moved from research to production in 2026, the series hub maps every dimension of the industrialisation.
CVE-2026-26030 is a critical (CVSS 9.9) RCE vulnerability in Semantic Kernel Python SDK before 1.39.4, via unsanitised model-controlled parameters in InMemoryVectorStore. CVE-2026-25592 is a companion .NET SDK vulnerability (before 1.71.0) enabling sandbox escape via an exposed DownloadFileAsync tool. Both are patched — upgrade immediately and consult MSRC advisory GHSA-xjw9-4gw8-4rqx.
Yes. Anthropic’s Claude Code Action (Feb 2026, CVSS 7.7), Flowise CVE-2025-59528, and a batch of April 2026 CVEs across LiteLLM, Agent Zero, Windsurf, DocsGPT, Upsonic, and Flowise all share the same pattern: unsanitised input reaching execution sinks. This is not a single vendor’s error.
A Crescendo attack spreads the adversarial instruction across multiple benign-looking conversation turns; the injection only emerges from accumulated context. Standard per-message filters don’t catch it — detection requires conversation-level analysis.
Vertex AI agents inherited excessive default permissions through Google-managed service accounts. Exploiting injection in them enabled credential extraction and privilege escalation across Google Cloud. The lesson: “what can the agent do” matters as much as “what payloads can reach the agent.”
Five malicious documents in a million-document shared knowledge base achieve a 97% manipulation success rate. In multi-tenant SaaS, a single bad actor tenant can corrupt every other tenant’s responses. Fix: strict per-tenant knowledge base segmentation.
No. Patching closes the specific eval() and DownloadFileAsync paths. Any other path where model-controlled parameters reach execution sinks without structural validation remains vulnerable. Patching is necessary; architectural defence is required.
Five stages: Initial Access → Privilege Escalation → Persistence → Lateral Movement → Actions on Objective. It reframes injection as a structured campaign with distinct intervention points, not a one-shot exploit.
A KV-cache timing side-channel: by measuring inference response timing in a shared KV-cache environment, an attacker can reconstruct other tenants’ system prompts and proprietary instructions. Mitigation requires per-tenant KV-cache isolation at a 15–30% throughput cost.