The global memory shortage is not just about supply and demand. Geopolitics is actively redrawing the map of who gets to make what, and where. U.S. export controls are blocking China from accessing advanced memory technology. Proposed tariffs on Korean and Taiwanese chips threaten to jack up costs for everyone else. And China’s biggest DRAM maker, ChangXin Memory Technologies (CXMT), is ramping up conventional DRAM production but remains years away from manufacturing the high-bandwidth memory (HBM) that AI actually needs.
Here is the thing you need to understand: coverage of Chinese memory production routinely conflates conventional DRAM with HBM. They are not the same thing. This article separates what CXMT can realistically deliver from what it cannot, and gives you the geopolitical context for the global AI memory shortage without requiring a trade policy degree.
CXMT is based in Hefei, produces DDR5 and LPDDR5 at scale, and is now the world’s fourth-largest DRAM manufacturer — holding roughly 15% of global output.
But here is what matters: all of that is conventional DRAM. CXMT does not produce HBM, which is the memory type in shortest supply for AI workloads. Technologically, it sits about two process generations behind Samsung and SK Hynix.
Then there is Yangtze Memory Technologies (YMTC), China’s NAND flash leader, which is constructing a third fab in Wuhan with around half its planned output earmarked for DRAM. But YMTC’s DRAM ambitions are even further from commercial scale than CXMT’s HBM plans.
HP, Dell, Acer, and Asus are actively qualifying CXMT DRAM for non-U.S. devices. That is a legitimisation signal, but HP will reportedly only put CXMT chips in devices sold outside the United States — a deliberate compliance firewall. So CXMT is a real player, but its relevance to the Samsung/SK Hynix/Micron competitive position is confined to conventional DRAM in specific markets.
Short answer: worse.
U.S. Bureau of Industry and Security (BIS) export controls restrict the transfer of advanced semiconductor technology to China — including HBM stacking processes and packaging equipment from ASML, KLA, and Applied Materials. The rules limit tool sales for sub-18nm DRAM processes and target advanced packaging tech relevant to HBM.
The practical effect is a technology ceiling. CXMT simply cannot get the equipment it needs to manufacture HBM at competitive yield. And because Chinese AI companies like Huawei cannot access Korean or U.S. HBM either, domestic demand concentrates on whatever CXMT can produce. CXMT’s growth stays inside China. It does not relieve global HBM shortages one bit.
It is worth understanding the distinction between export controls and tariffs. Controls restrict China’s ability to produce competitive alternatives. Tariffs increase the cost of allied supply. Both make things worse for enterprise buyers, but the question of whether Chinese growth changes the shortage resolution timeline depends entirely on which lever you are looking at.
CXMT is also on the U.S. Department of Defense “Chinese military company” list. It is not banned outright, but the designation creates compliance review requirements for any Western company even considering CXMT as a supplier.
One more thing worth knowing. Samsung and SK Hynix both operate large fabs inside China that are now frozen in capability — running, but unable to upgrade. As Chinese domestic production expands, both companies may eventually exit the Chinese market entirely. That is an underappreciated supply variable that most analyses skip over.
In the near term (2026–2027), CXMT provides some conventional DRAM relief for consumer devices and domestic Chinese buyers. It is targeting HBM3 by end of 2026 and has delivered HBM3 samples to Huawei. But samples and volume production are very different things.
The challenge is mastering through-silicon via (TSV) stacking — bonding multiple DRAM dies into a single HBM package, where a single misstep ruins the whole stack. Chinese equipment makers like Naura Technology are developing domestic tooling, but none of it is confirmed ready for mass production.
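To see why one misstep is so costly, note that compound yield multiplies across every die and every bond step in the stack. A minimal sketch, with purely illustrative per-step yields (these are assumptions, not CXMT or industry figures):

```python
# Illustrative compound-yield math for TSV-stacked HBM.
# The per-die and per-bond yields here are assumptions for
# illustration, not reported data.

def stack_yield(die_yield: float, bond_yield: float, dies: int) -> float:
    """Probability a full HBM stack survives: every die must be good,
    and every bonding step between dies must succeed."""
    return (die_yield ** dies) * (bond_yield ** (dies - 1))

# An 8-high stack with 95% per-die yield and 98% per-bond yield:
y = stack_yield(0.95, 0.98, 8)
print(f"8-high stack yield: {y:.1%}")  # roughly 57.6%
```

Even with per-step yields in the mid-to-high 90s, the whole-stack yield collapses, which is why TSV stacking maturity, not conventional DRAM capacity, is the binding constraint.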
So do not plan your procurement around Chinese HBM. It is not going to deliver at volume in the 2026–2027 timeframe. Looking further out to 2028 and beyond, Chinese HBM at scale is plausible but not guaranteed. Counterpoint Research expects Korean firms to maintain dominance.
New fab investments are underway. Micron is building in Taiwan and New York. SK Hynix is investing in Indiana. Expansions are happening in Japan and Singapore. These are partly enabled by the U.S. CHIPS Act, and Taiwan has committed approximately $318 million in subsidies to Micron for HBM R&D. For full context on how the memory shortage is reshaping the entire tech sector, see the global AI memory shortage overview.
But new fabs come online in the 2027–2030 window. For the current shortage, diversification is a strategic hedge, not a supply solution.
Then there is the tariff threat. U.S. Commerce Secretary Howard Lutnick has warned Samsung and SK Hynix they could face 100% tariffs. His exact words at Micron’s New York fab groundbreaking: “Everyone who wants to build memory has two choices: They can pay a 100% tariff, or they can build in America.”
And here is the part that gets missed in most coverage: long-term agreements (LTAs) between hyperscalers and memory makers lock up existing capacity for AI workloads. New fab output may already be committed before it reaches the open market. Diversification does not put more memory on the spot market for enterprise buyers anytime soon.
This is the question that matters most for capital planning.
Deloitte’s 2026 semiconductor outlook flags that the industry may need to prepare for a demand correction. If enterprises cannot demonstrate returns on AI investment, the hyperscaler capex driving HBM demand could slow down. The DeepSeek efficiency narrative fed into this — if models get more efficient, maybe memory demand falls.
But committed spending tells a different story. Microsoft, Google, Amazon, and Meta have committed record AI infrastructure spending. Stargate — the $500B OpenAI/SoftBank/Oracle joint venture — represents demand spanning years. Hyperscalers are placing open-ended memory orders, accepting any HBM volume at any price. That does not look like demand about to fall off a cliff.
The reconciliation is inference scaling. More efficient models get deployed more broadly, and AI workloads have expanded from training into inference and decision-making, with each stage adding memory demand at the system level. Efficiency per model can actually increase total consumption. The demand correction, then, is a risk scenario rather than a forecast, but it is worth stress-testing your procurement assumptions against.
Chinese DRAM provides minimal near-term relief for Western enterprise buyers. CXMT does not serve the global spot market at scale, and the DoD blacklist creates compliance requirements most procurement teams are not set up for. Qualifying CXMT’s chips does not mean OEMs will automatically order from them. It is a contingency measure for non-U.S. markets. Enterprise-grade supply for Western buyers remains controlled by Samsung, SK Hynix, and Micron — and understanding how those three makers locked up supply with hyperscalers is essential context for any sourcing negotiation.
So what should you actually do? Check whether your supply chain has indirect CXMT exposure through contract manufacturers. Build sourcing strategies that account for export control tightening and tariff escalation as separate scenarios. And secure supply commitments with established suppliers before the tariff situation resolves — waiting carries price spike risk regardless of which geopolitical path unfolds.
The geopolitical layer of the broader memory supply crisis does not change your near-term options much. But it changes the medium-term calculus considerably, and it is worth understanding before you lock in multi-year sourcing strategies.
Not at meaningful scale. CXMT primarily serves the domestic Chinese market. Enterprise-grade memory for Western markets remains dominated by Samsung, SK Hynix, and Micron.
CXMT is on the U.S. DoD “Chinese military company” list, which restricts certain transactions. It is not an outright ban, but you should conduct a formal export control compliance review before incorporating CXMT components into any U.S.-destined product.
CXMT is targeting HBM3 production by end of 2026, but volume production at competitive yield is more realistically a 2028+ prospect. The technology gap in TSV stacking and domestic equipment maturity are the primary constraints.
Export controls block China’s access to manufacturing equipment. Tariffs increase the cost of memory from allied nations like South Korea and Taiwan. They are distinct policy levers with different supply-side effects, but both make the shortage worse.
As of early 2026, it has not been enacted. But the threat alone is already influencing procurement decisions and pricing expectations. Plan for multiple tariff scenarios.
More efficient models reduce memory demand per model, but they get deployed more broadly. Inference scaling tends to sustain or increase total memory consumption at the aggregate level.
They continue operating at their current technology level but cannot upgrade due to export control rules. The long-term future of these assets is unclear and represents an underappreciated variable in supply projections.
Not within the 2026–2027 window. New fabs operate on 3–5 year construction and ramp timelines. Diversification reduces geographic concentration risk long term but does not provide near-term relief.
The U.S. CHIPS Act subsidises domestic semiconductor manufacturing, supporting fab investments by Micron and SK Hynix on U.S. soil. It accelerates geographic diversification but does not change the timeline for new capacity coming online.
Yes. As OEMs qualify CXMT memory, components sourced through contract manufacturers may include CXMT DRAM. Audit your supply chain for indirect exposure, particularly for products with U.S. market distribution.
How the AI Memory Crunch Is Wrecking the PC, Smartphone, and Gaming GPU Markets

For the first time in roughly 30 years, NVIDIA isn’t releasing a new gaming graphics card. Not because anything went wrong in engineering. Because the memory those cards need is worth more to AI accelerator customers.
The AI infrastructure buildout has turned semiconductor wafer capacity into a zero-sum competition — and consumer electronics are losing. IDC projects PC shipments down 11.3% and smartphone shipments down 12.9% in 2026. LPDDR5x mobile memory prices are up 90% quarter-on-quarter. Counterpoint Research calls it “the steepest on record.”
This article maps the damage across four markets and identifies which brands and buyers are most exposed. For the background on how we got here, the AI memory shortage and its causes are covered in our pillar overview.
NVIDIA has shipped new gaming graphics cards every year for roughly three consecutive decades. As of early 2026, TrendForce confirmed both the planned RTX 50-series Kicker refresh and the RTX 60-series launch have been paused — making 2026 the first year in modern NVIDIA history with no new gaming GPU.
The economics are pretty straightforward. Gaming GPU revenue was approximately 35% of NVIDIA’s total revenue in 2022. By late 2025 that had collapsed to around 8%. NVIDIA’s AI compute segment runs at approximately 65% operating margins versus roughly 40% for gaming graphics. So SK Hynix, Samsung, and Micron — who together account for more than 90% of global memory production — have reallocated accordingly. They’re prioritising HBM for AI accelerators over the GDDR6 and GDDR7 that gaming cards require.
There’s a geopolitical angle here too. US export controls restrict Chinese researchers from buying NVIDIA’s advanced AI chips. One workaround that emerged was buying GeForce gaming GPUs for AI training and inference instead. Pausing production closes that channel — a second-order effect that goes well beyond consumer gaming.
The PC industry entered 2026 expecting a tailwind. Microsoft ending support for Windows 10 was supposed to drive one of the largest enterprise refresh cycles in years. Instead, peak memory pricing arrived at exactly the same moment as the Windows 10 EOL deadline. Terrible timing.
The cost mechanics are stark. HP’s CFO disclosed that memory now accounts for 35% of a PC’s build cost — up from 15–18% the prior quarter. Dell’s COO reported DRAM at $2.39 per gigabit, a 5.5x increase over six months. Lenovo, Dell, HP, Acer, and ASUS have all warned of 15–20% unit price hikes. Gartner projects PC prices rising 17% in 2026. IDC projects shipments falling 11.3% — “more negative than even our most pessimistic scenarios suggested just a few months ago.”
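Dell’s per-gigabit figure can be translated into module-level cost with simple arithmetic. A quick sketch; the module sizes are illustrative choices, and real contract pricing will differ from this raw conversion:

```python
# Convert a $/gigabit DRAM price into module-level cost.
# $2.39/Gbit is the Dell figure cited above; the module sizes are
# illustrative, not a specific OEM's bill of materials.

PRICE_PER_GBIT = 2.39  # USD

def module_cost(gigabytes: int, price_per_gbit: float = PRICE_PER_GBIT) -> float:
    gigabits = gigabytes * 8  # 1 byte = 8 bits
    return gigabits * price_per_gbit

for gb in (16, 32):
    print(f"{gb} GB module: ${module_cost(gb):,.0f}")
# 16 GB -> $306, 32 GB -> $612 at $2.39/Gbit
```

At that rate, the memory in a mid-range laptop alone approaches what a whole entry-level machine cost a year earlier.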
Gartner estimates higher memory prices are causing enterprise refresh cycles to lengthen by 15%. For organisations that can’t defer, the math is painful: you’re paying 15–20% more per device in a supply environment where vendors aren’t guaranteeing prices beyond two to three weeks. IDC expects actual PC shortages to start in Q2 2026 as smaller vendors struggle to secure components. If you want to understand the price mechanics underlying these market declines, the DRAM contract pricing trajectory explains both the scale and the duration.
The smartphone market is experiencing what Counterpoint Research calls “the sharpest decline on record.” Global shipments are projected to fall 12% in 2026 — down to volumes not seen since 2013. IDC projects a 12.9% decline.
The driver is LPDDR5x — the mobile memory variant powering mid-range and flagship smartphones. Prices have risen approximately 90% quarter-on-quarter. That’s the steepest in the technology’s history. The reason is direct competition from AI infrastructure: NVIDIA’s most powerful AI rack systems contain 54 terabytes of LPDDR5x each — the same variant that goes into your phone. As Counterpoint’s Tarun Pathak put it: “A lot of these memory companies are asking smartphone vendors to stand in line behind the hyperscalers.”
Smartphone average selling prices are expected to rise 3–8% in 2026 (IDC) or as much as 14% (Gartner). The problem is that the buyers most exposed — those shopping in the sub-$400 tier — are also the most price-sensitive. When a $250 Android phone needs to be priced at $290 to maintain margins, unit volumes collapse faster than the price increase percentage suggests. Qualcomm is already absorbing the downstream damage as OEM customers order fewer application processors.
The AI PC story heading into 2026 was coherent. Intel integrated Neural Processing Units into its Core Ultra line. AMD followed with Ryzen AI. Microsoft defined Copilot+ PC — requiring a minimum of 16GB DRAM — as the hardware standard for on-device AI. Lenovo, Dell, HP, Acer, and ASUS all launched Copilot+ lines.
Then the memory cost shock arrived.
The Copilot+ specification requires 16GB DRAM minimum; 32GB is recommended for serious AI inference workloads. Under current DRAM pricing, those 32GB configurations have become “economically punishing” — IDC’s framing, not ours: “Just as the industry is seeing a need to add more RAM, it has become prohibitively expensive to do so, even if they can get supply. This will result in higher prices, lower margins, or a potential downmix in the amount of RAM in new systems at the worst possible time.”
Here’s the bind that creates. If OEMs ship Copilot+ at 16GB minimum rather than 32GB recommended, those devices can’t run the AI workloads that justify the Copilot+ premium. You pay more for a device that delivers less. Gartner confirmed enterprises will continue investing in AI PCs but at a slower rate, “likely to purchase devices with reduced memory.” The momentum isn’t gone — it’s been pushed back by at least 12–18 months. For actionable guidance on evaluating AI PC procurement against current memory pricing, the cost framework requires more than a simple timing call.
The memory shortage doesn’t affect all brands equally. The clearest illustration: Apple versus Android mid-range OEMs.
Apple: well-positioned. Apple secures memory supply through long-term agreements covering 12–24 months in advance. Its 2026 flagship iPhone models are expected to hold at 12GB RAM rather than increasing to 16GB, keeping BOM exposure manageable. Apple’s premium pricing means even a 3–5% cost pass-through doesn’t threaten demand.
Android mid-range OEMs: exposed. Honor, Vivo, Oppo, Xiaomi, TCL, Transsion, and Realme operate on thin margins with little supply hedging and large portfolios in the sub-$400 tier. Counterpoint Research forecasts these brands will be hit hardest.
Samsung: the dual-role position. As a vertically integrated memory manufacturer, Samsung produces the LPDDR5x it uses in its own phones — supply security that no pure-play OEM can match. SK Hynix and Samsung together surpassed the combined market cap of Alibaba and Tencent in early 2026. That tells you who’s winning from this shortage.
PC OEMs: scale matters. Lenovo’s scale gives it supplier leverage that smaller regional OEMs simply can’t match. Gartner has said directly that “consolidation isn’t off the map here — it’s survival of the fittest.” For context on the broader memory supply crisis driving these dynamics, the structural supply constraints explain why this asymmetry isn’t closing quickly.
The memory shortage translates directly into decisions that need to be made in the next 90 days: PC refresh timing, AI PC adoption, smartphone fleet management, server hardware planning.
Here’s what you’re working with:
Memory prices won’t stabilise until 2027 at the earliest. The top four cloud providers are projected to spend $600 billion on AI infrastructure in 2026 — 70% more than 2025 — sustaining demand pressure well into next year.
Hardware costs are 15–20% higher than 12 months ago. Dell’s data ($2.39/Gbit DRAM, up 5.5x over six months) and HP’s BOM disclosure (memory at 35% of build cost, up from 15–18%) set the baseline. If your budget was built on 2025 hardware cost assumptions, it’s wrong.
The Windows 10 EOL decision has no good timing. Organisations that can’t run Extended Security Updates are refreshing at peak memory prices. That’s the constraint — not the strategy.
A practical framework to work through:
Assess your true Windows 10 EOL exposure. Can you purchase Extended Security Updates to buy 12–18 months while prices normalise?
Evaluate standard PC versus Copilot+ AI PC. For most buyers, the Copilot+ premium is hard to justify at current memory pricing when normalisation isn’t expected before 2027.
Tier your vendor selection by supply chain position. Prioritise Lenovo, HP, Dell — suppliers with established long-term agreements and scale. IDC expects shortages to hit smaller vendors in Q2 2026.
Build a 15–20% hardware cost buffer into your 2026–2027 technology budget.
For the full response strategy covering AI workload cost management during the shortage, the framework goes beyond hardware procurement. And for when pricing will actually normalise, the memory shortage end-date analysis covers what the production data suggests about the 2027–2028 horizon.
Gaming GPU revenue collapsed from approximately 35% of NVIDIA’s total revenue in 2022 to around 8% in 2025. The GDDR memory gaming cards require is more profitably allocated to HBM for AI accelerators, where operating margins run at approximately 65% versus 40% for gaming. TrendForce confirmed the pause covers both the RTX 50-series Kicker refresh and the next-generation RTX 60-series launch — making 2026 the first year in approximately 30 years with no new NVIDIA gaming GPU.
HP disclosed that memory now accounts for 35% of a PC’s build cost, up from 15–18% the prior period. Dell’s COO cited DRAM at $2.39 per gigabit, up 5.5x over six months. Gartner projects PC average selling prices rising 17% in 2026; IDC projects ASP increases of 4–8%.
Semiconductor fabrication lines that previously produced standard DRAM for laptops and phones have been retooled to produce HBM for AI data centre chips. Every wafer allocated to an HBM stack is a wafer not available for the LPDDR5x in a smartphone or the DDR5 in a laptop, creating simultaneous supply shortfalls across all consumer memory types.
Gartner advises that memory prices will not stabilise until 2027. If Windows 10 EOL compliance requires a refresh, deferral may not be possible. If it can be deferred, waiting 12–18 months for prices to normalise would reduce hardware costs meaningfully.
LPDDR5x — the mobile memory variant — has seen prices rise approximately 90% quarter-on-quarter, the steepest in the technology’s history. Smartphone manufacturers either absorb the cost through margin compression or pass it to buyers through ASP increases of 3–8% (IDC) to 14% (Gartner). Memory companies are prioritising hyperscaler AI orders over smartphone vendor allocations.
High Bandwidth Memory (HBM) is DRAM chips stacked vertically and placed adjacent to the GPU die — delivering very high data transfer rates, essential for AI accelerators. SK Hynix, Samsung, and Micron, who collectively account for more than 90% of global memory production, are diverting fabrication capacity from conventional DRAM to HBM, creating the shortage across all consumer memory types.
Apple and Samsung are hedged via long-term supply agreements covering 12–24 months in advance. Apple’s 2026 flagship models are expected to hold at 12GB RAM rather than increasing to 16GB, limiting BOM exposure. Budget Android OEMs — TCL, Transsion, Realme, Honor, Oppo, Vivo, Xiaomi — face the worst margin pressure and are most likely to raise prices sharply, reduce memory configurations, or exit lower-end segments.
Yes. Microsoft Copilot+ AI PCs require a minimum of 16GB DRAM, and 32GB configurations — recommended for serious AI inference workloads — are significantly more expensive under current pricing. The shortage has raised the price floor for AI PCs at the exact moment the market was building adoption momentum.
Microsoft’s end of Windows 10 support was expected to trigger a large enterprise PC refresh wave in 2026. Instead, memory costs are causing businesses to extend refresh cycles by approximately 15% (Gartner). Organisations that cannot delay face paying peak memory prices for hardware that costs 15–20% more per unit than 12 months ago.
In terms of price velocity, yes. DRAM and NAND prices are projected to rise 130% year-on-year by end 2026. The 2026 shortage is distinctive because it is being deliberately sustained by economically rational wafer reallocation decisions — not COVID disruptions or factory fires — making it more persistent and predictable.
Yes, and this is already occurring in China, where US export controls restrict access to NVIDIA’s advanced AI chips. Chinese researchers have been using GeForce gaming GPUs for AI training and inference. NVIDIA’s pause therefore has a geopolitical dimension well beyond consumer gaming.
Server DRAM has risen sharply alongside consumer memory variants. The top four cloud providers are projected to spend $600 billion on AI buildouts in 2026 — 70% more than 2025. For CTOs evaluating on-premises AI infrastructure versus cloud deployment, the full cost framework is covered in the companion article.
How to Run AI Workloads and Manage Infrastructure Costs During a Memory Shortage

DRAM prices have doubled. HBM is sold out through at least 2027. And unless you’ve got the kind of procurement leverage that comes with being a hyperscaler, you’re wearing the full impact. Every infrastructure line item — cloud GPU instances, on-prem servers, endpoint devices — costs more than it did six months ago. And it’s staying that way through the end of 2027.
You need to work two tracks at the same time: the hardware side (procurement, cloud-versus-on-prem, and vendor strategy) and the software side (optimisation techniques that cut memory demand without any new purchases).
This article walks through both, with a clear recommendation at the end of each section. For the full context of the AI memory shortage, start there.
Let’s talk numbers. DRAM contract prices surged 90–95% quarter-on-quarter in Q1 2026 — revised upward from an initial 55–60% forecast. DDR5 32GB modules have crossed the $500 mark. Samsung bumped their 32GB DDR5 module prices from $149 to $239 — a 60% increase, with contract pricing surging more than 100%.
That flows through to everything you buy. Enterprise PC and laptop prices are expected to rise 17% in 2026 because memory now makes up 23% of the total bill of materials, up from 16% in 2025. Cloud GPU instance pricing is climbing too — AWS raised EC2 Capacity Block prices for ML instances by around 15% in January 2026.
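The BOM mechanics can be sketched directly: hold non-memory costs flat, apply a multiplier to the memory line, and see what happens to total cost and memory’s share. The 1.6x multiplier below is an assumption chosen to land near the cited 23% share, not a reported figure:

```python
# How a memory price spike flows through to device cost.
# The shares and the price multiplier are illustrative assumptions,
# anchored loosely to the 16% -> 23% BOM-share figures above.

def bom_impact(memory_share: float, memory_multiplier: float):
    """Return (total BOM cost multiplier, new memory share of BOM)."""
    other = 1.0 - memory_share
    new_memory = memory_share * memory_multiplier
    new_total = other + new_memory
    return new_total, new_memory / new_total

total, share = bom_impact(memory_share=0.16, memory_multiplier=1.6)
print(f"BOM up {total - 1:.0%}, memory now {share:.0%} of BOM")
```

The asymmetry is the point: a 60% memory price move lifts the whole device cost by roughly 10%, before any OEM margin response.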
The shortage hits both CapEx and OpEx. There’s no cost-free path right now. And it hits smaller organisations harder — the hyperscalers lock in pricing through Long-Term Supply Agreements with multi-year, high-volume commitments. If you’re buying on shorter contracts or spot pricing, you’re absorbing the worst of the volatility.
Gartner projects a 130% year-on-year rise in DRAM prices in 2026, with prices not expected to normalise through the end of 2027. Intel CEO Lip-Bu Tan put it bluntly: “There’s no relief until 2028.”
There are actually two distinct shortages happening at once. HBM production eats roughly three times the wafer capacity of standard DRAM per gigabyte, and SK Hynix (~50% market share), Samsung (~40%), and Micron (~10%) have it sold out through at least 2027. Then the DDR5/NAND shortage affects everything else as manufacturers shift capacity toward HBM. For the price data to use in budget modelling and for why AI applications need so much memory, those pieces cover the foundations.
The recommendation: Budget for sustained elevated pricing through 2027. Model your infrastructure spend across cloud GPU instances, on-prem server DRAM, and enterprise endpoint devices — understand your exposure in each.
Neither is free. Cloud providers are passing HBM cost increases through. On-prem server DRAM has doubled. The right answer depends on five things: your existing infrastructure investment, your team’s operational capability, the type of workload (training vs inference), how much contract flexibility you need, and whether you prefer CapEx or OpEx.
Cloud has some real advantages during the shortage: no upfront capital for hardware that may depreciate faster than expected, the ability to blend spot and reserved instances, and access to alternative silicon like AWS Trainium for compatible workloads. On-prem has its own: predictable costs once you’ve bought the hardware, better economics at sustained high utilisation, and no exposure to future cloud price increases.
Be honest about the TCO comparison. A proper apples-to-apples model needs a 3-year horizon that includes hardware amortisation, staffing costs to manage the infrastructure, power, networking, and opportunity cost. Cloud looks expensive on a per-hour basis but it includes all of that. On-prem looks cheap on paper until you add the hidden costs — data egress fees, idle resource costs, and the people to keep it running. And don’t assume cloud availability is a given — as hyperscalers prioritise scarce GPU capacity, other workloads can experience throttling or degraded performance.
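A minimal version of that three-year model, with the cost categories from the paragraph above. Every number is a placeholder assumption; substitute your own quotes, staffing costs, and utilisation data:

```python
# Minimal 3-year TCO comparison, cloud vs on-prem.
# All figures are illustrative assumptions, not vendor pricing.

def cloud_tco(hourly_rate, hours_per_year, egress_per_year, years=3):
    """Metered compute plus data egress over the horizon."""
    return years * (hourly_rate * hours_per_year + egress_per_year)

def onprem_tco(hardware, staffing_per_year, power_net_per_year, years=3):
    """Hardware amortised fully across the horizon, plus running costs."""
    return hardware + years * (staffing_per_year + power_net_per_year)

cloud = cloud_tco(hourly_rate=30.0, hours_per_year=4000, egress_per_year=20_000)
onprem = onprem_tco(hardware=400_000, staffing_per_year=120_000,
                    power_net_per_year=40_000)
print(f"cloud: ${cloud:,.0f}  on-prem: ${onprem:,.0f}")
```

At these assumed numbers and roughly 45% utilisation, cloud wins; push utilisation up or hardware cost down and the answer flips, which is exactly why the model must use your data.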
But utilisation makes or breaks on-prem. Nearly half of enterprises waste millions on underutilised GPU capacity. Industry data shows inference GPU utilisation as low as 15–30% without active management. If your GPUs are sitting idle, cloud is cheaper. It really is that simple.
The rough breakpoint: sustained high utilisation favours on-prem; idle or bursty workloads favour cloud.
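One way to frame the breakpoint is to solve for the annual GPU-hours at which on-prem fixed costs equal cloud’s metered cost. All numbers below are assumptions for illustration, not vendor pricing:

```python
# Solve for the utilisation at which on-prem and cloud costs cross.
# Per-GPU figures below are illustrative assumptions only.

def breakeven_hours_per_year(hourly_rate, fixed_onprem_per_year):
    """Annual GPU-hours above which on-prem is cheaper than cloud."""
    return fixed_onprem_per_year / hourly_rate

# on-prem fixed cost per GPU: hardware amortised over 3 years,
# plus staffing share and power
fixed = 40_000 / 3 + 10_000 + 4_000
hours = breakeven_hours_per_year(hourly_rate=4.0, fixed_onprem_per_year=fixed)
print(f"breakeven: {hours:,.0f} GPU-hours/year "
      f"(~{hours / 8760:.0%} of a single GPU's year)")
```

Under these assumptions on-prem only wins at very high sustained utilisation, which matches the 15–30% real-world utilisation figures above: most fleets are nowhere near the breakeven.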
Spot instances cut your cloud GPU costs by 60–70% below on-demand pricing, but they can be interrupted. For training: implement checkpoint/resume patterns (saving model state to S3 at regular intervals) and interruptions become manageable. For production inference: use reserved capacity for baseline demand, spot for overflow. Running a mixed fleet across availability zones improves spot availability. Orchestration frameworks like Kubeflow or Ray automate instance selection and failover.
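The checkpoint/resume pattern can be sketched as a skeleton. This toy version uses a local pickle file and fake training state; a real implementation would write actual model state to durable storage such as S3:

```python
# Checkpoint/resume skeleton for interruptible (spot) training.
# A dict stands in for real model state; the training "work" is fake.

import os
import pickle

CKPT = "train_state.pkl"

def save_checkpoint(state):
    with open(CKPT, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "loss": None}

def train(total_steps, interrupt_at=None):
    state = load_checkpoint()          # resume if a checkpoint exists
    for step in range(state["step"], total_steps):
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}  # fake work
        if (step + 1) % 10 == 0:       # checkpoint every 10 steps
            save_checkpoint(state)
        if interrupt_at is not None and step + 1 == interrupt_at:
            return state               # simulate a spot interruption
    save_checkpoint(state)
    return state

train(100, interrupt_at=35)            # interrupted; last checkpoint at step 30
final = train(100)                     # resumes from step 30 and finishes
print(final["step"])                   # 100
os.remove(CKPT)
```

The key property: an interruption costs at most one checkpoint interval of recomputation, which is what makes the 60–70% spot discount usable for long training runs.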
The recommendation: Model the decision using your actual utilisation data. If you don’t have utilisation data, start on cloud until you do.
That covers the hardware-side decisions. But there’s a parallel track that’s faster to implement and doesn’t require any new procurement at all — the software side.
Software-side mitigation requires no new procurement. You can start today.
Start with quantisation. 4-bit quantisation cuts memory footprint by approximately 75% versus 16-bit models; 8-bit cuts it by approximately 50%. Weight-only quantisation (A16W4) works best for memory-bound workloads with small batch sizes — that’s the typical inference pattern for organisations running their own models. Activation quantisation (A8W8) suits high-throughput serving with large batch sizes.
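The arithmetic behind those percentages is just bits per parameter. A small sketch, using an illustrative 7B-parameter model (the model size is a chosen example, not a specific product):

```python
# Memory footprint of model weights at different precisions.
# bytes = parameters * bits / 8; the 7B model is illustrative.

def weights_gb(params_billion: float, bits: int) -> float:
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 8, 4):
    gb = weights_gb(7, bits)
    saving = 1 - gb / weights_gb(7, 16)
    print(f"7B @ {bits:>2}-bit: {gb:5.1f} GB  ({saving:.0%} vs 16-bit)")
```

A 7B model drops from 14 GB at 16-bit to 3.5 GB at 4-bit, which is often the difference between needing a data centre GPU and fitting on a workstation card.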
Dropbox Dash is a good reference point. Their engineering team deployed FP8, INT8, and INT4 quantisation along with KV cache quantisation across their AI product, and they documented concrete cost and performance results. Worth looking at if you want to see what this looks like in production rather than just benchmarks.
For models under 13B parameters on workstation GPUs: start with INT4 weight-only. Larger models on data centre GPUs (A100, H100): FP8 preserves more accuracy while cutting memory roughly in half. INT8 is a solid general-purpose choice. Newer formats like MXFP and NVFP4 are worth watching as hardware support broadens. The accuracy tradeoff depends on the task — summarisation tolerates quantisation well, precise numerical reasoning less so. Test on your actual workload.
The KV cache is the model’s short-term memory — it stores intermediate attention results during inference so the model doesn’t have to recompute them. Every new token needs to “look back” to every previous token. Longer context windows mean proportionally larger KV caches. If you’re running a 128K context window when 16K would do, you’re burning memory for nothing.
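Per-sequence KV cache size is 2 (keys and values) x layers x KV heads x head dimension x context length x bytes per element. A sketch with illustrative mid-size-model dimensions (assumed for the example, not a specific architecture):

```python
# KV cache size per sequence. Layer/head/dim values below are
# illustrative of a mid-size model, not a specific architecture.

def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    # 2x for keys and values; bytes_per_elem=2 assumes FP16/BF16
    elems = 2 * layers * kv_heads * head_dim * context_len
    return elems * bytes_per_elem / 1e9

for ctx in (16_384, 131_072):
    gb = kv_cache_gb(layers=32, kv_heads=32, head_dim=128, context_len=ctx)
    print(f"{ctx:>7} tokens: {gb:.1f} GB per sequence")
```

The 128K context costs 8x the memory of the 16K one, per concurrent request, which is why right-sizing the context window is one of the cheapest optimisations available.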
PagedAttention, implemented in vLLM, treats GPU memory like virtual memory — it eliminates fragmentation and enables prefix sharing across requests. If you’re running LLM inference in production without vLLM or something similar, that’s your next move.
The concept tying all of this together is the memory wall. As Sha Rabii, co-founder of Majestic Labs, puts it: “Your performance is limited by the amount of memory and the speed of the memory that you have, and if you keep adding more GPUs, it’s not a win.” Majestic Labs is building a 128TB inference system that deliberately avoids HBM — showing that memory-efficient architecture is a legitimate design direction, not just a workaround. For the technical foundation, the memory wall and KV cache piece covers why AI applications need so much memory and why adding more GPUs alone doesn’t solve the problem.
The memory wall is also why the remaining techniques matter. Speculative decoding uses a smaller draft model to generate candidate tokens that the main model verifies in parallel — you get better throughput without a proportional memory increase. Disaggregated serving separates the prefill phase from the decode phase, running each on different hardware tiers. Red Hat’s llm-d project implements this and reports 30–50% cost reductions. Knowledge distillation trains permanently smaller models from larger ones — it takes weeks to months, but the results are durable.
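The speculative-decoding control flow (draft ahead, verify, keep the agreed prefix) can be shown with toy deterministic stand-ins for both models. This illustrates only the accept/verify loop, not real sampling or the acceptance-probability math:

```python
# Toy illustration of speculative decoding's accept/verify loop.
# Both "models" are trivial deterministic functions; the point is
# the control flow, not the modelling.

def draft_model(context):
    return context[-1] + 1 if context else 0      # cheap guess: count up

def target_model(context):
    nxt = context[-1] + 1 if context else 0
    return nxt if nxt % 7 != 0 else 0             # disagrees at multiples of 7

def speculative_decode(steps, k=4):
    out = []
    while len(out) < steps:
        # draft proposes k candidate tokens ahead
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)
        # target verifies; keep the agreed prefix plus its own correction
        ctx = list(out)
        for t in proposal:
            want = target_model(ctx)
            ctx.append(want)
            if want != t:
                break
        out = ctx
    return out[:steps]

print(speculative_decode(10))   # identical output to target-only decoding
```

The output matches what the target model alone would produce; the win is that several tokens are accepted per expensive verification pass whenever draft and target agree.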
Rank by implementation effort: quantisation (days), vLLM/PagedAttention (framework selection), speculative decoding (configuration tuning), disaggregated serving (architecture change), distillation (medium-term training investment). Start at the top.
The recommendation: Implement quantisation this week. Evaluate vLLM this month. Queue the rest after you’ve captured the quick wins.
You can’t get the Long-Term Supply Agreements the hyperscalers use. That’s a market structure issue, not a negotiation skill gap. Samsung recently signed a “non-cancellable, non-returnable” contract with a key server customer — supply earmarked for that customer provides zero relief for the rest of the market. These agreements require volume commitments and multi-year horizons that simply aren’t available at smaller scales.
Here’s the tactical checklist.
Multi-vendor sourcing. Engage SK Hynix, Samsung, and Micron (or their distributors) at the same time. Competitive tension between suppliers improves pricing even at modest volumes. You won’t get hyperscaler terms, but you can avoid being captive to a single vendor’s allocation decisions.
Pre-load inventory. If you know you need hardware in the next 12 months, buying now locks in current pricing. Deloitte projects further DRAM price increases of ~50% are possible. Gartner’s Ranjit Atwal puts it directly: “Buy now, or wait, because whatever you’re getting at the moment is going to be the best price.”
Procurement aggregators. Work with aggregators who combine demand from multiple organisations to negotiate enterprise-tier pricing. This is the closest thing you’ll get to hyperscaler leverage.
Lock extended quote validity. Vendors are guaranteeing prices for only two or three weeks. Push for 60–90 day windows.
Choose contract duration carefully. Six-month contracts preserve flexibility but offer no price protection. Annual contracts provide modest discounts but risk overpayment if prices stabilise faster than expected. Match your commitment length to your business certainty, not to the vendor’s sales cycle. Cloud GPU spot instances also work as a hedge — if on-prem hardware becomes uneconomical, spot capacity gives you a fallback without long-term commitment.
The recommendation: Source from multiple vendors, pre-load known-needed hardware, engage a procurement aggregator. For the specific price numbers to use in budget modelling, see our companion piece on memory pricing.
Roughly 15% of organisations are extending PC refresh timelines. But deferral doesn’t save money if prices keep rising. And OEMs are reducing default RAM and SSD configurations to manage headline pricing — so you might end up paying the same price for a lower-spec device later.
Windows 10 End-of-Life (October 2025) adds a forcing function for compliance-related endpoints. Windows 11 has higher RAM and SSD requirements, and that’s hitting at exactly the wrong time. If you’re thinking about AI PCs, those need a minimum of 16GB RAM (the Microsoft Copilot+ PC spec), making memory even more of a budget factor.
The right approach is category-by-category, not all-or-nothing.
Accelerate refresh for compliance-related endpoints — anything still on Windows 10 that handles sensitive data or faces regulatory requirements. Lock pricing on approved quotes immediately. Accelerate developer workstations too — these are productivity multipliers, and skimping on developer hardware costs you more in lost output than you save on the purchase price.
Extend general office hardware 6–12 months. Standard office endpoints running Windows 11 can wait. There’s no functional reason to replace them at inflated prices if they’re doing the job. Deprioritise meeting room hardware and shared devices entirely.
For AI infrastructure specifically, fab expansion timelines suggest meaningful relief is unlikely before Q4 2027. If the workload can stay on cloud in the interim, that’s the safer path.
The recommendation: Prioritise by category. Lock pricing on compliance and developer hardware. Extend everything else. Budget for 15–20% cost inflation on all endpoint hardware through 2027.
Point estimates won’t hold. IDC’s December 2025 “most pessimistic scenario” was already exceeded by February 2026 — the speed at which pricing has moved has shocked everybody, including the analysts. Model three scenarios instead.
Base case: prices plateau at current levels. DRAM stays at roughly current pricing. Cloud GPU instances hold at current rates. This is your planning floor.
Downside case: further increases through 2027. Supply continues tightening and prices climb another 30–50%. DRAM capex from memory makers is expected to rise only 14% in 2026, and NAND capex only 5% — the manufacturers aren’t rushing to fix the shortage. Stress-test against it.
Upside case: relief begins Q4 2027. New fab capacity from SK Hynix (Cheongju, 2027), Micron (Singapore, 2027; Idaho, 2027–2028), and Samsung (Pyeongtaek, 2028) begins delivering. Prices start declining in early 2028. This is your planning horizon for deferred purchases.
Project costs across all three budget categories for each scenario: cloud AI compute, on-prem server hardware, and enterprise endpoint refresh. The practical allocation: commit 70% of budget against the base case, reserve 20% as downside contingency, and identify which purchases can be deferred to capture the upside.
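A minimal sketch of that projection. The dollar amounts and the upside multiplier below are illustrative placeholders, not figures from the text; substitute your own cloud, server, and endpoint numbers:

```python
# Three-scenario budget model. base_case figures are placeholders.
base_case = {"cloud_ai": 400_000, "servers": 250_000, "endpoints": 150_000}

scenarios = {
    "base":     1.00,   # prices plateau at current levels
    "downside": 1.40,   # a further 30-50% increase; 40% as the midpoint
    "upside":   0.85,   # assumed saving if deferred buys land after relief
}

totals = {name: sum(base_case.values()) * mult for name, mult in scenarios.items()}
for name, total in totals.items():
    print(f"{name:9s} ${total:,.0f}")

# Allocation rule from the text: commit 70% against the base case,
# hold 20% as downside contingency.
committed = 0.70 * totals["base"]
contingency = 0.20 * totals["base"]
print(f"commit ${committed:,.0f}, reserve ${contingency:,.0f}")
```

The remaining 10% of the base-case budget maps to the purchases you have flagged as deferral candidates.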
Focus your optimisation effort where it counts. Inference accounts for 80–90% of total AI spend. Every dollar saved through quantisation or KV cache optimisation reduces your exposure across all three scenarios. Build ongoing visibility through FinOps practices — track cost-per-inference as your primary unit metric. Tools like OpenCost give you real-time cost data across namespaces, pods, and nodes, so you can measure the actual impact of each optimisation rather than guessing.
The recommendation: Build all three scenarios into your infrastructure budget. Allocate against the base, reserve for downside, identify deferral candidates for upside. Track cost-per-inference quarterly. Revisit every quarter — conditions are moving faster than the forecasts.
For a complete overview of the memory supply crisis driving these costs — including the root causes, market dynamics, and what comes next — see our full overview of the AI memory shortage.
4-bit quantisation cuts your memory footprint by roughly 75% compared to FP16 models, and 8-bit by about 50%; against an FP32 baseline the savings are larger still (87.5% and 75% respectively). These ratios hold across most transformer-based LLMs. If you’re running typical inference workloads with small batch sizes, weight-only quantisation (A16W4) is the place to start.
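The weight-only arithmetic is straightforward. A rough calculator, using a hypothetical 7B-parameter model as the illustrative size (weights only; activations and KV cache are extra, and real quantisation schemes add small overheads for scales and zero points):

```python
# Rough weight-memory footprint at different precisions. Weights only;
# 7B parameters is an illustrative size, not a specific model.

def weight_gb(params_billions: float, bits: int) -> float:
    # params * bits/8 bytes per parameter, expressed in GB
    return params_billions * 1e9 * bits / 8 / 1e9

for bits in (32, 16, 8, 4):
    print(f"{bits:2d}-bit: {weight_gb(7, bits):5.1f} GB")
# 32-bit: 28.0 GB, 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB

# Pure arithmetic: 4-bit is 25% of FP16 and 12.5% of FP32.
```

At 4-bit, a 7B model's weights fit comfortably alongside KV cache on a single 24GB consumer GPU, which is the practical payoff.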
It depends on scale and utilisation. For 1–8 GPUs, cloud is usually the better deal. At 8–64 GPUs with sustained utilisation above 70%, on-prem starts to make sense over a 3-year TCO horizon. Neither option escapes the memory cost inflation.
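A toy 3-year TCO comparison makes the utilisation threshold concrete. All figures below are illustrative placeholders, not vendor quotes:

```python
# Toy 3-year TCO: cloud on-demand vs on-prem for an 8-GPU node.
# Hourly rate, capex, and opex are illustrative assumptions.

HOURS_PER_YEAR = 8760

def cloud_3yr(gpu_hourly: float, gpus: int, utilisation: float) -> float:
    # On-demand cloud: you pay only for the hours actually used.
    return gpu_hourly * gpus * HOURS_PER_YEAR * 3 * utilisation

def onprem_3yr(capex: float, annual_opex: float) -> float:
    # On-prem: capex is sunk regardless of utilisation.
    return capex + annual_opex * 3

gpus = 8
for util in (0.3, 0.5, 0.7, 0.9):
    cloud = cloud_3yr(gpu_hourly=3.0, gpus=gpus, utilisation=util)
    onprem = onprem_3yr(capex=250_000, annual_opex=50_000)
    winner = "on-prem" if onprem < cloud else "cloud"
    print(f"util {util:.0%}: cloud ${cloud:,.0f} vs on-prem ${onprem:,.0f} -> {winner}")
```

With these placeholder numbers the crossover lands between 50% and 70% sustained utilisation, consistent with the rule of thumb above; rerun it with your own rates before deciding.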
The KV cache stores intermediate attention computation results during LLM inference so the model doesn’t have to re-compute context for every generated token. Longer context windows need proportionally larger KV caches, which directly increases your memory demand and cost.
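Sizing the KV cache is simple arithmetic: 2 (keys and values) x layers x KV heads x head dimension x context length x bytes per element. The sketch below uses Llama-2-7B-like shapes (32 layers, 32 KV heads, head dim 128) as an illustrative example:

```python
# Per-sequence KV cache size:
#   2 (K and V) * layers * kv_heads * head_dim * context_len * bytes/elem

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# One 4k-context sequence at FP16, Llama-2-7B-like shapes:
print(round(kv_cache_gb(32, 32, 128, 4096), 2))        # ~2.15 GB
# 100 concurrent 4k sequences:
print(round(100 * kv_cache_gb(32, 32, 128, 4096), 1))  # ~214.7 GB
```

That last line is the whole story: at moderate concurrency the KV cache dwarfs the model weights, which is why cache management, not weight size, dominates serving memory budgets.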
New fab capacity from SK Hynix, Samsung, and Micron won’t deliver meaningful supply relief before Q4 2027 at the earliest. Budget for sustained elevated pricing through at least 2027 — meaningful price declines aren’t likely before early 2028.
vLLM is an open-source LLM inference framework that implements PagedAttention, continuous batching, speculative decoding, and quantisation support. It’s become the de facto standard for production LLM serving. If you’re running LLM inference in production, vLLM should be your first evaluation point.
Spot instances cut cloud GPU costs by 60–70% but they can be interrupted. They work well for training with checkpoint/resume patterns. For production inference, use reserved capacity for your baseline demand and spot for overflow. A mixed fleet across availability zones improves spot availability.
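The blended cost of the reserved-baseline-plus-spot-overflow pattern is easy to model. The on-demand rate, reserved discount, and demand split below are assumptions for illustration; only the 60–70% spot saving comes from the text:

```python
# Blended GPU-hour cost: reserved capacity for baseline demand,
# spot for overflow. Rates and split are illustrative assumptions.

on_demand_hourly = 4.0     # per GPU-hour, placeholder
spot_discount = 0.65       # midpoint of the 60-70% range
reserved_discount = 0.30   # assumed typical 1-yr commitment discount

baseline_share = 0.7       # steady demand served by reserved capacity
overflow_share = 0.3       # bursty demand served by spot

blended = (baseline_share * on_demand_hourly * (1 - reserved_discount)
           + overflow_share * on_demand_hourly * (1 - spot_discount))
print(f"blended: ${blended:.2f}/GPU-hr vs ${on_demand_hourly:.2f} on-demand")
```

Under these assumptions the fleet runs at roughly 60% of the on-demand rate while keeping the baseline immune to spot interruptions.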
AWS Trainium (Trn1/Trn2 instances) offers meaningful cost reductions compared to P5 (H100) instances for compatible workloads. It requires the AWS Neuron SDK and isn’t universally compatible. Worth evaluating if your workloads run on AWS and you can tolerate some framework lock-in.
Accelerate for compliance-related endpoints and developer workstations. Extend general office hardware by 6–12 months. OEMs are reducing default configurations to manage pricing, so deferred purchases may get you lower-spec devices at similar prices.
Disaggregated serving separates the prefill phase (processing input) from the decode phase (generating output). Prefill runs on high-performance GPUs while decode runs on cheaper hardware. The result is 30–50% infrastructure cost reductions by matching hardware to what each phase actually needs.
Start with cost-per-inference as your primary metric. Layer in GPU utilisation tracking and cloud spend breakdowns by workload type. Measure each optimisation individually so you know what’s actually moving the needle. OpenCost and cloud provider tools handle the instrumentation.
The memory wall is the point where adding more GPUs without proportional memory bandwidth doesn’t actually improve performance. That’s why software-side optimisations like quantisation and KV cache management are structurally valuable — they reduce memory demand rather than throwing more hardware at the problem.
Samsung, SK Hynix, Micron, and the Hyperscalers Who Locked Up All the Memory

Three companies make all the world’s high-bandwidth memory. Samsung, SK Hynix, and Micron. That’s it. HBM is the specialised DRAM that every AI accelerator needs, and the big cloud providers — AWS, Google, Microsoft, Meta — have used long-term supply agreements to lock in capacity well before production runs even start. If you’re not one of them, you get the spot market. Volatile pricing, no guarantees.
The supply was spoken for before most buyers even knew there was a shortage.
This article maps out where the three memory makers stand, explains how those long-term agreements structurally favour hyperscalers, and shows why AI inference demand keeps the pressure building. For the full picture of the AI memory shortage, start with our pillar overview.
There are only three companies producing HBM at scale. As of Q2 2025, SK Hynix held 62% of the HBM market, Micron held 21%, and Samsung held 17%. There is no fourth player waiting in the wings.
So what actually is HBM? It’s a stacked DRAM architecture — up to 12 thinned-down DRAM dies piled on top of each other and connected by through-silicon vias (TSVs), sitting within a millimetre of a GPU. It costs roughly three times as much as conventional DRAM per bit, and SemiAnalysis estimates it accounts for 50 percent or more of the cost of a packaged GPU. That makes HBM the single biggest cost item in AI accelerator hardware.
Every major AI GPU needs it. Nvidia’s Blackwell B300 uses eight 12-die HBM3E stacks. AMD’s MI350 uses the same setup. Nvidia’s next-generation Rubin GPU comes with up to 288 gigabytes of HBM4. HBM isn’t optional — it’s the bottleneck component.
And the oligopoly isn’t going anywhere. New fabs cost $15 billion or more, take years to build, and require decades of process expertise. Nobody new is walking into this market to put downward pressure on prices. For the economics of HBM production, see our companion article on the supply chain.
Samsung was the world’s largest memory maker by volume. Then SK Hynix beat it in annual operating profit for the first time in 2025 — 47.2 trillion won versus Samsung’s 43.6 trillion won. Now, that comparison needs some context. Samsung is a diversified conglomerate with fingers in consumer electronics, foundry services, and displays. But SK Hynix, focused purely on memory, outearned Samsung’s entire company.
The technical reason boils down to packaging. SK Hynix uses MR-MUF (mass reflow molded underfill), which handles heat better when you’re dealing with that layer cake of stacked silicon. Samsung’s NCF (Non-Conductive Film) approach ran into quality and yield problems with HBM3E. When you’re stacking 12 dies on top of each other, thermal management and yield rates are what make or break commercial viability.
SK Hynix turned that process advantage into a deep relationship with Nvidia, securing over two-thirds of HBM supply orders for Nvidia’s next-generation Vera Rubin products. Samsung still holds around 60 percent of the HBM on Google’s TPUs, but that’s a narrower position to be in.
Samsung is fighting back with its Pyeongtaek P5 fab, but mass production will not begin until 2028. As Counterpoint Research director MS Hwang put it: “We expect Samsung to show a significant turnaround with HBM4 for Nvidia’s new products, moving past last year’s quality issues.”
While SK Hynix and Samsung fight it out for the top spot, Micron made a completely different kind of bet. It restructured its entire business around AI.
Micron’s HBM and cloud memory revenue grew from 17 percent of its DRAM revenue in 2023 to nearly 50 percent in 2025. Then in December 2025, the company discontinued its consumer-facing Crucial brand and walked away from consumer DRAM entirely. All wafer capacity redirected to AI and enterprise memory.
Think about what that signals. A company making good money selling memory to PC builders decided the AI opportunity was so big and so permanent that consumer memory wasn’t worth the wafer space anymore. CEO Sanjay Mehrotra projects the HBM market growing from $35 billion in 2025 to $100 billion by 2028 — hitting that number two years earlier than Micron had previously expected. For how to manage infrastructure costs when supply is this constrained, see our actionable guide for SMB CTOs.
And it’s still not enough. As Micron business chief Sumit Sadana put it: “We’re sold out for 2026.” Even with all that redirected capacity, the company can only meet two-thirds of medium-term requirements for some customers.
Long-term supply agreements (LTAs) are multi-year capacity reservation contracts. A hyperscaler commits capital 12 to 24 months in advance to lock in guaranteed memory allocation from a manufacturer. The memory is spoken for before it’s even produced.
How seriously do they take this? Google reportedly fired a procurement executive who failed to sign LTAs ahead of the shortage. When Microsoft executives visited SK Hynix headquarters and were told their conditions would be “difficult” to meet, one of them stormed out of the meeting.
Hyperscalers now station dedicated procurement executives in South Korea. Google posted a job for a “Global Memory Commodity Manager.” Meta is hiring “Memory Silicon Global Sourcing Managers.” This isn’t routine purchasing — it’s a strategic function.
The mechanism is straightforward: hyperscale cloud providers secure supply through long-term commitments, capacity reservations, and direct fab investments, getting lower costs and guaranteed availability. Everyone else fights over what’s left.
You can’t replicate that level of capital pre-commitment, volume guarantee, or dedicated procurement infrastructure. As semiconductor analyst Manish Rawat at TechInsights noted: “This imbalance creates a dual constraint for the mid-market: higher input costs and longer delivery timelines.” For what this means for enterprises who cannot match hyperscaler LTAs, see our article on managing infrastructure costs.
AI training requires a temporary burst of compute and memory. AI inference — serving users in real time — requires continuous memory allocation that grows with every new user and every new deployment.
During inference, a large language model stores its state in the key-value cache (KV cache). Think of it as the model’s working memory. It sits in HBM during active sessions and gets pushed to slower system memory when idle. Storing these precomputed caches reduces the compute required for extended sessions, but it chews through memory.
Demand compounds through three factors: concurrent users, context window length, and number of deployed models. As hyperscalers roll out more AI models to more users with longer context windows, their HBM appetite grows continuously. And as AI moves towards agent-based systems, those agents need frequent access to extensive vector databases for retrieval-augmented generation (RAG), adding yet another persistent demand driver.
Hyperscaler LTA volumes renew and grow with each deployment cycle. Even if fab capacity increases, demand may keep outpacing supply for the foreseeable planning horizon.
All three memory makers are building new capacity and spreading it across multiple countries — geographic diversification that doubles as a geopolitical hedge.
SK Hynix has an HBM fab in Cheongju, South Korea, targeted for 2027, and a facility in West Lafayette, Indiana, for 2028. Samsung’s Pyeongtaek P5 targets 2028. Micron is building an HBM fab in Singapore for 2027, retooling a fab in Taiwan for second-half 2027, and broke ground on a DRAM fab complex in New York that won’t hit full production until 2030.
Multiple industry sources point to 2028 as the earliest date for meaningful supply relief. As Intel CEO Lip-Bu Tan put it: “There’s no relief until 2028.” And even then, hyperscalers with existing LTAs will have priority access to new capacity before it reaches the open market.
With LTA-secured supply going to hyperscalers, everyone else is on the spot market. And the prices tell the story. Samsung raised prices for 32GB DDR5 modules to $239 from $149 — a 60% increase. DDR5 contract pricing has surged past $19.50 per unit, up from around $7 earlier in 2025.
Gartner forecasts a 47% DRAM price increase in 2026. IDC expects only 16% year-on-year DRAM supply growth — well below demand. IDC puts it plainly: this is “a potentially permanent, strategic reallocation of the world’s silicon wafer capacity” — not a temporary pricing spike.
The question for your organisation is how to plan procurement, budgets, and infrastructure strategy around a supply chain where hyperscalers hold structural priority. For strategies for managing infrastructure costs during the shortage, see our actionable guide. For whether Chinese domestic DRAM changes the competitive picture, we cover the geopolitical factors separately. And for the complete view, read our deep-dive into the memory crisis.
HBM uses 3D die stacking — up to 12 thinned DRAM dies connected by through-silicon vias — to deliver far higher bandwidth than DDR5. It costs about 3x more per bit to produce and accounts for over half the cost of a packaged AI GPU.
Building a memory fab costs upwards of $15 billion, takes years, and demands decades of process learning. Samsung, SK Hynix, and Micron collectively hold 100% of the HBM market. Chinese manufacturers like CXMT are expanding but still face large technology gaps and export control restrictions.
An LTA is a capacity reservation contract where a buyer pre-commits capital — typically 12 to 24 months ahead — to guarantee memory allocation directly from a manufacturer. Hyperscalers use LTAs to secure supply before it reaches the open market.
DDR5 contract pricing went from about $7 to $19.50 per unit. Samsung raised 32GB DDR5 module prices from $149 to $239. TrendForce estimates Q1 2026 DRAM contracts surged 90-95% quarter-on-quarter, and Gartner forecasts a 47% overall DRAM price increase for the year.
SK Hynix’s MR-MUF packaging technology delivers better thermal performance and higher yields than Samsung’s NCF approach. That process advantage, combined with early and deep supply contracts with Nvidia, gave SK Hynix roughly 62% of HBM market share versus Samsung’s 17%.
Micron shut down its Crucial consumer brand in December 2025 and redirected all wafer capacity to AI and enterprise memory. AI-related DRAM revenue went from 17% to nearly 50% of total DRAM revenue between 2023 and 2025, and the company is fully committed for 2026.
Industry consensus points to 2028 at the earliest. New fabs from all three manufacturers — SK Hynix in Cheongju (2027) and West Lafayette (2028), Samsung in Pyeongtaek (2028), Micron in Singapore and Taiwan (2027) — will take years to reach full output. Intel CEO Lip-Bu Tan summed it up: “There’s no relief until 2028.”
It’s an AI model’s working memory during inference. Precomputed attention states are held in HBM while a session is active. Memory requirements scale with concurrent users, context window length, and the number of deployed models — so demand compounds rather than levelling off.
TrendForce projects combined CapEx of the top eight cloud service providers at over $710 billion in 2026. A large share of that flows directly into memory procurement through LTAs with Samsung, SK Hynix, and Micron.
No. LTAs require the kind of capital pre-commitment, volume, and dedicated procurement staff that mid-market and SMB companies simply don’t have. Hyperscalers station procurement executives at memory fabs and have created roles like “Memory Silicon Global Sourcing Manager” — that level of investment is out of reach for smaller organisations.
Chinese production — notably CXMT — is growing, but it faces significant HBM technology gaps and export control constraints. Whether Chinese manufacturers can meaningfully change the three-player oligopoly is covered in our companion article on geopolitical factors shaping memory supply.
Training uses a temporary burst of memory and compute. Inference runs continuously, serving concurrent users with expanding context windows, which creates compounding memory demand that grows with each new deployment rather than reaching a fixed ceiling.
DRAM Prices in 2026 Have Doubled and the Numbers Are Getting Worse

Six months ago, DRAM cost $0.43 per gigabit. Dell’s COO says it recently hit $2.39. That’s a 5.5x increase in half a year, and it shows no sign of letting up.
This isn’t a single-product blip. DRAM, LPDDR5x, and NAND flash are all getting hammered with 90–95% quarter-on-quarter price surges in Q1 2026. The structural driver is AI infrastructure — and that’s what makes this fundamentally different from the 2021 COVID-era shortage and much harder to plan around.
We’ve pulled together every major analyst forecast on memory pricing in one place. Bookmark it, send it to your stakeholders. For the broader memory shortage story, we’ve put together a full overview of how AI’s appetite for memory is reshaping the tech industry.
The headline number: TrendForce’s February 2026 survey revised its Q1 2026 forecast for conventional DRAM contract prices to 90–95% quarter-on-quarter. That’s the largest quarterly increase on record. Their earlier January estimate was 55–60% QoQ — the forecast nearly doubled within weeks.
PC DRAM specifically is tracking over 100% QoQ increases in some contract categories.
And it’s not just conventional DRAM. LPDDR5x contract prices surged roughly 90% QoQ — what TrendForce calls “the steepest increases in their history.” NAND flash prices rose 55–60% QoQ, with enterprise SSD pricing up 53–58% QoQ. That’s three simultaneous price shocks across different memory types, all landing at once.
Gartner’s research director Ranjit Atwal put it plainly: “The speed at which the memory pricing has increased has shocked everybody.” Gartner projects a 130% year-on-year DRAM price rise for 2026.
Here’s where things get genuinely strange. Counterpoint Research reports that DDR4 spot prices have hit $2.10 per gigabit — which now exceeds advanced HBM3e at $1.70 per gigabit. Old memory costs more than the cutting-edge stuff. That’s a legacy price inversion, and it tells you just how distorted this market has become.
The gap between contract and spot pricing matters here too. Enterprise buyers locked into quarterly contracts have some insulation. Those buying on the spot market are wearing the full brunt. If your procurement is on spot, you’re paying more for DDR4 than hyperscalers pay for HBM3e.
For why prices rose so sharply — the production mechanics — we’ve covered that in detail separately.
TrendForce projects the global memory market at $551.6 billion in 2026, rising to $842.7 billion in 2027. DRAM revenue alone is up 144% year-on-year, hitting $404.3 billion. NAND flash revenue is up 112% YoY at $147.3 billion.
Those numbers tell a specific story — revenue is surging because of price, not because more memory is shipping to consumer markets. The volume going to PCs and smartphones is actually shrinking. It’s the price per unit that’s driving everything.
Samsung regained the number one DRAM revenue position in Q4 2025. SK Hynix leads in HBM production. Micron’s HBM and cloud-related memory revenue grew from 17% of its DRAM revenue in 2023 to nearly 50% in 2025. Micron CEO Sanjay Mehrotra projects the HBM market will grow from $35 billion in 2025 to $100 billion by 2028.
Here’s the thing to keep in mind: what’s profit for Samsung, SK Hynix, and Micron is cost for everyone else. Every record revenue quarter for memory makers is a record bill-of-materials quarter for OEMs.
When IDC, Gartner, and Counterpoint Research all converge on the same direction, the signal is pretty clear: the PC and smartphone markets are contracting.
IDC projects the PC market will shrink 11.3% in 2026. Gartner forecasts 10.4%. For smartphones, Counterpoint Research projects shipments down 12% — “the sharpest decline on record.” IDC has smartphones down 12.9%; Gartner at –8.4%.
HP reported in its Q1 earnings call that memory now accounts for 35% of the cost to build a PC, up from 15–18% the previous quarter. That makes memory the single largest cost driver in PC manufacturing. Let that sink in — one component, 35% of the build cost.
At the product level, Lenovo, Dell, HP, Acer, and ASUS have all flagged 15–20% price hikes and are shortening quote validity windows. Some OEMs are shipping lower-SSD configurations just to keep headline prices down — less storage for more money.
The timing couldn’t be worse for enterprise IT. Windows 10 end-of-life in October 2025 triggered refresh cycles that are now colliding head-on with the shortage. And AI PCs require a minimum 16GB RAM — that’s Microsoft’s Copilot+ standard — so devices need more memory at precisely the moment memory is most expensive.
Not everyone is equally exposed, though. Apple and Samsung have long-term supply agreements that secure memory 12–24 months ahead. Budget Android OEMs don’t have that luxury and have no choice but to pass costs straight to consumers.
For a deeper look at how these price increases are hitting the PC and smartphone markets, we’ve covered the device-level impact separately.
The 2021 shortage was a supply-side disruption. Factories closed, logistics chains broke, and pandemic demand for work-from-home gear spiked. Hyperscalers bought up huge memory inventories, prices surged, then supply normalised.
The 2026 shortage is demand-driven. AI infrastructure demand for HBM is consuming wafer capacity that would otherwise produce conventional DRAM and NAND. That distinction matters: supply disruptions resolve when logistics normalise. Demand-driven shortages persist as long as the demand source keeps growing — and AI demand isn’t slowing down.
The setup traces back to the 2022–2023 memory bust. Samsung cut production by roughly 50%. After that bust, all three major memory companies pulled back on investment. IEEE Spectrum’s Thomas Coughlin put it bluntly: “There was little or no investment in new production capacity in 2024 and through most of 2025.”
Then the AI demand wave hit, and there was simply no spare capacity to absorb it.
By the numbers, 2026 is worse. Q1 2026 is delivering 90–95% QoQ increases — there’s no comparable single-quarter spike in the historical record. IDC characterises the situation as “not just a cyclical shortage driven by a mismatch in supply and demand, but a potentially permanent, strategic reallocation of the world’s silicon wafer capacity.”
McKinsey projects $7 trillion in data centre spending through 2030, with $5.2 trillion AI-focused. That demand concentration is why AI is driving this price surge.
Here are the data points from each firm that we haven’t covered above.
IDC: DRAM supply growth in 2026 is projected at just 16% YoY and NAND at 17% YoY — well below historical norms. IDC uses the phrase “structural reset” to describe what’s happening. That’s not the kind of language you want to hear from analysts.
Gartner’s Atwal: “Price is not only increasing in the short term… it’s going to remain high almost through to the end of 2027.”
Counterpoint Research places the earliest inflection point at Q4 2027.
Intel CEO Lip-Bu Tan, speaking at the Cisco AI Summit, was blunter: “There’s no relief until 2028.”
The fab construction timelines explain why. New capacity is coming — Micron Singapore (2027), SK Hynix Cheongju (2027), Samsung Pyeongtaek (2028), SK Hynix West Lafayette (end of 2028), Micron’s Onondaga County megafab (full production 2030). But building a fab and getting it to volume production takes 18 months or more. That structural lag is why new capacity can’t relieve the current shortage quickly.
For the full timeline analysis of when relief arrives, we cover the fab-by-fab breakdown in detail.
Yes. The data points in one direction for the next 18 months.
The only near-term supply lever is incremental process node upgrades within existing fabs. TrendForce notes that “gains in output can be achieved only through incremental process upgrades, making short-term capacity tightness difficult to resolve.” And it gets worse — NAND flash manufacturers are reallocating production lines to DRAM because it’s more profitable, which further squeezes NAND capacity. The two shortages are feeding each other.
Meanwhile, demand keeps climbing. NVIDIA’s B300 GPU uses eight HBM chips, each containing 12 DRAM dies. AMD’s MI350 follows the same architecture. Every new GPU generation ratchets up per-unit memory consumption.
There’s also a supply lock-up problem. North American hyperscalers — Meta, Google, Microsoft, Amazon — are negotiating long-term DRAM agreements with memory suppliers. Counterpoint Research’s Tarun Pathak put it directly: “A lot of these memory companies are asking smartphone vendors to stand in line behind the hyperscalers.”
And here’s the part that matters for your budget planning: memory prices historically fall more slowly than they rise. Economist Mina Kim of Mkecon Insights told IEEE Spectrum: “In general, economists find that prices come down much more slowly and reluctantly than they go up. DRAM today is unlikely to be an exception.”
Given the outlook, you’re going to want to keep tabs on this. Here are the primary sources we’ve referenced throughout.
TrendForce (trendforce.com) publishes quarterly DRAM and NAND contract price forecasts through its DRAMeXchange platform. Their February 2, 2026 press release is the definitive Q1 2026 pricing source.
IDC (idc.com) publishes quarterly PC and smartphone market trackers. Their December 2025 “Global Memory Shortage Crisis” report is the key public reference document.
Counterpoint Research (counterpointresearch.com) covers smartphone and memory market analysis with public-facing research summaries.
Gartner (gartner.com) publishes enterprise IT spending and semiconductor forecasts. The February 2026 data on shipment declines was reported through Computerworld.
IEEE Spectrum (spectrum.ieee.org) provides technical analysis and expert commentary. Samuel K. Moore’s “How and When the Memory Chip Shortage Will End” is the most comprehensive single source for the COVID-era comparison and fab timeline data.
NPI Financial (npi.ai) offers enterprise IT procurement advisory with pricing guidance for hardware buyers navigating the shortage.
For a complete picture of what the AI memory shortage means for tech companies — the root causes, who is winning and losing, and what to do about it — see the broader memory shortage story.
AI infrastructure demand for High-Bandwidth Memory (HBM) is eating up wafer capacity at Samsung, SK Hynix, and Micron — capacity that would otherwise produce the conventional DRAM that goes into PCs and smartphones. TrendForce reports DRAM prices surged 90–95% QoQ in Q1 2026 as a result.
Yes. Counterpoint Research puts the earliest inflection point at Q4 2027. Intel CEO Lip-Bu Tan stated “there’s no relief until 2028.” New fab capacity from Micron and SK Hynix doesn’t reach volume production until 2027 at the earliest.
Analyst consensus points to gradual relief beginning in late 2027 as new fab capacity comes online. But memory prices historically fall more slowly than they rise, so normalisation will lag behind supply additions. Gartner expects prices to remain high through to the end of 2027.
Memory is now the single largest cost component in many devices. HP reports memory represents 35% of PC bill of materials. With DRAM prices up 90–95% and NAND up 55–60%, OEMs including Lenovo, Acer, and ASUS have warned of 15–20% device price increases.
The 2021 COVID-era shortage was supply-side — factory closures, logistics breakdowns, pandemic-driven demand. The 2026 shortage is demand-driven — AI infrastructure is consuming wafer capacity. Demand-side shortages are harder to resolve because the demand source keeps growing.
HBM (High-Bandwidth Memory) stacks multiple DRAM dies vertically. Each NVIDIA B300 GPU requires eight HBM chips with 12 DRAM dies each. Manufacturing HBM consumes the same wafer capacity as conventional DRAM — every wafer allocated to HBM is a wafer that doesn’t go to a smartphone or laptop.
Based on fab construction timelines, meaningful new capacity arrives in 2027–2028 (Micron Singapore, SK Hynix Cheongju, Samsung Pyeongtaek). Full-scale production at major new fabs like Micron’s Onondaga County facility isn’t expected until 2030.
Contract prices are negotiated quarterly between memory suppliers and large buyers through long-term agreements (LTAs). Spot prices reflect what’s available on the open market. During shortages, spot prices spike faster and higher. Counterpoint Research noted DDR4 spot prices ($2.10/Gbit) now exceed HBM3e contract prices ($1.70/Gbit).
TrendForce projects the global memory market at $551.6 billion in 2026, with DRAM revenue up 144% year-on-year. The market is forecast to reach $842.7 billion in 2027.
Cloud service providers are themselves major memory buyers — North American hyperscalers are negotiating LTAs as of January 2026 — so memory cost inflation is baked into their infrastructure costs. Whether AWS, Azure, and GCP pass those costs through directly is an open question worth keeping an eye on.
NPI Financial advises locking in pricing early, as OEMs are shortening quote validity windows. Gartner’s Atwal recommends buying now: “Whatever you’re getting at the moment is going to be the best price.” No meaningful relief is expected before late 2027.
NAND flash prices rose 55–60% QoQ in Q1 2026, with enterprise SSD pricing up 53–58% QoQ per TrendForce. NAND market revenue is up 112% YoY. And NAND manufacturers are reallocating production lines to DRAM because it’s more profitable, which further limits NAND capacity.
How AI Killed the Memory Supply Chain and Why Everything Else Is Paying for It

In early 2026, hyperscaler capital expenditure commitments locked in memory allocation priorities — and conventional DRAM supply began its steepest collapse in a decade. The problem isn’t cyclical. It’s structural. AI needs a specialised form of memory called HBM (high-bandwidth memory), and making it cannibalises the supply of the DRAM that goes into everything else — your PCs, your smartphones, your enterprise servers.
Here’s the ratio you need to know: every HBM stack eats roughly three times the silicon wafer capacity of conventional DRAM. As IDC put it: “every wafer allocated to an HBM stack for an Nvidia GPU is a wafer denied to the LPDDR5X module of a mid-range smartphone or the SSD of a consumer laptop.” Three manufacturers control virtually all global memory production. They’ve all made the same allocation decision.
This article traces the full causality chain: from LLM architecture through fab economics to why your next hardware refresh costs more. If you want the full scope of the AI memory shortage, we’ve put together a broader overview.
Large language models are memory-bandwidth-limited. Not compute-limited — memory-bandwidth-limited. That distinction matters because it changes what you need to throw money at to make AI work.
The “memory wall” is the point where GPU compute throughput outpaces how fast data can be transferred from memory. Your GPU sits there ready to go, but data can’t reach it fast enough. Micron’s Sumit Sadana put it plainly: “the processor spends more time just twiddling its thumbs, waiting for data.”
Think of it as a motorway feeding into a single-lane road. You can add as many lanes to the motorway as you want — more GPUs, faster GPUs — but if the road into town stays one lane wide, traffic doesn’t move any faster.
This is why LLMs changed the game. Earlier AI architectures like CNNs were compute-hungry but didn’t ask that much of memory. LLMs load billions of parameters, attention weights, and intermediate states, and they need a sustained high-bandwidth data flow to keep it all moving. AI systems have evolved from large-model training to systems that combine inference, memory, and decision-making, and that evolution has pushed memory demand well past what anyone planned for.
Conventional DDR5 DRAM delivers roughly 50–60 GB/s of bandwidth per channel. HBM delivers 1–2 TB/s. That’s a 20–40x gap. You’re not bridging that with incremental improvements. HBM isn’t optional for AI at scale. It’s architecturally mandatory.
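To put that gap in concrete terms, here is a back-of-envelope calculation using the figures above. The midpoint values are illustrative choices, not device specifications:

```python
import math

# Approximate figures from the text: DDR5 delivers ~50-60 GB/s per
# channel, HBM ~1-2 TB/s per stack. Midpoints chosen for illustration.
ddr5_channel_gbs = 55     # midpoint of the 50-60 GB/s range
hbm_stack_gbs = 1500      # midpoint of the 1-2 TB/s range (1.5 TB/s)

# How many DDR5 channels would it take to match one HBM stack?
channels_needed = math.ceil(hbm_stack_gbs / ddr5_channel_gbs)
print(channels_needed)  # 28 -- far more than any CPU socket provides
```

Even at the optimistic end of the DDR5 range, matching a single HBM stack would take well over a dozen memory channels, which is why HBM reads as architecturally mandatory rather than optional.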
HBM takes multiple thinned-down DRAM dies — currently 8 to 12, moving to 16 — stacks them vertically, connects them with thousands of microscopic through-silicon vias (TSVs), and mounts the whole stack right next to the GPU inside the same package.
The stacking is what delivers the bandwidth. Data flows through thousands of parallel vertical connections simultaneously rather than through a narrow external bus. It works. But it comes at a cost that’s reshaping the entire memory market.
Here’s the ratio that matters most. Micron’s Sumit Sadana confirmed it: “As we increase HBM supply, it leaves less memory left over for the non-HBM portion of the market, because of this three-to-one basis.” A 12-die HBM stack consumes twelve dies’ worth of DRAM wafer capacity to produce one memory unit, where a conventional module uses a single die per package. Every wafer that goes to HBM is a wafer denied to DDR5, LPDDR5X, and everything else.
Samsung, SK Hynix, and Micron are all doing the same maths. HBM costs roughly three times more per gigabyte and makes up over 50% of the packaged cost of an AI GPU according to SemiAnalysis. The margins are better. The customers are richer. If you’re running a fab, the allocation decision is obvious. If you’re buying laptops, less so.
And it’s about to get worse. HBM4 moves to 16-die stacks. Nvidia’s Rubin GPU comes with up to 288GB of HBM4 per chip, with the NVL72 server rack combining 72 of those GPUs — over 20TB of HBM per rack. For a deeper look at who is profiting from HBM demand, we cover the manufacturer strategies separately.
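The arithmetic behind those headline figures is simple enough to check. This sketch uses only the die counts and capacities quoted above:

```python
# Die counts and capacities are the article's figures; nothing else assumed.
dies_per_hbm_stack = 12        # current generation; HBM4 moves to 16
hbm4_gb_per_gpu = 288          # Nvidia Rubin, per chip
gpus_per_nvl72_rack = 72

rack_hbm_tb = hbm4_gb_per_gpu * gpus_per_nvl72_rack / 1024
print(f"{rack_hbm_tb:.2f} TB of HBM4 per NVL72 rack")  # 20.25 TB

# One packaged HBM unit consumes 12 dies of wafer area where a
# conventional module's package consumes one -- before TSV overhead
# and yield loss push the effective per-bit cost toward the 3:1 ratio.
print(f"{dies_per_hbm_stack} dies of wafer area per packaged HBM unit")
```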
This wasn’t a gradual trend. It was a tipping point.
Combined cloud provider CapEx is expected to exceed $710 billion in 2026, with AI infrastructure as the dominant allocation. That locked in HBM demand at a scale that forced all three memory makers to make irreversible wafer allocation decisions at the same time.
SK Hynix reported its HBM, DRAM, and NAND capacity was “essentially sold out” for 2026. Micron’s Sadana was equally blunt: “We’re sold out for 2026.” Micron exited the consumer memory market entirely to focus on enterprise and AI.
The timing was a perfect storm. Windows 10 end-of-life drove PC refresh demand, AI PC specs required 16–32GB RAM, and supply contracted — all at once.
The price data tells the story. DRAM prices rose 80–90% in Q1 2026 according to Counterpoint Research. Enterprise DDR5 64GB RDIMMs more than doubled, from roughly $7/unit to $19.50/unit. Samsung raised 32GB consumer module prices 60%, from $149 to $239.
TrendForce analyst Tom Hsu called the Q1 2026 price increases “unprecedented.” Intel CEO Lip-Bu Tan was more direct: “There’s no relief until 2028.” Gartner’s Ranjit Atwal noted the speed at which memory pricing has increased “has shocked everybody”, with an expected 130% year-on-year rise in 2026.
For the price data quantifying this shift, we break down the numbers in detail.
Training is a one-time memory allocation. Inference is where the compounding demand lives, and it comes down to something called the KV cache.
During inference, the model stores intermediate attention results in the KV (key-value) cache — essentially the model’s “short-term memory.” During an active session, this KV cache sits in HBM. When the session goes idle, it gets pushed to slower memory. But while it’s active, it’s sitting in the most expensive, most contested memory on the planet.
The KV cache grows linearly with context window length. Frontier models now reach 128K to 1M tokens of context. Each user session maintains its own cache in GPU memory. So the maths is straightforward: more concurrent users multiplied by longer context windows equals more HBM consumed per deployment. Usage, context windows, and user counts are all growing at the same time — the demand is multiplicative.
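That multiplicative growth can be made concrete with a rough KV cache size estimate. The formula below is the standard per-token accounting (keys and values for every layer and head); the model dimensions are hypothetical, chosen to resemble a 70B-class model with grouped-query attention:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_value=2):
    """Per-session KV cache: keys and values (the factor of 2) for
    every layer and KV head, one entry per token, at FP16 precision.
    Grows linearly with context length."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

# Hypothetical model: 80 layers, 8 KV heads, 128-dim heads, FP16.
per_session = kv_cache_bytes(80, 8, 128, context_len=128_000)
print(f"{per_session / 2**30:.1f} GiB per 128K-token session")   # 39.1 GiB

# 1,000 concurrent sessions at that size:
print(f"{1000 * per_session / 2**40:.1f} TiB across the fleet")  # 38.1 TiB
```

Double the context window or double the concurrent users and the HBM footprint doubles with it, which is why inference demand compounds rather than plateaus.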
It gets worse with agentic AI. RAG, tool use, and long-running agent conversations all extend effective context and therefore memory footprint. This is why hyperscalers keep ordering more. It’s not hoarding. Actual usage growth consumes every gigabyte delivered.
Building a new semiconductor fab takes 3–5 years from ground-breaking to volume production. You cannot buy your way out of physics.
Here’s what’s in the pipeline. Micron has a Singapore fab reaching production in 2027, a Taiwan fab in H2 2027, and a New York megafab not in full production until 2030. SK Hynix has a South Korea fab completing in 2027 and Indiana facilities by end of 2028. Samsung plans a new Pyeongtaek plant in 2028. Billions of dollars of investment. None of it provides meaningful relief before 2027 at the earliest.
Even existing fabs face yield problems. Drilling thousands of TSVs per die and achieving sub-micron alignment across 12–16 stacked dies pushes manufacturing tolerances to their limits. Advanced packaging is a separate bottleneck — even if you had more HBM dies, packaging them using techniques like CoWoS requires specialised capacity that’s also constrained.
After the 2022 bust, memory companies were wary of expanding. As memory expert Thomas Coughlin noted, “there was little or no investment in new production capacity in 2024 and through most of 2025.” That caution is now colliding with a demand shock nobody prepared for.
Micron predicts the total HBM market will grow from $35 billion in 2025 to $100 billion by 2028 — larger than the entire DRAM market in 2024. For a detailed look at when new fabs can resolve the supply crunch, we cover the timelines separately.
The downstream numbers are already in. PC shipments will fall 10.4–11.3% in 2026 according to Gartner and IDC. Smartphone shipments are forecast to decline 8.4–12.9%. PC prices are expected to rise 17%, with memory now accounting for 23% of PC bill-of-materials cost, up from 16% in 2025.
HP said memory now accounts for 35% of the costs to build a PC, up from 15–18% the previous quarter. Dell COO Jeff Clarke warned “I don’t see how this will not make its way into the customer base” — and followed through, raising PC prices in January after cutting them just months earlier.
Lenovo, Dell, HP, Acer, and ASUS have warned clients of 15–20% price hikes and contract resets. Enterprise refresh cycles are extending by 15% as organisations delay purchases — but deferring just creates a demand backlog that amplifies future price pressure when those purchases can’t wait any longer.
On the smartphone side, LPDDR5x prices are forecast to jump roughly 90% quarter-on-quarter — the steepest increase in their history according to TrendForce. Rising memory costs could make low-end devices economically unsustainable, with some budget models pulled from the market altogether.
The procurement landscape has shifted structurally. Hyperscalers lock in supply through long-term reservations and direct fab investments. Mid-market firms rely on shorter contracts and spot sourcing. Gartner’s Ranjit Atwal summed it up: “Vendors aren’t guaranteeing prices for long now. They’re saying this is the price and it’s available for two or three weeks.” His advice: buy now, because whatever you’re paying today is likely the best price you’ll see for a while.
For our overview of the memory crisis, we put it all in context.
No. The COVID shortage was demand surging across many chip categories at the same time. The 2026 shortage is structurally different: it’s a deliberate wafer reallocation within the memory industry from conventional DRAM to HBM, driven by AI economics. The constraint is internal to the memory market, not external.
SemiAnalysis estimates HBM costs roughly three times more per gigabyte. It also represents over 50% of the total packaged cost of an AI GPU — the single most expensive component in AI accelerator hardware.
DDR5 delivers roughly 50–60 GB/s per channel. HBM delivers 1–2 TB/s. LLMs need sustained high-bandwidth data flow to keep GPU compute units busy. Using DDR5 would leave GPUs idle most of the time, making the compute investment largely wasted.
No meaningful price relief is expected before mid-2027 at the earliest, with structural pressure continuing through 2028. Intel CEO Lip-Bu Tan put it bluntly: “There’s no relief until 2028.” Deferring risks higher prices and longer lead times as deferred demand creates a backlog.
The KV cache stores intermediate computation results from LLM attention mechanisms during inference. It grows linearly with context window length and has to be maintained separately for each concurrent user session. As context windows expand and user counts grow, the KV cache becomes a major driver of HBM consumption.
Three companies — Samsung, SK Hynix, and Micron — manufacture virtually all of the world’s DRAM. Their wafer allocation decisions directly determine supply availability for every device category globally.
Chinese manufacturers such as ChangXin Memory Technologies are catching up but remain years behind. U.S. export controls restrict access to advanced lithography and packaging equipment needed for HBM production. No meaningful supply relief is expected in the 2026–2028 timeframe.
Cloud providers are themselves major HBM buyers with rising infrastructure costs. While long-term supply agreements partially insulate hyperscalers from spot volatility, increased memory costs will eventually flow through to cloud service pricing.
Advanced packaging — techniques like CoWoS and hybrid bonding — integrates HBM stacks and GPU dies into a single package. This capacity is limited and requires specialised equipment. Even if more HBM dies were manufactured, packaging constraints limit how quickly finished modules reach the market.
DRAM prices rose 80–90% in Q1 2026 alone (Counterpoint Research). Enterprise DDR5 64GB RDIMMs more than doubled, from roughly $7/unit to $19.50/unit. Samsung raised 32GB consumer modules 60%, from $149 to $239.
Rubin uses up to 288GB of HBM4 per chip, with the NVL72 rack combining 72 GPUs — over 20TB of HBM per rack. HBM4 moves to 16-die stacks, which means even more wafer consumption per unit. Each new GPU generation deepens structural HBM demand.
Some companies are exploring alternatives. Majestic Labs is designing an inference system with 128TB of memory using non-HBM architectures. But these involve significant performance trade-offs and aren’t proven at scale. For mainstream AI training and inference, HBM remains a hard architectural requirement.
What Is the AI Governance Gap and How Do You Close It

The percentage of companies integrating AI into at least one business function surged to 72% in 2024, up from 55% the year before. Governance has not kept pace. Only 25% of organisations have fully operational AI governance programmes, and 76% say AI is moving faster than their governance can handle.
The distance between those two realities — rapid AI adoption on one side, immature governance on the other — is the AI governance gap. It creates exposure across data security, regulatory compliance, and operational accountability.
This page is a structured overview of the governance gap: what it is, where the exposure concentrates, and how to close it. The sections below cover shadow AI and why it differs from shadow IT, mid-market exposure patterns, the gap between policy and execution, what mature operating models look like, how to measure governance effectiveness, and what regulators now require. Each section links to a dedicated article where you can go deeper.
The AI governance gap is the measurable distance between your organisation’s rate of AI adoption and the maturity of the governance frameworks managing that AI. AI tool adoption surged from 55% to 72% of organisations between 2023 and 2024. Governance has not kept pace — nearly 74% of organisations report only moderate or limited coverage in their AI risk and governance frameworks. The gap creates real exposure across data security, regulatory compliance, and operational accountability.
The gap matters now because governance has shifted from aspiration to enforcement. For most of the last decade, AI governance was treated as a matter of intent — write a policy, signal good faith, move on. That stopped working in 2025 when regulators moved from guidance to enforcement.
The distinction to understand is between a policy and a framework. 75% of organisations have a written AI policy, but only 36% have adopted a formal governance framework. A policy is a document. A framework is an operational system with enforcement, accountability, and monitoring. The distance between those numbers — 75% and 36% — is where most organisations currently sit: documented intent without operational execution. That produces what governance practitioners call “governance theatre” — checkbox compliance that generates paperwork without reducing risk.
The most visible symptom of this gap is shadow AI. Understanding what shadow AI is and why it differs from shadow IT explains why this new threat is structurally harder to contain than its predecessor.
For a deeper look at the scale of the governance gap problem, see Shadow AI vs Shadow IT — What Makes the New Threat Harder to Govern. The difficulty of governing shadow AI is compounded when organisations lack the structural resources to detect it.
Shadow AI — AI tools used without organisational knowledge or approval — is harder to govern than traditional shadow IT because the exposure is less visible and potentially irreversible. A shadow SaaS tool creates an integration risk you can unwind. A shadow AI tool can ingest, analyse, and generate content from sensitive data in a single interaction. The data leaves your perimeter the moment the prompt is submitted.
71% of office workers use AI tools without IT approval, and OpenAI accounts for 53% of all shadow AI usage — more than the next nine platforms combined. At smaller firms, the density is even higher: companies with 11–50 employees average 269 unsanctioned AI tools per 1,000 employees.
Prohibitive bans do not work. As IBM Distinguished Engineer Jeff Crume puts it, “saying no doesn’t stop the behaviour, it just drives it underground”. Governance has to enable sanctioned use while containing unsanctioned exposure — which means publishing an approved tool list that gives people enterprise-grade alternatives to the tools they are already using.
For the full evidence base and practical containment strategies, read Shadow AI vs Shadow IT — What Makes the New Threat Harder to Govern.
Smaller organisations face disproportionate shadow AI risk for structural reasons, not because they are less competent. Accountability for AI governance is fragmented — CIOs hold it at 29% of firms, CDOs at 17%, CISOs at 14.5% — with no clear mandate for the technical executive running delivery. Governance tooling designed for enterprises with dedicated compliance staff does not translate to resource-constrained teams.
Companies with 11–50 employees show the densest shadow AI usage relative to headcount, yet only 23% of small organisations have a dedicated team driving generative AI adoption, compared to 52% of large enterprises. The smaller the firm, the larger the gap relative to capacity. Understanding how mid-market companies face disproportionate shadow AI exposure — and why the CTO accountability gap is most acute at this scale — is the starting point for right-sizing governance to the actual organisation.
For the full mid-market analysis, see Shadow AI in Mid-Market Companies — Why the Exposure Is Disproportionate.
An AI policy is a written statement of rules. An AI governance framework is the operational system that makes those rules real — roles, controls, monitoring, and measurement. Having a policy without a framework is the most common state: 75% of organisations have a written AI policy, but only 36% have adopted a formal governance framework. The distance between those numbers is the governance execution gap, and it is where most organisations currently sit.
The execution gap has four failure modes: role ambiguity (nobody owns enforcement), policy staleness (rules written for last year’s tooling), measurement absence (no way to know whether controls are working), and governance theatre (documentation that looks like control without providing it). An AI governance policy without enforcement mechanisms is a wish list.
Structurally, governance operates across five interlocking domains — Strategy, Compliance, Operations, Ethics, and Accountability. A strategy decision to prioritise a high-risk use case triggers compliance review, operations readiness, ethics evaluation, and accountability assignment simultaneously. What moves governance from policy to programme is an oversight function — an AI governance committee with clear RACI assignments, decision rights, and an operating cadence. That committee turns written rules into daily practice. It does not need to be large. It needs to be accountable. The practitioner’s execution playbook for moving from AI policy to AI practice covers every step of this transition in detail.
For the practitioner’s execution playbook, read From AI Policy to AI Practice — How to Build Governance That Actually Executes.
A mature AI operating model embeds governance into the daily mechanics of AI development and deployment. Governance leaders are 2.5x more likely to embed AI as a core pillar of business strategy. Their operating models include executive ownership proximity, a structured oversight function, risk-tiered use-case management, and continuous monitoring — not an annual audit. Laggards, by contrast, have written policies with diffuse or absent responsibility; at 72% of organisations, the CEO has no direct oversight of AI governance.
Only 7% of organisations have fully embedded AI governance. Most sit at the Ad Hoc or Developing stages of a five-level maturity progression (Ad Hoc, Developing, Defined, Managed, Optimising). The pattern that produces the largest governance gaps is high adoption maturity with low governance maturity — you are deploying AI widely but governing it loosely.
Three governance model archetypes exist: centralised (one oversight body sets policy across the enterprise), federated (local governance per business line, coordinating centrally), and hybrid (central policy with federated execution). The hybrid model is the default for scaling organisations because it balances consistency with operational flexibility. How governance leaders differ from laggards comes down to how deliberately they have designed this operating model, not just how much they have documented.
For the strategic architecture of governance leadership, read The AI Operating Model — What Separates Governance Leaders from Laggards.
Governance that employees follow is governance that makes the right path the easy path. Start with an AI tool inventory, then publish an approved AI tool list that provides enterprise-grade alternatives to the shadow tools already in use. Role-based access controls create the technical enforcement layer. Distributed enablement — AI champions embedded in teams — creates the cultural adoption layer, bridging the gap between central policy and frontline practice.
The communication gap is wide: 78% of organisations have not communicated a clear plan for AI integration, and 58% of employees have not received formal training on safe AI use. Governance that nobody knows about is governance that nobody follows.
An intake-to-value mechanism — a structured approval process for new AI use cases — keeps governance proportional to organisational capacity. Instead of blanket rules, each proposed use case is assessed against risk tier, data sensitivity, and regulatory scope, then routed to the appropriate approval path. This keeps governance from becoming a bottleneck while maintaining oversight where it counts. How to build AI governance that actually executes — with role-based controls, lightweight approval workflows, and shadow AI detection — is covered step-by-step in the dedicated execution guide.
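As a rough illustration, intake routing of this kind can be expressed as a simple rule. The tier names, fields, and approval paths below are hypothetical placeholders, not a prescribed schema:

```python
def route_use_case(risk_tier: str, handles_sensitive_data: bool,
                   in_regulatory_scope: bool) -> str:
    """Route a proposed AI use case to an approval path based on
    risk tier, data sensitivity, and regulatory scope."""
    if risk_tier == "high" or in_regulatory_scope:
        return "full governance committee review"
    if risk_tier == "medium" or handles_sensitive_data:
        return "expedited review by the accountable owner"
    return "self-service approval against the approved tool list"

# A low-risk content-drafting tool touching no sensitive data:
print(route_use_case("low", False, False))
# -> self-service approval against the approved tool list
```

The point of the sketch is the shape, not the thresholds: most requests flow through lightweight paths, and committee time is reserved for the cases where it materially reduces risk.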
For the step-by-step execution guide, read From AI Policy to AI Practice — How to Build Governance That Actually Executes.
Fewer than 20% of organisations track well-defined GenAI KPIs. Without measurement, governance effort cannot be verified, improved, or demonstrated to regulators or boards. As IBM Distinguished Engineer Jeff Crume notes, “it’s pretty hard to know if you’re succeeding if you’ve never even defined the benchmarks”.
A working governance programme tracks four core indicators: Policy Compliance Rate (what percentage of AI use cases are governed by approved policies), Incident Response Time (how quickly governance failures are contained), Use Case Review Cycle Time (how efficiently new AI deployments are approved), and Model Coverage (what percentage of production AI systems are fully documented).
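As a minimal sketch, two of those indicators can be computed directly from raw counts. The field names and figures here are illustrative, not a recommended data model:

```python
from dataclasses import dataclass

@dataclass
class GovernanceSnapshot:
    """Illustrative counts behind two of the four core indicators."""
    ai_use_cases_total: int
    ai_use_cases_governed: int           # covered by an approved policy
    production_models_total: int
    production_models_documented: int    # fully documented (e.g. model cards)

    @property
    def policy_compliance_rate(self) -> float:
        return self.ai_use_cases_governed / self.ai_use_cases_total

    @property
    def model_coverage(self) -> float:
        return self.production_models_documented / self.production_models_total

snap = GovernanceSnapshot(40, 30, 12, 9)
print(f"compliance {snap.policy_compliance_rate:.0%}, "
      f"coverage {snap.model_coverage:.0%}")
# compliance 75%, coverage 75%
```

Incident response time and review cycle time would come from timestamps rather than counts, but the principle is the same: each KPI reduces to data the programme already generates if it is actually operating.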
The shift in 2025–2026 is from policy to proof. Regulators, enterprise buyers, and insurers are now asking for demonstrated governance, not stated intent. Governance that cannot produce evidence of its own operation — audit trails, model cards, incident logs — will not satisfy a regulatory or due-diligence inquiry. Continuous monitoring is what separates governance from periodic audit compliance. Building a governance measurement framework that tracks both operational effectiveness and regulatory evidence is the next step once execution foundations are in place.
For the full measurement framework, read How to Measure Whether Your AI Governance Is Actually Working.
The EU AI Act is legally binding and applies to any organisation placing AI on the EU market — regardless of where that organisation is headquartered. For SaaS companies serving EU customers, any AI-powered feature used by EU-based users is potentially in scope. High-risk provisions are fully in force by August 2026. US states have begun regulating AI in parallel, creating overlapping obligations for multi-state SaaS companies.
The EU AI Act classifies AI systems into four risk tiers: Unacceptable (prohibited), High (conformity assessments, audit trails, human oversight), Limited (disclosure requirements), and Minimal (few requirements). Where your deployments land on that scale depends on use case — most mid-market SaaS features will sit in the Limited or High-risk tiers.
In the US, states have begun regulating AI in the absence of federal legislation. Colorado, Texas, Illinois, and California all have laws taking effect in 2026, creating overlapping obligations for multi-state SaaS companies. Governance frameworks like NIST AI RMF and ISO/IEC 42001 serve as the operational backbone for satisfying multiple regulatory requirements simultaneously. NIST AI RMF provides the operational structure; ISO/IEC 42001 certification provides portable regulatory evidence. Understanding what the EU AI Act and US state laws require now — including the enforcement timeline and penalty exposure — is the external forcing function that makes governance investment non-negotiable.
For the full regulatory analysis, read The Regulatory Forcing Function — What EU AI Act and US State Laws Require Now.
Start with visibility. You cannot govern AI you cannot see. The first step is an AI tool inventory — a complete catalogue of all AI tools in use across the organisation, including tools employees have adopted without approval. From that inventory, classify tools by risk tier, establish an oversight function with defined decision rights, and publish an approved tool list. Governance that skips visibility and goes straight to policy produces enforcement without foundation.
From that baseline, the path is sequenced: AI tool inventory first, then risk classification, an oversight committee with defined decision rights, an approved tool list, and basic monitoring. KPI tracking, model documentation, and regulatory alignment come once those foundations are operational.
The gap is widening. But governance is not a one-time project — it is an operational capability you build incrementally. The organisations that will be best positioned in 2026 are those that build evidence instead of narratives and normalise assurance instead of treating it as exceptional.
For the execution playbook, start with From AI Policy to AI Practice — How to Build Governance That Actually Executes. For the accountability infrastructure, read How to Measure Whether Your AI Governance Is Actually Working.
Shadow AI vs Shadow IT — What Makes the New Threat Harder to Govern — Why the most visible symptom of the governance gap is harder to contain than its predecessor, with evidence on how far it has already spread.
Shadow AI in Mid-Market Companies — Why the Exposure Is Disproportionate — How shadow AI risk accumulates differently at 50–500 employee companies and why accountability fragmentation is most acute at this scale.
The Regulatory Forcing Function — What EU AI Act and US State Laws Require Now — Why regulatory pressure is converting AI governance from aspiration to legal requirement and what the enforcement timeline means for your organisation.
From AI Policy to AI Practice — How to Build Governance That Actually Executes — The practitioner’s execution playbook: role-based access controls, distributed enablement, lightweight approval workflows, and shadow AI detection.
The AI Operating Model — What Separates Governance Leaders from Laggards — What the strategic architecture of mature AI governance looks like and the operating model choices that distinguish high-performing organisations.
How to Measure Whether Your AI Governance Is Actually Working — The measurement framework for governance execution quality: KPIs that matter, audit trail infrastructure, and how to demonstrate governance effectiveness.
Governance theatre is the failure mode where organisations generate documentation — policies, frameworks, reports — without building the operational controls that make governance real. The signals: high self-reported compliance alongside frequent AI-related incidents, and risk reviews completed on paper but never enforced in practice. You avoid it by assigning accountability to named people (not functions), deploying monitoring that runs continuously, and measuring operational outcomes rather than documentation volume.
Yes. The EU AI Act has extraterritorial scope — it applies to any organisation placing AI systems on the EU market, regardless of headquarters location. For SaaS companies serving EU customers, any AI-powered feature accessible to EU-based users is potentially in scope. High-risk AI provisions are fully in force by August 2026. See The Regulatory Forcing Function for the full analysis.
An AI policy is a written document stating what is permitted and prohibited. An AI governance framework is the operational system that makes the policy enforceable — enforcement mechanisms, accountability structures, monitoring capabilities, and measurement processes. Having a policy without a framework is the most common state: 75% of organisations have written policies, but only 36% have a governance framework.
NIST AI RMF is a voluntary US framework organised around four functions (Govern, Map, Measure, Manage) that provides a practical structure for AI risk management. ISO/IEC 42001 is the international standard for AI management systems, where certification provides documented evidence satisfying multiple EU AI Act requirements. They are complementary: NIST for operational structure, ISO/IEC 42001 for portable regulatory evidence.
Risk-tiered governance matches oversight controls to the criticality of each AI use case. Low-risk use cases (content drafting, scheduling) require minimal controls; high-risk use cases (hiring decisions, credit scoring) require conformity assessments, audit trails, and continuous monitoring. For resource-constrained organisations, tiering is what makes governance feasible — it allocates effort where it materially reduces risk rather than applying enterprise-level controls to everything.
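A tiering policy like this can be encoded as a simple lookup so intake triage stays consistent. A minimal sketch; the category names and control lists are illustrative assumptions, not a regulatory taxonomy:

```python
# Hypothetical mapping from use-case category to risk tier.
HIGH_RISK = {"hiring", "credit_scoring", "health_diagnostics", "biometrics"}
LOW_RISK = {"content_drafting", "scheduling", "internal_search"}

# Controls required at each tier, scaled to criticality.
CONTROLS = {
    "high": ["conformity_assessment", "audit_trail", "continuous_monitoring"],
    "medium": ["documented_review", "periodic_audit"],
    "low": ["inventory_entry"],
}

def classify(use_case: str) -> str:
    """Assign a risk tier to an AI use case."""
    if use_case in HIGH_RISK:
        return "high"
    if use_case in LOW_RISK:
        return "low"
    return "medium"  # anything unclassified gets a documented review by default

def required_controls(use_case: str) -> list[str]:
    """Look up the control set for a use case's tier."""
    return CONTROLS[classify(use_case)]

print(required_controls("hiring"))
# ['conformity_assessment', 'audit_trail', 'continuous_monitoring']
```

The useful property is the default: an unclassified use case falls to the middle tier rather than silently escaping review.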
A minimum viable programme — AI tool inventory, risk classification, oversight committee, approved tool list, and basic monitoring — can be operational within 90 days if treated as a structured project. More comprehensive programmes including full KPI tracking, model card documentation, and regulatory alignment typically require six to twelve months. The priority is reaching operational visibility before investing in measurement infrastructure. See From AI Policy to AI Practice for the execution sequence.
The Regulatory Forcing Function — What EU AI Act and US State Laws Require Now

AI governance is no longer something you can slot into next quarter’s roadmap. It is a legal obligation with dated deadlines and real penalties — and some of those deadlines have already passed. The EU banned manipulative AI, social scoring, and workplace emotion detection in February 2025. That is done. It is law.
The EU AI Act’s staggered enforcement timeline means high-risk system requirements kick in from August 2026. Meanwhile, US states — California, Texas, Illinois, Colorado — have gone ahead and enacted their own AI laws. Most took effect on January 1, 2026. So you are now dealing with a multi-jurisdictional patchwork, and it is only going to get thicker.
This article maps the concrete obligations, timelines, and penalties across both jurisdictions — and lays out the business case for acting now rather than later. If you want the broader picture on the AI governance gap regulators are now targeting, start there.
The EU AI Act (Regulation EU 2024/1689) is the world’s first comprehensive AI regulation. It sorts AI into risk buckets: unacceptable (banned), high-risk (strict obligations), limited risk (transparency rules), and minimal risk (largely left alone).
There are four enforcement dates. The first one has already come and gone:
February 2, 2025 — Prohibited AI practices banned outright. Manipulative AI, social scoring, real-time biometric surveillance in public spaces, workplace emotion detection. AI literacy requirements now active for providers and deployers.
August 2, 2025 — GPAI model rules come into force. Providers need technical documentation, copyright policies, and training data summaries.
August 2, 2026 — High-risk AI obligations become fully enforceable. Conformity assessments, risk management systems, human oversight, post-market monitoring, EU Database registration, CE marking — all required before you can place a high-risk system on the market. These assessments typically take six to twelve months. If you have not started, August 2026 is already tight.
August 2, 2027 — Everything else kicks in.
The penalties are serious: up to EUR 35 million or 7% of global annual turnover for prohibited practice violations. EUR 15 million or 3% for other infringements. That exceeds GDPR’s maximum fines of EUR 20 million or 4% of turnover.
Two things worth knowing about scope. First, the Act applies extraterritorially — if your AI touches the EU market, you must comply, no matter where your company is incorporated. Second, the provider/deployer distinction matters. Most SaaS companies are deployers. But if your product involves employment decisions, creditworthiness assessment, health diagnostics, or biometric processing, it is almost certainly high-risk under Annex III. That means conformity assessments and human oversight — whether you built the model or not.
There is no comprehensive federal AI legislation. The states have filled the gap — over 1,000 AI-related bills were introduced in 2025 alone. Here are the ones that actually matter.
California SB 53 (effective January 1, 2026). Frontier model developers need to publish risk frameworks and report safety incidents. Penalties go up to $1 million per violation — though it targets developers pulling in revenue above $500 million.
California AB 2013 (effective January 1, 2026). Generative AI developers must disclose training data sources, types, and copyright status.
Texas TRAIGA (HB 149, effective January 1, 2026). Here is the one to pay attention to: compliance with the NIST AI Risk Management Framework constitutes an affirmative defence. Adopt a governance framework, get legal safe harbour. Penalties run $80,000 to $200,000 per violation.
Illinois HB 3773 (effective January 1, 2026). Bans discriminatory AI in employment decisions. And here is the kicker — it includes a private right of action. That is the only state AI law where plaintiffs can sue you directly. If you use AI anywhere in hiring or workforce decisions, this is where your litigation exposure lives.
Colorado SB 24-205 (effective June 30, 2026). The first comprehensive US state statute going after high-risk AI systems. Requires impact assessments and consumer disclosures. Penalties up to $20,000 per violation. Impact assessments take months — June 2026 is closer than it looks.
What about federal preemption? The Trump Administration’s December 2025 Executive Order signalled intent to challenge state regulation, but an executive order cannot overturn existing state law. The Senate already voted down a provision that would have barred states from enforcing AI regulations for ten years.
The practical advice: comply with the strictest applicable standard now rather than gamble on preemption that may never arrive.
How you talk about your AI practices is itself a regulatory surface. If your company claims to use AI responsibly or to have governance frameworks in place — and those claims are not backed by what you actually do — you have a problem.
The SEC treats AI claims in investor materials, filings, and marketing as material representations. After the landmark 2024 settlements against Delphia and Global Predictions, the SEC’s 2026 Examination Priorities specifically target AI disclosures. The FTC is in on it too, going after deceptive AI capability claims.
What this means in practice: governance is not just an internal exercise. It is an external disclosure risk. What you say about your AI practices has to be verifiable. The gap between governance claims and governance reality is now an enforcement target.
If you want to understand how measurement satisfies compliance obligations, start there — that is where documentation becomes your defence.
SEC enforcement targets what you say about governance. Shadow AI undermines the governance you actually have.
Shadow AI — employees using AI tools without IT approval — introduces unmonitored data flows that compound your compliance obligations. An ISACA survey of 561 European professionals found 83% believe employees use AI without policy coverage. Only 31% have a formal AI policy in place. This is not some edge case. It is the norm.
GDPR requires documented data processing activities and a lawful basis for processing. Shadow AI tools processing personal data outside sanctioned channels violate these requirements without the organisation knowing. HIPAA’s requirements for protected health information controls fall apart when employees use consumer AI tools to process patient data. SOC 2 trust principles assume known system boundaries — shadow AI pushes processing well beyond those boundaries.
The compounding effect is the real worry. Each unsanctioned tool multiplies the compliance surface without increasing your compliance capacity. You are not failing at one regulation — you are silently failing at several at once. Fewer than half of organisations (47%) have adopted formal AI risk management frameworks.
Your regulatory liability does not disappear because IT did not approve the tool. The company bears the liability regardless. That is why governance execution is now a legal requirement.
Governance debt is the gap between how fast you are adopting AI and how mature your governance actually is. Think of it like technical debt — except instead of slower deployments, you get regulatory penalties and financial consequences.
David Talby, CTO of John Snow Labs, puts it directly: “Governance debt will become visible at the executive level. Organisations without consistent, auditable oversight across AI systems will face higher costs, whether through fines, forced system withdrawals, reputational damage, or legal fees.”
The cost breaks into three parts.
Penalty exposure. The EU, California, Texas, Colorado, and Illinois penalties are additive across jurisdictions. One governance failure can trigger enforcement under several laws at the same time.
Remediation cost. Building governance after an enforcement action costs multiples of doing it proactively. Retroactive compliance means documenting and correcting decisions you have already made. That is always harder and always more expensive.
Operational cost. Incident response, legal engagement, crisis management, customer notification. The average data breach costs $4.45 million — and that is before reputational damage.
The financial argument is pretty straightforward: building governance now is a fraction of what governance debt costs when it comes due.
The regulatory forcing function gives you the strongest argument for governance investment you have ever had: external deadlines with quantified penalties that turn “we should” into “we must.”
Timeline pressure. EU AI Act high-risk obligations go live August 2, 2026. US state laws have been in effect since January 1, 2026. Colorado impact assessments are due June 30, 2026. These are enforceable dates.
Penalty quantification. Present the aggregate exposure across jurisdictions — the EU, US state, and SEC enforcement figures laid out above. Your legal team can calculate the specific exposure for your operational footprint.
Framework leverage. NIST AI RMF addresses multiple regulatory surfaces with one framework investment. Organisations aligned with NIST AI RMF find that cost avoidance from risk prevention often exceeds governance programme costs within two years.
Competitor positioning. In regulated industries, transparency and explainability are increasingly market-access requirements. Compliance failures under the EU AI Act can block product sales in entire markets. Governance becomes a market-access credential, not just a cost centre.
Frame it for your board as what governance costs now versus what deferred governance costs when an audit lands on your desk. Let the numbers do the talking. If you need to know how to build the governance execution that regulators require and how measurement satisfies compliance obligations, those are your next steps.
The regulatory landscape has shifted from voluntary frameworks to enforceable obligations with dated deadlines and real penalties. The EU AI Act and US state laws are not parallel — they are additive. If you operate across jurisdictions, your obligations compound, and you cannot address them piecemeal.
Governance debt is accumulating right now. The question is whether you invest proactively at a planned cost or reactively at penalty cost. The numbers favour doing it now.
Start with the gap between AI policy and AI practice, then move to how to build the governance execution that regulators require.
Yes. The EU AI Act has extraterritorial scope. If your AI systems are placed on the EU market or your AI outputs affect people within the EU, you must comply — regardless of where your company is headquartered. US-based SaaS, FinTech, and HealthTech companies with EU customers are in scope.
High-risk AI systems operate in regulated domains — employment, healthcare, credit assessment, law enforcement — and must meet strict obligations including conformity assessments, risk management systems, and human oversight. GPAI models are large-scale models adaptable to many tasks, and they come with different transparency and documentation requirements.
Yes. Regulatory obligations attach to the organisation, not to individuals. If an employee uses an unsanctioned AI tool that processes personal data in violation of GDPR, or makes employment decisions using AI in violation of Illinois HB 3773, the company bears the legal liability. IT not knowing about it does not change that.
If your product involves AI-driven decisions in employment, creditworthiness assessment, health diagnostics, biometric identification, or critical infrastructure management, it is likely high-risk under Annex III. Classification depends on what your product does, not on what technology stack it is built on.
NIST AI RMF 1.0 is a voluntary US federal framework with four functions: Govern, Map, Measure, and Manage. Aligning with it gives you an explicit affirmative defence under Texas TRAIGA — legal safe harbour in at least one jurisdiction while providing a foundation that maps well to EU AI Act obligations.
No. All enacted state laws remain enforceable until courts rule otherwise. The FTC was directed to issue a preemption statement by March 11, 2026, but that would not automatically invalidate existing state laws. Comply now.
AI washing means making unsubstantiated claims about AI capabilities or governance practices — think of it as greenwashing but for AI. The SEC treats AI claims in investor materials and marketing as material representations. If what you actually do does not match what you say, that is a potential securities law violation.
Colorado SB 24-205 (effective June 30, 2026) goes after high-risk AI systems, requiring reasonable care to prevent algorithmic discrimination, impact assessments, and consumer disclosures. It focuses specifically on algorithmic discrimination rather than the EU’s broader safety framework. Penalties are up to $20,000 per violation versus EUR 35 million or 7% of turnover under the EU Act.
Conformity assessments require technical documentation, risk management systems, data governance controls, and human oversight mechanisms. For a mid-sized company, expect six to twelve months — which means the August 2, 2026 deadline requires action now, not next quarter.
EU AI Act: up to EUR 35 million or 7% of global turnover for prohibited practice violations; EUR 15 million or 3% for other infringements. Colorado: up to $20,000 per violation. Texas TRAIGA: $80,000 to $200,000 per violation. California SB 53: up to $1 million per violation. Illinois HB 3773: civil liability through private right of action. These are additive across jurisdictions — one governance failure can trigger penalties under multiple laws.
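Because the penalties stack across jurisdictions, worst-case exposure is simple arithmetic. A sketch using the per-violation US state caps quoted above; the violation counts are a hypothetical scenario, not a prediction (the EU cap is a regime-level maximum, so it is not summed here):

```python
# Per-violation penalty caps cited in the article (USD).
STATE_CAPS = {
    "Colorado SB 24-205": 20_000,
    "Texas TRAIGA (upper)": 200_000,
    "California SB 53": 1_000_000,
}

# Hypothetical scenario: one governance failure triggering enforcement
# in three states simultaneously, with an assumed violation count each.
violations = {
    "Colorado SB 24-205": 5,
    "Texas TRAIGA (upper)": 5,
    "California SB 53": 1,
}

# Worst-case aggregate: cap times count, summed across jurisdictions.
state_exposure = sum(STATE_CAPS[law] * n for law, n in violations.items())
print(f"Aggregate US state exposure: ${state_exposure:,}")
# Aggregate US state exposure: $2,100,000
```

Even a modest violation count produces seven-figure exposure before EU fines, Illinois private-action liability, or remediation costs enter the picture.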
How to Measure Whether Your AI Governance Is Actually Working

Here is a number that should bother you: 72% of enterprises now formally measure Gen AI ROI. Measuring productivity? Sorted. But ask those same companies whether their AI governance is actually working — whether policies are being followed, whether anyone can prove compliance — and the room goes quiet. Only 32% operate at what Acuvity calls “measured effectiveness” on governance. The other 68% either track nothing at all or track whether reviews got completed, which tells you almost nothing about whether anything is actually being enforced.
This article lays out a practical measurement framework: what to track, how to figure out where you stand, and how to build lightweight monitoring without needing enterprise-scale resources. It covers the AI governance gap that measurement addresses and looks at the execution mechanisms this measurement framework is designed to assess. Governance is not a project with a completion date. It is an ongoing operational capability, and you need to measure it like one.
The Wharton finding is pretty stark. In their third-wave Gen AI adoption study, 72% of enterprise leaders formally measure Gen AI ROI. The productivity side has moved past experimentation into what they call “Accountable Acceleration.”
Governance is a different story. Acuvity’s 2025 State of AI Security survey found that nearly 40% do not have managed governance at all — they are running on ad hoc practices or literally nothing. IBM’s research lines up with this: nearly 74% of organisations report moderate or limited coverage in their AI risk and governance frameworks.
So companies know exactly how much value AI is generating but cannot tell you whether it is being used compliantly or within policy boundaries. That is a problem.
Part of it is governance theatre. Organisations tick checklists without actually verifying enforcement. Process metrics — policies written, reviews completed — create an illusion of control. Fewer than half of organisations (47%) have adopted formal risk management frameworks for AI. Half of business leaders say their organisation lacks the governance structures needed to manage Generative AI’s ethical challenges.
But the deeper cause is structural. Governance got treated as a one-time project: write a policy, form a committee, move on. Every undocumented deployment, every skipped review widens the gap between what you believe is governed and what actually is. As IBM’s Jeff Crume put it, “it’s pretty hard to know if you’re succeeding if you’ve never even defined the benchmarks.”
That gap between perceived governance and actual governance — that is the gap this measurement framework is designed to close.
The DX AI Measurement Framework, built from research across 184 companies and 135,000+ developers, structures measurement across three dimensions: Utilisation, Impact, and Cost. It was designed for engineering productivity, but the parallel to governance is direct.
Utilisation — is AI being used as intended within governance boundaries?
This is your compliance layer. Track policy compliance rate — the percentage of deployments that completed the required governance review steps. Track model coverage — the percentage of production AI systems with complete documentation, including model cards, audit trails, and risk assessments. Track governance participation rate — are the right people actually showing up? And track shadow AI detection rate. One in five organisations experienced breaches from unsanctioned AI usage, costing $670,000 more than traditional breaches. If you needed a business case for measuring shadow AI, there it is.
Impact — is governance producing measurable risk reduction and compliance improvement?
Impact metrics tell you whether governance is producing results you can actually point to. Track incident response time — how fast you identify and resolve AI-related incidents. Track use case review cycle time — if governance review takes too long, teams bypass it, and that is what drives shadow AI. Track risk reduction metrics — measurable decreases in compliance violations, bias incidents, or data exposure. And track governance maturity score progression over time.
Cost — what is governance consuming, and is it proportionate?
Governance costs money. The question is whether that cost is proportionate to the value AI delivers. Track governance overhead per deployment and the opportunity cost of governance delays. For context: organisations that treat governance as a strategic capability see a 30% ROI advantage over those treating it as a compliance afterthought.
O’Reilly’s DX Core 4 framework complements this structure for engineering-specific contexts where developer experience metrics overlap with governance participation.
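As a sketch, the three dimensions reduce to simple ratios over a deployment log. The field names (`review_done`, `gov_hours`) are illustrative assumptions, not any framework's schema; adapt them to whatever your ticketing system exports:

```python
# A minimal deployment log: one record per AI deployment.
deployments = [
    {"id": "d1", "review_done": True,  "review_days": 4,    "gov_hours": 6},
    {"id": "d2", "review_done": True,  "review_days": 12,   "gov_hours": 9},
    {"id": "d3", "review_done": False, "review_days": None, "gov_hours": 0},
]

# Utilisation: policy compliance rate — share of deployments that
# completed the required governance review.
compliance_rate = sum(d["review_done"] for d in deployments) / len(deployments)

# Impact: mean review cycle time for reviewed deployments. A rising
# value is the leading indicator that teams will start bypassing review.
reviewed = [d["review_days"] for d in deployments if d["review_done"]]
mean_cycle = sum(reviewed) / len(reviewed)

# Cost: governance overhead per deployment, in hours.
overhead = sum(d["gov_hours"] for d in deployments) / len(deployments)

print(f"compliance {compliance_rate:.0%}, cycle {mean_cycle:.1f}d, overhead {overhead:.1f}h")
# compliance 67%, cycle 8.0d, overhead 5.0h
```

Nothing here needs tooling beyond an export of existing tickets; the point is that each dimension has at least one number you can compute today.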
Acuvity’s research defines four maturity levels. Most organisations sit lower than they think.
Level 1 — Ad Hoc. No systematic governance. AI deployments happen without formal review. Measurement capability: zero. You cannot measure what you do not track. The tell: nobody can say how many AI systems are in production. 39% of organisations operate below managed governance levels entirely.
Level 2 — Defined. Governance policies exist on paper. Some review processes are in place. You can measure process — policies written, reviews completed — but that is about it. The risk here is governance theatre. High process scores masking zero enforcement. A perfect Regulatory Compliance Score means nothing if teams circumvent governance because they find it burdensome.
Level 3 — Managed. Governance is enforced with tooling. Outcome metrics sit alongside process metrics — policy compliance rate, model coverage, incident response time. The milestone here: a governance scorecard exists and gets reported to leadership. This is where governance becomes visible to the people who fund it.
Level 4 — Optimised. Continuous monitoring, drift detection, and accountability reporting are automated. Leading indicators — rising undocumented deployments, declining participation rates — trigger proactive intervention before things go sideways. Only 32% of organisations operate here.
You do not have to climb each rung one at a time. You can leapfrog from Ad Hoc to Managed by adopting a measurement-first approach. As David Talby, CTO of John Snow Labs, put it: the most successful organisations are those that can explain, defend, and adapt their systems under sustained scrutiny.
Think of the maturity ladder as a diagnostic tool, not a compliance requirement. Use it to figure out where you are so you can prioritise where to invest.
What happens after your AI systems go live? For most organisations, the honest answer is: not much. And that is exactly where the risk concentrates.
38% of organisations identify runtime as their most vulnerable phase. Another 27% say it is where they are least prepared. Pre-deployment concerns rank far lower.
Most governance frameworks focus on pre-deployment review and then neglect post-deployment monitoring entirely. But AI models drift over time. Training data goes stale, usage patterns shift, edge cases pile up. Without continuous monitoring, governance documentation becomes outdated within months.
IBM’s CIO Matt Lyteson described it directly: “We’ve even seen instances where you put it out there and then, a week later, it’s producing different results than you initially tested for.” When their Ask IT support agent’s resolution rate dropped from 82% to 75%, they investigated immediately. That is what runtime measurement looks like in practice.
For teams without enterprise tooling, a lightweight monitoring layer still does the job. Scheduled model performance reviews — quarterly at a minimum. Automated alerts for usage anomalies. Quarterly policy currency checks with an emergency update process for new capabilities. And periodic shadow AI sweeps — the average enterprise runs over 320 unsanctioned AI applications, so scanning SaaS subscriptions and API integrations regularly is not optional.
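The runtime pattern in the IBM anecdote reduces to a baseline comparison with a tolerance. A minimal sketch; the 5-point tolerance is an assumption to tune against the metric's normal variance:

```python
# Flag when a tracked runtime metric falls more than `tolerance`
# points below its established baseline.
def drift_alert(baseline: float, current: float, tolerance: float = 5.0) -> bool:
    """Return True when the metric has dropped past the tolerance."""
    return (baseline - current) > tolerance

# Mirrors the resolution-rate drop described above: 82% -> 75%.
print(drift_alert(baseline=82.0, current=75.0))  # True: investigate
print(drift_alert(baseline=82.0, current=80.0))  # False: normal variance
```

A cron job running this against weekly metric exports is already a Level 3 capability; the sophistication is in picking baselines, not in the code.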
Agentic AI adds another layer. Measurement needs to extend to decision traceability, tool call logs, and human escalation triggers. And if you need a regulatory reason to care: regulators are signalling that documentation gaps themselves may constitute violations. That connects directly to the measurable evidence regulators now require.
Most 50-to-500-person companies do not have a dedicated governance team, enterprise GRC platforms, or six-figure tooling budgets. That is fine. Measurement scales down.
No tooling required — spreadsheet level.
Start here. AI inventory: list every AI system in production with owner, purpose, and last review date. If you cannot answer “how many AI systems are in production?” then you are at Level 1. Policy compliance rate: a manual quarterly audit of whether deployments followed governance steps. Governance participation rate: are the right people showing up to reviews? Declining attendance is the earliest signal that teams are treating governance as optional. Policy currency check: are your policies less than six months old? A dated document review takes one hour per quarter.
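At spreadsheet level, the staleness check above is a few lines of Python over an exported inventory. The field names and the 90-day review window are illustrative assumptions; "today" is pinned so the example is reproducible:

```python
from datetime import date

TODAY = date(2026, 3, 1)  # pinned for reproducibility; use date.today() in practice
REVIEW_WINDOW_DAYS = 90   # assumed quarterly cadence

# A hand-maintained inventory: one row per production AI system.
inventory = [
    {"system": "support-bot",   "owner": "j.doe", "last_review": date(2025, 12, 20)},
    {"system": "resume-screen", "owner": "a.lee", "last_review": date(2025, 6, 2)},
]

# Flag systems whose last governance review is older than the window.
stale = [
    s["system"] for s in inventory
    if (TODAY - s["last_review"]).days > REVIEW_WINDOW_DAYS
]
print(stale)  # ['resume-screen']
```

The same pattern, with a 180-day window over policy dates, covers the policy currency check.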
Basic instrumentation — existing observability stack.
Model coverage: track it via a Jira or Linear ticket that cannot close until the model card and risk assessment are attached. Use case review cycle time: tracked via your ticketing system — if average review time is climbing, teams are bypassing governance. Incident response time: measured from your existing incident management tooling. Shadow AI detection: a manual sweep of expense reports and browser extensions surfaces most unsanctioned AI tool usage.
Enterprise tooling — dedicated platforms.
Continuous drift detection and automated alerting. Real-time governance dashboards through platforms like IBM watsonx.governance and Credo AI. Automated audit trail generation and regulatory reporting.
The enterprise tier is optional. Governance measurement works at every budget level. As IBM’s Jeff Crume observed, “saying no doesn’t stop the behaviour, it just drives it underground” — measurement lets you govern with evidence, not blanket restrictions.
Here is a self-assessment you can run by Friday. Five questions: How many AI systems are in production right now? When was the last governance policy update? Who is accountable for AI risk in each business unit? How long does it take to approve a new AI use case? How many unsanctioned AI tools were discovered last quarter?
If you cannot answer those, you know where to start.
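The Friday self-assessment can even run as a script: any unanswerable question marks the first place to invest. The answers below are illustrative placeholders:

```python
# The five self-assessment questions from the text.
questions = [
    "How many AI systems are in production right now?",
    "When was the last governance policy update?",
    "Who is accountable for AI risk in each business unit?",
    "How long does it take to approve a new AI use case?",
    "How many unsanctioned AI tools were discovered last quarter?",
]

# Illustrative answers; None marks a question nobody could answer.
answers = ["14", "2025-11-03", None, "9 days", None]

gaps = [q for q, a in zip(questions, answers) if a is None]
print(f"{len(gaps)} of {len(questions)} questions unanswerable")
for q in gaps:
    print(" -", q)
```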
Governance is a continuous operational capability. The moment measurement stops, governance debt starts piling up.
The “governance project” failure mode is common. Organisations that treat governance as a discrete initiative — write policy, implement, declare success — fail slowly. Governance decay is invisible until an audit, an incident, or a regulatory inquiry surfaces it. As David Talby put it, “governance is no longer judged by policy statements, but by operational evidence.”
Continuous measurement acts as an immune system. It catches policy drift, rising shadow AI, declining participation, and emerging risks before they turn into compliance failures. And it sustains investment — boards fund what they can see working. IBM’s daily cost visibility per AI use case demonstrates the kind of granular accountability that keeps governance funded.
The regulatory trajectory makes this non-negotiable. The EU AI Act, NIST AI RMF, and ISO/IEC 42001 all assume continuous governance. The EU AI Act’s high-risk obligations become fully applicable in August 2026. Compliance is not a one-time assessment. It is an ongoing obligation.
The organisations that treat governance execution as permanent infrastructure — not a project deliverable — are the ones that will scale AI safely. The 32% that operate at measured effectiveness are not done. They are simply measuring continuously. That is the difference.
No. ROI measurement tracks business value — productivity gains, revenue, cost reduction. Governance measurement tracks whether AI is being used responsibly and within policy boundaries. The 72% (ROI) versus 32% (governance) split shows these are entirely separate disciplines.
Start with four. AI inventory completeness — do you know every AI system in production? Policy compliance rate — did each deployment follow your governance process? Policy currency — are your policies less than six months old? Governance participation rate — are the right people showing up to reviews? You do not need anything beyond a spreadsheet and a calendar for these.
The NIST AI RMF is structured around four functions — Govern, Map, Measure, and Manage. The Measure function defines what to track: risk metrics, performance thresholds, and stakeholder feedback. The Cybersecurity Profile layers security-specific measurement on top. Together they give you a government-endorsed measurement backbone you can adopt without building from scratch.
Process metrics measure activity: policies written, reviews completed, training sessions held. Outcome metrics measure impact: incidents prevented, compliance violations reduced, model drift detected before harm. Mature governance programmes lead with outcome metrics. Organisations stuck on process metrics alone risk governance theatre — high activity scores with no verified enforcement.
Focus on three translations. Model coverage becomes “percentage of AI systems under documented control.” Incident response time becomes “how quickly we contain AI-related problems.” Governance maturity score becomes “our progress on a four-level scale benchmarked against industry.” Boards respond to trend lines and benchmarks, not raw technical data.
Quarterly for formal review, with monthly pulse checks on leading indicators — shadow AI discoveries, participation rate changes, policy expiry dates. Organisations at maturity Level 4 run continuous automated monitoring. Match your review cadence to how fast you are deploying AI.
Five warning signs. Rising undocumented AI deployments. Declining attendance at governance reviews. Policies not updated in over six months. Increasing use case review cycle times — which suggests teams are bypassing governance. And nobody can name the person accountable for AI risk in their business unit.
Both. Basic metrics — AI inventory, policy currency, participation rates — need manual tracking or simple automation. Advanced metrics — drift detection, continuous compliance monitoring, automated audit trails — need dedicated tooling. Start manual, automate incrementally, and do not wait for perfect tooling to start measuring.
Agentic AI introduces new measurement dimensions. Decision traceability — can you reconstruct why the agent acted? Tool call logging — what external systems did the agent access? Human escalation frequency — how often did the agent defer to a human? Most existing frameworks were designed for traditional or generative AI, not autonomous agents. They need extending.
A governance scorecard aggregates four to six KPIs into a single view: model coverage, policy compliance rate, incident response time, governance maturity score, shadow AI count, and governance participation rate. Each metric shows current value, trend direction, and a red/amber/green status. The dashboard is what makes governance visible to leadership and keeps the funding flowing.
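As a sketch, the scorecard logic is a threshold function per KPI. The targets, current values, and the 10% amber margin are illustrative assumptions:

```python
# Red/amber/green status: green at or above target, amber within the
# margin just below it, red otherwise.
def rag(value: float, target: float, amber_margin: float = 0.10) -> str:
    """Return the RAG status for a KPI value against its target."""
    if value >= target:
        return "green"
    if value >= target * (1 - amber_margin):
        return "amber"
    return "red"

# A three-KPI slice of the scorecard: (current value, target).
scorecard = {
    "model_coverage":     (0.92, 0.95),
    "policy_compliance":  (0.88, 0.90),
    "participation_rate": (0.70, 0.85),
}

for kpi, (value, target) in scorecard.items():
    print(f"{kpi}: {value:.0%} (target {target:.0%}) -> {rag(value, target)}")
# model_coverage: 92% (target 95%) -> amber
# policy_compliance: 88% (target 90%) -> amber
# participation_rate: 70% (target 85%) -> red
```

Adding a trend arrow per KPI (this quarter versus last) turns the same table into the leadership-facing dashboard the text describes.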