In early 2026, hyperscaler capital expenditure commitments locked in memory allocation priorities — and conventional DRAM supply began its steepest collapse in a decade. The problem isn’t cyclical. It’s structural. AI needs a specialised form of memory called HBM (high-bandwidth memory), and making it cannibalises the supply of the DRAM that goes into everything else — your PCs, your smartphones, your enterprise servers.
Here’s the ratio you need to know: every HBM stack eats roughly three times the silicon wafer capacity of conventional DRAM. As IDC put it: “every wafer allocated to an HBM stack for an Nvidia GPU is a wafer denied to the LPDDR5X module of a mid-range smartphone or the SSD of a consumer laptop.” Three manufacturers control virtually all global memory production. They’ve all made the same allocation decision.
This article traces the full causality chain: from LLM architecture through fab economics to why your next hardware refresh costs more. If you want the full scope of the AI memory shortage, we’ve put together a broader overview.
Why Does AI Need a Completely Different Kind of Memory?
Large language models are memory-bandwidth-limited. Not compute-limited — memory-bandwidth-limited. That distinction matters because it changes what you need to throw money at to make AI work.
The “memory wall” is the point where GPU compute throughput outpaces how fast data can be transferred from memory. Your GPU sits there ready to go, but data can’t reach it fast enough. Micron’s Sumit Sadana put it plainly: “the processor spends more time just twiddling its thumbs, waiting for data.”
Think of it as a motorway feeding into a single-lane road. You can add as many lanes to the motorway as you want — more GPUs, faster GPUs — but if the road into town stays one lane wide, traffic doesn’t move any faster.
This is why LLMs changed the game. Earlier AI architectures like CNNs were compute-hungry but didn't ask that much of memory. LLMs load billions of parameters, attention weights, and intermediate states, and they need sustained high-bandwidth data flow to keep it all moving. AI has also evolved from one-off large-model training runs to systems that combine inference, memory, and decision-making, and that evolution has pushed memory demand well past what anyone planned for.
Conventional DDR5 DRAM delivers roughly 50–60 GB/s of bandwidth per channel. HBM delivers 1–2 TB/s. That’s a 20–40x gap. You’re not bridging that with incremental improvements. HBM isn’t optional for AI at scale. It’s architecturally mandatory.
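To make that gap concrete, here's a back-of-envelope sketch in Python. The model size, channel count, and bandwidth figures are illustrative assumptions rather than benchmarks; the point is that generating each token means streaming roughly the full set of model weights from memory, so single-stream decode speed is capped at bandwidth divided by model size.

```python
# Back-of-envelope: why bandwidth, not compute, caps LLM decode speed.
# All figures below are illustrative assumptions, not benchmarks.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode throughput: each generated token
    requires reading (roughly) all model weights from memory once."""
    return bandwidth_gb_s / model_size_gb

MODEL_SIZE_GB = 140        # e.g. a ~70B-parameter model held at 16-bit precision
DDR5_CHANNEL_GB_S = 55     # one conventional DDR5 channel (~50-60 GB/s)
HBM_GB_S = 1500            # a modern HBM configuration (~1-2 TB/s)

for name, bw in [("DDR5 channel", DDR5_CHANNEL_GB_S), ("HBM", HBM_GB_S)]:
    print(f"{name}: at most {max_tokens_per_second(bw, MODEL_SIZE_GB):.1f} tokens/s")

# Roughly 0.4 tokens/s on a single DDR5 channel versus ~11 tokens/s on HBM.
# The GPU's arithmetic units could go far faster; memory is the wall.
```

Real servers have multiple DDR5 channels and real deployments batch requests to amortise weight reads, but none of that closes an order-of-magnitude bandwidth gap.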
What Is HBM and Why Does It Consume Three Times the Wafer Capacity of Standard DRAM?
HBM takes multiple thinned-down DRAM dies — currently 8 to 12, moving to 16 — stacks them vertically, connects them with thousands of microscopic through-silicon vias (TSVs), and mounts the whole stack right next to the GPU inside the same package.
The stacking is what delivers the bandwidth. Data flows through thousands of parallel vertical connections simultaneously rather than through a narrow external bus. It works. But it comes at a cost that’s reshaping the entire memory market.
Here’s the ratio that matters most. Micron’s Sumit Sadana confirmed it: “As we increase HBM supply, it leaves less memory left over for the non-HBM portion of the market, because of this three-to-one basis.” A 12-die HBM stack consumes twelve DRAM dies to produce one memory unit, compared with a single die for a conventional module, and the HBM dies themselves are larger and harder to yield. Per gigabyte of usable capacity, that works out to roughly three times the wafer area. Every wafer that goes to HBM is a wafer denied to DDR5, LPDDR5X, and everything else.
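Twelve dies per stack sounds like twelve-to-one, but each die also carries capacity, so the meaningful comparison is wafer area per usable gigabyte. A toy decomposition in Python, using purely illustrative factors (manufacturers don't disclose the real ones), shows how the penalty multiplies out to roughly three-to-one:

```python
# Toy decomposition of why HBM consumes roughly 3x wafer area per usable bit.
# The individual factors are illustrative assumptions, not disclosed figures.

area_penalty  = 1.6          # HBM die is larger per bit: TSV keep-out area, wide I/O
yield_penalty = 1.0 / 0.55   # a stack only ships if every die bonds and tests good
overhead      = 1.05         # base logic die and test structures add non-DRAM silicon

wafer_ratio = area_penalty * yield_penalty * overhead
print(f"HBM wafer consumption per usable gigabyte: ~{wafer_ratio:.1f}x conventional DRAM")
# ~3.1x with these assumed factors, in line with the three-to-one figure quoted above.
```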
Samsung, SK Hynix, and Micron are all doing the same maths. HBM costs roughly three times more per gigabyte and makes up over 50% of the packaged cost of an AI GPU according to SemiAnalysis. The margins are better. The customers are richer. If you’re running a fab, the allocation decision is obvious. If you’re buying laptops, less so.
And it’s about to get worse. HBM4 moves to 16-die stacks. Nvidia’s Rubin GPU comes with up to 288GB of HBM4 per chip, with the NVL72 server rack combining 72 of those GPUs — over 20TB of HBM per rack. For a deeper look at who is profiting from HBM demand, we cover the manufacturer strategies separately.
Why Did the Shortage Hit So Sharply in Early 2026?
This wasn’t a gradual trend. It was a tipping point.
Combined cloud provider CapEx is expected to exceed $710 billion in 2026, with AI infrastructure as the dominant allocation. That locked in HBM demand at a scale that forced all three memory makers to make irreversible wafer allocation decisions at the same time.
SK Hynix reported its HBM, DRAM, and NAND capacity was “essentially sold out” for 2026. Micron’s Sadana was equally blunt: “We’re sold out for 2026.” Micron exited the consumer memory market entirely to focus on enterprise and AI.
The timing was a perfect storm. Windows 10 end-of-life drove PC refresh demand, AI PC specs required 16–32GB RAM, and supply contracted — all at once.
The price data tells the story. DRAM prices rose 80–90% in Q1 2026 according to Counterpoint Research. Enterprise DDR5 64GB RDIMMs nearly tripled, from roughly $7/unit to $19.50/unit. Samsung raised 32GB consumer module prices 60%, from $149 to $239.
TrendForce analyst Tom Hsu called the Q1 2026 price increases “unprecedented.” Intel CEO Lip-Bu Tan was more direct: “There’s no relief until 2028.” Gartner’s Ranjit Atwal noted the speed at which memory pricing has increased “has shocked everybody”, with an expected 130% year-on-year rise in 2026.
For the price data quantifying this shift, we break down the numbers in detail.
How Does AI Inference Keep Growing Memory Demand Even After Training Is Complete?
Training is a one-time memory allocation. Inference is where the compounding demand lives, and it comes down to something called the KV cache.
During inference, the model stores intermediate attention results in the KV (key-value) cache — essentially the model’s “short-term memory.” During an active session, this KV cache sits in HBM. When the session goes idle, it gets pushed to slower memory. But while it’s active, it’s sitting in the most expensive, most contested memory on the planet.
The KV cache grows linearly with context window length. Frontier models now reach 128K to 1M tokens of context. Each user session maintains its own cache in GPU memory. So the maths is straightforward: more concurrent users multiplied by longer context windows equals more HBM consumed per deployment. Usage, context windows, and user counts are all growing at the same time — the demand is multiplicative.
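Here's a minimal sketch of that maths in Python. The layer count, attention head configuration, and precision are illustrative (roughly the shape of a 70B-class model with grouped-query attention), not a specific production deployment:

```python
# Rough KV-cache sizing: per-token cost x context length x concurrent users.
# Model dimensions below are illustrative assumptions, not a specific product.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Keys and values stored for every layer, KV head, and token in the context."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return per_token * context_tokens

per_session = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                             context_tokens=128_000)
print(f"KV cache per 128K-token session: {per_session / 1e9:.1f} GB")   # ~42 GB

concurrent_sessions = 500
total = concurrent_sessions * per_session
print(f"KV cache alone across {concurrent_sessions} sessions: {total / 1e12:.1f} TB")
```

Quantising and paging the cache, and offloading idle sessions, all chip away at this, but the footprint still scales linearly with context length and with the number of simultaneous users, which is exactly the multiplicative growth described above.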
It gets worse with agentic AI. RAG, tool use, and long-running agent conversations all extend effective context and therefore memory footprint. This is why hyperscalers keep ordering more. It’s not hoarding. Actual usage growth consumes every gigabyte delivered.
Why Can’t Fabs Simply Make More HBM to Meet Demand?
Building a new semiconductor fab takes 3–5 years from ground-breaking to volume production. You cannot buy your way out of physics.
Here’s what’s in the pipeline. Micron has a Singapore fab reaching production in 2027, a Taiwan fab in H2 2027, and a New York megafab not in full production until 2030. SK Hynix has a South Korea fab completing in 2027 and Indiana facilities by end of 2028. Samsung plans a new Pyeongtaek plant in 2028. Billions of dollars of investment. None of it provides meaningful relief before 2027 at the earliest.
Even existing fabs face yield problems. Drilling thousands of TSVs per die and achieving sub-micron alignment across 12–16 stacked dies pushes manufacturing tolerances to their limits. Advanced packaging is a separate bottleneck — even if you had more HBM dies, packaging them using techniques like CoWoS requires specialised capacity that’s also constrained.
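One way to see why stacking punishes yield: defect probabilities compound across every die and every bonding step, so the finished stack is only as good as its worst operation. A toy calculation with purely illustrative per-step yields:

```python
# Toy compound-yield model for a stacked HBM package.
# Per-step yields are illustrative assumptions, not manufacturer data.

die_yield  = 0.95   # probability a single TSV-drilled DRAM die is good
bond_yield = 0.99   # probability one die-to-die bonding/alignment step succeeds

for dies in (8, 12, 16):
    stack_yield = (die_yield ** dies) * (bond_yield ** (dies - 1))
    print(f"{dies}-die stack: ~{stack_yield:.0%} of stacks usable")

# With these assumed figures: 8 dies ~62%, 12 dies ~48%, 16 dies ~38%.
# Every extra die multiplies in another chance to scrap the whole stack.
```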
After the 2022 bust, memory companies were wary of expanding. As memory expert Thomas Coughlin noted, “there was little or no investment in new production capacity in 2024 and through most of 2025.” That caution is now colliding with a demand shock nobody prepared for.
Micron predicts the total HBM market will grow from $35 billion in 2025 to $100 billion by 2028 — larger than the entire DRAM market in 2024. For a detailed look at when new fabs can resolve the supply crunch, we cover the timelines separately.
What Does This Mean for Everything That Isn’t an AI Data Centre?
The downstream numbers are already in. PC shipments will fall 10.4–11.3% in 2026 according to Gartner and IDC. Smartphone shipments are forecast to decline 8.4–12.9%. PC prices are expected to rise 17%, with memory now accounting for 23% of PC bill-of-materials cost, up from 16% in 2025.
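A rough pass-through model shows how a component that starts at a sixth of the bill of materials moves the whole machine's price. The baseline BOM and the memory inflation factor below are assumptions chosen to roughly reproduce the share shift quoted above, not vendor figures:

```python
# Rough pass-through: how memory inflation moves a PC's bill of materials.
# Baseline BOM and the inflation factor are illustrative assumptions.

baseline_bom      = 800.0   # total component cost of a hypothetical laptop, in dollars
memory_share_2025 = 0.16    # memory's share of BOM before the price run-up
memory_inflation  = 1.6     # assumed ~60% rise in memory cost; other parts held flat

memory_cost = baseline_bom * memory_share_2025 * memory_inflation
other_cost  = baseline_bom * (1 - memory_share_2025)
new_bom     = memory_cost + other_cost

print(f"New memory share of BOM: {memory_cost / new_bom:.0%}")       # ~23%
print(f"Total BOM increase:      {new_bom / baseline_bom - 1:.0%}")  # ~10%
```

That BOM increase is before any vendor margin or supply-risk premium gets layered on, which goes some way to explaining why forecast retail increases run higher than the raw component maths.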
HP said memory now accounts for 35% of the costs to build a PC, up from 15–18% the previous quarter. Dell COO Jeff Clarke warned “I don’t see how this will not make its way into the customer base” — and followed through, raising PC prices in January after cutting them just months earlier.
Lenovo, Dell, HP, Acer, and ASUS have warned clients of 15–20% price hikes and contract resets. Enterprise refresh cycles are extending by 15% as organisations delay purchases — but deferring just creates a demand backlog that amplifies future price pressure when those purchases can’t wait any longer.
On the smartphone side, LPDDR5x prices are forecast to jump roughly 90% quarter-on-quarter — the steepest increase in their history according to TrendForce. Rising memory costs could make low-end devices economically unsustainable, with some budget models pulled from the market altogether.
The procurement landscape has shifted structurally. Hyperscalers lock in supply through long-term reservations and direct fab investments. Mid-market firms rely on shorter contracts and spot sourcing. Gartner’s Ranjit Atwal summed it up: “Vendors aren’t guaranteeing prices for long now. They’re saying this is the price and it’s available for two or three weeks.” His advice: buy now, because whatever you’re paying today is likely the best price you’ll see for a while.
For our overview of the memory crisis, we put it all in context.
Frequently Asked Questions
Is the 2026 memory shortage the same as the COVID-era chip shortage?
No. The COVID shortage was demand surging across many chip categories at the same time. The 2026 shortage is structurally different: it’s a deliberate wafer reallocation within the memory industry from conventional DRAM to HBM, driven by AI economics. The constraint is internal to the memory market, not external.
How much more does HBM cost compared to standard DRAM?
SemiAnalysis estimates HBM costs roughly three times more per gigabyte. It also represents over 50% of the total packaged cost of an AI GPU — the single most expensive component in AI accelerator hardware.
Why can’t AI systems just use regular DDR5 instead of HBM?
DDR5 delivers roughly 50–60 GB/s per channel. HBM delivers 1–2 TB/s. LLMs need sustained high-bandwidth data flow to keep GPU compute units busy. Using DDR5 would leave GPUs idle most of the time, making the compute investment largely wasted.
Should our company buy servers and laptops now or wait for prices to drop?
No meaningful price relief is expected before mid-2027 at the earliest, with structural pressure continuing through 2028. Intel CEO Lip-Bu Tan put it bluntly: “There’s no relief until 2028.” Deferring risks higher prices and longer lead times as deferred demand creates a backlog.
What is a KV cache and why does it matter for memory demand?
The KV cache stores intermediate computation results from LLM attention mechanisms during inference. It grows linearly with context window length and has to be maintained separately for each concurrent user session. As context windows expand and user counts grow, the KV cache becomes a major driver of HBM consumption.
Who controls the global memory chip supply?
Three companies — Samsung, SK Hynix, and Micron — manufacture virtually all of the world’s DRAM. Their wafer allocation decisions directly determine supply availability for every device category globally.
Will Chinese chipmakers fill the gap?
Chinese manufacturers such as ChangXin Memory Technologies are catching up but remain years behind. U.S. export controls restrict access to advanced lithography and packaging equipment needed for HBM production. No meaningful supply relief is expected in the 2026–2028 timeframe.
How does the memory shortage affect cloud computing costs?
Cloud providers are themselves major HBM buyers with rising infrastructure costs. While long-term supply agreements partially insulate hyperscalers from spot volatility, increased memory costs will eventually flow through to cloud service pricing.
What is advanced packaging and why is it a bottleneck?
Advanced packaging — techniques like CoWoS and hybrid bonding — integrates HBM stacks and GPU dies into a single package. This capacity is limited and requires specialised equipment. Even if more HBM dies were manufactured, packaging constraints limit how quickly finished modules reach the market.
How much has enterprise DDR5 memory actually increased in price?
DRAM prices rose 80–90% in Q1 2026 alone (Counterpoint Research). Enterprise DDR5 64GB RDIMMs nearly tripled, from roughly $7/unit to $19.50/unit. Samsung raised 32GB consumer modules 60%, from $149 to $239.
What does Nvidia’s next-generation Rubin GPU mean for the shortage?
Rubin uses up to 288GB of HBM4 per chip, with the NVL72 rack combining 72 GPUs — over 20TB of HBM per rack. HBM4 moves to 16-die stacks, which means even more wafer consumption per unit. Each new GPU generation deepens structural HBM demand.
Are there any alternatives to HBM for AI workloads?
Some companies are exploring alternatives. Majestic Labs is designing an inference system with 128TB of memory using non-HBM architectures. But these involve significant performance trade-offs and aren’t proven at scale. For mainstream AI training and inference, HBM remains a hard architectural requirement.