The debate about whether AI is displacing junior developers has moved past the speculation stage. We have data now. But the data needs careful reading, because it does not all point in the same direction.
The strongest evidence comes from the Stanford Digital Economy Lab. Using ADP payroll records covering 3.5 to 5 million workers, researchers found significant employment declines for workers aged 22-25 in the most AI-exposed occupations. A Danish study using similar methodology found near-zero effects. And ordinary recession dynamics muddy the picture further.
So in this article we are pulling together multiple independent data sources to work out what the evidence actually supports, where it falls short, and why that matters if you are making workforce decisions right now. It connects to the broader team compression phenomenon and builds on what team compression actually means in practice.
Workers aged 22-25 in the most AI-exposed occupations experienced a 16% relative employment decline compared to the least AI-exposed occupations. That comes from the Stanford/ADP study. In absolute terms, employment for this age group has fallen roughly 20% from its peak.
Here is where it gets interesting. Workers aged 35-49 in the same AI-exposed occupations saw 6-9% employment growth over the same period. The decline is not hitting everyone. It is hitting the youngest workers in roles where AI does the most work.
That asymmetry is the finding that matters. Within the same firms, at the same time, junior workers in AI-exposed roles declined while experienced workers grew. That looks like a structural shift, not a cyclical downturn.
Independent data backs this up. NY Fed College Labor Market data shows CS graduate unemployment at 6.1%, computer engineering at 7.5%. The overall rate for young workers aged 22-27 sits at 7.4% — nearly double the national average. By Q4 2025, the underemployment rate for recent college graduates climbed to 42.5%, its highest level since 2020.
The implications of the junior employment decline for the talent pipeline are significant. But first, it is worth understanding why this particular study carries weight.
“Canaries in the Coal Mine?” was published in November 2025 by Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen at the Stanford Digital Economy Lab.
The study uses ADP administrative payroll data. ADP is the largest payroll processing firm in the US, covering over 25 million workers. The researchers worked with individual-level monthly records for 3.5 to 5 million workers across tens of thousands of firms through September 2025. This is actual payroll data — not surveys, not job-posting scraping. It tracks real employment.
Occupations are classified into AI exposure quintiles using the Eloundou et al. (2023) framework. Software engineering sits in the top quintile. The study also uses Anthropic Economic Index data on the share of Claude queries involving automation versus augmentation — a second framework that independently backs up the exposure measures.
The methodological detail worth understanding is the firm-time effects controls. In plain language: the study compares workers within the same firm at the same time. If a company shrank overall, that shock gets absorbed. What remains is the differential between junior and experienced workers in AI-exposed roles within those firms.
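To see what that control does, here is a minimal sketch on synthetic data. It is not the study's actual specification; the column names and magnitudes are invented. But the C(firm):C(month) term plays the same role, absorbing any shock common to a firm in a month and leaving the young-by-AI-exposed interaction as the within-firm differential.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "firm": rng.integers(0, 50, n),
    "month": rng.integers(0, 12, n),
    "young": rng.integers(0, 2, n),        # aged 22-25 indicator
    "ai_exposed": rng.integers(0, 2, n),   # top AI-exposure quintile indicator
})
# Synthetic outcome: firm- and month-level shocks plus a -0.16 effect
# that applies only to young workers in AI-exposed occupations.
df["log_emp"] = (
    0.05 * df["firm"] - 0.02 * df["month"]
    - 0.16 * df["young"] * df["ai_exposed"]
    + rng.normal(0, 0.1, n)
)
# C(firm):C(month) absorbs everything common to a firm in a given month.
fit = smf.ols("log_emp ~ young * ai_exposed + C(firm):C(month)", df).fit()
print(fit.params["young:ai_exposed"])  # recovers roughly -0.16
```

A firm-wide contraction lands in the firm-month dummies, not in the interaction term, which is what makes the age asymmetry interpretable.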
Pre-2022 placebo tests confirm there was no pre-existing divergence before GenAI tools became widely available. The pattern showed up specifically in late 2022 and early 2023.
After publication, independent researchers using LinkedIn data from the US and UK found similar declines, as discussed in the Stanford paper itself. The core finding has been replicated.
The Stanford study is not the only evidence. Several independent sources using different methodologies point the same way.
Internship postings are shrinking. Handshake reports a 30% decline in tech-specific internship postings since 2023. Applications rose 7% over the same period. More people chasing fewer openings.
Entry-level tech hiring is down. Indeed’s Hiring Lab shows US tech job postings down 36% from February 2020 levels as of July 2025. Software engineers — the most common tech job title — were down 49% from early 2020. Entry-level tech hiring specifically dropped 25% year-over-year in 2024.
Employer sentiment is shifting. A 2024 SHRM survey found 70% of hiring managers say AI can do the jobs of interns. 57% trust AI’s work more than that of interns or recent graduates. Let that sink in for a moment.
Developer adoption is near-universal. The Stack Overflow 2025 Developer Survey shows 84% of developers now use AI tools, up 14 percentage points from 2023.
CS enrolments may follow. Forrester’s 2026 Predictions project a 20% decline in CS enrolments as prospective students respond to deteriorating job market signals. Fewer CS graduates now could produce a senior engineer shortage in 5-10 years.
These are trailing indicators (employment data), concurrent indicators (job postings), and leading indicators (enrolment forecasts, employer sentiment). They are not all measuring the same thing, but they are all moving in the same direction.
The decline is not uniform across roles either. Developer roles like Android, Java, .NET, iOS, and web development are down 60% or more from 2020, while machine learning engineer postings are up 59%. The reallocation within tech is as telling as the overall decline.
It comes down to what kinds of knowledge AI can replace. This is also why AI is compressing teams rather than replacing programmers — the compression is selective.
Junior developers primarily supply codified knowledge. Writing boilerplate code, implementing standard patterns, debugging routine errors, writing unit tests. Well-documented, repeatable work.
Experienced engineers supply tacit knowledge. System-level architectural decisions, debugging novel failures across distributed systems, navigating stakeholder dynamics, mentoring. The kind of accumulated expertise that has never been written down and in many cases cannot be.
GenAI tools are excellent at codified tasks. The Anthropic Economic Index shows 79% of Claude Code conversations are classified as automation rather than augmentation. The specialist coding agent operates predominantly as a substitute for labour, not a complement to it.
So AI directly substitutes for the task mix that juniors perform, while it augments the work of experienced engineers who use it to amplify their existing judgement. The Stanford study confirms this: employment declines are concentrated in occupations where AI automates rather than augments, and junior roles cop the worst of it.
Wage stickiness adds another signal. Compensation has not fallen proportionally to employment. Firms are reducing headcount rather than cutting pay — which is consistent with task substitution at the entry level rather than a general softening of demand.
Humlum and Vestergaard published an NBER Working Paper (33777) examining LLM effects on earnings and hours worked in Denmark. They found “precise null effects” — ruling out impacts larger than 2% two years after ChatGPT adoption.
This is a credible study. It was discussed at the 2025 NBER Summer Institute, Chicago Booth, the Chicago Fed, Microsoft Research, and MIT Sloan FutureTech. You cannot just dismiss it.
But several structural differences explain the divergence. Denmark has stronger labour protections and collective bargaining — rapid headcount reduction is simply harder there. The US tech sector adopted GenAI coding tools faster and more aggressively. And the post-2022 US tech correction created a permissive environment for replacing junior roles with AI tooling that Denmark did not experience.
The measurement difference matters too. Denmark measures earnings and hours for existing workers. Stanford measures employment counts. Firms can stop hiring new workers (the US pattern) without changing anything for the workers they already have (the Denmark measurement). Both studies can be correct at the same time.
The honest conclusion: US data shows a real, large employment decline correlated with AI exposure. Danish data shows this outcome is not inevitable everywhere. Structural context determines whether AI exposure translates into actual displacement.
The Stanford authors are upfront about this: “While we explore a variety of alternative explanations, we caution that the facts we document may in part be influenced by factors other than generative AI.”
The post-pandemic tech correction is a genuine confound. Layoffs at Meta, Google, Amazon and others in 2022-2023 hit junior engineers hardest, and they overlap temporally with GenAI adoption. Indeed’s analysis notes that nearly half the net decline in tech postings occurred before ChatGPT’s release — but suggests AI may be preventing the rebound that would otherwise have happened.
The post-ZIRP environment compressed entry-level demand independently of AI. The Stanford study addresses this by showing results hold separately for high and low interest-rate-exposed occupations. But the temporal overlap means the two cannot be fully untangled.
Sectoral breakdown is missing. The data does not distinguish between Big Tech, mid-market SaaS, FinTech, or other segments. The decline could be concentrated in specific sectors rather than spread evenly. Non-US data beyond Denmark is sparse — we do not know whether Australia, the UK, or Canada show US-like or Denmark-like patterns. Recovery scenarios are unmodelled.
What we can say with confidence: there is a statistically significant, age-asymmetric decline in employment for 22-25 year olds in AI-exposed occupations, robust to firm-time controls and pre-existing trend testing. What remains uncertain: whether AI is the primary cause or a contributing factor, the sectoral distribution, and whether the trend will persist or reverse. The company-level evidence corroborating these figures adds further texture, but the picture is still developing. For a complete overview of how this evidence connects to engineering team strategy, see our AI team compression guide for engineering organisations.
The Stanford/ADP study found a 16% relative employment decline for workers aged 22-25 in the most AI-exposed occupations compared to the least exposed. In absolute terms, employment for this cohort has fallen approximately 20% from its peak. Meanwhile, workers aged 35-49 in the same occupations saw 6-9% growth.
“Canaries in the Coal Mine” (Brynjolfsson, Chandar, Chen, November 2025) uses ADP administrative payroll data covering 3.5-5 million US workers monthly. It classifies occupations by AI exposure quintiles and applies firm-time effects controls to isolate AI-related employment changes from company- or industry-level shocks.
Multiple factors are converging. AI tools automate the codified tasks that juniors have traditionally performed, tech sector layoffs reduced entry-level openings, and employer sentiment has shifted toward viewing AI as a substitute for intern-level labour. Entry-level tech hiring dropped 25% year-over-year in 2024.
Handshake reports a 30% decline in tech-specific internship postings since 2023. Indeed shows an 11% decline across all industries. AI tools are not the sole cause — the post-pandemic tech correction contributes — but 57% of hiring managers now trust AI’s work more than that of interns or recent graduates.
The Humlum and Vestergaard study (NBER Working Paper 33777) found “precise null effects” in Denmark — ruling out earnings or hours impacts larger than 2%. Structural differences (stronger labour protections, different AI adoption rates, different measurement approach) likely explain why Danish results diverge from the US findings.
Both likely play a role. The Stanford study uses firm-time effects controls and pre-2022 placebo tests to isolate AI-specific effects, but the temporal overlap between GenAI adoption and the post-ZIRP tech correction means the two cannot be fully pulled apart. The age-asymmetric pattern supports AI as a distinct factor.
According to NY Fed data, computer science graduate unemployment is 6.1% and computer engineering graduate unemployment is 7.5%. The overall rate for young workers aged 22-27 is 7.4%. These figures are elevated relative to historical norms for technical fields.
Codified knowledge is the documented, repeatable stuff — boilerplate code, standard patterns, routine debugging. Tacit knowledge is system-level judgement, architectural decisions, and organisational context. AI is excellent at codified tasks, which makes junior developers more vulnerable to automation than experienced engineers who have accumulated tacit expertise over years.
Amodei has stated that approximately 50% of entry-level white-collar jobs could be eliminated within five years. This is a forward projection from the CEO of Anthropic, not a peer-reviewed finding. It contextualises the trend data but should be treated as an informed prediction, not evidence.
The Stack Overflow 2025 Developer Survey shows 84% of developers now use AI tools in development, up 14 percentage points from 2023. AI performance on SWE-Bench improved from 4.4% to 71.7% of problems solved between 2023 and 2024. Adoption is not gradual — it is near-universal.
Forrester’s 2026 Predictions forecast a 20% drop in CS enrolments as prospective students respond to deteriorating job market signals. This sets up a potential feedback loop: fewer CS graduates entering the pipeline today could produce a senior engineer shortage in 5-10 years, even as AI reduces demand for entry-level workers.
AI Is Not Replacing Programmers — It Is Compressing Teams and Here Is Why That Distinction Matters

Engineering teams are shrinking at companies that are growing. Shopify now tells employees to prove they can’t do something with AI before they’re allowed to request new headcount. Klarna has kept output steady while trimming engineering numbers. There’s a name for this: team compression. Igor Ryazancev coined the term in early 2026 to describe what happens when AI tools let a smaller, more senior team produce equal or greater output.
This article is part of our comprehensive guide to team compression and its implications for engineering leadership. In this piece we’re going to break down the mechanism behind compression, the data that proves it, how fast it’s spreading, and why the way you frame it changes how you plan your engineering organisation.
Team compression is a reduction in engineering headcount driven by productivity multiplication. Not strategic retreat. Not business failure. AI coding agents — Claude Code, GitHub Copilot, Cursor — let individual engineers absorb work that previously needed additional people. The output stays the same or goes up. The team gets smaller.
Ryazancev’s framing positions AI as a “productivity multiplier” rather than a replacement engine. AI handles the repetitive, time-consuming stuff. Engineers focus on problem-solving, system design, and the creative work that actually moves a product forward.
This is already happening. 58% of developers expect engineering teams to become smaller and leaner as entry-level coding tasks get automated. Some Atlassian engineering teams now have engineers writing zero lines of code — it’s all agents — and those teams are producing two to five times more output. If you want the company-level evidence from Shopify, Klarna, and Tailwind, how Shopify, Klarna, and Tailwind are responding covers that in detail.
The replacement narrative says AI eliminates the need for human engineers. That’s empirically wrong. Demand for senior engineers is strong or increasing. What’s actually happening is the same work gets done by fewer people because each person is more productive. That’s a completely different organisational dynamic.
And it leads to a different response. If you think AI is replacing engineers, the rational move is defensive — upskill, protect jobs, slow adoption. If you recognise AI is compressing teams, the rational move is strategic — restructure, rethink your hiring pipelines, redefine roles. Companies that understand compression will reshape their organisations proactively. Companies that treat this as replacement will hoard headcount or freeze in place.
Dario Amodei, Anthropic’s CEO, has estimated AI could affect roughly 50% of entry-level white-collar jobs within five years. In the compression framing, that’s a signal about scale — not a prediction of mass unemployment. The jobs change. They don’t vanish uniformly. 65% of developers expect their roles to be redefined in 2026, and of those, 74% expect to spend far less time writing code and far more time designing technical solutions.
The media defaults to “replacement” because it makes a better headline. But it leads to the wrong playbook. The broader team compression phenomenon and what it means for engineering leadership requires a different set of moves entirely.
This is the core mechanism. Anthropic’s Economic Index classifies AI interactions into two buckets: automation, where AI directly performs the task, and augmentation, where AI collaborates with a human to enhance their output.
Claude Code shows 79% automation versus only 21% augmentation. Most coding agent interactions are the AI doing the work, not assisting a human doing the work. Compare that to Claude.ai — the chatbot — which sits at 49% automation. The agent form factor shifts the balance dramatically toward autonomous task completion.
Here’s what that means in practice. Automation displaces codified, repeatable, well-specified work — the stuff you’d typically hand to a junior developer. Augmentation amplifies judgment-intensive work that relies on tacit knowledge — system design, architectural decisions, stakeholder communication. The senior engineer role is expanding into what Justice Erolin, CTO at BairesDev, describes as “part architect, part AI orchestrator, and part systems-level problem solver.”
The impact is uneven by design. Junior developers face compression because their work overlaps heavily with what AI automates. Senior engineers get augmented because their work requires the kind of contextual judgment AI can’t replicate. As Brynjolfsson and colleagues at the Stanford Digital Economy Lab found, AI is “automating the codifiable, checkable tasks that historically justified entry-level headcount, while complementing the judgment-, client-, and process-intensive tasks performed by experienced workers.”
The numbers back this up. Early-career workers aged 22 to 25 in AI-exposed occupations experienced a 16% relative employment decline. Employment for experienced workers in those same occupations increased 6 to 9%. For the labour market data behind the junior developer decline, the evidence is substantial. For what the senior engineer role is becoming, the shift is already underway.
The Anthropic Economic Index analysed 500,000 coding-related interactions across Claude.ai and Claude Code. This is behavioural data — what developers actually do — not survey data or projections.
Two automation subtypes stand out. Directive interactions (43.8% of Claude Code use) are where the developer describes a task and the AI completes it end-to-end. Feedback loop interactions (35.8%) are autonomous but iterative — the developer pastes error messages back, the AI adjusts, and the cycle repeats until the task is done. Together, those two patterns make up the 79% automation figure.
The automation story isn’t one thing. Some of it is fully hands-off. Some needs a human in the loop but still takes the human out of the implementation work. Anthropic’s own researchers note that “more capable agentic systems will likely require progressively less user input.” That trend line should get your attention.
There’s also a clear adoption gap by company size. Startup work accounted for 33% of Claude Code conversations versus only 13% for enterprise. Startups have fewer legacy constraints and faster adoption cycles. Large organisations are catching up, but more slowly. If you’re at a big company, know that the startups competing with you are already further along.
Adoption is broad. 92% of developers use AI coding assistants at least once a month. JetBrains puts the figure at 85%. Pick your source — the range is 84 to 92%, and the direction is up.
But broad adoption doesn’t equal deep automation. Only about 15% of developers have adopted vibe coding professionally. 72% say it’s not part of their work at all. And there’s good reason for the gap: 66% of developers cite “AI solutions that are almost right, but not quite” as their biggest frustration. Sound familiar?
So the picture is nuanced. Nearly every developer has access to AI tools. The fully autonomous workflows that drive the deepest compression are still a minority practice. But compression will accelerate as that gap closes. And this isn’t a one-time adjustment — the tools improve quarterly, and each improvement shifts more work from augmentation to automation.
The compression framing gives you a concrete playbook. It demands three shifts.
First, rethink your hiring ratios. Fewer juniors, more seniors, different onboarding. 42% of project managers already identify AI/ML specialists as the biggest talent gap for 2026. That tells you where the demand is heading.
Second, plan for the pipeline problem. If you stop developing juniors now, you won’t have seniors in 2030. Prashanth Chandrasekar, Stack Overflow’s CEO, put it plainly: “If you don’t hire junior developers, you’ll someday never have senior developers.” Internship postings in tech have dropped 30% since 2023 while applications have risen 7%. The long-term pipeline consequences of pausing junior hiring are a downstream risk most organisations haven’t accounted for.
Third, redefine what “team size” means for output planning. The old headcount-to-output ratios don’t hold when each engineer is augmented by AI. You need new benchmarks.
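The arithmetic behind that re-benchmarking is worth making explicit. A toy model, in which every input is an assumption, shows why holding output constant while productivity rises implies a smaller team:

```python
# Toy compression arithmetic; every number here is an assumption.
target_output = 100.0          # arbitrary output units per quarter
productivity = 1.0             # output per engineer before AI tooling
ai_multiplier = 2.0            # assumed augmentation factor

headcount_before = target_output / productivity                    # 100.0
headcount_after = target_output / (productivity * ai_multiplier)   # 50.0
print(f"{headcount_before:.0f} engineers -> {headcount_after:.0f} engineers")
```

The numbers are not the point. The point is that the planning unit becomes output per AI-augmented engineer rather than raw headcount.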
The distinction also changes how you talk to your board. “We are compressing teams with AI” is a productivity story — it signals strategic sophistication. “AI is replacing our engineers” is a risk story that triggers defensive responses. The framing matters for capital allocation.
Forrester’s advice is worth repeating: “Don’t abandon entry-level hiring.” Someone trained on your systems with AI assistance might outperform a senior hire who has never touched these tools. The compression thesis doesn’t mean you stop investing in people. It means you invest differently.
Something else is going on. AI is automating the codified, repeatable tasks that junior developers typically perform — but it’s not eliminating the need for developers altogether. The result is team compression: fewer juniors get hired because AI absorbs their task portfolio, while senior engineers become more productive. The 16% relative employment decline for ages 22-25 in AI-exposed occupations reflects this compression dynamic, not wholesale replacement.
The Anthropic Economic Index is a recurring research series from Anthropic that analyses how Claude is actually used across the economy. It classifies hundreds of thousands of real coding interactions as either automation (AI does the task) or augmentation (AI assists a human). Its finding that 79% of Claude Code interactions are automation — compared to 49% for the Claude.ai chatbot — gives you empirical evidence that coding agents are qualitatively different from AI assistants. That’s a meaningful distinction when you’re planning team structures.
It’s not hype. Shopify requires AI-first approaches before approving new headcount. Klarna has reduced engineering headcount while maintaining or increasing output. Some Atlassian engineering teams have engineers writing zero lines of code while producing two to five times more output. These are structural decisions, not experiments.
Adoption is broad but uneven. DX data shows 92% of developers use AI coding assistants monthly. However, only 15% report using vibe coding professionally. The gap between broad tool access and deep autonomous use is where the adoption frontier sits right now.
Directive interactions (43.8% of Claude Code use) happen when a developer describes a task and the AI completes it end-to-end with minimal further input. Feedback loop interactions (35.8%) are autonomous but iterative — the developer provides error messages or validation, the AI adjusts, and the cycle repeats. Both count as automation, not augmentation. The developer isn’t doing the implementation work in either case.
Dario Amodei, Anthropic’s CEO, has estimated AI could affect approximately 50% of entry-level white-collar jobs within five years. In the context of team compression, this signals the scale of the shift rather than predicting mass unemployment. It reflects a world where entry-level task portfolios are increasingly automated, changing the composition of teams rather than eliminating the need for human engineers.
“We are compressing teams with AI” is a productivity and efficiency narrative — it signals strategic sophistication. “AI is replacing our engineers” is a risk narrative that triggers defensive board responses. The compression framing positions headcount reduction as a deliberate investment in AI-augmented productivity rather than an admission of disruption. How you frame it determines whether your board backs the strategy or pumps the brakes.
If companies pause junior hiring because AI handles junior-tier tasks, the junior developers who would have become senior engineers in five to ten years never get developed. That creates a foreseeable senior engineer shortage in 2030-2035. Internship postings in tech have declined 30% since 2023 while applications have risen 7% — the pipeline is already contracting. This is one of those problems that’s easy to ignore now and very expensive to fix later.
No. Vibe coding is one specific workflow — you describe what you want in natural language and hand implementation entirely to AI. Only about 15% of developers use vibe coding professionally. Team compression is the organisational outcome that emerges from many AI-augmented workflows, of which vibe coding is the most extreme but least common.
Compression is further along at startups. Anthropic’s data shows 33% of Claude Code conversations serve startup work versus only 13% for enterprise applications. Startups have fewer legacy constraints, smaller teams already, and faster adoption cycles. Large enterprises are experiencing compression more slowly because of existing team structures and longer procurement cycles for AI tools. If you’re enterprise, the startups in your space are already ahead on this.
What the AI Memory Shortage Means for Tech Companies and What Comes Next

The global memory market has a problem. AI infrastructure buildout has reallocated semiconductor manufacturing capacity away from conventional memory toward high-bandwidth memory (HBM) for AI accelerators, and the downstream effects are reaching every corner of the tech industry. DRAM prices have more than doubled through 2025, and analysts project further significant increases through 2026.
If you’re running a tech company, this affects your cloud bills, your hardware refresh plans, your product economics, and your AI deployment costs. This series maps the full picture — what’s happening, why, who is winning and losing, and what you can do about it.
The AI memory shortage is a manufacturing capacity problem, not a materials shortage. Semiconductor fabs — principally Samsung, SK Hynix, and Micron — have reallocated wafer production from conventional DRAM (used in PCs, phones, and servers) to high-bandwidth memory (HBM), which AI accelerators like Nvidia GPUs require. Because one HBM stack uses roughly three times the wafer capacity of equivalent standard DRAM, the pivot to HBM is compressing supply across every other memory-dependent market.
Unlike the pandemic-era chip shortage, this shortage stems from deliberate manufacturing allocation decisions, not external supply disruption. When a fab pivots production lines to HBM, the conventional DRAM those wafers would have produced simply does not exist. There is no buffer, no substitute, no emergency reserve.
The tipping point came when hyperscaler capital expenditure commitments locked in allocation priorities at a scale that pushed the market into acute shortage. OpenAI’s Stargate initiative alone is projected to consume up to 40% of global DRAM output. Google, Amazon, Microsoft, and Meta placed open-ended orders with memory suppliers, indicating they would accept as much supply as available regardless of cost.
IDC’s Francisco Jeronimo put it plainly: “For an industry that has long been characterised by boom-and-bust cycles, this time is different.”
Understanding why AI consumes so much memory helps explain why these allocation decisions are unlikely to reverse any time soon. For the full technical explanation of how we got here, see How AI killed the memory supply chain and why everything else is paying for it.
Large language models are not compute-limited — they are memory-bandwidth-limited. The “memory wall” means GPUs can only process data as fast as memory can deliver it, so AI accelerators require high-bandwidth memory that conventional DRAM cannot provide. For inference, the KV cache (the working memory that tracks active conversation state) grows with every concurrent user and every longer context window, making memory demand non-linear as AI products scale.
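A back-of-envelope calculation makes the scaling visible. The model dimensions below are illustrative assumptions (80 layers, 8 KV heads, 128-dimension heads, fp16 values), not any vendor's published figures:

```python
# Rough KV-cache sizing; all model dimensions here are assumptions.
def kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                   context_tokens=32_768, bytes_per_value=2, users=1):
    # Both K and V are cached for every layer: hence the factor of 2.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * users

one_user = kv_cache_bytes()        # one 32k-token session
fleet = kv_cache_bytes(users=100)  # 100 concurrent 32k-token sessions
print(f"{one_user / 2**30:.1f} GiB per session, {fleet / 2**40:.2f} TiB total")
# -> 10.0 GiB per session, 0.98 TiB total
```

Context length and concurrency multiply, so a product growing on both axes sees memory demand compound rather than add.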
A single large LLM inference session requires significantly more high-speed memory than comparable traditional compute workloads. HBM is not optional for AI at scale — adding more GPU compute without proportional HBM bandwidth is largely wasted investment. SemiAnalysis estimates that HBM constitutes 50% or more of the cost of the packaged GPU.
For companies building AI products, inference memory costs are the dominant ongoing infrastructure expense, not training costs. Micron predicts the total HBM market will grow from $35 billion in 2025 to $100 billion by 2028 — a figure larger than the entire DRAM market in 2024.
We cover the technical deep dive on the memory wall in How AI killed the memory supply chain, and the cost management implications in How to run AI workloads and manage infrastructure costs.
Memory prices have already risen sharply and forecasters expect further increases through at least 2026. DRAM prices overall have risen 172% since the shortage began. TrendForce revised its Q1 2026 conventional DRAM contract price forecast to +90–95% quarter-on-quarter — more than double the previous record quarterly increase. The global memory market is forecast to reach $551.6 billion in 2026.
The price escalation is broad-based. DDR5 for enterprise servers, LPDDR5X for smartphones, and HBM for AI accelerators have all seen steep increases. Enterprise SSDs have been prioritised over consumer NAND, pushing NAND flash contract prices up 55–60% as well. Spot market buyers — those without long-term supply agreements — are experiencing the full force of this volatility.
The current analyst consensus is that price normalisation is unlikely before 2027–2028 at the earliest. For the full price data and market forecasts, see DRAM prices in 2026 have doubled and the numbers are getting worse.
The three major memory manufacturers — Samsung, SK Hynix, and Micron — are the primary beneficiaries. Constrained supply inflates their margins on both HBM and conventional DRAM, and together they control more than 90% of global memory chip production. Hyperscalers secured multi-year long-term agreements (LTAs) ahead of the shortage and have preferential supply access. PC makers, smartphone OEMs, gaming hardware vendors, and most enterprise buyers are absorbing cost increases with limited recourse.
On the buyer side, the LTA is the mechanism that separates the insulated from the exposed. Those without LTAs — which includes most enterprise companies and virtually all SMBs — compete on the volatile spot market. Google has stationed procurement executives in South Korea. Microsoft executives staged a walkout at SK Hynix negotiations. This is not gentle commerce.
The cost impact on PC OEMs is severe: memory now makes up approximately 35% of Dell and HP PC build materials, up from 15–18%. Apple is paying a 230% premium on LPDDR5X for the iPhone 17, even with secured LTAs.
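The BOM shift is easy to reproduce with toy numbers anchored to the shares quoted above; these are illustrative figures, not Dell or HP actuals:

```python
# How a memory price multiple shifts bill-of-materials share (toy numbers).
bom = 1000.0       # assumed pre-shortage build cost, USD
mem_share = 0.17   # memory at ~17% of BOM, per the 15-18% range above
multiple = 2.7     # assumed memory unit-cost multiple since the shortage began

memory_cost = bom * mem_share * multiple
other_cost = bom * (1 - mem_share)
print(f"memory is now {memory_cost / (memory_cost + other_cost):.0%} of BOM")
# -> memory is now 36% of BOM
```

A single line item rising 2.7x is enough to push memory from roughly a sixth of the build cost to over a third, which is the squeeze the OEM warnings describe.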
For the full analysis of market power dynamics and LTA mechanics, see Samsung, SK Hynix, Micron, and the hyperscalers who locked up all the memory.
The PC and smartphone markets are experiencing the sharpest downstream impact. IDC projects a 10–11% decline in PC shipments and an 8–9% decline in smartphone shipments in 2026 as memory costs make affordable devices financially unviable. Gaming GPU supply has been depressed, with Nvidia unable to produce discrete gaming cards at previous volumes. Enterprise hardware prices are rising, and cloud infrastructure costs are beginning to increase.
The PC market is being hit from two directions: memory costs are rising while demand is also under pressure. Gartner projects entry-level laptops under $500 will become financially unviable within two years if current cost trajectories persist. PC vendors — Lenovo, Dell, HP, Acer, ASUS — have warned clients of 15–20% price hikes and contract resets.
Smartphone OEMs face a similar squeeze. Apple is partially insulated by LTAs but is still paying premium prices. Akihabara retailers in Tokyo have already implemented RAM purchase limits to prevent hoarding. The irony in the PC space is that AI PCs require a minimum of 16GB RAM for Microsoft Copilot+, but adding more RAM has become prohibitively expensive — the AI PC narrative is undermined by the same AI demand that created it.
For the full breakdown by market segment, see How the AI memory crunch is wrecking PC, smartphone, and gaming GPU markets.
Geopolitics is actively shaping who can produce memory, who can sell it to whom, and where new capacity will be built. US export controls restrict advanced semiconductor manufacturing equipment from reaching China. The US has simultaneously threatened 100% tariffs on Samsung and SK Hynix chips — punishing the South Korean suppliers that Western buyers currently depend on while trying to build domestic alternatives.
China’s domestic memory response centres on CXMT (ChangXin Memory Technologies), which is targeting HBM3 mass production by end 2026 and has filed for a $4.2 billion IPO. But US export controls constrain CXMT’s access to ASML lithography equipment and other tooling, with localisation at only around 20%. SemiAnalysis projected CXMT would account for nearly 15% of global DRAM production by 2026 — meaningful, but not enough to resolve the shortage.
On the US side, the CHIPS Act is funding domestic capacity — primarily Micron’s $100 billion-plus fab complex in New York state. But fab construction and qualification timelines are 2–4 years. US Commerce Secretary Howard Lutnick made the geopolitical stakes explicit at Micron’s groundbreaking: “Everyone who wants to build memory has two choices: they can pay a 100% tariff, or they can build in America.”
For your planning purposes, the geopolitical dimension increases the risk that supply chain disruption is not simply a market cycle but a structural feature of the memory market for the rest of the decade. For the full analysis, see China, DRAM, and export controls: the geopolitical factors shaping memory supply.
The consensus among analysts is that meaningful supply relief will not arrive before 2027, and the shortage may well persist into 2028. Building new fab capacity or converting existing capacity takes 2–4 years. All three major manufacturers have announced capacity expansion programmes, but these investments will not produce marketable memory for 12–24 months from their respective commitment dates.
Intel CEO Lip-Bu Tan was direct about it at the Cisco AI Summit in February 2026: “There’s no relief until 2028.”
The specific investments in train — Samsung’s Pyeongtaek P5 fab expansion, SK Hynix’s $13 billion new HBM assembly plant, and Micron’s New York state fab complex — are all real commitments, but none will translate to significant additional supply before 2027.
There is meaningful uncertainty in these forecasts. Demand destruction is an underaddressed scenario. If AI investment slows due to business model failures or algorithmic efficiency gains — smaller models requiring less memory — demand could moderate before new supply comes online, potentially bringing forward normalisation. This is not the consensus scenario but it is worth tracking.
You should plan infrastructure budgets on the assumption that memory-intensive costs will remain elevated through at least 2027. For the full fab timeline analysis, see When will the memory shortage end? Fab timelines and what they tell us.
The highest-leverage responses are: optimise AI inference workloads using quantisation and architectural efficiency techniques to reduce memory footprint per inference call; evaluate infrastructure procurement decisions now — whether to commit to reserved cloud instances, move to bare metal, or maintain on-demand flexibility; and model infrastructure costs at current elevated prices to stress-test product economics rather than assuming prices will return to 2023 levels.
Quantisation — reducing model weight precision from 16-bit to 4-bit or 8-bit — is the most accessible immediate mitigation. Halving precision often allows GPU Tensor Cores to roughly double throughput, improving speed, memory usage, and energy efficiency simultaneously.
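A minimal sketch of the idea, assuming naive per-tensor symmetric quantisation; production toolchains such as GPTQ, AWQ, or bitsandbytes are considerably more sophisticated, but the memory arithmetic is the same:

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    # One scale per tensor maps the largest magnitude onto the int8 range.
    w = weights.astype(np.float32)
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float16)
q, scale = quantise_int8(w)
print(f"{w.nbytes / 2**20:.0f} MiB fp16 -> {q.nbytes / 2**20:.0f} MiB int8")
# -> 32 MiB fp16 -> 16 MiB int8; 4-bit packing would halve it again
```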
Infrastructure procurement decisions involve real trade-offs. Reserved cloud instances offer price certainty but require commitment. Bare metal offers the best per-unit economics but requires operational overhead. On-demand cloud offers flexibility but maximum price exposure. The right choice depends on your workload predictability and company size.
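One way to frame the reserved-versus-on-demand choice is as a break-even utilisation. The prices below are assumptions; substitute your provider's actual rates:

```python
# Break-even utilisation for a reserved commitment versus on-demand pricing.
reserved_monthly = 5200.0   # assumed committed cost for a GPU node, USD/month
on_demand_hourly = 12.0     # assumed on-demand rate for the same node, USD/hour
hours_in_month = 730

break_even = reserved_monthly / (on_demand_hourly * hours_in_month)
print(f"reserve if expected utilisation exceeds {break_even:.0%}")  # ~59%
```

Above the break-even, commitment wins; below it, you are paying for certainty you do not use.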
FinOps discipline is increasingly necessary. As hyperscalers begin passing memory cost increases through to cloud customers, infrastructure costs that were predictable are becoming volatile. Proactive tagging, usage monitoring, and commitment evaluation are baseline requirements.
If you don’t have the scale to sign long-term agreements directly with memory manufacturers, focus on the levers you do control: reducing per-inference memory consumption, locking in cloud commitment pricing where available, and timing hardware refreshes to avoid peak spot market pricing. For the full enterprise response playbook, see How to run AI workloads and manage infrastructure costs during the memory shortage.
HBM is a specialised DRAM architecture that stacks up to 12 memory dies vertically using through-silicon vias (TSVs), delivering dramatically higher data transfer bandwidth than conventional DDR or LPDDR modules. One HBM stack uses approximately three times the wafer capacity of equivalent standard DRAM, which is the core reason the shift to HBM production is compressing conventional memory supply. See How AI killed the memory supply chain for the full technical explanation.
The 2020–2023 shortage was driven by pandemic supply chain disruptions — external shocks to production and logistics. The 2024–2026 shortage is driven by deliberate manufacturing strategy: fabs are actively choosing to reallocate wafer capacity to higher-margin HBM because AI demand is so strong. This makes the resolution timeline different and slower — the market is waiting for capital investment to produce new capacity, not for disruptions to clear. See DRAM prices in 2026 for the comparative analysis.
An LTA is a multi-year supply contract between a memory buyer and a manufacturer that locks in committed volumes. Hyperscalers signed these agreements before the shortage tightened, securing preferential supply access. Buyers without LTAs — which includes most enterprise companies and virtually all SMBs — compete on the volatile spot market. See Samsung, SK Hynix, Micron, and the hyperscalers for a full explanation of how LTAs work.
Yes, with a lag. Hyperscalers have so far largely absorbed increased memory costs rather than passing them immediately to cloud customers, but this is not sustainable indefinitely. As memory contract terms roll over and new agreements are signed at elevated prices, cloud pricing increases are expected to follow. See How to run AI workloads and manage infrastructure costs for how to prepare.
CXMT represents China’s strategic response but its near-term capacity is constrained by US export controls on advanced semiconductor manufacturing equipment. CXMT is targeting HBM3 mass production by end 2026 and could meaningfully supplement conventional DRAM supply, but it cannot substitute for SK Hynix, Samsung, or Micron at current capacity levels. See China, DRAM, and export controls for the full analysis.
It depends on your timeline and workload certainty. If you have predictable, near-term memory-intensive requirements, buying now at current elevated prices may be preferable to buying at potentially higher spot prices in 12–18 months. If requirements are uncertain or 2–3 years out, waiting for supply normalisation may yield better unit economics. See When will the memory shortage end? for supply timeline analysis.
Quantisation reduces the numerical precision of AI model weights — for example, from 16-bit floating point to 8-bit or 4-bit integers. Lower-precision weights require less memory to store and less bandwidth to transfer during inference, reducing the GPU memory footprint per inference call. Halving precision typically allows Tensor Cores to roughly double throughput. Dropbox uses quantisation in production for its Dash product. See How to run AI workloads and manage infrastructure costs for the full optimisation playbook.
When Will the Memory Shortage End and What the Fab Timelines Actually Tell Us

The memory shortage is structural, and the question sitting on top of every infrastructure budget is the same: when does it end? The people who should know can’t agree. Intel’s CEO says no relief until 2028. Counterpoint Research reckons Q4 2027. IDC says it’s not a cyclical blip at all — it’s a permanent structural reset.
Why can’t they agree? Because they’re making different bets on two things: how fast AI demand will keep growing and how quickly HBM yields will improve. This article maps every announced fab timeline against demand, explains why prices will come down a lot more slowly than they went up, and gives you a planning horizon you can actually work with. For the full picture of how AI created this mess, see the full AI memory shortage story.
Intel CEO Lip-Bu Tan told the Cisco AI Summit in February 2026 there is “no relief until 2028” — the most pessimistic mainstream forecast from a named executive. Counterpoint Research analyst Tarun Pathak puts Q4 2027 as the earliest point where supply and demand curves could cross.
Micron CEO Sanjay Mehrotra projects the HBM market hitting $100 billion by 2028. Micron is sold out for 2026, with demand having “far outpaced our ability to supply that memory.” TrendForce projects the memory market surging to $842.7 billion in 2027. And IDC’s Nabila Popal doesn’t mince words: “This is not just a temporary situation. This is going to result in a structural reset of the entire industry.”
So here’s where the disagreement lands. It’s all about two variables: how fast AI demand will grow and how quickly HBM yields will improve. Get both wrong in the same direction and you’re off by years.
The spread between Q4 2027 and 2028+ is the window you need to plan around. For the current price data, see our analysis of why DRAM prices have doubled.
A new fab takes at least 18 months to build — and that’s just the physical construction. Getting yields up to meaningful volume adds more months on top. Each facility costs $15 billion or more. The physics don’t compress, no matter how much capital you throw at them.
The timing problem goes back to the 2022-2023 memory bust. Prices tanked, Samsung cut production by 50%, and once things recovered the industry was gun-shy about expanding. Then AI demand exploded and caught everyone short.
HBM production makes it worse. Each HBM chip stacks up to 12 thinned-down DRAM dies. When Micron makes one bit of HBM, it has to forgo making three bits of conventional memory — the “three-to-one basis.” And even when DRAM wafers are available, packaging capacity at TSMC (CoWoS) and SK Hynix (MR-MUF) puts a hard cap on how many HBM units actually get assembled.
Shawn DuBravac at the Global Electronics Association reckons yield improvement is the faster path: better stacking efficiency and tighter coordination between memory suppliers and AI chip designers will deliver gains before new fabs do. For more on the packaging bottleneck, see how AI broke the memory supply chain.
The 2027 wave brings three facilities: Micron Singapore (HBM fab), Micron Taiwan (retooled from a PSMC acquisition, H2 2027), and SK Hynix’s $13 billion HBM packaging facility at Cheongju — the world’s largest HBM assembly plant.
The 2028 wave adds SK Hynix West Lafayette, Indiana (CHIPS Act-funded) and Samsung Pyeongtaek with a new production line.
Then there’s the Micron Clay, New York megafab — the single largest planned capacity addition. It broke ground in January 2026, but first production will not arrive until 2030. The facility that would have made the biggest near-term difference is now a 2030 story.
Here’s the catch: all three 2027 facilities are purpose-built for HBM. They don’t directly relieve conventional DRAM supply. And new capacity has to outpace growing demand, not just match it. NVIDIA’s B300 uses eight HBM chips, each stacking 12 DRAM dies. HBM4 goes to 16. IDC expects just 16% year-over-year DRAM supply growth in 2026, well below AI-driven demand growth. These new fabs may merely keep pace rather than creating any surplus.
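The per-GPU die arithmetic, using the figures quoted above, shows why that capacity gets absorbed so quickly (illustrative numbers, not fab data):

```python
# Die-count arithmetic from the figures above.
stacks_per_gpu = 8      # B300-class accelerator
dies_hbm3e = 12         # DRAM dies per HBM stack today
dies_hbm4 = 16          # DRAM dies per stack for HBM4
wafer_trade = 3         # conventional bits forgone per HBM bit produced

dies_per_gpu = stacks_per_gpu * dies_hbm3e   # 96 DRAM dies in one GPU
hbm4_uplift = dies_hbm4 / dies_hbm3e         # 1.33x more dies per stack
print(f"{dies_per_gpu} dies per GPU, HBM4 uplift {hbm4_uplift:.2f}x, "
      f"~{wafer_trade}x conventional capacity forgone per HBM bit")
```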
Most shortage coverage skips this bit, and it’s the part that matters most for your budget.
Kim at Mkecon Insights puts it plainly: “In general, economists find that prices come down much more slowly and reluctantly than they go up. DRAM today is unlikely to be an exception.”
It’s called price asymmetry. Three things drive it.
First, contract reset cycles. OEMs like Dell and HP purchase memory in bulk about a year in advance. When spot conditions ease, your contracted pricing stays elevated until the next reset window. You’re locked in.
Second, vendor inventory protection. Pricing floors and volume commitments slow the pass-through of falling costs. Procurement leverage hinges on strategic alignment, not volume. Hyperscalers lock in supply. Everyone else fights over what’s left.
Third, vendor pricing discipline. Manufacturers who sank $15 billion into each fab need to recoup that capital. Analysts assess these price increases as durable rather than temporary. They’re not going to race to the bottom.
For your budgeting: even when supply relief arrives, prices won’t snap back. Model a gradual 12-18 month decline from the inflection point, not a step function.
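If you want to encode that into a budget model, a simple glide path does it. The peak and floor indices below are assumptions (indexed to pre-shortage pricing = 1.0), not forecasts:

```python
# Toy glide-path price model; peak and floor are assumed indices.
def memory_price_index(months_after_inflection, peak=2.7, floor=1.3,
                       glide_months=18):
    # Linear decline from the peak to a structural floor over the glide window.
    if months_after_inflection >= glide_months:
        return floor
    frac = months_after_inflection / glide_months
    return peak + (floor - peak) * frac

for m in (0, 6, 12, 18):
    print(m, round(memory_price_index(m), 2))   # 2.7, 2.23, 1.77, 1.3
```

A floor above 1.0 reflects the possibility, raised below, that pricing settles at a higher equilibrium than pre-2025 levels.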
Here’s the concrete planning guidance.
Upside scenario (Q4 2027): The earliest possible inflection, if HBM yield ramps exceed expectations and AI demand moderates. This is the Counterpoint Research view. Don’t build your base budget on this.
Base case (gradual improvement through 2028): New fabs come online, yield improves, but price asymmetry means prices decline slowly. Meaningful relief arrives mid-2028. Use this for your 2026-2028 infrastructure roadmap.
Downside scenario (2029-2030): If HBM4 demand accelerates before capacity catches up, or if the Micron Clay NY delay cascades, full normalisation doesn’t arrive until 2030.
For hardware refresh: if servers need replacing in 2026, waiting for price relief is not viable. The earliest meaningful supply improvement is 18+ months away. Budget at current elevated pricing and move on.
For cloud commitments: multi-year contracts signed in 2026 lock in shortage pricing. Consider shorter terms with renegotiation windows aligned to the Q4 2027-2028 relief window. Give yourself room to move.
Even after normalisation, memory pricing may settle at a higher equilibrium than pre-2025 levels. The permanent wafer reallocation toward HBM means conventional DRAM carries a structural premium until manufacturers reverse course. For more on timing these decisions, see managing infrastructure costs during the shortage.
Yes, on both sides.
The fastest path to relief is a demand contraction. If enterprise AI deployments don’t prove ROI — a risk Deloitte’s 2026 semiconductor outlook flags — spending could contract before new capacity arrives. Deloitte notes the industry “has placed all its eggs in the AI basket”. If those eggs don’t hatch, the surplus comes early.
On the supply side, HBM yield improvements could accelerate relief faster than new construction. SK Hynix’s MR-MUF process and hybrid bonding could enable denser stacks without waiting for new fabs.
The downside scenarios deserve equal weight. HBM4 goes to 16 stacked dies versus 12 today, consuming more wafer area before new capacity arrives. McKinsey predicts $7 trillion in data centre spending by 2030. If that accelerates, demand could outrun even the most aggressive fab plans.
Unlike the COVID chip shortage, which was cyclical, this one is structural. Resolution requires active vendor decisions to reverse wafer allocation, not just the passage of time.
Your planning assumption: gradual improvement through 2028, with demand-side wildcards as the main variables to watch. For the full story, see what the memory supply crisis means for tech companies and what comes next.
The credible range is Q4 2027 (Counterpoint Research) to 2028+ (Intel CEO’s forecast). Full price normalisation may not come until 2030, given the Micron Clay NY delay and price asymmetry. Plan for gradual improvement through 2028 as your base case.
The COVID shortage was cyclical — a temporary demand spike that normalised, and supply caught up. This one is structural — IDC frames it as a permanent reallocation of wafer capacity toward HBM. It doesn’t self-correct when demand eases. Manufacturers have to deliberately reverse it, and that requires them to be confident conventional DRAM demand will recover.
If servers need replacing in 2026, waiting isn’t an option — the earliest meaningful supply improvement is 18+ months out. Budget at current elevated pricing. For discretionary upgrades, consider deferring to late 2027 or early 2028.
The AI-driven memory shortage is forecast to persist through at least 2027, with most analysts projecting gradual improvement in 2028. It specifically affects DRAM and HBM supply; broader chip shortages vary by segment.
Price asymmetry means memory prices fall more slowly than they rise. Contract reset cycles and vendor pricing strategy slow the decline. Pre-2025 pricing levels may not return because of the structural premium from permanent wafer reallocation toward HBM.
A new fab takes 18+ months to build and costs $15 billion or more. Getting yields up to meaningful volume adds further months. The physics of cleanroom construction, equipment installation, and yield optimisation simply can’t be compressed with more money.
HBM (high-bandwidth memory) stacks multiple DRAM dies vertically to provide the bandwidth AI GPUs need. Each HBM unit consumes three times the wafer area of conventional DRAM. When manufacturers allocate cleanroom capacity to HBM, they directly reduce conventional DRAM output. It’s a zero-sum game.
The Micron Clay, New York megafab was the largest planned capacity addition, but it’s been delayed to 2030. SK Hynix Cheongju (2027) and Micron Singapore (2027) are the nearest major additions, though both are HBM-focused.
The CHIPS Act funds US-based fab construction, including Micron’s Clay NY facility and SK Hynix’s West Lafayette plant. But these are targeting 2028-2030 production starts, so CHIPS Act investment doesn’t speed up near-term relief.
Consider shorter commitment terms with renegotiation windows aligned to Q4 2027-2028. Multi-year contracts signed in 2026 lock in shortage pricing. Shorter terms give you the flexibility to renegotiate when supply improves.
IDC uses “structural reset” to describe the permanent reallocation of wafer capacity from conventional DRAM to HBM. Unlike a cyclical shortage, this shift doesn’t self-correct when demand eases — manufacturers have to deliberately reverse it, and that requires confident forecasts of conventional DRAM demand recovery.
Yes. If enterprise AI deployments fail to prove ROI (a risk Deloitte flags), AI infrastructure spending could contract before new capacity arrives. That would reduce HBM demand, free up wafer capacity for conventional DRAM, and potentially create oversupply after 2027.
China DRAM Export Controls and the Geopolitical Factors Shaping Memory Supply

The global memory shortage is not just about supply and demand. Geopolitics is actively redrawing the map of who gets to make what, and where. U.S. export controls are blocking China from accessing advanced memory technology. Proposed tariffs on Korean and Taiwanese chips threaten to jack up costs for everyone else. And China’s biggest DRAM maker, ChangXin Memory Technologies (CXMT), is ramping up conventional DRAM production but remains years away from manufacturing the high-bandwidth memory (HBM) that AI actually needs.
Here is the thing you need to understand: coverage of Chinese memory production routinely conflates conventional DRAM with HBM. They are not the same thing. This article separates what CXMT can realistically deliver from what it cannot, and gives you the geopolitical context for the global AI memory shortage without requiring a trade policy degree.
CXMT is based in Hefei, produces DDR5 and LPDDR5 at scale, and is now the world’s fourth-largest DRAM manufacturer — holding roughly 15% of global output.
But here is what matters: all of that is conventional DRAM. CXMT does not produce HBM, which is the memory type in shortest supply for AI workloads. Technologically, it sits about two process generations behind Samsung and SK Hynix.
Then there is Yangtze Memory Technologies (YMTC), China’s NAND flash leader, which is constructing a third fab in Wuhan with around half its planned output earmarked for DRAM. But YMTC’s DRAM ambitions are even further from commercial scale than CXMT’s HBM plans.
HP, Dell, Acer, and Asus are actively qualifying CXMT DRAM for non-U.S. devices. That is a legitimisation signal, but HP will reportedly only put CXMT chips in devices sold outside the United States — a deliberate compliance firewall. So CXMT is a real player, but its relevance to the Samsung/SK Hynix/Micron competitive position is confined to conventional DRAM in specific markets.
Short answer: worse.
U.S. Bureau of Industry and Security (BIS) export controls restrict the transfer of advanced semiconductor technology to China — including HBM stacking processes and packaging equipment from ASML, KLA, and Applied Materials. The rules limit tool sales for sub-18nm DRAM processes and target advanced packaging tech relevant to HBM.
The practical effect is a technology ceiling. CXMT simply cannot get the equipment it needs to manufacture HBM at competitive yield. And because Chinese AI companies like Huawei cannot access Korean or U.S. HBM either, domestic demand concentrates on whatever CXMT can produce. CXMT’s growth stays inside China. It does not relieve global HBM shortages one bit.
It is worth understanding the distinction between export controls and tariffs. Controls restrict China’s ability to produce competitive alternatives. Tariffs increase the cost of allied supply. Both make things worse for enterprise buyers, but the question of whether Chinese growth changes the shortage resolution timeline depends entirely on which lever you are looking at.
CXMT is also on the U.S. Department of Defense “Chinese military company” list. It is not banned outright, but the designation creates compliance review requirements for any Western company even considering CXMT as a supplier.
One more thing worth knowing. Samsung and SK Hynix both operate large fabs inside China that are now frozen in capability — running, but unable to upgrade. As Chinese domestic production expands, both companies may eventually exit the Chinese market entirely. That is an underappreciated supply variable that most analyses skip over.
In the near term (2026–2027), CXMT provides some conventional DRAM relief for consumer devices and domestic Chinese buyers. It is targeting HBM3 by end of 2026 and has delivered HBM3 samples to Huawei. But samples and volume production are very different things.
The challenge is mastering through-silicon via (TSV) stacking — bonding multiple DRAM dies into a single HBM package, where a single misstep ruins the whole stack. Chinese equipment makers like Naura Technology are developing domestic tooling, but none of it is confirmed ready for mass production.
So do not plan your procurement around Chinese HBM. It is not going to deliver at volume in the 2026–2027 timeframe. Looking further out to 2028 and beyond, Chinese HBM at scale is plausible but not guaranteed. Counterpoint Research expects Korean firms to maintain dominance.
New fab investments are underway. Micron is building in Taiwan and New York. SK Hynix is investing in Indiana. Expansions are happening in Japan and Singapore. These are partly enabled by the U.S. CHIPS Act, and Taiwan has committed approximately $318 million in subsidies to Micron for HBM R&D. For full context on how the memory shortage is reshaping the entire tech sector, see the global AI memory shortage overview.
But new fabs come online in the 2027–2030 window. For the current shortage, diversification is a strategic hedge, not a supply solution.
Then there is the tariff threat. U.S. Commerce Secretary Howard Lutnick has warned Samsung and SK Hynix they could face 100% tariffs. His exact words at Micron’s New York fab groundbreaking: “Everyone who wants to build memory has two choices: They can pay a 100% tariff, or they can build in America.”
And here is the part that gets missed in most coverage: long-term agreements (LTAs) between hyperscalers and memory makers lock up existing capacity for AI workloads. New fab output may already be committed before it reaches the open market. Diversification does not put more memory on the spot market for enterprise buyers anytime soon.
Could an AI demand correction end the shortage early? This is the question that matters most for capital planning.
Deloitte’s 2026 semiconductor outlook flags that the industry may need to prepare for a demand correction. If enterprises cannot demonstrate returns on AI investment, the hyperscaler capex driving HBM demand could slow down. The DeepSeek efficiency narrative fed into this — if models get more efficient, maybe memory demand falls.
But committed spending tells a different story. Microsoft, Google, Amazon, and Meta have committed record AI infrastructure spending. Stargate — the $500B OpenAI/SoftBank/Oracle joint venture — represents demand spanning years. Hyperscalers are placing open-ended memory orders, accepting any HBM volume at any price. That does not look like demand about to fall off a cliff.
The reconciliation is inference scaling. More efficient models get deployed more broadly. AI workloads have expanded from training into inference and decision-making — and each stage adds memory demand at the system level. Efficiency per model can actually increase total consumption. A demand correction remains a risk scenario, not a forecast — but it is worth stress-testing your procurement assumptions against it.
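To make the inference-scaling point concrete, here is a toy calculation in Python. Every figure below is invented for illustration, not sourced from the analysts cited here; it shows how a 50% per-session efficiency gain can still leave fleet-level memory demand far higher once deployments and users expand:

```python
# Toy model: per-model efficiency gains vs. fleet-wide memory demand.
# All numbers are illustrative assumptions.

def fleet_memory_gb(models: int, users_per_model: int, gb_per_session: float) -> float:
    """Total resident memory if every active session holds its own working set."""
    return models * users_per_model * gb_per_session

# Before: fewer, heavier deployments.
before = fleet_memory_gb(models=10, users_per_model=1_000, gb_per_session=8.0)

# After: each session is twice as memory-efficient, but cheaper inference
# drives broader deployment, so models and concurrent users both grow.
after = fleet_memory_gb(models=30, users_per_model=2_000, gb_per_session=4.0)

print(f"before: {before:,.0f} GB -> after: {after:,.0f} GB")  # 80,000 GB -> 240,000 GB
```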
Chinese DRAM provides minimal near-term relief for Western enterprise buyers. CXMT does not serve the global spot market at scale, and the DoD blacklist creates compliance requirements most procurement teams are not set up for. Qualifying CXMT’s chips does not mean OEMs will automatically order from them. It is a contingency measure for non-U.S. markets. Enterprise-grade supply for Western buyers remains controlled by Samsung, SK Hynix, and Micron — and understanding how those three makers locked up supply with hyperscalers is essential context for any sourcing negotiation.
So what should you actually do? Check whether your supply chain has indirect CXMT exposure through contract manufacturers. Build sourcing strategies that account for export control tightening and tariff escalation as separate scenarios. And secure supply commitments with established suppliers before the tariff situation resolves — waiting carries price spike risk regardless of which geopolitical path unfolds.
The geopolitical layer of the broader memory supply crisis does not change your near-term options much. But it changes the medium-term calculus considerably, and it is worth understanding before you lock in multi-year sourcing strategies.
CXMT does not supply Western buyers at meaningful scale. It primarily serves the domestic Chinese market. Enterprise-grade memory for Western markets remains dominated by Samsung, SK Hynix, and Micron.
CXMT is on the U.S. DoD “Chinese military company” list, which restricts certain transactions. It is not an outright ban, but you should conduct a formal export control compliance review before incorporating CXMT components into any U.S.-destined product.
CXMT is targeting HBM3 production by end of 2026, but volume production at competitive yield is more realistically a 2028+ prospect. The technology gap in TSV stacking and domestic equipment maturity are the primary constraints.
Export controls block China’s access to manufacturing equipment. Tariffs increase the cost of memory from allied nations like South Korea and Taiwan. They are distinct policy levers with different supply-side effects, but both make the shortage worse.
As of early 2026, the threatened 100% tariff has not been enacted. But the threat alone is already influencing procurement decisions and pricing expectations. Plan for multiple tariff scenarios.
More efficient models reduce memory demand per model, but they get deployed more broadly. Inference scaling tends to sustain or increase total memory consumption at the aggregate level.
Samsung’s and SK Hynix’s China fabs continue operating at their current technology level but cannot upgrade due to export control rules. The long-term future of these assets is unclear and represents an underappreciated variable in supply projections.
Fab diversification will not help within the 2026–2027 window. New fabs operate on 3–5 year construction and ramp timelines. Diversification reduces geographic concentration risk long term but does not provide near-term relief.
The U.S. CHIPS Act subsidises domestic semiconductor manufacturing, supporting fab investments by Micron and SK Hynix on U.S. soil. It accelerates geographic diversification but does not change the timeline for new capacity coming online.
Yes, indirect exposure is possible. As OEMs qualify CXMT memory, components sourced through contract manufacturers may include CXMT DRAM. Audit your supply chain, particularly for products with U.S. market distribution.
How the AI Memory Crunch Is Wrecking the PC, Smartphone, and Gaming GPU Markets
For the first time in roughly 30 years, NVIDIA isn’t releasing a new gaming graphics card. Not because anything went wrong in engineering. Because the memory those cards need is worth more to AI accelerator customers.
The AI infrastructure buildout has turned semiconductor wafer capacity into a zero-sum competition — and consumer electronics are losing. IDC projects PC shipments down 11.3% and smartphone shipments down 12.9% in 2026. LPDDR5x mobile memory prices are up 90% quarter-on-quarter. Counterpoint Research calls it “the steepest on record.”
This article maps the damage across four markets and identifies which brands and buyers are most exposed. For the background on how we got here, the AI memory shortage and its causes are covered in our pillar overview.
NVIDIA has shipped new gaming graphics cards every year for roughly three consecutive decades. As of early 2026, TrendForce confirmed both the planned RTX 50-series Kicker refresh and the RTX 60-series launch have been paused — making 2026 the first year in modern NVIDIA history with no new gaming GPU.
The economics are pretty straightforward. Gaming GPU revenue was approximately 35% of NVIDIA’s total revenue in 2022. By late 2025 that had collapsed to around 8%. NVIDIA’s AI compute segment runs at approximately 65% operating margins versus roughly 40% for gaming graphics. So SK Hynix, Samsung, and Micron — who together account for more than 90% of global memory production — have reallocated accordingly. They’re prioritising HBM for AI accelerators over the GDDR6 and GDDR7 that gaming cards require.
There’s a geopolitical angle here too. US export controls restrict Chinese researchers from buying NVIDIA’s advanced AI chips. One workaround that emerged was buying GeForce gaming GPUs for AI training and inference instead. Pausing production closes that channel — a second-order effect that goes well beyond consumer gaming.
The PC industry entered 2026 expecting a tailwind. Microsoft ending support for Windows 10 was supposed to drive one of the largest enterprise refresh cycles in years. Instead, peak memory pricing arrived at exactly the same moment as the Windows 10 EOL deadline. Terrible timing.
The cost mechanics are stark. HP’s CFO disclosed that memory now accounts for 35% of a PC’s build cost — up from 15–18% the prior quarter. Dell’s COO reported DRAM at $2.39 per gigabit, a 5.5x increase over six months. Lenovo, Dell, HP, Acer, and ASUS have all warned of 15–20% unit price hikes. Gartner projects PC prices rising 17% in 2026. IDC projects shipments falling 11.3% — “more negative than even our most pessimistic scenarios suggested just a few months ago.”
Gartner estimates higher memory prices are causing enterprise refresh cycles to lengthen by 15%. For organisations that can’t defer, the math is painful: you’re paying 15–20% more per device in a supply environment where vendors aren’t guaranteeing prices beyond two to three weeks. IDC expects actual PC shortages to start in Q2 2026 as smaller vendors struggle to secure components. If you want to understand the price mechanics underlying these market declines, the DRAM contract pricing trajectory explains both the scale and the duration.
The smartphone market is experiencing what Counterpoint Research calls “the sharpest decline on record.” Global shipments are projected to fall 12% in 2026 — down to volumes not seen since 2013. IDC projects a 12.9% decline.
The driver is LPDDR5x — the mobile memory variant powering mid-range and flagship smartphones. Prices have risen approximately 90% quarter-on-quarter. That’s the steepest in the technology’s history. The reason is direct competition from AI infrastructure: NVIDIA’s most powerful AI rack systems contain 54 terabytes of LPDDR5x each — the same variant that goes into your phone. As Counterpoint’s Tarun Pathak put it: “A lot of these memory companies are asking smartphone vendors to stand in line behind the hyperscalers.”
Smartphone average selling prices are expected to rise 3–8% in 2026 (IDC) or as much as 14% (Gartner). The problem is that the buyers most exposed — those shopping in the sub-$400 tier — are also the most price-sensitive. When a $250 Android phone needs to be priced at $290 to maintain margins, unit volumes collapse faster than the price increase percentage suggests. Qualcomm is already absorbing the downstream damage as OEM customers order fewer application processors.
The AI PC story heading into 2026 was coherent. Intel integrated Neural Processing Units into its Core Ultra line. AMD followed with Ryzen AI. Microsoft defined Copilot+ PC — requiring a minimum of 16GB DRAM — as the hardware standard for on-device AI. Lenovo, Dell, HP, Acer, and ASUS all launched Copilot+ lines.
Then the memory cost shock arrived.
The Copilot+ specification requires 16GB DRAM minimum; 32GB is recommended for serious AI inference workloads. Under current DRAM pricing, those 32GB configurations have become “economically punishing” — IDC’s framing, not ours: “Just as the industry is seeing a need to add more RAM, it has become prohibitively expensive to do so, even if they can get supply. This will result in higher prices, lower margins, or a potential downmix in the amount of RAM in new systems at the worst possible time.”
Here’s the bind that creates. If OEMs ship Copilot+ at 16GB minimum rather than 32GB recommended, those devices can’t run the AI workloads that justify the Copilot+ premium. You pay more for a device that delivers less. Gartner confirmed enterprises will continue investing in AI PCs but at a slower rate, “likely to purchase devices with reduced memory.” The momentum isn’t gone — it’s been pushed back by at least 12–18 months. For actionable guidance on evaluating AI PC procurement against current memory pricing, the cost framework requires more than a simple timing call.
The memory shortage doesn’t affect all brands equally. The clearest illustration: Apple versus Android mid-range OEMs.
Apple: well-positioned. Apple secures memory supply through long-term agreements covering 12–24 months in advance. Its 2026 flagship iPhone models are expected to hold at 12GB RAM rather than increasing to 16GB, keeping BOM exposure manageable. Apple’s premium pricing means even a 3–5% cost pass-through doesn’t threaten demand.
Android mid-range OEMs: exposed. Honor, Vivo, Oppo, Xiaomi, TCL, Transsion, and Realme operate on thin margins with little supply hedging and large portfolios in the sub-$400 tier. Counterpoint Research forecasts these brands will be hit hardest.
Samsung: the dual-role position. As a vertically integrated memory manufacturer, Samsung produces the LPDDR5x it uses in its own phones — supply security that no pure-play OEM can match. SK Hynix and Samsung together surpassed the combined market cap of Alibaba and Tencent in early 2026. That tells you who’s winning from this shortage.
PC OEMs: scale matters. Lenovo’s scale gives it supplier leverage that smaller regional OEMs simply can’t match. Gartner has said directly that “consolidation isn’t off the map here — it’s survival of the fittest.” For broader context on the memory supply crisis driving these dynamics, the structural supply constraints explain why this asymmetry isn’t closing quickly.
The memory shortage translates directly into decisions that need to be made in the next 90 days: PC refresh timing, AI PC adoption, smartphone fleet management, server hardware planning.
Here’s what you’re working with:
Memory prices won’t stabilise until 2027 at the earliest. The top four cloud providers are projected to spend $600 billion on AI infrastructure in 2026 — 70% more than 2025 — sustaining demand pressure well into next year.
Hardware costs are 15–20% higher than 12 months ago. Dell’s data ($2.39/Gbit DRAM, up 5.5x over six months) and HP’s BOM disclosure (memory at 35% of build cost, up from 15–18%) set the baseline. If your budget was built on 2025 hardware cost assumptions, it’s wrong.
The Windows 10 EOL decision has no good timing. Organisations that can’t run Extended Security Updates are refreshing at peak memory prices. That’s the constraint — not the strategy.
A practical framework to work through:
Assess your true Windows 10 EOL exposure. Can you purchase Extended Security Updates to buy 12–18 months while prices normalise?
Evaluate standard PC versus Copilot+ AI PC. For most buyers, the Copilot+ premium is hard to justify at current memory pricing when normalisation isn’t expected before 2027.
Tier your vendor selection by supply chain position. Prioritise Lenovo, HP, Dell — suppliers with established long-term agreements and scale. IDC expects shortages to hit smaller vendors in Q2 2026.
Build a 15–20% hardware cost buffer into your 2026–2027 technology budget.
For the full response strategy covering AI workload cost management during the shortage, the framework goes beyond hardware procurement. And for when pricing will actually normalise, the memory shortage end-date analysis covers what the production data suggests about the 2027–2028 horizon.
Gaming GPU revenue collapsed from approximately 35% of NVIDIA’s total revenue in 2022 to around 8% in 2025. The GDDR memory gaming cards require is more profitably allocated to HBM for AI accelerators, where operating margins run at approximately 65% versus 40% for gaming. TrendForce confirmed the pause covers both the RTX 50-series Kicker refresh and the next-generation RTX 60-series launch — making 2026 the first year in approximately 30 years with no new NVIDIA gaming GPU.
HP disclosed that memory now accounts for 35% of a PC’s build cost, up from 15–18% the prior period. Dell’s COO cited DRAM at $2.39 per gigabit, up 5.5x over six months. Gartner projects PC average selling prices rising 17% in 2026; IDC projects ASP increases of 4–8%.
Semiconductor fabrication lines that previously produced standard DRAM for laptops and phones have been retooled to produce HBM for AI data centre chips. Every wafer allocated to an HBM stack is a wafer not available for the LPDDR5x in a smartphone or the DDR5 in a laptop, creating simultaneous supply shortfalls across all consumer memory types.
Gartner advises that memory prices will not stabilise until 2027. If Windows 10 EOL compliance requires a refresh, deferral may not be possible. If it can be deferred, waiting 12–18 months for prices to normalise would reduce hardware costs meaningfully.
LPDDR5x — the mobile memory variant — has seen prices rise approximately 90% quarter-on-quarter, the steepest in the technology’s history. Smartphone manufacturers either absorb the cost through margin compression or pass it to buyers through ASP increases of 3–8% (IDC) to 14% (Gartner). Memory companies are prioritising hyperscaler AI orders over smartphone vendor allocations.
High Bandwidth Memory (HBM) is DRAM chips stacked vertically and placed adjacent to the GPU die — delivering very high data transfer rates, essential for AI accelerators. SK Hynix, Samsung, and Micron, who collectively account for more than 90% of global memory production, are diverting fabrication capacity from conventional DRAM to HBM, creating the shortage across all consumer memory types.
Apple and Samsung are hedged via long-term supply agreements covering 12–24 months in advance. Apple’s 2026 flagship models are expected to hold at 12GB RAM rather than increasing to 16GB, limiting BOM exposure. Budget Android OEMs — TCL, Transsion, Realme, Honor, Oppo, Vivo, Xiaomi — face the worst margin pressure and are most likely to raise prices sharply, reduce memory configurations, or exit lower-end segments.
Yes, the shortage affects AI PCs directly. Microsoft Copilot+ AI PCs require a minimum of 16GB DRAM, and 32GB configurations — recommended for serious AI inference workloads — are significantly more expensive under current pricing. The shortage has raised the price floor for AI PCs at the exact moment the market was building adoption momentum.
Microsoft’s end of Windows 10 support was expected to trigger a large enterprise PC refresh wave in 2026. Instead, memory costs are causing businesses to extend refresh cycles by approximately 15% (Gartner). Organisations that cannot delay face paying peak memory prices for hardware that costs 15–20% more per unit than 12 months ago.
Measured by price velocity, this is the worst memory shortage on record. DRAM and NAND prices are projected to rise 130% year-on-year by end of 2026. The 2026 shortage is distinctive because it is being deliberately sustained by economically rational wafer reallocation decisions — not COVID disruptions or factory fires — making it more persistent and predictable.
Gaming GPUs can substitute for restricted AI accelerators, and this is already occurring in China, where US export controls restrict access to NVIDIA’s advanced AI chips. Chinese researchers have been using GeForce gaming GPUs for AI training and inference. NVIDIA’s pause therefore has a geopolitical dimension well beyond consumer gaming.
Server DRAM has risen sharply alongside consumer memory variants. The top four cloud providers are projected to spend $600 billion on AI buildouts in 2026 — 70% more than 2025. For CTOs evaluating on-premises AI infrastructure versus cloud deployment, the full cost framework is covered in the companion article.
How to Run AI Workloads and Manage Infrastructure Costs During a Memory Shortage
DRAM prices have doubled. HBM is sold out through at least 2027. And unless you’ve got the kind of procurement leverage that comes with being a hyperscaler, you’re wearing the full impact. Every infrastructure line item — cloud GPU instances, on-prem servers, endpoint devices — costs more than it did six months ago. And it’s staying that way through the end of 2027.
You need to work two tracks at the same time: the hardware track (cloud versus on-prem economics, procurement tactics, and refresh timing) and the software track (optimisations that cut memory demand without buying anything new).
This article walks through both, with a clear recommendation at the end of each section. For the full context of the AI memory shortage, start there.
Let’s talk numbers. DRAM contract prices surged 90–95% quarter-on-quarter in Q1 2026 — revised upward from an initial 55–60% forecast. DDR5 32GB modules have crossed the $500 mark. Samsung bumped their 32GB DDR5 module prices from $149 to $239 — a 60% increase, with contract pricing surging more than 100%.
That flows through to everything you buy. Enterprise PC and laptop prices are expected to rise 17% in 2026 because memory now makes up 23% of the total bill of materials, up from 16% in 2025. Cloud GPU instance pricing is climbing too — AWS raised EC2 Capacity Block prices for ML instances by around 15% in January 2026.
The shortage hits both CapEx and OpEx. There’s no cost-free path right now. And it hits smaller organisations harder — the hyperscalers lock in pricing through Long-Term Supply Agreements with multi-year, high-volume commitments. If you’re buying on shorter contracts or spot pricing, you’re absorbing the worst of the volatility.
Gartner projects a 130% year-on-year rise in DRAM prices in 2026, with prices not expected to normalise through the end of 2027. Intel CEO Lip-Bu Tan put it bluntly: “There’s no relief until 2028.”
There are actually two distinct shortages happening at once. HBM production eats roughly three times the wafer capacity of standard DRAM per gigabyte, and SK Hynix (~50% market share), Samsung (~40%), and Micron (~10%) have it sold out through at least 2027. Then the DDR5/NAND shortage affects everything else as manufacturers shift capacity toward HBM. For the price data to use in budget modelling and for why AI applications need so much memory, those pieces cover the foundations.
The recommendation: Budget for sustained elevated pricing through 2027. Model your infrastructure spend across cloud GPU instances, on-prem server DRAM, and enterprise endpoint devices — understand your exposure in each.
Neither is free. Cloud providers are passing HBM cost increases through. On-prem server DRAM has doubled. The right answer depends on five things: your existing infrastructure investment, your team’s operational capability, the type of workload (training vs inference), how much contract flexibility you need, and whether you prefer CapEx or OpEx.
Cloud has some real advantages during the shortage: no upfront capital for hardware that may depreciate faster than expected, the ability to blend spot and reserved instances, and access to alternative silicon like AWS Trainium for compatible workloads. On-prem has its own: predictable costs once you’ve bought the hardware, better economics at sustained high utilisation, and no exposure to future cloud price increases.
Be honest about the TCO comparison. A proper apples-to-apples model needs a 3-year horizon that includes hardware amortisation, staffing costs to manage the infrastructure, power, networking, and opportunity cost. Cloud looks expensive on a per-hour basis but it includes all of that. On-prem looks cheap on paper until you add the hidden costs — data egress fees, idle resource costs, and the people to keep it running. And don’t assume cloud availability is a given — as hyperscalers prioritise scarce GPU capacity, other workloads can experience throttling or degraded performance.
But utilisation makes or breaks on-prem. Nearly half of enterprises waste millions on underutilised GPU capacity. Industry data shows inference GPU utilisation as low as 15–30% without active management. If your GPUs are sitting idle, cloud is cheaper. It really is that simple.
Here are the rough breakpoints: for 1–8 GPUs, cloud is usually the better deal; at 8–64 GPUs with sustained utilisation above 70%, on-prem starts to make sense over a 3-year TCO horizon.
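A minimal sketch of that 3-year TCO comparison, in Python. Every price and rate below is a placeholder assumption; substitute your own vendor quotes and measured utilisation before drawing conclusions:

```python
# Minimal 3-year TCO sketch: cloud vs. on-prem GPU serving.
# All prices and rates are placeholder assumptions.

HOURS_3Y = 3 * 365 * 24  # 26,280 hours

def cloud_tco(gpus: int, hourly_rate: float, utilisation: float) -> float:
    # Cloud: you pay only for the hours you actually run.
    return gpus * hourly_rate * HOURS_3Y * utilisation

def onprem_tco(gpus: int, capex_per_gpu: float, annual_opex_per_gpu: float) -> float:
    # On-prem: capex plus power, staffing, and networking,
    # paid whether the GPUs run or sit idle.
    return gpus * (capex_per_gpu + 3 * annual_opex_per_gpu)

for util in (0.15, 0.40, 0.70):
    cloud = cloud_tco(gpus=16, hourly_rate=4.0, utilisation=util)
    onprem = onprem_tco(gpus=16, capex_per_gpu=35_000, annual_opex_per_gpu=8_000)
    print(f"utilisation {util:.0%}: cloud ${cloud:,.0f} vs on-prem ${onprem:,.0f}")
```

At 15% utilisation the cloud bill is a fraction of the on-prem commitment; at 70% the ranking flips, which is exactly the utilisation dynamic described above.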
Spot instances cut your cloud GPU costs by 60–70% below on-demand pricing, but they can be interrupted. For training: implement checkpoint/resume patterns (saving model state to S3 at regular intervals) and interruptions become manageable. For production inference: use reserved capacity for baseline demand, spot for overflow. Running a mixed fleet across availability zones improves spot availability. Orchestration frameworks like Kubeflow or Ray automate instance selection and failover.
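As a rough illustration of the checkpoint/resume pattern, here is a framework-agnostic Python sketch. The file name, interval, and state format are placeholders; in practice you would serialise with your training framework (for example torch.save) and sync checkpoints to durable storage such as S3:

```python
# Sketch of a checkpoint/resume loop that survives spot interruptions.
import os
import pickle

CKPT = "checkpoint.pkl"   # placeholder; sync to durable storage in production
CHECKPOINT_EVERY = 500    # steps between saves; tune to your interruption risk

def train(total_steps: int) -> None:
    state = {"step": 0, "weights": None}
    if os.path.exists(CKPT):              # resume after an interruption
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    for step in range(state["step"], total_steps):
        state["weights"] = f"params-after-step-{step}"  # stand-in for a real update
        if (step + 1) % CHECKPOINT_EVERY == 0:
            state["step"] = step + 1
            with open(CKPT, "wb") as f:   # persist progress before any interruption
                pickle.dump(state, f)
```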
The recommendation: Model the decision using your actual utilisation data. If you don’t have utilisation data, start on cloud until you do.
That covers the hardware-side decisions. But there’s a parallel track that’s faster to implement and doesn’t require any new procurement at all — the software side.
Software-side mitigation requires no new procurement. You can start today.
Start with quantisation. 4-bit quantisation cuts memory footprint by approximately 75% versus 16-bit models; 8-bit cuts it by approximately 50%. Weight-only quantisation (A16W4) works best for memory-bound workloads with small batch sizes — that’s the typical inference pattern for organisations running their own models. Activation quantisation (A8W8) suits high-throughput serving with large batch sizes.
Dropbox Dash is a good reference point. Their engineering team deployed FP8, INT8, and INT4 quantisation along with KV cache quantisation across their AI product, and they documented concrete cost and performance results. Worth looking at if you want to see what this looks like in production rather than just benchmarks.
For models under 13B parameters on workstation GPUs: start with INT4 weight-only. Larger models on data centre GPUs (A100, H100): FP8 preserves more accuracy while cutting memory roughly in half. INT8 is a solid general-purpose choice. Newer formats like MXFP and NVFP4 are worth watching as hardware support broadens. The accuracy tradeoff depends on the task — summarisation tolerates quantisation well, precise numerical reasoning less so. Test on your actual workload.
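The arithmetic behind those percentages is easy to sanity-check. A quick sketch, counting weight memory only (it ignores KV cache, activations, and runtime overhead):

```python
# Back-of-envelope weight memory at different precisions, weights only.

def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9  # params -> bytes -> GB

for bits in (16, 8, 4):
    print(f"7B model @ {bits}-bit: {weight_gb(7, bits):.1f} GB")
# 16-bit: 14.0 GB; 8-bit: 7.0 GB (a 50% cut); 4-bit: 3.5 GB (a 75% cut)
```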
The KV cache is the model’s short-term memory — it stores intermediate attention results during inference so the model doesn’t have to recompute them. Every new token needs to “look back” to every previous token. Longer context windows mean proportionally larger KV caches. If you’re running a 128K context window when 16K would do, you’re burning memory for nothing.
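For a sense of scale, here is the standard KV cache sizing formula applied to a 7B-class configuration. The dimensions (32 layers, 32 KV heads, head size 128, FP16 entries) are assumptions; substitute your model’s actual architecture:

```python
# Per-sequence KV cache size: 2 (K and V) x layers x kv_heads x head_dim
# x context_length x bytes_per_element.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

for ctx in (16_000, 128_000):
    print(f"context {ctx:,}: {kv_cache_gb(32, 32, 128, ctx):.1f} GB per sequence")
# roughly 8.4 GB at 16K vs 67 GB at 128K; the cache grows linearly with context
```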
PagedAttention, implemented in vLLM, treats GPU memory like virtual memory — it eliminates fragmentation and enables prefix sharing across requests. If you’re running LLM inference in production without vLLM or something similar, that’s your next move.
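A minimal serving sketch, assuming vLLM’s offline Python API (verify the signatures against your installed version). The model name and limits here are illustrative, and PagedAttention needs no separate setup because it is how vLLM manages memory:

```python
# Minimal vLLM inference sketch. PagedAttention is vLLM's built-in memory
# manager, so serving through vLLM is what enables it.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # illustrative; any supported model
    max_model_len=16_384,              # cap context to what workloads actually need
    gpu_memory_utilization=0.90,       # fraction of GPU memory vLLM may manage
)
params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarise this incident report:"], params)
print(outputs[0].outputs[0].text)
```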
The concept tying all of this together is the memory wall. As Sha Rabii, co-founder of Majestic Labs, puts it: “Your performance is limited by the amount of memory and the speed of the memory that you have, and if you keep adding more GPUs, it’s not a win.” Majestic Labs is building a 128TB inference system that deliberately avoids HBM — showing that memory-efficient architecture is a legitimate design direction, not just a workaround. For the technical foundation on the memory wall and the KV cache, that companion piece covers why AI applications need so much memory and why adding more GPUs alone doesn’t solve the problem.
The memory wall is also why the remaining techniques matter. Speculative decoding uses a smaller draft model to generate candidate tokens that the main model verifies in parallel — you get better throughput without a proportional memory increase. Disaggregated serving separates the prefill phase from the decode phase, running each on different hardware tiers. Red Hat’s llm-d project implements this and reports 30–50% cost reductions. Knowledge distillation trains permanently smaller models from larger ones — it takes weeks to months, but the results are durable.
Rank by implementation effort: quantisation (days), vLLM/PagedAttention (framework selection), speculative decoding (configuration tuning), disaggregated serving (architecture change), distillation (medium-term training investment). Start at the top.
The recommendation: Implement quantisation this week. Evaluate vLLM this month. Queue the rest after you’ve captured the quick wins.
You can’t get the Long-Term Supply Agreements the hyperscalers use. That’s a market structure issue, not a negotiation skill gap. Samsung recently signed a “non-cancellable, non-returnable” contract with a key server customer — supply earmarked for that customer provides zero relief for the rest of the market. These agreements require volume commitments and multi-year horizons that simply aren’t available at smaller scales.
Here’s the tactical checklist.
Multi-vendor sourcing. Engage SK Hynix, Samsung, and Micron (or their distributors) at the same time. Competitive tension between suppliers improves pricing even at modest volumes. You won’t get hyperscaler terms, but you can avoid being captive to a single vendor’s allocation decisions.
Pre-load inventory. If you know you need hardware in the next 12 months, buying now locks in current pricing. Deloitte projects further DRAM price increases of ~50% are possible. Gartner’s Ranjit Atwal puts it directly: “Buy now, or wait, because whatever you’re getting at the moment is going to be the best price.”
Work with procurement aggregators who combine demand from multiple organisations to negotiate enterprise-tier pricing. This is the closest thing you’ll get to hyperscaler leverage.
Lock extended quote validity. Vendors are guaranteeing prices for only two or three weeks. Push for 60–90 day windows.
Choose contract duration carefully. Six-month contracts preserve flexibility but offer no price protection. Annual contracts provide modest discounts but risk overpayment if prices stabilise faster than expected. Match your commitment length to your business certainty, not to the vendor’s sales cycle. Cloud GPU spot instances also work as a hedge — if on-prem hardware becomes uneconomical, spot capacity gives you a fallback without long-term commitment.
The recommendation: Source from multiple vendors, pre-load known-needed hardware, engage a procurement aggregator. For the price data to use in budget modelling, that piece has the specific numbers.
Enterprises are extending PC refresh timelines by roughly 15%. But deferral doesn’t save money if prices keep rising. And OEMs are reducing default RAM and SSD configurations to manage headline pricing — so you might end up paying the same price for a lower-spec device later.
Windows 10 End-of-Life (October 2025) adds a forcing function for compliance-related endpoints. Windows 11 has higher RAM and SSD requirements, and that’s hitting at exactly the wrong time. If you’re thinking about AI PCs, those need a minimum of 16GB RAM (the Microsoft Copilot+ PC spec), making memory even more of a budget factor.
The right approach is category-by-category, not all-or-nothing.
Accelerate refresh for compliance-related endpoints — anything still on Windows 10 that handles sensitive data or faces regulatory requirements. Lock pricing on approved quotes immediately. Accelerate developer workstations too — these are productivity multipliers, and skimping on developer hardware costs you more in lost output than you save on the purchase price.
Extend general office hardware 6–12 months. Standard office endpoints running Windows 11 can wait. There’s no functional reason to replace them at inflated prices if they’re doing the job. Deprioritise meeting room hardware and shared devices entirely.
For AI infrastructure specifically, fab expansion timelines suggest meaningful relief is unlikely before Q4 2027. If the workload can stay on cloud in the interim, that’s the safer path.
The recommendation: Prioritise by category. Lock pricing on compliance and developer hardware. Extend everything else. Budget for 15–20% cost inflation on all endpoint hardware through 2027.
Point estimates won’t hold. IDC’s December 2025 “most pessimistic scenario” was already exceeded by February 2026 — the speed at which pricing has moved has shocked everybody, including the analysts. Model three scenarios instead.
Base case: prices plateau at current levels. DRAM stays at roughly current pricing. Cloud GPU instances hold at current rates. This is your planning floor.
Downside case: further increases through 2027. Supply continues tightening and prices climb another 30–50%. DRAM capex from memory makers is expected to rise only 14% in 2026, and NAND capex only 5% — the manufacturers aren’t rushing to fix the shortage. Stress-test against it.
Upside case: relief begins Q4 2027. New fab capacity from SK Hynix (Cheongju, 2027), Micron (Singapore, 2027; Idaho, 2027–2028), and Samsung (Pyeongtaek, 2028) begins delivering. Prices start declining in early 2028. This is your planning horizon for deferred purchases.
Project costs across all three budget categories for each scenario: cloud AI compute, on-prem server hardware, and enterprise endpoint refresh. The practical allocation: commit 70% of budget against the base case, reserve 20% as downside contingency, and identify which purchases can be deferred to capture the upside.
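A small sketch of that scenario exercise; the dollar figures and multipliers are placeholders standing in for your own baseline and analyst-informed assumptions:

```python
# Three-scenario budget model across the three cost categories.
# All figures are placeholder assumptions.

baseline = {
    "cloud_ai_compute": 1_200_000,
    "onprem_servers":     600_000,
    "endpoint_refresh":   400_000,
}

scenarios = {
    "base":     1.00,  # prices plateau at current levels
    "downside": 1.40,  # a further 30-50% climb through 2027
    "upside":   0.85,  # relief begins Q4 2027
}

for name, mult in scenarios.items():
    total = sum(cost * mult for cost in baseline.values())
    print(f"{name:>8}: ${total:,.0f}")

base_total = sum(baseline.values())
print(f"commit 70%: ${0.70 * base_total:,.0f}; reserve 20%: ${0.20 * base_total:,.0f}")
```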
Focus your optimisation effort where it counts. Inference accounts for 80–90% of total AI spend. Every dollar saved through quantisation or KV cache optimisation reduces your exposure across all three scenarios. Build ongoing visibility through FinOps practices — track cost-per-inference as your primary unit metric. Tools like OpenCost give you real-time cost data across namespaces, pods, and nodes, so you can measure the actual impact of each optimisation rather than guessing.
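Cost-per-inference is simple to compute once billing exports and request counts are flowing. A minimal sketch with made-up monthly numbers:

```python
# Cost-per-inference as the primary unit metric.
# Spend comes from your billing export or OpenCost; requests from serving logs.

def cost_per_inference(total_spend: float, request_count: int) -> float:
    return total_spend / max(request_count, 1)

before = cost_per_inference(48_000, 12_000_000)  # month before optimisation
after = cost_per_inference(41_000, 13_500_000)   # month after quantisation
print(f"${before:.4f} -> ${after:.4f} per request "
      f"({1 - after / before:.0%} improvement)")
```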
The recommendation: Build all three scenarios into your infrastructure budget. Allocate against the base, reserve for downside, identify deferral candidates for upside. Track cost-per-inference quarterly. Revisit every quarter — conditions are moving faster than the forecasts.
For a complete overview of the memory supply crisis driving these costs — including the root causes, market dynamics, and what comes next — see the full context of the AI memory shortage.
4-bit quantisation cuts your memory footprint by roughly 75% compared to 16-bit (FP16/BF16) models. 8-bit brings it down by about 50%. These ratios hold across most transformer-based LLMs. If you’re running typical inference workloads with small batch sizes, weight-only quantisation (A16W4) is the place to start.
Cloud versus on-prem depends on scale and utilisation. For 1–8 GPUs, cloud is usually the better deal. At 8–64 GPUs with sustained utilisation above 70%, on-prem starts to make sense over a 3-year TCO horizon. Neither option escapes the memory cost inflation.
The KV cache stores intermediate attention computation results during LLM inference so the model doesn’t have to re-compute context for every generated token. Longer context windows need proportionally larger KV caches, which directly increases your memory demand and cost.
New fab capacity from SK Hynix, Samsung, and Micron won’t deliver meaningful supply relief before Q4 2027 at the earliest. Budget for sustained elevated pricing through at least 2027 — meaningful price declines aren’t likely before early 2028.
vLLM is an open-source LLM inference framework that implements PagedAttention, continuous batching, speculative decoding, and quantisation support. It’s become the de facto standard for production LLM serving. If you’re running LLM inference in production, vLLM should be your first evaluation point.
Spot instances cut cloud GPU costs by 60–70% but they can be interrupted. They work well for training with checkpoint/resume patterns. For production inference, use reserved capacity for your baseline demand and spot for overflow. A mixed fleet across availability zones improves spot availability.
AWS Trainium (Trn1/Trn2 instances) offers meaningful cost reductions compared to P5 (H100) instances for compatible workloads. It requires the AWS Neuron SDK and isn’t universally compatible. Worth evaluating if your workloads run on AWS and you can tolerate some framework lock-in.
Accelerate for compliance-related endpoints and developer workstations. Extend general office hardware by 6–12 months. OEMs are reducing default configurations to manage pricing, so deferred purchases may get you lower-spec devices at similar prices.
Disaggregated serving separates the prefill phase (processing input) from the decode phase (generating output). Prefill runs on high-performance GPUs while decode runs on cheaper hardware. The result is 30–50% infrastructure cost reductions by matching hardware to what each phase actually needs.
Start with cost-per-inference as your primary metric. Layer in GPU utilisation tracking and cloud spend breakdowns by workload type. Measure each optimisation individually so you know what’s actually moving the needle. OpenCost and cloud provider tools handle the instrumentation.
The memory wall is the point where adding more GPUs without proportional memory bandwidth doesn’t actually improve performance. That’s why software-side optimisations like quantisation and KV cache management are structurally valuable — they reduce memory demand rather than throwing more hardware at the problem.
Samsung, SK Hynix, Micron, and the Hyperscalers Who Locked Up All the Memory
Three companies make all the world’s high-bandwidth memory. Samsung, SK Hynix, and Micron. That’s it. HBM is the specialised DRAM that every AI accelerator needs, and the big cloud providers — AWS, Google, Microsoft, Meta — have used long-term supply agreements to lock in capacity well before production runs even start. If you’re not one of them, you get the spot market. Volatile pricing, no guarantees.
The supply was spoken for before most buyers even knew there was a shortage.
This article maps out where the three memory makers stand, explains how those long-term agreements structurally favour hyperscalers, and shows why AI inference demand keeps the pressure building. For the full picture of the AI memory shortage, start with our pillar overview.
There are only three companies producing HBM at scale. As of Q2 2025, SK Hynix held 62% of the HBM market, Micron held 21%, and Samsung held 17%. There is no fourth player waiting in the wings.
So what actually is HBM? It’s a stacked DRAM architecture — up to 12 thinned-down DRAM dies piled on top of each other and connected by through-silicon vias (TSVs), sitting within a millimetre of a GPU. It costs roughly three times as much as conventional DRAM per bit, and SemiAnalysis estimates it accounts for 50 percent or more of the cost of a packaged GPU. That makes HBM the single biggest cost item in AI accelerator hardware.
Every major AI GPU needs it. Nvidia’s Blackwell B300 uses eight 12-die HBM3E stacks. AMD’s MI350 uses the same setup. Nvidia’s next-generation Rubin GPU comes with up to 288 gigabytes of HBM4. HBM isn’t optional — it’s the bottleneck component.
And the oligopoly isn’t going anywhere. New fabs cost $15 billion or more, take years to build, and require decades of process expertise. Nobody new is walking into this market to put downward pressure on prices. For the technical case for HBM production economics, see our companion article on the supply chain.
Samsung was the world’s largest memory maker by volume. Then SK Hynix beat it in annual operating profit for the first time in 2025 — 47.2 trillion won versus Samsung’s 43.6 trillion won. Now, that comparison needs some context. Samsung is a diversified conglomerate with fingers in consumer electronics, foundry services, and displays. But SK Hynix, focused purely on memory, outearned Samsung’s entire company.
The technical reason boils down to packaging. SK Hynix uses MR-MUF (mass reflow molded underfill), which handles heat better when you’re dealing with that layer cake of stacked silicon. Samsung’s NCF (Non-Conductive Film) approach ran into quality and yield problems with HBM3E. When you’re stacking 12 dies on top of each other, thermal management and yield rates are what make or break commercial viability.
SK Hynix turned that process advantage into a deep relationship with Nvidia, securing over two-thirds of HBM supply orders for Nvidia’s next-generation Vera Rubin products. Samsung still holds around 60 percent of the HBM on Google’s TPUs, but that’s a narrower position to be in.
Samsung is fighting back with its Pyeongtaek P5 fab, but mass production will not begin until 2028. As Counterpoint Research director MS Hwang put it: “We expect Samsung to show a significant turnaround with HBM4 for Nvidia’s new products, moving past last year’s quality issues.”
While SK Hynix and Samsung fight it out for the top spot, Micron made a completely different kind of bet. It restructured its entire business around AI.
Micron’s HBM and cloud memory revenue grew from 17 percent of its DRAM revenue in 2023 to nearly 50 percent in 2025. Then in December 2025, the company discontinued its consumer-facing Crucial brand and walked away from consumer DRAM entirely. All wafer capacity redirected to AI and enterprise memory.
Think about what that signals. A company making good money selling memory to PC builders decided the AI opportunity was so big and so permanent that consumer memory wasn’t worth the wafer space anymore. CEO Sanjay Mehrotra projects the HBM market growing from $35 billion in 2025 to $100 billion by 2028 — hitting that number two years earlier than Micron had previously expected. For how to manage infrastructure costs when supply is this constrained, see our actionable guide for SMB CTOs.
And it’s still not enough. As Micron business chief Sumit Sadana put it: “We’re sold out for 2026.” Even with all that redirected capacity, the company can only meet two-thirds of medium-term requirements for some customers.
Long-term supply agreements (LTAs) are multi-year capacity reservation contracts. A hyperscaler commits capital 12 to 24 months in advance to lock in guaranteed memory allocation from a manufacturer. The memory is spoken for before it’s even produced.
How seriously do they take this? Google reportedly fired a procurement executive who failed to sign LTAs ahead of the shortage. When Microsoft executives visited SK Hynix headquarters and were told their conditions would be “difficult” to meet, one of them stormed out of the meeting.
Hyperscalers now station dedicated procurement executives in South Korea. Google posted a job for a “Global Memory Commodity Manager.” Meta is hiring “Memory Silicon Global Sourcing Managers.” This isn’t routine purchasing — it’s a strategic function.
The mechanism is straightforward: hyperscale cloud providers secure supply through long-term commitments, capacity reservations, and direct fab investments, getting lower costs and guaranteed availability. Everyone else fights over what’s left.
You can’t replicate that level of capital pre-commitment, volume guarantee, or dedicated procurement infrastructure. As semiconductor analyst Manish Rawat at TechInsights noted: “This imbalance creates a dual constraint for the mid-market: higher input costs and longer delivery timelines.” For what this means for enterprises who cannot match hyperscaler LTAs, see our article on managing infrastructure costs.
AI training requires a temporary burst of compute and memory. AI inference — serving users in real time — requires continuous memory allocation that grows with every new user and every new deployment.
During inference, a large language model stores its state in the key-value cache (KV cache). Think of it as the model’s working memory. It sits in HBM during active sessions and gets pushed to slower system memory when idle. Storing these precomputed caches reduces the compute required for extended sessions, but it chews through memory.
Demand compounds through three factors: concurrent users, context window length, and number of deployed models. As hyperscalers roll out more AI models to more users with longer context windows, their HBM appetite grows continuously. And as AI moves towards agent-based systems, those agents need frequent access to extensive vector databases for retrieval-augmented generation (RAG), adding yet another persistent demand driver.
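A toy model of that compounding, in Python; every input is an illustrative assumption, with per-token KV cost roughly in line with a 7B-class model:

```python
# Fleet-level KV memory compounds multiplicatively across three factors:
# deployed models x concurrent users x context length. Figures are illustrative.

def fleet_kv_gb(models: int, concurrent_users: int,
                context_tokens: int, gb_per_1k_tokens: float = 0.5) -> float:
    return models * concurrent_users * (context_tokens / 1_000) * gb_per_1k_tokens

year1 = fleet_kv_gb(models=5,  concurrent_users=2_000, context_tokens=8_000)
year2 = fleet_kv_gb(models=12, concurrent_users=6_000, context_tokens=32_000)
print(f"year 1: {year1:,.0f} GB -> year 2: {year2:,.0f} GB ({year2 / year1:.0f}x)")
```

Modest growth in each factor multiplies out to an order-of-magnitude jump in total demand, which is why LTA volumes keep ratcheting up.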
Hyperscaler LTA volumes renew and grow with each deployment cycle. Even if fab capacity increases, demand may keep outpacing supply for the foreseeable planning horizon.
All three memory makers are building new capacity and spreading it across multiple countries — geographic diversification that doubles as a geopolitical hedge.
SK Hynix has an HBM fab in Cheongju, South Korea, targeted for 2027, and a facility in West Lafayette, Indiana, for 2028. Samsung’s Pyeongtaek P5 targets 2028. Micron is building an HBM fab in Singapore for 2027, retooling a fab in Taiwan for second-half 2027, and broke ground on a DRAM fab complex in New York that won’t hit full production until 2030.
Multiple industry sources point to 2028 as the earliest date for meaningful supply relief. As Intel CEO Lip-Bu Tan put it: “There’s no relief until 2028.” And even then, hyperscalers with existing LTAs will have priority access to new capacity before it reaches the open market.
With LTA-secured supply going to hyperscalers, everyone else is on the spot market. And the prices tell the story. Samsung raised prices for 32GB DDR5 modules to $239 from $149 — a 60% increase. DDR5 contract pricing has surged past $19.50 per unit, up from around $7 earlier in 2025.
Gartner forecasts a 47% DRAM price increase in 2026. IDC expects only 16% year-on-year DRAM supply growth — well below demand. IDC puts it plainly: this is “a potentially permanent, strategic reallocation of the world’s silicon wafer capacity” — not a temporary pricing spike.
The question for your organisation is how to plan procurement, budgets, and infrastructure strategy around a supply chain where hyperscalers hold structural priority. For strategies for managing infrastructure costs during the shortage, see our actionable guide. For whether Chinese domestic DRAM changes the competitive picture, we cover the geopolitical factors separately. And for the complete view, read our deep-dive into the memory crisis.
HBM uses 3D die stacking — up to 12 thinned DRAM dies connected by through-silicon vias — to deliver far higher bandwidth than DDR5. It costs about 3x more per bit to produce and accounts for over half the cost of a packaged AI GPU.
Building a memory fab costs upwards of $15 billion, takes years, and demands decades of process learning. Samsung, SK Hynix, and Micron collectively hold 100% of the HBM market. Chinese manufacturers like CXMT are expanding but still face large technology gaps and export control restrictions.
An LTA is a capacity reservation contract where a buyer pre-commits capital — typically 12 to 24 months ahead — to guarantee memory allocation directly from a manufacturer. Hyperscalers use LTAs to secure supply before it reaches the open market.
DDR5 contract pricing went from about $7 to $19.50 per unit. Samsung raised 32GB DDR5 module prices from $149 to $239. TrendForce estimates Q1 2026 DRAM contracts surged 90-95% quarter-on-quarter, and Gartner forecasts a 47% overall DRAM price increase for the year.
SK Hynix’s MR-MUF packaging technology delivers better thermal performance and higher yields than Samsung’s NCF approach. That process advantage, combined with early and deep supply contracts with Nvidia, gave SK Hynix roughly 62% of HBM market share versus Samsung’s 17%.
Micron shut down its Crucial consumer brand in December 2025 and redirected all wafer capacity to AI and enterprise memory. AI-related DRAM revenue went from 17% to nearly 50% of total DRAM revenue between 2023 and 2025, and the company is fully committed for 2026.
Industry consensus points to 2028 at the earliest. New fabs from all three manufacturers — SK Hynix in Cheongju (2027) and West Lafayette (2028), Samsung in Pyeongtaek (2028), Micron in Singapore and Taiwan (2027) — will take years to reach full output. Intel CEO Lip-Bu Tan summed it up: “There’s no relief until 2028.”
The KV cache is an AI model’s working memory during inference. Precomputed attention states are held in HBM while a session is active. Memory requirements scale with concurrent users, context window length, and the number of deployed models — so demand compounds rather than levelling off.
TrendForce projects combined CapEx of the top eight cloud service providers at over $710 billion in 2026. A large share of that flows directly into memory procurement through LTAs with Samsung, SK Hynix, and Micron.
No, smaller companies cannot realistically sign LTAs. They require the kind of capital pre-commitment, volume, and dedicated procurement staff that mid-market and SMB companies simply don’t have. Hyperscalers station procurement executives at memory fabs and have created roles like “Memory Silicon Global Sourcing Manager” — that level of investment is out of reach for smaller organisations.
Chinese production — notably CXMT — is growing, but it faces significant HBM technology gaps and export control constraints. Whether Chinese manufacturers can meaningfully change the three-player oligopoly is covered in our companion article on geopolitical factors shaping memory supply.
Training uses a temporary burst of memory and compute. Inference runs continuously, serving concurrent users with expanding context windows, which creates compounding memory demand that grows with each new deployment rather than reaching a fixed ceiling.
DRAM Prices in 2026 Have Doubled and the Numbers Are Getting Worse
Six months ago, DRAM cost $0.43 per gigabit. Dell’s COO says it recently hit $2.39. That’s a 5.5x increase in half a year, and it shows no sign of letting up.
This isn’t a single-product blip. DRAM, LPDDR5x, and NAND flash are all getting hammered with 90–95% quarter-on-quarter price surges in Q1 2026. The structural driver is AI infrastructure — and that’s what makes this fundamentally different from the 2021 COVID-era shortage and much harder to plan around.
We’ve pulled together every major analyst forecast on memory pricing in one place. Bookmark it, send it to your stakeholders. For the broader memory shortage story, we’ve put together a full overview of how AI’s appetite for memory is reshaping the tech industry.
The headline number: TrendForce’s February 2026 survey revised its Q1 2026 forecast for conventional DRAM contract prices to 90–95% quarter-on-quarter. That’s the largest quarterly increase on record. Their earlier January estimate was 55–60% QoQ — the forecast nearly doubled within weeks.
PC DRAM specifically is tracking over 100% QoQ increases in some contract categories.
And it’s not just conventional DRAM. LPDDR5x contract prices surged roughly 90% QoQ — what TrendForce calls “the steepest increases in their history.” NAND flash prices rose 55–60% QoQ, with enterprise SSD pricing up 53–58% QoQ. That’s three simultaneous price shocks across different memory types, all landing at once.
Gartner’s research director Ranjit Atwal put it plainly: “The speed at which the memory pricing has increased has shocked everybody.” Gartner projects a 130% year-on-year DRAM price rise for 2026.
Here’s where things get genuinely strange. Counterpoint Research reports that DDR4 spot prices have hit $2.10 per gigabit — which now exceeds advanced HBM3e at $1.70 per gigabit. Old memory costs more than the cutting-edge stuff. That’s a legacy price inversion, and it tells you just how distorted this market has become.
The gap between contract and spot pricing matters here too. Enterprise buyers locked into quarterly contracts have some insulation. Those buying on the spot market are wearing the full brunt. If your procurement is on spot, you’re paying more for DDR4 than hyperscalers pay for HBM3e.
For why prices rose so sharply — the production mechanics — we’ve covered that in detail separately.
TrendForce projects the global memory market at $551.6 billion in 2026, rising to $842.7 billion in 2027. DRAM revenue alone is up 144% year-on-year, hitting $404.3 billion. NAND flash revenue is up 112% YoY at $147.3 billion.
Those numbers tell a specific story — revenue is surging because of price, not because more memory is shipping to consumer markets. The volume going to PCs and smartphones is actually shrinking. It’s the price per unit that’s driving everything.
Samsung regained the number one DRAM revenue position in Q4 2025. SK Hynix leads in HBM production. Micron’s HBM and cloud-related memory revenue grew from 17% of its DRAM revenue in 2023 to nearly 50% in 2025. Micron CEO Sanjay Mehrotra projects the HBM market will grow from $35 billion in 2025 to $100 billion by 2028.
Here’s the thing to keep in mind: what’s profit for Samsung, SK Hynix, and Micron is cost for everyone else. Every record revenue quarter for memory makers is a record bill-of-materials quarter for OEMs.
When IDC, Gartner, and Counterpoint Research all converge on the same direction, the signal is pretty clear: the PC and smartphone markets are contracting.
IDC projects the PC market will shrink 11.3% in 2026. Gartner forecasts 10.4%. For smartphones, Counterpoint Research projects shipments down 12% — “the sharpest decline on record.” IDC has smartphones down 12.9%; Gartner at –8.4%.
HP reported in its Q1 earnings call that memory now accounts for 35% of the cost to build a PC, up from 15–18% the previous quarter. That makes memory the single largest cost driver in PC manufacturing. Let that sink in — one component, 35% of the build cost.
At the product level, Lenovo, Dell, HP, Acer, and ASUS have all flagged 15–20% price hikes and are shortening quote validity windows. Some OEMs are shipping lower-SSD configurations just to keep headline prices down — less storage for more money.
The timing couldn’t be worse for enterprise IT. Windows 10 end-of-life in October 2025 triggered refresh cycles that are now colliding head-on with the shortage. And AI PCs require a minimum 16GB RAM — that’s Microsoft’s Copilot+ standard — so devices need more memory at precisely the moment memory is most expensive.
Not everyone is equally exposed, though. Apple and Samsung have long-term supply agreements that secure memory 12–24 months ahead. Budget Android OEMs don’t have that luxury and have no choice but to pass costs straight to consumers.
For a deeper look at how these price increases are hitting the PC and smartphone markets, we’ve covered the device-level impact separately.
The 2021 shortage was a supply-side disruption. Factories closed, logistics chains broke, and pandemic demand for work-from-home gear spiked. Hyperscalers bought up huge memory inventories, prices surged, then supply normalised.
The 2026 shortage is demand-driven. AI infrastructure demand for HBM is consuming wafer capacity that would otherwise produce conventional DRAM and NAND. That distinction matters: supply disruptions resolve when logistics normalise. Demand-driven shortages persist as long as the demand source keeps growing — and AI demand isn’t slowing down.
The setup traces back to the 2022–2023 memory bust. Samsung cut production by roughly 50%. After that bust, all three major memory companies pulled back on investment. IEEE Spectrum’s Thomas Coughlin put it bluntly: “There was little or no investment in new production capacity in 2024 and through most of 2025.”
Then the AI demand wave hit, and there was simply no spare capacity to absorb it.
By the numbers, 2026 is worse. Q1 2026 is delivering 90–95% QoQ increases — there’s no comparable single-quarter spike in the historical record. IDC characterises the situation as “not just a cyclical shortage driven by a mismatch in supply and demand, but a potentially permanent, strategic reallocation of the world’s silicon wafer capacity.”
McKinsey projects $7 trillion in data centre spending through 2030, with $5.2 trillion AI-focused. That demand concentration is why AI is driving this price surge.
Here are the data points from each firm that we haven’t covered above.
IDC: DRAM supply growth in 2026 is projected at just 16% YoY and NAND at 17% YoY — well below historical norms. IDC uses the phrase “structural reset” to describe what’s happening. That’s not the kind of language you want to hear from analysts.
Gartner’s Atwal: “Price is not only increasing in the short term… it’s going to remain high almost through to the end of 2027.”
Counterpoint Research places the earliest inflection point at Q4 2027.
Intel CEO Lip-Bu Tan, speaking at the Cisco AI Summit, was blunter: “There’s no relief until 2028.”
The fab construction timelines explain why. New capacity is coming — Micron Singapore (2027), SK Hynix Cheongju (2027), Samsung Pyeongtaek (2028), SK Hynix West Lafayette (end of 2028), Micron’s Onondaga County megafab (full production 2030). But a new fab takes years to build, and ramping it to volume production takes 18 months or more on top of that. That structural lag is why new capacity can’t relieve the current shortage quickly.
For the full timeline analysis of when relief arrives, we cover the fab-by-fab breakdown in detail.
Will the shortage persist? Yes: for the next 18 months, the data points in one direction.
The only near-term supply lever is incremental process node upgrades within existing fabs. TrendForce notes that “gains in output can be achieved only through incremental process upgrades, making short-term capacity tightness difficult to resolve.” And it gets worse — NAND flash manufacturers are reallocating production lines to DRAM because it’s more profitable, which further squeezes NAND capacity. The two shortages are feeding each other.
Meanwhile, demand keeps climbing. NVIDIA’s B300 GPU uses eight HBM chips, each containing 12 DRAM dies. AMD’s MI350 follows the same architecture. Every new GPU generation ratchets up per-unit memory consumption.
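To make “ratchets up per-unit memory consumption” concrete, here is the die arithmetic implied by the figures above. The shipment volume is a hypothetical placeholder for scale, not a reported number:

```python
# Per-accelerator die count, from the figures cited above.
HBM_STACKS_PER_GPU = 8     # NVIDIA B300 / AMD MI350 class parts
DRAM_DIES_PER_STACK = 12   # 12-high HBM stacks

dies_per_gpu = HBM_STACKS_PER_GPU * DRAM_DIES_PER_STACK
print(f"DRAM dies per GPU: {dies_per_gpu}")  # 96

# Hypothetical shipment volume, purely to illustrate scale.
gpus = 1_000_000
print(f"DRAM dies for {gpus:,} GPUs: {dies_per_gpu * gpus:,}")  # 96,000,000
```

Ninety-six dies per accelerator, and every one of them comes out of the same wafer pool that would otherwise supply laptops and phones.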
There’s also a supply lock-up problem. North American hyperscalers — Meta, Google, Microsoft, Amazon — are negotiating long-term DRAM agreements with memory suppliers. Counterpoint Research’s Tarun Pathak put it directly: “A lot of these memory companies are asking smartphone vendors to stand in line behind the hyperscalers.”
And here’s the part that matters for your budget planning: memory prices historically fall more slowly than they rise. Economist Mina Kim of Mkecon Insights told IEEE Spectrum: “In general, economists find that prices come down much more slowly and reluctantly than they go up. DRAM today is unlikely to be an exception.”
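A minimal sketch of what that asymmetry implies. The 5% quarterly decline is an assumed figure for illustration, not a forecast: even if prices peaked tomorrow, a spike this size would take years to unwind at a gentle decay rate:

```python
import math

spike = 1.95              # one quarter's ~95% price rise
quarterly_decline = 0.05  # assumed post-peak decline per quarter (illustrative)

# Solve spike * (1 - d)^n = 1 for n, the number of quarters needed
# to return to the pre-spike price level.
quarters = math.log(spike) / -math.log(1 - quarterly_decline)
print(f"~{quarters:.0f} quarters (~{quarters / 4:.1f} years) to fully unwind")
# ~13 quarters, a bit over three years
```

Swap in your own decline assumption; the point survives any plausible value, because the rise happened in one quarter and the fall is spread over many.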
Given the outlook, you’re going to want to keep tabs on this. Here are the primary sources we’ve referenced throughout.
TrendForce (trendforce.com) publishes quarterly DRAM and NAND contract price forecasts through its DRAMeXchange platform. Their February 2, 2026 press release is the definitive Q1 2026 pricing source.
IDC (idc.com) publishes quarterly PC and smartphone market trackers. Their December 2025 “Global Memory Shortage Crisis” report is the key public reference document.
Counterpoint Research (counterpointresearch.com) covers smartphone and memory market analysis with public-facing research summaries.
Gartner (gartner.com) publishes enterprise IT spending and semiconductor forecasts. The February 2026 data on shipment declines was reported through Computerworld.
IEEE Spectrum (spectrum.ieee.org) provides technical analysis and expert commentary. Samuel K. Moore’s “How and When the Memory Chip Shortage Will End” is the most comprehensive single source for the COVID-era comparison and fab timeline data.
NPI Financial (npi.ai) offers enterprise IT procurement advisory with pricing guidance for hardware buyers navigating the shortage.
For a complete picture of what the AI memory shortage means for tech companies — the root causes, who is winning and losing, and what to do about it — see the broader memory shortage story. Below, quick answers to the questions we hear most often.
What is causing the 2026 memory shortage?
AI infrastructure demand for High-Bandwidth Memory (HBM) is eating up wafer capacity at Samsung, SK Hynix, and Micron — capacity that would otherwise produce the conventional DRAM that goes into PCs and smartphones. TrendForce reports DRAM prices surged 90–95% QoQ in Q1 2026 as a result.
Will the shortage last beyond 2026?
Yes. Counterpoint Research puts the earliest inflection point at Q4 2027. Intel CEO Lip-Bu Tan stated “there’s no relief until 2028.” New fab capacity from Micron and SK Hynix doesn’t reach volume production until 2027 at the earliest.
When will memory prices return to normal?
Analyst consensus points to gradual relief beginning in late 2027 as new fab capacity comes online. But memory prices historically fall more slowly than they rise, so normalisation will lag behind supply additions. Gartner expects prices to remain high through to the end of 2027.
How much are device prices rising?
Memory is now the single largest cost component in many devices. HP reports memory represents 35% of the PC bill of materials. With DRAM prices up 90–95% and NAND up 55–60%, OEMs including Lenovo, Acer, and ASUS have warned of 15–20% device price increases.
How is this different from the 2021 chip shortage?
The 2021 COVID-era shortage was supply-side — factory closures, logistics breakdowns, pandemic-driven demand. The 2026 shortage is demand-driven — AI infrastructure is consuming wafer capacity. Demand-side shortages are harder to resolve because the demand source keeps growing.
What is HBM, and why does it squeeze conventional memory?
HBM (High-Bandwidth Memory) stacks multiple DRAM dies vertically. Each NVIDIA B300 GPU requires eight HBM chips with 12 DRAM dies each. Manufacturing HBM consumes the same wafer capacity as conventional DRAM — every wafer allocated to HBM is a wafer that doesn’t go to a smartphone or laptop.
When does new manufacturing capacity arrive?
Based on fab construction timelines, meaningful new capacity arrives in 2027–2028 (Micron Singapore, SK Hynix Cheongju, Samsung Pyeongtaek). Full-scale production at major new fabs like Micron’s Onondaga County facility isn’t expected until 2030.
What is the difference between contract and spot prices?
Contract prices are negotiated quarterly between memory suppliers and large buyers through long-term agreements (LTAs). Spot prices reflect what’s available on the open market. During shortages, spot prices spike faster and higher. Counterpoint Research noted DDR4 spot prices ($2.10/Gbit) now exceed HBM3e contract prices ($1.70/Gbit).
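Those per-gigabit figures are easier to feel at per-gigabyte scale. A straight unit conversion of the numbers Counterpoint reported (8 gigabits to the gigabyte):

```python
DDR4_SPOT = 2.10       # $/Gbit, commodity DRAM on the open market
HBM3E_CONTRACT = 1.70  # $/Gbit, cutting-edge stacked DRAM under LTA
GBITS_PER_GB = 8

print(f"DDR4 spot:      ${DDR4_SPOT * GBITS_PER_GB:.2f}/GB")       # $16.80/GB
print(f"HBM3e contract: ${HBM3E_CONTRACT * GBITS_PER_GB:.2f}/GB")  # $13.60/GB
print(f"Commodity premium: {DDR4_SPOT / HBM3E_CONTRACT - 1:.0%}")  # ~24%
```

Legacy commodity memory trading at a roughly 24% premium to the most advanced product on the market is the shortage captured in a single number.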
How big is the memory market getting?
TrendForce projects the global memory market at $551.6 billion in 2026, with DRAM revenue up 144% year-on-year. The market is forecast to reach $842.7 billion in 2027, a further rise of roughly 53%.
Will cloud prices rise too?
Cloud service providers are themselves major memory buyers — North American hyperscalers are negotiating LTAs as of January 2026 — so memory cost inflation is baked into their infrastructure costs. Whether AWS, Azure, and GCP pass those costs through directly is an open question worth keeping an eye on.
What should enterprise buyers do right now?
NPI Financial advises locking in pricing early, as OEMs are shortening quote validity windows. Gartner’s Atwal recommends buying now: “Whatever you’re getting at the moment is going to be the best price.” No meaningful relief is expected before late 2027.
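One way to put numbers on that advice. The quarterly growth rate below is an assumption for illustration; substitute the increases quoted by your own OEM:

```python
def deferred_price(price_now: float, quarterly_growth: float, quarters: int) -> float:
    """Projected unit price after deferring a purchase, assuming a
    constant quarterly price increase."""
    return price_now * (1 + quarterly_growth) ** quarters

# Hypothetical: a $1,200 laptop quote and 8% quarterly memory-driven increases.
quote = 1_200.0
for q in (1, 2, 4):
    print(f"Defer {q} quarter(s): ${deferred_price(quote, 0.08, q):,.2f}")
# 1 -> $1,296.00   2 -> $1,399.68   4 -> $1,632.59
```

Under those assumptions, waiting a year on a fleet refresh adds roughly a third to the unit price, which is the quantitative case behind “whatever you’re getting at the moment is going to be the best price.”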
What about NAND and SSD prices?
NAND flash prices rose 55–60% QoQ in Q1 2026, with enterprise SSD pricing up 53–58% QoQ per TrendForce. NAND market revenue is up 112% YoY. And NAND manufacturers are reallocating production lines to DRAM because it’s more profitable, which further limits NAND capacity.