Three companies make all the world’s high-bandwidth memory. Samsung, SK Hynix, and Micron. That’s it. HBM is the specialised DRAM that every AI accelerator needs, and the big cloud providers — AWS, Google, Microsoft, Meta — have used long-term supply agreements to lock in capacity well before production runs even start. If you’re not one of them, you get the spot market. Volatile pricing, no guarantees.
The supply was spoken for before most buyers even knew there was a shortage.
This article maps out where the three memory makers stand, explains how those long-term agreements structurally favour hyperscalers, and shows why AI inference demand keeps the pressure building. For the full picture of the AI memory shortage, start with our pillar overview.
Who controls global HBM production and why does it matter?
There are only three companies producing HBM at scale. As of Q2 2025, SK Hynix held 62% of the HBM market, Micron held 21%, and Samsung held 17%. There is no fourth player waiting in the wings.
So what actually is HBM? It’s a stacked DRAM architecture: up to 12 thinned-down DRAM dies piled on top of each other, connected by through-silicon vias (TSVs), and sitting within a millimetre of a GPU. It costs roughly three times as much per bit as conventional DRAM, and SemiAnalysis estimates it accounts for 50 percent or more of the cost of a packaged GPU. That makes HBM the single biggest cost item in AI accelerator hardware.
Every major AI GPU needs it. Nvidia’s Blackwell B300 uses eight 12-die HBM3E stacks. AMD’s MI350 uses the same setup. Nvidia’s next-generation Rubin GPU comes with up to 288 gigabytes of HBM4. HBM isn’t optional — it’s the bottleneck component.
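The capacity arithmetic behind those configurations is easy to check. Here’s a quick sketch in Python; the 3 GB-per-die figure is an assumption based on current 24 Gb HBM3E dies, not a vendor spec:

```python
# Back-of-the-envelope HBM capacity maths for a Blackwell-class GPU.
# Die density is an assumption: we take a 12-high HBM3E stack to be
# built from 24 Gb (3 GB) DRAM dies.

DIES_PER_STACK = 12      # 12-high HBM3E stack
GB_PER_DIE = 3           # assumed 24 Gb die
STACKS_PER_GPU = 8       # e.g. Nvidia B300, AMD MI350

gb_per_stack = DIES_PER_STACK * GB_PER_DIE    # 36 GB per stack
gb_per_gpu = gb_per_stack * STACKS_PER_GPU    # 288 GB per GPU

print(f"{gb_per_stack} GB per stack, {gb_per_gpu} GB per GPU")
```

Under those assumptions, eight 12-high stacks come out at 288 GB, which lines up with the Rubin figure quoted above.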
And the oligopoly isn’t going anywhere. New fabs cost $15 billion or more, take years to build, and require decades of process expertise. Nobody new is walking into this market to put downward pressure on prices. For the economics behind HBM production, see our companion article on the supply chain.
How did SK Hynix overtake Samsung at the top of the HBM market?
Samsung was the world’s largest memory maker by volume. Then SK Hynix beat it in annual operating profit for the first time in 2025 — 47.2 trillion won versus Samsung’s 43.6 trillion won. Now, that comparison needs some context. Samsung is a diversified conglomerate with fingers in consumer electronics, foundry services, and displays. But SK Hynix, focused purely on memory, outearned Samsung’s entire company.
The technical reason boils down to packaging. SK Hynix uses MR-MUF (mass reflow molded underfill), which handles heat better when you’re dealing with that layer cake of stacked silicon. Samsung’s NCF (Non-Conductive Film) approach ran into quality and yield problems with HBM3E. When you’re stacking 12 dies on top of each other, thermal management and yield rates are what make or break commercial viability.
SK Hynix turned that process advantage into a deep relationship with Nvidia, securing over two-thirds of HBM supply orders for Nvidia’s next-generation Vera Rubin products. Samsung still holds around 60 percent of the HBM on Google’s TPUs, but that’s a narrower position to be in.
Samsung is fighting back with its Pyeongtaek P5 fab, but mass production will not begin until 2028. As Counterpoint Research director MS Hwang put it: “We expect Samsung to show a significant turnaround with HBM4 for Nvidia’s new products, moving past last year’s quality issues.”
Why did Micron abandon consumer memory for AI?
While SK Hynix and Samsung fight it out for the top spot, Micron made a completely different kind of bet. It restructured its entire business around AI.
Micron’s HBM and cloud memory revenue grew from 17 percent of its DRAM revenue in 2023 to nearly 50 percent in 2025. Then in December 2025, the company discontinued its consumer-facing Crucial brand and walked away from consumer DRAM entirely, redirecting all wafer capacity to AI and enterprise memory.
Think about what that signals. A company making good money selling memory to PC builders decided the AI opportunity was so big and so permanent that consumer memory wasn’t worth the wafer space anymore. CEO Sanjay Mehrotra projects the HBM market growing from $35 billion in 2025 to $100 billion by 2028 — hitting that number two years earlier than Micron had previously expected. For how to manage infrastructure costs when supply is this constrained, see our actionable guide for SMB CTOs.
And it’s still not enough. As Micron business chief Sumit Sadana put it: “We’re sold out for 2026.” Even with all that redirected capacity, the company can only meet two-thirds of medium-term requirements for some customers.
How do hyperscalers lock up memory supply before anyone else can access it?
Long-term supply agreements (LTAs) are multi-year capacity reservation contracts. A hyperscaler commits capital 12 to 24 months in advance to lock in guaranteed memory allocation from a manufacturer. The memory is spoken for before it’s even produced.
How seriously do they take this? Google reportedly fired a procurement executive who failed to sign LTAs ahead of the shortage. When Microsoft executives visited SK Hynix headquarters and were told their conditions would be “difficult” to meet, one of them stormed out of the meeting.
Hyperscalers now station dedicated procurement executives in South Korea. Google posted a job for a “Global Memory Commodity Manager.” Meta is hiring “Memory Silicon Global Sourcing Managers.” This isn’t routine purchasing — it’s a strategic function.
The mechanism is straightforward: hyperscale cloud providers secure supply through long-term commitments, capacity reservations, and direct fab investments, getting lower costs and guaranteed availability. Everyone else fights over what’s left.
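To make that waterfall concrete, here’s a minimal sketch with entirely hypothetical numbers; the point is the ordering, not the figures:

```python
# Illustrative sketch of the LTA allocation waterfall: reserved capacity
# is filled first, and only the remainder ever reaches the spot market.
# All figures are hypothetical, in HBM wafer-equivalents per quarter.

quarterly_output = 100_000

lta_reservations = {      # hypothetical hyperscaler commitments
    "hyperscaler_a": 35_000,
    "hyperscaler_b": 30_000,
    "hyperscaler_c": 25_000,
}

reserved = sum(lta_reservations.values())
spot_supply = max(quarterly_output - reserved, 0)

print(f"Reserved under LTAs: {reserved:,} ({reserved / quarterly_output:.0%})")
print(f"Left for the spot market: {spot_supply:,} ({spot_supply / quarterly_output:.0%})")
```

In this toy example, 90 percent of output is committed before a single wafer is produced, and the spot market is left to absorb all demand volatility within the remaining 10 percent.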
You can’t replicate that level of capital pre-commitment, volume guarantee, or dedicated procurement infrastructure. As semiconductor analyst Manish Rawat at TechInsights noted: “This imbalance creates a dual constraint for the mid-market: higher input costs and longer delivery timelines.” For what this means for enterprises that cannot match hyperscaler LTAs, see our article on managing infrastructure costs.
Why does AI inference keep growing memory demand?
AI training requires a temporary burst of compute and memory. AI inference — serving users in real time — requires continuous memory allocation that grows with every new user and every new deployment.
During inference, a large language model stores its state in the key-value cache (KV cache). Think of it as the model’s working memory. It sits in HBM during active sessions and gets pushed to slower system memory when idle. Storing these precomputed caches reduces the compute required for extended sessions, but it chews through memory.
Demand compounds through three factors: concurrent users, context window length, and number of deployed models. As hyperscalers roll out more AI models to more users with longer context windows, their HBM appetite grows continuously. And as AI moves towards agent-based systems, those agents need frequent access to extensive vector databases for retrieval-augmented generation (RAG), adding yet another persistent demand driver.
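To see how fast that compounds, here’s a rough sizing sketch. The model dimensions (80 layers, 8 KV heads, 128-dim heads, fp16) are illustrative assumptions in the ballpark of a 70B-class model with grouped-query attention, not any specific deployment:

```python
# Rough KV-cache sizing for one inference session, under assumed
# model dimensions roughly matching a 70B-class model with
# grouped-query attention. Not tied to any vendor's deployment.

N_LAYERS = 80
N_KV_HEADS = 8           # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2      # fp16

def kv_cache_bytes(context_len: int) -> int:
    """Bytes of KV cache held in HBM for one active session."""
    # 2x for the key tensor plus the value tensor at every layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * context_len

per_session = kv_cache_bytes(context_len=32_768)
print(f"One 32k-token session: {per_session / 2**30:.1f} GiB")

# Demand compounds across the three factors named above.
concurrent_users = 10_000
deployed_models = 5
fleet = per_session * concurrent_users * deployed_models
print(f"Fleet-wide KV cache: {fleet / 2**40:.0f} TiB")
```

Under these assumptions a single 32k-token session pins about 10 GiB of HBM, and a modest fleet of users and models pushes the total into the hundreds of tebibytes. Longer context windows scale it linearly from there.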
Hyperscaler LTA volumes renew and grow with each deployment cycle. Even if fab capacity increases, demand may keep outpacing supply for the foreseeable planning horizon.
What does fab expansion mean for supply relief timing?
All three memory makers are building new capacity and spreading it across multiple countries — geographic diversification that doubles as a geopolitical hedge.
SK Hynix has an HBM fab in Cheongju, South Korea, targeted for 2027, and a facility in West Lafayette, Indiana, for 2028. Samsung’s Pyeongtaek P5 targets 2028. Micron is building an HBM fab in Singapore for 2027, retooling a fab in Taiwan for second-half 2027, and broke ground on a DRAM fab complex in New York that won’t hit full production until 2030.
Multiple industry sources point to 2028 as the earliest date for meaningful supply relief. As Intel CEO Lip-Bu Tan put it: “There’s no relief until 2028.” And even then, hyperscalers with existing LTAs will have priority access to the new capacity before it reaches the open market.
What does the memory oligopoly mean for companies buying on the open market?
With LTA-secured supply going to hyperscalers, everyone else is on the spot market. And the prices tell the story. Samsung raised prices for 32GB DDR5 modules to $239 from $149 — a 60% increase. DDR5 contract pricing has surged past $19.50 per unit, up from around $7 earlier in 2025.
Gartner forecasts a 47% DRAM price increase in 2026. IDC expects only 16% year-on-year DRAM supply growth — well below demand. IDC puts it plainly: this is “a potentially permanent, strategic reallocation of the world’s silicon wafer capacity” — not a temporary pricing spike.
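If you want to sanity-check those numbers, the arithmetic is short. The $2M budget line below is a hypothetical, used only to show what a 47% increase does at constant volume:

```python
# Sanity-checking the cited price moves, plus a hypothetical budget
# impact under Gartner's forecast 47% DRAM increase for 2026.

def pct_change(old: float, new: float) -> float:
    return (new - old) / old

module = pct_change(149, 239)        # Samsung 32GB DDR5 module
contract = pct_change(7.00, 19.50)   # DDR5 contract pricing
print(f"Module price: +{module:.0%}, contract price: +{contract:.0%}")

# Hypothetical: a $2M annual server-memory budget at constant volume.
budget_2025 = 2_000_000
budget_2026 = budget_2025 * 1.47
print(f"Same volume in 2026: ${budget_2026:,.0f}")
```

The module increase works out to 60 percent, while the contract move is closer to 179 percent, which is why spot-market buyers are feeling the shortage far more acutely than the headline forecast suggests.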
The question for your organisation is how to plan procurement, budgets, and infrastructure strategy around a supply chain where hyperscalers hold structural priority. For strategies for managing infrastructure costs during the shortage, see our actionable guide. For whether Chinese domestic DRAM changes the competitive picture, we cover the geopolitical factors separately. And for the complete view, read our deep-dive into the memory crisis.
FAQ Section
What is HBM and why is it different from regular server memory?
HBM uses 3D die stacking (up to 12 thinned DRAM dies connected by through-silicon vias) to deliver far higher bandwidth than DDR5. It costs about three times as much per bit to produce and accounts for over half the cost of a packaged AI GPU.
Why are there only three companies that make memory chips?
Building a memory fab costs upwards of $15 billion, takes years, and demands decades of process learning. Samsung, SK Hynix, and Micron collectively hold 100% of the HBM market. Chinese manufacturers like CXMT are expanding but still face large technology gaps and export control restrictions.
What is a long-term supply agreement (LTA) in the memory market?
An LTA is a capacity reservation contract where a buyer pre-commits capital — typically 12 to 24 months ahead — to guarantee memory allocation directly from a manufacturer. Hyperscalers use LTAs to secure supply before it reaches the open market.
How much has server memory pricing increased in 2026?
DDR5 contract pricing went from about $7 to $19.50 per unit. Samsung raised 32GB DDR5 module prices from $149 to $239. TrendForce estimates Q1 2026 DRAM contracts surged 90-95% quarter-on-quarter, and Gartner forecasts a 47% overall DRAM price increase for the year.
Why did SK Hynix overtake Samsung in HBM market share?
SK Hynix’s MR-MUF packaging technology delivers better thermal performance and higher yields than Samsung’s NCF approach. That process advantage, combined with early and deep supply contracts with Nvidia, gave SK Hynix roughly 62% of HBM market share versus Samsung’s 17%.
What happened to Micron’s consumer memory products?
Micron shut down its Crucial consumer brand in December 2025 and redirected all wafer capacity to AI and enterprise memory. AI-related DRAM revenue went from 17% to nearly 50% of total DRAM revenue between 2023 and 2025, and the company is fully committed for 2026.
When will the memory shortage end?
Industry consensus points to 2028 at the earliest. New fabs from all three manufacturers — SK Hynix in Cheongju (2027) and West Lafayette (2028), Samsung in Pyeongtaek (2028), Micron in Singapore and Taiwan (2027) — will take years to reach full output. Intel CEO Lip-Bu Tan summed it up: “There’s no relief until 2028.”
What is the KV cache and why does it increase memory demand?
It’s an AI model’s working memory during inference. Precomputed attention states are held in HBM while a session is active. Memory requirements scale with concurrent users, context window length, and the number of deployed models — so demand compounds rather than levelling off.
How much are hyperscalers spending on AI infrastructure in 2026?
TrendForce projects combined CapEx of the top eight cloud service providers at over $710 billion in 2026. A large share of that flows directly into memory procurement through LTAs with Samsung, SK Hynix, and Micron.
Can smaller companies negotiate the same memory supply terms as hyperscalers?
No. LTAs require the kind of capital pre-commitment, volume, and dedicated procurement staff that mid-market and SMB companies simply don’t have. Hyperscalers station procurement executives at memory fabs and have created roles like “Memory Silicon Global Sourcing Manager” — that level of investment is out of reach for smaller organisations.
Will Chinese DRAM manufacturers change the competitive landscape?
Chinese production — notably CXMT — is growing, but it faces significant HBM technology gaps and export control constraints. Whether Chinese manufacturers can meaningfully change the three-player oligopoly is covered in our companion article on geopolitical factors shaping memory supply.
How does AI inference differ from AI training in memory demand?
Training uses a temporary burst of memory and compute. Inference runs continuously, serving concurrent users with expanding context windows, which creates compounding memory demand that grows with each new deployment rather than reaching a fixed ceiling.