For two decades the data centre processor market was a settled duopoly. Intel and AMD controlled virtually all of it. But at hyperscale, Intel’s 60-plus per cent gross margins became a line item large enough to justify building your own chip team.
In 2015, Amazon acquired a small Israeli chip design startup called Annapurna Labs for about $350 million. The industry mostly shrugged. Ten years later, AWS Graviton has 120,000 customers, Meta is deploying tens of millions of Graviton cores, and Microsoft and Google have launched their own Arm server CPUs. What started as one company’s efficiency experiment has become a disruptive force in the server CPU market and a driver of the CPU renaissance that is accelerating hyperscaler silicon investment. And the company that enabled it all, Arm Holdings, has just decided it wants a seat at the table.
How did Amazon become the first hyperscaler to successfully deploy custom Arm server CPUs at scale?
The Annapurna Labs acquisition gave AWS a head start no competitor has matched. While rivals were still signing purchase orders with Intel, Amazon was building a chip design team that would iterate through five silicon generations in eight years.
The progression tells the story. Graviton2 in 2020, with 64 Neoverse N1 cores, proved Arm could compete on cloud workloads. Graviton3 moved to Neoverse V1 cores and a chiplet design, debuting DDR5 and PCIe5 a full year ahead of AMD and Intel. Graviton4 scaled to 96 cores with dual-socket support. And Graviton5, which went GA on 10 June 2026, packs 192 Neoverse V3 cores and 172 billion transistors on TSMC’s 3nm process. It replaces Graviton4’s dual-socket NUMA design with a single socket, eliminating the cross-socket latencies that complicated application performance.
What made this work was a self-reinforcing loop. AWS runs its own services like Lambda, Fargate, RDS, and ElastiCache on Graviton, so each new generation gets a large captive deployment before anyone else touches it. Over half of the new CPU capacity AWS added in the past two years has been Arm-based. And Annapurna Labs uses Graviton-powered EDA tools to design the next Graviton. The cycle feeds itself.
The real proof came from customers. Pinterest saw 47 per cent cost savings on key workloads and a 62 per cent reduction in carbon emissions. Honeycomb measured 36 per cent better throughput per core compared to Graviton4. Atlassian moved over 3,000 Jira and Confluence instances to Graviton, with instance counts dropping roughly 30 per cent and throughput improving similarly. By the time 98 per cent of AWS’s top 1,000 EC2 customers were running production workloads on Graviton, the question was no longer whether Arm belonged in the data centre. It was how fast competitors could respond.
Why did AWS, Google, and Microsoft build their own custom Arm-based server CPUs?
The hyperscaler custom silicon wave sits on three reinforcing drivers.
First, economics. Designing custom silicon on Arm’s Neoverse platform and contracting directly with TSMC for manufacturing lets you keep the margin that would otherwise go to Intel or AMD. Amazon has not purchased an Intel or AMD CPU for its own services since 2022.
Second, architecture. Arm’s RISC ISA delivers 30 to 60 per cent better performance per watt than x86. At data centre scale, where power and cooling eat 40 to 60 per cent of operational costs, that efficiency compounds into hundreds of millions in annual savings. A 20 per cent performance per watt advantage is worth more than a 15 per cent raw performance lead.
Third, control. Intel and AMD design for every workload. A custom CPU can be tuned for the workloads your fleet actually runs. Cache sizing, core count, memory bandwidth, and I/O are all dialled to your profile, not a general-purpose compromise. Microsoft used telemetry from real Azure workloads to engineer Cobalt 200. Google’s Axion instances deliver up to 65 per cent better price-performance than comparable x86 systems.
The competitive intensity is real. Microsoft launched Cobalt 200 on 2 June 2026. AWS launched Graviton5 on 10 June. That eight-day gap tells you everything about how fast these companies are moving.
What is Arm Neoverse and how does it enable hyperscalers to design custom server chips?
The invisible platform behind all of this is Arm Neoverse and its Compute Subsystem (CSS) licensing model. CSS turns chip design from a $500 million-plus, multi-year gamble into a customisation exercise.
When you license CSS, you get pre-validated building blocks: Neoverse V-series cores for performance (used in Graviton, Nvidia Grace, and Arm’s own AGI CPU), N-series cores for efficiency (used in Cobalt 100 and Axion), a coherent mesh interconnect for scaling core counts, and ready-made memory controllers and PCIe/CXL I/O subsystems. You add your own fabric, accelerators, and workload-specific tuning on top.
The model cuts time to market from five-plus years to two or three, and de-risks execution enough that 12 companies have signed 21 CSS licences. Over one billion Neoverse cores have been deployed globally. Arm’s data centre royalty revenue more than doubled year over year.
But Neoverse CSS is also where the tension lives. It is the enabler of the custom silicon wave and now a potential constraint. If Arm prioritises its own AGI CPU over CSS licensees, hyperscalers may start exploring RISC-V alternatives. More on that in a moment.
How do Arm custom CPUs stack up against x86 on performance per watt?
The short answer: they win, and the advantage is structural.
On SPECrate2017, Cobalt 200 scored 840 on a 128-core instance compared to Graviton5’s 780 on 96 cores, though Graviton5’s per-core performance is stronger. On Redis latency at 500,000 operations per second, Graviton5 recorded a p99 of 0.45 milliseconds against an x86 baseline of 0.82 milliseconds. Signal65’s benchmarks of Graviton4 against x86 tell the same story: on Llama 3.1 8B inference, Graviton4 delivered 168 per cent better performance than AMD and 162 per cent better than Intel.
The efficiency advantage comes from fundamentals. RISC ISA simplicity means fewer transistors per core for the same work. A non-SMT design means no shared execution resources, giving deterministic per-thread performance under sustained load. And Arm was built for power efficiency from the start. Its mobile heritage, decades of optimising for battery-powered devices, gave it a design philosophy that x86 has spent the past decade retrofitting.
Cross-cloud comparisons are imperfect, of course. Graviton5 leads on core density per instance. Cobalt 200 on Azure-native integration. Axion on Google’s AI infrastructure ecosystem. “Best” depends on where your workloads already live.
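The per-core claim can be checked with simple arithmetic on the SPECrate2017 and Redis figures quoted above. The following is a minimal sketch in Python; the helper names are illustrative, and it assumes the aggregate scores divide evenly across the cores of each instance.

```python
def per_core(score: float, cores: int) -> float:
    """Aggregate benchmark score divided by the number of cores that produced it."""
    return score / cores


def latency_reduction(baseline_ms: float, measured_ms: float) -> float:
    """Fractional reduction in latency relative to the baseline."""
    return (baseline_ms - measured_ms) / baseline_ms


# Figures as quoted in the article.
cobalt_200 = per_core(840, 128)   # SPECrate2017 on a 128-core instance
graviton5 = per_core(780, 96)     # SPECrate2017 on 96 cores
per_core_lead = graviton5 / cobalt_200 - 1
redis_p99_gain = latency_reduction(0.82, 0.45)

print(f"Cobalt 200 per-core score: {cobalt_200:.2f}")
print(f"Graviton5 per-core score:  {graviton5:.2f}")
print(f"Graviton5 per-core lead:   {per_core_lead:.0%}")
print(f"Redis p99 reduction vs x86 baseline: {redis_p99_gain:.0%}")
```

On these numbers Cobalt 200 wins on aggregate score because it has more cores in the instance, while Graviton5 is ahead per core by roughly a quarter.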
But what makes all of this more than an architectural debate is a new workload that demands exactly what Arm custom silicon is best at.
How has agentic AI reshaped demand for custom Arm server CPUs in 2026?
Agentic AI has turned the server CPU from a supporting actor into a primary compute tier. And that changes your procurement calculus.
Traditional inference is GPU-bound. Training is GPU-bound with CPU head nodes. But agentic workloads are CPU-intensive throughout. Each reasoning step triggers tool calls, code execution, database queries, and API calls. Research from Georgia Tech and Intel found that CPU-side tool processing accounts for up to 90.6 per cent of total latency in representative agentic workloads. The GPU sits idle while the CPU handles the active work.
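Amdahl's law makes the consequence of that latency split concrete. The sketch below takes the 90.6 per cent upper bound quoted above and assumes, for illustration only, that the rest of the latency is GPU time; the function name is hypothetical.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: end-to-end speedup when only `accelerated_fraction`
    of the original latency is made `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


cpu_share = 0.906             # upper-bound CPU share of latency from the article
gpu_share = 1.0 - cpu_share   # assumption: everything else is GPU time

gpu_10x = overall_speedup(gpu_share, 10.0)   # a GPU ten times faster
gpu_limit = 1.0 / cpu_share                  # an infinitely fast GPU
cpu_2x = overall_speedup(cpu_share, 2.0)     # CPU-side work twice as fast

print(f"10x faster GPU:        {gpu_10x:.2f}x end to end")
print(f"Infinitely fast GPU:   {gpu_limit:.2f}x end to end")
print(f"2x faster CPU side:    {cpu_2x:.2f}x end to end")
```

Under these assumptions even an infinitely fast GPU improves end-to-end latency by only about 10 per cent, while doubling CPU-side throughput gets close to a 1.8x gain.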
Reinforcement learning multiplies the demand. A single RL training run spawns thousands of parallel code compilation and verification environments, creating demand surges that SemiAnalysis described as an extremely severe capacity shortage for CPUs.
The CPU-to-GPU ratio in AI data centres is shifting. Today it runs roughly 1:4 to 1:8. TrendForce expects this to move toward 1:1 to 1:2 in agentic AI deployments. Arm’s CEO sees CPU core demand reaching 120 million cores per gigawatt, up from roughly 30 million today.
Custom silicon is adapting. Graviton5’s 192 cores, five-times-larger L3 cache, and 33 per cent lower inter-core latency were explicitly designed for the orchestration workloads agents throw at CPUs. Meta’s deployment of tens of millions of Graviton5 cores targets real-time reasoning, code generation, and multi-step task orchestration.
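To see what that ratio shift means for procurement, here is a rough sketch that sizes the CPU count for a GPU fleet at each end of the quoted ranges. The fleet size of 100,000 GPUs is hypothetical and the helper name is illustrative.

```python
import math


def cpus_needed(gpus: int, gpus_per_cpu: float) -> int:
    """CPUs required for a GPU fleet at a CPU-to-GPU ratio of 1:gpus_per_cpu."""
    return math.ceil(gpus / gpus_per_cpu)


GPUS = 100_000  # hypothetical fleet size, not a figure from the article

today = (cpus_needed(GPUS, 8), cpus_needed(GPUS, 4))      # 1:8 to 1:4
agentic = (cpus_needed(GPUS, 2), cpus_needed(GPUS, 1))    # 1:2 to 1:1

# Per-gigawatt core demand quoted in the article: 30 million today, 120 million projected.
core_growth = 120_000_000 / 30_000_000

print(f"CPUs today:   {today[0]:,} to {today[1]:,}")
print(f"CPUs agentic: {agentic[0]:,} to {agentic[1]:,}")
print(f"Core demand per gigawatt grows {core_growth:.0f}x")
```

For the same GPU fleet, the CPU count rises between two and eight times depending on which ends of the ranges apply, which is in line with the fourfold growth in cores per gigawatt.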
What is the Arm AGI CPU and why did Arm decide to make its own chip after 35 years of licensing?
In March 2026, Arm Holdings did something it had never done in 35 years. It announced its own chip, stepping onto the field it built for everyone else.
The Arm AGI CPU is a 136-core, Neoverse V3-based processor manufactured on TSMC’s 3nm process and purpose-built for agentic AI orchestration. It runs at 300W TDP, fits in air-cooled 1U deployments, and Arm claims it delivers more than twice the performance per rack of x86 CPUs, with potential CAPEX savings of up to $10 billion per gigawatt of AI data centre capacity.
The strategic logic is straightforward. After enabling the hyperscaler custom silicon wave through Neoverse CSS, Arm sees a market mature enough for direct participation. Meta approached Arm three years ago asking for finished CPU parts. SoftBank’s $6.5 billion acquisition of Ampere Computing consolidated server chip efforts under one owner. Arm’s market capitalisation jumped 15 per cent the day after the announcement, adding roughly $20 billion in value.
The tension is equally straightforward. Arm now competes with AWS, Microsoft, and Google, three of its largest CSS licensees. Dan Hutcheson of TechInsights called it a tightrope. Bernstein analyst Stacy Rasgon noted that capturing even 5 per cent of the server CPU market within three years would mean billions in revenue at margins that dwarf the licensing business. As the architectural shift reshaping data centre investment accelerates, Arm’s move from licensor to competitor changes the calculus for every player.
Arm’s framing is that the AGI CPU is additive rather than competitive. More than 50 companies announced support at launch, including AWS, Google, Microsoft, and Nvidia. The dual monetisation structure means Arm gets paid whether it wins the socket directly or a licensee does, since everyone builds on Arm architecture.
Still, the symbolism is hard to ignore. The company that spent 35 years saying it would never sell chips, a principle articulated by founding CEO Robin Saxby with the words “we’ll make chips over my dead body”, has just entered the market as a vendor.
The server CPU market is now a multi-front contest where hyperscalers compete with each other, the architecture licensor competes with its own licensees, and x86 incumbents scramble to adopt a cloud-native philosophy they did not invent. The question for infrastructure buyers is no longer Intel or AMD. It is whose custom silicon, and whose architecture, you are betting on. The full competitive landscape, including how x86 incumbents and Nvidia are responding, matters as much as the custom silicon race itself, and evaluating custom silicon for your own infrastructure is the practical question this disruption forces every buyer to answer.
Frequently Asked Questions
Will RISC-V challenge Arm in the custom server CPU market?
RISC-V is emerging as an alternative instruction set architecture, particularly in China where geopolitical pressures are accelerating adoption, but it remains several years behind Arm in data centre maturity. Arm’s Neoverse CSS platform provides pre-validated subsystems and a billion-core deployed base that RISC-V lacks. Hyperscalers considering RISC-V would need to rebuild their software ecosystem from scratch, a multi-year effort that currently favours staying within the Arm ecosystem. RISC-V’s server moment will come, but not before 2028 at the earliest.
Are Intel and AMD just standing still while hyperscalers eat their market?
No. Intel’s Clearwater Forest (2025) and AMD’s EPYC Turin (2024) represent aggressive responses to the Arm custom silicon threat. Intel’s efficiency-core (E-core) strategy with Sierra Forest and Clearwater Forest targets the exact workload sweet spot where Arm excels. AMD’s dense cores, Zen 4c in Bergamo and Zen 5c in Turin, match Arm’s core density while maintaining x86 compatibility. The merchant vendors are not standing still, but the structural advantage of workload-specific customisation means they are competing against designs precisely tuned for their largest customers’ exact workloads.
What does custom silicon mean for cloud prices? Will my AWS bill go down?
Directionally, yes. Graviton instances typically deliver 20 to 40 per cent better price-performance than comparable x86 instances, and AWS passes much of that saving through. Spotify reported a 250 per cent performance improvement, Pinterest saw a 47 per cent cost reduction, and Honeycomb measured 36 per cent more throughput per core after migrating to Graviton. The savings compound when organisations commit to single-architecture deployment. However, migration costs, retesting, and potential compatibility issues with x86-specific dependencies can offset some savings in the first year.
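A price-performance gain does not translate one-for-one into a lower bill. The sketch below assumes price-performance means work done per dollar and that the workload stays the same size; the annual bill and migration cost are hypothetical figures chosen only to illustrate the first-year offset.

```python
def bill_saving(price_perf_gain: float) -> float:
    """Fractional bill reduction for the same work, given a price-performance
    gain expressed as a fraction (0.20 means 20 per cent better)."""
    return 1.0 - 1.0 / (1.0 + price_perf_gain)


def first_year_net(annual_bill: float, price_perf_gain: float, migration_cost: float) -> float:
    """First-year saving after subtracting a one-off migration cost."""
    return annual_bill * bill_saving(price_perf_gain) - migration_cost


low = bill_saving(0.20)    # lower end of the range quoted above
high = bill_saving(0.40)   # upper end of the range quoted above

# Hypothetical example: $1,000,000 annual x86 bill, $100,000 migration effort.
net = first_year_net(1_000_000, 0.20, 100_000)

print(f"Bill saving at 20 per cent better price-performance: {low:.1%}")
print(f"Bill saving at 40 per cent better price-performance: {high:.1%}")
print(f"Hypothetical first-year net saving: ${net:,.0f}")
```

Under that definition, a 20 to 40 per cent price-performance gain works out to a bill roughly 17 to 29 per cent lower for the same work, before any migration cost is counted.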
Do you need special software to run on Arm server CPUs?
For Linux-based cloud workloads, the Arm software maturity gap has effectively closed. Major Linux distributions (Amazon Linux, Ubuntu, RHEL), container runtimes (Docker, containerd), Kubernetes, and all major databases (MySQL, PostgreSQL, Redis, MongoDB) have native Arm builds. The remaining friction is in legacy enterprise software with x86-specific optimised libraries, .NET Framework (not .NET Core) applications, and Windows Server workloads, where Arm support is still developing. For greenfield cloud-native applications, running on Arm is no longer a compromise.
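Where friction remains, it usually shows up as a native dependency built for only one architecture. A minimal sketch of a runtime check using Python's standard `platform` module follows; the helper name and the set of accepted names are illustrative, and the exact string reported can vary by operating system.

```python
import platform
from typing import Optional

# Linux typically reports "aarch64"; macOS reports "arm64".
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True when the given (or current) machine string names a 64-bit Arm CPU."""
    name = machine if machine is not None else platform.machine()
    return name.lower() in ARM64_NAMES


if __name__ == "__main__":
    arch = platform.machine()
    if is_arm64(arch):
        print(f"{arch}: 64-bit Arm host, use Arm builds of native dependencies")
    else:
        print(f"{arch}: not a 64-bit Arm host, x86-specific dependencies will load as usual")
```

A check like this can gate an optional x86-only library or select the right build artefact during a mixed-architecture migration.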
How does Nvidia’s Grace CPU fit into this market?
Nvidia’s Grace CPU (Neoverse V2-based, 72 cores per chip and 144 in the dual-chip Grace CPU Superchip) serves a different market than Graviton or Cobalt. It is designed as a tightly coupled companion to Nvidia’s H100 and B200 GPUs, using the NVLink-C2C interconnect for 900 GB/s of bandwidth between CPU and GPU, far exceeding PCIe limits. Grace is not a general-purpose server CPU; it is an AI supercomputing building block. Nvidia’s upcoming Vera CPU (2026) extends this strategy. While Graviton competes for cloud-native workloads, Grace competes for GPU-attached AI infrastructure, a complementary but distinct market.
Is Arm’s decision to make its own chips a conflict of interest with its licensees?
It is the central tension of the 2026 server CPU market. Arm’s AGI CPU puts the company in direct competition with AWS, Microsoft, and Google, the three most important Neoverse CSS licensees. Arm argues that the AGI CPU targets a specific workload (agentic AI orchestration) at a scale beyond individual cloud providers, but the optics are uncomfortable. SoftBank’s acquisition of Ampere Computing for $6.5 billion consolidates Arm’s chip ambitions. The real question is whether hyperscalers maintain their Neoverse CSS licences or accelerate exploration of RISC-V alternatives as a hedge.
Can companies smaller than the hyperscalers benefit from Arm-based custom silicon?
Yes, and this is the significance of Arm’s 21 CSS licences across 12 companies. Ampere Computing (now SoftBank-owned) ships Altra and AmpereOne processors available to any data centre operator. Nvidia’s Grace is available through DGX and OEM channels. Oracle Cloud, Alibaba (Yitian 710), and Tencent have all invested in Arm server silicon. The CSS model means a company with the right engineering team can build a differentiated server CPU for a fraction of the cost of a full custom design, though the capital investment remains in the hundreds of millions.
Is performance-per-watt really more important than raw performance for server CPUs?
At hyperscale, unequivocally yes. Power and cooling represent 40 to 60 per cent of data centre operational costs. A 30 per cent performance-per-watt advantage delivering the same work at lower power is worth more than a 15 per cent raw performance lead that requires more cooling and electricity. This is why cloud providers optimise for total cost of ownership rather than peak benchmark scores. Arm’s structural efficiency advantage, non-SMT (no hyperthreading) deterministic per-thread performance, and higher core density combine to deliver better throughput per watt even when individual x86 cores are faster at single-threaded tasks.
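One way to see why is to fix the facility's power budget and ask how much work fits inside it. The sketch below uses hypothetical chip figures and assumes the faster chip gets its 15 per cent lead by drawing 15 per cent more power, in other words with no performance-per-watt gain.

```python
def fleet_throughput(power_budget_w: float, perf_per_chip: float, watts_per_chip: float) -> float:
    """Total work per second a power-capped facility can sustain.

    The number of chips that fit is power_budget_w / watts_per_chip, so the
    total depends only on performance per watt, not on per-chip speed.
    """
    return power_budget_w * perf_per_chip / watts_per_chip


POWER_BUDGET_W = 1_000_000             # hypothetical 1 MW available for CPUs
BASE_PERF, BASE_WATTS = 100.0, 300.0   # hypothetical baseline chip

baseline = fleet_throughput(POWER_BUDGET_W, BASE_PERF, BASE_WATTS)

# Chip A: same per-chip speed, 30 per cent better performance per watt.
efficient = fleet_throughput(POWER_BUDGET_W, BASE_PERF, BASE_WATTS / 1.30)

# Chip B: 15 per cent faster per chip, assumed to draw 15 per cent more power.
faster = fleet_throughput(POWER_BUDGET_W, BASE_PERF * 1.15, BASE_WATTS * 1.15)

print(f"Efficient chip: {efficient / baseline:.2f}x baseline fleet throughput")
print(f"Faster chip:    {faster / baseline:.2f}x baseline fleet throughput")
```

In this simplified model the efficient chip raises fleet throughput by 30 per cent within the same power envelope, while the faster chip leaves it unchanged. The model ignores chip count, capital cost, and licensing, all of which matter in a full total-cost-of-ownership analysis.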
What happens to the server CPU market if hyperscalers stop buying from Intel and AMD entirely?
This is already happening at the margin. Amazon has not purchased a single Intel or AMD CPU for its own services since 2022, running AWS infrastructure entirely on Graviton internally. Microsoft and Google are following similar trajectories with Cobalt and Axion respectively. However, the merchant CPU market will not disappear. Cloud providers must still offer Intel and AMD instances because enterprise customers demand them for compatibility, lift-and-shift migrations, and specific software requirements. The market bifurcates into hyperscaler-owned silicon for internal and cloud-native workloads, and merchant silicon for customer-facing instance types.
Has the Arm server CPU market reached mainstream adoption or is it still early?
It crossed into mainstream adoption in 2024 to 2025. With over one billion Neoverse cores deployed, 120,000 AWS Graviton customers, Meta’s deployment of tens of millions of Graviton cores, and three concurrent hyperscaler chip programmes (plus Nvidia’s Grace), this is categorically no longer an experiment. The 2026 launch of Graviton5, Cobalt 200, and the Arm AGI CPU signals the beginning of the second generation of competitive custom silicon. Organisations not evaluating Arm for their cloud infrastructure are now behind the curve rather than ahead of it.