Jun 17, 2026

How AI Agents Are Reshaping Server CPU Demand and Data Centre Architecture

AUTHOR

James A. Wondrasek

If you’ve been watching data centre infrastructure over the last two years, you’ve probably noticed something that doesn’t match the conversation. Everyone is talking about GPUs, but CPUs keep showing up in procurement numbers. Intel’s data centre revenue hit $5.1 billion in the first quarter of 2026, up 22% year over year. AMD posted its fourth consecutive record quarter. And when you trace the demand back to its source, the driver is agentic AI workloads, not a supply chain correction or a hardware refresh cycle.

The data centre CPU, which played a supporting role in the GPU-dominated AI era, is experiencing a demand surge that has nothing to do with chatbots. This is the CPU renaissance, and it is driving a generational shift in how data centres are built. Unlike chatbot inference, where CPUs served brief head-node bursts feeding GPUs, agentic AI creates sustained, always-on CPU orchestration load from tool calling, memory management, and multi-step reasoning loops. If your team is deploying agents, the question is no longer whether CPU demand is growing. It’s how to provision for workloads where CPU orchestration latency accounts for the majority of total response time.

How are AI agents different from chatbots in their compute infrastructure requirements?

The difference starts with how each workload actually runs. A chatbot receives a prompt, the GPU does the heavy lifting, and the interaction ends. The CPU’s job is modest: tokenise input, ship it to the GPU, de-tokenise output. Maybe 5 to 10 percent of total compute.

An agent operates differently. It receives a task, decomposes it into sub-tasks, spawns parallel sub-agents, invokes external tools, stores intermediate results, evaluates progress, and integrates outputs into a final response. All of this is coordinated by the CPU across hundreds of parallel execution threads. Agents don’t follow a pre-determined sequence: they call tools, spawn sub-agents with different models, retain information in memory, manage their own context window, and decide for themselves when they’re finished.

There are four structural differences that separate agents from chatbots. First, sustained load: agents run for seconds to minutes, not sub-second bursts. Second, statefulness: agents maintain context across multi-step reasoning, requiring CPU-managed memory coordination. Third, I/O intensity: agentic designs need roughly 100 PCIe lanes compared to 16 for AI training, more than a sixfold increase, because the CPU must interact with network cards, storage, and external services simultaneously. Fourth, parallelism: agents orchestrate dozens to hundreds of concurrent sub-tasks, each requiring CPU scheduling.

Research from Georgia Tech and Intel published in late 2025 found that CPU-side tool processing accounts for up to 90.6 percent of total latency in agentic workloads. With better GPUs, the bottleneck shifts further toward CPUs because GPU inference gets faster while CPU tool execution does not accelerate at the same rate. Chatbot-era data centres were designed around GPU throughput. If you’re building for agents, your data centre needs to be designed around CPU orchestration throughput.
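To make the bottleneck shift concrete, here is a minimal sketch of the arithmetic. The 9.0 s and 1.0 s timings are hypothetical, chosen only so the baseline lands near the reported figure, and the function names are illustrative rather than taken from the research.

```python
def cpu_latency_share(cpu_seconds: float, gpu_seconds: float) -> float:
    """Fraction of end-to-end latency spent on CPU-side tool processing."""
    return cpu_seconds / (cpu_seconds + gpu_seconds)


def share_after_gpu_speedup(cpu_seconds: float, gpu_seconds: float,
                            gpu_speedup: float) -> float:
    """CPU share once GPU inference gets faster and CPU time stays the same."""
    return cpu_latency_share(cpu_seconds, gpu_seconds / gpu_speedup)


# Hypothetical request: 9.0 s of tool execution, 1.0 s of GPU inference.
baseline = cpu_latency_share(9.0, 1.0)
# Same request on a GPU that is twice as fast; CPU time is unchanged.
faster_gpu = share_after_gpu_speedup(9.0, 1.0, gpu_speedup=2.0)

print(f"baseline CPU share:   {baseline:.1%}")
print(f"after 2x GPU speedup: {faster_gpu:.1%}")
```

Under these assumed timings, doubling GPU speed shortens the request by only half a second while the CPU's share of the remaining latency rises.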

Why has the CPU-to-GPU ratio shifted from 1:8 toward 1:1 in AI data centres?

In the training-dominant era, the standard rack configuration was 1 CPU per 4 to 8 GPUs. CPUs served as head nodes, data loaders feeding accelerator trays, with workload patterns characterised by brief bursts of scheduling activity followed by long GPU compute intervals.

Agentic AI inverts this. Where training-era CPUs saw intermittent activity, every deployed agent generates the continuous orchestration load described above: tool calling, memory management, context swapping, and sub-agent scheduling running nonstop. TrendForce sees the ratio moving to between 1:1 and 1:2 for agentic applications. Intel CFO David Zinsner confirmed during the Q1 2026 earnings call that server deployment ratios have already moved from 1:8 to 1:4, and as workloads migrate toward inference and agentic AI, that ratio could converge to 1:1 or tilt further toward CPUs.

There is a power-budget dimension to this. Arm estimates that traditional AI data centres require about 30 million CPU cores per gigawatt. In the agent era, that number surges to 120 million cores per gigawatt, a fourfold increase, because sustained CPU load replaces intermittent head-node activity. This matters for your power and cooling planning: the CPU is no longer a rounding error in the rack-level thermal budget.
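As a rough planning aid, the per-gigawatt estimates can be scaled to a site of a given size. This is a sketch only: the 0.25 GW site is a hypothetical example, and the linear scaling is an assumption rather than something Arm's estimate states.

```python
# Arm's estimates as quoted above, in CPU cores per gigawatt.
TRAINING_ERA_CORES_PER_GW = 30_000_000
AGENT_ERA_CORES_PER_GW = 120_000_000


def cores_for_site(site_power_gw: float, cores_per_gw: int) -> int:
    """Scale a cores-per-gigawatt estimate linearly to a site's power budget."""
    return round(site_power_gw * cores_per_gw)


multiplier = AGENT_ERA_CORES_PER_GW / TRAINING_ERA_CORES_PER_GW

# Hypothetical 0.25 GW (250 MW) site.
training_cores = cores_for_site(0.25, TRAINING_ERA_CORES_PER_GW)
agent_cores = cores_for_site(0.25, AGENT_ERA_CORES_PER_GW)

print(f"multiplier: {multiplier:.0f}x")
print(f"training-era cores: {training_cores:,}")
print(f"agent-era cores:    {agent_cores:,}")
```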

The ratio is not fixed at 1:1. Agent-dense deployments with parallel sub-agent coordination and real-time tool integration may push to 3 to 5 CPUs per GPU. Meta’s deployment of tens of millions of AWS Graviton5 cores for agentic workloads shows the trend is operational reality at hyperscale, not an analyst projection.
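The ratios discussed in this section translate directly into CPU counts for a given GPU fleet. The sketch below assumes a hypothetical fleet of 1,024 GPUs and uses the low end of the 3-to-5 range for the agent-dense case; the labels and function name are illustrative.

```python
import math


def cpus_needed(gpu_count: int, cpus_per_gpu: float) -> int:
    """CPUs required for a GPU fleet at a given CPU-per-GPU ratio, rounded up."""
    return math.ceil(gpu_count * cpus_per_gpu)


# CPU-per-GPU ratios taken from the figures quoted in this section.
RATIOS = {
    "training era (1:8)": 1 / 8,
    "current deployments (1:4)": 1 / 4,
    "agentic (1:1)": 1.0,
    "agent-dense (3 CPUs per GPU)": 3.0,
}

fleet_gpus = 1024  # hypothetical fleet size
plan = {label: cpus_needed(fleet_gpus, ratio) for label, ratio in RATIOS.items()}

for label, cpus in plan.items():
    print(f"{label:30s} {cpus:5d} CPUs")
```

The same GPU fleet needs 8 times as many CPUs at 1:1 as it did at 1:8, and 24 times as many in the agent-dense case, which is why the ratio is a procurement question and not only an architectural one.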

Why are standalone CPU racks emerging as a distinct product category?

When CPU-to-GPU ratios exceed 1:1, attaching every CPU as a head node inside GPU racks creates three simultaneous bottlenecks. Power: combined CPU and GPU draw pushes racks beyond their thermal design envelope. Cooling: CPUs now require the same liquid cooling infrastructure as GPUs rather than air cooling. Interconnect: data movement between CPUs on different GPU trays introduces latency that agentic orchestration cannot tolerate.

Standalone CPU racks solve this by decoupling CPU scaling from GPU scaling. These are CPU-only racks deployed in line with GPU and accelerator racks as part of a complete agentic inference system. Nvidia’s Vera CPU Rack fits 256 CPUs with 22,528 cores and 400 TB of memory into a single rack. Arm’s AGI CPU Rack pushes higher still: 336 CPUs, 45,696 cores, 1 petabyte of memory in a liquid-cooled configuration. CoreWeave has already deployed standalone Vera CPU racks, and Jensen Huang indicated “there are going to be many more.”
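The rack figures above can be reduced to per-CPU density for comparison. This is a small sketch using only the numbers quoted in this section; treating 1 petabyte as 1,000 TB is an assumption, and the `CpuRack` class is illustrative rather than any vendor's specification format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuRack:
    name: str
    cpus: int
    cores: int
    memory_tb: float

    @property
    def cores_per_cpu(self) -> float:
        return self.cores / self.cpus

    @property
    def memory_per_cpu_tb(self) -> float:
        return self.memory_tb / self.cpus


vera = CpuRack("Nvidia Vera CPU Rack", cpus=256, cores=22_528, memory_tb=400)
# Assumption: 1 PB is taken as 1,000 TB.
agi = CpuRack("Arm AGI CPU Rack", cpus=336, cores=45_696, memory_tb=1000)

for rack in (vera, agi):
    print(f"{rack.name}: {rack.cores_per_cpu:.0f} cores/CPU, "
          f"{rack.memory_per_cpu_tb:.2f} TB/CPU")
```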

From an economic perspective, standalone racks offer three advantages over head-node-only configurations. Rack-level utilisation improves because CPU racks are decoupled from GPU workload patterns and can run at higher average utilisation. Power overhead drops when dedicated cooling is optimised for CPU thermal profiles rather than the mixed profile of a combined rack. And workload isolation means CPU-heavy orchestration workloads don’t contend with GPU workloads for rack resources, eliminating the performance interference that degrades both. The Diligence Stack estimates this adds roughly 25 to 30 percent to the server CPU TAM beyond head-node procurement models.

This is where standalone racks connect directly to the market story. They are not just an architectural novelty; they represent a measurable expansion of the server CPU total addressable market, one of several vectors driving projections that are rewriting infrastructure planning timelines.

How is the server CPU total addressable market projected to grow through 2030?

The numbers are large and, more importantly, converging. Bank of America projects the server CPU TAM reaching $170 billion by 2030 on structural demand. UBS arrives at the same number with different assumptions, forecasting 40.6 percent annual growth. Citi breaks the market into buckets: agentic CPU growth alone hits a 185 percent CAGR to reach $59.4 billion by 2030. AMD, which doubled its own forecast from 18 percent to 35 percent annual growth, now sees a $120 billion market by decade’s end.
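Because these forecasts are expressed as compound annual growth rates, small changes in the rate compound into large differences in the end figure. The sketch below applies the standard compound-growth formula to AMD's old and new growth rates; the five-year horizon is an assumption for illustration, since the base year of each forecast is not stated here.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """How many times larger a market becomes after compounding annually."""
    return (1 + annual_rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate implied by a start value, an end value, and a horizon."""
    return (end_value / start_value) ** (1 / years) - 1


HORIZON_YEARS = 5  # assumed horizon, not taken from the forecasts

old_multiple = growth_multiple(0.18, HORIZON_YEARS)  # AMD's earlier 18% rate
new_multiple = growth_multiple(0.35, HORIZON_YEARS)  # AMD's revised 35% rate

print(f"18% for {HORIZON_YEARS} years: {old_multiple:.2f}x")
print(f"35% for {HORIZON_YEARS} years: {new_multiple:.2f}x")
print(f"revised / earlier end size: {new_multiple / old_multiple:.2f}x")
```

Over the assumed five years, the revised rate yields an end market roughly twice the size implied by the earlier one.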

But the TAM is not just growing, it is restructuring. The fastest-growing segment is Arm-based custom silicon at roughly 37 percent of the market. That is AWS Graviton, Azure Cobalt, Google Axion, and the newer entrants: Nvidia’s Vera and Arm’s own AGI CPU. The merchant x86 segment, Intel Xeon and AMD EPYC, still generates the most revenue but its share is declining.

Intel’s data centre CPU market share fell 9.5 points in 12 months to 54.9 percent in Q1 2026. AMD, meanwhile, reached 46.2 percent of x86 server CPU revenue. The custom silicon segment is growing fastest, and traditional vendors are least exposed to it. For your procurement planning, this means total spending grows but who captures it is changing, and your vendor strategy needs to account for the CPU renaissance reshaping silicon investment priorities.

What makes server CPU demand structural rather than cyclical?

Cyclical demand reflects temporary supply-demand imbalances, like pandemic-era chip shortages. Structural demand reflects a permanent change in what workloads require. Agentic AI has permanently expanded the set of compute tasks that CPUs must handle.

Tool calling, memory coordination, context swapping, and sub-agent orchestration are workload categories that did not exist in the chatbot era and will not diminish as GPU hardware improves. Jason Beckett, CTO EMEA at Hitachi Vantara, characterises the shift plainly: “Always-on, multi-step reasoning systems don’t create brief orchestration bursts around GPU workloads. They demand high-core-count CPUs running at sustained loads, continuously.” Roger Cummings, CEO of PEAK:AIO, makes the same assessment: the requirement is “structural, not cyclical.”

Agentic orchestration is one vector of structural demand. Reinforcement learning adds a second, independent one. RL training loops require parallel CPU clusters for code compilation, verification, and physics simulation that GPUs cannot perform. SemiAnalysis identifies this as a new bottleneck because GPU performance-per-watt improves faster than CPU, widening the gap between what RL algorithms demand and what CPU infrastructure can deliver. These two vectors, agent orchestration and RL environment execution, are additive, not overlapping, and neither diminishes as GPU generations advance.

The demand signal is self-reinforcing. As agentic capabilities improve, more workloads become agentic. As more workloads become agentic, CPU demand grows. CoreWeave’s deployment of standalone Vera racks and Meta’s partnership with Arm on the AGI CPU are not bets on a temporary spike. They are operational commitments to a demand profile that only grows from here.

Competing for this structural demand is a vendor landscape that looks nothing like the one most infrastructure teams built their procurement processes around.

How is the competitive landscape for server CPUs changing as agentic AI scales?

The server CPU market has not been this contested in two decades. Six distinct vendor categories now compete: Intel and AMD in x86, Nvidia with its Vera CPU, Arm as a direct silicon vendor, hyperscaler custom silicon from AWS, Microsoft, and Google, and merchant Arm chips from Ampere and Huawei.

Intel remains the revenue leader but the trajectory is downward. Diamond Rapids ships without Simultaneous Multithreading, and Clearwater Forest faces yield challenges on the 18A process. AMD’s EPYC Venice, with 256 cores on TSMC N2, claims a 70 percent generational performance leap, and the company has doubled its market growth forecast. Nvidia’s Vera enters not as a general-purpose CPU competitor but as a reasoning-end CPU where the NVLink-C2C interconnect, with 1.8 terabytes per second of chip-to-chip bandwidth, is the defining feature.

Arm occupies a position no other company can replicate. Graviton, Cobalt, Axion, Grace, and Vera are all built on the Arm ISA, generating royalty income regardless of which platform wins. The AGI CPU adds direct silicon revenue on top. Over 1 billion Neoverse cores have been deployed, with 21 CSS licenses across 12 companies. The hyperscaler wildcard is that AWS, Microsoft, and Google now design their own server CPUs, and Meta’s co-development partnership with Arm on the AGI CPU represents a model where hyperscalers skip merchant silicon entirely. The fastest-growing TAM segment is the one traditional vendors are least exposed to.

Server CPU demand represents the architectural signature of a workload transition that changes what data centres look like, who supplies them, and how infrastructure teams plan. The CPU-to-GPU ratio, the emergence of standalone CPU racks, and the restructuring of the competitive landscape are three facets of the same shift. The buildout is structural, the ratio varies by workload, and the vendor landscape has permanently expanded beyond a two-player x86 contest. For your infrastructure planning, CPU architecture decisions have become the primary choices of the agentic era, part of the broader competitive dynamics that will define data centre procurement through 2030 and beyond.

Frequently Asked Questions

Will better GPUs eventually eliminate the need for more CPUs in agentic AI?

No. Better GPUs solve a different problem. GPUs accelerate model inference, but agentic AI’s CPU bottleneck comes from orchestration tasks like tool calling, memory management, and sub-agent coordination, not from the model itself. Faster GPUs do not reduce the number of API calls an agent makes. The Georgia Tech and Intel finding that CPU-side processing accounts for up to 90.6% of total agentic latency underscores this: the bottleneck is coordination, not computation.

How soon should infrastructure teams begin planning for the CPU-to-GPU ratio shift?

Planning should begin now. The shift from 1:8 toward 1:1 is already visible in hyperscaler procurement, and the lead time for data centre power, cooling, and interconnect provisioning is 18 to 36 months. While not every deployment reaches a 1:1 ratio today, teams that wait for their own utilisation data to confirm the trend will face capacity shortfalls. Meta’s deployment of tens of millions of AWS Graviton5 cores shows the transition is operational, not theoretical.

Can existing GPU-heavy data centres be retrofitted for agentic AI, or does a full architectural rebuild become necessary?

Retrofitting is feasible but has hard limits. Existing data centres can add standalone CPU racks where power and cooling headroom exists, and head-node CPU upgrades can improve orchestration throughput within existing GPU trays. However, once the CPU-to-GPU ratio exceeds approximately 1:1, the combined power, cooling, and interconnect demands make retrofitted mixed racks thermally unworkable. At that threshold, purpose-built CPU-only racks with dedicated liquid cooling become an architectural requirement, not an option.

What does the shift toward CPU-heavy agentic workloads mean for data centre energy consumption?

The energy profile changes shape more than total draw. While adding more CPUs increases total rack power consumption, standalone CPU racks can run at higher average utilisation than mixed CPU-GPU racks because they are decoupled from GPU workload patterns, improving capital efficiency per watt. The fourfold increase in CPU cores per gigawatt reflects a workload-driven rebalancing of compute resources, not an across-the-board increase in total power. The sustainability outcome depends more on utilisation rates and cooling efficiency than on total core count.

Are standalone CPU racks relevant for anyone beyond the major hyperscaler cloud providers?

They are most directly relevant to hyperscalers and large-scale AI infrastructure operators today. CoreWeave has already deployed standalone Vera CPU racks, and Nvidia and Arm are shipping standalone designs as standard products rather than bespoke hyperscaler builds, indicating broader market intent. For mid-size operators, the shift first appears as increased head-node CPU density within existing GPU racks. As agentic workloads scale, however, the same physics constraints that drive hyperscalers toward standalone racks apply at smaller scale too.

How does the agentic CPU demand shift affect software architecture and orchestration frameworks?

Agentic orchestration frameworks must evolve to treat CPU capacity as a first-class resource, not an assumed abundance. Tool-calling latency, context-switching overhead, and sub-agent scheduling all run on CPU and can exceed 90% of end-to-end response time. Frameworks that treat CPU as free will encounter degraded agent performance as deployment scales. The practical implication is that CPU-aware scheduling, memory locality optimisation, and dedicated orchestration compute pools become software requirements alongside model-serving infrastructure.
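One way to picture CPU capacity as a first-class resource is a pool that caps how many tool calls run at once. The following is a minimal sketch, not any framework's API: `OrchestrationPool`, `core_budget`, and `fake_tool` are hypothetical names, and an asyncio semaphore only limits concurrency; it does not pin work to physical cores.

```python
import asyncio


class OrchestrationPool:
    """Caps concurrent CPU-side tool calls at a fixed budget (hypothetical helper)."""

    def __init__(self, core_budget: int) -> None:
        self._slots = asyncio.Semaphore(core_budget)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def run_tool(self, tool, *args):
        # Wait for a free slot instead of assuming CPU capacity is unlimited.
        async with self._slots:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await tool(*args)
            finally:
                self.in_flight -= 1


async def fake_tool(n: int) -> int:
    await asyncio.sleep(0.01)  # stands in for a CPU-side tool call
    return n * 2


async def main():
    pool = OrchestrationPool(core_budget=4)
    # Sixteen sub-agent tool calls contend for four slots.
    results = await asyncio.gather(*(pool.run_tool(fake_tool, i) for i in range(16)))
    return results, pool.peak_in_flight


results, peak = asyncio.run(main())
print(f"completed {len(results)} calls, peak concurrency {peak}")
```

The design choice here is to make the budget explicit: when sub-agent fan-out exceeds the budget, calls queue rather than oversubscribing the host, which makes the capacity shortfall visible as wait time rather than as unexplained latency.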

Does the agentic AI CPU demand pattern apply to edge computing and on-device deployments?

The pattern applies in principle but at a different scale. On-device agents on smartphones or edge servers face the same structural challenge: tool calling, memory management, and sub-agent coordination all consume CPU cycles. Arm’s dominance in edge and mobile means its Neoverse-based designs are well positioned for the edge agentic computing layer. The ratio shift is most extreme in the data centre, but the underlying physics of agents being CPU-orchestrated rather than GPU-dominated holds across all deployment scales.

What role does memory bandwidth play in the CPU demand story, and why does it matter?

Memory bandwidth is a primary constraint for agentic CPU workloads. Unlike chatbot inference (which is GPU-memory-bound), agentic AI requires sustained CPU-managed memory access for KV cache coordination, context preservation across multi-step reasoning, and tool-calling result storage. Agents are predominantly decode-bound, generating output tokens one by one with tool calling interleaved, creating sustained memory bandwidth pressure. This is driving interest in CPU-attached memory architectures like SOCAMM and LPDDR-based server memory specifically designed for agentic workloads.

Is Intel’s market share decline a temporary execution issue or a structural disadvantage in the agentic era?

It is increasingly structural. Intel’s 9.5-point share loss in 12 months reflects two forces beyond any single product delay: the custom silicon trend where Arm’s ISA dominates the fastest-growing TAM segment at an estimated 37% share, and Intel’s difficulty matching the core density and memory bandwidth profile that agentic workloads demand. Diamond Rapids shipping without Simultaneous Multithreading (characterised as a “severe handicap”) and the cancelled 8-channel SP platform suggest the competitive gap is widening, not closing.

Will surging server CPU demand from agentic AI push up costs for traditional enterprise server CPU buyers?

There is reason to watch this closely. Intel has publicly signalled wafer reallocation from client to server processors, combined with supply constraints described as “starts with a B.” If foundry capacity continues tilting toward the higher-margin agentic AI segment, traditional enterprise buyers of x86 server CPUs could face longer lead times and higher prices. This is an emerging risk rather than an active crisis, but it reinforces the logic of diversifying CPU architecture choices now rather than waiting for supply to tighten further.

How do reinforcement learning workloads add a separate layer of CPU demand beyond agent inference?

Reinforcement learning training loops create an additional structural CPU demand layer because RL environments require parallel CPU clusters for code compilation, verification, interpretation, and physics simulation that GPUs cannot perform. SemiAnalysis identifies this as a “new bottleneck” because GPU performance-per-watt improves faster than CPU, widening the gap between what RL algorithms demand and what CPU infrastructure can deliver. This RL-driven demand is additive to the orchestration demand from agent deployment, creating two independent CPU growth vectors.
