Jun 17, 2026

The CPU Renaissance: How AI Agents Are Bringing General-Purpose Computing Back to the Data Centre

AUTHOR

James A. Wondrasek

In June 2026 at Computex Taipei, Arm CEO Rene Haas stood on stage and declared a “CPU renaissance.” For once, the numbers backed the rhetoric. After three years of GPU-dominated AI narratives, the industry’s silicon budgets are shifting. The CPU-to-GPU ratio in AI data centres has moved from 1:8 toward 1:1, with agent-dense racks now demanding three to five CPUs for every GPU. Bank of America projects a $125 billion server CPU total addressable market by 2030, and the fastest-growing segment (Arm-based custom silicon) is one that barely existed five years ago.

This is not a temporary rebalancing. Agentic AI workloads (where software agents plan, use tools, call APIs, and coordinate across dozens of sub-agents per user request) are structurally CPU-bound in ways chatbot workloads never were. The orchestration layer that agents make unavoidable is where CPUs live — as our deep-dive on agentic orchestration explains — and it is consuming an ever-larger share of the end-to-end latency budget.

This pillar page maps the architectural shift, the market numbers, the competitive battlefield, and what it all means for your infrastructure planning. Each section below links to a detailed companion article that goes deeper on that dimension. Think of this as your orientation to a topic that is quietly rewriting how data centre silicon budgets get allocated.

What is the CPU renaissance and why is it happening now?

The CPU renaissance describes a structural reallocation of data centre silicon budgets driven by the arrival of agentic AI. The term, coined by Arm CEO Rene Haas at Computex 2026, captures the shift from an era where CPUs were an afterthought in GPU-dominated AI clusters toward one where general-purpose processors are the critical path for orchestration, tool execution, and agent coordination. It is happening now because agentic AI workloads (unlike stateless chatbots) introduce multi-step reasoning chains, external tool calls, and sub-agent spawning that all execute on the CPU. This is a structural change in how AI compute is consumed, not a short-term market adjustment.

The training era of 2022 to 2025 was a GPU party. Training runs demanded GPUs at scale and CPUs were provisioning overhead (feed the GPUs data fast enough, and your job was done). The agentic era inverts this. Every deployed agent generates continuous CPU load whether or not a GPU is actively inferring. The headline numbers make this legible: the ratio shift from 1:8 toward 1:1, Bank of America’s $125 billion server CPU TAM projection, and Arm’s estimate that CPU core demand per gigawatt of data centre power will rise fourfold, from 30 million cores to 120 million. CPUs are no longer a commodity line item. They are the command layer of agentic systems.

CPUs have always been present in every server and data centre. What has changed is not the CPU’s existence but its role. The GPU-centric AI narrative of the training era obscured the orchestration layer that agentic workloads make unavoidable, and the renaissance reflects that layer becoming visible and budget-worthy. General-purpose compute is reclaiming budget share because agentic workloads structurally require it.

If you are planning AI infrastructure, the CPU decision is no longer an afterthought to GPU procurement. CPU architecture choice, core count, memory bandwidth, and vendor strategy are now first-order infrastructure decisions — a reality the decision-support capstone in this series walks through in detail. The rest of this pillar page and its companion articles provide the context you need to make them.

For the full architectural deep-dive, read Why AI Agents Are Bringing CPUs Back to the Centre of the Data Centre.

Why are CPUs becoming a bottleneck in agentic AI when GPUs dominated the AI narrative?

GPUs dominated the training-era narrative because training is a throughput problem — vast matrix multiplications best handled by parallel processors. Agentic AI is different: it is a latency-sensitive, multi-step orchestration problem where the GPU finishes inference in milliseconds and then waits while the CPU executes tool calls, parses results, and coordinates sub-agents. Research from Intel and Georgia Tech found that tool processing on CPUs accounts for 50 to 90 percent of total latency in agentic workloads. The bottleneck reflects orchestration latency compounding across multi-step agent chains, creating a structural dependency that did not exist when AI meant single-turn chatbot responses.

The training-era compute profile was GPU-dominated, batch-oriented, throughput-optimised. The agentic-era profile is CPU-orchestrated, latency-sensitive, multi-step. The bottleneck emerges not because CPUs are slow but because agentic workloads generate orders of magnitude more CPU work than chatbot workloads. Every tool call, every sub-agent spawn, every state-management operation runs on the CPU. In a 10-step agentic task with three tool calls per step, you get 30 rounds of CPU-heavy orchestration work for every 10 GPU inference passes. The compute profile looks nothing like batch inference.
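The arithmetic in that example can be sketched in a few lines. This is a minimal illustration, assuming one CPU orchestration round per tool call and one GPU inference pass per step; the function names are hypothetical and exist only for this sketch.

```python
def orchestration_rounds(steps: int, tool_calls_per_step: int) -> int:
    """CPU-side orchestration rounds, assuming one round per tool call."""
    return steps * tool_calls_per_step


def gpu_inference_passes(steps: int) -> int:
    """GPU inference passes, assuming one pass per agent step."""
    return steps


# The 10-step, three-tool-calls-per-step task from the paragraph above.
cpu_rounds = orchestration_rounds(steps=10, tool_calls_per_step=3)
gpu_passes = gpu_inference_passes(steps=10)

print(cpu_rounds, gpu_passes, cpu_rounds / gpu_passes)  # 30 10 3.0
```

Real agents will not split this cleanly, since a single tool call can trigger several parsing and state-management operations, so treat the 3:1 figure as a lower bound on CPU-side work under these assumptions.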

The compounding effect is what makes this structural. A five-tool agent call (where the agent queries a database, calls an API, reads a file, spawns a sub-agent, and validates results) may involve 200 milliseconds of GPU inference and 2.5 seconds of CPU execution. The CPU becomes the dominant latency contributor, and the GPU sits idle between inference calls. This dynamic scales linearly with agent complexity and concurrency, driving the CPU-to-GPU ratio shifts now reshaping data centre architecture.
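To make the latency split concrete, the sketch below computes the CPU share of end-to-end latency for the five-tool example. The 200 ms and 2.5 s inputs are the illustrative figures from the paragraph above, not measurements, and `cpu_latency_share` is a hypothetical helper.

```python
def cpu_latency_share(gpu_ms: float, cpu_ms: float) -> float:
    """Fraction of end-to-end latency spent on CPU-side work."""
    total = gpu_ms + cpu_ms
    if total <= 0:
        raise ValueError("total latency must be positive")
    return cpu_ms / total


# Five-tool agent call: 200 ms of GPU inference, 2.5 s of CPU execution.
share = cpu_latency_share(gpu_ms=200, cpu_ms=2500)
print(f"{share:.1%}")  # 92.6%
```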

Meta’s deployment of AWS Graviton5 at “tens of millions of cores” for agentic workloads — part of the hyperscaler custom silicon trend reshaping the server CPU market — validates that the bottleneck is a production concern at hyperscale. Intel admitted it “misjudged” demand. AMD is effectively sold out. CPU lead times have stretched to six months. This is not theoretical.

For the full explainer on why agentic orchestration is different, read Why AI Agents Are Bringing CPUs Back to the Centre of the Data Centre.

What does a CPU actually do in an agentic AI workflow?

In an agentic AI workflow, the CPU handles everything between GPU inference calls: tokenizing user inputs, parsing model outputs into structured actions, managing agent memory and context state, executing tool calls (API requests, database queries, file I/O, web scraping), spawning and coordinating sub-agents, tracking dependency graphs, and running reflection loops where the agent evaluates whether its goal was met. The GPU does the “thinking” (generating text, code, or plans through matrix operations). The CPU coordinates execution: deciding what happens next, dispatching work, integrating results, and keeping the agent loop moving. Without sufficient CPU orchestration capacity, the GPU sits idle.

The agentic loop follows a cycle: first, the GPU generates an execution plan from the user’s request. Second, the CPU decomposes the plan into sub-tasks, spawns parallel sub-agents, and orchestrates their tool execution. Third, the GPU performs a reflection inference to evaluate whether the original goal was met (restarting the cycle if not). Each iteration amplifies CPU utilisation because every sub-agent runs its own orchestration loop, and tool calls compound across parallel branches.
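The cycle can be sketched as a small simulation. Everything here is a stand-in: `gpu_plan`, `cpu_execute`, and `gpu_reflect` are hypothetical stubs rather than any framework's API, and the reflection step is hard-coded to succeed on the second attempt so the restart path is visible.

```python
from dataclasses import dataclass


@dataclass
class LoopStats:
    gpu_calls: int = 0   # inference passes (planning and reflection)
    cpu_tasks: int = 0   # sub-tasks orchestrated on the CPU


def gpu_plan(request: str, stats: LoopStats) -> list[str]:
    """Stand-in for the GPU generating an execution plan."""
    stats.gpu_calls += 1
    return [f"{request}: subtask {i}" for i in range(3)]


def cpu_execute(subtasks: list[str], stats: LoopStats) -> list[str]:
    """Stand-in for CPU-side decomposition and tool execution."""
    results = []
    for task in subtasks:
        stats.cpu_tasks += 1
        results.append(f"done({task})")
    return results


def gpu_reflect(results: list[str], attempt: int, stats: LoopStats) -> bool:
    """Stand-in for reflection inference; assumes the goal is met on attempt 2."""
    stats.gpu_calls += 1
    return attempt >= 2


def run_agent(request: str, max_iterations: int = 5) -> LoopStats:
    stats = LoopStats()
    for attempt in range(1, max_iterations + 1):
        plan = gpu_plan(request, stats)          # 1. GPU plans
        results = cpu_execute(plan, stats)       # 2. CPU orchestrates tools
        if gpu_reflect(results, attempt, stats): # 3. GPU reflects
            break                                # goal met, stop the cycle
    return stats


stats = run_agent("summarise repo")
print(stats)  # LoopStats(gpu_calls=4, cpu_tasks=6)
```

Each restart adds a full round of CPU-side sub-tasks, which is why iteration count matters as much as plan size when estimating CPU load.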

Tool execution (API calls, database queries, code compilation, web scraping) is inherently general-purpose compute. It involves I/O, parsing, validation, and state management that do not map well to GPU parallelism. The CPU’s role as the system’s general-purpose processor makes it the natural home for orchestration. Even if you wanted to move orchestration to the GPU, the workload profile (branch-heavy, I/O-bound, latency-sensitive) is a poor fit for GPU architecture.

Then there is the sub-agent multiplier. A single user request to a coding agent may spawn sub-agents for repository search, test execution, documentation lookup, and dependency analysis (all running in parallel, all executing on CPU cores). This multiplies CPU demand linearly with agent count and is the primary reason agentic workloads are orders of magnitude more CPU-intensive than chatbot workloads — a dynamic that shapes how the six major CPU vendors are competing for this emerging workload. A financial anomaly detection workflow showed CPU operations dominated total runtime across data loading, baseline calculation, document retrieval, and enrichment through web searches.
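As a rough sketch of the fan-out, the snippet below runs the four sub-agents named above in parallel using Python's standard `concurrent.futures` thread pool. The `run_sub_agent` body is a placeholder (a short sleep standing in for tool I/O), and a production orchestrator would likely use processes or an async runtime instead of threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Sub-agents from the coding-agent example in the text.
SUB_AGENTS = [
    "repository search",
    "test execution",
    "documentation lookup",
    "dependency analysis",
]


def run_sub_agent(name: str) -> str:
    """Placeholder sub-agent: sleeps briefly to mimic tool I/O."""
    time.sleep(0.01)
    return f"{name}: ok"


def fan_out(sub_agents: list[str]) -> list[str]:
    """Run every sub-agent concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(sub_agents)) as pool:
        return list(pool.map(run_sub_agent, sub_agents))


results = fan_out(SUB_AGENTS)
for line in results:
    print(line)
```

One worker per sub-agent keeps the example simple. With many concurrent user requests, the number of workers the host can actually schedule is bounded by available cores and threads, which is where the hardware question begins.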

Read Why AI Agents Are Bringing CPUs Back to the Centre of the Data Centre for the full orchestration breakdown.

CPU vs GPU for AI — when does each actually make sense, and are NPUs changing the equation?

GPUs are the right tool for throughput-oriented matrix computation (training large models and running high-volume inference where batch processing amortises latency). CPUs are the right tool for orchestration, tool execution, and state management (the general-purpose work that surrounds and coordinates GPU inference). They are complementary, not competitive. NPUs serve a third role: low-power on-device inference for laptops, phones, and IoT devices. They are not a CPU replacement. They handle inference while the CPU handles orchestration. As AI shifts from single-turn chatbots to multi-step agents, the CPU’s share of the total compute budget grows because orchestration (not inference throughput) becomes the dominant workload.

GPUs excel at the matrix operations that power model inference and training. CPUs excel at the general-purpose orchestration that surrounds those operations. The question is not “which is better?” but “what is your workload profile?” Training-heavy workloads remain GPU-dominated. Agentic inference workloads shift the balance toward CPU because orchestration consumes an increasing share of total latency.

NPUs are purpose-built for low-power inference on edge devices. They handle the model’s forward pass efficiently but cannot manage the orchestration, tool execution, and state management that agentic AI requires. In a Copilot+ PC, the NPU runs the on-device model while the CPU handles agent orchestration. They are complementary roles, and NPUs do not reduce the CPU demand from agentic workloads. If anything, they increase it by enabling more agents to run locally.

The CPU-vs-GPU framing is increasingly outdated. The relevant question for infrastructure planning is: what is your orchestration-to-inference ratio? As agentic AI adoption grows, the CPU share of the silicon budget rises (not because CPUs are replacing GPUs but because orchestration is a new workload class that GPUs cannot efficiently serve). Amazon’s own guidance recommends CPU for SLMs under 8 billion parameters, embeddings, classifiers, and batch scoring, reserving GPU for large-model online inference with strict latency requirements.

For the market numbers behind this shift, read How AI Agents Are Reshaping Server CPU Demand and Data Centre Architecture.

How does agentic AI change the CPU-to-GPU ratio in data centre infrastructure?

In the training era, a typical AI cluster operated at a CPU-to-GPU ratio of 1:8 (one CPU managing eight GPUs was sufficient because the CPU’s only job was feeding data to the GPU pipeline). Agentic AI has driven that ratio toward 1:1, with agent-dense deployments reaching three to five CPUs per GPU. The shift is workload-driven: every deployed agent generates continuous CPU load from orchestration loops, tool calls, memory management, and inter-agent coordination. TrendForce projects the ratio stabilising at 1:1 to 1:2 for agentic AI clusters, and Arm estimates CPU core demand per gigawatt of data centre power will quadruple. This is already visible in hyperscaler procurement patterns.

The training-era baseline of 1:8 made economic sense when AI meant training runs and single-turn inference. The CPU was a head node, provisioning GPUs, managing data pipelines, handling network I/O. Compute was GPU-bound, and the CPU was a commodity afterthought.

Agentic AI breaks this model because each deployed agent generates a continuous stream of CPU work that did not exist in the training era. A cluster running thousands of concurrent agent sessions needs CPU cores at a scale that the 1:8 head-node model cannot provide — the core reason agentic orchestration is structurally CPU-bound. The shift toward 1:1 is an architectural necessity, not a design preference. SemiWiki’s current sizing guidance recommends 86 to 120 CPU cores per GPU depending on workload characteristics.
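For planning purposes, the ratios quoted in this section translate into CPU and core counts as follows. This is a back-of-envelope sketch: the 64-GPU cluster size is an arbitrary assumption, the helper names are hypothetical, and the ratios and the 86 to 120 cores-per-GPU range are taken from the text as stated.

```python
import math


def cpus_needed(gpus: int, cpus_per_gpu: float) -> int:
    """CPU sockets required for a given CPU-per-GPU ratio, rounded up."""
    return math.ceil(gpus * cpus_per_gpu)


def cores_needed(gpus: int, cores_per_gpu: int) -> int:
    """Total CPU cores required for a given cores-per-GPU target."""
    return gpus * cores_per_gpu


GPUS = 64  # assumed cluster size, for illustration only

scenarios = {
    "training-era (1:8)": 1 / 8,
    "agentic (1:1)": 1.0,
    "agent-dense, low (3 per GPU)": 3.0,
    "agent-dense, high (5 per GPU)": 5.0,
}
for label, ratio in scenarios.items():
    print(f"{label}: {cpus_needed(GPUS, ratio)} CPUs")
# training-era (1:8): 8 CPUs
# agentic (1:1): 64 CPUs
# agent-dense, low (3 per GPU): 192 CPUs
# agent-dense, high (5 per GPU): 320 CPUs

print(cores_needed(GPUS, 86), cores_needed(GPUS, 120))  # 5504 7680
```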

The optimal ratio depends on your agent architecture. Simple tool chains (two to three tools per turn) can operate at lower ratios than complex orchestration graphs with sub-agent spawning. The trend line is clear: CPU demand per unit of GPU compute is rising. Infrastructure plans that treat CPU procurement as an afterthought will hit a scaling wall. Nvidia’s standalone Vera rack (256 CPUs, no GPUs) is a market signal: the ratio can invert for certain workloads.

For the full market analysis and ratio planning framework, read How AI Agents Are Reshaping Server CPU Demand and Data Centre Architecture.

Why are hyperscalers building their own custom CPUs?

AWS, Google, and Microsoft are building their own Arm-based server CPUs for three reinforcing reasons. First, economics: Intel’s gross margins typically run 60 to 70 percent, and at hyperscale that markup becomes a line item too large to accept. Custom silicon eliminates it. Second, control: owning the silicon design allows workload-specific optimisation that merchant chips cannot deliver, particularly for the orchestration-heavy agentic workloads each cloud provider sees from its customers. Third, architecture: Arm’s RISC ISA offers a structural power-efficiency advantage that compounds at data centre scale. A 20 percent performance-per-watt improvement translates to substantial power and cooling savings. AWS Graviton5, Microsoft Cobalt 200, and Google Axion are not experiments. They are the fastest-growing segment of the server CPU market.

The competitive intensity is unmistakable. The eight-day gap between Microsoft Cobalt 200 VM launch (June 2, 2026) and AWS Graviton5 M9g general availability (June 10, 2026) offers a snapshot of how fast these companies are moving. Hyperscalers are racing each other as much as they are racing Intel. Each generation of custom silicon tightens the performance-per-watt gap with x86 and deepens the hyperscaler’s platform lock-in with customers who optimise for their specific silicon — adding fuel to the six-way battle for the server CPU market.

AWS Graviton5 packs 192 Neoverse V3 cores across a 4-chiplet design on TSMC 3nm with 12-channel DDR5, PCIe Gen6, and CXL 3.0 — specifications that reflect how agentic AI is reshaping CPU demand. Microsoft Cobalt 200 delivers 132 Neoverse V3 cores across two TSMC 3nm dies with a 50 percent speedup over Cobalt 100. Google Axion C4A claims up to 65 percent better price-performance and 60 percent greater energy efficiency than comparable x86 systems. About half of the compute shipped to top hyperscalers is now Arm-based.

Then there is Arm’s own-chip pivot. After 35 years as a licensing-only company, Arm announced the AGI CPU, a purpose-built agentic AI processor. When the licensor becomes a competitor, the dynamics shift for everyone. Arm’s hyperscaler customers are now also Arm’s competitors. That is a market-structure change, not a product launch.

Read Inside the Custom Silicon Race Reshaping the Server CPU Market for the full custom silicon deep-dive.

How do the major CPU vendors compare for agentic AI in 2026?

The agentic AI server CPU market has expanded from an Intel-AMD duopoly to a six-player field. Nvidia is targeting $100 billion in CPU revenue with Vera, including standalone rack configurations. AMD’s EPYC Venice (up to 256 cores, 512 threads via SMT) leads on thread density for action-heavy agentic workloads. Intel’s Diamond Rapids counters on the Intel 18A process node but drops SMT (a potential weakness for orchestration-heavy deployments). Arm’s AGI CPU enters as a purpose-built agentic AI processor. AWS Graviton5, Microsoft Cobalt 200, and Google Axion compete as hyperscaler-specific custom silicon. There is no single winner. Each has structural advantages in different workload dimensions. Your choice depends on whether your agents are reasoning-heavy (few agents, long chain-of-thought) or action-heavy (many agents, many tool calls).

Nvidia is the most disruptive entrant. Vera packs 88 custom Olympus cores with spatial multithreading, 1.2 TB/s LPDDR5X memory bandwidth, and NVLink-C2C interconnect at 1.8 TB/s. Phoronix found a 1.5x overall performance advantage over 128-core x86 in early benchmarks. Michael Larabel called it “competitiveness to Intel/AMD x86_64 CPUs that I have never seen out of any other ARM or non-x86_64 processors.” Nvidia’s $100 billion CPU revenue target signals this is not a side project.

AMD is the most resilient x86 incumbent, reaching 46.2 percent x86 server revenue share in Q1 2026. EPYC Venice on TSMC N2 with 256 Zen 6 cores and 512 threads via SMT claims a 70 percent generational performance leap. For action-heavy workloads where thread density matters, Venice’s 512 threads are the benchmark.

Intel is the incumbent under siege. Diamond Rapids offers up to 256 cores but drops SMT entirely (meaning 256 threads versus Venice’s 512), a significant disadvantage for orchestration-heavy agentic workloads where thread density matters. Intel’s x86 server revenue share fell 9.5 percentage points in 12 months to 53.8 percent in Q1 2026. Diamond Rapids is rumoured to be delayed to 2027, which would give AMD a head start.

The key insight: vendor selection should follow your workload profile, not the other way around — a principle the decision-support framework in this series makes actionable. Reasoning-heavy workloads (few agents, long chain-of-thought) favour high per-core performance and low-latency CPU-GPU interconnects like Nvidia Vera’s NVLink-C2C. Action-heavy workloads (many agents, many tool calls) favour core count and thread density. There is no single winner across all dimensions.
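One way to turn that principle into a first-pass filter is sketched below. The CPU priorities attached to each profile come from the text, but the classification rule (total tool calls per request against a cut-off of 20) is a hypothetical threshold chosen only for illustration; calibrate it against your own measurements.

```python
from enum import Enum


class Profile(Enum):
    REASONING_HEAVY = "reasoning-heavy"
    ACTION_HEAVY = "action-heavy"


# CPU attributes each profile favours, as described in the text.
PRIORITIES = {
    Profile.REASONING_HEAVY: [
        "per-core performance",
        "low-latency CPU-GPU interconnect",
    ],
    Profile.ACTION_HEAVY: [
        "core count",
        "thread density",
    ],
}


def classify(agents_per_request: int, tool_calls_per_agent: int,
             threshold: int = 20) -> Profile:
    """Classify by total tool calls per request (threshold is an assumption)."""
    total_calls = agents_per_request * tool_calls_per_agent
    if total_calls >= threshold:
        return Profile.ACTION_HEAVY
    return Profile.REASONING_HEAVY


profile = classify(agents_per_request=12, tool_calls_per_agent=5)
print(profile.value, PRIORITIES[profile])
# action-heavy ['core count', 'thread density']
```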

For the full six-player competitive analysis, read The Six-Way Battle for the Agentic AI Server CPU Market.

x86 vs ARM CPUs for agentic AI workloads — which architecture is better?

Neither architecture is universally better. The right choice depends on your workload profile, scale, and platform commitments. Arm-based designs (Graviton5, Cobalt 200, Axion, AGI CPU) typically deliver 30 to 60 percent better performance per watt than comparable x86 processors, an advantage that compounds at data centre scale. Goldman Sachs forecasts data centre power demand growing 165 percent by 2030, amplifying the value of Arm’s efficiency advantage. x86 (Intel Xeon, AMD EPYC) retains the broadest software ecosystem compatibility and mature ISV support, which matters if your agentic workloads depend on specific x86-compiled libraries or legacy enterprise toolchains. The gap is narrowing but has not closed.

The Arm efficiency advantage is real and compounding. At data centre scale, a 30 percent performance-per-watt delta translates to substantial annual power and cooling savings. Arm’s AGI CPU claims 2x the performance per rack versus x86, with “up to $10 billion in CAPEX savings per GW of AI data center capacity” (an Arm estimate that has not been independently validated). But compatibility is not a binary. Arm’s software ecosystem has matured rapidly, and the tool-call libraries that agentic AI depends on (HTTP clients, database drivers, JSON parsers) are largely ISA-agnostic. The compatibility gap that matters is in enterprise-specific workloads. If your agent needs to call a legacy x86-compiled library, Arm introduces friction.

Platform lock-in is a variable that is easy to overlook. Arm custom silicon is cloud-provider-specific. Choosing Graviton5 means choosing AWS. Choosing Cobalt 200 means choosing Azure. Choosing Axion means choosing GCP. For organisations with multi-cloud strategies, the Arm versus x86 decision is often inseparable from the cloud provider decision — a reality the custom silicon deep-dive explores in full. One analysis noted that once you optimise your CI/CD pipelines for Graviton’s specific cache behaviour, switching providers “becomes an order of magnitude more difficult than it was in the era of generic x86 VMs.”

The ISA gap is narrowing. As one analyst put it, “this is the last iteration of the table that will overweight x86 over ARM ISA. The gap is narrow today, and more hyperscalers are deploying both equally.”

Read Choosing Between Arm and x86 CPUs for Agentic AI Infrastructure for the full evaluation framework.

How should I evaluate and select a server CPU for agentic AI workloads?

Start with workload profiling, not vendor comparison. Characterise your agentic workloads on the reasoning-to-action spectrum: are your agents making few long chain-of-thought decisions (reasoning-heavy), or are they spawning dozens of tool-calling sub-agents per request (action-heavy)? Reasoning-heavy workloads favour high per-core performance and low-latency CPU-GPU interconnects. Action-heavy workloads favour core count, thread density, and memory bandwidth for context switching. Next, map your scale: the Arm TCO advantage is incremental at small scale and becomes decisive as deployments grow. Factor in procurement risk: merchant x86 supply chains face different exposure profiles than hyperscaler custom silicon. Finally, consider multi-architecture deployment. Running Arm for orchestration-heavy workloads and x86 for legacy-dependent workloads is often the most resilient strategy, though it adds operational complexity.

Starting with vendor benchmarks is the wrong order. You need to understand what your agents actually do before you can evaluate which CPU serves them best. Profile your agent’s end-to-end latency breakdown. Run a representative workload and measure GPU inference time versus total tool execution and orchestration time. If orchestration and tool calls consume more than 50 percent of end-to-end latency, your deployment is CPU-bound and will benefit from higher CPU investment. If GPU inference dominates (common in reasoning-heavy, few-tool agents), your bottleneck is on the GPU side.
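A minimal profiling harness for that measurement might look like the sketch below. It assumes you can wrap each GPU inference call and each tool or orchestration step in a timing span; `LatencyProfile` and its phase labels are hypothetical names, and the sleeps stand in for real work. Wall-clock time in the "cpu" phase includes I/O waits, which is what end-to-end latency measures.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyProfile:
    """Accumulates wall-clock seconds per phase ("gpu" or "cpu") for one run."""

    def __init__(self) -> None:
        self.seconds = defaultdict(float)

    def record(self, phase: str, seconds: float) -> None:
        self.seconds[phase] += seconds

    @contextmanager
    def span(self, phase: str):
        """Time the enclosed block and add it to the given phase."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(phase, time.perf_counter() - start)

    def cpu_share(self) -> float:
        total = sum(self.seconds.values())
        if total <= 0:
            raise ValueError("no timings recorded")
        return self.seconds["cpu"] / total

    def verdict(self, threshold: float = 0.5) -> str:
        """CPU-bound if orchestration and tools exceed the threshold share."""
        return "CPU-bound" if self.cpu_share() > threshold else "GPU-bound"


profile = LatencyProfile()
with profile.span("gpu"):
    time.sleep(0.01)  # stand-in for a model inference call
with profile.span("cpu"):
    time.sleep(0.05)  # stand-in for tool execution and orchestration
print(f"CPU share: {profile.cpu_share():.0%}, verdict: {profile.verdict()}")
```

Run it over a representative batch of requests and not a single call, since tool latency tends to vary far more between requests than inference latency does.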

TCO is where the decision gets real. Per-core cost, power and cooling (where Arm’s efficiency advantage compounds), and platform lock-in (custom silicon ties you to a cloud provider) are the primary variables — factors the server CPU market sizing analysis grounds in hard numbers. Add procurement risk. Intel distributors are only fulfilling about 40 percent of yearly backlog, with Asian customers waiting up to eight months. AMD EPYC delivery windows stretch beyond 30 weeks. The global DRAM shortage affects memory costs for high-core-count CPU deployments. The supply picture is tight through at least 2027 (more detail in the FAQ below).

Multi-architecture deployment (Arm for orchestration-heavy workloads, x86 for legacy-dependent workloads) is the most resilient strategy. It avoids single-vendor lock-in and allows workload-specific optimisation. But it adds operational complexity that needs to be weighed against the lock-in risk of single-architecture commitment. The nine metrics that define CPU suitability for agentic workloads are: per-core performance, core count, CPU-xPU interconnect bandwidth, memory bandwidth and capacity, cache design and size, performance per watt, PCIe generation and lanes, ISA, and NUMA domains. Knowing which matter for your specific workload is what separates informed decisions from vendor-driven ones.

For the full decision-support framework, read Choosing Between Arm and x86 CPUs for Agentic AI Infrastructure.

Resource Hub: The CPU Renaissance Deep Dives

The resources below deepen each dimension of the evaluation framework laid out above.

Understanding the Shift: Architecture and Market Forces

Why AI Agents Are Bringing CPUs Back to the Centre of the Data Centre: The foundational explainer on what the CPU renaissance is, how agentic AI differs from chatbot AI, and why CPU orchestration now dominates end-to-end latency. Read this first if you are new to the topic.

How AI Agents Are Reshaping Server CPU Demand and Data Centre Architecture: The market quantification. The 1:8 to 1:1 ratio shift, the $125B+ TAM projection, and why standalone CPU racks are emerging as a distinct product category. Read this to understand the numbers behind the narrative.

The Competitive Landscape: Who Is Winning and Why

Inside the Custom Silicon Race Reshaping the Server CPU Market: Deep dive into AWS Graviton5, Microsoft Cobalt 200, and Google Axion. Why hyperscalers vertically integrated, how they compare on performance per watt, and what Arm’s AGI CPU means for the market. Read this to understand the fastest-growing segment of server CPU spend.

The Six-Way Battle for the Agentic AI Server CPU Market: The full competitive landscape. Nvidia’s $100B CPU ambition, Intel and AMD’s counterpunches, Arm’s own-chip entry, and how the six players compare on the dimensions that matter for agentic workloads. Read this for a structured vendor comparison.

Planning Your Infrastructure: Decisions and Frameworks

Choosing Between Arm and x86 CPUs for Agentic AI Infrastructure: The decision-support capstone. An evaluation framework for architecture choice, CPU-to-GPU ratio planning, TCO comparison, and procurement risk management. Read this when you are ready to act on the insights from the rest of the series.

Frequently Asked Questions

What is the difference between reasoning-heavy and action-heavy agentic workloads, and why does it matter for CPU selection?

Reasoning-heavy workloads involve fewer agents making long chain-of-thought decisions, with the GPU dominating the latency budget. They favour CPUs with high per-core performance and low-latency CPU-GPU interconnects like Nvidia Vera’s NVLink-C2C — a landscape the six-way vendor comparison maps fully. Action-heavy workloads spawn many parallel sub-agents executing dozens of tool calls per turn, making the CPU the dominant latency contributor. They favour high core counts, strong thread density via SMT, and high memory bandwidth (areas where AMD EPYC Venice and Arm’s Neoverse designs excel). Understanding where your workloads sit on this spectrum is the starting point for CPU evaluation.

Standalone CPU racks vs. traditional head-node CPU deployments — which delivers better utilisation?

Standalone CPU racks (where racks are populated entirely with CPUs rather than fixed-ratio CPU-GPU pairs) become the more efficient architecture when your CPU-to-GPU ratio exceeds 1:1. They allow CPU scaling to operate independently of GPU scaling, avoiding the power, cooling, and interconnect bottlenecks that arise when all CPUs must be co-located with GPUs. Head-node deployments remain efficient at lower ratios where the CPU’s primary role is GPU provisioning. The tipping point is workload-driven: if your agents are action-heavy with high orchestration demand, standalone racks typically deliver better rack-level utilisation and lower TCO. Read more about ratio planning.

Will there be enough CPUs for all these AI agents, or is there a shortage?

There is already a shortage. Intel admitted it “misjudged” demand. Lead times have stretched to six months. AMD is effectively sold out for 2026. The constraint is compounded by TSMC’s advanced-node wafer allocation (CEO C.C. Wei says 3nm capacity is “about three times short” of demand) and the global DRAM shortage affecting high-core-count CPU deployments. Most analysts project supply will remain tight through at least 2027. This is not a temporary disruption. It is a structural supply-demand imbalance caused by the speed of the agentic AI adoption curve. Explore the supply-demand dynamics in more detail.

Can I run AI agents on a CPU without buying a GPU?

Yes. For many agentic workloads, CPU-only deployment is viable and increasingly common. The GPU handles inference (generating text, code, or plans), but the orchestration, tool execution, and state management that dominate agentic workloads run on the CPU regardless. If your agents make infrequent inference calls relative to their orchestration work, or if you use smaller models that run efficiently on CPU (aided by quantization and Intel AMX matrix extensions), CPU-only deployment can be cost-effective. AWS recommends CPU for SLMs under 8 billion parameters, embeddings, classifiers, and batch scoring, reserving GPU for large-model online inference. Modern Xeon processors deliver up to 31.5 times the inference throughput of legacy platforms. Read the architectural guidance.

Is Intel actually losing the CPU market to AMD and Arm right now?

Yes. Intel’s x86 server revenue share fell 9.5 percentage points in 12 months to 53.8 percent in Q1 2026, while AMD reached 46.2 percent — a market shift the full competitive landscape analysis tracks in detail. More importantly, the fastest-growing segment of the server CPU market (Arm-based custom silicon) is one Intel cannot access. Bank of America estimates Arm custom silicon will capture approximately 37 percent of the server CPU TAM. Intel is fighting back with Diamond Rapids on the 18A process node, but the structural headwinds (hyperscaler vertical integration, Arm’s efficiency advantage, Nvidia’s CPU entry) are not cyclical.

What are reinforcement learning training loops and why do they demand so many CPU cores?

Reinforcement learning (RL) is a post-training technique where AI models improve by executing actions in an environment and receiving rewards. Each RL training iteration spawns hundreds of agent trajectories, each generating dozens of tool calls (code compilation, verification, physics simulation, synthetic data validation), all executing on CPU cores. RL training environments are effectively massive CPU clusters running parallel agent simulations. This creates a second CPU demand driver beyond agentic inference: every frontier AI lab investing in RL training is also investing in CPU infrastructure at a scale that did not exist in the pre-agentic era. Nvidia’s Vera CPU Rack delivers over 22,500 concurrent RL environments, more than 4x the capacity of x86-based racks. Deeper coverage of orchestration workloads.

How do I assess whether a given agentic workload is CPU-bound or GPU-bound before provisioning?

Profile your agent’s end-to-end latency breakdown. Run a representative workload and measure GPU inference time versus total tool execution and orchestration time. If orchestration and tool calls consume more than 50 percent of end-to-end latency (as the Intel and Georgia Tech research found is typical for tool-dominated agentic workloads), your deployment is CPU-bound and will benefit from higher CPU investment. If GPU inference dominates (common in reasoning-heavy, few-tool agents), your bottleneck is on the GPU side. The key variable is tool-call complexity: the more external systems your agent interacts with per turn, the more CPU-bound it becomes. Full evaluation framework.
