When you look at how the frontier AI companies are talking and operating, the future looks like it will be filled with giant data centres within which all jobs are performed. It doesn’t appear to leave much for the rest of us.
Look beyond OpenAI and Anthropic talking up their own prospects, though, and there are trends in software, hardware and AI that are bootstrapping off each other and pointing in a different direction.
We’re going to look at these trends, show how they fit together, and consider what they might mean – starting with software.
From Claude Code to Clawed Code
In May 2025 Claude Code became generally available. This agentic coding tool was one of the first to clearly demonstrate how capable models had become. Using the chat interface within a terminal (and the code open in an IDE), a developer could tell Claude to make some changes, then sit back and watch as the agent performed all the steps: finding and reading files, making edits, running tests. It might even do an online search if it needed more information.
The uptake of Claude Code was rapid, and subscriptions to it drove up Anthropic’s revenues. A whole new product category had been born. And it was a category that people were happy to pay for – unlike AI chat. Other model providers – Google, OpenAI – took notice and launched their own versions.
Anthropic continued to refine their models, training them to perform more consistently and diligently within the Claude Code “harness” and across the kinds of tasks developers needed completed.
In October of the same year Peter Steinberger was working on a tool to connect Claude Code to a WhatsApp chat so he could keep working from his phone when he was away from his computer. It was a simple middleman that relayed messages from the WhatsApp chat to the Claude Code instance and passed the results back to the chat.
Steinberger “discovered” that the Claude Code instance running back at home on his computer could use any command line tools it had available, and would even download and install tools if it needed them.
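The middleman pattern is simple enough to sketch in a few lines. This is a hypothetical reconstruction, not Steinberger’s actual code: the WhatsApp side is stubbed out as plain callbacks, and the agent side assumes the `claude` CLI, whose `-p` flag runs a single non-interactive prompt and prints the result.

```python
import subprocess

def claude_runner(prompt: str) -> str:
    """One non-interactive Claude Code turn: `claude -p` prints the
    response and exits instead of opening the interactive chat."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout.strip()

def relay(incoming_message: str, run_agent, send_reply) -> None:
    """Middleman step: forward one chat message to the agent and pass
    whatever it prints back to the chat (send_reply would be the
    WhatsApp send function in the real tool)."""
    send_reply(run_agent(incoming_message))
```

The interesting part is how little there is: the middleman adds no intelligence of its own, it just moves text between a chat and an agent that already knows how to use the machine it is running on.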
All the training Anthropic had performed to improve the coding workflow ability of the model had made it usable as a general agent.
Steinberger went on to build tools so Claude Code could access Google services, including mail and calendars, and the first “ClawdBot” was born. Three months later it was the fastest growing open source project in history, renamed to OpenClaw to avoid litigation, and Steinberger was hired by OpenAI.
Another new product category had been stumbled upon. And this one was not limited to developers. Ignoring all the security issues, people were using OpenClaw to run their businesses. People were installing OpenClaw for their parents. In China tech firms were having OpenClaw days where they helped consumers install it and get it connected to their services. OpenClaw as user lock-in.
Agents were no longer just for coders; they were for anyone. Agents were becoming the universal voice/chat interface to the complexities and drudgeries of the services everyone is forced to use.
Smarter, faster, smaller
Once a model is released, the providers continue to train it to improve performance. Training data is cumulative: the more data you can collect over time (especially via free products, where training on interactions is the price of use), the more high-quality data you accumulate. Model providers have been collecting data from millions of users for a few years now.
At the same time, optimisation of model training is a global research focus. As data curation improves, so do the techniques for making use of that data. The effects can be seen in the leap in ability of Claude Opus in late November 2025 and OpenAI’s GPT-5.2 a few weeks later. Developers reported both as the latest step change in model ability.
Post-training techniques and data curation are also benefitting open source models. These models are now only 3 to 6 months behind frontier models in performance and the intelligence gap between them has dropped to 5-7% according to the Artificial Analysis Intelligence Index.
Optimised training and curated data are producing small models that can run on local hardware, like the 9-billion-parameter Qwen3.5 model (released March 2026), which can beat GPT-4 on instruction following and reasoning.
The GPT-4-0613 model, though official numbers were never released, was estimated to be a 1.7-trillion-parameter mixture-of-experts (MoE) model with 220 billion parameters active at any one time – roughly 200x larger than Qwen3.5 9B.
Qwen3.5 also comes in 27B, 35B-A3B, 122B-A10B, 397B-A17B sizes, as well as smaller 0.8B, 2B, 4B parameter versions.
The 27B dense model is smarter than the 35B-A3B MoE model, but the MoE model, because only 3 billion of its parameters are active for any given token, needs far less compute and memory bandwidth per response – so it responds faster on modest hardware, even though all 35 billion parameters still have to fit in memory.
There has also been a shift towards 4-bit parameters, both on the training side (via Nvidia’s NVFP4 format) and on the local-model side (eg Unsloth’s 4-bit Qwen3.5 quants). Model parameters are normally 16-bit floating point numbers (eg 0.29847). At 2 bytes per parameter, a 9-billion-parameter model needs 18 gigabytes (2 × 9B = 18GB) just to be loaded into memory. Quantised to 4 bits in a way that minimises loss of quality, it fits in roughly 4.5GB.
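The arithmetic generalises to any model size and bit width – a quick back-of-the-envelope sketch:

```python
def model_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights.

    Ignores activation memory and the KV cache, which add a real but
    smaller overhead on top of the weights themselves.
    """
    bytes_per_param = bits_per_param / 8
    # billions of params * bytes per param = billions of bytes ~ GB
    return params_billions * bytes_per_param

# The 9B example from the text, at 16-bit vs 4-bit:
print(model_memory_gb(9, 16))  # 18.0 GB
print(model_memory_gb(9, 4))   # 4.5 GB
```

The same function shows why quantisation matters so much for local use: dropping from 16 to 4 bits is a straight 4x reduction in the memory floor.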
The takeaway is that there are now smart, useful models to match whatever compute resources are available.
Building brains for AI
In 2017 Apple announced the A11 Bionic chip for the new iPhone X. It was a system-on-a-chip (SoC) with an integrated CPU, memory, GPU and a Neural Processing Unit (NPU). The NPU powered Face ID, computational photography (needed to compensate for the tiny camera), speech recognition and more.
Apple’s drive to bring more compute to the power-constrained environments of the iPhone and iPad translated nicely to their volume-constrained and slightly less power-constrained laptops. In November 2020 they launched the M1 – a laptop SoC that set new benchmarks in speed and efficiency, but was still just a higher-powered version of the chip powering the iPhone.
But in building a SoC that could support the iPhone’s best-in-class features, Apple ended up with an architecture that looks like it was designed for the AI boom. The memory tightly integrated with the processor instead of connected via copper traces running across circuit boards, the GPU and NPU cores – it all made running local AI models fast and efficient.
And because the memory architecture did not separate main memory from the memory available to the GPU and NPU, M-series chips could run any model that fit into the total memory available – up to 512GB on M3 Ultra chips.
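That unified ceiling is what makes trillion-parameter models plausible on a desktop. A rough feasibility check (weights only, ignoring the KV cache and the operating system’s share of memory):

```python
def fits_in_unified_memory(params_billions: float,
                           bits_per_param: int,
                           memory_gb: float) -> bool:
    """Weights-only check: does a quantised model fit in unified memory?

    Deliberately optimistic - a real deployment also needs room for the
    KV cache, activations and the OS.
    """
    weights_gb = params_billions * bits_per_param / 8
    return weights_gb <= memory_gb

# A 1-trillion-parameter model at 4-bit needs ~500GB of weights,
# which just squeezes into a 512GB machine; at 16-bit it never would.
print(fits_in_unified_memory(1000, 4, 512))   # True
print(fits_in_unified_memory(1000, 16, 512))  # False
```

On a conventional PC the same model would have to fit into a GPU’s dedicated VRAM, which is why the unified architecture changes what class of model is reachable.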
Videos of developers running trillion parameter models on Apple hardware (Kimi K2.5 on M3 Ultra Mac Studios) made developers on PCs with 16 GB GPU cards take notice. So did CPU manufacturers. They have all started bringing out their own SoCs targeting local AI:
- AMD Ryzen AI Max
- Intel Lunar Lake
- NVIDIA DGX Spark
- Qualcomm Snapdragon X Elite
- MediaTek Kompanio Ultra
This move from designing and building general purpose CPUs to AI-optimised SoCs is a seismic shift in the marketplace. It’s bigger than PCs arriving with graphics capabilities or WiFi.
Nvidia still champions the RTX AI PC, which is simply the current PC-plus-GPU setup. These are a step up in performance but come at a price, and that price currently limits the size of models that can be run on them.
Where these trends might lead
Over the next few years, between hardware improvements and increasing model intelligence, most people’s needs for any kind of AI agent could be met by a model running locally on their own machine.
Most people don’t build front ends for websites. They don’t need frontier coding models. And the people who do build front ends for websites: they have all the same administrative, scheduling and service navigation needs everyone else does.
Agents – the product that OpenAI, Anthropic and Nvidia want to sell to enterprises and everyone else, and what looks like the start of general AI use spreading out into the world – will not stay locked behind a paywall.
While the giant data centres look like a bid to corner the supply of compute, and thus control the price of access to AI (not at all helped by Sam Altman stockpiling RAM), trends in small local model intelligence and improvements in the hardware they run on suggest choice and control might remain in the hands of individual users and businesses.