Jun 18, 2026

The AI Productivity Paradox: Why Individual Gains Do Not Scale to Organisational Performance

87% of digital workers now use AI at work. 73% say it makes them personally more productive. Only 13% report that AI has measurably improved their organisation’s performance.

That gap is the AI productivity paradox. It is one dimension of the botsitter economy as a whole, and if you are responsible for an AI budget, understanding where that gap comes from matters more than any model benchmark you will read this year.

Two forces are driving it. One is coordination neglect. The other is token maxxing. The first explains why individual gains fail to aggregate. The second explains why the metrics most organisations track actively mislead them about whether AI is working. Neither has anything to do with model capability.

Why do individual AI productivity gains not translate into organisational performance improvement?

AI delivers task-level productivity gains of 10 to 70% for well-defined, text-intensive work, according to the International Labour Organization. Developers using AI complete 21% more tasks and merge 98% more pull requests, per Faros AI telemetry across more than 10,000 developers. The individual data is consistent across every study that has looked.

At the firm level the picture collapses. Faros AI found no correlation between AI adoption and improvement in any DORA metric at the company level. The ILO calls it the Aggregation Paradox: strong micro-level gains that vanish at sectoral and macroeconomic scale.

The mechanism is coordination neglect, and Rebecca Hinds of Glean’s Work AI Institute offers the cleanest illustration of it. One person takes a bullet point and expands it into a five-page report with AI. They send it to a colleague, who finds it too long and uses AI to condense it back into a single bullet point. Each person looks productive. The net output is zero. The AI tooling budget got spent either way.

This compounds at every handoff. AI accelerates individual task completion, but integrating, reviewing, and reconciling those outputs across teams creates a coordination burden that no one budgets for. The 6.4-hour botsitting tax documented in the empirical data captures the same dynamic at the individual level. The gain does not disappear. It gets eaten by the work of stitching everything together.

A survey of 6,000 workers across the US, UK, and Australia found that people spend 6.4 hours per week fact-checking, correcting, and reworking AI outputs. For every hour a worker gets useful output from AI, roughly another hour goes into making it usable. In a team of 50, that is about 320 hours per week, or about eight full-time equivalents at a 40-hour week, devoted solely to AI oversight.

Organisations budget for AI tooling licences but treat review time, rework cycles, and cross-team reconciliation labour as incidental rather than as line items. Uber burned through its entire 2026 AI budget in four months without shipping a usable feature. The costs surface. The returns do not.
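The team-level arithmetic above can be reproduced for any headcount. The following is a minimal sketch, not a method from the survey: the function name `oversight_load` is hypothetical, and the 40-hour week used to convert hours into full-time equivalents is an assumption.

```python
# Survey figure quoted in the article: hours per worker per week spent on
# fact-checking, correcting, and reworking AI outputs.
HOURS_PER_WORKER_PER_WEEK = 6.4

# Assumption (not from the survey): one full-time equivalent = 40 hours/week.
FTE_HOURS_PER_WEEK = 40.0


def oversight_load(
    team_size: int,
    hours_per_worker: float = HOURS_PER_WORKER_PER_WEEK,
    fte_hours: float = FTE_HOURS_PER_WEEK,
) -> tuple[float, float]:
    """Return (total weekly oversight hours, equivalent FTEs) for a team."""
    total_hours = team_size * hours_per_worker
    return total_hours, total_hours / fte_hours


# The article's example: a team of 50.
hours, ftes = oversight_load(50)
print(f"{hours:.0f} hours/week, {ftes:.1f} FTEs")
```

Changing `fte_hours` to match your own contracted week shifts the FTE figure, which is why the conversion is kept as an explicit parameter rather than baked in.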

If the costs are invisible and the returns are not surfacing, the measurement framework itself has to be part of the problem.

How do you measure whether AI is actually improving organisational performance — not just individual throughput?

When leadership tracks AI usage by token volume, workers inflate their usage to look engaged. The Glean Work AI Index calls this token maxxing. One engineer described his own practice plainly: “I am conscious of not wanting to be seen as ‘uses too little AI’… Things I do to inflate my token usage metrics: Ask AI questions about code already in the documentation… Ask the AI to prototype a feature that I have no intention of working on… Default to always using the agent, even when I know I could do the work by hand much faster. Then watch it fail.”

It’s rational behaviour in response to the metrics leadership chose. When token volume becomes the target, workers produce token volume. Goodhart’s Law, delivered at enterprise scale.

The consequences are measurable. The same Faros AI data that showed developers completing 21% more tasks and merging 98% more pull requests also showed zero improvement in deployment frequency, lead time for changes, or change failure rate at the company level. Individual throughput is up. Organisational throughput is flat.

The right metrics sit on a different axis. Instead of tokens generated and tasks completed, measure revenue per employee, decision accuracy, time-to-market, and error rates in shipped work. Instead of AI session counts, track whether botsitting hours are trending up or down over time. Instead of adoption rates, check whether AI-assisted decisions are measurably better than pre-AI decisions on the same criteria.
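Tracking whether botsitting hours are trending up or down needs nothing more than a slope over weekly observations. The sketch below fits an ordinary least-squares line to a series of weekly averages. The function name `weekly_trend` is hypothetical and the sample values are illustrative only, not survey data.

```python
def weekly_trend(hours_by_week: list[float]) -> float:
    """Least-squares slope of oversight hours against week index.

    A negative result means oversight hours are shrinking week over week;
    a positive result means they are growing.
    """
    n = len(hours_by_week)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2  # mean of week indices 0..n-1
    mean_y = sum(hours_by_week) / n
    numerator = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(hours_by_week)
    )
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Illustrative values only: average botsitting hours per worker over 4 weeks.
sample = [6.4, 6.1, 5.9, 5.6]
slope = weekly_trend(sample)
print(f"trend: {slope:+.2f} hours per week")
```

A slope is a coarse signal. It says whether oversight is shrinking, not why, so it belongs alongside the outcome metrics above rather than in place of them.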

Only 31% of AI spend can currently be attributed to specific business outcomes. Closing the gap starts with measuring outcomes, not activity.

Changing what you measure is necessary, but it is not sufficient. The deeper issue is that organisations are managing AI agents as though they were standard enterprise software, and they are not.

What makes managing AI agents fundamentally different from managing traditional enterprise software?

The infrastructure world has a useful distinction: pets versus cattle. Servers treated as cattle are numbered and replaceable. If one goes down, you spin up another. Servers treated as pets have names, individual histories, and are nursed back to health when something goes wrong. Traditional enterprise software follows the cattle model: deterministic, versioned, auditable, uniform across instances. You deploy it, monitor it, patch it.

AI agents are pets. They are probabilistic rather than deterministic, stateful rather than stateless, context-dependent rather than uniform across instances. A conventional API either works or it does not. An AI agent produces output that is plausible, syntactically correct, and subtly wrong in ways you cannot predict in advance.

Palo Alto Networks acquired Portkey in April 2026 to build an AI Gateway as a control plane for autonomous agents. Their chief security intelligence officer told The Register that AI agents are “2026’s biggest insider threat”. This is not a product recommendation. It is an industry signal that AI management is a different category from software management.

Operationally, the shift is from deploy-and-monitor to oversee-and-steward. AI agent management needs different staffing profiles: stewards, not operators. It needs different budgeting models: ongoing oversight, not upfront deployment. And it needs different governance frameworks: adaptive, not rule-based. The average organisation now runs 37 agents in production. Fewer than half are actively monitored or secured.

The coordination burden that consumes individual AI gains is the predictable output of applying cattle operations to pets. The three-pronged response framework addresses how to reduce that burden across psychology, regulation, and technical infrastructure.

The AI productivity paradox is resolvable, but not through better models or more adoption. It closes when organisations stop treating AI as a productivity multiplier that gets deployed and tracked like traditional software, and start treating it as a probabilistic system that demands stewardship, outcome-based measurement, and governance built for how AI agents actually behave.

That means budgeting for oversight labour as a line item. It means measuring revenue per employee and error rates rather than token counts and login frequencies. And it means redesigning workflows around the fact that AI agents are individual creatures you steward, not uniform instances you deploy.

The useful question is whether your management model lets individual productivity gains actually reach the organisation — a question that the regulatory and architectural responses reshaping the botsitter economy are designed to address.

Frequently Asked Questions

Is AI actually making companies more productive, or just busier?

For most organisations, AI is making people busier rather than more productive at the company level. The 73% of individuals who report personal gains are completing tasks faster, but the coordination overhead of integrating, reviewing, and reconciling those outputs across teams consumes the gains before they reach the bottom line. Faster individual task completion does not equal higher organisational throughput when every accelerated output requires cross-team validation and rework.

Will better AI models fix the coordination problem automatically?

No. Coordination costs are a function of organisational design, not model capability. More capable AI agents produce more outputs that must be integrated, reviewed, and reconciled across teams, which means the coordination burden scales with AI capability rather than diminishing with it. The paradox is structural: unless organisations redesign how work flows between people and AI systems, better models simply generate more work for the humans who must stitch everything together.

What actually counts as a coordination cost in practice?

Coordination costs include fact-checking AI outputs before they enter shared systems, reconciling conflicting AI-generated recommendations from different teams, rewriting AI drafts to align with organisational context the model lacks, explaining AI output rationale to stakeholders who were not in the generation loop, and the meeting time spent aligning on which AI outputs to accept or discard. These activities are usually classified as “normal work” rather than as AI-specific overhead, which is why they escape measurement.

How much time do knowledge workers spend fixing or correcting AI outputs each week?

The empirical data from the Work AI Index 2026 documents an average of 6.4 hours per week spent on botsitting, which includes fact-checking, correcting, and reworking AI outputs. This figure compounds at organisational scale: in a team of 50 knowledge workers, botsitting consumes roughly 320 hours per week in aggregate, or about eight full-time equivalent roles at a 40-hour week, devoted solely to AI oversight rather than productive work.

Are small organisations immune to the AI productivity paradox?

Smaller organisations face the same coordination dynamics but at a different scale. A startup of fifteen people may have fewer formal boundaries for coordination costs to accumulate across, but the same integration tax applies whenever AI outputs cross the boundary between one person’s work and another’s. The paradox is not about organisational size but about the number of handoff points where AI-generated work must be validated, contextualised, and reconciled before it can be trusted.

What does good AI governance look like when agents are probabilistic rather than deterministic?

Effective governance for probabilistic AI agents requires adaptive frameworks rather than rule-based checklists. This means establishing acceptable error thresholds per use case, mandating human-in-the-loop validation for outputs that cross organisational boundaries, tracking botsitting hours as a governance metric alongside traditional KPIs, and implementing feedback loops that route AI output quality data back to procurement and deployment decisions. The governance question is not “does this AI comply” but “is this AI’s error profile acceptable for this context.”
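One way to make "acceptable error thresholds per use case" concrete is a small policy table that each use case is checked against. This is a sketch under stated assumptions: the use-case names, the threshold values, and the names `UseCasePolicy`, `POLICIES`, and `within_threshold` are all hypothetical, chosen for illustration rather than drawn from any governance standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCasePolicy:
    max_error_rate: float        # highest acceptable share of faulty outputs
    human_review_required: bool  # True when output crosses a team boundary


# Hypothetical policy table; real thresholds are an organisational decision.
POLICIES: dict[str, UseCasePolicy] = {
    "internal_draft": UseCasePolicy(max_error_rate=0.10, human_review_required=False),
    "customer_email": UseCasePolicy(max_error_rate=0.02, human_review_required=True),
}


def within_threshold(use_case: str, errors: int, outputs: int) -> bool:
    """Check an observed error rate against the policy for this use case."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    policy = POLICIES[use_case]
    return errors / outputs <= policy.max_error_rate


# The same observed error rate (5%) passes one context and fails another.
print(within_threshold("internal_draft", errors=5, outputs=100))
print(within_threshold("customer_email", errors=5, outputs=100))
```

The point of the table is that the same error profile gets different verdicts in different contexts, which is the question the governance framing above asks.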

How long does it take for organisations to start seeing measurable returns from AI adoption?

There is no standard timeline because most organisations are not measuring returns on organisational metrics at all. The Faros AI data shows no significant correlation between AI adoption and improvement in DORA metrics at the company level, which means organisations that have been running AI programs for years are not yet seeing individual throughput translate into organisational results. The timeline begins when an organisation starts measuring revenue per employee, decision accuracy, and error rates as AI performance indicators rather than counting tokens generated or tasks marked complete.

What should organisations budget for beyond AI tooling licences?

Organisations should budget for the human infrastructure of oversight: review time, rework cycles, cross-team reconciliation labour, and the management overhead of coordinating AI outputs across organisational boundaries. If 6.4 hours of botsitting per worker per week is the baseline, then a mid-size organisation should calculate that cost at fully loaded salary rates and treat it as a first-class line item rather than absorbing it as invisible overhead. Training, governance tooling, and stewardship roles should also be budgeted from the start.
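Pricing the botsitting baseline at fully loaded rates is a single multiplication once the inputs are stated. In the sketch below, only the 6.4 hours per week comes from the article. The function name `annual_oversight_cost`, the 48 working weeks, the headcount of 200, and the loaded hourly rate of 60 (in whatever currency you budget in) are assumptions for illustration.

```python
def annual_oversight_cost(
    headcount: int,
    loaded_hourly_rate: float,
    hours_per_week: float = 6.4,  # botsitting baseline quoted in the article
    working_weeks: int = 48,      # assumption: weeks worked per year
) -> float:
    """Annual cost of AI oversight labour at fully loaded salary rates."""
    return headcount * hours_per_week * working_weeks * loaded_hourly_rate


# Hypothetical mid-size organisation: 200 knowledge workers at a loaded
# hourly rate of 60. Both numbers are placeholders, not benchmarks.
cost = annual_oversight_cost(headcount=200, loaded_hourly_rate=60.0)
print(f"annual oversight line item: {cost:,.0f}")
```

Putting this figure next to the tooling licence cost in the same budget makes the oversight labour visible as a line item rather than absorbed overhead.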

Is the problem that organisations are using the wrong AI tools, or that they are managing them incorrectly?

The problem is primarily one of management and organisational design rather than tool selection. Even the most capable AI tools produce the paradox when organisations reward output volume over organisational outcomes, treat oversight labour as incidental, and fail to redesign workflows around the probabilistic nature of AI agents. Swapping tools without addressing coordination neglect and token maxxing simply shifts which AI generates the outputs that humans must then spend time integrating and validating.

What happens if an organisation ignores coordination neglect and keeps scaling AI usage?

The organisation burns through its AI budget without seeing commensurate returns, a pattern already observable at enterprise scale. AI tooling costs rise while the hidden human oversight costs compound silently in the background, consuming the individual productivity gains before they reach organisational performance metrics. Over time, the gap between perceived AI value and measured company outcomes widens, and leadership loses confidence in AI investment without understanding that the failure is in how AI is managed rather than in the technology itself.
