Jun 18, 2026

The Botsitter Economy: When Managing AI Agents Becomes the New Full-Time Job

In May and June 2026, Glean surveyed 6,000 digital workers and surfaced a statistic that reframes the whole enterprise AI conversation: the average knowledge worker spends 6.4 hours per week not using AI, but managing it. Feeding context. Correcting mistakes. Cleaning up output. Switching between disconnected tools that each need the same organisational information fed to them separately.

The researchers called it “botsitting,” and the companion behaviour they uncovered, “botshitting” (shipping AI-generated work you cannot explain or defend), is admitted to by 69% of surveyed workers. The full Work AI Index 2026 report was published by the Work AI Institute, commissioned by Glean, and co-authored by researchers from Stanford, UC Berkeley, UC Santa Barbara, Emory, and several other universities. It is the first large-scale study to attach hours, percentages, and terminology to an experience workers have been living with but could not name.

These findings describe something more structural than a temporary adoption friction. They describe the emergence of a botsitter economy: a workplace in which supervising AI agents has become a material job function rather than an occasional task. This pillar page maps the three dimensions of that economy: the empirical reality, the organisational paradox it creates, and the multi-pronged response taking shape across psychology, regulation, and technical architecture. Each cluster article takes one dimension deep. Start here for the full picture.

In This Series

Botsitting and Botshitting: The Empirical Picture from the Work AI Index 2026
The data behind the 6.4-hour botsitting tax, the 36% AI session failure rate, and the 69% of workers who admit to shipping AI work they cannot explain. This is where the numbers live.

The AI Productivity Paradox: Why Individual Gains Do Not Scale to Organisational Performance
Why 87% of workers use AI and 73% report personal gains, yet only 13% see organisational improvement. The gap that makes the botsitter economy an investment problem, not just an academic curiosity.

Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis
How moral disengagement, the EU AI Act, and enterprise graphs are reshaping AI oversight. The three dimensions of the response are converging faster than most organisations realise.

What Is the Botsitter Economy, and Why Has It Suddenly Entered the Workplace Conversation?

The botsitter economy describes a workplace where the labour of supervising AI agents has become a substantial, ongoing job function rather than an occasional cost of using new software. It entered the conversation in mid-2026 because Glean’s Work AI Index quantified it for the first time at scale, measuring 6.4 hours per week across 6,000 surveyed workers. That number made visible a category of labour that had been hiding inside “AI adoption.”

Before this survey, workers and managers had a diffuse sense that AI required more hand-holding than expected, but no single study had attached hours, percentages, and terminology to the experience. The Work AI Index 2026 changed that. Paul Leonardi, Duca Family professor of technology management at UC Santa Barbara and one of the co-authors, said it plainly: “Most people don’t realize the amount of time that they’re spending working on the tools to get the time savings that they’re professing.” Rebecca Hinds, head of Glean’s Work AI Institute, called it “a vicious cycle that feeds itself” and emphasised the “massive, massive human labor that’s at the core of this.” The LA Times coverage and CIO.com’s analysis both unpack what this survey revealed.

The 6.4-hour figure represents 37% of workers’ total AI time. It is not a rounding error or an onboarding friction that will disappear as tools improve. The 36% AI session failure rate means oversight is mechanically required by current AI reliability, not optional. More than a third of AI sessions fail outright, requiring a full restart or substantial rework. And the 69% botshitting rate means the alternative to botsitting is not smooth autonomous operation. It is unverified output entering organisational workflows at scale.

The botsitter economy has three dimensions that any organisation adopting AI at scale must understand. The empirical reality: what the data says about how much oversight AI actually demands. The organisational paradox: why individual productivity gains fail to aggregate into organisational performance. And the multi-pronged response: how psychology, regulation, and technical architecture are converging to make AI oversight manageable. Each dimension gets its own cluster article. This pillar page gives you the overview so you know which questions matter and where to go for the answers.

Go deeper: Botsitting and Botshitting: The Empirical Picture from the Work AI Index 2026 breaks down the data in full, including how top AI achievers’ habits differ from low achievers’.

What Exactly Are Botsitting and Botshitting — and How Do They Fit Together?

Botsitting is the labour of keeping AI agents usable: feeding them context about your organisation, team, and goals; correcting their mistakes; cleaning up their output; and switching between the disconnected tools different agents require. Botshitting is its negative endpoint: shipping AI-generated work you have not verified, do not fully understand, or cannot defend if questioned. They are paired concepts because botshitting is what happens when the cognitive burden of botsitting exceeds a worker’s capacity or motivation to maintain it.

Botsitting encompasses three core activities identified in the Work AI Index 2026. Context-feeding is the most exhausting of the three. Workers must manually provide AI agents with organisational context they lack, repeatedly, across multiple disconnected tools. Error correction means identifying and fixing AI mistakes before outputs reach colleagues or customers. Output cleanup involves reworking AI-generated drafts into usable final form. Unlike traditional software configuration, which happens once at setup, botsitting is continuous. Every new task, every new context switch, every agent update can require fresh oversight labour.

The report itself describes it as a “thick, mostly invisible layer of human labor holding the whole thing together.” Rebecca Hinds pointed out something most organisations are not measuring: “It’s exhausting for workers to not only do this, but to have the work be unrecognized, often unrewarded and unacknowledged within the organization.”

Botshitting is the behaviour that emerges when botsitting labour is abandoned. Workers ship AI output they have not verified because the cognitive cost of verification exceeds their available attention or motivation. The 69% figure from the Work AI Index 2026 makes this a majority behaviour, not an edge case. Andrew Pope, a technology strategist, put it this way: “The smarter the tool, the sloppier we become. The apparent capability, helpfulness and humanness of our AI tools make us trust them more than we really should, becoming less critical of their outputs.”

The distinction between the two concepts matters. Botsitting is oversight labour. Botshitting is the failure mode that occurs when oversight is abandoned. They define two different conversations: one about reducing botsitting overhead (making oversight more efficient), and one about preventing botshitting (making oversight culturally and procedurally non-optional).

The accountability dimension creates a structural problem at organisational scale. That 40% of workers blame AI for failures, while only 29% admit fault, reveals an accountability vacuum at the heart of the botsitter economy. Heavy AI users are 3.4 times more likely than light users to blame the tool when something goes wrong. When the majority of workers both ship unverified AI output and attribute failures to the AI rather than to their own supervision choices, organisations lose the ability to trace decisions, assess quality, or assign responsibility. Botsitting and botshitting are connected because the psychological pathway from one to the other is predictable: a work environment that demands AI usage, provides tools whose output looks plausible even when wrong, and offers no clear accountability framework for AI-mediated work will produce exactly the behaviour the Work AI Index documents. The three-pronged response article explores that dynamic in depth.

Go deeper: Botsitting and Botshitting: The Empirical Picture from the Work AI Index 2026 has the full data breakdown, including how top AI achievers’ habits differ from low achievers’.

How Much of Your AI Time Goes to Botsitting, and What’s Left as Net Gain?

The Work AI Index 2026 found that workers save roughly 11 hours per week through AI use but spend 6.4 hours per week on botsitting. The net gain is approximately 4.6 hours per week. That reframes AI from a productivity miracle to a narrow net improvement: real, worth having, but far smaller than the headline “11 hours saved” suggests.

The arithmetic is straightforward: roughly 11 hours saved, minus 6.4 hours of botsitting overhead, leaves about 4.6 hours of net productivity gain per week. The 6.4 hours represent 37% of total AI time, which implies about 17 hours a week spent with AI in total. For every hour a worker spends getting useful output from AI, they spend roughly another 35 minutes making it usable. ITBrief’s coverage of the survey data put this net calculation front and centre.

Now here is where it gets uncomfortable for anyone who has signed off on an enterprise AI budget. Organisations budget for AI tooling licences, model API costs, and implementation. The human time cost of botsitting is rarely a budgeted line item. When you add the cost of 6.4 hours per worker per week to the direct AI expenditure, the total cost of AI adoption looks materially different from the tooling cost alone. Across a 1,000-person organisation, botsitting represents roughly 6,400 hours of labour per week that is not captured in any AI ROI calculation — which is exactly the kind of oversight cost the productivity paradox analysis argues organisations must start budgeting for.
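
The arithmetic above is simple enough to put into a small ledger. The sketch below is illustrative only: the class name `BotsittingLedger` and its methods are hypothetical, and the defaults are the survey figures quoted in this article (11 hours saved, 6.4 hours of botsitting, a 37% share of total AI time). It assumes those averages can be multiplied across a headcount, which a real budget model would want to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotsittingLedger:
    """Weekly per-worker figures; defaults are the survey numbers quoted above."""
    gross_hours_saved: float = 11.0   # hours saved per week through AI use
    botsitting_hours: float = 6.4     # hours spent managing AI per week
    botsitting_share: float = 0.37    # botsitting as a share of total AI time

    def net_gain(self) -> float:
        """Hours saved minus hours spent on oversight."""
        return self.gross_hours_saved - self.botsitting_hours

    def total_ai_hours(self) -> float:
        """Total weekly AI time implied by the botsitting share."""
        return self.botsitting_hours / self.botsitting_share

    def overhead_per_productive_hour(self) -> float:
        """Botsitting hours per hour of non-botsitting AI time."""
        productive = self.total_ai_hours() - self.botsitting_hours
        return self.botsitting_hours / productive

    def org_botsitting_hours(self, headcount: int) -> float:
        """Weekly botsitting hours if every worker matches the average."""
        return self.botsitting_hours * headcount


ledger = BotsittingLedger()
print(f"net gain per worker: {ledger.net_gain():.1f} h/week")
print(f"implied total AI time: {ledger.total_ai_hours():.1f} h/week")
print(f"overhead per productive hour: {ledger.overhead_per_productive_hour():.2f} h")
print(f"1,000-person organisation: {ledger.org_botsitting_hours(1000):,.0f} h/week")
```

Multiplying the organisation-wide hours by a loaded hourly labour cost would turn the figure into a budget line, but that rate is specific to each organisation and is deliberately left out here.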

Deloitte’s 2026 State of AI in the Enterprise report provides corroborating evidence that agentic AI adoption is outpacing governance and cost modelling. Worker access to AI rose by 50% in 2025, yet only one in five companies has a mature governance model. The botsitting tax is being paid in full. It is just not being measured.

Not all botsitting is equal. The Work AI Index 2026 reveals that top AI achievers spend less time on botsitting per productive AI session than low achievers. This gap compounds over weeks and months. The segmentation suggests that botsitting overhead is influenced by skill (knowing how to prompt and verify efficiently), tooling (access to context-rich environments versus fragmented tools), and organisational practices (whether AI outputs flow through structured review or ad-hoc checking).

There is also an equity concern here. If botsitting burden falls disproportionately on less-experienced or less-resourced workers, it becomes a compounding disadvantage. The workers who can least afford the overhead spend the most time on it. Top AI achievers, about 13% of the sample, apply active judgement and embrace what the researchers call “productive botsitting” rather than simply prompting and praying. Context-rich organisations see 64% less digital exhaustion and 31% less botshitting. The rest are paying the full tax.

Go deeper: Botsitting and Botshitting: The Empirical Picture from the Work AI Index 2026 covers the 36% session failure rate, the exhaustion multiplier, and the full segmentation data.

Why Don’t Individual AI Productivity Gains Translate into Better Organisational Performance?

The AI productivity paradox is the gap between individual experience and organisational reality. 87% of workers use AI regularly. 73% report personal productivity gains. Yet only 13% say their organisation is performing significantly better. The primary explanation is coordination neglect: AI helps individuals complete tasks faster, but the integration, review, and reconciliation of those outputs across teams creates a new coordination burden that organisations rarely measure or account for.

The personal productivity gains are genuine. The failure is in organisational systems that cannot translate individual acceleration into collective throughput. Coordination neglect, a concept from organisational research, describes exactly this pattern: investing in tools that accelerate individual work without investing in the coordination mechanisms (handoffs, reviews, approvals, alignment) that integrate that accelerated work into organisational output.

Rebecca Hinds gave a concrete example that captures the absurdity. One worker converts a single bullet point into a five-page report with AI. They send it to a colleague, who uses AI to convert it back into a single bullet point. Each person looks productive. The organisation spins on an “AI slop” hamster wheel. When every individual uses AI to produce more, faster, while the organisational chokepoints (review, approval, alignment, decision-making) receive more volume without more capacity, this pattern becomes the norm.

The Faros AI analysis of more than 10,000 developers across 1,255 teams provides a concrete software-engineering example. Individual developers complete 21% more tasks, and teams merge 98% more PRs. But review times increase by 91%, and DORA metrics show no correlation with AI adoption at the company level. Individual speed gains pile up at team boundaries that have not accelerated.

The ILO’s “Aggregation Paradox of AI” research brief confirms this pattern across the economy. AI delivers large productivity gains at the task level (10% to 70% improvement). At the firm level, evidence is mixed. At the macroeconomic level, no clear AI-driven productivity growth has appeared in official statistics. Berkeley researchers have noted that the Solow Paradox (“You can see the computer age everywhere but in the productivity statistics”) remains at least partly relevant in the age of AI.

Why does this matter practically? Because it is a resource-allocation problem. Organisations investing heavily in AI tooling without investing in the coordination infrastructure that integrates AI outputs are spending on acceleration without building the road network. The 6.4-hour botsitting tax from the empirical data on AI oversight is the individual-level cost. At organisational scale, it compounds into systemic drag. Executives who measure only individual AI adoption and output volume are seeing a carefully framed picture that omits the coordination costs consuming the gains.

Go deeper: The AI Productivity Paradox: Why Individual Gains Do Not Scale to Organisational Performance covers measurement frameworks, budgeting heuristics, and the coordination-neglect research in full.

How Should You Measure Whether AI Is Improving Organisational Outcomes Versus Just Individual Throughput?

The distinction is between output-volume metrics and outcome-quality metrics. Output-volume metrics (tokens generated, tasks completed, time saved per individual) tell you whether AI is accelerating activity. Outcome-quality metrics (revenue per employee, decision accuracy, time-to-market, error rates in shipped work, botsitting hours trended over time) tell you whether that acceleration is translating into organisational improvement. The single most diagnostic question is: does AI-generated work survive peer review without more rework than human-generated work? If the answer is no, your AI is generating volume, not value.

Most organisations track AI adoption rates and individual productivity self-reports. Few have established organisational-performance metrics that isolate AI’s contribution. This creates a dangerous information asymmetry. The data that looks good (adoption, usage, tokens, self-reported time savings) is abundant and easy to collect. The data that reveals whether AI is actually working (decision quality, error rates, coordination costs, rework volume) is sparse and requires deliberate instrumentation.

The result is that organisations can be perfectly satisfied with their AI programme right up until they examine the organisational outcomes. Only 31% of AI spend can be attributed to specific business outcomes. The rest is being spent on acceleration that may or may not be improving anything.

Token maxxing is the failure mode that flourishes in this measurement vacuum. It describes the pattern of optimising AI output volume (tokens generated, tasks completed, drafts produced) without regard for whether the output improves organisational outcomes. One engineer at a major tech company, quoted in the Work AI Index, described deliberately inflating token numbers by asking AI questions already answered in documentation, prototyping features with no intention of building them, and defaulting to using agents even when manual work would be faster.

When organisations reward AI output volume without assessing output quality or organisational impact, they incentivise token maxxing. Workers learn that more AI output equals better performance reviews. The incentive and the behaviour reinforce each other, and neither side has to face the harder question: is any of this making us better? This is Goodhart’s Law in action: when a measure becomes a target, it ceases to be a good measure. Breaking this loop requires shifting measurement from “how much AI output are we producing?” to “are AI-assisted decisions measurably better than pre-AI decisions?” The full measurement framework analysis walks through the practical heuristics for making that shift.

Berkeley’s executive education researchers recommend a multi-dimensional measurement framework that covers efficiency, quality, capability, strategy, and human metrics like employee satisfaction and learning velocity. The point is not to adopt any one framework but to measure anything beyond output volume at all. Organisations that can answer three questions (are botsitting hours trending up or down? does AI-generated work need more or less rework at peer review? can you trace a measurable outcome improvement to a specific AI-assisted workflow?) are measuring AI’s organisational impact. Those that cannot are measuring AI activity.
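
Two of the three questions above reduce to small calculations once the data exists. The sketch below is one possible way to compute them, not a framework from the report: `ReviewRecord`, `weekly_trend`, and `rework_ratio` are hypothetical names, the input numbers are made up, and it assumes you already log weekly botsitting hours and rework hours at peer review.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ReviewRecord:
    """One item that went through peer review (hypothetical schema)."""
    ai_assisted: bool
    rework_hours: float


def weekly_trend(hours_by_week: list[float]) -> float:
    """Least-squares slope in hours per week; negative means the tax is shrinking."""
    n = len(hours_by_week)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    x_mean = (n - 1) / 2
    y_mean = mean(hours_by_week)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(hours_by_week))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def rework_ratio(records: list[ReviewRecord]) -> float:
    """Mean rework on AI-assisted items divided by mean rework on human-only items."""
    ai = [r.rework_hours for r in records if r.ai_assisted]
    human = [r.rework_hours for r in records if not r.ai_assisted]
    if not ai or not human:
        raise ValueError("need both AI-assisted and human-only records")
    return mean(ai) / mean(human)


# Illustrative, made-up inputs.
botsitting_hours = [6.4, 6.1, 5.9, 5.6]
reviews = [
    ReviewRecord(True, 2.0), ReviewRecord(True, 1.0),
    ReviewRecord(False, 1.0), ReviewRecord(False, 0.5),
]
print(f"botsitting trend: {weekly_trend(botsitting_hours):+.2f} h/week")
print(f"rework ratio (AI / human): {rework_ratio(reviews):.2f}")
```

A rework ratio above 1 means AI-assisted work needs more rework than human-only work at review, which is the "volume, not value" signal described earlier. The third question, tracing an outcome improvement to a specific workflow, depends on experiment design and is not something a short script can answer.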

Go deeper: The AI Productivity Paradox: Why Individual Gains Do Not Scale to Organisational Performance walks through measurement frameworks and the token-maxxing analysis in full.

How Does Managing AI Agents Differ from Managing Traditional Enterprise Software?

Traditional enterprise software is deterministic, versioned, auditable, and behaves identically across instances. It follows what infrastructure engineers call the “cattle” model (uniform, replaceable). AI agents are probabilistic, stateful, context-dependent, and behaviourally variable. They follow the “pets” model (individual, requiring bespoke attention). This distinction has concrete operational implications for how you test, audit, and budget for the systems your organisation depends on.

The Pets versus Cattle distinction is more than a metaphor. It is an operating model. Standard software: you deploy it, monitor it for uptime, patch it on a schedule, and replace broken instances. AI agents: you deploy them, but then you must continuously feed them context, monitor their decision quality (not just their uptime), intervene when they drift from expected behaviour, and maintain their state across sessions. A conventional API either works or it does not. An AI agent can produce an output that is plausible, syntactically correct, and subtly wrong in a way that downstream systems may not catch immediately.

Palo Alto Networks has been making the case that businesses must stop treating AI agents like standard software. Through their Portkey acquisition in April 2026, they are establishing what they call an AI Gateway as a control plane for autonomous agents. Lee Klarich, their Chief Product and Technology Officer, put it directly: “As autonomous agents join the enterprise workforce, they also become a new, unmanaged attack surface.” Agents act as highly privileged insiders, executing a large volume of automated decisions across internal and external systems.

The operational implications are real and specific.

Reliability: a standard software failure is typically binary (it works or it does not). An AI agent failure is often partial and ambiguous (the output looks plausible but is wrong). 70% of enterprise leaders name “non-deterministic outputs” as the number one production-readiness barrier for AI agents.

Cost: standard software costs are predictable (licences, hosting, maintenance). AI agent costs are variable (model calls, retries, tool usage, context size) and dependent on usage patterns that are hard to forecast.

Governance: standard software governance is about access control and change management. AI agent governance must also address decision quality, output auditability, and liability for agent actions. Agents without named owners have 2.7 times lower production-conversion rates.
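
The cost point is easier to see with the drivers multiplied out. The sketch below is a toy model, not a vendor pricing formula: `AgentTaskProfile` and `expected_task_cost` are hypothetical names, every price and usage number is invented for illustration, and it assumes each attempt fails independently with the same probability, so the expected number of attempts is 1 / (1 - p).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentTaskProfile:
    """Hypothetical per-task usage profile; every field is an assumption."""
    model_calls: int           # model calls per attempt
    context_tokens: int        # tokens sent with each call
    tool_calls: int            # external tool invocations per attempt
    retry_probability: float   # chance an attempt fails and is retried


def expected_task_cost(
    profile: AgentTaskProfile,
    price_per_1k_tokens: float,
    price_per_tool_call: float,
) -> float:
    """Expected cost of one completed task under the independence assumption."""
    if not 0 <= profile.retry_probability < 1:
        raise ValueError("retry_probability must be in [0, 1)")
    expected_attempts = 1 / (1 - profile.retry_probability)
    per_attempt = (
        profile.model_calls * profile.context_tokens / 1000 * price_per_1k_tokens
        + profile.tool_calls * price_per_tool_call
    )
    return expected_attempts * per_attempt


# Made-up numbers purely for illustration.
baseline = AgentTaskProfile(model_calls=4, context_tokens=8000,
                            tool_calls=3, retry_probability=0.36)
bigger_context = replace(baseline, context_tokens=16000)

for label, profile in [("baseline", baseline), ("2x context", bigger_context)]:
    cost = expected_task_cost(profile, price_per_1k_tokens=0.01,
                              price_per_tool_call=0.002)
    print(f"{label}: {cost:.4f} per task")
```

Because the drivers multiply rather than add, a change in any one of them (a longer context, a higher retry rate) moves the whole bill, which is why a flat per-seat licence model forecasts agent costs poorly.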

The vendor response to this distinction is coalescing around what you might call the managed-AI-operations model, covered in detail a few sections ahead. The short version: platforms and architectures that treat AI agents as a fleet requiring continuous stewardship rather than a system requiring occasional maintenance. Anthropic’s Managed Agents, UiPath’s AgentOps, Microsoft’s Agent Framework, and Galileo’s Agent Control are all building pieces of this new operational discipline — an architectural shift the response framework article covers in depth.

Go deeper: Read on below for the managed-AI-operations model, or jump to The AI Productivity Paradox for the full Pets vs. Cattle analysis.

What Psychological Toll Does Constant AI Supervision Take — and How Does It Lead to Unverified AI Output?

The psychological pathway from botsitting to botshitting runs through cognitive offloading, digital exhaustion, and satisficing. Cognitive offloading (delegating not just task execution but thinking itself to AI) creates the condition for moral disengagement, where workers distance themselves from responsibility for AI-generated outputs. Digital exhaustion from continuous context-switching between agent supervision and actual work produces satisficing behaviour: settling for “good enough” AI output rather than investing the cognitive effort to verify it.

Moral disengagement, in the AI workplace context, is the process by which workers rationalise shipping unverified AI output by displacing responsibility onto the tool. “The AI did it” becomes a psychologically available excuse because the worker did not personally produce the output. The 69% botshitting rate and the 40% versus 29% blame-deflection split are the empirical evidence that this mechanism is operating at scale.

This is a predictable psychological response to a work environment that demands AI usage, provides tools whose output looks plausible even when wrong, and offers no clear accountability framework for AI-mediated work. Research available through the NIH has found that long-term AI use is significantly associated with mental exhaustion, attention strain, and information overload, and inversely associated with decision-making self-confidence. Context-feeding and error correction are identified as the highest-exhaustion botsitting activities in the Work AI Index 2026.

When workers spend hours switching between AI tools, feeding each one organisational context it lacks, and correcting outputs that look correct but are substantively wrong, the cognitive load accumulates differently than it does from equivalent hours of focused work. This exhaustion creates the condition for satisficing, the behavioural-economics concept where decision-makers accept “good enough” rather than optimal because the cognitive cost of optimisation exceeds the perceived benefit. When a worker has spent three hours correcting AI errors, the motivation to spend another hour verifying the fourth output collapses.

Leaders face a genuine dilemma. Detecting botshitting requires some form of output review, but aggressive detection can drive AI use underground. Shadow AI is pervasive: over 80% of workers use unapproved AI tools. 54% of high AI achievers already use unapproved tools. 36% hide how much AI helps them. Banning AI does not work: nearly half of employees continue using personal AI accounts after a ban.

The AI teammate model offers a cultural framework that makes verification feel like professional practice rather than surveillance. It frames AI as a collaborator whose output requires the same collegial scrutiny as human work. AI detection tools like Pangram Labs provide a technical layer, but their deployment can undermine the psychological safety needed for workers to volunteer uncertainty about AI outputs. The goal is creating conditions where workers say “I’m not sure the AI got this right” rather than conditions where they hide their AI use entirely.

Go deeper: Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis covers the full moral-disengagement analysis and the detection-versus-psychological-safety tension.

How Should You Decide Which Work to Automate and Which to Keep Stubbornly Human?

Once you understand the psychological toll botsitting takes, the next question is practical: what work should you even be handing to AI in the first place? The decision is not purely about whether AI can do a task. It is about whether automating the task preserves the organisational value that the human performance of it produces. Some tasks create value beyond their output: they build craft skill, transmit organisational knowledge, create worker engagement, and maintain institutional memory. Automating a customer-service interaction might be efficient, but if it erodes the relationship knowledge and diagnostic skill that senior representatives develop through those interactions, your organisation loses capability it cannot easily regenerate.

The Meaning versus Automation Trade-off recognises that efficiency is only one dimension of work value. 41% of AI startups in Y Combinator’s portfolio are automating tasks people would prefer to keep human, according to research cited in the Work AI Index 2026. Tasks that build diagnostic intuition, maintain client relationships, develop junior staff, or preserve organisational memory may be technically automatable but strategically worth keeping human.

The customer-service-representative example illustrates the trade-off with uncomfortable clarity. Automating routine queries may free representatives for complex cases. But if routine queries are how representatives learn to handle complex cases, automation can erode the skill pipeline. Klarna made headlines replacing its customer service team with AI, only to hire them back. Many companies discover chatbots handle simple queries fine but completely fail on anything requiring judgment or empathy. Customer service rep jobs are declining just 4%, beating the broader job-loss benchmark, despite companies trying to automate customer service with AI. The work proves more resilient than the automation narrative suggests.

Rebecca Hinds framed the retention dimension bluntly: “Workers are being asked to automate the parts of their jobs they find most meaningful, while keeping the parts they find least engaging.” Workers who spend significant time managing AI rather than doing work they value are more likely to leave. This creates a compounding risk: automate the meaningful parts of work to save time, lose the workers who valued those parts, then find that the remaining workforce is both smaller and less engaged.

The framework for thinking about this is task mapping: evaluating tasks not just by automation feasibility but by the full spectrum of value they produce (output quality, skill development, worker engagement, organisational knowledge, relationship maintenance). Stanford Professor Emeritus Bob Sutton calls the reflex to solve problems by piling more on top instead of subtracting what is already there “addition sickness.” The automation conversation often suffers from exactly this reflex, adding AI to every workflow without asking whether the workflow should exist.

As Hinds put it: “We don’t just assume that because the technology can do something, because it can automate something, it should be automated. So much of work is meant to be messy. It’s meant to be full of friction because that friction builds ownership, good judgment, purpose, and pride.”

Go deeper: Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis has the full task-mapping framework, the customer-service case study, and the Bob Sutton organisational-design perspective.

What Does the EU AI Act Actually Require for AI Oversight — and When Do the Rules Bite?

The EU AI Act’s Article 14 mandates that high-risk AI systems must be designed to enable effective human oversight. Designated human operators must be able to understand, monitor, and intervene in the system’s outputs. The rules covering high-risk AI systems take effect on 2 August 2026. For organisations deploying AI in any of the eight high-risk categories (employment, essential services, law enforcement, critical infrastructure, education, biometric identification, administration of justice, and migration), botsitting transforms from an operational headache into a legal obligation.

The Act, formally Regulation (EU) 2024/1689, entered into force on 1 August 2024 with a phased implementation. Prohibited AI practices have been banned since February 2025. The high-risk system obligations take effect on 2 August 2026. Full compliance for remaining requirements lands in August 2027. This is not a distant regulatory horizon. It is a compliance deadline your organisation should be preparing for now.

The eight high-risk Annex III categories cover a broad swathe of enterprise AI use. Employment and worker management is the one most relevant to the botsitter economy conversation, covering AI systems used in recruitment, promotion, task allocation, and performance evaluation. If your organisation uses AI to make or inform employment decisions, Article 14 applies. The human oversight mandate requires oversight “by individuals with appropriate competence, training, authority, and support” with meaningful ability to intervene and override outputs.

What compliance involves: a conformity assessment for high-risk systems, a Fundamental Rights Impact Assessment (FRIA) for certain deployers, and ongoing real-world testing and monitoring. The European standards bodies CEN and CENELEC are developing the technical standards that will operationalise these requirements. The European AI Office provides implementation guidance, and the European Artificial Intelligence Board coordinates national supervisory authorities.

The Act has extraterritorial scope. It applies to any organisation placing AI systems on the EU market or whose AI output is used within the EU, regardless of where the company is headquartered. Penalties are substantial: prohibited-practice violations carry fines of up to EUR 35 million or 7% of global annual turnover, whichever is higher. Non-compliance with high-risk obligations carries fines of up to EUR 15 million or 3% of turnover, whichever is higher.
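
The penalty ceilings can be expressed as a one-line rule. The sketch below is a simplified illustration, not legal advice: `Tier` and `fine_ceiling_eur` are hypothetical names, and the function models only the general rule for undertakings, under which the ceiling is the higher of the fixed amount and the turnover percentage. The Act caps fines for SMEs and start-ups at the lower of the two instead, and that case is not modelled here.

```python
from enum import Enum


class Tier(Enum):
    # (fixed ceiling in EUR, percent of worldwide annual turnover)
    PROHIBITED_PRACTICE = (35_000_000, 7)
    HIGH_RISK_OBLIGATION = (15_000_000, 3)


def fine_ceiling_eur(tier: Tier, annual_turnover_eur: int) -> int:
    """Upper bound on a fine: the higher of the fixed cap and the turnover share."""
    if annual_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    fixed_cap, percent = tier.value
    # Integer arithmetic avoids floating-point rounding on large amounts.
    return max(fixed_cap, annual_turnover_eur * percent // 100)


# Hypothetical turnovers, for illustration only.
for turnover in (100_000_000, 2_000_000_000):
    for tier in Tier:
        ceiling = fine_ceiling_eur(tier, turnover)
        print(f"turnover EUR {turnover:,} / {tier.name}: up to EUR {ceiling:,}")
```

For a company with EUR 2 billion in turnover, the percentage dominates in both tiers. For one with EUR 100 million, the fixed amounts set the ceiling.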

This matters for the botsitter economy because the Act formalises and mandates a function that the Work AI Index 2026 data shows is already being performed haphazardly. The August 2026 effective date creates a compliance deadline that will force organisations to move from ad-hoc botsitting (the 6.4 hours of unstructured oversight the survey documents) to structured human oversight (the documented, auditable, role-defined oversight the Act requires). The Act gives the botsitter role legal teeth and a compliance budget, transforming it from an invisible cost centre to a compliance line item.
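
To make the shift from ad-hoc to structured oversight concrete, here is a minimal sketch of a review gate in which no AI output is released without a named human decision, and every decision leaves an audit entry. It is an illustration of the idea only: the Act does not prescribe this design, the names `OversightLog` and `Decision` are hypothetical, and a real system would need persistent storage, authentication, and reviewer-competence checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"     # release the AI output unchanged
    OVERRIDE = "override"   # release a human-supplied replacement
    REJECT = "reject"       # release nothing


@dataclass
class OversightLog:
    """In-memory audit trail of human decisions on AI outputs."""
    entries: list = field(default_factory=list)

    def review(
        self,
        output_id: str,
        ai_output: str,
        reviewer: str,
        decision: Decision,
        replacement: Optional[str] = None,
    ) -> Optional[str]:
        """Record the decision and return the text cleared for release, if any."""
        if not reviewer.strip():
            raise ValueError("every decision needs a named human reviewer")
        if decision is Decision.OVERRIDE and replacement is None:
            raise ValueError("an override must supply the replacement output")
        released = {
            Decision.APPROVE: ai_output,
            Decision.OVERRIDE: replacement,
            Decision.REJECT: None,
        }[decision]
        self.entries.append({
            "output_id": output_id,
            "reviewer": reviewer,
            "decision": decision.value,
            "reviewed_at": datetime.now(timezone.utc).isoformat(),
        })
        return released


log = OversightLog()
log.review("screening-001", "Advance candidate to interview", "j.doe", Decision.APPROVE)
log.review("screening-002", "Reject candidate", "j.doe", Decision.OVERRIDE,
           replacement="Advance candidate; CV gap explained in cover letter")
print(f"{len(log.entries)} decisions recorded")
```

The design choice that matters is that the gate refuses to proceed without a reviewer name, which is the difference between oversight that happens to occur and oversight that can be shown to have occurred.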

Anu Bradford, Professor of Law at Columbia University and author of The Brussels Effect, captured the shift: “AI regulation is no longer a theoretical debate, it is an operational reality. Companies that treat the EU AI Act as a compliance exercise rather than a governance opportunity will find themselves perpetually catching up.”

The readiness gap is significant. Forrester predicts that 60% of Fortune 100 companies will appoint AI governance heads in 2026. AI-native companies and legacy organisations face different governance challenges. AI-native companies built their operations around the assumption of AI’s probabilistic, stateful nature. Legacy organisations are retrofitting governance frameworks designed for deterministic software onto systems that do not behave deterministically. Some organisations will clear the August 2026 deadline comfortably while others scramble.

Go deeper: Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis covers the eight high-risk categories, the conformity assessment process, and the FRIA requirements in full.

What Is an Enterprise Graph, and Why Might It Be the Single Most Effective Countermeasure to Botsitting?

An enterprise graph is a connected model of your organisation’s people, documents, projects, goals, tools, and permissions. It is the connective tissue that allows AI agents to understand context without you manually feeding it to them. The Work AI Index 2026 identifies context-feeding as the highest-exhaustion botsitting activity, and the data shows that context-rich environments see 64% less digital exhaustion and 31% less botshitting.

Context-feeding is the most exhausting part of botsitting because it is structurally unnecessary. The information exists in your organisation’s systems (documents, project boards, communication tools, permission structures). But it is fragmented across tools, inaccessible to AI agents, and maintained only in human workflows. Workers spend hours feeding the same organisational information into multiple disconnected AI tools. 53% of workers said critical information needed for their jobs was not accessible through their AI systems.

The enterprise graph solves this by creating a unified, machine-readable model of organisational context that agents can query directly. Glean, the organisation behind the Work AI Index, positions its enterprise graph as the primary technical solution to the botsitting problem it documented. The concept is straightforward: connect your organisation’s data, people, and tools into a graph that AI agents can traverse to understand who is working on what, what documents are relevant, and what decisions have been made. The goal is to eliminate the manual context-feeding that consumes the largest share of botsitting hours.
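The idea of a graph that agents can traverse, with permissions respected, can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not Glean's implementation: the class `EnterpriseGraph`, its methods, and the sample people, project, and documents are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EnterpriseGraph:
    # node id -> node kind ("person", "doc", "project")
    kinds: dict = field(default_factory=dict)
    # node id -> set of (relation, neighbour id) pairs
    edges: dict = field(default_factory=lambda: defaultdict(set))
    # doc id -> set of person ids allowed to read it
    readers: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id: str, kind: str) -> None:
        self.kinds[node_id] = kind

    def link(self, src: str, relation: str, dst: str) -> None:
        # Stored in both directions so traversal can start from either end.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def grant(self, doc_id: str, person_id: str) -> None:
        self.readers[doc_id].add(person_id)

    def context_for(self, person_id: str, project_id: str) -> dict:
        """Project docs this person may read, plus the other people on the project."""
        docs, teammates = [], []
        for _relation, other in self.edges[project_id]:
            if self.kinds[other] == "doc" and person_id in self.readers[other]:
                docs.append(other)
            elif self.kinds[other] == "person" and other != person_id:
                teammates.append(other)
        return {"docs": sorted(docs), "teammates": sorted(teammates)}


graph = EnterpriseGraph()
for node_id, kind in [("ana", "person"), ("ben", "person"),
                      ("q3-launch", "project"), ("spec", "doc"), ("budget", "doc")]:
    graph.add_node(node_id, kind)
graph.link("ana", "works_on", "q3-launch")
graph.link("ben", "works_on", "q3-launch")
graph.link("spec", "belongs_to", "q3-launch")
graph.link("budget", "belongs_to", "q3-launch")
graph.grant("spec", "ana")
graph.grant("spec", "ben")
graph.grant("budget", "ben")

# An agent acting for Ana gets the spec but not the budget she cannot read.
context = graph.context_for("ana", "q3-launch")
print(context)
```

The permission check inside `context_for` is the important design choice: an agent acting on someone's behalf should only ever see what that person could see, so access control lives in the graph query rather than in each tool.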

The numbers make the case. Context-rich environments see 64% less digital exhaustion, 52% less unexplained shipped work, 9% less time botsitting, and 31% less time botshitting compared to context-poor environments. Context anxiety, where AI agents wrap up tasks prematurely as they sense their context limit approaching, is a failure mode specific to context-poor environments. Agents with access to an enterprise graph can retrieve context on demand rather than having it pre-loaded into a finite context window.

The vendor implementation landscape is developing quickly. Anthropic’s Managed Agents architecture (June 2026) includes a “Harness” layer that manages context and state for Claude-powered agents, and the same release added scheduled deployments. The architecture decouples the agent’s reasoning engine (brain), tool-execution environment (hands), and session state (event log and context) so each can be managed independently. That separation reduces the coordination complexity that makes context-feeding so exhausting.
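The brain, hands, and session-state separation can be illustrated with a toy loop. This is a conceptual sketch only and does not reflect Anthropic's actual API: `SessionLog`, `run`, `toy_brain`, and `toy_hands` are hypothetical names, and the "brain" here is a hard-coded function rather than a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SessionLog:
    """Append-only event log: the only place session state lives."""
    events: List[Tuple[str, str]] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> None:
        self.events.append((kind, payload))


# Brain: reads the log and returns (action, argument). Hands: executes a tool call.
Brain = Callable[[SessionLog], Tuple[str, str]]
Hands = Callable[[str], str]


def run(brain: Brain, hands: Hands, log: SessionLog, max_steps: int = 5) -> SessionLog:
    """Drive the loop; brain and hands hold no state of their own."""
    for _ in range(max_steps):
        action, argument = brain(log)
        if action == "finish":
            log.append("final", argument)
            break
        log.append("tool_call", argument)
        log.append("tool_result", hands(argument))
    return log


def toy_brain(log: SessionLog) -> Tuple[str, str]:
    # Finish as soon as any tool result is in the log; otherwise request a lookup.
    results = [payload for kind, payload in log.events if kind == "tool_result"]
    return ("finish", results[-1]) if results else ("call", "lookup:owner")


def toy_hands(call: str) -> str:
    return f"result-of-{call}"


session = run(toy_brain, toy_hands, SessionLog())
print(session.events)
```

Because all state sits in the log, either the reasoning function or the execution environment could be swapped or restarted mid-session and the run could resume from the recorded events.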

Agent-to-agent handoff protocols and agent sandboxing address related concerns about agent sprawl and uncontrolled agent proliferation. The Model Context Protocol has crossed 9,400 public servers, and multi-agent orchestration has reached 22% of production deployments. The infrastructure for context-aware agent management is being built. The question for most organisations is not whether to adopt it, but when and how.

Go deeper: Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis covers the enterprise graph architecture, the context-richness diagnostic framework, and the Claude Managed Agents implementation.

What Does the Managed-AI-Operations Model Look Like in Practice?

The managed-AI-operations model treats AI agents as a fleet requiring continuous stewardship rather than a system requiring occasional maintenance. It has three layers: a context layer (enterprise graph or equivalent that gives agents organisational awareness), a control layer (centralised policy enforcement that governs what agents can do, when they escalate to humans, and how their decisions are audited), and an operations layer (AgentOps, the discipline of monitoring agent performance, detecting drift, managing costs, and maintaining evaluation coverage across the agent fleet).

The context layer eliminates manual context-feeding by giving agents governed access to organisational data. Enterprise graphs and the Model Context Protocol are the primary implementations. The control layer converts unstructured human oversight into structured, auditable intervention. Galileo’s Agent Control enables centralised, hot-reloadable human-in-the-loop policies with confidence-based escalation. UiPath’s governance layers provide fleet-wide policy enforcement.
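Confidence-based escalation, the core of the control layer, reduces to a small routing rule plus an audit trail. The sketch below is a generic illustration and not the API of Galileo, UiPath, or any other vendor; `Policy`, `Decision`, `route_action`, the 0.8 threshold, and the action names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_confidence: float      # below this, a human must review
    always_review: frozenset   # action types that always go to a human


@dataclass(frozen=True)
class Decision:
    route: str   # "auto" or "human"
    reason: str


def route_action(action_type: str, confidence: float, policy: Policy) -> Decision:
    """Decide whether an agent action proceeds automatically or escalates."""
    if action_type in policy.always_review:
        return Decision("human", f"{action_type} always requires review")
    if confidence < policy.min_confidence:
        return Decision(
            "human",
            f"confidence {confidence:.2f} below {policy.min_confidence:.2f}",
        )
    return Decision("auto", "within policy")


# Hypothetical policy: employment-related actions always get a human reviewer.
policy = Policy(min_confidence=0.8,
                always_review=frozenset({"performance_evaluation"}))

audit_log = []
for action, confidence in [("draft_reply", 0.93),
                           ("draft_reply", 0.55),
                           ("performance_evaluation", 0.99)]:
    decision = route_action(action, confidence, policy)
    # Every routing decision is recorded with its reason, so oversight is auditable.
    audit_log.append((action, confidence, decision.route, decision.reason))

for entry in audit_log:
    print(entry)
```

Keeping the policy as data rather than code is what makes centralised updates possible: changing a threshold or adding an always-review category does not require redeploying the agents themselves.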

The operations layer, AgentOps, extends principles from DevOps and MLOps to agentic systems. It contends with non-deterministic behaviour, autonomous tool use, and context-dependent reasoning that conventional monitoring cannot address. Drift detection, cost management, evaluation coverage, and auditability are the core functions. Together, these three layers transform botsitting from a diffuse individual burden into a structured organisational capability.
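Drift detection is the most mechanical of the AgentOps functions, so it is a useful one to sketch. The version below assumes a simple approach: compare a rolling pass rate from automated evaluations against a baseline. The class `DriftMonitor`, the window size, and the tolerance are illustrative choices, not an industry standard.

```python
from collections import deque


class DriftMonitor:
    """Flags drift when the rolling eval pass rate falls below baseline by a margin."""

    def __init__(self, baseline_pass_rate: float, window: int = 20,
                 tolerance: float = 0.10) -> None:
        self.baseline = baseline_pass_rate
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # keeps only the most recent results

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def pass_rate(self) -> float:
        if not self.results:
            return self.baseline
        return sum(self.results) / len(self.results)

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid alarms on tiny samples.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.pass_rate() < self.baseline - self.tolerance


monitor = DriftMonitor(baseline_pass_rate=0.9, window=10, tolerance=0.1)
for _ in range(10):
    monitor.record(True)
healthy = monitor.drifted()

# Three failures push the rolling pass rate to 0.7, under the 0.8 alert line.
for _ in range(3):
    monitor.record(False)
degraded = monitor.drifted()

print(healthy, degraded)
```

In practice the pass/fail signal would come from whatever evaluation suite runs against the agent, and a drift flag would route to the named agent owner rather than to whoever happens to notice.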

The vendor response in mid-2026 signals that this model is coalescing rapidly. Anthropic’s Claude Managed Agents (June 2026) added scheduled deployments and the Harness control architecture, decoupling the agent’s reasoning from its execution environment. The system lets you describe an agent in a chat interface, and it writes the prompt, picks the model, wires up MCP tools, and creates a cloud environment. UiPath’s Agent Builder and Maestro provide AgentOps capabilities: agent creation, evaluation, governance, cost visibility, and fleet-wide orchestration. Microsoft’s Agent Framework provides multi-agent orchestration patterns (sequential, concurrent, group chat, handoff).

These products address different layers of the same managed-operations stack. The 12% of AI agent pilots that reach production share consistent traits that align with this model: 94% have a named agent owner with budget authority, 87% run automated evaluations on every change, 81% scope to a single workflow with binary success criteria, and 74% deploy with explicit human-in-the-loop checkpoints for the first 60 to 90 days.

The other 88% of pilots never reach production. The top blockers are evaluation and observability (64%), governance and compliance (57%), model reliability and non-determinism (51%), and data quality and access (49%). The managed-operations model addresses each of these. It makes human oversight efficient, auditable, and scalable rather than eliminating it.

The August 2026 EU AI Act deadline and the June 2026 vendor releases are converging on the same operational model from different directions. Organisations that adopt the managed-operations model are building the infrastructure that converts the 6.4 hours of unstructured botsitting into structured, role-defined oversight labour, the same oversight the EU AI Act will require for high-risk systems. 56% of enterprises now name a dedicated AI agent owner, up from 11% in 2024, and the role of AI Agent Owner is projected to reach more than 80% penetration by 2027.

Go deeper: Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis covers the enterprise graph architecture, the Claude Managed Agents implementation, and the context-richness diagnostic framework in full.

The Full Picture: Why Botsitting Defines the Next Phase of AI at Work

The botsitter economy will persist as models improve because AI oversight is a structural feature of knowledge work, not a temporary adoption friction. It is the emerging shape of work in an AI-augmented organisation, a workplace where the labour of supervising AI agents is as material to organisational performance as the labour the agents perform. AI does not eliminate the need for human judgement in knowledge work. It relocates it from production to supervision. Understanding the dimensions of that shift is the first step toward managing it productively.

The three dimensions this pillar has mapped describe a single structural shift. The empirical reality (6.4 hours, a 36% failure rate, 69% botshitting), documented at scale in the Work AI Index 2026, establishes that AI oversight is a material category of labour, not an edge case. The organisational paradox, individual gains that fail to aggregate, explains why that labour matters at scale. Individual gains consumed by coordination overhead are not a measurement error. They are a structural feature of how AI-augmented work flows through organisations that have not adapted their coordination infrastructure.

The three-pronged response, explored in full in the response framework, addresses the psychology of disengagement, the regulatory mandate for oversight, and the technical architecture that makes oversight feasible. These are three dimensions of the same question: how do you make AI supervision sustainable rather than exhausting, auditable rather than invisible, and productive rather than parasitic on the gains AI creates?

The EU AI Act’s 2 August 2026 effective date for high-risk AI system rules transforms the botsitter economy from an operational observation into a compliance reality. Organisations deploying AI in any of the eight high-risk categories will need documented human oversight. The Work AI Index 2026 data shows that most organisations are currently providing unstructured, unmeasured, and unsustainable oversight. The regulatory deadline creates urgency that did not exist when botsitting was purely an efficiency concern. It also creates a budget line: compliance-driven oversight is easier to resource than efficiency-driven oversight, even when the activity is operationally identical.

The vendor signal confirms the direction of travel. Anthropic’s June 2026 Claude Managed Agents release and the broader vendor response from UiPath, Galileo, and Microsoft indicate that the managed-AI-operations model is coalescing. Palo Alto Networks’ thesis that AI requires a different operational model from standard software, the central insight of the productivity paradox analysis, is being validated by product releases. The 12% pilot-to-production conversion rate and the projected 80%-plus AI Agent Owner penetration by 2027 suggest the organisational response is following the vendor response.

The botsitter economy is already here, and the organisations that recognise it as a structural shift rather than a temporary friction will be the ones that build the oversight infrastructure to sustain AI adoption at scale. The Work AI Index 2026 put it best: “Companies pulling ahead aren’t spending a greater share of their AI time using AI. They’re spending a greater share on the work around it: setting context, defining what ‘good’ looks like, building judgment, and deciding what should never have been handed to a model.”

Where to go next:

If you are new to the topic, start with the empirical picture from the Work AI Index 2026. The 6.4 hours, the 36% failure rate, and the 69% botshitting rate are the factual foundation everything else builds on.

If you are evaluating organisational AI strategy, the productivity paradox analysis provides measurement frameworks and budgeting heuristics for understanding why individual gains are not adding up the way your dashboards suggest.

If you are responsible for AI governance or implementation, the three-pronged response covers the psychology, regulation, and architecture dimensions you will need to address.

Resource Hub: The Botsitter Economy Deep Dives

The Empirical Picture

Botsitting and Botshitting: The Empirical Picture from the Work AI Index 2026

The full data breakdown from Glean’s survey of 6,000 digital workers. The 6.4-hour botsitting tax, the 36% AI session failure rate, the 69% botshitting admission rate, and what the net productivity calculation looks like after you subtract the oversight overhead. Includes segmentation data showing how top AI achievers’ habits differ from low achievers’. Estimated read: 8 minutes.

The Organisational Question

The AI Productivity Paradox: Why Individual Gains Do Not Scale to Organisational Performance

Why 87% of workers use AI and 73% report personal gains, yet only 13% see organisational improvement. Covers coordination neglect, token maxxing, the Pets vs. Cattle infrastructure distinction, and practical measurement frameworks for evaluating whether AI is improving organisational outcomes. Estimated read: 7 minutes.

The Response Framework

Psychology, Regulation, and Architecture: A Three-Pronged Response to the Botsitting Crisis

How moral disengagement explains the psychological pathway from botsitting to botshitting, how the EU AI Act (effective August 2026) transforms oversight into a legal mandate, and how enterprise graphs and managed-agent architectures provide the technical infrastructure to make oversight sustainable at scale. Estimated read: 8 minutes.

Suggested reading order: Start with Article 1 for the empirical foundation, continue to Article 2 for the organisational implications, and finish with Article 3 for the integrated response.

Frequently Asked Questions

Where can I find the Work AI Index 2026 report by Glean?

The Work AI Index 2026 was published by the Work AI Institute, commissioned by Glean. The survey covered 6,000 digital workers across the US, UK, and Australia between December 2025 and January 2026. It is the source for the 6.4-hour botsitting statistic, the 36% AI session failure rate, the 69% botshitting admission rate, and the segmentation data comparing high AI achievers to low achievers. The full empirical picture article provides the key findings with source attribution.

What percentage of enterprise AI agent pilots actually reach production?

Only 12% of AI agent pilots successfully reach production (88% fail). The 12% that succeed share consistent traits: a named agent owner (2.7 times higher conversion rates), scoped success criteria, automated evaluation coverage, and explicit human-in-the-loop checkpoints. Gartner, Forrester, and McKinsey have each published data supporting variations of this finding across different industry segments.

What is “coordination neglect” and how does it relate to the AI productivity paradox?

Coordination neglect is the pattern where organisations invest in tools that accelerate individual work (AI, automation) without investing in the coordination mechanisms (handoffs, reviews, approvals, alignment) that integrate that accelerated work into organisational output. Individual speed gains pile up at organisational chokepoints that have not been scaled, creating bottlenecks that consume the productivity gains. The productivity paradox article explores this in full.

How does “token maxxing” distort how organisations evaluate AI success?

Token maxxing describes the pattern of optimising for AI output volume (tokens generated, tasks completed, drafts produced) without regard for whether that output improves organisational outcomes. When organisations measure AI success by output volume rather than outcome quality, they inadvertently incentivise workers to produce more AI-generated content rather than better AI-assisted decisions. The measurement section of the productivity paradox article provides the framework for distinguishing output metrics from outcome metrics.

What is “context anxiety” and how does it contribute to botsitting overhead?

Context anxiety is the behaviour where large language models wrap up tasks prematurely as they sense their context window limit approaching. They rush to finish before running out of working memory. This creates unreliable outputs that require human intervention to complete or correct, adding directly to botsitting overhead. Context anxiety is most prevalent in context-poor environments where agents lack access to an enterprise graph and must operate within a finite context window. The enterprise graph section of Article 3 covers this in detail.

What is the “Pets vs. Cattle” distinction and why does it matter for AI operations?

Pets vs. Cattle is an infrastructure paradigm that distinguishes between systems you treat as unique, stateful, and requiring individual attention (pets) and systems you treat as uniform, replaceable, and managed at scale (cattle). Traditional enterprise software follows the cattle model. AI agents follow the pets model because they are probabilistic, context-dependent, and behaviourally variable. You cannot test an AI agent once and assume consistent behaviour, you cannot budget for AI operations using standard software cost models, and you need governance frameworks that account for decision quality rather than just uptime. The productivity paradox article covers this in its analysis of how managing AI differs from managing standard software.

Where can I find official EU AI Act text and compliance guidance?

The official EU AI Act text (Regulation 2024/1689) is published in the Official Journal of the European Union and accessible through EUR-Lex. The European AI Office provides implementation guidance, and the CEN/CENELEC standards bodies are developing the harmonised technical standards that will operationalise the Act’s requirements. The EU AI Board coordinates national supervisory authorities. The EU AI Act section of Article 3 provides an overview of the human oversight mandates and the August 2026 compliance timeline.

How do context-rich and context-poor AI environments compare in practice?

The Work AI Index 2026 data shows that context-rich environments (where AI agents can self-contextualise through an enterprise graph or equivalent infrastructure) see 64% less digital exhaustion and 31% less botshitting compared to context-poor environments. Context-rich environments also reduce context anxiety, lower the repetition burden of feeding the same organisational information to multiple tools, and enable more reliable agent outputs because agents operate with current, connected organisational data rather than stale or fragmented context. The enterprise graph section of Article 3 provides the full comparison.