Business | SaaS | Technology
Apr 17, 2026

Spec-Driven Development Is Replacing Vibe Coding as the Professional Standard for AI Teams

AUTHOR

James A. Wondrasek

Ramp's internal coding agent now handles roughly 30% of all merged pull requests at the company. Not 30% of grunt work — 30% of production PRs. That number didn't arrive through vibe coding.

When agents run unattended for hours at a time, you can't avoid the workflow question: how do you reliably direct something that isn't waiting for your next message? Vibe coding — Andrej Karpathy's term for fast, exploratory, AI-heavy programming — works well for prototyping. It falls apart the moment your agent is Level 3 or higher and executing multi-step tasks without you looking over its shoulder.

Spec-driven development is the professional answer. Write a structured, unambiguous specification before the agent starts. Define what “done” looks like before the work begins. This article is part of our complete guide to autonomous AI coding agents, which covers the full landscape of how teams are deploying and governing these tools at scale.


What is spec-driven development, and why is it becoming the standard workflow for AI-agent-managed teams?

Spec-driven development means writing a structured, unambiguous specification before engaging an AI coding agent — not after the fact, not alongside the work, and not as a conversational prompt.

The specification is the primary engineering artefact. It defines scope, acceptance criteria, constraints, and a verifiable “done” condition the agent can check without coming back to you. That’s categorically different from prompt engineering. A prompt tells an agent what to do next in a single interaction. A spec establishes what a completed outcome looks like and gives the agent the authority to reach it autonomously — across multiple steps and decisions, without your involvement.

As Addy Osmani (Head of Chrome Developer Experience at Google) put it: “I have more recently been focusing on the idea of spec-driven development, having a very clear plan of what it is that I want to build.”

Level 3+ task agents — Claude Code, GitHub Copilot Coding Agent, Cursor background agents — execute multi-step engineering tasks across multiple files and interact with CI/CD pipelines. At that level, vague input doesn't produce gentle corrections. It produces compounding errors. Augment Code and leading engineering voices describe spec-driven development as the emerging professional standard, adopted alongside "context engineering" as a paired discipline. It sits at the centre of the landscape covered in our complete guide to autonomous AI coding agents.


Vibe coding vs. spec-driven development: when is each approach appropriate?

Vibe coding is a legitimate practice. Andrej Karpathy coined the term in early 2025 — his phrase was “fully giving in to the vibes, embracing exponentials, and forgetting that the code even exists.” It’s for prototyping, proof-of-concept work, personal projects — anywhere the cost of not fully understanding every generated line is low.

The issue is ownership. As Simon Willison puts it: “If an LLM wrote every line of your code, but you’ve reviewed, tested, and understood it all, that’s not vibe coding — that’s using an LLM as a typing assistant.” The difference is whether you can own the output.

At team scale, with Level 3+ agents executing multi-step tasks overnight, vibe coding’s exploratory posture creates compounding ambiguity. Agents interpret loose instructions differently across iterations. Review burden accumulates. Code rot sets in. The choice is a professional judgement call — prototyping versus production, individual experimentation versus team output, one-off scripts versus overnight autonomous loops.


Why do Level 3 and Level 4 agents require structured specifications to work reliably?

The five-level autonomy taxonomy developed by Swarmia defines agent levels by one variable: how much of the work does the agent do before returning to you for feedback? Understanding the autonomy levels that spec-driven development unlocks is the foundation for understanding why the methodology exists.

At Level 1 and Level 2, the human is in the loop continuously. Loose input gets corrected through dialogue. At Level 3 (Task Agent) and Level 4 (Autonomous Teammate), that loop is gone. Ambiguity doesn’t get corrected — it compounds. An agent writing code across ten files, running tests, and submitting a PR has made hundreds of micro-decisions before you see the result. Without a precise “done” condition, those decisions drift from intent.

The organisational consequence is measurable. GitClear analysed 211 million lines of code changes from 2020 to 2024 and found duplicate code blocks went from 8.3% to 12.3% of changed lines, while refactoring declined from 25% to under 10%. Volume goes up, quality diverges, and the codebase accumulates in ways no one fully owns.


The Continuous Coding Loop: what is the pattern behind safe overnight agent workflows?

The Continuous Coding Loop — also called the Ralph Wiggum Technique — is a six-step iterative pattern for running AI coding agents unattended across extended sessions. Geoffrey Huntley popularised it, Ryan Carson built a production implementation, and Addy Osmani documented it formally in January 2026.

The six steps:

1. Pick the next task from a defined backlog.
2. Implement — the agent writes or modifies code for that feature or fix.
3. Validate against the defined acceptance criteria.
4. Commit the result.
5. Update task status and log learnings.
6. Reset the agent's context, then repeat from step 1.

The context reset is the architectural key. Without it, context from earlier iterations accumulates and degrades the agent’s reasoning about the current task. As Osmani puts it: “By resetting its memory each iteration, the agent stays focused. This ‘stateless but iterative’ design is key to reliability.” Each loop begins with a clean context from the spec and current codebase state — not from what happened three tasks ago.
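The six-step loop can be sketched in a few lines. Everything below — `Task`, `implement`, `validate`, `commit` — is an illustrative stand-in for whatever your agent platform actually provides, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    acceptance: str      # verifiable pass/fail criterion from the spec
    status: str = "todo"

def run_loop(backlog, implement, validate, commit):
    """Run the Continuous Coding Loop until no 'todo' tasks remain."""
    log = []
    while True:
        pending = [t for t in backlog if t.status == "todo"]  # 1. pick next task
        if not pending:
            break
        task = pending[0]
        context = {"spec": task.acceptance}   # 6. fresh context every iteration
        result = implement(task, context)     # 2. agent writes/modifies code
        if validate(task, result):            # 3. check acceptance criteria
            commit(task, result)              # 4. commit the result
            task.status = "done"              # 5. update status, log learnings
            log.append(f"{task.name}: passed")
        else:
            task.status = "blocked"           # escalate to a human instead
            log.append(f"{task.name}: failed validation")
    return log
```

The load-bearing detail is that `context` is rebuilt from the spec and the current codebase state on every pass; nothing carries over from earlier iterations.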

The “pick task” step only works if tasks are specified precisely enough for the agent to execute without clarification. Osmani again: “Define a SPEC and convert it to tasks — start by writing a clear specification for the feature, possibly with the help of an AI to flesh out edge cases.” Each task fits in one AI session with unambiguous pass/fail criteria. You’ll also want the sandbox infrastructure for overnight agent loops that the Continuous Coding Loop requires to operate safely.


How does spec-driven development reduce code review burden at source?

The review bottleneck is the operational problem that shows up most directly when you push high-autonomy coding agents hard. Faros AI telemetry from 10,000+ developers shows teams with high AI adoption merge 98% more PRs — but PR review time increases 91%. Each PR is 154% larger. Volume is up. Review capacity is down.

The root cause isn’t volume alone — it’s opacity. PRs generated from vague instructions require reviewers to reverse-engineer the agent’s intent before evaluating the implementation. Spec-driven development shifts the reviewer’s job from investigative to confirmatory. When a PR is anchored to a precise specification, the question is simply: did the agent meet the acceptance criteria? The spec doubles as the review checklist.
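One way to picture that shift: if the spec's acceptance criteria are machine-readable, the review checklist falls out mechanically. The structure and field names below are invented for illustration:

```python
def review_checklist(spec: dict) -> list[str]:
    """Render a spec's criteria and constraints as checkbox items for PR review."""
    items = [f"[ ] Met: {c}" for c in spec["acceptance_criteria"]]
    items += [f"[ ] Untouched: {c}" for c in spec["constraints"]]
    return items

# Hypothetical spec for a rate-limiting task
spec = {
    "acceptance_criteria": ["429 returned above the rate limit",
                            "existing middleware tests pass"],
    "constraints": ["public API schema unchanged"],
}
print("\n".join(review_checklist(spec)))
```

The reviewer works down the list instead of reverse-engineering intent from the diff.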

Research on 567 Claude Code PRs across 157 open-source projects found “too large” was among the top three rejection reasons, and nearly 40% of agentic PRs combined multiple tasks. Self-contained, well-specified tasks produce reviewable PRs. The review bottleneck problem that spec-driven development addresses at source becomes tractable once you have the input discipline in place.


Spec-driven development and context engineering: two sides of the same discipline

Spec-driven development handles the human side: what structured specification does the developer provide before the agent begins? Context engineering handles the agent side: what persistent configuration, standards, and architectural context does the agent read before any task?

Both are required at Level 3+. Teams that invest in context engineering — CLAUDE.md files, AGENTS.md files — without spec-driven workflows still produce ambiguous task instructions. Teams that write good specs without context files give agents well-defined tasks but no consistent behavioural grounding. Research found that AGENTS.md files alone produced a 29% reduction in median runtime and 17% reduction in output token consumption. Context files tell the agent how to behave consistently. Specs tell it what to accomplish specifically. Together they define the full input surface.
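Context files are behavioural, not task-specific. A minimal AGENTS.md fragment might look like this — every rule below is an invented example, not a recommendation:

```markdown
<!-- AGENTS.md: persistent behavioural context, read before every task -->

## Build & test
- Run `make test` before proposing any commit.

## Conventions
- TypeScript strict mode; never use `any`.
- Every new module gets a unit test in the adjacent `__tests__/` directory.

## Boundaries
- Never edit files under `migrations/`.
- Fail the task and ask rather than guess when a requirement is ambiguous.
```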

The METR randomised controlled trial is the clearest evidence for why both matter. Sixteen professional developers using Cursor Pro with Claude 3.5 Sonnet completed tasks 19% slower with AI assistance — while estimating they were 20% faster. The most plausible explanation is the absence of both disciplines. Context engineering as the complementary governance discipline is the other half of what makes autonomous agents work reliably at scale.


The Engineering Playbook: the strategic output of a spec-driven team

The Engineering Playbook is the formalised, versioned repository of a team’s architectural decisions, coding conventions, and engineering standards. It’s what a spec-driven culture produces and maintains over time — and what distinguishes it from a traditional wiki is governance.

The Playbook is treated as a first-class engineering artefact: versioned like code, reviewed like code, actively maintained against context drift. Packmind frames it directly: “In a world where agents write most of the implementation, the engineering playbook becomes the primary artifact of a team’s expertise.” It answers a practical CTO question: how do you ensure agents make decisions consistent with how your team wants to build software? A rule in one CLAUDE.md file won’t help the agent in a different IDE or the team onboarding next quarter. The Playbook scales the standard.

The compounding return is what makes it worthwhile. Each verified spec contributes a reusable pattern. Each context update contributes a governed standard. Packmind clients report a 40% increase in tech lead productivity and twice-as-fast developer onboarding. Spotify's background coding agent produced 1,500+ merged AI-generated pull requests — that result doesn't happen without a well-governed engineering context behind it.

The playbook is already being written in every team using AI coding tools. The question is whether it’s being written intentionally — or assembled by accident, one undocumented correction at a time. For the full picture of AI coding agent workflows, the patterns here are part of a larger operating model covered in our complete guide to autonomous AI coding agents as engineering teammates.


Frequently Asked Questions

What is the difference between spec-driven development and writing a detailed prompt?

A prompt instructs an agent on what to do next in a single interaction. A spec defines what a completed outcome looks like and provides enough context for a Level 3+ agent to reach that outcome across multiple steps without returning for clarification. The key difference is the “done” condition: specs are verifiable; prompts are instructional.

Is vibe coding the same as not using AI properly?

No. Vibe coding is a legitimate approach coined by Andrej Karpathy for fast, exploratory, AI-heavy programming. It’s appropriate for prototyping, learning, and personal projects. The professional concern is applying vibe coding’s posture to production team workflows where agents run unattended and reviewers can’t reconstruct what the agent was trying to accomplish.

What makes a good specification for an AI coding agent?

An effective spec defines the specific task scope (no broader than one agent session), the acceptance criteria (verifiable outcomes the agent can check), the constraints (what the agent must not change), and the interface with adjacent code (what the agent can assume about the surrounding codebase). It doesn’t need to be long — it needs to be unambiguous.
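Under those four requirements, a spec might look like the sketch below. The task, paths, and limits are all invented for illustration:

```markdown
# Spec: add rate limiting to /api/export   <!-- hypothetical task -->

## Scope
One agent session. Touch only `middleware/` and its tests.

## Acceptance criteria
- Requests over 100/min per token receive HTTP 429.
- All existing middleware tests still pass.

## Constraints
- Do not change the public API schema.
- No new third-party dependencies.

## Interfaces
- `TokenBucket` in `middleware/ratelimit.py` exists and may be assumed correct.
```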

What is the context reset step in the Continuous Coding Loop and why does it matter?

The context reset is step 6 in the six-step Continuous Coding Loop: after committing and updating task status, the agent’s context is cleared before the next iteration begins. Without this reset, context from earlier iterations accumulates and distorts the agent’s reasoning about the current task. The reset ensures each loop iteration begins from a clean state.

How do spec-driven development and context engineering work together?

Spec-driven development handles the human side of agent input: structured task specifications with verifiable outcomes. Context engineering handles the agent side: persistent configuration files (CLAUDE.md, AGENTS.md) that tell the agent how to behave consistently across all tasks. Both are required at Level 3+; neither alone is sufficient.

Can spec-driven development work with any AI coding tool?

Yes — spec-driven development is a workflow methodology, not a tool requirement. It works with Claude Code, GitHub Copilot Coding Agent, Cursor background agents, and any other Level 3+ tool. The spec is a human-authored document; the agent reads it as part of its task input regardless of which platform executes the task.

Who attributed the Continuous Coding Loop pattern and where does it come from?

The Continuous Coding Loop (also called the Ralph Wiggum Technique) was popularised by Geoffrey Huntley. Ryan Carson built a production implementation. Addy Osmani, Google Chrome engineering leader, documented the pattern and its application to self-improving coding agent workflows in January 2026.

How does spec-driven development reduce the code review bottleneck?

When a PR is generated from a precise specification, reviewers can evaluate whether the agent met the spec’s acceptance criteria — a confirmatory task. Without a spec, reviewers must reverse-engineer the agent’s intent from the code alone — an investigative task that’s significantly slower and more error-prone, especially as PR volume from agents increases.

What is the Engineering Playbook and how does it relate to context engineering?

The Engineering Playbook is the formalised, versioned repository of a team’s architectural decisions, coding conventions, and engineering standards. It’s the strategic artefact that context engineering produces and maintains over time. Each verified spec and each context file update contributes content to the Playbook.

At what team size does spec-driven development become necessary?

Spec-driven development becomes necessary when Level 3+ agents run multi-step tasks without continuous human feedback — regardless of team size. For a solo developer running overnight agent loops, it’s as necessary as for a 500-person engineering organisation. Team size determines the governance overhead around specs, not the underlying need for them.

What happens if an agent runs overnight without a well-defined spec?

Without a verifiable “done” condition, the agent fills specification gaps with plausible but potentially unmaintainable choices across hundreds of micro-decisions. The result is code that passes tests but accumulates in ways no one fully owns. The morning review becomes a forensic exercise rather than a confirmation.

How do I transition a team from vibe coding to spec-driven development?

The transition is a governance change, not a tooling change. Start by establishing a spec template for the task types your agents run most frequently. Introduce the Continuous Coding Loop pattern for overnight workflows. Build context files (CLAUDE.md/AGENTS.md) that reflect your architectural standards. Over time, formalise verified specs into an Engineering Playbook. Start with the highest-autonomy, highest-risk agent tasks — not all workflows at once.
