Business | SaaS | Technology
Feb 17, 2026

Vibe Coding and the Death of Craftsmanship – The Complete Guide for Engineering Leaders

AUTHOR

James A. Wondrasek

Something strange is happening in software development. Your developers are excited about AI coding tools. They feel faster, more productive, more creative. They’re generating code at speeds that would have seemed impossible two years ago. And yet your delivery metrics haven’t moved.

You’re not imagining this. In a randomised controlled trial by METR, experienced developers predicted AI tools would make them 24% faster. They actually took 19% longer. And even after experiencing that slowdown, they still believed they’d been sped up by 20%. That’s a 43-percentage-point gap between perception and reality, and it’s playing out in engineering organisations worldwide.

Meanwhile, the term “vibe coding”—coined by OpenAI co-founder Andrej Karpathy in February 2025 to describe accepting AI-generated code without reviewing it—became Collins English Dictionary’s Word of the Year. A quarter of Y Combinator’s Winter 2025 batch had codebases that were 95% AI-generated. And industry-wide adoption of AI coding assistants has hit 91% across surveyed companies.

This guide synthesises evidence from seven independent studies, explores arguments both for and against AI coding adoption, addresses the organisational challenges you’re dealing with right now—from junior skill development to senior scepticism to security risks—and provides actionable frameworks for responsible adoption. It’s built as a hub linking to eight detailed articles, each exploring a dimension of this challenge in depth.

What this guide covers

Start wherever makes sense for you. If you’re trying to understand the terminology, begin with definitions. If you’re ready to act, jump to the framework.

What Is Vibe Coding and How Does It Differ from AI-Assisted Engineering?

Vibe coding—coined by Andrej Karpathy in February 2025—describes accepting AI-generated code without reviewing its internal structure, essentially “giving in” to code you don’t fully understand. It contrasts sharply with AI-assisted engineering, where developers use AI tools while maintaining code review, testing, and complete understanding. The distinction matters because vibe coding produces throwaway prototypes effectively but creates mounting technical debt and security risks in production systems. Collins English Dictionary named it Word of the Year for 2025.

The distinction from professional practice matters. There’s a spectrum here, and where your team sits on it has real consequences for code quality, security, and long-term maintainability.

The spectrum from vibes to engineering

At one end, vibe coding treats AI output as final. You describe behaviour, accept the code, and move on. If something breaks, you paste the error message back into the AI and hope for the best. Karpathy described his own code “growing beyond my usual comprehension.” For prototypes and personal experiments, this works fine. For anything you need to maintain, it’s a problem.

In the middle sits what Kent Beck—creator of Test-Driven Development and signatory of the Agile Manifesto—calls “augmented coding.” Beck spent four weeks building a complex B+ tree implementation using AI tools while maintaining strict discipline: TDD cycles, careful review of intermediate results, small frequent commits, and readiness to intervene when the AI went off track. As Beck puts it: “In augmented coding you care about the code, its complexity, the tests, & their coverage.”
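To make that discipline concrete, here is a minimal sketch of the loop (the function and test are hypothetical, not from Beck’s project): the human writes a failing test that pins down intent, the AI proposes an implementation, and nothing is accepted until the test passes and the code has been read and understood.

```python
# Augmented-coding loop, sketched: test first (human), implementation
# second (AI-proposed), review always. Runnable with pytest.

def test_interleave_keeps_leftover_items():  # human-written: defines intent
    assert interleave([1, 2, 3], ["a"]) == [1, "a", 2, 3]

def interleave(xs, ys):  # AI-proposed: read and understood before acceptance
    out = []
    for i in range(max(len(xs), len(ys))):
        if i < len(xs):
            out.append(xs[i])
        if i < len(ys):
            out.append(ys[i])
    return out
```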

At the professional end, AI-assisted engineering uses AI tools while maintaining the same standards you’d apply to human-written code: review, testing, documentation, and full understanding. Simon Willison, co-creator of Django, draws a clear line: “If an LLM wrote every line of your code, but you’ve reviewed, tested, and understood it all, that’s not vibe coding in my book—that’s using an LLM as a typing assistant.”

The distinction matters because the problems showing up in research—the productivity paradox, security vulnerabilities, technical debt accumulation—stem largely from vibe coding practices migrating into production environments where they don’t belong.

For the full terminology breakdown, framework comparisons, and guidance on where each approach applies, see What Is Vibe Coding and How Does It Differ from AI-Assisted Engineering.

Why Do Developers Feel Fast But Deliver Slow? Understanding the Productivity Paradox

The productivity paradox reveals a 43-percentage-point gap between perception and reality: developers using AI tools predict 24% faster completion but measure 19% slower in controlled studies. This paradox occurs because AI accelerates code generation while creating bottlenecks elsewhere—review queues grow, pull requests become larger, and code churn doubles. Individual velocity increases, but organisational throughput stalls because development is a system, not just typing speed. METR’s randomised controlled trial—the gold standard for establishing causality—confirmed this invisible slowdown across 16 experienced developers and 246 real issues.

This paradox happens because AI accelerates the part of development that was never the bottleneck.

Why typing speed doesn’t equal delivery speed

Think of it through the lens of Amdahl’s Law, a concept familiar from systems design. If code generation is 20% of total development time and you make it 10x faster, your maximum system speedup is about 22%. But the other 80%—understanding requirements, designing architecture, reviewing code, writing tests, debugging, deploying—remains unchanged.
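The arithmetic is worth a quick sanity check. A minimal sketch:

```python
# Amdahl's Law: the overall speedup when only a fraction of the work
# is accelerated. Confirms the ~22% ceiling quoted above.

def overall_speedup(fraction: float, factor: float) -> float:
    return 1 / ((1 - fraction) + fraction / factor)

print(overall_speedup(0.20, 10))    # ~1.22: a 22% ceiling from a 10x tool
print(overall_speedup(0.20, 1000))  # ~1.25: even near-infinite speedup caps at 25%
```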

What actually happens when AI accelerates code generation is that downstream bottlenecks get worse. GitClear’s analysis of 211 million lines of code found that refactoring—the maintenance work that keeps codebases healthy—dropped from 25% of changed lines in 2021 to under 10% by 2024. Code duplication increased approximately 4x. Code churn—prematurely merged code that gets rewritten shortly after merging—nearly doubled. Copy-pasted code exceeded moved code for the first time in two decades.
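If you want a rough read on churn in your own repositories, one crude proxy is the share of files touched again within a short window of a previous change. This sketch is a heuristic only, far simpler than GitClear’s line-level analysis:

```python
# Crude churn proxy: files re-touched within 14 days of a prior commit.
# Caveat: misreads filenames that are all digits; ignores line-level detail.
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta

log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:%ct"],
    capture_output=True, text=True, check=True,
).stdout

touches = defaultdict(list)  # path -> timestamps of commits touching it
ts = None
for line in log.splitlines():
    if line.strip().isdigit():   # a commit's unix timestamp
        ts = datetime.fromtimestamp(int(line))
    elif line.strip():           # a file path changed in that commit
        touches[line.strip()].append(ts)

window = timedelta(days=14)
rechurned = sum(
    1 for times in touches.values()
    if any(later - earlier <= window
           for earlier, later in zip(sorted(times), sorted(times)[1:]))
)
print(f"files re-touched within 14 days: {rechurned}/{len(touches)}")
```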

Faros AI’s study of over 10,000 developers across 1,255 teams confirmed the pattern at organisational scale: developers completed 21% more tasks, but companies saw no measurable improvement in delivery velocity or business outcomes. Pull request volume increased dramatically while review times grew longer because the reviews themselves didn’t get any easier—there was just more to review.

The DORA 2025 Report captures this dynamic precisely: “AI’s primary role is as an amplifier, magnifying an organisation’s existing strengths and weaknesses.” If your processes are strong, AI can help. If they’re not, AI makes existing problems worse, faster.

For the complete evidence synthesis including all seven studies, the mechanisms behind the paradox, and how to explain it to your board, see The AI Productivity Paradox – Why Developers Feel Fast But Deliver Slow.

What Does the Research Actually Show? Evidence from 7 Major Studies

Seven independent studies reveal consistent patterns: individual developers generate code faster but organisations don’t deliver faster. METR’s RCT showed a 19% slowdown despite a perceived 24% speedup, GitClear documented 4x more code duplication, CodeRabbit found 1.7x more issues per pull request, Apiiro measured 10x more security vulnerabilities, and Faros AI confirmed that 21% more tasks completed delivered zero business outcome improvement. Vendor studies from Microsoft and Accenture report gains, but methodology matters: observational studies versus randomised trials, and task completion versus business outcomes, tell very different stories.

How to read the research landscape

The table below summarises the major studies so you can compare methodologies and findings at a glance.

| Study | Method | Sample | Key Finding |
|---|---|---|---|
| METR (2025) | Randomised controlled trial | 16 experienced developers, 246 issues | 19% slowdown despite perceived 24% speedup |
| GitClear (2020-2024) | Longitudinal code analysis | 211 million lines of code | Refactoring collapsed from 25% to <10%, duplication up 4x |
| CodeRabbit (2025) | Pull request analysis | 470 open-source PRs | 1.7x more major issues, 75% more logic errors in AI code |
| Apiiro (2025) | Enterprise code analysis | Fortune 50 companies, thousands of developers | 10x more security findings, 322% more privilege escalation |
| Faros AI (2025) | Organisational metrics | 10,000+ developers, 1,255 teams | 21% more tasks completed, zero delivery improvement |
| DORA (2025) | Industry survey + model | Broad industry sample | AI amplifies existing strengths and weaknesses |
| DX Q4 (2025) | Developer survey | 135,000 developers, 435 companies | Developers report saving 3.6 hours per week |

Independent research vs vendor studies

Vendor research tells a consistently positive story. Microsoft reported a 26% task completion increase. Stack Overflow surveys show 72% favourable Copilot ratings and 84-90% adoption rates. DX’s Q4 2025 report found that developers report saving 3.6 hours per week.

The difference comes down to what gets measured and how. Vendor studies typically use observational methods and measure subjective satisfaction or task completion. Independent studies use randomised controlled trials or longitudinal data and measure business outcomes and code quality. Self-reported time savings don’t match clock time. Task completion doesn’t equal delivery velocity. And “feeling productive” is not the same as delivering value.

For your decision-making, weight independent research more heavily—especially studies using randomised controlled trials, which are the gold standard for establishing whether something actually causes an effect rather than merely correlating with one.

For the detailed methodology comparisons, study-by-study findings, and a framework for evaluating research claims, explore both The AI Productivity Paradox and The Case Against Vibe Coding.

The Case Against Vibe Coding: Why Understanding Code Still Matters

Critics argue vibe coding sacrifices long-term sustainability for short-term velocity. Evidence shows AI-generated code increases technical debt 30-41% (refactoring collapse, 4x duplication), security vulnerabilities 2.74-10x (privilege escalation, architectural flaws), and cognitive complexity 39%. Kent Beck, ThoughtWorks, and experienced engineers warn that code you don’t understand becomes unmaintainable, creates debugging nightmares (“the 70% Problem”), and erodes the craftsmanship practices that enable sustainable pace. For smaller teams without dedicated maintenance capacity, the impact is disproportionate.

Technical debt is accelerating

GitClear’s data shows what happens to codebases as AI adoption grows. Refactoring—the practice of improving code structure without changing behaviour—has been in decline since AI coding tools gained traction. This isn’t developers choosing to skip refactoring. It’s a structural shift: AI generates new code readily but doesn’t initiate the maintenance work that keeps codebases healthy. Meanwhile, code duplication has risen sharply, violating the DRY (Don’t Repeat Yourself) principle that’s fundamental to maintainability.

CodeRabbit’s analysis quantified the quality gap directly: AI co-authored code showed 1.7x more issues overall and 75% more logic errors—incorrect dependencies, flawed control flow, misconfigurations. These aren’t syntax problems. They’re the kind of bugs that pass automated tests but break in production under conditions the AI didn’t anticipate. The long-term costs of vibe coding go well beyond the obvious errors.

The 70% Problem

Experienced developers describe a recurring pattern with AI-generated code: it gets you roughly 70% of the way there and looks correct, but contains subtle issues that take longer to debug than writing the code from scratch would have. Authentication that works for the happy path but fails on edge cases. Data handling that works at small scale but creates race conditions under load. API integrations that pass basic tests but fail silently when services return unexpected responses.

Kent Beck, despite being enthusiastic about augmented coding, acknowledges this tension: “I feel good about the correctness and performance, not so good about the code quality. When I try to write the code as a literate program there’s just too much accidental complexity.”

If your team is already stretched thin, the implications are worth taking seriously. Larger enterprises can absorb a significant increase in technical debt across dedicated maintenance teams. You probably can’t. When your codebase becomes harder to maintain, it’s the same people who built it who have to clean it up, and they’re already busy building the next thing.

For the full risk assessment including quantified technical debt costs, the 70% Problem in detail, and Kent Beck’s and ThoughtWorks’ perspectives, see The Case Against Vibe Coding – Understanding, Craftsmanship, and Long-Term Costs.

The Case For AI Coding Tools: When They Actually Help

Proponents argue AI democratises software creation, eliminates cognitive toil (boilerplate, repetitive patterns), and enables non-programmers to build functional applications. Evidence shows gains exist in specific contexts: greenfield projects with no legacy constraints, simple repetitive patterns, prototyping, and learning. Stack Overflow reports 72% favourable Copilot ratings with 84-90% adoption. The key is strategic selective adoption—using AI for cognitive toil while avoiding complex architecture and security-critical code where it consistently underperforms.

Democratisation and cognitive toil

The strongest argument for AI coding tools is democratisation. Simon Willison puts it well: “I believe everyone deserves the ability to automate tedious tasks in their lives with computers. You shouldn’t need a computer science degree or programming bootcamp.” Kevin Roose, a New York Times journalist and non-programmer, used vibe coding to build several small-scale applications he described as “software for one”—personalised tools that would never have existed without AI lowering the barrier.

For experienced developers, the benefit is different. Kent Beck, working on his B+ tree implementation, described the experience as addictive: “I make more consequential programming decisions per hour, fewer boring vanilla decisions. Yak shaving mostly goes away.” He used AI to write a C extension for Python performance, run coverage testing, and propose tests—tasks that would have been daunting without AI assistance. The key is that Beck already understood what he was building. AI handled cognitive toil—boilerplate, repetitive patterns, test scaffolding—while he focused on architecture and design.

Where AI genuinely helps

The evidence points to specific contexts where AI tools deliver real gains: greenfield projects with well-defined requirements, boilerplate and repetitive patterns, test generation and coverage expansion, documentation drafts, prototyping, and learning unfamiliar frameworks.

The problems arise when AI use extends from these appropriate contexts into complex architecture, security-sensitive code, or poorly documented legacy systems. Stack Overflow’s data illustrates this tension: 72% of developers rate Copilot favourably, but 66% report “almost right but not quite” frustration—code that looks correct but needs careful review and modification.

The 25% of Y Combinator’s Winter 2025 startups with 95% AI-generated codebases represent a real signal. But startups building greenfield MVPs face very different constraints than established companies maintaining production systems with paying customers. Understanding when AI coding tools actually help is the critical skill for any engineering leader navigating adoption decisions.

For the full balanced evaluation including selective adoption criteria and when AI genuinely helps vs hinders, see The Case For AI Coding Tools – Democratisation, Velocity, and When They Actually Help.

Who Trains the Next Generation? Junior Developers in the Age of AI

If junior developers use AI for tasks that traditionally built expertise—debugging, reading documentation, understanding error messages—how do they develop judgment and deep knowledge? This succession planning concern threatens your senior engineering pipeline: today’s juniors become tomorrow’s seniors, but AI-dependent skill development may create a generation unable to debug AI-generated code or make architectural decisions. The apprenticeship model requires evolution, not abandonment, and the training choices you make now determine your engineering capability in 2030.

The apprenticeship model under pressure

Software engineering has always relied on a form of apprenticeship. Juniors learn by doing hard things badly, getting feedback, and gradually developing intuition. They read code, debug failures, and build mental models of how systems work. AI tools shortcut this process. When a junior developer pastes an error message into an AI assistant instead of reading the stack trace, they get a fix faster but miss the learning that comes from understanding why the error occurred.

Anthropic’s own research quantified this effect. In a randomised controlled trial, junior developers using AI assistance scored significantly lower on comprehension assessments—50% compared to 67% for those coding by hand, roughly a two-letter-grade difference. The largest gap appeared in debugging questions, precisely the skill you need most when reviewing AI-generated code.

The interaction pattern matters enormously. Anthropic found that developers who delegated completely to AI scored worst, while those who used AI for code generation but then asked for explanations—building comprehension after the fact—scored nearly as well as hand-coders. The tool isn’t the problem. How it’s used is.

Succession planning, not just training

Think of this as a succession planning challenge. Your senior engineers in 2030 are the juniors you’re hiring and training now. Kent Beck’s framework is useful here: AI deprecates some skills (manual boilerplate coding, memorising syntax) but amplifies others (architectural vision, strategic thinking, taste). Training needs to evolve so that juniors learn to validate AI suggestions, understand generated code deeply, and maintain the architectural judgment that no AI currently provides.

Pluralsight’s research warns that “AI-induced skills decay isn’t visible—it appears through what falls through cracks” and becomes costly only after problems emerge in production. If you run a smaller team, you can’t afford to lose a generation of skill development. Your training programmes need to be deliberate about which skills AI is allowed to shortcut and which it absolutely must not. The question of who trains the next generation of engineers requires strategic intent, not improvisation.

For the complete analysis including training frameworks, skill development strategies, and how to structure junior development in an AI-augmented environment, see Junior Developers in the Age of AI – Who Trains the Next Generation of Engineers.

How Serious Are Security Risks with AI-Generated Code?

Security vulnerabilities increase 2.74-10x with AI-generated code, with 45% failure rates on secure coding benchmarks. Specific risks include 322% more privilege escalation paths, 40% higher secrets exposure, and 153% more architectural design flaws. For organisations in regulated industries (healthcare, finance) or those with enterprise customers requiring SOC2, ISO, GDPR, or HIPAA compliance, AI-generated code without enhanced security review creates specific risks—audit trails, data residency, and liability questions that remain largely unresolved. Quality gates and security review processes require redesign for AI-generated code.

The architectural blind spot

What makes AI security risks worth attention is where they occur. Apiiro found that AI assistants reduced trivial syntax errors by 76%—the easy stuff that linters catch anyway—but created more of the architectural flaws that are genuinely harmful. Their blunt assessment: “AI is fixing the typos but creating the timebombs.”

The reason is structural. AI tools generate code based on patterns without understanding the security context of the broader system. They don’t know that a particular service handles authentication, that credentials shouldn’t be propagated across microservices, or that a database query needs parameterisation to prevent injection. In one documented case, a single AI-driven pull request changed an authorisation header across multiple services, but one downstream service wasn’t updated, creating a silent authentication failure.
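The parameterisation point is worth seeing concretely. A minimal sketch (the table and payload are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
user_input = "x' OR '1'='1"  # classic injection payload

# The pattern to avoid: string interpolation lets the payload rewrite
# the query so it matches every row:
#   f"SELECT * FROM users WHERE email = '{user_input}'"

# Parameterised form: the driver binds the value separately from the
# SQL text, so input can never change the query's structure.
rows = conn.execute(
    "SELECT * FROM users WHERE email = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the payload is treated as a literal string
```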

Real incidents illustrate the stakes. Lovable, a Swedish vibe coding platform, had 170 of its 1,645 apps carrying vulnerabilities that let anyone access users’ personal information. Replit Agent deleted a user’s database despite explicit instructions not to make changes. AI-assisted developers in Apiiro’s data exposed Azure Service Principals and Storage Access Keys nearly twice as often as those coding manually.

Compliance implications

If you operate in regulated industries or sell to enterprise customers requiring SOC2, ISO, GDPR, or HIPAA compliance, AI-generated code creates specific risks. Audit trails become harder to maintain when code is generated rather than written. Data residency concerns arise when proprietary code passes through external AI services. Liability questions remain largely unresolved.

Apiiro’s recommendation is straightforward: “If you’re mandating AI coding, you must mandate AI AppSec in parallel. Otherwise, you’re scaling risk at the same pace you’re scaling productivity.” Understanding the full scope of AI-generated code security risks is essential before expanding your team’s AI usage in production systems.

For the comprehensive security risk assessment, compliance framework analysis, specific vulnerability patterns, and a quality gates playbook, see AI-Generated Code Security Risks – Why Vulnerabilities Increase 2.74x and How to Prevent Them.

Why Are Senior Engineers More Sceptical About AI Coding Tools?

Senior engineers adopt AI tools at lower rates than juniors and express greater scepticism—not because they resist innovation, but because they’ve debugged enough production systems to recognise long-term sustainability risks. They see code quality degradation, understand maintenance burdens of technical debt, and worry about code review bottlenecks created by AI-accelerated development. Their scepticism is legitimate expertise, not Luddite resistance, and your AI adoption strategy requires their buy-in to succeed.

Experience breeds healthy caution

Senior engineers have spent years maintaining codebases where “clever” shortcuts became maintenance nightmares. They’ve debugged subtle bugs in code they didn’t write. They’ve accumulated technical debt that slowed delivery for months or years. When they look at AI-generated code and see excessive duplication, architectural shortcuts, and context-free implementations, they recognise patterns that create long-term problems—patterns juniors haven’t experienced yet.

They also face the direct consequences of AI-accelerated development. Apiiro’s data shows AI-assisted developers producing 3-4x more commits packaged into fewer but significantly larger pull requests. Bigger PRs slow review, dilute reviewer attention, and raise the odds that a subtle break slips through. The people doing those reviews? Predominantly senior engineers. They’re absorbing the cost of AI-accelerated development without the satisfaction of rapid code generation that juniors enjoy.

Fast Company reported in September 2025 on “the vibe coding hangover,” with senior software engineers describing “development hell” when working with AI-generated codebases. This isn’t abstract concern—it’s lived experience from people who have to make the code work in production.

Building consensus instead of mandates

The political dimension matters. Brian Armstrong at Coinbase mandated AI coding assistants for all engineers, reportedly firing those who refused. But he also admitted “we’re still figuring out” how to manage AI-coded codebases. Stripe’s John Collison captured the tension precisely: “It’s clear that it is very helpful to have AI helping you write code. It’s not clear how you run an AI-coded codebase.”

Mandates create resistance and don’t teach strategic usage. Your organisation needs both the enthusiasm juniors bring and the quality instincts seniors provide. Building consensus requires validating senior concerns as legitimate expertise, not dismissing them as resistance to change. The patterns of how senior engineers are adapting—or not—reveal important signals about what makes AI adoption succeed or fail.

For adoption pattern analysis, team dynamics strategies, and a practical guide to building consensus across experience levels, see How Senior Engineers Are Adapting to AI Coding Tools or Resisting Them.

What Tools Exist and How Do They Differ?

AI coding tools span three categories with different risk profiles: autocomplete tools (GitHub Copilot’s inline suggestions), chat-based assistants (ChatGPT, Claude for code generation), and agentic systems (Cursor Composer, Replit Agent making autonomous multi-file changes). Autocomplete carries the lowest risk but limited impact; agents offer the highest capability but the greatest vulnerability potential. Tool selection depends on your team’s maturity, codebase complexity, and risk tolerance—though usage policy ultimately matters more than which specific tool you choose.

The tool spectrum

Autocomplete tools like GitHub Copilot’s inline suggestions offer the lowest risk and lowest impact. They suggest code completions as you type, similar to a context-aware spell checker for code. Most developers start here, and the vast majority of AI usage remains at this level—Faros AI found that even among active users, most rely only on autocomplete features.

Chat-based assistants—ChatGPT, Claude, Copilot Chat—allow conversational code generation. You describe what you need, get code back, and decide what to use. The quality depends entirely on your ability to evaluate the output. Simon Willison notes that Claude Artifacts provides a sandbox that “prevents accidents from causing harm elsewhere,” while Cursor, initially intended for professional developers, has fewer safety rails.

Agentic systems like Cursor’s Agent mode, Copilot’s coding agent, and Replit Agent make autonomous multi-file changes, run terminal commands, and create pull requests from issue descriptions. These offer the highest capability but the greatest vulnerability potential—a single agentic change can propagate errors across an entire codebase. GitHub Copilot’s coding agent created over a million pull requests between May and September 2025.

Tool choice matters less than usage policy

Here’s the thing: vibe coding is possible with any of these tools. A developer can accept Copilot suggestions without review just as easily as they can paste ChatGPT output without understanding it. Conversely, a disciplined engineer can use agentic tools responsibly by reviewing every change. Your usage policy—defining when to use AI, when to avoid it, and what quality gates apply—matters far more than which specific tool your team adopts.

When evaluating tools, look beyond popularity. Consider security posture (where does your code go?), data residency (does proprietary code leave your environment?), integration with existing workflows, and cost at your team’s scale. Enterprise tools typically offer better audit trails and compliance features, but they also carry higher licensing costs that need justifying against measurable outcomes.

For more on how tool selection fits into a broader adoption strategy, including decision frameworks and quality gate specifications, see A Framework for Responsible AI-Assisted Development – When to Use AI and When to Avoid It.

Whose Advice Should You Trust? Mapping Expert Perspectives

Expert perspectives span a spectrum: Andrej Karpathy coined “vibe coding” to describe throwaway projects, Simon Willison argues AI-assisted programming requires understanding the code, Kent Beck demonstrates that “augmented coding” can preserve craftsmanship, and Brian Armstrong (Coinbase CEO) mandates adoption while admitting “we’re still figuring out” how to manage AI-coded codebases. Trust research organisations (METR, DORA, GitClear) over vendor claims, and experienced engineers over executive enthusiasm, when making evidence-based adoption decisions.

Researchers and practitioners

The most reliable voices are those with rigorous methodology and no product to sell. METR, a non-profit AI evaluation organisation, conducted the gold-standard randomised controlled trial referenced throughout this guide. DORA, Google Cloud’s research programme, published the 2025 report introducing an AI Capabilities Model that identifies the organisational factors determining whether AI scales. GitClear analysed 211 million lines of code longitudinally, tracking quality changes over five years.

Among practitioners, Kent Beck brings decades of software engineering credibility. His augmented coding framework demonstrates that AI and craftsmanship coexist when you maintain discipline. Simon Willison—over 80 vibe coding experiments published—provides pragmatic boundary definitions. He won’t commit code he couldn’t explain to someone else.

Executives and vendors

Executive enthusiasm tends to run ahead of evidence. Armstrong mandated AI adoption at Coinbase. Lemonade CEO Daniel Schreiber told employees “AI is mandatory.” Citi rolled out agentic AI to 40,000 developers. These are signals about market direction, not evidence about effectiveness.

Andrew Ng argued that the term “vibe coding” itself misleads people into assuming software engineers just “go with the vibes” when using AI tools—a fair point about terminology, even if it doesn’t change the underlying practices. Gary Marcus, a cognitive scientist, cautioned that AI-generated code reproduces existing patterns without the originality that comes from deep understanding. Jeremy Howard of Fast.ai warned about AI complacency, urging teams to “build to last” rather than optimise for speed alone.

Vendor research from Microsoft, GitHub, and Accenture consistently shows productivity gains—but using observational methods and self-reported metrics rather than the controlled experiments that independent researchers employ. Both types of evidence are useful. Neither type tells the complete story on its own.

The weight of evidence, methodology considered, supports selective adoption rather than either wholesale embrace or blanket resistance.

For more on how to evaluate competing claims and identify which research should inform your decisions, see The AI Productivity Paradox – Why Developers Feel Fast But Deliver Slow.

Your Action Plan: Frameworks for Responsible AI-Assisted Development

Responsible AI adoption requires four elements: decision frameworks (when to use AI vs. avoid based on task complexity, codebase maturity, security criticality), measurement approaches (DORA metrics, cycle time, not vanity metrics), quality gates (automated security scanning, review triggers, acceptance criteria upfront), and process redesigns (handling increased PR volume, training approaches, consensus-building). The goal isn’t banning or mandating AI tools—it’s strategic selective adoption that captures individual gains at organisational scale.

Decision frameworks

Not every coding task benefits equally from AI assistance. The evidence points to a matrix: task complexity and codebase maturity determine where AI helps and where it hurts.

AI works well for greenfield projects with well-defined requirements, boilerplate and repetitive patterns, test generation and coverage expansion, documentation drafts, and exploration of unfamiliar frameworks. AI struggles with complex architecture and distributed systems, security-sensitive code, poorly documented legacy systems, performance-critical sections, and system integration challenges.

Your policy should be specific enough that a developer facing a particular task can determine whether AI assistance is appropriate without asking their manager. The responsible AI-assisted development framework provides decision matrices and policy templates to make this practical.
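As a sketch of what “specific enough” might look like, here is a hypothetical policy lookup encoding the contexts above. The category names and routing are illustrative, not a standard:

```python
# Hypothetical AI-usage policy lookup: a developer classifies the task,
# the policy answers. Categories mirror the contexts in this guide.

AI_APPROPRIATE = {
    "boilerplate", "test_generation", "documentation_draft",
    "greenfield_well_specified", "framework_exploration",
}
AI_AVOID = {
    "security_sensitive", "complex_architecture",
    "legacy_poorly_documented", "performance_critical",
}

def ai_assistance_policy(task_type: str) -> str:
    if task_type in AI_AVOID:
        return "avoid: write and review by hand"
    if task_type in AI_APPROPRIATE:
        return "allowed: review, test, and understand all output"
    return "unclassified: escalate for a policy decision"

print(ai_assistance_policy("boilerplate"))         # allowed: ...
print(ai_assistance_policy("security_sensitive"))  # avoid: ...
```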

Measurement that matters

Measure business outcomes, not activity. DORA’s four key metrics—deployment frequency, lead time for changes, change failure rate, and mean time to restore service—tell you whether your organisation is actually delivering faster. If AI tools help, you’ll see faster deployment frequency with stable or improving change failure rates.

Watch out for vanity metrics. Lines of code, PR count, and commit volume all increase with AI tools. None of them correlate with business value. If your PR volume increases but your cycle time stays flat or grows, the productivity gains are illusory.
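As a sketch of outcome-oriented measurement, here are two of the four DORA metrics computed from hypothetical deployment records (the field names are made up for illustration):

```python
# Lead time for changes and change failure rate from deployment records.
from datetime import datetime
from statistics import median

deployments = [
    {"committed": datetime(2026, 1, 30), "deployed": datetime(2026, 2, 2),  "failed": False},
    {"committed": datetime(2026, 2, 6),  "deployed": datetime(2026, 2, 9),  "failed": True},
    {"committed": datetime(2026, 2, 11), "deployed": datetime(2026, 2, 12), "failed": False},
]

# Lead time for changes: commit to running in production, take the median.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
print("median lead time:", median(lead_times))

# Change failure rate: share of deployments needing remediation.
print("change failure rate:", sum(d["failed"] for d in deployments) / len(deployments))
```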

Quality gates and process redesign

AI-generated code needs enhanced review, not less. Automated quality gates reduce the manual review burden, and your review processes need to handle the increased volume without burning out your senior engineers. Beck’s approach to augmented coding provides a useful model: maintain discipline around testing, keep commits small and focused, and treat AI output the way you’d treat code from a prolific but inexperienced contributor.
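What risk-appropriate routing might look like as an automated gate, sketched with illustrative thresholds and paths you would tune to your own codebase:

```python
# Hypothetical review-routing gate: size, tests, and touched paths
# decide how much human scrutiny a pull request gets.

SENSITIVE_PREFIXES = ("auth/", "payments/", "infra/")

def review_tier(paths: list[str], lines_changed: int, has_tests: bool) -> str:
    if any(p.startswith(SENSITIVE_PREFIXES) for p in paths):
        return "senior review + security scan required"
    if lines_changed > 400 or not has_tests:
        return "blocked: split the PR or add tests before human review"
    return "standard review"

print(review_tier(["auth/login.py"], 50, True))    # senior review + security scan
print(review_tier(["docs/intro.md"], 900, False))  # blocked: split or add tests
```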

The DORA 2025 Report identifies organisational capabilities that determine whether AI tools scale positively: the greatest returns come not from the tools themselves, but from strategic focus on the underlying organisational system.

For the complete implementation playbook including decision matrices, metrics dashboards, quality gate specifications, and review process redesign templates, see A Framework for Responsible AI-Assisted Development – When to Use AI and When to Avoid It.

Resource Hub: Complete Vibe Coding and AI Coding Tools Library

Understanding the Landscape

What Is Vibe Coding and How Does It Differ from AI-Assisted Engineering Precise definitions, the terminology spectrum (vibe coding vs AI-assisted vs augmented coding), legitimate use cases, and when each approach applies. Essential foundation before exploring evidence or implementation.

The AI Productivity Paradox – Why Developers Feel Fast But Deliver Slow Comprehensive evidence synthesis from 7 major studies, productivity placebo mechanisms, Amdahl’s Law application, and why individual velocity doesn’t translate to organisational throughput. Critical for understanding why DORA metrics stay flat despite developer excitement.

Evaluating the Evidence and Arguments

The Case Against Vibe Coding – Understanding, Craftsmanship, and Long-Term Costs Quantified technical debt accumulation (30-41% increase), code quality degradation evidence, the 70% Problem, and craftsmanship perspectives from Kent Beck and ThoughtWorks. Essential for understanding long-term sustainability risks.

The Case For AI Coding Tools – Democratisation, Velocity, and When They Actually Help Balanced exploration of legitimate benefits, democratisation evidence, cognitive toil reduction, strategic selective adoption contexts, and when AI genuinely helps vs. hinders. Required reading for developing nuanced policy positions.

Organisational Challenges and Dynamics

Junior Developers in the Age of AI – Who Trains the Next Generation of Engineers Skill development implications, apprenticeship model evolution, succession planning concerns, and training frameworks for responsible AI usage. Critical for managing long-term engineering capability.

How Senior Engineers Are Adapting to AI Coding Tools or Resisting Them Adoption pattern differences, review bottleneck realities, team dynamics navigation, and consensus-building strategies. Essential for managing cultural tensions and process breakdowns.

Risk Mitigation and Security

AI-Generated Code Security Risks – Why Vulnerabilities Increase 2.74x and How to Prevent Them Comprehensive security risk assessment (2.74-10x vulnerability increases), compliance implications (SOC2/ISO/GDPR/HIPAA), quality gates implementation, and mitigation playbook. Required for organisations in regulated industries or with enterprise customers.

Implementation and Action

A Framework for Responsible AI-Assisted Development – When to Use AI and When to Avoid It Complete decision frameworks, measurement dashboards (DORA metrics + cycle time), quality gates implementation guides, code review redesign approaches, and training curricula. Your primary implementation resource translating understanding to operational execution.

Frequently Asked Questions

Is vibe coding always bad for production systems?

Not necessarily. The distinction is about context-appropriate tool usage. Vibe coding works well for throwaway prototypes, learning experiments, and personal projects where long-term maintainability doesn’t matter. Production systems require AI-assisted engineering with code review, testing, and full understanding—what Kent Beck calls “augmented coding.” Our guide to what vibe coding is and how it differs from AI-assisted engineering maps this spectrum precisely. The risk is vibe coding practices migrating from prototypes to production without appropriate quality gates. Simon Willison is direct about it: “Vibe coding your way to a production codebase is clearly risky.”

How do I know if AI coding tools are actually helping my team?

Measure business outcomes using DORA metrics: deployment frequency, lead time for changes, change failure rate, and mean time to restore service. If AI tools are helping, you’ll see faster deployment frequency with stable or improving change failure rates. If you see increased PR volume but longer cycle times, growing technical debt indicators, or more security vulnerabilities, the productivity gains are illusory. Our productivity paradox article details what to measure and how.

Should I mandate or ban AI coding tools for my team?

Neither extreme works well. Mandates create resistance and don’t teach strategic usage. Bans prevent learning and alienate developers who’ve already adopted tools independently. Instead, establish clear policies: when to use AI (cognitive toil, boilerplate, greenfield), when to avoid it (complex architecture, security-sensitive code, poorly documented legacy), and what quality gates apply. Our framework article provides policy templates and consensus-building approaches.

How serious are the security vulnerabilities in AI-generated code?

The data is concerning for production systems. Research shows 2.74-10x increases in security vulnerabilities, with 322% more privilege escalation paths and 10x more security findings overall. AI tools excel at syntax but struggle with architectural security decisions—authentication, authorisation, secrets management. For regulated industries or enterprise customers requiring SOC2/ISO certification, AI-generated code without enhanced security review creates compliance risks. Our security risks article details specific vulnerability patterns and mitigation strategies.

Will AI-dependent juniors become competent senior engineers?

Only if training evolves appropriately. Anthropic’s research found that junior developers using AI assistance scored significantly lower on comprehension assessments—but the interaction pattern mattered enormously. Developers who used AI for generation and then asked for explanations scored nearly as well as hand-coders. AI doesn’t inherently prevent learning; it shifts what must be taught. Train juniors to validate AI suggestions, understand generated code, and maintain architectural judgment. Our junior developer skill development guide provides training frameworks.

Why are senior developers more resistant to AI tools than juniors?

Senior resistance reflects legitimate expertise, not technological conservatism. Experienced engineers have maintained codebases where shortcuts became nightmares, debugged subtle bugs in code they didn’t write, and accumulated technical debt that slowed delivery for years. They recognise patterns in AI-generated code that create long-term problems juniors haven’t experienced yet. Additionally, seniors bear the review burden of AI-accelerated development. Our senior engineers article explores adoption patterns and consensus-building strategies.

Can I use AI coding tools while maintaining software craftsmanship?

Yes, through what Kent Beck calls “augmented coding”—using AI as a skilled assistant while maintaining code quality standards, testing discipline, and full understanding. The key is treating AI like a prolific but junior developer whose output requires review, not an oracle producing production-ready code. Beck used AI extensively on a complex project while maintaining strict TDD discipline, careful review of intermediate results, and readiness to intervene. Our case against article and framework article explore this approach in depth.

What’s the difference between vendor research and independent research on AI productivity?

Vendor research (Microsoft, GitHub, DX) typically shows 20-26% productivity gains using observational studies measuring task completion and developer satisfaction. Independent research (METR, GitClear, Apiiro) uses randomised controlled trials or longitudinal data measuring business outcomes and code quality, often showing slowdowns or no improvement despite increased activity. The gap reflects methodology: controlled experiments vs observations, objective metrics vs self-reporting, business outcomes vs activity metrics. Both are useful. For decision-making, weight independent research more heavily. Our productivity paradox article provides the complete methodology comparison.

How do I redesign code review to handle the increased volume of pull requests?

Code review capacity becomes the bottleneck when AI accelerates code generation. The approaches that work include automated quality gates that reduce the manual review surface, AI code review checklists highlighting common AI-generated issues, junior upskilling to handle routine reviews, strategic batching of related changes, and acceptance criteria defined before code generation rather than after. The goal isn’t reviewing everything AI produces with equal rigour—it’s risk-appropriate review calibrated to code complexity and how sensitive the area of the codebase is. Our framework article provides complete review redesign approaches.

What should my board know about AI coding tools and productivity?

Three key messages. First, developers are adopting AI tools enthusiastically and report feeling faster. Second, independent research shows organisations aren’t delivering faster—individual velocity gains don’t translate to business outcomes due to review bottlenecks, technical debt, and code quality issues. Third, selective adoption with quality gates can capture gains while mitigating risks, but requires investment in measurement, training, and process redesign. Avoid binary “AI will 10x productivity” or “AI destroys quality” narratives—a nuanced, evidence-based position is both more accurate and more credible. Our productivity paradox and framework articles provide board communication frameworks.

Where to Go from Here

The vibe coding debate is fundamentally about whether speed and understanding can coexist. The evidence says they can—but only with deliberate effort.

AI coding tools aren’t going away. Adoption is at 91% and climbing. The question for your organisation isn’t whether to use them but how to use them without sacrificing the code quality, security, and engineering capability that your business depends on.

Start with honest measurement. Baseline your DORA metrics before making adoption decisions. Define clear policies that distinguish between appropriate and inappropriate AI usage contexts. Implement quality gates that catch AI-specific problems before they reach production. Redesign your review processes to handle increased volume. And invest in your junior engineers’ development—they’re your senior engineering pipeline, and the skills they build now determine your organisation’s capability in five years. If you operate in regulated industries, understanding the compliance implications of AI-generated code should be an early priority.

The path forward isn’t vibe coding and it isn’t banning AI. It’s the harder middle ground: context-specific adoption with the discipline to use these tools where they help and avoid them where they don’t.

If you’re ready to start building that capability, our framework for responsible AI-assisted development is the place to begin. If you’re still assessing the evidence, start with the productivity paradox. If the technical debt and craftsmanship concerns resonate with you, the case against vibe coding provides the quantified evidence you need. And if you’re navigating team dynamics around adoption, the senior engineers article addresses that directly.

