Your engineering team is smaller than it was a year ago. The code output is bigger. Anthropic’s data shows 79% of Claude Code conversations are classified as automation, pull requests per author are up 20%, and PR size is up 18%. The number of humans reviewing those PRs has not kept pace.
This article is part of our comprehensive series on team compression and its implications for engineering leadership, where we explore every dimension of what AI is doing to engineering organisations. This piece focuses on the compressed engineering team: what breaks when AI writes most of your code, and how to build a governance model that actually survives AI velocity.
Slapping an approval gate on every deployment is not going to survive this velocity. What you need is a governance model that protects code quality and institutional knowledge without killing the speed your AI tooling delivers.
This article gives you that model: guardrails over gates, multi-agent validation, processes-as-code, and a governance checklist you can adapt to your team right now.
Why does the code review function break when AI writes most of your pull requests?
The review-to-risk ratio is inverting. CodeRabbit’s December 2025 report looked at 470 open-source PRs and found AI-authored changes produced 1.7x more issues per PR and a 24% higher incident rate compared to human-only code. Logic and correctness issues are 75% more common. Security issues run up to 2.74x higher.
PRs are getting larger, approximately 18% more additions as AI adoption increases, while change failure rates are up roughly 30%. Meanwhile, your review team is the same size or shrinking.
Then there is the developer experience side of things. 45% of developers say debugging AI-generated code takes longer than debugging code they wrote themselves. 46% actively distrust AI tool accuracy. And 66% cite “almost right but not quite” as their primary frustration. That last one is the killer: code that looks correct, passes a quick scan, but hides logic errors that blow up in production.
The structural mismatch goes deeper than volume. 43.8% of AI coding sessions are directive — meaning there is minimal human interaction during generation. You are getting production-grade code volumes with prototype-grade human oversight. Reviewer fatigue compounds this: more AI code means more cursory reviews means more incidents means even less time for thorough reviews. It is a vicious cycle.
As Greg Foster from Graphite put it: “If we’re shipping code that’s never actually read or understood by a fellow human, we’re running a huge risk.”
You cannot solve this by asking your remaining engineers to review harder. The governance has to be systemic.
What is the difference between guardrails and gates — and why does it matter for AI-generated code?
Gates are blocking checkpoints. Hard approvals, manual sign-offs, one-size-fits-all templates that stop deployment until someone ticks a box. They work fine when code output is measured in a handful of PRs per developer per week. They fall apart when AI agents generate code faster than your team can open the PRs, let alone review them.
Guardrails are different. They are proactive, embedded controls that shape how developers and AI agents behave by default. Nick Durkin, Field CTO at Harness, describes the goal as making it hard to do the wrong thing rather than stopping people from doing anything at all. In practice, 80-90% of security, compliance, and resilience requirements get baked into the pipeline automatically. The remaining space is where your team innovates.
Here is the key difference: gates require human bandwidth at every checkpoint. Guardrails require human bandwidth at design time, then enforce automatically from that point forward.
Plenty of organisations learned this the hard way. Standardised templates sounded like good governance until they proved too restrictive, generating constant exceptions or pushing teams to quietly work around the process entirely. When something fails in a guardrails model, the system explains why and shows the next step forward. Policy violations become learning moments, not blockers.
In a pipeline this looks like secret detection pre-commit hooks that catch credentials before they reach version control, dependency vulnerability checks that block at a severity threshold you set, and automated scanning on every PR. None of these need a human to intervene on each change. All of them catch problems a fatigued reviewer might miss.
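To make the first of those concrete, a pre-commit secret scanner can be sketched in a few lines of Python. The patterns below are illustrative only; production tools such as gitleaks or detect-secrets ship far broader rule sets and entropy-based detection.

```python
import re

# Illustrative patterns only: real secret scanners ship hundreds of rules.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{20,}['\"]"
    ),
}

def scan_text(text: str) -> list[str]:
    """Return the names of any secret patterns found in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]

def scan_files(paths: list[str]) -> bool:
    """Return True (i.e. block the commit) if any staged file matches."""
    blocked = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for name in scan_text(f.read()):
                print(f"{path}: suspected {name}")
                blocked = True
    return blocked
```

Wired into a pre-commit hook that exits non-zero when `scan_files` returns `True`, this stops the credential before it ever reaches version control, with no human in the loop.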
The most successful teams will rely on flexible templates combined with policy-driven pipelines. The guardrails model is the only viable approach when code output exceeds human review capacity. If your team is compressed, it already does.
How does multi-agent code validation work — and can AI reliably check AI?
The “who watches the watcher” problem has a practical answer: you use multiple watchers. The multi-model code review approach runs code through different LLMs. One model generates the code, a different model audits it. Different models have different biases and different failure modes, so cross-checking surfaces issues the generating model would miss on its own.
Harness predicts teams will use specialised AI agents each designed to perform a narrow, well-defined role, mirroring how effective human teams already operate. CodeRabbit’s agentic validation already goes beyond syntax errors — it understands context, reasons about logic, predicts side effects, and proposes solutions.
Properly configured AI reviewers can catch 70-80% of low-hanging fruit, freeing your humans to focus on architecture and business logic. But multi-agent validation does struggle with business logic, architectural intent, and context-dependent decisions. It does not know why your system is built the way it is. It cannot evaluate whether a technically correct change violates an unwritten architectural principle that only exists in your senior engineer’s head.
This is exactly why the senior engineer as the primary governance owner matters so much in compressed teams. Multi-agent validation handles volume; senior engineers carry the judgement that no automated layer can replicate. The practical requirement: validation agents must run in the CI/CD pipeline as automated guardrails, not as optional steps someone remembers to trigger. Treat multi-agent validation as a risk-reduction layer, not a replacement for human judgement on high-stakes code paths.
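The generation/audit split can be sketched as follows. The names and data shapes are illustrative, not any vendor's API; the one invariant worth encoding is that the model that generated the code never gets a vote on whether it ships.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    model: str     # which model produced this finding
    severity: str  # "info", "warn", or "block"
    message: str

def merge_verdict(findings: list[Finding], generator: str) -> str:
    """Combine findings from audit models distinct from the generator.

    Findings from the model that generated the code are discarded, so
    the generator never vouches for its own output. Returns "block" if
    any independent auditor blocks, "warn" if any warns, else "pass".
    """
    independent = [f for f in findings if f.model != generator]
    if any(f.severity == "block" for f in independent):
        return "block"
    if any(f.severity == "warn" for f in independent):
        return "warn"
    return "pass"
```

A "block" verdict here should route the PR to a human, not auto-reject it: the point of the layer is triage, not final judgement.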
As Addy Osmani puts it: “Treat AI reviews as spellcheck, not an editor.”
What does DevSecOps look like in a pipeline where AI agents write production code?
When AI generates changes faster than humans can review them, security cannot sit downstream as a separate team delivering reports weeks later. That model simply does not survive AI velocity.
In teams that are getting this right, security is fully integrated into the delivery lifecycle. Security teams define the policies. Engineers understand the rules. Pipelines enforce them automatically. Nobody is waiting on a report.
The governance gap is real though. 91% of executives plan to increase AI investment, but only 52% have implemented AI governance or regulatory-aligned policies. As Karen Cohen, VP Product Management at Apiiro, put it: “In 2026, AI governance becomes a compliance line-item.”
So what does baked-in security actually look like? Automated SAST/DAST scanning on every PR. Dependency vulnerability checks. Secret detection pre-commit. Licence compliance scanning. Container image scanning. These run automatically on every AI-generated change without anyone needing to remember to trigger them.
But automated scanning has limits. Traditional AppSec tools like SAST and SCA were built to detect known vulnerabilities and code patterns. They were not designed to understand how or why code was produced. This is where human review earns its keep.
The rule is straightforward: if AI-generated code touches authentication, payments, secrets, or untrusted input, require a human threat model review regardless of what the automated guardrails say. If you are in a regulated industry — SOC 2, HIPAA, financial services — this is not optional. Harness AI already labels every AI-generated pipeline resource with ai_generated: true and logs it in the Audit Trail. That is where things are heading.
How does processes-as-code make governance scale independently of team size?
If you have worked with infrastructure-as-code (Terraform, Pulumi, CloudFormation) you already know the value proposition. Declarative configurations that are version-controlled, reviewable, repeatable, and auditable.
Forrester expects 80% of enterprise teams to adopt genAI for processes-as-code by 2026. The idea extends that same principle to governance and security policies. Your automated controls become declarative policy files stored in Git, enforced automatically, and auditable via version control.
AI lowers the syntax barrier here. Instead of digging through documentation for domain-specific languages, you describe what you want in plain English. Harness AI generates OPA policies from plain-English descriptions: “Create a policy that requires approval from the security team for any deployment to production that hasn’t passed a SAST scan.” The AI generates the Rego code. Your experts review and approve. Governance scales without bottlenecks.
This is how you answer the question: “How do I scale review when the review team is smaller than the code output?” You do not scale the team. You scale the rules.
When a regulator or auditor asks “how do you ensure X,” the answer is a Git commit history, not a process document. All AI-generated resources become traceable, auditable, and compliant by design.
Automated governance handles the pipeline. The remaining risk lives in the humans operating it.
What is skill atrophy — and how do you prevent it when AI writes most of your code?
When AI generates the majority of your code and human review becomes cursory, developers gradually lose the ability to read, debug, and reason about code at the level required to catch the problems AI introduces. This is skill atrophy, and it is a governance risk — not just a personal development concern.
The evidence is building. Software developer employment for ages 22-25 declined nearly 20% by September 2025 compared to its late 2022 peak. Fewer juniors entering the pipeline means fewer seniors in five years. As Stack Overflow put it: “If you don’t hire junior developers, you will someday never have senior developers.”
GitClear’s 2025 research found an 8-fold increase in frequency of code blocks duplicating adjacent code. That is a signature of declining code ownership. People are accepting AI output without reading it closely enough to notice it repeats what is already there.
The hidden cost: fewer people doing less manual coding means tacit knowledge — the “why” behind the system — erodes faster. Your most experienced engineers carry institutional knowledge no AI model has. If they stop reading code because AI writes it, that knowledge layer thins out.
Here is what to do about it.
Deliberate code-reading exercises. Run weekly sessions where engineers review AI-generated code to understand it, not just approve it. Think of it as a book club for your codebase.
AI-off sprints. Deliberately allocate time for periodic manual coding to keep debugging intuition sharp. Even one sprint per quarter keeps the skills warm.
Deep review mandates. AI-generated code touching high-stakes paths gets genuine engagement with the logic, not a rubber stamp.
Pair programming with AI as the third participant. One human writes, one human reviews, AI assists. The review skill is preserved because a human is always reading code.
Rotation of review responsibilities. If only one person understands a given code path and they leave, you lose the knowledge and the review capability in one hit.
As Bill Harding, CEO of GitClear, warned: “If developer productivity continues being measured by commit count or lines added, AI-driven maintainability decay will proliferate.” Measure understanding, not output.
What does a minimum viable governance framework look like for a compressed engineering team?
Your governance posture has to be proportionate to your team size, risk profile, and regulatory exposure. A 15-person SaaS team and a 150-person FinTech team need different frameworks, but both need a framework. Here is a checklist you can adapt.
Pipeline Automation (guardrails):
- Automated SAST/DAST scanning on every AI-generated PR
- Dependency vulnerability checks with automatic blocking at critical severity
- Secret detection pre-commit hooks
- Licence compliance scanning
- AI-generated PR labelling — every PR produced by or with AI tagged, distinguishing directive (fully AI-generated) from collaborative (human-AI) code
Review Protocols:
- Multi-agent validation layer where the generation model differs from the audit model
- Human review mandate for code touching auth, payments, secrets, untrusted input
- Higher test coverage thresholds for AI-generated code — 80%+ line coverage given the 24% higher incident rate per PR
- Review protocols requiring reviewers to demonstrate understanding, not just approval
Governance-as-Code:
- All governance policies stored in Git, version-controlled, auditable
- Pre-commit policy hooks enforcing rules at the point of integration
- Declarative policy files (OPA/Rego or equivalent) for compliance requirements
Human Capability Maintenance:
- Weekly code-reading exercises for the team
- Periodic AI-off development sprints
- Rotation of review responsibilities to prevent specialisation atrophy
- Documentation of architectural decisions and institutional knowledge
The bottom line: you cannot safely reduce team size without a governance framework that compensates for fewer human reviewers. Teams with strong pipelines, clear policies, and shared rules will move faster than ever. Teams without them will ship riskier and blame AI for problems that already existed.
If you have not started, start small. Secret detection pre-commit hooks, automated SAST on every PR, and AI-generated PR labelling can be implemented in days, not weeks. Build from there. For the broader context on what AI team compression means for engineering organisations — from labour market evidence through role transformation to planning frameworks — the complete resource covers every dimension.
To see how leading companies are approaching AI code governance in practice — and which patterns are holding up — the Shopify, Klarna, and Tailwind case studies are instructive. And if you are ready to think about how governance readiness affects headcount confidence, the headcount model guide covers governance as a direct input to your compression decisions.
FAQ
Is vibe coding acceptable in a governed engineering pipeline?
Vibe coding — shipping AI-generated code without deep review — is rejected professionally by 72% of developers. In a governed pipeline, it is acceptable only for prototyping and non-production code. Anything entering production must pass through your automated guardrails and, for high-risk paths, human review. No exceptions.
What percentage of AI-generated code should receive human review?
All AI-generated code should pass through automated guardrails — 100%. Human review should be mandatory for code touching authentication, payments, secrets, and untrusted input. For everything else, a risk-based sampling approach is more sustainable than trying to review every line.
How do I label AI-generated pull requests in my pipeline?
Set up automated PR tagging that flags any commit produced by or with AI coding tools. Most CI/CD platforms support metadata tagging. Distinguish between fully AI-generated code (directive pattern) and human-AI collaborative code (feedback loop pattern), because the review burden is different for each.
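A labelling rule can be as simple as a threshold on how much of the AI output humans actually touched. The 10% cut-off below is an illustrative assumption, not a standard; tune it to your own data.

```python
def label_pr(ai_lines: int, human_edited_lines: int) -> list[str]:
    """Classify a PR for governance labelling.

    Hypothetical rule: no label for fully human PRs; "ai:directive"
    when humans edited under 10% of the change; "ai:collaborative"
    otherwise.
    """
    if ai_lines == 0:
        return []
    human_share = human_edited_lines / (ai_lines + human_edited_lines)
    if human_share < 0.10:
        return ["ai:directive"]
    return ["ai:collaborative"]
```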
What is the difference between policy-as-code and processes-as-code?
Policy-as-code refers to machine-readable declarative files (e.g., OPA, Rego) encoding specific compliance and security rules. Processes-as-code is the broader Forrester concept: entire governance workflows expressed as version-controlled, auditable configurations. Policy-as-code is the implementation layer; processes-as-code is the organisational model.
How do I convince my board that AI code governance is worth the investment?
Lead with the governance gap: 91% of executives plan to increase AI investment but only 52% have governance frameworks. AI-generated code creates 1.7x more problems with a 24% higher incident rate. Governance is cheaper than incident remediation, regulatory fines, or reputational damage. That is a straightforward business case.
Can multi-agent validation replace human code review entirely?
No. It catches syntactic, structural, and known-pattern issues effectively but struggles with business logic, architectural intent, and context-dependent decisions. It reduces the human review burden for routine code but cannot replace human judgement on high-stakes paths.
What test coverage should I require for AI-generated code?
Higher thresholds than human-written code, given the documented higher incident rate. 80%+ line coverage and mandatory integration tests for any AI code interacting with external systems or data stores is a defensible baseline.
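That baseline is straightforward to encode as a merge check. The field names are assumptions for illustration; the thresholds come from the baseline above.

```python
def meets_ai_coverage_bar(line_coverage: float,
                          touches_external_systems: bool,
                          has_integration_tests: bool) -> bool:
    """Gate an AI-generated change on the suggested baseline:
    80%+ line coverage, plus integration tests whenever the change
    interacts with external systems or data stores."""
    if line_coverage < 0.80:
        return False
    if touches_external_systems and not has_integration_tests:
        return False
    return True
```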
How does the Anthropic directive interaction pattern affect my review process?
43.8% of Claude Code conversations follow a directive pattern — the user specifies the task and the AI completes it with minimal interaction. That means nearly half of AI-generated code is produced with limited human oversight during generation. Your review process must compensate: directive-pattern code needs the same rigour as code from an untrusted contributor.
What governance is required for AI-generated code in regulated industries?
Beyond standard DevSecOps guardrails: auditable policy-as-code with full Git history, mandatory human threat model review for code touching financial transactions or patient data, automated compliance checks mapped to specific requirements (SOC 2, HIPAA), and evidence that AI-generated code passes the same quality gates as human-written code.
How small can my engineering team realistically get while maintaining adequate governance?
There is no universal floor. It depends on three variables: the percentage of code generated by AI, the risk profile of your application, and the maturity of your automated governance pipeline. A team with mature guardrails, multi-agent validation, and processes-as-code can operate smaller than one relying on manual review. But you still need humans who understand the system well enough to know when the guardrails are not enough.