Business | SaaS | Technology
Dec 10, 2025

Understanding Anti-Patterns and Quality Degradation in AI-Generated Code

AUTHOR

James A. Wondrasek

OX Security described AI coding tools as an “army of talented junior developers—fast, eager, but fundamentally lacking judgment”. They can implement features rapidly, sure. But they miss architectural implications, security concerns, and maintainability considerations.

Vulnerable code reaches production faster than your teams can review it—this deployment velocity crisis is the real challenge with AI coding tools. Implementation speed has skyrocketed while review capacity remains static.

This article explores specific anti-patterns in AI-generated code with examples showing before/after comparisons. We examine cognitive complexity and maintainability concerns, explain why traditional code review processes miss AI-specific quality issues, and compare how different AI coding assistants differ in output quality. This analysis is part of our comprehensive guide on understanding the shift from vibe coding to context engineering.

OX Security analyzed 300+ repositories to identify 10 distinct anti-patterns that appear systematically in AI-generated code. We’re going to walk through these patterns, explain their impact via cognitive complexity metrics, and demonstrate why traditional code review processes miss AI-specific quality issues.

Different AI tools—Copilot, Cursor, Claude Code—exhibit these patterns differently based on context window constraints. And there’s a practical way to constrain AI output quality using test-driven development.

Let’s get into it.

What are the most common anti-patterns in AI-generated code?

OX Security’s research covered 50 AI-generated repositories compared against 250 human-coded baselines. They found 10 distinct anti-patterns that go against established software engineering best practices.

These aren’t random errors. They’re systematic behaviours that show how AI tools approach code generation.

The patterns break down by how often they occur:

Very High (90-100% occurrence):

High (80-90%):

Medium (40-70%):

Francis Odum, a cybersecurity researcher, put it well: “Fast code without a framework for thinking is just noise at scale”.

How does the “army of juniors” metaphor explain AI code behaviour?

Junior developers can write syntactically correct code that solves immediate requirements. But they miss long-term maintainability, security implications, and system-wide coherence.

AI exhibits the same limitation. Strong pattern matching for common scenarios. Weak architectural judgment for edge cases and system integration.

Here’s the key insight: AI implements prompts directly without considering refactoring opportunities, architectural patterns, or maintainability trade-offs. It just adds what you asked for.

The metaphor explains why refactoring avoidance occurs 80-90% of the time. AI doesn’t think “this new feature would fit better if I restructured the existing authentication module first.” It just adds the new feature wherever you asked.

There’s one key difference though. AI doesn’t learn from mistakes within a session or across projects, unlike actual juniors who improve over time.

The implication for your team? Position AI as implementation support while humans focus on architecture, product management, and strategic oversight. Organisations must fundamentally restructure development roles.

What is cognitive complexity and why does it matter for AI code?

Cognitive complexity measures how difficult it is to read and understand code, considering nesting, conditional logic, and flow. Unlike cyclomatic complexity—which counts linearly independent code paths—cognitive complexity focuses on human comprehension difficulty.

AI-generated code often has high cognitive complexity despite passing traditional metrics like unit test coverage. The “comments everywhere” anti-pattern causes increased cognitive load. Same with edge case over-specification—each hypothetical scenario adds mental overhead.
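
To make the metric concrete, here is a hedged sketch (hypothetical function, not taken from OX Security's dataset) of the deeply nested, over-specified style AI assistants tend to produce, next to a flatter rewrite with the same behaviour and a much lower cognitive complexity score.

```python
# AI-style output: every speculative case gets its own nested branch,
# and each extra level of nesting adds a penalty to the cognitive
# complexity score.
def apply_discount(order, user):
    if order is not None:
        if user is not None:
            if user.get("is_premium"):
                if order["total"] > 100:
                    return order["total"] * 0.9
                else:
                    return order["total"]
            else:
                return order["total"]
    return 0


# Same behaviour rewritten with guard clauses: every branch sits at the
# top level, so the nesting penalties disappear and the function is far
# easier to read.
def apply_discount_flat(order, user):
    if order is None or user is None:
        return 0
    if not user.get("is_premium"):
        return order["total"]
    if order["total"] <= 100:
        return order["total"]
    return order["total"] * 0.9
```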

Cognitive complexity scores above 15 typically indicate code that requires significant mental effort to understand. Here’s what different AI tools actually generated when measured:

Static analysis tools like SonarQube can measure cognitive complexity automatically, giving you objective quality metrics. During code review and pull requests, high-complexity functions receive targeted attention.

High cognitive complexity creates maintainability debt. Code becomes harder to debug, extend, and refactor over time. The costs compound.

How do context window constraints affect AI code quality?

Context window is the token limit determining how much code and conversation history an AI model can process simultaneously.

Tool comparison:

When context fills, AI loses architectural understanding and creates inconsistent implementations across files. Cursor may reduce token capacity dynamically for performance, shortening input or dropping older context to keep responses fast.

Context blindness manifests as duplicated logic, inconsistent naming conventions, parallel implementations of the same functionality, and failure to maintain architectural patterns.

Example: AI reimplements authentication logic in multiple files because it can’t retain the original implementation beyond its context limit. You end up with three different approaches to the same problem scattered across your codebase.
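
A simplified sketch of what that can look like, with hypothetical file and function names: two modules generated in separate sessions each roll their own token check, with different names and different styles, because the first implementation had already scrolled out of context.

```python
# users/api.py -- generated early in the session
import hashlib

def verify_token(token: str, expected_hash: str) -> bool:
    # First implementation: plain SHA-256 hex digest comparison.
    return hashlib.sha256(token.encode()).hexdigest() == expected_hash


# billing/api.py -- generated later, after the original fell out of context
import hmac
import hashlib

def check_auth_token(token: str, expected_hash: str) -> bool:
    # Parallel implementation of the same check, with a different name,
    # a different comparison style, and no awareness that verify_token exists.
    digest = hashlib.sha256(token.encode()).hexdigest()
    return hmac.compare_digest(digest, expected_hash)
```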

Larger context windows provide better architectural coherence in large codebases but don’t eliminate the fundamental limitation. Code duplication percentage serves as a context blindness indicator. Track it.

What is “vibe coding” and how does it create technical debt?

Vibe coding is an AI-dependent programming style popularised by Andrej Karpathy in early 2025. Developers describe project goals in natural language and accept AI-generated code liberally without micromanagement.

The workflow: initial prompt → AI generation → evaluation → refinement request → iteration until “it feels right.” The developer shifts from manual coding to guiding, testing, and giving feedback about AI-generated source code.

It prioritises development velocity over code correctness, relying on iterative refinement instead of upfront planning.

This creates technical debt through:

OX Security experimented with vibe coding a Dart web application. New features progressively took longer to integrate. The AI coding agent never suggested refactoring, resulting in monolithic architecture with tightly coupled components.

The trade-off: faster prototyping and feature implementation versus long-term maintainability costs and increased cognitive complexity.

A 2025 Pragmatic Engineer survey reported ~85% of respondents use at least one AI tool in their workflow. Most are doing some variation of vibe coding.

It’s best suited for rapid ideation or “throwaway weekend projects” where speed is the primary goal. For production systems, you need constraints. Learn how to transition your development team from vibe coding to context engineering for sustainable AI development.

Why do traditional code reviews miss AI-specific quality issues?

Traditional code review focuses on line-by-line inspection for syntax errors, style violations, and obvious bugs.

AI code appears syntactically correct and often has high unit test coverage, passing superficial review criteria. But traditional code review cannot scale with AI’s output velocity.

The numbers tell the story. Developers on teams with high AI adoption complete 21% more tasks and merge 98% more pull requests, but PR review time increases 91%.

Individual throughput soars but review queues balloon. This velocity gap forces teams into a false choice between shipping quickly and maintaining quality.

Reviewers miss:

AI can confidently invent a call to a library function that doesn't exist, or use a deprecated API without warning. A human reviewer might assume the non-existent function is part of a newly introduced dependency, leading to broken builds.
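
A quick illustration with a real library and a hallucinated call (the URL is a placeholder): the popular requests package has no get_json helper, but the invented line reads plausibly enough to slip past a reviewer.

```python
import requests

# Hallucinated: requests exposes no get_json() function, so this line
# raises AttributeError at runtime even though it looks idiomatic.
data = requests.get_json("https://api.example.com/users")

# What was presumably intended: fetch the resource, then decode the JSON body.
data = requests.get("https://api.example.com/users").json()
```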

You need a multi-layered review framework:

Layer 1: Automated Gauntlet

Layer 2: Strategic Human Oversight

When AI adoption is paired with proper review processes, productivity and code quality improve in tandem: 81% of developers who use AI for code review saw quality improvements, versus 55% without AI review.

For practical implementation strategies, see our guide on building quality gates for AI-generated code.

How do GitHub Copilot, Cursor, and Claude Code differ in code quality?

All three tools exhibit the 10 anti-patterns. But context window size affects severity of context blindness and architectural inconsistency issues.

Context Window Comparison:

Use Case Strengths:

Model Support:

GitHub Copilot remains the most widely adopted with approximately 40% market share and over 20 million all-time users.

Consider codebase size—larger projects benefit from bigger context windows. Think about workflow preference: GUI versus CLI, IDE-based versus terminal-first development.

Tool choice doesn’t eliminate anti-patterns but affects their severity and detectability. For a comprehensive comparison helping you select the right toolkit, read our analysis of comparing AI coding assistants and finding the right context engineering toolkit.

How can test-driven development constrain AI code quality?

TDD workflow with AI: Write test defining expected behaviour → Prompt AI to implement code satisfying the test → Run test to verify correctness → Refactor AI output if needed.

Tests act as constraints preventing anti-patterns:

Refactoring Avoidance: Tests force interface stability during restructuring. You can refactor implementation details while tests ensure behaviour stays consistent.

Edge Case Over-Specification: Tests define actual requirements, not hypothetical scenarios. If you didn’t write a test for OAuth integration, AI won’t add it.

Hallucinated Code: Non-existent functions fail tests immediately. No ambiguity.

TDD encourages smaller, focused functions that pass specific tests rather than monolithic implementations. This directly reduces cognitive complexity.
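
Here is a minimal pytest sketch of that loop, with hypothetical names: the test pins down the actual requirement before any code is generated, so hallucinated helpers fail immediately and speculative extras have no test demanding them.

```python
# test_slugify.py -- written first, defines the only behaviour we need
from slugify_util import slugify

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"


# slugify_util.py -- the AI is prompted to satisfy exactly these tests,
# which keeps the implementation small and focused.
import re

def slugify(text: str) -> str:
    # Lowercase, drop anything that isn't alphanumeric or whitespace,
    # then collapse whitespace runs into single hyphens.
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", "-", text.strip())
```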

Typical quality gate implementations include:

The trade-off: TDD slows initial development velocity compared to vibe coding but reduces technical debt accumulation and review burden.

Generating working code is no longer the challenge; ensuring that it’s production-ready matters most. AI copilots can quickly produce functional implementations, but speed often masks subtle flaws.

FAQ Section

What is “insecure by dumbness” in AI-generated code?

The phenomenon where non-technical users develop and deploy production applications without cybersecurity knowledge. Neither the developers nor the AI assistants possess the knowledge to identify which security measures to implement or how to remediate vulnerabilities. The resulting code is not insecure by malpractice or malicious intent, but rather insecure by ignorance.

How many security alerts are typical in AI-augmented development?

According to OX research, organisations were dealing with an average of 569,000 security alerts at any given time before AI adoption. With AI accelerating deployment velocity, the alert volume increases proportionally while remediation capacity remains constant, creating an unsustainable detection-led security approach.

Can I use AI code in production safely?

Yes, with appropriate guardrails: automated security scanning, static analysis for complexity metrics, and focused human review of architecture and business logic. Human review on every AI-generated pull request catches the logical flaws that automated scanning tools miss. The primary risk is deploying AI code faster than quality assurance can scale.

Why does AI code avoid refactoring?

AI implements prompts directly without considering existing code structure opportunities. It lacks the human developer instinct to recognise “this new feature would fit better if I restructured the existing authentication module first.” Each prompt generates additive code rather than integrative improvements, leading to 80-90% occurrence of refactoring avoidance.

What is the difference between cyclomatic and cognitive complexity?

Cyclomatic complexity counts linearly independent code paths, a structural metric. Cognitive complexity measures human comprehension difficulty by weighting nested control structures and complex logic patterns. Cognitive complexity evaluates how difficult it is to read and understand code, giving insight into maintainability.
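
An illustrative pair of functions (names are made up): both contain three if statements, so their cyclomatic complexity is identical, but nesting makes the second markedly harder to follow, which is exactly what cognitive complexity is designed to penalise.

```python
# Three sequential checks: cyclomatic complexity 4, cognitive complexity
# roughly 3 (each `if` adds 1; there is no nesting penalty).
def validate_flat(value):
    if value is None:
        return False
    if value < 0:
        return False
    if value > 100:
        return False
    return True


# Three nested checks: cyclomatic complexity is still 4, but cognitive
# complexity climbs to roughly 6, because each deeper `if` also pays a
# nesting penalty (1 + 2 + 3 under SonarSource's scoring).
def validate_nested(value):
    if value is not None:
        if value >= 0:
            if value <= 100:
                return True
    return False
```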

How do I know if my team is doing vibe coding?

Indicators include: rapid iteration cycles with minimal upfront planning, acceptance of AI code with minor tweaks rather than architectural review, high deployment velocity with increasing bug reports, lack of refactoring in commit history, and developers describing workflows as “I asked the AI to add X and it worked”.

Which AI coding tool has the largest context window?

Claude Code and Cursor Max mode both offer 200K-token context windows. However, Claude Code maintains consistent capacity across sessions while Cursor may dynamically reduce tokens for performance. Copilot operates primarily on file-specific context, significantly smaller than repository-wide awareness tools.

What are hallucinated code patterns and how do I detect them?

Hallucinated code occurs when AI generates functions, methods, or APIs that appear plausible but don't actually exist. The library itself is usually real, and the functionality seems like it should belong to it, but the call simply doesn't exist. Detection requires systematically verifying that every function call references a real library method. Use IDE error checking and validate against official documentation.
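
One lightweight, scriptable check (a sketch only, and it assumes the package in question is installed locally): resolve each suspicious call against the imported module before the code ever runs.

```python
import importlib

def attribute_exists(module_name: str, attribute: str) -> bool:
    """Return True if the named module really exposes the attribute."""
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)

# requests.get exists; requests.get_json is a plausible-looking hallucination.
print(attribute_exists("requests", "get"))       # True
print(attribute_exists("requests", "get_json"))  # False
```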

Can AI tools understand our existing architecture patterns?

AI tools can recognise patterns within their context window but lack persistent understanding across sessions. They may follow patterns in currently loaded files but won’t maintain architectural consistency when context exceeds token limits. This leads to context blindness and parallel implementations of existing functionality.

Should I worry about technical debt from AI coding?

Yes, but the concern is deployment velocity, not AI quality per se. AI code accumulates technical debt similarly to junior developer code through refactoring avoidance and edge case over-specification, but reaches production faster than traditional review can process. Implement automated quality gates and focused architectural review to manage this risk.

How much faster can I develop with AI coding assistants?

Development velocity varies by task complexity and tool proficiency. Research shows significant productivity gains, but OX Security research indicates AI enables code to reach production faster than human review capacity can scale. The bottleneck shifts from implementation speed to quality assurance throughput.

What metrics should I track for AI code quality?

Priority metrics: Cognitive complexity scores via SonarQube or similar static analysis, refactoring frequency in commit history to detect avoidance patterns, code duplication percentage as context blindness indicator, security alert volume and remediation time, and ratio of automated versus human-detected issues in review. Defect density in production reveals real-world reliability.
