May 29, 2026

The 30-Plus Framework Landscape — Navigating Spec-Driven Development Options in 2026

Spec-driven development started as the industry’s answer to vibe coding — Andrej Karpathy’s term for casually prompting AI to generate code with no structure or planning. That correction worked, maybe too well. There are now more than 30 frameworks, tools, and templates all making the case that they’re the right way to put specs at the centre of AI-assisted development. For engineering leaders who need to pick one, that number is a genuine headache.

The three-tier mental model in this article is your navigation shortcut. Vendor-backed tools like AWS Kiro and GitHub SpecKit sit in Tier 1. Community-led frameworks like BMAD-METHOD, GSD, and Cursor Plan Mode sit in Tier 2. Niche-optimised tools like OpenSpec and Tessl sit in Tier 3. But before you even consult the tiers, there’s one filter question that eliminates half the field straight away: are you working in an existing codebase, or starting from scratch?

By the end of this article you’ll have that three-tier model, a concrete evaluation axis, and RanTheBuilder’s per-feature cost benchmarks to bring into a real team conversation.

How do you navigate 30-plus spec-driven development frameworks?

The ecosystem has no official taxonomy. Tools like OpenSpec, GSD, SpecKit, Superpowers, Taskmaster AI, Antigravity AgentKit, BMAD-METHOD, and Agent OS have all appeared without any coordinating body to organise them. The awesome-openspec curated list includes a spec-compare repository with six tools and scoring matrices, which helps — but even that’s a partial picture.

The seven frameworks worth evaluating seriously are: AWS Kiro and GitHub SpecKit (Tier 1, vendor-backed); BMAD-METHOD, GSD, and Cursor Plan Mode (Tier 2, community-led); and OpenSpec and Tessl (Tier 3, niche-optimised). The rest, including Taskmaster AI, Superpowers, Antigravity AgentKit, and Agent OS, have thin coverage and require significant primary research to evaluate.

The brownfield versus greenfield question is the first filter. It eliminates half the options immediately. Keep it in mind as you read through the tiers.

What do vendor-backed frameworks — AWS Kiro and GitHub SpecKit — offer that community tools don’t?

Tier 1 is defined by what “vendor-backed” actually means in practice: long-term maintenance, commercial support, IDE-native integration, and documentation that survives procurement reviews. That combination matters for regulated industries, enterprise teams with compliance requirements, or any organisation where a single-maintainer framework would fail procurement.

AWS Kiro is Amazon’s agentic IDE built on Bedrock. It runs a three-phase workflow — requirements, design, implementation — and generates user stories with acceptance criteria in EARS (Easy Approach to Requirements Syntax) notation, which structures acceptance criteria into machine-readable, testable statements rather than freeform prose. The main limitation is that it was designed for greenfield. As one Broadcom engineer put it, “most developers don’t start from a greenfield idea — they start from an existing codebase, a messy bug, or a design they already agreed on.” Kiro has added Design-first and Bugfix workflows since launch, but its core assumption is still a new project.
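To make the EARS idea concrete, here is a minimal sketch that classifies a requirement sentence by the five commonly cited EARS patterns. It illustrates the notation only and is not Kiro's implementation. The function name `classify_ears` and the regular expressions are assumptions made for this example.

```python
import re
from typing import Optional

# The five commonly cited EARS patterns, matched on their leading keyword.
# The ubiquitous form is checked last because it has no leading keyword.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^When\b.+,\s*the\b.+\bshall\b", re.IGNORECASE)),
    ("state-driven", re.compile(r"^While\b.+,\s*the\b.+\bshall\b", re.IGNORECASE)),
    ("unwanted-behaviour", re.compile(r"^If\b.+,\s*then\s+the\b.+\bshall\b", re.IGNORECASE)),
    ("optional-feature", re.compile(r"^Where\b.+,\s*the\b.+\bshall\b", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^The\b.+\bshall\b", re.IGNORECASE)),
]


def classify_ears(statement: str) -> Optional[str]:
    """Return the EARS pattern name a statement follows, or None if it follows none."""
    text = statement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.search(text):
            return name
    return None


if __name__ == "__main__":
    samples = [
        "When the user submits the form, the system shall validate the email address.",
        "If the payment fails, then the system shall show an error message.",
        "The system shall log every login attempt.",
        "Make checkout faster.",
    ]
    for sample in samples:
        print(classify_ears(sample), "<-", sample)
```

A statement that returns `None`, such as the last sample, is freeform prose that a reviewer would need to restate before it becomes testable.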

GitHub SpecKit is the open-source, Microsoft/GitHub-backed option. It is currently at v0.8.7, with 93,000+ GitHub stars and support for 30+ AI coding agents. Its core mechanism is the “constitution”: a markdown rules file containing high-level principles that apply to every change across every session. Define it once, and every spec inherits those rules. Templates flag unknowns as NEEDS CLARIFICATION rather than guessing. One caveat: the RanTheBuilder benchmark reports known upgrade issues that overwrite customisation files, so test the upgrade path thoroughly before committing.
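A team could gate implementation on unresolved unknowns with a small check like the one below. This is a hedged sketch, not part of SpecKit: `find_open_questions` is a hypothetical helper, and the bracketed marker format in the sample spec is an assumption. The only thing taken from the text above is the literal phrase NEEDS CLARIFICATION.

```python
from typing import List, Tuple


def find_open_questions(spec_text: str, marker: str = "NEEDS CLARIFICATION") -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for every line that still carries the marker."""
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if marker in line:
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical spec excerpt; the bracket syntax is assumed for illustration.
    sample_spec = (
        "## Authentication\n"
        "- Users log in with email and password.\n"
        "- Session timeout: [NEEDS CLARIFICATION: duration not specified]\n"
        "- Passwords are hashed before storage.\n"
    )
    for number, line in find_open_questions(sample_spec):
        print(f"line {number}: {line}")
```

Running a check like this in CI would stop a spec with open questions from reaching the implementation phase.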

Both Kiro and SpecKit skew greenfield. Neither was designed for delta-spec work on an existing codebase. When to choose Tier 1: you’re in a regulated industry, you need a support contract, or a single-maintainer bus factor would fail procurement. For everything else, start with Tiers 2 and 3.

Augment Code / Intent holds the highest EU AI Act compliance tier among SDD tools — if that’s your constraint, the governance deep-dive covers it in full.

What are BMAD-METHOD, GSD, and Cursor Plan Mode — and how do community-led frameworks differ from vendor tools?

Tier 2 frameworks are MIT-licensed, community-maintained, and have no vendor lock-in. They move faster than Tier 1 and adapt to your team’s conventions rather than imposing their own. The trade-off is straightforward: no support contract and higher bus-factor risk.

BMAD-METHOD (Build More Architect Dreams) is the grassroots anchor of the SDD ecosystem. Version 6.6.0 shipped on April 29, 2026, and the project has 46,700+ GitHub stars and more than 5,500 forks. It orchestrates 12+ specialised AI agent roles, including Mary (Business Analyst), Preston (Product Manager), Winston (Architect), Sally (Product Owner), Simon (Scrum Master), Devon (Developer), and Quinn (QA Engineer). Each agent has a defined persona, prompt set, and artefact responsibility. Planning artefacts (PRD, architecture doc, story breakdown) flow into the project’s docs/ folder. The /bmad-help command reads current project state and tells you which agent to invoke next. On community-health metrics, BMAD is the healthiest open-source project in this space: 458 commits in 90 days, a 94.3% issue close rate, and a 1-day PR median age.

BMAD’s structured artefacts are how it tackles context rot — the failure mode where a long AI coding session’s accumulated context degrades output quality until the agent starts contradicting earlier decisions. That matters a lot for large, complex features. For a well-scoped solo task, the benefit is smaller.

GSD (Get Shit Done) is the lean alternative. It reached 61,000 GitHub stars in under five months after its initial commit in December 2025. Where BMAD adds sprint ceremonies, GSD’s philosophy is that complexity should live in the system, not the workflow. It spawns parallel researchers, planners, executors, and verifiers, each in a fresh 200K token context window, so no stage inherits a long session’s degraded context. The Minimal Install Profile cuts the system prompt from ~12,000 to 700 tokens, making it viable for local LLMs and metered APIs.
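The token figures above are easier to judge as ratios. The arithmetic below uses only the numbers quoted in this section and treats the approximate 12,000 figure as exact, so read the results as estimates.

```python
# Figures quoted above; the full-profile size is approximate ("~12,000").
full_prompt_tokens = 12_000
minimal_prompt_tokens = 700
context_window_tokens = 200_000

# Fraction of the system prompt removed by the Minimal Install Profile.
reduction = 1 - minimal_prompt_tokens / full_prompt_tokens

# Share of a fresh context window consumed by the system prompt alone.
share_full = full_prompt_tokens / context_window_tokens
share_minimal = minimal_prompt_tokens / context_window_tokens

print(f"System prompt reduction: {reduction:.1%}")
print(f"Window share, full profile: {share_full:.2%}")
print(f"Window share, minimal profile: {share_minimal:.2%}")
```

The reduction works out to roughly 94%. On a metered API that saving applies to every sub-agent spawned, because each one starts with its own copy of the system prompt.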

Cursor Plan Mode is the “no framework required” entry point. Built into the Cursor AI editor — no installation, no configuration. The agent produces a detailed implementation plan, asks clarifying questions, maps affected files, and presents it for your review before writing a single line of code. No spec lifecycle, no drift detection, no living-spec synchronisation. It’s SDD-lite. And it’s the lowest-friction way to build the habit that matters.

BMAD’s two operating modes — Full and Quick — are the most consequential decision within Tier 2.

BMAD Full vs BMAD Quick — when does the extra overhead pay for itself?

The cost difference is substantial, so make this decision explicitly.

BMAD Full produces a complete PRD, architecture document, and story breakdown before any implementation code runs. The adversarial code review (/bmad-bmm-code-review) surfaces design issues before passing; “Party Mode” runs multiple agent personas through the design before implementation begins. Per RanTheBuilder’s benchmark: roughly $200 per feature and six days to first PR. Highest specification quality score of any framework tested.

BMAD Quick produces a single combined spec document. Same elicitation quality as Full, but it skips the full artefact set and the adversarial review loop. Roughly $85 per feature in about two days.

RanTheBuilder’s recommendation is direct: choose Full when design correctness is critical — the adversarial review and course-correction workflow catch design mistakes before they compound. It’s also the right call when the feature involves significant architectural decisions or cross-team dependencies, when the codebase is large and context rot is a known problem, or when a non-developer stakeholder will review the spec.

Choose Quick when the feature is well-scoped and self-contained, when time-to-PR is the main constraint, or when your team is still working out whether to commit to BMAD at all.

One important note: RanTheBuilder’s cost data is per feature at the individual developer level. It does not translate directly to team cost or total cost of ownership.

What makes OpenSpec and Tessl niche-optimised — and when do you need them?

Tier 3 tools aren’t general-purpose SDD frameworks. They solve a specific problem better than anything in Tiers 1 or 2.

OpenSpec (Fission AI) is brownfield-first by design. Its key architectural decision is separating the Source of Truth (what is currently implemented) from proposed changes. Each feature request or bugfix becomes an independent subfolder under changes/, containing a proposal.md, specs/, design.md, and tasks.md. Delta markers — ADDED, MODIFIED, REMOVED — track what changes relative to existing functionality. You specify only what’s changing, not the whole system.
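The folder layout described above can be sketched in a few lines. This is an illustration of the structure only, not the OpenSpec CLI: `scaffold_change` and the change name `add-rate-limiting` are hypothetical, and the files are created empty in a temporary directory.

```python
import tempfile
from pathlib import Path


def scaffold_change(root: Path, change_name: str) -> Path:
    """Create changes/<change_name>/ containing the four artefacts named in the text."""
    change_dir = root / "changes" / change_name
    (change_dir / "specs").mkdir(parents=True, exist_ok=True)
    for filename in ("proposal.md", "design.md", "tasks.md"):
        (change_dir / filename).touch()
    return change_dir


# Build one change folder in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    change = scaffold_change(Path(tmp), "add-rate-limiting")
    created = sorted(entry.name for entry in change.iterdir())

print(created)
```

Because each change lives in its own subfolder, two in-flight changes never edit the same spec file until one of them is archived.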

The workflow is propose → apply → archive. Only after a change is implemented and accepted does the delta merge back into the main specs/ directory, building up a growing source-of-truth document over time. RanTheBuilder’s benchmark has OpenSpec at $95 per feature and roughly one day to first PR — the highest overall score across all four frameworks tested (4.00/5). It’s currently single-maintainer (Fission AI), so organisations with low bus-factor tolerance need to weigh that against how well the brownfield fit suits them. The awesome-openspec list is the best navigation resource for tutorials and comparisons.
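The archive step can be pictured as a merge of delta entries into the source of truth. The sketch below assumes a simplified data model (requirement IDs mapped to text) that is not taken from OpenSpec itself; `archive_change` and the sample requirement IDs are hypothetical.

```python
from typing import Dict, List, Tuple

Delta = List[Tuple[str, str, str]]  # (marker, requirement_id, new_text)


def archive_change(source_of_truth: Dict[str, str], delta: Delta) -> Dict[str, str]:
    """Return a new source of truth with an accepted change's delta applied."""
    merged = dict(source_of_truth)  # leave the caller's copy untouched
    for marker, requirement_id, text in delta:
        if marker == "ADDED":
            if requirement_id in merged:
                raise ValueError(f"{requirement_id} already exists")
            merged[requirement_id] = text
        elif marker == "MODIFIED":
            if requirement_id not in merged:
                raise KeyError(f"{requirement_id} cannot be modified: not found")
            merged[requirement_id] = text
        elif marker == "REMOVED":
            if requirement_id not in merged:
                raise KeyError(f"{requirement_id} cannot be removed: not found")
            del merged[requirement_id]
        else:
            raise ValueError(f"unknown delta marker: {marker}")
    return merged


if __name__ == "__main__":
    current = {"auth-login": "Users log in with email and password."}
    change = [
        ("MODIFIED", "auth-login", "Users log in with email, password, and a one-time code."),
        ("ADDED", "auth-lockout", "The account locks after five failed attempts."),
    ]
    print(archive_change(current, change))
```

Rejecting a MODIFIED or REMOVED entry for a requirement that does not exist is a design choice in this sketch: it surfaces a delta that was written against a stale source of truth.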

Tessl solves a different problem entirely. Founded by Guy Podjarny, creator of Snyk, Tessl is built around the Spec Registry — an open registry of over 10,000 machine-readable specifications for external libraries and APIs. Think of it as npm for specifications: agents pull the correct spec for a package by name rather than guessing library behaviour from training data. That prevents a specific and expensive failure mode — AI-hallucinated method signatures that reach production.

What makes Tessl different is the living-spec model. Specs update as libraries update, unlike the point-in-time specs produced by BMAD, SpecKit, OpenSpec, and Kiro. Tessl is still in closed/limited beta as of mid-2026, so it’s not a same-day option. Once it opens up, it’s the right choice for any team integrating third-party APIs at scale.

Brownfield vs. greenfield: why is this the most important question before choosing a framework?

Most teams at 50–500 person organisations are working in brownfield codebases, not greenfield ones. The greenfield-first framing of most SDD articles doesn’t match that reality. The actual job for most teams is “build new things fast without breaking anything already in production.”

Greenfield-biased frameworks — Kiro, SpecKit, BMAD Full — were designed to define a system from scratch. Spec completeness is their primary value. They struggle with the “specify only what is changing” use case because the change surface is a small delta in a large existing system, not a blank page. Enterprise analysis of Kiro and SpecKit describes them as working well for greenfield projects and prototypes, but as a different game entirely for enterprise software with existing systems and long-term maintainability requirements.

Brownfield-first frameworks — OpenSpec — were built around the delta-spec model from the start. Only the change surface needs specifying. The accumulating source-of-truth document grows with the codebase rather than replacing it.

Dual-mode frameworks — BMAD with Quick workflow, GSD — can be adapted for brownfield work but require the team to scope the spec manually. Workable, but not native.

Cursor Plan Mode operates at the individual-change level. The plan is scoped to the current task, not the whole system. That works in both contexts, which is why it’s a sensible starting point regardless of your codebase situation.
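The classification in this section reduces to a lookup. The sketch below encodes only the groupings stated above; the labels `adaptable` and `both` and the function `shortlist` are naming assumptions for this example. Tessl is left out because this section does not place it on the brownfield/greenfield axis.

```python
from typing import Dict, List

# Codebase fit as described in this section.
FIT = {
    "AWS Kiro": "greenfield",
    "GitHub SpecKit": "greenfield",
    "BMAD Full": "greenfield",
    "OpenSpec": "brownfield",
    "BMAD Quick": "adaptable",  # workable with manual scoping, not native
    "GSD": "adaptable",
    "Cursor Plan Mode": "both",  # scoped to the individual change
}


def shortlist(context: str) -> Dict[str, List[str]]:
    """Split frameworks into native fits and adaptable options for a codebase context."""
    if context not in ("brownfield", "greenfield"):
        raise ValueError("context must be 'brownfield' or 'greenfield'")
    native = [name for name, fit in FIT.items() if fit in (context, "both")]
    adaptable = [name for name, fit in FIT.items() if fit == "adaptable"]
    return {"native": native, "adaptable": adaptable}


if __name__ == "__main__":
    print(shortlist("brownfield"))
    print(shortlist("greenfield"))
```

For a brownfield team the native list shrinks to OpenSpec and Cursor Plan Mode, which shows how much of the field this one question removes.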

If compliance and audit trail requirements are part of your brownfield evaluation, that adds a second axis. The governance and compliance detail lives in a separate article.

What does per-feature cost data tell us about framework overhead — and what does it leave unanswered?

The brownfield/greenfield axis narrows the shortlist. Cost data calibrates the decision within it.

RanTheBuilder (Ran Isenberg) ran the same real feature through multiple frameworks and scored them across 13 dimensions from an enterprise perspective. It’s the only published peer-level cost benchmark in the SDD ecosystem. All figures are at the individual developer level — they do not translate directly to team cost or total cost of ownership.

The benchmarked per-feature estimates are:

BMAD Full: roughly $200 per feature and six days to first PR — highest cost in the benchmark, highest specification quality score.

BMAD Quick: roughly $85 per feature and two days to first PR.

SpecKit: roughly $75 per feature and one day to first PR. Overall score 2.77/5 — lowest of the four, with community health signals to match (37 commits in 90 days, 36.8% issue close rate, 62-day PR median age).

OpenSpec: roughly $95 per feature and one day to first PR. Highest overall score (4.00/5).

RanTheBuilder’s summary on the BMAD Quick / SpecKit / OpenSpec cluster: “They land in the same ballpark on both speed and cost. The differences are noise.” BMAD Full is the outlier — worth it for architecturally complex work, hard to justify for everything else.
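To see how far BMAD Full sits from the cluster, the quoted estimates can be compared directly. The figures below are the rounded per-feature numbers listed above, so the ratios are rough indications rather than measurements.

```python
from statistics import mean

# (cost in USD per feature, days to first PR), as quoted from the benchmark above.
BENCHMARK = {
    "BMAD Full": (200, 6),
    "BMAD Quick": (85, 2),
    "SpecKit": (75, 1),
    "OpenSpec": (95, 1),
}

cluster = ["BMAD Quick", "SpecKit", "OpenSpec"]
cluster_avg_cost = mean(BENCHMARK[name][0] for name in cluster)
cluster_avg_days = mean(BENCHMARK[name][1] for name in cluster)

full_cost, full_days = BENCHMARK["BMAD Full"]
cost_ratio = full_cost / cluster_avg_cost
days_ratio = full_days / cluster_avg_days

print(f"Cluster average: ${cluster_avg_cost:.0f} per feature, {cluster_avg_days:.1f} days")
print(f"BMAD Full costs {cost_ratio:.2f}x the cluster average")
print(f"BMAD Full takes {days_ratio:.1f}x as long to first PR")
```

On these numbers BMAD Full costs a little over twice the cluster average and takes about four and a half times as long, while the spread within the cluster is $20 and one day.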

What the data doesn’t cover: GSD, AWS Kiro, and Tessl have no equivalent published benchmark. Name that gap when presenting these numbers to your team. Bus factor, long-term maintenance cost, and downstream quality outcomes aren’t captured either. The benchmark measures specification quality; it doesn’t tell you what happens six months down the track.

How do you evaluate and pilot a spec-driven development framework before committing?

Running a pilot on your own codebase is more useful than any benchmark. Here’s a six-step process that doesn’t turn the evaluation into a research project.

Step one: apply the brownfield/greenfield filter. Eliminate frameworks that don’t match your primary codebase context.

Step two: apply the tier filter. If vendor support or compliance documentation is a hard requirement, start in Tier 1.

Step three: select one framework per shortlist tier for a pilot sprint using a representative feature. Pick neither the simplest task on the backlog nor the most fraught one.

Step four: capture time-to-PR, specification quality (reviewed by a non-developer if possible), token cost, and team friction. Team friction is the one that doesn’t show up in benchmarks.

Step five: compare your results against the RanTheBuilder benchmarks as a calibration baseline.

Step six: evaluate sustainability signals: bus factor, community activity, documentation quality, and how easy it is to migrate specs if needed.
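Steps four and five can be captured in a small scorecard so that every pilot records the same fields. This is one possible shape, not a prescribed format: `PilotResult`, `compare_to_baseline`, the 1-to-5 scales, and the pilot numbers in the example are all hypothetical. Only the OpenSpec baseline ($95, one day) comes from the benchmark cited in this article.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PilotResult:
    framework: str
    days_to_pr: float
    token_cost_usd: float
    spec_quality: int   # 1-5, scored by a reviewer (assumed scale)
    team_friction: int  # 1-5, higher means more friction (assumed scale)


def compare_to_baseline(result: PilotResult, baseline_cost_usd: float, baseline_days: float) -> Dict[str, float]:
    """Express a pilot's cost and time as multiples of a published baseline."""
    return {
        "cost_ratio": round(result.token_cost_usd / baseline_cost_usd, 2),
        "days_ratio": round(result.days_to_pr / baseline_days, 2),
    }


if __name__ == "__main__":
    # Made-up pilot outcome for illustration only.
    pilot = PilotResult("OpenSpec", days_to_pr=2.0, token_cost_usd=120.0,
                        spec_quality=4, team_friction=2)
    print(compare_to_baseline(pilot, baseline_cost_usd=95, baseline_days=1))
```

A ratio well above 1.0 is not necessarily a failure. The benchmark is per feature for one developer, so a team pilot on a larger codebase will often cost more.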

For teams with no SDD experience at all: start with Cursor Plan Mode for two weeks. No new tooling, no configuration, no framework adoption decision. Just build the habit of reviewing AI plans before execution. Once that’s a team norm, adopting GSD or BMAD is a much smaller step. If you’re not using Cursor, GSD is the lowest-ceremony standalone option. The article on spec-driven development as a movement covers the broader context for any team still deciding whether SDD is the right shift at all.

The community philosophy underlying all of these frameworks is worth understanding before you commit. The frameworks differ significantly in approach and ceremony, but they share a common premise about what makes AI-assisted coding reliable at scale.

Frequently asked questions

What is the easiest way to get started with spec-driven AI development?

Cursor Plan Mode. It requires no additional tooling beyond the Cursor editor itself. The plan-review step before code execution is the core SDD habit. Establishing it costs nothing and prepares a team for adopting a fuller framework later. If you’re not using Cursor, GSD is the lowest-ceremony standalone framework — fast installation, 14 supported runtimes including Claude Code, Windsurf, and Gemini CLI.

What is BMAD-METHOD and what do the 12-plus agent roles do?

BMAD-METHOD is an open-source, MIT-licensed, community-led SDD framework that orchestrates 12+ specialised AI agent roles across the full development lifecycle. Named roles include Mary (Business Analyst), Preston (Product Manager), Winston (Architect), Sally (Product Owner), Simon (Scrum Master), Devon (Developer), and Quinn (QA Engineer). Each operates with a defined persona, prompt set, and artefact responsibility. The /bmad-help command reads current project state and tells you which agent to invoke next.

What is the difference between BMAD Full and BMAD Quick?

BMAD Full produces a complete artefact set — PRD, architecture document, story breakdown — before any code runs. Roughly $200 per feature and six days to first PR. BMAD Quick produces a single combined spec document, skips the full artefact set, and comes in at roughly $85 per feature and two days to first PR. Per RanTheBuilder: choose Full for large, architecturally complex features; choose Quick for well-scoped work or when evaluating BMAD for the first time.

Is BMAD-METHOD too complicated for a small development team?

BMAD Full has the highest learning curve of any SDD tool. The multi-agent approach adds ceremony that may be excessive for small teams or simple features. Solo developers and small teams typically get more value from GSD or Cursor Plan Mode. GSD’s Minimal Install Profile cuts the system prompt by 94% for lightweight use.

What is OpenSpec and how does it handle brownfield codebases?

OpenSpec (Fission AI) is a brownfield-first SDD tool that uses delta markers (ADDED / MODIFIED / REMOVED) to specify only what is changing in an existing codebase. The propose → apply → archive workflow keeps specs compact and reviewable. The archive step accumulates a growing source-of-truth document. Roughly $95 per feature and one day to first PR (RanTheBuilder benchmark). Currently single-maintainer — bus-factor risk applies.

What is the Tessl Spec Registry and how does it prevent API hallucinations?

The Tessl Spec Registry is an open registry of over 10,000 machine-readable specifications for external libraries and APIs — “npm for specifications.” AI coding agents pull the correct spec for a package by name rather than guessing library behaviour from training data. Tessl uses a living-spec model: specs update as libraries update, unlike the point-in-time specs produced by BMAD, SpecKit, OpenSpec, and Kiro. Still in closed/limited beta as of mid-2026.

GSD vs BMAD — which is better for solo developers?

GSD. It has 61,000+ GitHub stars, low ceremony, native Claude Code orchestration, and fast onboarding. One solo developer documented producing 100,000 lines of code in two weeks using GSD with zero context management overhead. BMAD Full’s 12-agent artefact overhead is designed for feature complexity and team coordination — solo developers rarely need that level of governance.

How do vendor-backed SDD tools (Kiro, SpecKit) compare to community tools (BMAD, GSD) for enterprise teams?

Vendor-backed tools offer long-term maintenance guarantees, commercial support, and compliance documentation. Community tools offer faster iteration, no vendor conventions, and no licensing constraints beyond MIT. For regulated industries or enterprise procurement, Tier 1 typically clears compliance requirements more easily. The governance evaluation is covered in detail in a separate article on spec-driven development in regulated industries.

Is spec-driven development just waterfall with extra steps?

No. SDD specs are per-feature (or per-change in OpenSpec’s model), not upfront system-wide design documents. They’re generated iteratively, reviewed by humans, and consumed immediately by the AI coding agent in the same session — not handed off to a separate implementation phase. A waterfall requirements document takes weeks and goes stale. An SDD spec takes minutes, is reviewed in the same session, and is either archived or discarded after implementation.

What is context rot and how do SDD frameworks prevent it?

Context rot is the failure mode where a long AI coding session’s accumulated context degrades output quality — the agent starts contradicting earlier decisions or losing track of constraints. GSD combats it by executing each plan in isolated sub-agents with fresh 200K token context windows. BMAD and OpenSpec both use persistent artefacts — the docs/ folder and the accumulating source-of-truth document respectively — that outlast any single AI session.

What should I look for when evaluating an SDD framework’s long-term sustainability?

Bus factor first: is the framework maintained by a single person (OpenSpec/Fission AI), an organisation (GitHub/SpecKit, AWS/Kiro), or a broad community (BMAD, GSD)? Then community activity: commit frequency, issue response time, documentation completeness. Then ecosystem lock-in: how easy is it to migrate specs and workflows if needed? RanTheBuilder’s benchmark provides community health data for BMAD (458 commits/90 days, 94.3% issue close rate), SpecKit (37 commits/90 days, 36.8% issue close rate), and OpenSpec (158 commits/90 days, 24.2% issue close rate).

Can I use multiple SDD frameworks in the same organisation?

Yes. The three-tier model is a selection framework, not a mandate to pick one and exclude all others. A common pattern: Cursor Plan Mode for individual developers, BMAD or GSD for team-level feature specs, and Tessl’s Spec Registry as a complement to any framework for API-heavy work. OpenSpec’s delta-spec model can be layered on top of a GSD or BMAD greenfield spec once the codebase matures into brownfield territory.