You’ve just inherited a Java 8 codebase. 200,000 lines of business logic. Your board wants it modernised. Fast.
Most spec-driven content talks about greenfield projects—starting fresh with perfect specs and AI doing the heavy lifting. But you’re not building from scratch. You’re looking at legacy systems with decades of undocumented decisions, hardcoded workarounds, and implicit business rules that nobody wrote down because the person who understood them left five years ago.
AI-assisted migration isn’t magic. It requires a strategic approach and realistic expectations. This article is part of our comprehensive guide to spec-driven development, covering advanced migration decision frameworks, proven patterns like the Strangler Fig, and hybrid workflows that combine AI with manual coding. You’ll learn what works, what doesn’t, and when to stick with traditional approaches.
We’ll walk through COBOL to Java migrations with real success metrics, Java version upgrades comparing OpenRewrite and AI-assisted tools, and API migrations like REST to GraphQL. The goal is practical strategies you can apply to your actual migration projects.
How do I assess if my legacy code is actually modernisable with AI tools?
Assess your legacy code modernisation potential by evaluating three factors: code complexity, test coverage, and documentation state.
Start with complexity analysis. Code with cyclomatic complexity under 15 per function works well with AI migration. Above that, AI starts making mistakes because it can’t follow all the branching logic paths. Use static analysis tools like SonarQube to get these metrics across your codebase.
Test coverage significantly impacts migration safety. You need a minimum of 60% coverage for safe AI migration. Without comprehensive tests, you can’t validate that the AI-generated code preserves the original behaviour. No tests means you’re flying blind—and that’s when migrations go sideways.
Documentation state determines your timeline. Legacy systems often lack the specifications required for spec-driven approaches, requiring costly reverse engineering. Missing specs add 30-40% to your project timeline. You’ll spend weeks extracting business logic from code, comments, and whatever technical documentation exists. For guidance on creating effective specifications from legacy code, see our advanced specification patterns guide.
Context window limitations create hard boundaries. Files over 2000 lines need chunking, which loses context. Claude handles 200k tokens, GPT-4 handles 128k, Gemini handles 1M—but large legacy files still cause problems. The AI loses track of dependencies between sections when you split them up.
Dependency depth creates another constraint. More than 5 levels of dependencies requires manual mapping. AI struggles to track complex call chains where modules depend on modules that depend on other modules. It misses the implicit connections that make legacy systems work.
These factors combine into a risk score. Low complexity, high test coverage, decent documentation? AI-assisted migration works. High complexity, poor tests, missing specs? You’re looking at a traditional Global Systems Integrator (GSI)-led migration. The hybrid approach—AI for simple transformations, manual for complex logic—sits in the middle.
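As a rough sketch, the triage described above can be expressed as a scoring function. The thresholds are the ones quoted in this section; the weights, class name, and point values are illustrative, not an industry standard.

```java
// Hypothetical risk triage for AI-assisted migration, using the
// thresholds discussed above (complexity under 15, coverage at least
// 60%, dependency depth at most 5). A real assessment would weigh
// many more signals than these four.
public class MigrationTriage {

    public enum Path { AI_ASSISTED, HYBRID, TRADITIONAL }

    public static Path assess(double avgCyclomaticComplexity,
                              double testCoverage,      // 0.0 to 1.0
                              int dependencyDepth,
                              boolean specsExist) {
        int riskPoints = 0;
        if (avgCyclomaticComplexity >= 15) riskPoints += 2; // AI loses branching logic
        if (testCoverage < 0.60)           riskPoints += 2; // cannot validate behaviour
        if (dependencyDepth > 5)           riskPoints += 1; // needs manual mapping
        if (!specsExist)                   riskPoints += 1; // reverse engineering cost

        if (riskPoints == 0) return Path.AI_ASSISTED;
        if (riskPoints <= 2) return Path.HYBRID;
        return Path.TRADITIONAL;
    }

    public static void main(String[] args) {
        System.out.println(assess(8, 0.75, 3, true));   // AI_ASSISTED
        System.out.println(assess(12, 0.50, 4, true));  // HYBRID
        System.out.println(assess(25, 0.20, 8, false)); // TRADITIONAL
    }
}
```

The point of encoding the triage is not precision but honesty: the score forces the conversation about coverage and documentation before any tokens are spent.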
Warning signs that AI migration is too risky: security-critical financial calculations, real-time performance requirements, complex state machines, undocumented business logic that only exists in the code, regulatory compliance requirements.
Run your codebase through static analysis. Look at the metrics. Be honest about test coverage. Check what documentation exists. That assessment tells you which path to take.
What is the Strangler Fig Pattern and how does it work for AI-powered legacy modernisation?
The Strangler Fig Pattern enables safe incremental migration by running new AI-generated code alongside legacy systems, gradually routing traffic to the new implementation while preserving the legacy fallback for risk mitigation.
Named after strangler fig trees that grow around host trees and eventually replace them, the pattern provides a controlled approach to modernisation. Your existing application continues functioning during the modernisation effort. No big-bang cutover. No “hope this works” moments.
Here’s how it works. A facade or proxy intercepts requests going to the back-end legacy system. This proxy routes requests either to the legacy application or to the new services. You start with all traffic going to legacy. Then you gradually shift specific functionality to the new implementation.
The rollback capability makes this pattern valuable for AI migration. AI-generated code looks good in testing but sometimes behaves unexpectedly in production. With the Strangler Fig Pattern, you keep the legacy code running. If the AI-generated code fails, you route traffic back to the legacy system. No downtime. No emergency.
Implementation uses three components: an API gateway for routing, feature flags for traffic control, and monitoring to validate behaviour. The API gateway sits between your users and your systems, deciding which implementation handles each request. Feature flags let you control the rollout percentage—start at 5%, watch the metrics, move to 25%, then 50%, then 100%. Monitoring compares outputs between legacy and new implementations to catch discrepancies early.
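A minimal sketch of the routing facade might look like this. The class and handler names are hypothetical; in practice this logic usually lives in an API gateway or feature-flag service rather than in application code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Minimal strangler-fig routing facade: a percentage-based feature
// flag decides whether a request goes to the legacy or the new
// implementation. Names are illustrative.
public class StranglerFacade {

    private volatile int rolloutPercent = 5; // start small: 5 -> 25 -> 50 -> 100

    private final Function<String, String> legacyHandler;
    private final Function<String, String> modernHandler;

    public StranglerFacade(Function<String, String> legacy,
                           Function<String, String> modern) {
        this.legacyHandler = legacy;
        this.modernHandler = modern;
    }

    public void setRolloutPercent(int percent) { this.rolloutPercent = percent; }

    public String handle(String request) {
        // Traffic split via feature flag; 0 means all traffic to legacy.
        int roll = ThreadLocalRandom.current().nextInt(100);
        if (roll < rolloutPercent) {
            return modernHandler.apply(request);
        }
        return legacyHandler.apply(request);
    }

    public static void main(String[] args) {
        StranglerFacade facade = new StranglerFacade(
            r -> "legacy:" + r, r -> "modern:" + r);
        facade.setRolloutPercent(25);
        System.out.println(facade.handle("GET /orders"));
    }
}
```

Rolling back is then a configuration change: set the rollout percentage to zero and all traffic returns to the legacy handler.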
Shopify used this pattern to refactor their Shop model, a God Object with over 3,000 lines of code. They created a new interface, redirected existing calls, created a new data source, and gradually transitioned read operations. Zero downtime. Continuous validation. Reversible changes at every step.
For AI migration specifically, the pattern reduces risk. Generate modern code with AI. Deploy it parallel to legacy. Route 5% of traffic to test behaviour with real production data. Monitor for errors, performance issues, incorrect outputs. If everything looks good, increase traffic. If something breaks, route back to legacy and fix the AI-generated code.
A typical phased rollout runs 5% → 25% → 50% → 100% over 4-6 weeks. Spend a week at each phase. Watch the metrics. Look for edge cases the AI missed. The gradual approach lets you detect problems before they affect most users.
The pattern works because it acknowledges reality: AI-generated code isn’t perfect on the first try. But with proper testing in production using real traffic, you validate behaviour incrementally and maintain the ability to roll back at any point.
How do I migrate COBOL systems to Java using AI-assisted code generation?
COBOL to Java migration using AI achieves 93% accuracy with multi-agent orchestration, but requires manual intervention for complex business logic and undocumented dependencies.
Bankdata achieved 93% accuracy with AI-driven COBOL to Java conversion, reducing code complexity by 35% and coupling by 33%. The remaining 7% required manual intervention—and that 7% represents the most complex, business-critical code.
The multi-agent approach uses specialised agents for analysis, transformation, testing, documentation, and coordination. Microsoft Semantic Kernel orchestrates these agents, distributing tasks across the system. The COBOLAnalyzerAgent performs deep semantic analysis extracting program structure, data divisions, variable definitions, procedure flow, and SQL statements. The DependencyMapperAgent maps dependencies between programs and copybooks, identifying complexity for each component. The JavaConverterAgent generates modern, microservice-ready code—in Bankdata’s case, using Quarkus.
Timeline runs 6-12 months for medium complexity mainframe applications (200k-500k lines). That’s 40-60% faster than traditional GSI-led migration, which typically takes 10-18 months. Poor legacy documentation requires months of specification reverse engineering.
Dependency mapping poses significant challenges. COBOL systems include complex call chains where programs invoke other programs that invoke copybooks that modify shared state. AI models struggle to correctly interpret business logic embedded in decades-old COBOL code. Undocumented behaviours, hardcoded workarounds, implicit domain rules—none of that appears in formal specs because it was never formally specified.
Business logic preservation requires golden master testing. Capture outputs from the legacy COBOL system across diverse inputs. Run the same inputs through the migrated Java code. Compare results. Any discrepancy indicates the migration didn’t preserve behaviour. This testing strategy catches the subtle business rules that weren’t documented anywhere but existed in the code.
Tool selection matters. Microsoft Semantic Kernel works for complex COBOL migration with multi-agent orchestration. GitHub Copilot handles simpler transformations with fewer dependencies. Amazon Q Code Transformation focuses on enterprise compliance and security scanning. Most successful projects use a hybrid approach—different tools for different components based on complexity. For detailed guidance on choosing tools for legacy systems, consider factors like brownfield support and context window limits.
When COBOL migration fails: security-critical financial systems where accuracy is non-negotiable, real-time performance requirements that can’t tolerate Java’s garbage collection, mainframes with complex job scheduling that doesn’t map cleanly to modern architectures, systems with extensive undocumented business logic where reverse engineering costs exceed rewrite costs.
Cost comparison shows AI-assisted migration runs 30-50% cheaper than traditional approaches when code is suitable. But unsuitable code makes AI more expensive due to rework. Run your risk assessment first. Understand your code characteristics. Then decide on AI-assisted versus traditional migration.
Factor in 2-3 months for parallel running and validation before full cutover. You need time to validate behaviour with production traffic before decommissioning the COBOL system. For comprehensive guidance on production validation for migrations, consider implementing systematic validation frameworks alongside your migration strategy.
What’s the difference between OpenRewrite and AI-assisted migration for Java version upgrades?
OpenRewrite provides deterministic recipe-based transformations for well-defined Java migrations, while AI-assisted tools handle complex custom code requiring context understanding.
OpenRewrite uses recipes for standard framework migrations, dependency updates, and API changes. Zero hallucinations. Repeatable transformations. Extensive recipe library covering Java 8/11/17/21 upgrades, Spring Framework migrations, JUnit 4 to 5, and well-defined transformation patterns. No context limits because it’s rule-based, not LLM-based. Lower cost—you’re not paying for AI tokens.
But OpenRewrite only does simple replacements and cannot grasp overarching context. It sometimes makes incomplete or erroneous updates when code doesn’t match the recipe patterns exactly. Custom frameworks? Unique architectural patterns? Business logic transformation? OpenRewrite struggles.
AI-assisted tools handle what OpenRewrite misses. They understand business context, adapt to unique patterns, generate tests, and work with custom code that doesn’t follow standard patterns. Amazon Q Code Transformation provides enterprise-focused Java migration with security scanning and compliance. GitHub Copilot Agent Mode offers developer-centric multi-file coordination.
The hybrid approach delivers the best results. Use OpenRewrite first for standard patterns, then AI for remaining customisations—this reduces cost by 50-70%. OpenRewrite handles 60-70% of transformations deterministically. AI addresses the remaining 30-40% of custom code. Manual review covers 5-10% of edge cases.
One migration example shows 2300 lines of Java code migrated from JDK 8 to JDK 17 in about 1.5 minutes, including the Javax to Jakarta migration. That’s OpenRewrite doing the heavy lifting on standard transformations.
The workflow combines OpenRewrite recipes for standard transformations with LLM-based debugging for edge cases. When OpenRewrite transformations cause compilation or test failures, AI tools analyse the build output to generate fixes. This semi-automatic, context-aware approach handles the edge cases that recipes miss.
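To make the division of labour concrete, the snippet below shows the kind of purely mechanical rewrite a recipe handles well: a Java 8 anonymous class and its modernised equivalent are behaviourally identical, which is exactly what makes the transformation safe to automate. This is a simplified stand-in, not actual OpenRewrite output.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Before/after forms of the same logic in one runnable class. A recipe
// can perform this rewrite deterministically because the two methods
// are observably equivalent; no business context is needed.
public class RecipeStyleRewrite {

    // Before: Java 8 style with an anonymous Comparator
    public static List<String> sortLegacy(List<String> names) {
        String[] copy = names.toArray(new String[0]);
        Arrays.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        return Arrays.asList(copy);
    }

    // After: the same logic as a Java 8 -> 17 recipe might rewrite it
    public static List<String> sortModern(List<String> names) {
        return names.stream()
                    .sorted(String::compareToIgnoreCase)
                    .toList(); // Stream.toList() requires Java 16+
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("banana", "Apple", "cherry");
        System.out.println(sortLegacy(names)); // [Apple, banana, cherry]
        System.out.println(sortModern(names)); // [Apple, banana, cherry]
    }
}
```

Everything that cannot be shown this way, as an equivalence between old and new forms, is the custom code that falls to AI assistance or manual work.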
When to use OpenRewrite: Java version upgrades, Spring Framework migrations, well-defined API changes, dependency updates, any transformation with an existing recipe. It’s fast, deterministic, and cheap.
When to use AI-assisted migration: Custom frameworks, unique architectural patterns, business logic transformation, legacy code without standard patterns, scenarios requiring context understanding beyond simple replacement rules.
When to use both: Most real-world Java migrations. Let OpenRewrite handle the standard transformations, then bring in AI for the custom code that doesn’t fit recipe patterns.
How do I implement a hybrid workflow that combines AI code generation with manual coding for migration projects?
Effective hybrid migration workflows allocate tasks strategically: AI handles boilerplate transformations (60-70%), manual coding addresses security-critical components (10-15%), and collaborative review validates business logic (15-25%).
Task allocation follows security and complexity patterns. AI excels at generating boilerplate code, database models, and basic layouts—the repetitive stuff. Manual coding handles security-critical code: authentication, authorisation, cryptographic implementations, financial calculations, anything where errors have serious consequences. Complex business logic gets the hybrid approach: AI generates initial code, domain experts validate behaviour and edge cases.
Developers hold AI-generated code to the same standards as code written by human teammates. Every piece of AI-generated code goes through human validation. You’re checking for accuracy, security vulnerabilities, performance implications, and maintainability. The code review checklist for AI migrations focuses on business logic preservation, edge case handling, and whether the AI understood the requirements correctly.
Team structure distributes work by experience level. Senior developers validate AI output and handle the 7-10% of complex scenarios that AI can’t solve. Mid-level developers refine prompts, handle edge cases, and iterate on AI-generated code. Junior developers manage test automation—running the tests, tracking coverage, reporting failures.
Prompt engineering becomes a skill your team needs: context management strategies for large codebases, effective prompts for code transformation, and techniques for handling files that exceed context windows. Developers want to be involved not only in code review but also in directing the AI toward the desired output, combining their domain knowledge with the AI’s capabilities.
When to stop using AI and switch to manual coding: performance-sensitive algorithms where milliseconds matter, cryptographic implementations where errors create vulnerabilities, complex state management with race conditions, security-critical code in financial or healthcare domains, scenarios where AI repeatedly generates incorrect code after prompt refinement.
REST to GraphQL migration shows the split clearly. AI generates GraphQL schemas from REST endpoints—that’s straightforward mapping. AI creates basic resolvers for simple CRUD operations. Manual design handles schema structure decisions: how to organise types, what relationships to expose, caching strategies. Manual coding implements complex business logic in resolvers. Hybrid review validates that the GraphQL API preserves REST API behaviour.
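A hypothetical resolver illustrates the split. The simple lookup is the sort of code AI generates reliably; the discount rules are the business logic that stays manual. No GraphQL library is used here; the methods stand in for resolver functions, and the pricing rules are invented for illustration.

```java
import java.util.Map;

// Hybrid split inside one resolver class: a mechanical CRUD mapping
// (AI-generated style) next to hand-written business logic that a
// domain expert must review. All names and rules are hypothetical.
public class OrderResolver {

    record Order(String id, double subtotal, boolean loyaltyMember) {}

    private final Map<String, Order> store;

    public OrderResolver(Map<String, Order> store) { this.store = store; }

    // AI-generated style: a direct mapping from query field to data source
    public Order order(String id) {
        return store.get(id);
    }

    // Manually coded: rules like these encode undocumented policy and
    // rarely survive automatic migration intact.
    public double orderTotal(String id) {
        Order o = store.get(id);
        double total = o.subtotal();
        if (o.loyaltyMember()) total *= 0.95;   // loyalty discount
        if (total > 100.0)     total -= 5.0;    // large-order rebate
        return Math.round(total * 100.0) / 100.0;
    }
}
```

The review question for the hybrid step is whether `orderTotal` returns exactly what the legacy REST endpoint returned for the same order, which is where golden master testing comes in.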
Common mistakes in hybrid workflows: over-relying on AI for security code, skipping manual review to save time, inadequate testing of AI transformations, treating AI as infallible, not investing in prompt engineering skills.
Organisations treating AI as a process challenge rather than a technology challenge achieve better outcomes. This means establishing code review processes, defining what AI can and can’t do, training teams on effective AI usage, and building workflows that combine AI and manual work strategically.
The iterative approach works best. Generate small code segments with AI. Test them. Make manual improvements. Feed those refinements back into AI prompts. This cycle of AI generation, testing, manual improvement, and prompt refinement produces better results than trying to generate everything at once.
What are the realistic limitations of spec-driven development for code migration?
Spec-driven migration fails when legacy code lacks specifications (requires 30-40% more time for reverse engineering), exceeds context windows (files over 2000 lines), or contains undocumented business logic.
The specification gap presents the primary challenge. Legacy code complexity includes systems developed and patched over decades where original developers are gone and documentation is incomplete or absent. Codebases include non-standard constructs, embedded business rules, platform-specific optimisations tightly coupled with hardware. None of that appears in specs because it was never formally specified.
Reverse engineering specs from code means extracting business logic essence from the code itself, existing comments, technical documentation, user handbooks, and conversations with subject matter experts. That process adds 20-30% to costs.
Context window limitations remain despite advances in AI models. Large legacy files still need chunking, which loses context between sections. Files over 2000 lines need intelligent splitting at logical boundaries, but dependency-aware splitting that keeps related code together often isn’t possible—the dependencies are too tangled.
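A naive version of boundary-aware chunking can be sketched by tracking brace depth and splitting where a top-level block closes. A real splitter must also handle string literals, comments, and annotations; this only shows the idea of splitting at logical boundaries rather than at arbitrary line counts.

```java
import java.util.ArrayList;
import java.util.List;

// Naive sketch: split Java source into top-level declarations by brace
// depth. Each chunk is a complete class or interface, so the AI sees
// whole units instead of fragments cut mid-method.
public class SourceChunker {

    public static List<String> topLevelChunks(String source) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : source.toCharArray()) {
            current.append(c);
            if (c == '{') depth++;
            if (c == '}') {
                depth--;
                if (depth == 0) {          // a top-level block just closed
                    chunks.add(current.toString().trim());
                    current.setLength(0);
                }
            }
        }
        if (!current.toString().trim().isEmpty()) {
            chunks.add(current.toString().trim()); // trailing non-block text
        }
        return chunks;
    }

    public static void main(String[] args) {
        var chunks = topLevelChunks("class A { void f() {} } class B {}");
        System.out.println(chunks.size()); // 2
    }
}
```

Even with clean splits like these, cross-chunk dependencies still have to be summarised and passed along separately, which is exactly where tangled legacy code defeats the approach.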
Even high accuracy rates achieved by multi-agent systems still leave the most complex, business-critical code requiring manual intervention. This remaining fraction often requires disproportionate effort relative to its size.
The big-bang rewrite anti-pattern fails because attempting full AI-generated rewrites creates high risk, difficult rollback, and overwhelming testing challenges. Trying to migrate everything at once means you can’t validate incrementally. When problems appear—and they will—you’re stuck. Incremental approaches win because they let you validate each piece before moving to the next.
Hidden dependencies create failure scenarios AI can’t handle. Runtime dependencies that only appear under specific conditions. Implicit business rules where one module depends on side effects from another module. State machines with transitions that aren’t documented. AI can’t detect these from code analysis alone.
Early experimentation with GPT-4 and GitHub Copilot resulted in a mix of educated guesses and hallucinated gibberish for COBOL migration. The AI didn’t understand the code well enough to preserve behaviour. Later iterations with better prompting and multi-agent architectures improved results, but the fundamental limitation remains: AI needs context, and legacy code often lacks the structure to provide that context.
Cost reality check: AI-assisted migration isn’t always cheaper. Reverse engineering specs adds 30-40% to timelines. Validation adds 20-30% to costs. Remediation of AI errors adds 10-20%. If your legacy code characteristics don’t fit AI capabilities, traditional migration might cost less.
When to call traditional GSI: security certifications required (financial, healthcare, government systems), regulatory compliance demands formal verification, risk profile too high for AI experimentation, cost-benefit analysis favours traditional approach, code characteristics exceed AI capabilities (extreme complexity, poor documentation, business-critical systems where errors are unacceptable).
Some systems are too risky for AI experimentation. Mainframe systems running core banking. Healthcare systems where errors harm patients. Trading systems where milliseconds and accuracy determine profitability. Use traditional approaches for these. The risk isn’t worth the potential cost savings.
How do I test AI-migrated code to ensure it preserves business logic?
Validate AI-migrated code using golden master testing to capture legacy outputs, parallel running to compare production behaviour, and domain expert review for business logic accuracy.
Golden master testing captures outputs from the legacy system across diverse inputs. Run the same inputs through migrated code. Automatically compare results. Any discrepancy indicates the migration didn’t preserve behaviour. This testing strategy is fundamental for migration projects—it’s the only way to verify the AI understood the business logic correctly. For systematic testing strategies for legacy migrations, implement comprehensive validation at every stage of your transformation.
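In code, a golden master harness reduces to recording legacy outputs and replaying the same inputs against the migrated implementation. The sketch below uses plain functions as stand-ins for the two systems, which in practice are exercised through files, batch jobs, or APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Golden master sketch: for every input, the legacy output is the
// "golden" expected value; any mismatch from the migrated code is
// reported. An empty result means behaviour was preserved for this
// input set.
public class GoldenMaster {

    public static <I, O> Map<I, String> findDiscrepancies(
            Iterable<I> inputs,
            Function<I, O> legacy,
            Function<I, O> migrated) {
        Map<I, String> discrepancies = new LinkedHashMap<>();
        for (I input : inputs) {
            O expected = legacy.apply(input);   // the golden output
            O actual = migrated.apply(input);
            if (!expected.equals(actual)) {
                discrepancies.put(input,
                    "expected=" + expected + " actual=" + actual);
            }
        }
        return discrepancies;
    }

    public static void main(String[] args) {
        Map<Integer, String> diffs = findDiscrepancies(
            java.util.List.of(1, 2, 3),
            (Integer n) -> n * 2,
            (Integer n) -> n == 3 ? 7 : n * 2); // injected bug at input 3
        System.out.println(diffs); // {3=expected=6 actual=7}
    }
}
```

The harness is only as good as its input set, which is why the next step, assembling representative test data, matters so much.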
Implementation requires comprehensive test data covering edge cases, boundary conditions, historical production scenarios, and regulatory compliance test cases. Generate this data from production logs where possible. Supplement with synthetic data for edge cases you know exist but don’t appear frequently in production.
Parallel running deploys migrated code alongside legacy. Route duplicate traffic to both systems. Monitor discrepancies in real-time. This validates behaviour with actual production workloads—the realistic scenarios that test suites often miss.
Shadow testing runs new implementations processing production requests in parallel with legacy components without returning results to users. You get real-world validation without risk. The shadow system processes every request. You compare outputs. Users only see results from the legacy system until validation passes.
Incremental canary deployments expose new implementations to limited traffic volumes. Start with 5% of traffic. Monitor performance, correctness, resource utilisation. If metrics look good, expand to 25%, then 50%, then 100%. This gradual rollout builds confidence and catches problems before they affect all users.
The integration with Strangler Fig Pattern provides rollback capability. Feature flags control which implementation handles requests. Monitoring tracks error rates, response times, output discrepancies. Define rollback criteria before deployment: error rate above X%, response time above Y milliseconds, Z output discrepancies per hour. Automate the rollback when metrics breach thresholds.
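The rollback criteria can be encoded directly, so the decision is mechanical rather than a judgement call made mid-incident. The thresholds below are placeholders for whatever values you define before deployment, not recommendations.

```java
// Sketch of automated rollback criteria for a canary rollout: breach
// any one threshold and traffic should return to the legacy
// implementation. Threshold values are illustrative placeholders.
public class RollbackMonitor {

    private final double maxErrorRate;        // e.g. 0.01 for 1%
    private final long maxP95LatencyMillis;   // e.g. 500 ms
    private final int maxDiscrepanciesPerHour;

    public RollbackMonitor(double maxErrorRate,
                           long maxP95LatencyMillis,
                           int maxDiscrepanciesPerHour) {
        this.maxErrorRate = maxErrorRate;
        this.maxP95LatencyMillis = maxP95LatencyMillis;
        this.maxDiscrepanciesPerHour = maxDiscrepanciesPerHour;
    }

    /** True when any threshold is breached and rollback should fire. */
    public boolean shouldRollBack(double errorRate,
                                  long p95LatencyMillis,
                                  int discrepanciesLastHour) {
        return errorRate > maxErrorRate
            || p95LatencyMillis > maxP95LatencyMillis
            || discrepanciesLastHour > maxDiscrepanciesPerHour;
    }

    public static void main(String[] args) {
        RollbackMonitor monitor = new RollbackMonitor(0.01, 500, 10);
        System.out.println(monitor.shouldRollBack(0.02, 300, 2)); // true
    }
}
```

Wired to the routing facade, a `true` result sets the rollout percentage back to zero without waiting for a human to notice the dashboard.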
Airbnb broke migration into discrete, per-file steps that could be parallelised. Each file advanced through the pipeline only if the current step succeeded. Stages included transformation, fixes, lint and TypeScript checks, and final validation. This step-based approach enabled tracking progress, improving failure rates for specific steps, and rerunning files when needed.
Each file was stamped with a machine-readable comment recording its migration progress. This visibility helped the team identify common failure points, repeat offenders, and areas where AI-generated code needed help. The annotations provided feedback for improving prompts and processes.
Common testing mistakes: relying only on AI-generated tests (they miss what the AI misses), insufficient edge case coverage, skipping the parallel running phase to save time, not involving domain experts in validation, inadequate monitoring during rollout.
Testing timeline matters. Budget 2-3 months for parallel running and validation before full cutover. You need time to observe behaviour across different scenarios, different load patterns, different times of day. Don’t rush this phase—catching problems in parallel running is cheap, catching them after cutover is expensive.
FAQ
Can AI tools handle REST to GraphQL API migration automatically?
AI tools generate GraphQL schemas from REST endpoints and basic resolvers, but schema design decisions requiring business context—how to structure types, what relationships to expose, caching strategies—need human judgment. Expect AI to handle 50-60% of straightforward mappings. Manual design is required for complex business logic in resolvers and optimisation decisions.
Which AI coding tool is best for Java legacy modernisation?
Best tool depends on use case: Amazon Q Code Transformation for enterprise Java version upgrades with compliance needs, GitHub Copilot Agent Mode for developer-centric multi-file refactoring, OpenRewrite for deterministic recipe-based migrations, Microsoft Semantic Kernel for complex COBOL-to-Java with multi-agent orchestration. Hybrid approach using multiple tools typically delivers best results.
How long does AI-assisted COBOL to Java migration take compared to traditional approaches?
AI-assisted COBOL migration typically takes 6-12 months for medium complexity applications (200k-500k lines), about 40-60% faster than traditional GSI-led migration (10-18 months). However, add 30-40% time for specification reverse engineering if legacy documentation is poor, and factor in 2-3 months for parallel running and validation before full cutover.
What’s the failure rate for AI-assisted code migration projects?
Limited public data exists, but case studies suggest 70-80% success rate for well-scoped AI migrations with proper testing. Common failure causes: inadequate test coverage before migration (35%), context window limitations on large files (25%), undocumented business logic not captured (20%), security vulnerabilities in AI-generated code (15%), performance regressions (5%).
Should I use incremental refactoring or big-bang rewrite for legacy modernisation?
Incremental refactoring using the Strangler Fig Pattern is safer and more successful for legacy modernisation. Big-bang rewrites fail 60-70% of the time due to scope creep, inadequate testing, and inability to roll back. The incremental approach allows validation at each step, reversible changes, gradual confidence building, and lower business risk.
How do I manage context window limitations with large legacy codebases?
Manage context limits through intelligent chunking at logical boundaries (classes, modules), dependency-aware splitting that keeps related code together, iterative processing where each chunk informs the next, and selective context inclusion focusing on business logic. For files over 2000 lines, consider manual decomposition before AI migration. Even models with large context windows struggle with entire legacy applications when dependencies span multiple modules.
What skills does my team need for AI-assisted migration projects?
Critical skills: legacy codebase expertise (understand business logic), prompt engineering (effective AI instructions), test automation (comprehensive validation), code review (validate AI output), and domain knowledge (ensure business logic preservation). Senior developers should lead validation, mid-level developers refine prompts and handle edge cases, and junior developers manage test automation. Don’t expect AI to replace domain expertise.
Can OpenRewrite and GitHub Copilot work together for Java migrations?
Yes, hybrid approach is optimal: Use OpenRewrite first for standard framework migrations, dependency updates, and API changes (handles 60-70% deterministically), then apply GitHub Copilot Agent Mode for custom business logic transformation and complex refactoring (handles remaining 30-40%). This combination reduces cost, improves accuracy, and leverages strengths of both approaches.
How do I know when to stop AI migration and use traditional GSI approach?
Use traditional GSI when security certifications required (financial, healthcare, government), regulatory compliance demands formal verification, AI risk assessment shows high failure probability, cost-benefit analysis favours traditional approach, or legacy code characteristics exceed AI capabilities (extreme complexity, poor documentation, business-critical systems). Some systems are too risky for AI experimentation.
What’s the typical cost breakdown for AI-assisted vs traditional migration?
AI-assisted migration: 25-35% tool costs (AI subscriptions, infrastructure), 40-50% validation and testing, 15-25% remediation of AI errors, 10-15% project management. Traditional GSI: 60-70% labour costs, 15-20% project management, 10-15% testing, 5-10% tools. AI-assisted typically 30-50% cheaper overall, but only when code is suitable for AI migration. Poor code fit can make AI approach more expensive due to rework.
How do I implement rollback procedures for failed AI migrations?
Implement rollback through feature flags allowing instant traffic routing back to legacy code, comprehensive monitoring with automated rollback triggers (error rate thresholds, performance degradation), version control with tagged rollback points, database migration reversibility (backward-compatible schema changes), and documented rollback runbooks. Test rollback procedures before production deployment. Strangler Fig Pattern makes rollback straightforward by maintaining legacy code parallel to new implementation.
What are the security risks of AI-generated migration code?
Key security risks: SQL injection vulnerabilities in generated database code, authentication bypass in access control logic, cryptographic implementation errors, race conditions in concurrent code, and information disclosure through verbose error handling. Mitigation: security-focused code review for all AI-generated code, automated security scanning (SAST tools), penetration testing of migrated components, and manual coding of security-critical components instead of AI generation.
Conclusion
Advanced spec-driven migration requires strategic thinking beyond simple code transformation. Whether you’re modernising COBOL systems, upgrading Java versions, or migrating APIs, success depends on realistic assessment, appropriate tooling, and systematic validation. The Strangler Fig Pattern provides safety through incremental rollout. Hybrid workflows balance AI capabilities with human judgment. Golden master testing validates business logic preservation.
For a complete overview of spec-driven development approaches across the entire development lifecycle, see our spec-driven development overview.