You’ve just inherited a Java 8 codebase. 200,000 lines of business logic. Your board wants it modernised. Fast.
Most spec-driven content talks about greenfield projects—starting fresh with perfect specs and AI doing the heavy lifting. But that’s not you. You’re looking at legacy systems with decades of undocumented decisions, hardcoded workarounds, and implicit business rules that nobody wrote down because the person who understood them left five years ago.
AI-assisted migration isn’t magic. It requires a strategic approach and realistic expectations. This guide covers migration decision frameworks, proven patterns like the Strangler Fig, and hybrid workflows that combine AI with manual coding. What works, what doesn’t, and when to stick with traditional approaches.
We’ll walk through COBOL to Java migrations with real success metrics, Java version upgrades comparing OpenRewrite and AI-assisted tools, and API migrations like REST to GraphQL. The goal is practical strategies you can apply to your actual migration projects.
How do I assess if my legacy code is actually modernisable with AI tools?
Assess your legacy code modernisation potential by evaluating three factors: code complexity, test coverage, and documentation state.
Start with complexity analysis. Code with cyclomatic complexity under 15 per function works well with AI migration. Above that, AI starts making mistakes because it can’t follow all the branching logic paths. Use static analysis tools like SonarQube to get these metrics across your codebase.
Test coverage significantly impacts migration safety. You need at least 60% coverage for safe AI migration. Without comprehensive tests, you can’t validate that the AI-generated code preserves the original behaviour. No tests means you’re flying blind—and that’s when migrations go sideways.
Documentation state determines your timeline. Legacy systems often lack the specifications that spec-driven approaches depend on, which forces costly reverse engineering. Missing specs add 30-40% to your project timeline.
Then there are context window limitations. Files over 2000 lines need chunking, which loses context. Claude handles 200k tokens, GPT-4 handles 128k, Gemini handles 1M—but large legacy files still cause problems. The AI loses track of dependencies between sections when you split them up.
These factors combine into a risk score. Low complexity, high test coverage, decent documentation? AI-assisted migration works. High complexity, poor tests, missing specs? You’re looking at a traditional Global Systems Integrator (GSI)-led migration.
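To make that concrete, here is a minimal Java sketch of how the three factors could combine into a recommendation. The complexity and coverage thresholds come from the numbers above; the weightings, risk bands, and the MigrationRiskScorer class itself are illustrative assumptions, not a standard tool.

```java
// Illustrative only: a naive scorer combining the three assessment factors.
// The 15-complexity and 60%-coverage thresholds are from the article;
// the weightings and risk bands are placeholder assumptions.
public class MigrationRiskScorer {

    enum Recommendation { AI_ASSISTED, HYBRID, TRADITIONAL_GSI }

    public static Recommendation assess(double avgCyclomaticComplexity,
                                         double testCoveragePercent,
                                         boolean specsExist) {
        int risk = 0;
        if (avgCyclomaticComplexity > 15) risk += 2;   // AI loses track of branching logic
        if (testCoveragePercent < 60)     risk += 2;   // no safety net to validate behaviour
        if (!specsExist)                  risk += 1;   // expect 30-40% reverse-engineering overhead

        if (risk <= 1) return Recommendation.AI_ASSISTED;
        if (risk <= 3) return Recommendation.HYBRID;
        return Recommendation.TRADITIONAL_GSI;
    }

    public static void main(String[] args) {
        // e.g. complex code, thin tests, no specs -> traditional migration
        System.out.println(assess(22.0, 35.0, false));
    }
}
```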
Warning signs that AI migration is too risky: security-critical financial calculations, real-time performance requirements, complex state machines, undocumented business logic, regulatory compliance requirements.
So run your codebase through static analysis. Look at the metrics. Be honest about test coverage. Check what documentation exists. That assessment tells you which path to take.
What is the Strangler Fig Pattern and how does it work for AI-powered legacy modernisation?
The Strangler Fig Pattern enables safe incremental migration by running new AI-generated code alongside legacy systems, gradually routing traffic to the new implementation while preserving the legacy fallback for risk mitigation.
It’s named after strangler fig trees that grow around host trees and eventually replace them, and it provides a controlled approach to modernisation. Your existing application continues functioning during the modernisation effort. No big-bang cutover. No “hope this works” moments.
Here’s how it works. A facade or proxy intercepts requests going to the back-end legacy system. This proxy routes requests either to the legacy application or to the new services. You start with all traffic going to legacy. Then you gradually shift specific functionality to the new implementation.
The rollback capability makes this pattern valuable for AI migration. AI-generated code looks good in testing but sometimes behaves unexpectedly in production. With the Strangler Fig Pattern, you keep the legacy code running. If the AI-generated code fails, you route traffic back to the legacy system. No downtime. No emergency.
Implementation uses three components: an API gateway for routing, feature flags for traffic control, and monitoring to validate behaviour. Feature flags let you control the rollout percentage—start at 5%, watch the metrics, move to 25%, then 50%, then 100%. Monitoring compares outputs between legacy and new implementations to catch discrepancies early.
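The routing layer itself can be small. Below is a minimal Java sketch of a strangler facade, assuming a hypothetical OrderService slice of functionality: a percentage-based flag decides whether each request hits the legacy implementation or the new one, and failures in the new code fall back to legacy. In a real deployment the percentage would come from your feature-flag platform and the routing would live in the API gateway.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a strangler facade. OrderService and both implementations are
// placeholders standing in for one slice of legacy functionality.
interface OrderService {
    String priceOrder(String orderId);
}

class StranglerFacade implements OrderService {
    private final OrderService legacy;      // existing implementation, kept as fallback
    private final OrderService modern;      // AI-generated replacement
    private volatile int rolloutPercent;    // 5 -> 25 -> 50 -> 100 over the rollout

    StranglerFacade(OrderService legacy, OrderService modern, int rolloutPercent) {
        this.legacy = legacy;
        this.modern = modern;
        this.rolloutPercent = rolloutPercent;
    }

    // Called by operations tooling / the feature-flag platform to move the dial.
    void setRolloutPercent(int percent) { this.rolloutPercent = percent; }

    @Override
    public String priceOrder(String orderId) {
        boolean useModern = ThreadLocalRandom.current().nextInt(100) < rolloutPercent;
        try {
            return useModern ? modern.priceOrder(orderId) : legacy.priceOrder(orderId);
        } catch (RuntimeException e) {
            // If the new code fails, fall back to legacy instead of failing the request.
            return legacy.priceOrder(orderId);
        }
    }
}
```

Moving from 5% to 100% is then just a sequence of setRolloutPercent calls, driven by the monitoring that compares the two implementations.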
For AI migration specifically, the pattern reduces risk. Generate modern code with AI. Deploy it parallel to legacy. Route 5% of traffic to test behaviour with real production data. Monitor for errors, performance issues, incorrect outputs. If everything looks good, increase traffic. If something breaks, route back to legacy and fix the AI-generated code.
A typical phased rollout runs 5% → 25% → 50% → 100% over 4-6 weeks. The gradual approach lets you detect problems before they affect most users.
The pattern works because it acknowledges reality: AI-generated code isn’t perfect on the first try. But with proper testing in production using real traffic, you validate behaviour incrementally and maintain the ability to roll back at any point.
How do I migrate COBOL systems to Java using AI-assisted code generation?
COBOL to Java migration using AI achieves 93% accuracy with multi-agent orchestration, but requires manual intervention for complex business logic and undocumented dependencies.
Bankdata achieved 93% accuracy with AI-driven COBOL to Java conversion, reducing code complexity by 35% and coupling by 33%. The remaining 7% required manual intervention—and that 7% represents the most complex, business-critical code.
The multi-agent approach uses specialised agents for analysis, transformation, testing, documentation, and coordination. Microsoft Semantic Kernel orchestrates these agents. The COBOLAnalyzerAgent performs deep semantic analysis. The DependencyMapperAgent maps dependencies between programs and copybooks. The JavaConverterAgent generates modern, microservice-ready code.
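As a rough illustration of the orchestration shape (not the Semantic Kernel API), here is a plain-Java sketch of a coordinator passing shared context through a pipeline of specialised agents. The MigrationAgent interface and the context map are invented for the example; real agents would delegate their work to LLM calls.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative orchestration shape only; a real implementation hands each step
// to an LLM-backed agent rather than a plain method.
interface MigrationAgent {
    // Each agent reads from and writes to a shared context
    // (parsed COBOL structure, dependency graph, generated Java, test results, ...).
    void run(Map<String, Object> context);
}

class MigrationCoordinator {
    private final List<MigrationAgent> pipeline;

    MigrationCoordinator(List<MigrationAgent> pipeline) {
        this.pipeline = pipeline;
    }

    Map<String, Object> migrate(String cobolSource) {
        Map<String, Object> context = new HashMap<>();
        context.put("cobolSource", cobolSource);
        for (MigrationAgent agent : pipeline) {
            agent.run(context);   // analyser -> dependency mapper -> converter -> tester
        }
        return context;           // ends up holding e.g. generated Java and a test report
    }
}
```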
Timeline runs 6-12 months for medium complexity mainframe applications (200k-500k lines). That’s 40-60% faster than traditional GSI-led migration.
Dependency mapping poses significant challenges. COBOL systems include complex call chains where programs invoke other programs that invoke copybooks that modify shared state. AI models struggle to correctly interpret business logic embedded in decades-old COBOL code. Undocumented behaviours, hardcoded workarounds, implicit domain rules—none of that appears in formal specs because it was never formally specified.
Business logic preservation requires golden master testing. Capture outputs from the legacy COBOL system across diverse inputs. Run the same inputs through the migrated Java code. Compare results. Any discrepancy indicates the migration didn’t preserve behaviour.
Tool selection matters. Microsoft Semantic Kernel works for complex COBOL migration with multi-agent orchestration. GitHub Copilot handles simpler transformations. Amazon Q Code Transformation focuses on enterprise compliance and security scanning. Most successful projects use a hybrid approach—different tools for different components based on complexity.
When COBOL migration fails: security-critical financial systems where accuracy is non-negotiable, real-time performance requirements that can’t tolerate Java’s garbage collection, mainframes with complex job scheduling, systems with extensive undocumented business logic.
Cost comparison shows AI-assisted migration runs 30-50% cheaper than traditional approaches when code is suitable. But unsuitable code makes AI more expensive due to rework. Run your risk assessment first. Then decide on AI-assisted versus traditional migration.
Factor in 2-3 months for parallel running and validation before full cutover.
What’s the difference between OpenRewrite and AI-assisted migration for Java version upgrades?
OpenRewrite provides deterministic recipe-based transformations for well-defined Java migrations, while AI-assisted tools handle complex custom code requiring context understanding.
OpenRewrite uses recipes for standard framework migrations, dependency updates, and API changes. Zero hallucinations. Repeatable transformations. Extensive recipe library covering Java 8/11/17/21 upgrades, Spring Framework migrations, JUnit 4 to 5. No context limits because it’s rule-based, not LLM-based. Lower cost—you’re not paying for AI tokens.
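To illustrate the kind of mechanical change those recipes apply, here is what the JUnit 4 to 5 migration does to a trivial test class. The PricingTest class is invented; the annotation and assertion swaps shown are the standard JUnit 4 to 5 changes the recipe automates.

```java
// Before (JUnit 4): what the legacy test looked like.
// import org.junit.Before;
// import org.junit.Test;
// import static org.junit.Assert.assertEquals;
//
// public class PricingTest {
//     @Before public void setUp() { /* ... */ }
//     @Test public void appliesDiscount() { assertEquals(90, price(100, 10)); }
// }

// After (JUnit 5): the same test once the recipe has rewritten imports and annotations.
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

public class PricingTest {
    private int discountPercent;

    @BeforeEach                      // @Before becomes @BeforeEach
    void setUp() { discountPercent = 10; }

    @Test
    void appliesDiscount() {         // Assert.assertEquals becomes Assertions.assertEquals
        assertEquals(90, price(100, discountPercent));
    }

    private int price(int base, int discount) {
        return base - (base * discount / 100);
    }
}
```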
But OpenRewrite only does simple replacements and cannot grasp overarching context. Custom frameworks? Unique architectural patterns? Business logic transformation? OpenRewrite struggles.
AI-assisted tools handle what OpenRewrite misses. They understand business context, adapt to unique patterns, generate tests, and work with custom code that doesn’t follow standard patterns.
The hybrid approach delivers the best results. Use OpenRewrite first for standard patterns, then AI for remaining customisations—this reduces cost by 50-70%. OpenRewrite handles 60-70% of transformations deterministically. AI addresses the remaining 30-40% of custom code. Manual review covers 5-10% of edge cases.
One migration example shows 2300 lines of Java code migrated from JDK 8 to JDK 17 in about 1.5 minutes. That’s OpenRewrite doing the heavy lifting on standard transformations.
When to use OpenRewrite: Java version upgrades, Spring Framework migrations, well-defined API changes, dependency updates. It’s fast, deterministic, and cheap.
When to use AI-assisted migration: Custom frameworks, unique architectural patterns, business logic transformation, legacy code without standard patterns.
When to use both: Most real-world Java migrations. Let OpenRewrite handle the standard transformations, then bring in AI for the custom code that doesn’t fit recipe patterns.
How do I implement a hybrid workflow that combines AI code generation with manual coding for migration projects?
Effective hybrid migration workflows allocate tasks strategically: AI handles boilerplate transformations (60-70%), manual coding addresses security-critical components (10-15%), and collaborative review validates business logic (15-25%).
Task allocation follows security and complexity patterns. AI excels at generating boilerplate code, database models, and basic layouts—the repetitive stuff. Manual coding handles security-critical code: authentication, authorisation, cryptographic implementations, financial calculations. Complex business logic gets the hybrid approach: AI generates initial code, domain experts validate behaviour and edge cases.
Developers hold AI-generated code to the same standards as code written by human teammates. Every piece of AI-generated code goes through human validation. You’re checking for accuracy, security vulnerabilities, performance implications, and maintainability.
Team structure distributes work by experience level. Senior developers validate AI output and handle the 7-10% of complex scenarios that AI can’t solve. Mid-level developers refine prompts and handle edge cases. Junior developers manage test automation.
When to stop using AI and switch to manual coding: performance-sensitive algorithms where milliseconds matter, cryptographic implementations where errors create vulnerabilities, complex state management with race conditions, security-critical code in financial or healthcare domains.
REST to GraphQL migration shows the split clearly. AI generates GraphQL schemas from REST endpoints—that’s straightforward mapping. AI creates basic resolvers for simple CRUD operations. Manual design handles schema structure decisions: how to organise types, what relationships to expose, caching strategies. Manual coding implements complex business logic in resolvers.
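Here is a hedged sketch of that split using graphql-java’s DataFetcher interface. The first fetcher is the kind of simple lookup AI generates reliably; the second holds manually written business logic. The Product type, repository, and discount rule are invented for illustration.

```java
import graphql.schema.DataFetcher;

// Placeholder domain types standing in for your real ones.
record Product(String id, String name, int basePriceCents) {}

interface ProductRepository {
    Product findById(String id);
}

class ProductFetchers {
    private final ProductRepository repository;

    ProductFetchers(ProductRepository repository) {
        this.repository = repository;
    }

    // Straightforward CRUD mapping from a REST endpoint: a good fit for AI generation.
    DataFetcher<Product> productById() {
        return env -> repository.findById(env.getArgument("id"));
    }

    // Business logic in a resolver: discount rules, edge cases, domain decisions.
    // This is the part to design and implement manually, then cover with tests.
    DataFetcher<Integer> effectivePriceCents() {
        return env -> {
            Product product = env.getSource();
            boolean loyaltyCustomer = Boolean.TRUE.equals(env.getArgument("loyalty"));
            int price = product.basePriceCents();
            if (loyaltyCustomer && price > 10_000) {
                price = (int) Math.round(price * 0.95);  // invented discount rule
            }
            return price;
        };
    }
}
```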
Common mistakes in hybrid workflows: over-relying on AI for security code, skipping manual review to save time, inadequate testing of AI transformations, treating AI as infallible, not investing in prompt engineering skills.
What are the realistic limitations of spec-driven development for code migration?
Spec-driven migration fails when legacy code lacks specifications (requires 30-40% more time for reverse engineering), exceeds context windows (files over 2000 lines), or contains undocumented business logic.
The specification gap is the primary challenge. Legacy systems have been developed and patched over decades; the original developers are gone and the documentation is incomplete or absent. Codebases include non-standard constructs, embedded business rules, platform-specific optimisations. None of that appears in specs because it was never formally specified.
Context window limitations remain despite advances in AI models. Large legacy files still need chunking, which loses context between sections. Files over 2000 lines need intelligent splitting at logical boundaries, but often the dependencies are too tangled.
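One workable heuristic, sketched below, is to split at member boundaries rather than at arbitrary line counts, so each chunk is at least a complete method or inner class. The brace-counting approach is a simplification; real tooling would use a proper parser and carry a dependency summary from one chunk to the next.

```java
import java.util.ArrayList;
import java.util.List;

// Naive chunker: splits Java source into chunks of at most maxLines lines,
// but only breaks where the brace depth returns to the class-body level,
// so methods and inner classes are never cut in half. Illustrative only.
public class SourceChunker {

    public static List<String> chunk(List<String> lines, int maxLines) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentLines = 0;
        int depth = 0;

        for (String line : lines) {
            current.append(line).append('\n');
            currentLines++;
            for (char c : line.toCharArray()) {        // ignores braces in strings/comments
                if (c == '{') depth++;
                if (c == '}') depth--;
            }
            boolean atMemberBoundary = depth <= 1;
            if (currentLines >= maxLines && atMemberBoundary) {
                chunks.add(current.toString());
                current.setLength(0);
                currentLines = 0;
            }
        }
        if (currentLines > 0) chunks.add(current.toString());
        return chunks;
    }
}
```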
Even high accuracy rates leave the most complex, business-critical code requiring manual intervention. This remaining fraction often requires disproportionate effort relative to its size.
The big-bang rewrite anti-pattern fails because attempting full AI-generated rewrites creates high risk, difficult rollback, and overwhelming testing challenges. Incremental approaches win because they let you validate each piece before moving to the next.
Hidden dependencies create failure scenarios AI can’t handle. Runtime dependencies that only appear under specific conditions. Implicit business rules where one module depends on side effects from another module. State machines with transitions that aren’t documented. AI can’t detect these from code analysis alone.
Cost reality check: AI-assisted migration isn’t always cheaper. Reverse engineering specs adds 30-40% to timelines. Validation adds 20-30% to costs. Remediation of AI errors adds 10-20%. If your legacy code characteristics don’t fit AI capabilities, traditional migration might cost less.
When to call traditional GSI: security certifications required (financial, healthcare, government systems), regulatory compliance demands formal verification, risk profile too high for AI experimentation, code characteristics exceed AI capabilities.
Some systems are too risky for AI experimentation. Mainframe systems running core banking. Healthcare systems where errors harm patients. Trading systems where milliseconds and accuracy determine profitability. Use traditional approaches for these. The risk isn’t worth the potential cost savings.
How do I test AI-migrated code to ensure it preserves business logic?
Validate AI-migrated code using golden master testing to capture legacy outputs, parallel running to compare production behaviour, and domain expert review for business logic accuracy.
Golden master testing captures outputs from the legacy system across diverse inputs. Run the same inputs through migrated code. Automatically compare results. Any discrepancy indicates the migration didn’t preserve behaviour.
Implementation requires comprehensive test data covering edge cases, boundary conditions, historical production scenarios, and regulatory compliance test cases. Generate this data from production logs where possible.
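A minimal golden master check might look like the JUnit sketch below: recorded legacy outputs are loaded from a capture file and replayed against the migrated code. The CSV layout, file path, and MigratedCalculator class are assumptions made for the example.

```java
import org.junit.jupiter.api.Test;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Golden master sketch: each line of the capture file is "input;expectedLegacyOutput",
// recorded from the legacy system across production-derived inputs.
class GoldenMasterTest {

    // Stand-in for the migrated implementation under test.
    private final MigratedCalculator migrated = new MigratedCalculator();

    @Test
    void migratedOutputMatchesLegacyCapture() throws Exception {
        List<String> cases = Files.readAllLines(Path.of("src/test/resources/legacy-golden-master.csv"));
        for (String line : cases) {
            String[] parts = line.split(";", 2);
            String input = parts[0];
            String expectedLegacyOutput = parts[1];
            // Any mismatch means the migration did not preserve behaviour for this input.
            assertEquals(expectedLegacyOutput, migrated.calculate(input),
                    "Divergence from legacy output for input: " + input);
        }
    }
}

// Hypothetical migrated component; substitute the real entry point of the migrated code.
class MigratedCalculator {
    String calculate(String input) {
        return input; // placeholder
    }
}
```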
Parallel running deploys migrated code alongside legacy. Route duplicate traffic to both systems. Monitor discrepancies in real-time. This validates behaviour with actual production workloads.
Shadow testing runs new implementations processing production requests in parallel with legacy components without returning results to users. You get real-world validation without risk.
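A shadow dispatcher can be as simple as the sketch below, assuming a hypothetical QuoteService: the legacy result is what the caller receives, the migrated implementation runs on a copy of the request off the hot path, and any mismatch is logged for investigation.

```java
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Shadow-testing sketch: users only ever see the legacy response; the new
// implementation is exercised in parallel purely to compare behaviour.
interface QuoteService {
    String quote(String request);
}

class ShadowingQuoteService implements QuoteService {
    private final QuoteService legacy;
    private final QuoteService candidate;                       // AI-migrated implementation
    private final ExecutorService shadowPool = Executors.newFixedThreadPool(4);

    ShadowingQuoteService(QuoteService legacy, QuoteService candidate) {
        this.legacy = legacy;
        this.candidate = candidate;
    }

    @Override
    public String quote(String request) {
        String legacyResult = legacy.quote(request);            // this is what the user gets
        shadowPool.submit(() -> {
            try {
                String candidateResult = candidate.quote(request);
                if (!Objects.equals(legacyResult, candidateResult)) {
                    // In a real system this would feed metrics/alerting, not stderr.
                    System.err.printf("Shadow mismatch for %s: legacy=%s candidate=%s%n",
                            request, legacyResult, candidateResult);
                }
            } catch (RuntimeException e) {
                System.err.println("Shadow call failed: " + e.getMessage());
            }
        });
        return legacyResult;
    }
}
```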
Incremental canary deployments expose new implementations to limited traffic volumes. Start with 5% of traffic. Monitor performance, correctness, resource utilisation. If metrics look good, expand to 25%, then 50%, then 100%.
The integration with Strangler Fig Pattern provides rollback capability. Feature flags control which implementation handles requests. Define rollback criteria before deployment: error rate above X%, response time above Y milliseconds, Z output discrepancies per hour. Automate the rollback when metrics breach thresholds.
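Those criteria are straightforward to automate once they are written down. The sketch below checks a rolling error rate and latency against pre-agreed thresholds and flips the flag back to legacy when either is breached; the metric names, threshold values, and FeatureFlagClient interface are placeholders for whatever your monitoring and flag platform expose.

```java
// Sketch of an automated rollback trigger. Metrics, thresholds, and the flag
// client are illustrative; wire these to your real monitoring and flag platform.
interface FeatureFlagClient {
    void setRolloutPercent(String flagName, int percent);
}

class RollbackGuard {
    private final FeatureFlagClient flags;
    private final double maxErrorRate;        // agreed before deployment, e.g. 1% of requests
    private final long maxP95LatencyMillis;   // e.g. 500 ms

    RollbackGuard(FeatureFlagClient flags, double maxErrorRate, long maxP95LatencyMillis) {
        this.flags = flags;
        this.maxErrorRate = maxErrorRate;
        this.maxP95LatencyMillis = maxP95LatencyMillis;
    }

    // Call periodically (e.g. every minute) with the latest rolling-window metrics.
    void evaluate(double errorRate, long p95LatencyMillis) {
        if (errorRate > maxErrorRate || p95LatencyMillis > maxP95LatencyMillis) {
            // Breach: route all traffic back to the legacy implementation.
            flags.setRolloutPercent("order-service-migration", 0);
        }
    }
}
```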
Common testing mistakes: relying only on AI-generated tests (they miss what the AI misses), insufficient edge case coverage, skipping the parallel running phase to save time, not involving domain experts in validation, inadequate monitoring during rollout.
Testing timeline matters. Budget 2-3 months for parallel running and validation before full cutover. Don’t rush this phase—catching problems in parallel running is cheap, catching them after cutover is expensive.
FAQ
Can AI tools handle REST to GraphQL API migration automatically?
AI tools generate GraphQL schemas from REST endpoints and basic resolvers, but schema design decisions requiring business context—how to structure types, what relationships to expose, caching strategies—need human judgment. Expect AI to handle 50-60% of straightforward mappings. Manual design is required for complex business logic in resolvers and optimisation decisions.
Which AI coding tool is best for Java legacy modernisation?
The best tool depends on the use case: Amazon Q Code Transformation for enterprise Java version upgrades with compliance needs, GitHub Copilot Agent Mode for developer-centric multi-file refactoring, OpenRewrite for deterministic recipe-based migrations, Microsoft Semantic Kernel for complex COBOL-to-Java with multi-agent orchestration. A hybrid approach using multiple tools typically delivers the best results.
How long does AI-assisted COBOL to Java migration take compared to traditional approaches?
AI-assisted COBOL migration typically takes 6-12 months for medium complexity applications (200k-500k lines), about 40-60% faster than traditional GSI-led migration (10-18 months). However, add 30-40% time for specification reverse engineering if legacy documentation is poor, and factor in 2-3 months for parallel running and validation before full cutover.
What’s the failure rate for AI-assisted code migration projects?
Limited public data exists, but case studies suggest 70-80% success rate for well-scoped AI migrations with proper testing. Common failure causes: inadequate test coverage before migration (35%), context window limitations on large files (25%), undocumented business logic not captured (20%), security vulnerabilities in AI-generated code (15%), performance regressions (5%).
Should I use incremental refactoring or big-bang rewrite for legacy modernisation?
Incremental refactoring using Strangler Fig Pattern is safer and more successful for legacy modernisation. Big-bang rewrites fail 60-70% of the time due to scope creep, inadequate testing, and inability to rollback. Incremental approach allows validation at each step, reversible changes, gradual confidence building, and lower business risk.
How do I manage context window limitations with large legacy codebases?
Manage context limits through intelligent chunking at logical boundaries (classes, modules), dependency-aware splitting that keeps related code together, iterative processing where each chunk informs the next, and selective context inclusion focusing on business logic. For files over 2000 lines, consider manual decomposition before AI migration. Even models with large context windows struggle with entire legacy applications when dependencies span multiple modules.
What skills does my team need for AI-assisted migration projects?
Critical skills: legacy codebase expertise (understand business logic), prompt engineering (effective AI instructions), test automation (comprehensive validation), code review (validate AI output), and domain knowledge (ensure business logic preservation). Senior developers should lead validation, mid-level developers refine prompts and handle edge cases, and junior developers manage test automation. Don’t expect AI to replace domain expertise.
Can OpenRewrite and GitHub Copilot work together for Java migrations?
Yes, hybrid approach is optimal: Use OpenRewrite first for standard framework migrations, dependency updates, and API changes (handles 60-70% deterministically), then apply GitHub Copilot Agent Mode for custom business logic transformation and complex refactoring (handles remaining 30-40%). This combination reduces cost, improves accuracy, and leverages strengths of both approaches.
How do I know when to stop AI migration and use traditional GSI approach?
Use a traditional GSI when security certifications are required (financial, healthcare, government), regulatory compliance demands formal verification, the AI risk assessment shows high failure probability, cost-benefit analysis favours the traditional approach, or legacy code characteristics exceed AI capabilities (extreme complexity, poor documentation, business-critical systems). Some systems are too risky for AI experimentation.
What’s the typical cost breakdown for AI-assisted vs traditional migration?
AI-assisted migration: 25-35% tool costs (AI subscriptions, infrastructure), 40-50% validation and testing, 15-25% remediation of AI errors, 10-15% project management. Traditional GSI: 60-70% labour costs, 15-20% project management, 10-15% testing, 5-10% tools. AI-assisted typically 30-50% cheaper overall, but only when code is suitable for AI migration. Poor code fit can make AI approach more expensive due to rework.
How do I implement rollback procedures for failed AI migrations?
Implement rollback through feature flags allowing instant traffic routing back to legacy code, comprehensive monitoring with automated rollback triggers (error rate thresholds, performance degradation), version control with tagged rollback points, database migration reversibility (backward-compatible schema changes), and documented rollback runbooks. Test rollback procedures before production deployment. Strangler Fig Pattern makes rollback straightforward by maintaining legacy code parallel to new implementation.
What are the security risks of AI-generated migration code?
Key security risks: SQL injection vulnerabilities in generated database code, authentication bypass in access control logic, cryptographic implementation errors, race conditions in concurrent code, and information disclosure through verbose error handling. Mitigation: security-focused code review for all AI-generated code, automated security scanning (SAST tools), penetration testing of migrated components, and manual coding of security-critical components instead of AI generation.