Large language models are revolutionising code migration by embracing failure as a strategy. Airbnb’s migration of 3,500 React test files demonstrated that retry loops and failure-based learning outperform perfect upfront prompting, completing in 6 weeks what would have taken 1.5 years manually.
By scaling context windows to 100,000 tokens and using iterative refinement, organisations achieve unprecedented migration speeds. For businesses facing legacy modernisation challenges, this counter-intuitive methodology turns technical debt from a resource-intensive burden into a systematic, automated process.
The key insight: instead of trying to get migrations right the first time, LLMs excel when allowed to fail, learn, and retry—achieving 97% automation rates while maintaining code quality and test coverage.
How does Airbnb use LLMs for test migration?
Airbnb pioneered LLM-driven test migration by converting 3,500 React component test files from Enzyme to React Testing Library in just 6 weeks, using retry loops and dynamic prompting instead of perfect initial prompts.
The journey began during a mid-2023 hackathon when a team demonstrated that large language models could successfully convert hundreds of Enzyme files to RTL in just a few days. This discovery challenged the conventional wisdom that code migrations required meticulous manual effort. The team had stumbled upon something remarkable—LLMs didn’t need perfect instructions to succeed. They needed permission to fail.
Airbnb’s migration challenge stemmed from their 2020 adoption of React Testing Library for new development, while thousands of legacy tests remained in Enzyme. The frameworks’ fundamental differences meant no simple swap was possible. Manual migration estimates projected 1.5 years of engineering effort—a timeline that would drain resources and stall innovation.
Building on the hackathon success, engineers developed a scalable pipeline that broke migrations into discrete, per-file steps. Each file moved through validation stages like a production line. When a check failed, the LLM attempted fixes. This state machine approach enabled parallel processing of hundreds of files simultaneously, dramatically accelerating simple migrations while systematically addressing complex cases.
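To make the mechanics concrete, here is a minimal TypeScript sketch of such a per-file pipeline, not Airbnb's actual code: each file walks through discrete validation steps, and any failure triggers an LLM fix attempt until the step passes or the retry budget is spent. The step names and the `Checks` callbacks are illustrative placeholders.

```typescript
type Step = "refactor" | "fix-jest" | "fix-lint" | "fix-types";

interface Checks {
  // Run the validation for a step; return an error message on failure, null on success.
  run(step: Step, filePath: string): Promise<string | null>;
  // Ask an LLM to patch the file, given the step and the captured error output.
  fix(step: Step, filePath: string, error: string): Promise<void>;
}

const STEPS: Step[] = ["refactor", "fix-jest", "fix-lint", "fix-types"];

async function migrateFile(filePath: string, checks: Checks, maxRetries = 10): Promise<boolean> {
  for (const step of STEPS) {
    let error = await checks.run(step, filePath);
    let attempts = 0;
    while (error !== null) {
      if (attempts >= maxRetries) return false;  // park the file for manual review
      attempts += 1;
      await checks.fix(step, filePath, error);   // feed the failure back to the model
      error = await checks.run(step, filePath);  // re-validate the same step
    }
  }
  return true;                                   // all steps passed: migration complete
}
```

Because each file carries its own state through the steps, the same function can run against hundreds of files at once without coordination between them.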
The results speak volumes about the approach’s effectiveness. Within 4 hours, 75% of files migrated automatically. After four days of prompt refinement using a “sample, tune, and sweep” strategy, the system reached 97% completion. The total cost—including LLM API usage and six weeks of engineering time—came in far below the original manual migration estimate.
What made this possible wasn’t sophisticated prompt engineering or complex orchestration. It was the willingness to let the LLM fail repeatedly, learning from each attempt. The remaining 3% of files that resisted automation still benefited from the baseline code generated, requiring only another week of manual intervention to complete the entire migration.
The key to their success wasn’t a perfect plan, but a strategy built on learning from mistakes. This strategy is known as failure-based learning.
What is failure-based learning in LLM code migration?
Failure-based learning is a counter-intuitive approach where LLMs improve migration accuracy through multiple retry attempts, adjusting prompts and strategies based on each failure rather than seeking perfect initial results.
Traditional migration approaches treat failure as something to avoid. Engineers spend considerable time crafting perfect prompts, analysing edge cases, and building comprehensive rule sets. This perfectionist mindset assumes that with enough upfront effort, migrations can proceed smoothly. Yet Airbnb’s experience revealed the opposite—the most effective route to improve outcomes was simply brute force: retry steps multiple times until they passed or reached a limit.
The methodology flips conventional wisdom on its head. Instead of viewing failed migration attempts as wasted effort, each failure becomes valuable data. When an LLM-generated code change breaks tests or fails linting, the system captures the specific error messages.
These errors then inform the next attempt, creating a feedback loop that progressively refines the migration strategy. This is the core of the approach: dynamic prompt adaptation.
Rather than maintaining static prompts, the system modifies its instructions based on accumulated failures. If multiple files fail with similar import errors, the prompt evolves to address that specific pattern. This adaptive behaviour mimics how human developers debug—learning from mistakes and adjusting their approach accordingly.
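A hedged sketch of what that adaptation can look like: the base instructions stay fixed, while recurring error patterns from earlier failures are appended as extra guidance on the next attempt. The error signatures and hint wording below are made up for illustration.

```typescript
const basePrompt = "Convert this Enzyme test file to React Testing Library.";

// Accumulated lessons keyed by a coarse error signature.
const learnedHints = new Map<string, string>();

function recordFailure(errorOutput: string): void {
  if (errorOutput.includes("Cannot find module")) {
    learnedHints.set(
      "missing-import",
      "Always replace Enzyme imports with '@testing-library/react' imports."
    );
  }
  if (errorOutput.includes("wrapper.find")) {
    learnedHints.set(
      "enzyme-selector",
      "Replace wrapper.find(...) selectors with screen queries such as getByRole."
    );
  }
}

function buildPrompt(fileSource: string): string {
  const hints = [...learnedHints.values()].map(h => `- ${h}`).join("\n");
  return `${basePrompt}\n\nLessons from previous failed attempts:\n${hints}\n\nFile:\n${fileSource}`;
}
```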
The benefits extend beyond simple error correction. Failure-based learning naturally handles edge cases that would be impossible to anticipate. Complex architectural patterns, unusual coding styles, and framework-specific quirks all surface through failures. The system doesn’t need comprehensive documentation of every possible scenario—it discovers them through iteration.
Real-world metrics validate this counter-intuitive strategy. Airbnb’s migration achieved 97% automation despite minimal upfront prompt engineering. Files that failed 50 to 100 times eventually succeeded through persistent refinement. This resilience transforms migration from a fragile process requiring perfect understanding into a robust system that adapts to whatever it encounters.
But how does this actually work in practice? The answer lies in the sophisticated retry loop architecture that powers these migrations.
How do retry loops work in automated code migration?
Retry loops create a state machine where each migration step validates, fails, triggers an LLM fix attempt, and repeats until success or retry limit—enabling parallel processing of hundreds of files simultaneously.
The architecture resembles a production pipeline more than traditional batch processing. Each file moves through discrete validation stages: refactoring from the old framework, fixing test failures, resolving linting errors, and passing type checks. Only after passing all validations does a file advance to the next state. This granular approach provides precise failure points for targeted fixes.
State machine design brings structure to chaos. Files exist in defined states—pending, in-progress for each step, or completed. When validation fails at any stage, the system triggers an LLM fix attempt specific to that failure type. A Jest test failure prompts different remediation than a TypeScript compilation error. This specialisation improves fix quality while maintaining clear progress tracking.
Configurable retry limits prevent infinite loops while maximising success rates. Aviator’s implementation uses fallback strategies when primary models fail, automatically switching to alternative LLMs like Claude if GPT-4 struggles with specific patterns. Some files might succeed on the first attempt, while others require dozens of iterations. The system adapts retry strategies based on failure patterns, allocating more attempts to files showing progress.
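An illustrative sketch of that escalation pattern, assuming a primary model with a larger retry budget and a fallback model behind it; the model names and the `attemptFix` callback are placeholders, not any vendor's API.

```typescript
interface ModelBudget {
  model: string;      // e.g. a primary model, then an alternative
  maxAttempts: number;
}

const escalation: ModelBudget[] = [
  { model: "primary-model", maxAttempts: 10 },
  { model: "fallback-model", maxAttempts: 5 },
];

async function fixWithFallback(
  filePath: string,
  attemptFix: (model: string, filePath: string) => Promise<boolean> // true = validation now passes
): Promise<boolean> {
  for (const { model, maxAttempts } of escalation) {
    for (let i = 0; i < maxAttempts; i++) {
      if (await attemptFix(model, filePath)) return true;
    }
  }
  return false; // both models exhausted: escalate to a human
}
```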
Parallel processing multiplies the approach’s power. Instead of sequential file processing, hundreds of migrations run simultaneously. Simple files complete quickly, freeing resources for complex cases. This parallelism transforms what would be weeks of sequential work into hours of concurrent execution. The infrastructure scales horizontally—adding more compute resources directly accelerates migration speed.
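A simple way to express that parallelism is a bounded worker pool: all files are queued, but only a fixed number of migrations run at once. The sketch below assumes the per-file `migrateFile` function from earlier; the concurrency figure is arbitrary.

```typescript
async function migrateAll(
  files: string[],
  migrateFile: (path: string) => Promise<boolean>,
  concurrency = 50
): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  const queue = [...files];

  async function worker(): Promise<void> {
    while (queue.length > 0) {
      const path = queue.shift();
      if (!path) break;
      (await migrateFile(path) ? done : failed).push(path);
    }
  }

  // Start N workers that drain the shared queue concurrently.
  await Promise.all(Array.from({ length: concurrency }, worker));
  return { done, failed };
}
```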
Performance optimisation techniques further enhance efficiency. The system maintains a cache of successful fix patterns, applying proven solutions before attempting novel approaches. Common failure types develop standardised remediation strategies. Memory of previous attempts prevents repetition of failed approaches, ensuring each retry explores new solution paths.
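A minimal sketch of such a fix cache, assuming failures can be reduced to a coarse signature: before paying for a fresh LLM call, the system checks whether a previously successful fix exists for the same class of error. The signature heuristic here is illustrative only.

```typescript
const fixCache = new Map<string, string>(); // error signature -> proven fix prompt

// Reduce a raw error message to a coarse signature so similar failures match.
function signatureOf(error: string): string {
  return error.replace(/['"`].*?['"`]/g, "<value>").split("\n")[0];
}

function lookupProvenFix(error: string): string | undefined {
  return fixCache.get(signatureOf(error));
}

function recordSuccessfulFix(error: string, fixPrompt: string): void {
  fixCache.set(signatureOf(error), fixPrompt);
}
```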
Yet all this sophisticated processing raises a question: how can an AI system truly understand the complex architecture of legacy code?
How can LLMs understand legacy code architecture?
LLMs achieve architectural understanding by processing expanded context windows of 100,000 tokens or more, analysing cross-file dependencies, maintaining memory of changes, and applying consistent transformation patterns across entire codebases.
Context window scaling fundamentally changes what LLMs can comprehend. Traditional approaches struggled with file-by-file migrations that broke architectural patterns. Modern systems use greedy chunking algorithms to pack maximum code while preserving logical structures. A 100,000 token window can hold entire subsystems, allowing the model to understand how components interact rather than viewing them in isolation.
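A simplified greedy chunking sketch: whole files are packed into a chunk until the token budget is reached, so logical units are never split mid-file. The four-characters-per-token estimate is a rough heuristic, not an exact tokeniser.

```typescript
interface SourceFile {
  path: string;
  contents: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude approximation
}

function greedyChunks(files: SourceFile[], budget = 100_000): SourceFile[][] {
  const chunks: SourceFile[][] = [];
  let current: SourceFile[] = [];
  let used = 0;

  for (const file of files) {
    const cost = estimateTokens(file.contents);
    if (used + cost > budget && current.length > 0) {
      chunks.push(current);      // close the chunk rather than split a file
      current = [];
      used = 0;
    }
    current.push(file);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```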
Multi-file dependency analysis emerges naturally from expanded context. LLM agents read across modules, understand how components interact, and maintain the big picture while making changes. When migrating a service layer, the system simultaneously considers controllers that call it, repositories it depends on, and tests that validate its behaviour. This holistic view prevents breaking changes that file-level analysis would miss.
Memory and reasoning capabilities distinguish modern LLM migration from simple find-replace operations. The system remembers renamed functions, updated import paths, and architectural decisions made earlier in the migration. If a pattern gets refactored in one module, that same transformation applies consistently throughout the codebase. This consistency maintenance would exhaust human developers tracking hundreds of parallel changes.
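In its simplest form, that memory is a rename log: decisions made early in the run are recorded and re-applied verbatim in every later file. The identifiers in this sketch are made up.

```typescript
const renameLog = new Map<string, string>();

function recordRename(oldName: string, newName: string): void {
  renameLog.set(oldName, newName);
}

function applyKnownRenames(source: string): string {
  let result = source;
  for (const [oldName, newName] of renameLog) {
    // Word-boundary match so `fetchUser` does not also rewrite `fetchUserList`.
    result = result.replace(new RegExp(`\\b${oldName}\\b`, "g"), newName);
  }
  return result;
}

// Usage: once `fetchUser` becomes `loadUser` in one module, every later file
// sees the same substitution before the LLM touches it.
recordRename("fetchUser", "loadUser");
```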
Architectural pattern recognition develops through exposure to the codebase. LLMs identify framework-specific conventions, naming patterns, and structural relationships. They recognise that certain file types always appear together, that specific patterns indicate test files versus production code, and how error handling cascades through the system. This learned understanding improves migration quality beyond mechanical transformation.
Vector database integration enhances architectural comprehension further. Systems store code embeddings that capture semantic relationships between components. When migrating a component, the system retrieves similar code sections to ensure consistent handling. This semantic search surpasses keyword matching, finding conceptually related code even with different naming conventions.
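Stripped to its essentials, that retrieval step is a nearest-neighbour lookup over stored code embeddings. The sketch below computes cosine similarity by hand and leaves the embedding source abstract; a production system would delegate both to a vector database.

```typescript
interface CodeEmbedding {
  path: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function mostSimilar(query: number[], store: CodeEmbedding[], k = 3): CodeEmbedding[] {
  return [...store]
    .sort((x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector))
    .slice(0, k);
}
```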
With this level of understanding, the business case for LLM migration becomes compelling. But what exactly is the return on investment?
What is the ROI of LLM-assisted migration vs manual migration?
LLM-assisted migration reduces time by 50-96% and costs significantly less than manual efforts, with Google reporting 80% AI-authored code and Airbnb completing 1.5 years of projected work in 6 weeks, all LLM API costs included.
Time savings analysis reveals staggering efficiency gains across organisations. Airbnb’s 6-week timeline replaced 1.5 years of projected manual effort—a 96% reduction. Google’s AI-assisted migrations achieve similar acceleration, with formerly multi-day upgrades now completing in hours. Amazon Q Code Transformation upgraded 1000 Java applications in two days, averaging 10 minutes per upgrade versus the 2+ days each previously required.
Cost breakdown challenges assumptions about AI expense. API usage for thousands of file migrations costs far less than a single developer-month. Airbnb’s entire migration, including compute and engineering time, cost a fraction of manual estimates. The pay-per-use model makes enterprise-scale capabilities accessible to SMBs without infrastructure investment.
Quality metrics dispel concerns about automated code. Migration systems maintain or improve test coverage while preserving code intent. With Google’s toolkit, more than 75% of AI-generated changes land successfully in production. Automated migrations often improve code consistency, applying modern patterns uniformly where manual efforts would vary by developer.
Communication overhead reduction multiplies savings. Manual migrations require extensive coordination—architecture reviews, progress meetings, handoffs between developers. LLM systems eliminate most coordination complexity. A small team can oversee migrations that would traditionally require dozens of developers, freeing skilled engineers for innovation rather than maintenance.
Risk mitigation strengthens the business case further. Manual migrations introduce human error, inconsistent patterns, and timeline uncertainty. Automated systems apply changes uniformly, validate comprehensively, and provide predictable timelines. Failed migrations can be rolled back cleanly, while partial manual migrations often leave codebases in unstable states.
Decision frameworks for SMB CTOs become clearer when considering total cost of ownership. Legacy system maintenance grows more expensive over time—security vulnerabilities, framework incompatibilities, and developer scarcity compound costs. LLM migration transforms a multi-year budget burden into a tactical project measured in weeks, fundamentally changing the economics of technical debt reduction.
These compelling benefits naturally lead to the question: how can you implement this in your own organisation?
How to implement retry loops in LLM migration?
Implementing retry loops requires breaking migrations into discrete steps, setting validation checkpoints, configuring retry limits, using fallback models, and establishing confidence thresholds for manual intervention triggers.
Step-by-step implementation begins with decomposing the migration into atomic operations. Each step must have clear success criteria—tests pass, linting succeeds, types check correctly. Airbnb’s approach created discrete stages: Enzyme refactor, Jest fixes, lint corrections, and TypeScript validation. This granularity enables targeted fixes when failures occur.
Validation checkpoint configuration determines migration quality. Each checkpoint runs specific tests relevant to that migration stage. Unit tests verify functionality preservation. Integration tests ensure component interactions remain intact. Linting checks maintain code style consistency. Type checking prevents subtle bugs. These automated gates catch issues immediately, triggering appropriate remediation.
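A sketch of what those checkpoints can look like for a React/TypeScript migration, shelling out to standard tooling. The commands assume a typical Jest/ESLint/tsc setup and should be adjusted to your project's scripts.

```typescript
import { execSync } from "node:child_process";

interface CheckResult {
  name: string;
  passed: boolean;
  output: string;
}

function runCheckpoint(name: string, command: string): CheckResult {
  try {
    const output = execSync(command, { encoding: "utf8", stdio: "pipe" });
    return { name, passed: true, output };
  } catch (err: any) {
    // Failed checks return their output so it can feed the next retry prompt.
    return { name, passed: false, output: String(err.stdout ?? err.message) };
  }
}

function validateFile(filePath: string): CheckResult[] {
  return [
    runCheckpoint("unit tests", `npx jest ${filePath}`),
    runCheckpoint("lint", `npx eslint ${filePath}`),
    runCheckpoint("types", `npx tsc --noEmit`),
  ];
}
```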
Retry limit strategies balance thoroughness with efficiency. Simple transformations might warrant 3-5 attempts, while complex architectural changes could justify 20+ retries. Dynamic limits based on progress indicators work best—if each retry shows improvement, continue iterating. Stalled progress triggers fallback strategies.
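One way to encode a progress-aware policy is to track the failing-test count after each attempt and keep retrying only while it keeps falling; the thresholds below are illustrative.

```typescript
function shouldRetry(history: number[], hardLimit = 20, stallWindow = 3): boolean {
  // `history` holds the failing-test count after each attempt, newest last.
  if (history.length >= hardLimit) return false;
  if (history.length < stallWindow) return true;

  const recent = history.slice(-stallWindow);
  const improving = recent[recent.length - 1] < recent[0];
  return improving; // stalled progress triggers the fallback strategy instead
}

// Example: 12 -> 7 -> 7 -> 7 failing tests: the repeated unchanged count stops retries.
console.log(shouldRetry([12, 7, 7, 7])); // false
```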
Fallback model implementation provides resilience when primary approaches fail. Systems automatically switch between models based on failure patterns. GPT-4 might excel at logic transformation while Claude handles nuanced refactoring better. Some implementations use specialised models fine-tuned on specific framework migrations.
Error handling mechanisms must capture detailed failure information. Stack traces, test output, and validation errors feed back into retry prompts. Systems track which error types respond to which remediation strategies, building a knowledge base of effective fixes. This accumulated wisdom improves future migration success rates.
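A hedged sketch of that knowledge base: each failed attempt is stored with its error category and, once a later attempt succeeds, the remediation that worked. The categories and string matches are examples only.

```typescript
type ErrorCategory = "test-failure" | "lint" | "type-error" | "unknown";

interface FailureRecord {
  file: string;
  category: ErrorCategory;
  rawOutput: string;
  successfulRemediation?: string; // filled in once a later attempt passes
}

const knowledgeBase: FailureRecord[] = [];

function categorise(output: string): ErrorCategory {
  if (output.includes("Tests failed") || output.includes("expect(")) return "test-failure";
  if (output.includes("eslint")) return "lint";
  if (output.includes("TS")) return "type-error";
  return "unknown";
}

// Hints from previously solved failures of the same category, for the next retry prompt.
function remediationHints(category: ErrorCategory): string[] {
  return knowledgeBase
    .filter(r => r.category === category && r.successfulRemediation)
    .map(r => r.successfulRemediation as string);
}
```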
CI/CD pipeline integration ensures migrations fit existing development workflows. Automated pipelines using GitHub Actions, ESLint, and formatters validate every generated file. Migrations run in feature branches, enabling thorough testing before merging. Rollback procedures provide safety nets if issues surface post-deployment.
Which companies offer LLM migration services?
Major providers include AWS with Amazon Q Code Transformation, Google’s internal migration tools using Gemini, and specialised platforms like Aviator that offer LLM agent frameworks for Java to TypeScript conversions.
AWS Amazon Q Code Transformation represents the most comprehensive commercial offering. The service automates language version upgrades, framework migrations, and dependency updates. It analyses entire codebases, performs iterative fixes, and provides detailed change summaries. Integration with existing AWS development tools streamlines adoption for teams already using the ecosystem.
Google’s Gemini-based approach showcases internal tool sophistication. Their toolkit splits migrations into targeting, generation, and validation phases. Fine-tuned on Google’s massive codebase, it handles complex structural changes across multiple components. While not publicly available, it demonstrates the potential of organisation-specific tools.
Aviator’s LLM agent platform specialises in complex language transitions. Their multi-agent architecture uses specialised models for reading, planning, and migrating code. The platform excels at maintaining architectural consistency during fundamental technology shifts like Java to TypeScript migrations. Built-in CI/CD integration and comprehensive error handling make it suitable for production deployments.
Open-source alternatives provide flexibility for custom requirements. LangChain and similar frameworks enable building bespoke migration pipelines. These tools require more implementation effort but offer complete control over the migration process. Organisations with unique codebases or specific compliance requirements often prefer this approach.
Selection criteria for SMBs should prioritise accessibility and support. Managed services like Amazon Q reduce implementation complexity, providing immediate value without deep expertise requirements. Platforms focusing on specific migration types often deliver better results than generic tools. Cost models matter—pay-per-use APIs enable starting small and scaling based on success.
Feature comparison reveals distinct strengths across providers. AWS excels at Java version migrations and AWS service integrations. Google’s tools handle massive scale with sophisticated validation. Aviator specialises in cross-language migrations with strong typing preservation. Understanding these specialisations helps match tools to specific migration needs.
Even with the right tooling chosen, practical questions remain. The FAQ below addresses the ones CTOs ask most often.
FAQ Section
Why do LLM migrations fail?
LLM migrations typically fail due to insufficient context, complex architectural dependencies, outdated third-party libraries, or attempting perfect initial prompts instead of embracing iterative refinement approaches.
The most common failure stems from treating LLMs like deterministic tools. Developers accustomed to precise programming languages expect consistent outputs from identical inputs. LLMs operate probabilistically, generating different solutions to the same problem. This variability becomes a strength when combined with retry mechanisms but causes frustration when expecting perfection.
Complex architectural dependencies pose particular challenges. Legacy systems often contain undocumented relationships between components. A seemingly simple function might trigger cascading changes throughout the codebase. Without sufficient context about these hidden dependencies, LLMs generate changes that break distant functionality. Expanding context windows and thorough testing helps, but some architectural complexity requires human insight to navigate successfully.
Is LLM migration cost-effective for small businesses?
Yes, LLM migration is highly cost-effective for SMBs, often costing less than one developer-month of work while completing migrations that would take years manually, with pay-per-use API pricing making it accessible.
The economics favour smaller organisations particularly well. Large enterprises might have teams dedicated to migrations, but SMBs rarely possess such luxury. A typical developer costs $10,000-15,000 monthly, while API costs for migrating a medium-sized application rarely exceed $1,000. The time savings multiply this advantage—developers focus on revenue-generating features rather than maintenance.
Pay-per-use pricing removes barriers to entry. No infrastructure investment, no model training, no specialised hardware. SMBs can experiment with small migrations, prove the concept, then scale based on results. This iterative approach manages risk while building organisational confidence in AI-assisted development.
How to validate LLM-generated code changes?
Validation involves automated testing suites, CI/CD integration, regression testing, shadow deployments, code review processes, and maintaining feature branch isolation until all checks pass successfully.
Comprehensive test coverage forms the foundation of validation. Existing tests verify functionality preservation, while new tests confirm migration-specific requirements. The key insight: if tests pass before and after migration, core functionality remains intact. This assumes good test coverage—migrations often reveal testing gaps that manual review would miss.
Shadow deployments provide production-level validation without risk. The migrated system runs alongside the original, processing copies of real traffic. Performance metrics, error rates, and output comparisons reveal subtle issues that tests might miss. This parallel operation builds confidence before cutting over completely.
Can LLMs migrate proprietary or custom frameworks?
LLMs can migrate proprietary frameworks by providing sufficient examples and context, though success rates improve with retry loops, custom prompting strategies, and human-in-the-loop validation for edge cases.
The challenge with proprietary frameworks lies in pattern recognition. Public frameworks appear in training data, giving LLMs inherent understanding. Custom frameworks require explicit education through examples and documentation. Success depends on how well the migration system can learn these unique patterns.
Prompt engineering becomes crucial for proprietary migrations. Including framework documentation, example transformations, and architectural principles in prompts helps LLMs understand custom patterns. The retry loop approach excels here—each failure teaches the system about framework-specific requirements.
What programming languages can LLMs migrate between?
LLMs successfully migrate between most major languages including Java to TypeScript, Python 2 to 3, COBOL to Java, and legacy assembly to modern languages, with effectiveness varying by language pair complexity.
Language similarity significantly impacts success rates. Migrating between related languages (Java to C#, JavaScript to TypeScript) achieves higher automation rates than distant pairs (COBOL to Python). Syntax similarities, shared paradigms, and comparable standard libraries ease transformation.
Modern to modern language migrations work best. These languages share contemporary programming concepts—object orientation, functional elements, similar standard libraries. Legacy language migrations require more human oversight, particularly for paradigm shifts like procedural to object-oriented programming.
How long does LLM migration take compared to manual migration?
LLM migrations typically complete 10-25x faster than manual efforts, with Airbnb’s 6-week timeline replacing 1.5 years and Google achieving 50% time reduction even with human review included.
The acceleration comes from parallel processing and elimination of human bottlenecks. While developers work sequentially, LLM systems process hundreds of files simultaneously. A migration that would occupy a team for months completes in days. Setup time adds overhead, but the parallel speedup quickly compensates.
Human review time must be factored into comparisons. LLM migrations require validation, but this review process moves faster than writing code from scratch. Developers verify correctness rather than implementing changes, a fundamentally faster cognitive task.
What skills does my team need for AI migration?
Teams need basic prompt engineering understanding, code review capabilities, CI/CD knowledge, and ability to configure validation rules—significantly less expertise than manual migration would require.
The skill shift favours most development teams. Instead of deep framework expertise for manual migration, teams need evaluation skills. Can they recognise correct transformations? Can they write validation tests? These verification skills are easier to develop than migration expertise.
Prompt engineering represents the main new skill, but it’s approachable. Unlike machine learning engineering, prompt crafting uses natural language. Developers describe desired transformations in plain English, refining based on results. Online resources and community examples accelerate this learning curve.
How to measure success in LLM-driven migrations?
Success metrics include code coverage maintenance, test pass rates, build success rates, performance benchmarks, reduced technical debt metrics, time-to-completion, and total cost of ownership.
Quantitative metrics provide objective success measures. Test coverage should remain stable or improve post-migration. Build success rates indicate compilation correctness. Performance benchmarks ensure migrations don’t introduce inefficiencies. These automated metrics enable continuous monitoring throughout the migration process.
Qualitative assessments complement numbers. Developer satisfaction with the migrated code matters. Is it maintainable? Does it follow modern patterns? Would they have written it similarly? These subjective measures often predict long-term migration success better than pure metrics.
Can AI really migrate my entire codebase automatically?
AI can automate 80-97% of migration tasks, but human review remains essential for business logic validation, security considerations, and edge cases that require domain expertise.
The realistic expectation is AI as a powerful assistant, not a complete replacement: the vast majority of files migrate automatically, while complex edge cases need human judgment. This split holds across many migrations.
Business logic validation particularly requires human oversight. While AI can transform syntax and update frameworks, understanding whether the migrated code maintains business intent requires domain knowledge. Security implications of changes also warrant human review, especially in sensitive systems.
What’s the catch with using LLMs for technical debt?
The main considerations are API costs for large codebases, need for robust testing infrastructure, potential for subtle bugs requiring human review, and initial setup time for retry loop systems.
API costs scale with codebase size and complexity. While individual file migrations cost pennies, million-line codebases accumulate charges. However, these costs pale compared to developer salaries for manual migration. Organisations should budget accordingly but recognise the favourable cost-benefit ratio.
Testing infrastructure requirements can’t be overlooked. LLM migrations assume comprehensive test coverage to validate changes. Organisations with poor testing practices must invest in test creation before attempting migrations. This investment pays dividends beyond migration, improving overall code quality.
Conclusion
The counter-intuitive approach of embracing failure in LLM-driven code migration represents a paradigm shift in how we tackle technical debt. By allowing AI systems to fail, learn, and retry, organisations achieve automation rates previously thought impossible. The success stories from Airbnb, Google, and others demonstrate that this methodology isn’t just theoretical—it’s delivering real business value today.
For SMB CTOs facing mounting technical debt, the message is clear: LLM-assisted migration has moved from experimental to essential. The combination of accessible pricing, proven methodologies, and dramatic time savings makes it feasible for organisations of any size to modernise their codebases. The question isn’t whether to use LLMs for migration, but how quickly you can start.
The future belongs to organisations that view technical debt not as an insurmountable burden but as a solvable challenge. With LLMs as partners in this process, what once required years now takes weeks. The tools exist, the methodologies are proven, and the ROI is undeniable.
Your technical debt is actively costing you money and slowing down innovation. The tools to fix it are now faster and more affordable than ever. Start a small pilot project this quarter and see for yourself how failure-based learning can clear years of debt in a matter of weeks.