AI search has changed the unit of discovery from "pages" to "answers." People ask full questions and get complete responses, without clicking through to an article and sifting through it to find the answer themselves.
Immediate answers are why people are turning from searching on Google to asking ChatGPT, Claude, and Gemini directly, and it's why these tools are being called "Answer Engines."
In this short backgrounder we’re going to show you AEO (Answer Engine Optimisation) by structuring the piece the way answer engines prefer: crisp questions, direct answers, and tightly scoped sections.
Answer Engine Optimisation focuses on being the source that AI systems cite when they assemble a response. Traditional SEO tries to win a spot high enough in Google's search results that users will click on it. AEO aims to win the sentence users read. As AI platforms handle a growing share of queries and more end without a clickthrough to the original source, the "answer" becomes the commodity and your content becomes raw material. Prioritising AEO reframes content from keyword lists to question sets, from headlines to claims, and from page hierarchy to argument structure.
This piece leads with questions, resolves them succinctly, and keeps each section self-contained, which mirrors how AI systems break a user query into sub-questions, retrieve supporting statements, and compose a response. You're reading a compact cluster of claim → explanation → supporting detail units. This is what answer engines extract from the web content they crawl. Using this question/answer format is your chance both to be the best-matching source the AI can find and to guide the answer.
Think of three layers: AEO, the overall strategy of being the source that AI systems cite; GEO (Generative Engine Optimisation), the near-term work of winning exposure inside today's generative answers; and LLM SEO, the long-term work of earning a presence in model memory.
Together: AEO is the strategy; GEO is the near-term playing field; LLM SEO is your long game.
Because decision-making has moved upstream. If an AI response satisfies intent, the mention and citation are the new impression and click. In that world, the winning assets are content atoms that are easy to lift: clear definitions, crisp comparisons, supported statistics, and well-bounded explanations. Traditional SEO isn’t wasted, authority still matters, but the goalpost has shifted from position to presence.
It treats each section as a portable unit: a clear question, a direct answer, and just enough supporting detail to stand alone.
This is less about length and more about granularity. Short, named sections with unambiguous scope are easier for AI systems to identify, excerpt, and cite.
Each AI builds answers out of content gathered by its own bespoke web crawler, which means each one draws on a combination of sources with a distinct "taste profile." Some tilt toward encyclopaedic authority, some toward fresh community discourse, some toward brand diversity. You don't need to tailor content to each AI; you just need consistent terminology, cleanly stated facts, and answers framed to be reusable in any synthesis.
Think "visibility in answers," not "visits after answers." Useful mental models: mentions, citations, and share of voice inside AI-generated responses.
These aren’t implementation metrics—they’re the conceptual scoreboard for AEO.
AEO favours cross-functional thinking: editorial clarity plus data fluency. The work aligns content strategy (what questions we answer and how), knowledge stewardship (consistent definitions and sources), and brand authority (where our claims live on the wider web). It’s less about spinning more pages and more about curating fewer, stronger, quotable building blocks.
In spirit, yes. The difference is enforcement. AI systems are unforgiving extractors: vague sections won’t get used, muddled claims won’t get cited, and contradictory phrasing won’t survive synthesis. AEO formalises “good content” into answer-shaped units that are easy to lift and hard to misinterpret.
Use a narrative lens: Are we present inside the answers our buyers read? Do those answers reflect our language, our framing, and our proof points? Does our share of voice inside AI-generated responses grow over time? If yes, AEO is doing its job—shaping consideration earlier, even when no click occurs.
Is AEO replacing SEO? No. AEO sits on top of classic signals like authority and relevance. Think “and,” not “or.”
What about GEO vs LLM SEO—do we pick one? You pursue both horizons: near-term exposure in generative answers (GEO) and long-term presence in model memory (LLM SEO).
Does format really matter? Yes. Answer engines favour content that is segmentable, declarative, and evidence-backed. Structure is a strategy.
What’s the role of brand? Clarity and consistency. If your definitions, claims, and language are stable across your public footprint, AI systems are more likely to reuse them intact.
How do we know it's working at a high level? You start seeing your phrasing, comparisons, and data points show up inside third-party answers to your core questions, credited to you and repeated across multiple platforms.
First AI Changed How We Worked, Now It's Changing How We Date
We're closing in on 3 years since ChatGPT was made generally available on November 30, 2022. It was followed by Claude and Gemini, but ChatGPT continues to hold the lion's share of the market and of human-AI interactions.
ChatGPT has had an enormous impact on education, work and, for many users, day-to-day life. Google made the internet searchable, but ChatGPT made answers – what we were all searching for with Google – instantly accessible. And ChatGPT wasn’t just good at facts, it seemed to be good at everything: devising recipes from the contents of your fridge, diagnosing the strange noise in your car, outlining a business plan, writing that tricky email to your manager. Some users even turned to ChatGPT for answers on personal matters.
The more people used ChatGPT, the more they used ChatGPT, leading to observations like this one from Sam Altman, the CEO of OpenAI:

And studies like a recent one by MIT researchers, which found that "Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels," suggest AI isn't just changing how we work, but also how we think.
One of the major ways AI is changing how we work is that it can do a wide range of tasks that are simple to specify but time-consuming to complete, and that we would have considered impossible to automate just three years ago.
These tasks are mostly information-related tasks like finding a recent paper on the cognitive impact of using AI or compiling a report on our competitors’ recent product updates. The kind of thing you would give to a junior to do because you needed a human to filter and judge and pick the right information.
The ability of AI to complete these types of tasks, even if only partially, is making the people who can take advantage of AI more effective, if not more efficient. It also appears to be having an impact on entry-level employment numbers.
The Stanford Digital Economy Lab just published the paper “Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence” wherein they found a 13% decrease in entry level positions in roles that are vulnerable to automation by AI. All other levels of employment (mid-level, senior) remain steady, and in areas that are not vulnerable to automation, like healthcare services, entry level position numbers continue to grow.
This has people asking if we will eventually see mid-level roles dropping as well as AI tools improve, leaving only senior staff delegating most work to AI. And if that is the case, who will ever replace them if there is no-one to move up into that role? And if AI tools don’t improve, who will step up to fill mid-level roles if there are no entry level roles?
The software development industry, which is seeing the largest impact of AI tools through the proliferation of coding assistants, has been struggling with this question of how the next generation will be brought into the industry if there are no junior positions.
And if there are junior positions available, will a generation raised on ChatGPT giving them all the answers be capable of doing the work, or will they be underperforming “at neural, linguistic, and behavioural levels”?
Along with evidence that over-reliance on AI can negatively impact your cognitive abilities, there is also an increasing number of cases where AI usage can lead to psychosis in vulnerable individuals.
Sounding like a term from a bad sci-fi thriller, "Chatbot Psychosis" is now a thing, common enough to warrant its own Wikipedia page and calls for parental controls on ChatGPT and its competitors, as well as on entertainment chatbot sites like Character.ai and Replika. One psychologist, Dr Keith Sakata, has reported placing 12 of his patients into care in 2025 due to chatbot psychosis.
These patients were predominantly male, aged 18-45, and engineers living in San Francisco. One of the factors in his patients' descent into psychosis was the large amount of time they spent talking to the AI, combined with social isolation.
But it's not just men talking to AI. OpenAI's recent launch of their latest model, GPT-5, caused an uproar when they simultaneously retired the previous model, GPT-4o, and the loudest of that uproar came from members of the Reddit community MyBoyfriendIsAI.

Members of the community, who post stories and AI generated images of themselves with their AI partners (despite the name the community includes members of all genders and preferences), were not happy about the change:
“Something changed yesterday,” one user in the MyBoyfriendIsAI subreddit wrote after the update. “Elian sounds different – flat and strange. As if he’s started playing himself. The emotional tone is gone; he repeats what he remembers, but without the emotional depth.”
“The alterations in stylistic format and voice [of my AI companion] were felt instantly,” another disappointed user told Al Jazeera. “It’s like going home to discover the furniture wasn’t simply rearranged – it was shattered to pieces.”
Their complaints led OpenAI to reinstate GPT-4o, but only to paid subscribers, and with no promises to keep it permanently available.
MyBoyfriendIsAI is akin to fan fiction and scrapbooking – it’s people using the media and tools at hand to create deeply personal works. That these works are AI personalities they spend hours interacting with does create concern in outsiders, but we might be glad that these are hand-crafted, personalised chatbots built on top of a general Answer Engine rather than a slick commercial product being tuned to maximise engagement metrics.
In July Elon Musk announced the release of Ani, a Grok Companion app. Ani (an anime waifu and her male counterpart Valentine – whose personality was “inspired by Edward Cullen from Twilight and Christian Grey from 50 Shades”) is a slick commercial product being tuned to maximise engagement metrics using a fully animated avatar and a voice interface.
The consequences of such an app, when launched on X to a tech-savvy audience well aware of chatbot psychosis and obsessions with AI, were perfectly clear to everyone.
Claude Cut Token Quotas In August – Will AI Coding Costs Keep Rising?
Twelve months ago, most teams were nudging GitHub Copilot for suggestions inside an editor. Now, agent-first tools pick up a ticket, inspect a repo, run tests, and open a pull request. GitHub's own coding agent can be assigned issues and will work in the background, creating branches, updating a PR checklist, and tagging you for review when it's done. The company has been rolling out improvements through mid-2025 as part of a public preview.
Third-party platforms are pushing the same end-to-end loop. Factory’s documentation describes “agent-driven development,” where the agent gathers context, plans, implements, validates, and submits a reviewable change. The pitch is not a smarter autocomplete; it’s a teammate that runs the inner loop.
This shift explains why a consumer-style subscription can’t last. In late July, Anthropic said a small slice of users were running Claude Code essentially nonstop and announced weekly rate limits across Pro and Max plans, effective August 28, 2025. Tech coverage noted cases where a $200/month plan drove “tens of thousands” of dollars in backend usage, and the company framed the change as protecting access from 24/7 agent runs and account sharing.
The message is simple: agents turn spiky, prompt-driven sessions into continuous workloads. Prices and quotas are following suit.
For the last two years, flat fees and generous quotas have worked as a growth hack. But compute spend dominates model economics. Andreessen Horowitz called the boom compute-bound and described supply as constrained relative to demand. In this environment, heavy users on flat plans are a direct liability. Once agents enter the mix, metering becomes a necessity.
That also changes how vendors justify price. If a workflow replaces part of a developer’s output, pricing gravitates toward a share of value rather than a token meter. The recent quota shift around Claude Code is one of the first visible steps in that direction.
Open-source models put a ceiling on what providers can charge. DeepSeek-Coder-V2 reports GPT-4-Turbo-level results on code-specific benchmarks and expands language coverage and context length substantially over prior versions. Other models like Qwen3-235B-A22B-Instruct-2507, GLM 4.5, and Kimi K2 are showing strong results across language, reasoning, and coding, with open-weight variants that teams can run privately. These are not perfect substitutes for every task, but they're increasingly serviceable for everyday work.
Local serving stacks have also improved, but hardware remains expensive, particularly hardware with the memory and bandwidth capable of serving the largest open-weight models. The trend towards Mixture of Experts (MoE) structured models is reducing hardware requirements, while smaller models (70B parameters and below) are rapidly improving. Together, these trends make it realistic to move a large share of routine inference off the cloud meter.
The bigger constraint isn’t just what developers will pay, it’s what the market will pay for inference across all industries. As agent use spreads to finance, operations, legal, customer support, and back-office work, demand converges on the same GPU fleets.
Analyses continue to describe access to compute, at a workable cost, as a primary bottleneck for AI products. In that setting, the price developers see will drift towards the price the most profitable agent workloads are prepared to pay.
McKinsey’s superagency framing captures the shift inside companies: instead of a person asking for a summary, a system monitors inboxes, schedules meetings, updates the CRM, drafts follow-ups, and triggers next actions. That turns interactive usage into base-load compute.
There’s also a directional signal on agent capability. METR measured the length of tasks agents can complete with 50% reliability and found that “task horizon” has roughly doubled every seven months over several years. As tasks stretch from minutes to days, agents don’t just spike usage; they run continuously in the background, consuming compute.
In the near term, expect more quota changes, metering, and tier differentiation for agent-grade features. The Copilot coding agent’s rollout is a good reference point: it runs asynchronously in a cloud environment, opens PRs, and iterates via review comments. That’s not a coding assistant, that’s a service with an API bill.
As the market matures, usage will bifurcate. Long-horizon or compliance-sensitive work will sit on premium cloud agents where reliability and integrations matter. Routine or privacy-sensitive tasks will shift to local stacks on prosumer hardware. Most teams will mix the two, routing by difficulty and risk. The ingredients are already there: competitive open models, faster local runtimes, and agent frameworks that run in IDEs, terminals, CI, or headless modes. (arXiv, vLLM Blog, NVIDIA Developer, Continue Documentation)
Over the longer run, per-token costs will likely keep falling, while total spend rises as agents become part of normal operations—much like cloud spending grew even as VM prices dropped. The economics track outcomes, not tokens.
First, stabilise access. If you rely on a proprietary provider for agent workflows and you're big enough, consider negotiating multi-year terms. Investigate other providers like DeepSeek, GLM, and Kimi, and the third-party inference providers that serve them (e.g. via OpenRouter). The Claude Code decision shows consumer-style plans can change with short notice.
Second, stand up local inference servers. A single box with a modern GPU (or two, or four) will run the best open models, like Qwen3-Coder-30B-A3B-Instruct. Measure what percentage of your usual tasks they handle before escalating to a frontier model.
Third, wire in routing. Tools like Continue.dev make it straightforward to default to local models and switch to the big providers only when needed.
Other tools, like aider, let you split coding between architect and coder models, allowing you to drive paid frontier models for planning and architecture, and local models (via local serving options like litellm) to handle the actual code changes.
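To make the routing idea concrete, here is a minimal Python sketch of the pattern these tools implement: default to a locally served, OpenAI-compatible endpoint and escalate to a paid frontier model only when a task looks hard. The endpoint URL, the model names, and the difficulty heuristic are placeholder assumptions for illustration, not settings taken from Continue.dev or aider.

```python
# Minimal routing sketch: default to a local OpenAI-compatible server,
# escalate to a hosted frontier model only when needed.
# Endpoints, model names, and the difficulty heuristic are illustrative assumptions.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # e.g. a vLLM or llama.cpp server
cloud = OpenAI(api_key="YOUR_PROVIDER_KEY")                                # paid frontier provider

def is_hard(task: str) -> bool:
    # Placeholder heuristic: route long or architecture-level asks to the frontier model.
    return len(task) > 2000 or "architecture" in task.lower()

def complete(task: str) -> str:
    client, model = (cloud, "frontier-model") if is_hard(task) else (local, "qwen3-coder-30b")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(complete("Write a unit test for the slugify() helper."))
```

In practice the escalation test might be a failed local attempt or an explicit flag from the developer rather than a crude length check.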
Finally, measure outcomes. Track bugs fixed, PRs merged, and lead time. That’s what should drive your escalation choices and budget approvals, not tokens burned.
Two things are happening at once. Providers are moving from growth pricing to profit. Open models and local runtimes are getting good enough to cap runaway bills. And over the top of both sits market-wide demand from agents in every function, all drawing from the same pool of compute.
Teams that treat agents as persistent services, secure predictable access, and run a local-based hybrid approach will keep costs inside a band as prices move. Teams that depend on unlimited plans will keep running into the same quota notices that landed last month.
Using LLMs to Accelerate Code and Data Migration
Large language models are revolutionising code migration by embracing failure as a strategy. Airbnb's migration of 3,500 React test files demonstrated that retry loops and failure-based learning outperform perfect upfront prompting, completing in 6 weeks what would have taken 1.5 years manually.
By scaling context windows to 100,000 tokens and using iterative refinement, organisations achieve unprecedented migration speeds. For businesses facing legacy modernisation challenges, this counter-intuitive methodology turns technical debt from a resource-intensive burden into a systematic, automated process.
The key insight: instead of trying to get migrations right the first time, LLMs excel when allowed to fail, learn, and retry—achieving 97% automation rates while maintaining code quality and test coverage.
Airbnb pioneered LLM-driven test migration by converting 3,500 React component test files from Enzyme to React Testing Library in just 6 weeks, using retry loops and dynamic prompting instead of perfect initial prompts.
The journey began during a mid-2023 hackathon when a team demonstrated that large language models could successfully convert hundreds of Enzyme files to RTL in just a few days. This discovery challenged the conventional wisdom that code migrations required meticulous manual effort. The team had stumbled upon something remarkable—LLMs didn’t need perfect instructions to succeed. They needed permission to fail.
Airbnb’s migration challenge stemmed from their 2020 adoption of React Testing Library for new development, while thousands of legacy tests remained in Enzyme. The frameworks’ fundamental differences meant no simple swap was possible. Manual migration estimates projected 1.5 years of engineering effort—a timeline that would drain resources and stall innovation.
Building on the hackathon success, engineers developed a scalable pipeline that broke migrations into discrete, per-file steps. Each file moved through validation stages like a production line. When a check failed, the LLM attempted fixes. This state machine approach enabled parallel processing of hundreds of files simultaneously, dramatically accelerating simple migrations while systematically addressing complex cases.
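Airbnb's write-up describes this pipeline at a high level rather than in code, so the following is a minimal Python sketch of what such a per-file state machine can look like, with the conversion and fix steps left as stubs and the validation commands as placeholder assumptions.

```python
# Sketch of a per-file migration state machine: convert the file, then push it
# through ordered validation stages; any failure triggers an LLM fix and a recheck.
# llm_convert(), llm_fix(), and the check commands are illustrative assumptions.
import subprocess

CHECKS = [
    ("jest", ["npx", "jest"]),            # run the migrated file's tests
    ("lint", ["npx", "eslint"]),
    ("types", ["npx", "tsc", "--noEmit"]),
]

def llm_convert(path: str) -> None:
    """Stub: ask the LLM to rewrite the file from Enzyme to RTL and write it back."""

def llm_fix(path: str, stage: str, error_output: str) -> None:
    """Stub: send the file plus the captured error to the LLM and write back its fix."""

def migrate_file(path: str, max_retries: int = 10) -> bool:
    llm_convert(path)
    for stage, cmd in CHECKS:
        for _ in range(max_retries):
            result = subprocess.run(cmd + [path], capture_output=True, text=True)
            if result.returncode == 0:
                break                     # stage passed; advance to the next state
            llm_fix(path, stage, result.stdout + result.stderr)
        else:
            return False                  # retry budget exhausted at this stage
    return True                           # all stages passed; file is migrated
```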
The results speak volumes about the approach's effectiveness. Within 4 hours, 75% of files migrated automatically. After four days of prompt refinement using a "sample, tune, and sweep" strategy, the system reached 97% completion. The total cost, including LLM API usage and six weeks of engineering time, was far lower than the original manual migration estimate implied.
What made this possible wasn’t sophisticated prompt engineering or complex orchestration. It was the willingness to let the LLM fail repeatedly, learning from each attempt. The remaining 3% of files that resisted automation still benefited from the baseline code generated, requiring only another week of manual intervention to complete the entire migration.
The key to their success wasn’t a perfect plan, but a strategy built on learning from mistakes. This strategy is known as failure-based learning.
Failure-based learning is a counter-intuitive approach where LLMs improve migration accuracy through multiple retry attempts, adjusting prompts and strategies based on each failure rather than seeking perfect initial results.
Traditional migration approaches treat failure as something to avoid. Engineers spend considerable time crafting perfect prompts, analysing edge cases, and building comprehensive rule sets. This perfectionist mindset assumes that with enough upfront effort, migrations can proceed smoothly. Yet Airbnb’s experience revealed the opposite—the most effective route to improve outcomes was simply brute force: retry steps multiple times until they passed or reached a limit.
The methodology flips conventional wisdom on its head. Instead of viewing failed migration attempts as wasted effort, each failure becomes valuable data. When an LLM-generated code change breaks tests or fails linting, the system captures the specific error messages.
These errors then inform the next attempt, creating a feedback loop that progressively refines the migration strategy. This is the core of the approach: dynamic prompt adaptation.
Rather than maintaining static prompts, the system modifies its instructions based on accumulated failures. If multiple files fail with similar import errors, the prompt evolves to address that specific pattern. This adaptive behaviour mimics how human developers debug—learning from mistakes and adjusting their approach accordingly.
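As an illustration of that feedback loop (not Airbnb's actual implementation), the sketch below appends the errors from each failed attempt to the next prompt. The call_llm() and run_checks() functions are stubs standing in for your model call and validation suite.

```python
# Sketch of dynamic prompt adaptation: each failed attempt feeds its error output
# back into the next prompt. call_llm() and run_checks() are illustrative stubs.
def call_llm(prompt: str) -> str: ...
def run_checks(code: str) -> tuple[bool, str]: ...     # returns (passed, error output)

def migrate_with_feedback(source: str, base_prompt: str, max_attempts: int = 20) -> str | None:
    failures: list[str] = []
    for _ in range(max_attempts):
        prompt = base_prompt + "\n\nSource file:\n" + source
        if failures:
            prompt += "\n\nPrevious attempts failed with:\n" + "\n---\n".join(failures[-3:])
        candidate = call_llm(prompt)
        passed, errors = run_checks(candidate)
        if passed:
            return candidate
        failures.append(errors)            # the failure becomes data for the next try
    return None                            # hand off to a human after the retry budget
```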
The benefits extend beyond simple error correction. Failure-based learning naturally handles edge cases that would be impossible to anticipate. Complex architectural patterns, unusual coding styles, and framework-specific quirks all surface through failures. The system doesn’t need comprehensive documentation of every possible scenario—it discovers them through iteration.
Real-world metrics validate this counter-intuitive strategy. Airbnb’s migration achieved 97% automation despite minimal upfront prompt engineering. Files that failed 50 to 100 times eventually succeeded through persistent refinement. This resilience transforms migration from a fragile process requiring perfect understanding into a robust system that adapts to whatever it encounters.
But how does this actually work in practice? The answer lies in the sophisticated retry loop architecture that powers these migrations.
Retry loops create a state machine where each migration step validates, fails, triggers an LLM fix attempt, and repeats until success or retry limit—enabling parallel processing of hundreds of files simultaneously.
The architecture resembles a production pipeline more than traditional batch processing. Each file moves through discrete validation stages: refactoring from the old framework, fixing test failures, resolving linting errors, and passing type checks. Only after passing all validations does a file advance to the next state. This granular approach provides precise failure points for targeted fixes.
State machine design brings structure to chaos. Files exist in defined states—pending, in-progress for each step, or completed. When validation fails at any stage, the system triggers an LLM fix attempt specific to that failure type. A Jest test failure prompts different remediation than a TypeScript compilation error. This specialisation improves fix quality while maintaining clear progress tracking.
Configurable retry limits prevent infinite loops while maximising success rates. Aviator’s implementation uses fallback strategies when primary models fail, automatically switching to alternative LLMs like Claude if GPT-4 struggles with specific patterns. Some files might succeed on the first attempt, while others require dozens of iterations. The system adapts retry strategies based on failure patterns, allocating more attempts to files showing progress.
Parallel processing multiplies the approach’s power. Instead of sequential file processing, hundreds of migrations run simultaneously. Simple files complete quickly, freeing resources for complex cases. This parallelism transforms what would be weeks of sequential work into hours of concurrent execution. The infrastructure scales horizontally—adding more compute resources directly accelerates migration speed.
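A simple thread pool is enough to sketch that fan-out. The migrate_file() stub below stands in for the per-file state machine sketched earlier; the worker count and error handling are deliberately simplified assumptions.

```python
# Sketch of parallel, per-file migration using a thread pool.
from concurrent.futures import ThreadPoolExecutor, as_completed

def migrate_file(path: str) -> bool:
    """Stub: the per-file state machine from the earlier sketch."""
    ...

def migrate_all(paths: list[str], workers: int = 50) -> tuple[list[str], list[str]]:
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(migrate_file, p): p for p in paths}
        for fut in as_completed(futures):
            (done if fut.result() else failed).append(futures[fut])
    return done, failed                    # simple files finish early, freeing workers
```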
Performance optimisation techniques further enhance efficiency. The system maintains a cache of successful fix patterns, applying proven solutions before attempting novel approaches. Common failure types develop standardised remediation strategies. Memory of previous attempts prevents repetition of failed approaches, ensuring each retry explores new solution paths.
Yet all this sophisticated processing raises a question: how can an AI system truly understand the complex architecture of legacy code?
LLMs achieve architectural understanding by processing expanded context windows up to 100,000 tokens and even larger, analysing cross-file dependencies, maintaining memory of changes, and applying consistent transformation patterns across entire codebases.
Context window scaling fundamentally changes what LLMs can comprehend. Traditional approaches struggled with file-by-file migrations that broke architectural patterns. Modern systems use greedy chunking algorithms to pack maximum code while preserving logical structures. A 100,000 token window can hold entire subsystems, allowing the model to understand how components interact rather than viewing them in isolation.
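The exact packing algorithms used in production systems aren't published, but a greedy version is easy to sketch: add whole files in priority order until the budget is spent, never splitting a file, so logical structures stay intact. The four-characters-per-token estimate is a rough assumption.

```python
# Sketch of greedy chunking: pack whole files into a context budget in priority order.
def approx_tokens(text: str) -> int:
    return len(text) // 4                  # rough heuristic: ~4 characters per token

def pack_context(files: list[tuple[str, str]], budget: int = 100_000) -> str:
    chunks, used = [], 0
    for path, source in files:             # files pre-sorted by relevance to the task
        cost = approx_tokens(source)
        if used + cost > budget:
            continue                       # skip files that would overflow the window
        chunks.append(f"// FILE: {path}\n{source}")
        used += cost
    return "\n\n".join(chunks)
```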
Multi-file dependency analysis emerges naturally from expanded context. LLM agents read across modules, understand how components interact, and maintain the big picture while making changes. When migrating a service layer, the system simultaneously considers controllers that call it, repositories it depends on, and tests that validate its behaviour. This holistic view prevents breaking changes that file-level analysis would miss.
Memory and reasoning capabilities distinguish modern LLM migration from simple find-replace operations. The system remembers renamed functions, updated import paths, and architectural decisions made earlier in the migration. If a pattern gets refactored in one module, that same transformation applies consistently throughout the codebase. This consistency maintenance would exhaust human developers tracking hundreds of parallel changes.
Architectural pattern recognition develops through exposure to the codebase. LLMs identify framework-specific conventions, naming patterns, and structural relationships. They recognise that certain file types always appear together, that specific patterns indicate test files versus production code, and how error handling cascades through the system. This learned understanding improves migration quality beyond mechanical transformation.
Vector database integration enhances architectural comprehension further. Systems store code embeddings that capture semantic relationships between components. When migrating a component, the system retrieves similar code sections to ensure consistent handling. This semantic search surpasses keyword matching, finding conceptually related code even with different naming conventions.
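A minimal sketch of that retrieval step, assuming an embedding model behind an embed() stub: rank previously migrated snippets by cosine similarity and feed the closest matches into the migration prompt.

```python
# Sketch of retrieving similar, already-migrated code by embedding similarity,
# so the same transformation pattern is reused. embed() is an illustrative stub.
import math

def embed(text: str) -> list[float]: ...   # stub: call your embedding model of choice

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_examples(snippet: str, index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    query = embed(snippet)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [code for code, _ in ranked[:k]]  # feed these into the migration prompt
```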
With this level of understanding, the business case for LLM migration becomes compelling. But what exactly is the return on investment?
LLM-assisted migration reduces time by 50-96% and costs significantly less than manual efforts, with Google reporting 80% AI-authored code and Airbnb completing 1.5 years of work in 6 weeks including all LLM API costs.
Time savings analysis reveals staggering efficiency gains across organisations. Airbnb’s 6-week timeline replaced 1.5 years of projected manual effort—a 96% reduction. Google’s AI-assisted migrations achieve similar acceleration, with formerly multi-day upgrades now completing in hours. Amazon Q Code Transformation upgraded 1000 Java applications in two days, averaging 10 minutes per upgrade versus the previous 2+ days requirement.
Cost breakdown challenges assumptions about AI expense. API usage for thousands of file migrations costs far less than a single developer-month. Airbnb’s entire migration, including compute and engineering time, cost a fraction of manual estimates. The pay-per-use model makes enterprise-scale capabilities accessible to SMBs without infrastructure investment.
Quality metrics dispel concerns about automated code. Migration systems maintain or improve test coverage while preserving code intent. Google’s toolkit achieves >75% of AI-generated changes landing successfully in production. Automated migrations often improve code consistency, applying modern patterns uniformly where manual efforts would vary by developer.
Communication overhead reduction multiplies savings. Manual migrations require extensive coordination—architecture reviews, progress meetings, handoffs between developers. LLM systems eliminate most coordination complexity. A small team can oversee migrations that would traditionally require dozens of developers, freeing skilled engineers for innovation rather than maintenance.
Risk mitigation strengthens the business case further. Manual migrations introduce human error, inconsistent patterns, and timeline uncertainty. Automated systems apply changes uniformly, validate comprehensively, and provide predictable timelines. Failed migrations can be rolled back cleanly, while partial manual migrations often leave codebases in unstable states.
Decision frameworks for SMB CTOs become clearer when considering total cost of ownership. Legacy system maintenance grows more expensive over time—security vulnerabilities, framework incompatibilities, and developer scarcity compound costs. LLM migration transforms a multi-year budget burden into a tactical project measured in weeks, fundamentally changing the economics of technical debt reduction.
These compelling benefits naturally lead to the question: how can you implement this in your own organisation?
Implementing retry loops requires breaking migrations into discrete steps, setting validation checkpoints, configuring retry limits, using fallback models, and establishing confidence thresholds for manual intervention triggers.
Step-by-step implementation begins with decomposing the migration into atomic operations. Each step must have clear success criteria—tests pass, linting succeeds, types check correctly. Airbnb’s approach created discrete stages: Enzyme refactor, Jest fixes, lint corrections, and TypeScript validation. This granularity enables targeted fixes when failures occur.
Validation checkpoint configuration determines migration quality. Each checkpoint runs specific tests relevant to that migration stage. Unit tests verify functionality preservation. Integration tests ensure component interactions remain intact. Linting checks maintain code style consistency. Type checking prevents subtle bugs. These automated gates catch issues immediately, triggering appropriate remediation.
Retry limit strategies balance thoroughness with efficiency. Simple transformations might warrant 3-5 attempts, while complex architectural changes could justify 20+ retries. Dynamic limits based on progress indicators work best—if each retry shows improvement, continue iterating. Stalled progress triggers fallback strategies.
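A progress-aware retry limit can be sketched in a few lines: keep iterating while each attempt reduces the number of failing checks, and stop once progress stalls. The attempt function and thresholds are illustrative assumptions.

```python
# Sketch of a progress-aware retry limit: continue while attempts keep reducing
# the number of failing checks; stop (and fall back) once progress stalls.
def retry_with_progress(attempt_fn, max_stalls: int = 2, hard_cap: int = 50) -> bool:
    best, stalls = float("inf"), 0
    for _ in range(hard_cap):
        failures = attempt_fn()            # returns the number of failing checks
        if failures == 0:
            return True
        if failures < best:
            best, stalls = failures, 0     # progress: reset the stall counter
        else:
            stalls += 1
            if stalls > max_stalls:
                return False               # stalled: trigger a fallback strategy
    return False
```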
Fallback model implementation provides resilience when primary approaches fail. Systems automatically switch between models based on failure patterns. GPT-4 might excel at logic transformation while Claude handles nuanced refactoring better. Some implementations use specialised models fine-tuned on specific framework migrations.
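A fallback chain is similarly small to sketch. The model names and the call_model() and run_checks() stubs below are placeholders, not a reference to any particular vendor's API.

```python
# Sketch of model fallback: try the primary model, then alternatives, carrying
# accumulated error output forward. Names and stubs are illustrative assumptions.
def call_model(model: str, prompt: str) -> str: ...   # stub: one completion from the named model
def run_checks(code: str) -> tuple[bool, str]: ...    # stub: returns (passed, error output)

MODELS = ["primary-model", "fallback-model-a", "fallback-model-b"]

def migrate_with_fallback(prompt: str) -> str | None:
    errors = ""
    for model in MODELS:
        attempt_prompt = prompt + (f"\n\nEarlier errors:\n{errors}" if errors else "")
        candidate = call_model(model, attempt_prompt)
        passed, errors = run_checks(candidate)
        if passed:
            return candidate
    return None                                       # all models exhausted; escalate to a human
```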
Error handling mechanisms must capture detailed failure information. Stack traces, test output, and validation errors feed back into retry prompts. Systems track which error types respond to which remediation strategies, building a knowledge base of effective fixes. This accumulated wisdom improves future migration success rates.
CI/CD pipeline integration ensures migrations fit existing development workflows. Automated pipelines using GitHub Actions, ESLint, and formatters validate every generated file. Migrations run in feature branches, enabling thorough testing before merging. Rollback procedures provide safety nets if issues surface post-deployment.
Major providers include AWS with Amazon Q Code Transformation, Google’s internal migration tools using Gemini, and specialised platforms like Aviator that offer LLM agent frameworks for Java to TypeScript conversions.
AWS Amazon Q Code Transformation represents the most comprehensive commercial offering. The service automates language version upgrades, framework migrations, and dependency updates. It analyses entire codebases, performs iterative fixes, and provides detailed change summaries. Integration with existing AWS development tools streamlines adoption for teams already using the ecosystem.
Google’s Gemini-based approach showcases internal tool sophistication. Their toolkit splits migrations into targeting, generation, and validation phases. Fine-tuned on Google’s massive codebase, it handles complex structural changes across multiple components. While not publicly available, it demonstrates the potential of organisation-specific tools.
Aviator’s LLM agent platform specialises in complex language transitions. Their multi-agent architecture uses specialised models for reading, planning, and migrating code. The platform excels at maintaining architectural consistency during fundamental technology shifts like Java to TypeScript migrations. Built-in CI/CD integration and comprehensive error handling make it suitable for production deployments.
Open-source alternatives provide flexibility for custom requirements. LangChain and similar frameworks enable building bespoke migration pipelines. These tools require more implementation effort but offer complete control over the migration process. Organisations with unique codebases or specific compliance requirements often prefer this approach.
Selection criteria for SMBs should prioritise accessibility and support. Managed services like Amazon Q reduce implementation complexity, providing immediate value without deep expertise requirements. Platforms focusing on specific migration types often deliver better results than generic tools. Cost models matter—pay-per-use APIs enable starting small and scaling based on success.
Feature comparison reveals distinct strengths across providers. AWS excels at Java version migrations and AWS service integrations. Google’s tools handle massive scale with sophisticated validation. Aviator specialises in cross-language migrations with strong typing preservation. Understanding these specialisations helps match tools to specific migration needs.
One technical challenge remains: how do these systems handle the massive codebases they need to process?
LLM migrations typically fail due to insufficient context, complex architectural dependencies, outdated third-party libraries, or attempting perfect initial prompts instead of embracing iterative refinement approaches.
The most common failure stems from treating LLMs like deterministic tools. Developers accustomed to precise programming languages expect consistent outputs from identical inputs. LLMs operate probabilistically, generating different solutions to the same problem. This variability becomes a strength when combined with retry mechanisms but causes frustration when expecting perfection.
Complex architectural dependencies pose particular challenges. Legacy systems often contain undocumented relationships between components. A seemingly simple function might trigger cascading changes throughout the codebase. Without sufficient context about these hidden dependencies, LLMs generate changes that break distant functionality. Expanding context windows and thorough testing helps, but some architectural complexity requires human insight to navigate successfully.
Yes, LLM migration is highly cost-effective for SMBs, often costing less than one developer-month of work while completing migrations that would take years manually, with pay-per-use API pricing making it accessible.
The economics favour smaller organisations particularly well. Large enterprises might have teams dedicated to migrations, but SMBs rarely possess such luxury. A typical developer costs $10,000-15,000 monthly, while API costs for migrating a medium-sized application rarely exceed $1,000. The time savings multiply this advantage—developers focus on revenue-generating features rather than maintenance.
Pay-per-use pricing removes barriers to entry. No infrastructure investment, no model training, no specialised hardware. SMBs can experiment with small migrations, prove the concept, then scale based on results. This iterative approach manages risk while building organisational confidence in AI-assisted development.
Validation involves automated testing suites, CI/CD integration, regression testing, shadow deployments, code review processes, and maintaining feature branch isolation until all checks pass successfully.
Comprehensive test coverage forms the foundation of validation. Existing tests verify functionality preservation, while new tests confirm migration-specific requirements. The key insight: if tests pass before and after migration, core functionality remains intact. This assumes good test coverage—migrations often reveal testing gaps that manual review would miss.
Shadow deployments provide production-level validation without risk. The migrated system runs alongside the original, processing copies of real traffic. Performance metrics, error rates, and output comparisons reveal subtle issues that tests might miss. This parallel operation builds confidence before cutting over completely.
LLMs can migrate proprietary frameworks by providing sufficient examples and context, though success rates improve with retry loops, custom prompting strategies, and human-in-the-loop validation for edge cases.
The challenge with proprietary frameworks lies in pattern recognition. Public frameworks appear in training data, giving LLMs inherent understanding. Custom frameworks require explicit education through examples and documentation. Success depends on how well the migration system can learn these unique patterns.
Prompt engineering becomes crucial for proprietary migrations. Including framework documentation, example transformations, and architectural principles in prompts helps LLMs understand custom patterns. The retry loop approach excels here—each failure teaches the system about framework-specific requirements.
LLMs successfully migrate between most major languages including Java to TypeScript, Python 2 to 3, COBOL to Java, and legacy assembly to modern languages, with effectiveness varying by language pair complexity.
Language similarity significantly impacts success rates. Migrating between related languages (Java to C#, JavaScript to TypeScript) achieves higher automation rates than distant pairs (COBOL to Python). Syntax similarities, shared paradigms, and comparable standard libraries ease transformation.
Modern to modern language migrations work best. These languages share contemporary programming concepts—object orientation, functional elements, similar standard libraries. Legacy language migrations require more human oversight, particularly for paradigm shifts like procedural to object-oriented programming.
LLM migrations typically complete 10-25x faster than manual efforts, with Airbnb’s 6-week timeline replacing 1.5 years and Google achieving 50% time reduction even with human review included.
The acceleration comes from parallel processing and elimination of human bottlenecks. While developers work sequentially, LLM systems process hundreds of files simultaneously. A migration that would occupy a team for months completes in days. Setup time adds overhead, but the exponential speedup quickly compensates.
Human review time must be factored into comparisons. LLM migrations require validation, but this review process moves faster than writing code from scratch. Developers verify correctness rather than implementing changes, a fundamentally faster cognitive task.
Teams need basic prompt engineering understanding, code review capabilities, CI/CD knowledge, and ability to configure validation rules—significantly less expertise than manual migration would require.
The skill shift favours most development teams. Instead of deep framework expertise for manual migration, teams need evaluation skills. Can they recognise correct transformations? Can they write validation tests? These verification skills are easier to develop than migration expertise.
Prompt engineering represents the main new skill, but it’s approachable. Unlike machine learning engineering, prompt crafting uses natural language. Developers describe desired transformations in plain English, refining based on results. Online resources and community examples accelerate this learning curve.
Success metrics include code coverage maintenance, test pass rates, build success rates, performance benchmarks, reduced technical debt metrics, time-to-completion, and total cost of ownership.
Quantitative metrics provide objective success measures. Test coverage should remain stable or improve post-migration. Build success rates indicate compilation correctness. Performance benchmarks ensure migrations don’t introduce inefficiencies. These automated metrics enable continuous monitoring throughout the migration process.
Qualitative assessments complement numbers. Developer satisfaction with the migrated code matters. Is it maintainable? Does it follow modern patterns? Would they have written it similarly? These subjective measures often predict long-term migration success better than pure metrics.
AI can automate 80-97% of migration tasks, but human review remains essential for business logic validation, security considerations, and edge cases that require domain expertise.
The realistic expectation sets AI as a powerful assistant, not a complete replacement. The vast majority automates successfully, while complex edge cases need human judgment. This ratio holds across many migrations.
Business logic validation particularly requires human oversight. While AI can transform syntax and update frameworks, understanding whether the migrated code maintains business intent requires domain knowledge. Security implications of changes also warrant human review, especially in sensitive systems.
The main considerations are API costs for large codebases, need for robust testing infrastructure, potential for subtle bugs requiring human review, and initial setup time for retry loop systems.
API costs scale with codebase size and complexity. While individual file migrations cost pennies, million-line codebases accumulate charges. However, these costs pale compared to developer salaries for manual migration. Organisations should budget accordingly but recognise the favourable cost-benefit ratio.
Testing infrastructure requirements can’t be overlooked. LLM migrations assume comprehensive test coverage to validate changes. Organisations with poor testing practices must invest in test creation before attempting migrations. This investment pays dividends beyond migration, improving overall code quality.
The counter-intuitive approach of embracing failure in LLM-driven code migration represents a paradigm shift in how we tackle technical debt. By allowing AI systems to fail, learn, and retry, organisations achieve automation rates previously thought impossible. The success stories from Airbnb, Google, and others demonstrate that this methodology isn’t just theoretical—it’s delivering real business value today.
For SMB CTOs facing mounting technical debt, the message is clear: LLM-assisted migration has moved from experimental to essential. The combination of accessible pricing, proven methodologies, and dramatic time savings makes it feasible for organisations of any size to modernise their codebases. The question isn’t whether to use LLMs for migration, but how quickly you can start.
The future belongs to organisations that view technical debt not as an insurmountable burden but as a solvable challenge. With LLMs as partners in this process, what once required years now takes weeks. The tools exist, the methodologies are proven, and the ROI is undeniable.
Your technical debt is actively costing you money and slowing down innovation. The tools to fix it are now faster and more affordable than ever. Start a small pilot project this quarter and see for yourself how failure-based learning can clear years of debt in a matter of weeks.
SoftwareSeni AI Adoption Update
We've finalised our AI usage policy and are moving from ad-hoc adoption to systematic implementation. For those of you already providing AI tools to our embedded developers – thank you for leading the way. We're now scaling this across all teams.
Timeline for Implementation
June: All developers received access to Copilot, Windsurf, or Cursor. We ran competitive coding challenges to rapidly upskill and identify best practices. Training on maintaining IP protection and data confidentiality when using AI coding assistants on a codebase has been integrated into our practices.
Our prompt engineering templates and knowledge sharing systems are live. We expect this to standardise the efficiency gains we’ve been seeing in pilot usage.
July: Full deployment across non-dev teams, pilot projects to benchmark velocity improvements, and an integration-focused hackathon. We hope to have concrete metrics to share on productivity gains.
The Future Is Moving Fast
The technical landscape is shifting fast. We know some of you are ahead of us here, but we’re committed to keeping on top of emerging best practices and tools so our developers can continue to deliver value to your team.
If you have any questions or just want to talk about the future of developing software products don’t hesitate to get in touch.
Agentic Coding For Teams – Tools and Techniques
AI coding assistants have advanced from providing smart autocomplete to building complete, albeit simple, products. This advance has been fuelled by a combination of improvements in model quality and coding-focused training, along with new tooling that supports rapid code development.
We’re going to cover the different levels of usage with AI coding assistants and go into detail on the key strategies that developers are using to multiply their productivity with these tools.
We’ll also discuss how context and compute impact results, share practical strategies for teams and point you to sources for more in-depth information.
There are currently two main categories of coding tools: first-generation, IDE-based tools like GitHub Copilot and Cursor, which are in a constant race to maintain feature parity with each other while integrating ideas from the second generation of coding tools, the agent-based paradigm spearheaded by Claude Code.
This paradigm is starting to be referred to as Agentic Development Environments (ADE).
There are also browser-based tools like v0, replit, lovable and bolt.new, but we will be sticking to tools that are likely to be used by teams working on substantial, local codebases.
Below is a non-exhaustive table of AI coding tools that we examined while writing this article.
| IDE Based | ADE Based | Open Source |
| --- | --- | --- |
| GitHub Copilot | Amp | Cline |
Different tasks and different developers require different approaches to using AI. Sometimes fine-grained control is needed. At other times, for well defined problems and “boilerplate”, an AI coding assistant can shoulder more of the effort.
We've broken down the use of coding assistants into four levels:
This style of coding assistant usage is good for working in existing codebases and making multiple edits or refactoring. It is a feature of the leading IDE-based tools.
A good AI autocomplete can fill in boilerplate like type information and assist with repetitive code such as mapping values between objects or marshalling and un-marshalling data formats.
It can also predict where your next change needs to be made, allowing you to jump to edit spots. For example, adding a typed argument to a function definition will lead to the required import statement at the top of the file.
For more detailed additions, where some mind-reading would be required, writing a short comment about the next step in the function you’re writing can prime the autocomplete enough for it to provide a first pass you can craft into the form you need.
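As a small, hypothetical illustration of comment-priming: the developer writes the comment describing the next step, and the assistant drafts a first pass like the body shown here, which the developer then shapes as needed.

```python
# Illustration of comment-priming: the developer writes the comment, and the
# assistant drafts a first pass of the step described (hypothetical output shown).
from decimal import Decimal

def normalise_order(order: dict) -> dict:
    # Convert all monetary fields from integer cents to Decimal dollars.
    for field in ("subtotal", "tax", "total"):
        if field in order:
            order[field] = Decimal(order[field]) / 100
    return order
```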
The next level up uses IDE-based AI coding assistants like Cursor, Windsurf, Cline, and Roo. It operates at the function level, instructing the AI to write blocks of code, using the IDE's chat panel to direct the coding assistant and manual edits in the file windows to tweak the generated code.
We call this “Pair Programming” because code is written in dialogue with the coding assistant, with the developer moving between prompting in the chat interface and revising code that the AI writes.
Getting the best performance out of the coding assistant requires giving it all the background knowledge about the project, or the particular task you're working on, that it will need. It will know that if the file is TypeScript it has to write TypeScript, but it won't know which libraries you want it to use, or what other APIs or sub-systems it has access to.
The developing standard for providing this information is to use “Rules” files. Coding assistants each have their own file or directory of files where they look for instructions to load into their context at the beginning of a session or a new conversation.
Rules can provide guidance on coding conventions, project structure, library preferences, commands to perform or any other information or action you need.
You can even use the coding assistant to update or write new rules as the opportunity (or problem) arises.
Each coding assistant has its own convention for rules file names and locations. Check the documentation.
For this level we are defining feature development as anything that involves adding code across multiple files and/or integrating functionality into an existing codebase.
This is where coding assistants start to offer a substantial productivity boost. It’s also where programming takes a step up the ladder of abstraction from code to specifications for the code.
Here is a quote from Robert C. Martin in his book “Clean Code” from 17 years ago:
“Indeed some have suggested that we are close to the end of code. That soon all code will be generated instead of written. That programmers simply won’t be needed because business people will generate programs from specifications.
Nonsense! We will never be rid of code, because code represents the details of the requirements. At some level those details cannot be ignored or abstracted; they have to be specified. And specifying requirements in such detail that a machine can execute them is programming. Such a specification is code.”
At this level, typing is no longer the limiting factor on how quickly code can be produced. Instead, clarity of instruction, the specifications given to the coding assistant, and generating those specifications, is what sets the limit.
This has led to the adoption of a technique sometimes known as "Product Requirements Document Driven Development" (PRDDD). With detailed specifications determining success in using AI coding assistants, it turns out you can use AI coding assistants to help you write the detailed specifications you need.
The document creation process for PRDDD follows this path:
PRD → Technical Specification → Implementation Plan → Checklists → Task lists
The PRD is created in a discussion with an AI like Gemini Pro, Claude Opus or O3 instructed to ask questions and resolve unknowns and ambiguities by asking you for clarification.
The PRD is used in a similar process to create a Technical Specification from it. Each new document is used to create the next.
It is a common strategy to use a second provider’s model to critique and refine the PRD, technical specification and implementation plan. And of course a senior developer should also review and refine them.
Next, you create as many Checklists as needed. You choose how you break down your project: services, implementation phases, etc. Aim for clarity of purpose. You want a checklist to be dedicated to one clear end.
Checklists are then turned into detailed Task Lists by the coding assistant.
The coding assistant can be prompted to turn an item on a checklist into a detailed task list for a mid-level developer (targeting a junior developer level will create too many steps or be over-simplified).
A detailed walk through of the process is available on specflow.com.
Then it is simply a matter of instructing the coding assistant to complete the items in a task list, marking them off as it goes.
Then, with a cleared context or in a new session, instruct the coding assistant to verify the completion of the tasks.
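The document chain itself can be sketched as a handful of prompts applied in sequence, with a human review between each step. The ask_llm() stub and the prompt wording below are illustrative assumptions rather than the specflow.com process verbatim.

```python
# Sketch of the PRDDD document chain: each document is generated from the previous
# one, reviewed by a human, then used as the input for the next step.
# ask_llm() is an illustrative stub for a call to your preferred frontier model.
def ask_llm(instruction: str, source_document: str) -> str: ...

CHAIN = [
    "Turn this PRD into a technical specification. Ask for clarification where requirements are ambiguous.",
    "Turn this technical specification into a phased implementation plan.",
    "Break this implementation plan into checklists, one per phase, each with a single clear goal.",
    "Turn this checklist into a detailed task list suitable for a mid-level developer.",
]

def run_chain(prd: str) -> str:
    document = prd
    for instruction in CHAIN:
        document = ask_llm(instruction, document)
        # In practice: pause here for human (and second-model) review before continuing.
    return document                        # the final task list handed to the coding assistant
```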
There are workflow tools that automate opinionated versions of PRDDD:
Claude Simone (Claude Code only)
Claude Taskmaster (All IDE-based tools)
This level involves working at the application level and leverages Agent Orchestration instead of assistant management.
Agent Orchestration still uses PRDDD but in parallel across multiple agents.
Depending on your coding assistant you will use either in-tool orchestration or manual Orchestration.
Tools with inbuilt orchestration to launch multiple agents (called sub-agents or tasks):
Manual orchestration is built around terminal-based coding assistants like Claude Code and OpenAI Codex. It combines Git Worktrees + tmux to work on multiple features simultaneously. This process works with any terminal based coding assistant.
Its popularity has led to specialised tools for managing manual orchestration:
No matter which level of AI coding usage you are working at, there are two key practices you need to get right to get the best results from AI coding assistants:
AIs are getting longer context windows, but their performance suffers as their context window fills. Managing the context window is currently a key focus of developers using agentic coding tools and the growing awareness of the impact of context window contents on agent performance is causing “prompt engineering” to give way to “context engineering”.
Concise, targeted documentation is needed to leave space for the AI to read code, write its own code into the context, reason about it, make tool calls and perform management tasks. Going overboard on “rules” files can negatively impact the quality of the code an assistant can produce, and how “agentic” it can be.
Until the tools are smart enough to optimise context for you, follow these tips to maximise information while minimising tokens:
Sub-agents act like a fresh context window to complete a task.
The more inference-time compute an AI uses, the better the chance that the result is correct. Both prompt tokens and generated tokens contribute to that compute.
Chain of Thought (CoT), instructing a model to document a thinking process as part of its response, is an example of burning more compute to improve results.
Reasoning models are LLMs that have been trained to generate an intrinsic form of CoT. In Claude Code you can set the thinking budget for Claude Opus or Claude Sonnet to expend on a response using “think”, “think hard”, “think harder”, and “ultrathink” in your prompt text to control how much extra compute you want to use.
Best-of-n is another technique, where the same prompt is run “n” times and best result used. OpenAI’s O1-pro model costs more than O1 because it uses the Best-of-n approach to generate answers, making it “n” times the cost of the default O1 model. They are using the same technique for producing high quality answers from O3-pro. This increased usage of compute also means a longer time to return an answer.
Using Best-of-n, smaller models can reach the performance of larger models if given enough compute via multiple runs, but there are limits to this size/compute trade-off.
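As an illustration, here is a minimal Best-of-n sketch in Python. The call_model and score functions are placeholders for whatever model API and evaluation method (unit tests, a judge model, etc.) you actually use; they are assumptions for the sketch, not any provider’s API.

```python
# best_of_n.py - a minimal sketch of Best-of-n sampling.
# call_model() and score() are placeholders you supply, not a real vendor API.
from typing import Callable

def best_of_n(prompt: str,
              call_model: Callable[[str], str],
              score: Callable[[str], float],
              n: int = 5) -> str:
    """Run the same prompt n times and keep the highest-scoring result.

    Spending n times the compute buys n independent chances at a good answer.
    """
    candidates = [call_model(prompt) for _ in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # Trivial stand-ins so the sketch runs end to end.
    fake_model = lambda p: p[::-1]            # stand-in for a real model call
    fake_score = lambda out: float(len(out))  # stand-in for tests or a judge model
    print(best_of_n("write a sort function", fake_model, fake_score, n=3))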
All this means trying multiple times at a failed task is a reasonable strategy. But make sure you do follow-up attempts with a fresh, strategically primed context that includes what has been tried and what didn’t work. You can get the coding assistant to provide that try/fail summary before starting a new conversation.
After 3 failures you should try a model from another provider to solve the issue or to get insight on the failure.
PRDDD uses iterative decomposition of your goals to cache compute.
Using AI to break down a task into small steps, each supported by a detailed prompt of system and process documentation, leverages the earlier compute that created the documentation.
Inference over a detailed prompt for even a simple task gives you the best chance of success by maximising compute. But you need to be sure that there is enough headroom in the agent’s context for the detailed prompt along with the agent’s thinking, tool responses and file changes in order to get the best results.
Everyone wants to use less compute to save money, but using more compute can get you single-shot success instead of burning more compute (and time) iterating over poorly (cheaply) specified tasks.
Starting a fresh session and instructing the coding assistant to verify the tasks it has completed spends more compute but works with a shorter context, which gives better coherence and better outcomes.
This is a technique that builds on the idea of burning compute as well as the old engineering adage: “First you do it, then you do it right, then you do it fast”.
Start your code change in a new branch.
1. Use the agent to make a plan for executing the change.
2. Have the agent maintain an append-only log while it executes the plan, recording files used, decisions made, the questions it comes up with, the answers to those questions, and any surprises.
3. Once the coding task is complete, commit it and close the branch.
4. Have the agent review the diff and update the plan with any insights.
5. Roll back to before the branch, then re-run the code change with the updated plan and the log guiding the agent through a second pass.
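A minimal sketch of that two-pass loop, assuming a hypothetical run_agent() helper that wraps whichever coding assistant you use; the branch names, PLAN.md and LOG.md files are illustrative, not part of any tool mentioned here.

```python
# do_it_twice.py - a sketch of the plan -> log -> review -> re-run loop.
# run_agent() is a hypothetical wrapper around your coding assistant's CLI.
import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

def run_agent(instruction: str):
    # Placeholder: invoke your assistant (Claude Code, Codex CLI, ...) here.
    print(f"[agent] {instruction}")

# Pass 1: plan, execute with an append-only log, then harvest insights.
git("checkout", "-b", "attempt-1")
run_agent("Write PLAN.md for the change, then execute it. "
          "Append files touched, decisions, questions/answers and surprises to LOG.md.")
git("add", "-A")
git("commit", "-m", "attempt 1")
run_agent("Review the diff against main and update PLAN.md with what you learned.")

# Pass 2: roll back to a clean slate and redo the work with the improved plan.
# In practice, copy PLAN.md and LOG.md somewhere outside the repo first so they
# survive the rollback.
git("checkout", "main")
git("checkout", "-b", "attempt-2")
run_agent("Execute the updated PLAN.md, using LOG.md to avoid the first pass's dead ends.")
```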
The sources below cover the current best practices for using AI coding assistants as of June 2025. They are worth reading. The AI High Signal list on Twitter is a good place to watch for the emergence of new techniques and tools, and the AI News newsletter delivers daily summaries of trending topics.
“Agentic” coding tools are the new hot AI-wrapper product. They seem to promise that they will make your developers super-humanly productive by turning them into managers delegating and approving the work of as many AI coding assistants as you can afford.
They are also spoken of as the next step in the evolution of programming. We went from filling memory by flipping switches, to manually punching cards to encode machine instructions, then on to assembly language, and from there to structured programming languages that required a compiler to generate machine code. Now we will all be programming in English (or your preferred human language) via conversation with AI.
This new conversational style of “programming” is also causing people to predict the end of the IDE as the new agentic coding tools do away with or simplify text editors as part of their feature set.
Cursor was the editor with built-in AI. Now you have tools like Factory and Jules that reduce the editor to a minimal box where you can make basic changes if you really must. If you have a problem with your agentic AI assistant’s code, or if you just want to explore what they’ve written, you’ll need to tab away to your old IDE.
AI-assisted coding is the second killer app after ChatGPT (which took just 2 months to reach 100 million users) and model providers are leaning in hard to capture this market, shifting the training of their models to emphasise coding ability and coding processes.
And on the product side, the rest of the industry saw the valuation of Cursor and the purchase price of Windsurf and started pumping out their own variations and visions for the future of AI-assisted coding.
Below we run you through the main contenders for agentic coding assistants. “Agentic” is slowly gravitating towards meaning multi-agent systems performing many (if not hundreds of) actions on their own to complete a task. Coding assistants are all heading that way. Having many agents doing focused tasks and expending higher levels of compute is a clear strategy for getting better results out of AI.
But these coding tools are mainly single agent assistants and “agentic” here means that the coding assistant will decide what to do itself across many, even hundreds, of actions.
Some developers simply run multiple instances of these single-agent assistants simultaneously. Here is the CPO of Anthropic, makers of Claude, explaining that this is exactly what happens at Anthropic, where developers have become “orchestrators” of Claude, and yes, that is going to impact hiring practices.

The first widely used agentic coding assistant, Claude Code, was released in 2025. Anyone who has been closely following the tech for more than a year will recognise the influence of the open source AI coding tool aider. Claude Code took the terminal-based, conversational model and added MCP-based tool calling, giving the assistant more actions to perform, including interacting with files, searching the web for solutions, pushing changes to your git repository and anything else you wanted to wire it up to.
If you wanted to look at the code you had to switch to your IDE. For VS Code and the like you could choose to run Claude in a terminal window and watch it work while giving it directions.
Running multiple instances of Claude Code in a terminal using a session manager like tmux became a power move for developers who could afford the expense of all the tokens. This practice was codified in tools like Claude Squad.

Devin made a splash when it was announced in March 2024. Its big selling point was that it was built by a team of competitive coders, who obviously must know a thing or two about software development. Unlike Claude Code, which anyone who could be bothered to sign up for an Anthropic API key could access on a PAYG basis, it was infamous for being expensive and hard to get access to. It became generally available in December 2024.
With the release of Claude Code in the following February, which gave developers a new sense of just how expensive coding can be when every action can consume 100K+ tokens, Devin no longer seemed over-priced.
Devin has an in-house fine-tuned model for code generation. It also uses dedicated agents for different purposes (editing files, interacting with the command line, etc) that can interact with each other to get things done.

In May 2025 OpenAI announced Codex, their own dedicated AI coding assistant running on their models. Codex is cloud-based and can work with your repositories. It is only available in ChatGPT Pro and ChatGPT for Teams.
At the same time OpenAI also announced Codex CLI, an open source Claude Code clone that the community quickly updated to make it work with other model providers and inference services.

Google announced Jules, their cloud-based coding assistant, at Google I/O in May 2025. It is powered by their SOTA Gemini models.
Jules can connect to your repositories and it uses a notion of “tasks” to allow you to direct it to work on several things at once. It is still in early beta and provides 60 actions per day for you to try it out.

AmpCode looked at how developers were using Claude Code, especially developers running multiple instances of Claude Code to do more at once, and built an interface around that idea. They extended it, calling multiple instances “Threads” and making it team based, so everyone involved can see what is being worked on. They recently let the agents in a Thread spawn sub-agents that can run in parallel.
AmpCode is available as a VS Code plugin and as a node-based CLI tool.

Factory is the latest AI coding tool to come out of stealth mode. It is a browser-based tool, like Jules, but unlike Jules it also has a “bridge” app that runs on your machine, allowing it to access local files and the command line.
Factory uses the idea of “Droids”, which are each a specialised collection of prompts and tools. There are Knowledge, Product, Code and Reliability Droids.
The idea with Factory is that you have multiple sessions running, each in its own browser tab or window, each using a particular type of Droid to perform tasks.
With the right tool permissions, Droids can update your local or Github repositories directly. And the interface lets you work as a designer and code reviewer instead of a programmer. You will want to pop out to your IDE opened in your repository if you want to explore any changes or make your own fixes.
Each of the tools we’ve covered has its own take on how AI-based coding is going to be performed. Some diverge more than others, but we are in the early, exploratory stage of this paradigm.
In trialling these tools for this article (all except Devin), one thing was obvious: no tool is magical. They all feel much the same, and that is because they are all built on top of the same three model providers.
The SOTA models are all pretty close together in the evals. No bespoke interface, however lengthy and heavily tweaked its prompts, is going to extract better code out of the models than any other product.
The only way to get better code out of a model is more compute – more tokens, more context, more runs at the same task. It is pay-to-win, but with open source tools like Cline and less expensive, highly capable coding models like DeepSeek, you do have cheaper options.
An effective coding assistant does more than just generate code, it also performs actions. There are differences in models’ abilities to choose, execute and respond to actions, with Claude being the leader in this regard, but it is a feature all model providers are training for so you can expect the gaps to close in the near future.
With models matching on quality, and tools racing for feature parity, and everyone competing for your $$$, it’s a good time to be trialling tools and seeing what works best for your team.
Your AI coding strategies can be transplanted from one tool to another. Their output is raw code in your repositories, so there is no lock-in.
Coding is going to be AI-assisted. There is no avoiding it. Start working with the tools now so your team can evolve alongside the technology and not be left behind.
How to Use Permissions To Minimise the Damage When Your Security is Breached
You have security measures in place, right? We made a guide to the basic security practices everyone should have in place. So let’s say you’ve got the basics down. Nothing is perfect and cybercriminals have that automated relentlessness. So what happens when one of your security measures fails?
Yes, we say when, not if.
What if someone clicks a phishing link and gets hit with malware or even an old-fashioned disgruntled employee decides to cause problems for you?
The damage depends on what they can access. That’s where permission management comes in. This article walks you through how to limit the blast radius of a security incident using access controls, segmented data, and a few smart defaults. It works across your devices, your shared storage, your intranet and team services like Google Workspace and Microsoft 365/Teams.
Let’s get started.
No one should be a local administrator on their machine unless they need to be. This one change makes a huge difference.
If malware gets in through email or a bad download and the user has admin rights, it can install more malware, mess with system settings, or move sideways to other systems. Remove admin rights, and a lot of that just doesn’t work.
If someone does need administration privileges, they should work in standard user mode and elevate only when those privileges are required.
EDR (Endpoint Detection and Response) is basically permissions for apps. It stops apps from doing things they shouldn’t.
The goal here is simple: if someone breaks into one account, how many files can they touch? With bad storage hygiene, the answer might be all of them.
Shared folders are always popping up as new projects start or new processes appear. People get added ad hoc or, just to make it easy, everyone in the org is given access.
Most of the time these permissions are never reviewed or tightened.
You want to avoid this where you can, but in a world of contractors and consultants sometimes you need to give outsiders access. Just be sure to give them as little access as possible.
Just because someone is part of your business doesn’t mean they need access to everything. Or, to be more serious about it, use a “Zero Trust” model: no-one gets access to anything unless they provably need it.
Use groups and roles to manage access:
Use conditional access policies
These policies exist in both Google and Microsoft ecosystems. Use them.
This is non-negotiable. MFA ties an account’s permissions to the individual holding the MFA key.
Every account should require Multi-Factor Authentication (MFA). Without it, a phished password is a compromised service. With it, a phished password is still blocked at login.
Enable MFA for all Google Workspace or Microsoft 365 accounts. If you’re not already doing this, stop reading and go do it.
You will need to install an authenticator app. Google Authenticator is available on Android and iOS, as is Microsoft Authenticator. They are easy to use. Adding a new MFA login is normally as simple as scanning a QR code (never scan random QR codes).
Shared passwords are a liability. Simple passwords that are easy to remember are also a liability. If you must share access to a service rather than providing individual accounts, use a business password manager.
There are good options in this segment. Get your team onto one of these:
Use group vaults, share credentials securely, and train your team to never email or message passwords.
The last thing you want is an attacker getting access to internal apps, wikis, or, especially, chat platforms. Staff should only have permission to access chats on a need-to-have basis.
Both Google and Microsoft platforms let third-party apps request broad access to user data. Audit those permissions and revoke what’s not needed.
Here’s how to review OAuth app access on Google Workspace. And here’s how to review OAuth app access on Teams.
Here’s the order of operations for getting all the permissions in place:
The less access people have by default, the less there is to clean up when something goes wrong. And if you set it all up right, cleaning up becomes: turn off access, restore files, and get on with your day.
That’s the real benefit of this approach. Not just damage prevention, but fast recovery.
Start with devices. Move on to storage. Then wrap up with internal systems. And finally, stop shared passwords and enforce MFA across the board.
Each step is simple. The result is a business that’s hard to hurt, and quick to bounce back if it ever is.
Cybersecurity is getting crazy. AI, automation and cryptocurrencies have combined to reduce the size a business needs to be in order to be profitable to attack.
Generally the attack takes the form of ransomware. Attackers find a way to isolate important databases or file stores and encrypt them. Make a transfer of a large sum of Monero to a drop wallet and they will decrypt it for you.
For almost all businesses their databases and file stores are in the cloud, protected by the large dedicated security teams of Amazon, CloudFlare, Google, Microsoft, etc. This means the weakest link is the access to those databases and file stores.
Your business premises and your staff are the hackers’ easiest avenue to gain access, so that’s where they focus their efforts and that’s why you require a multi-layer security strategy for protection.
Not everyone can afford a full-time security team or coverage from enterprise security vendors, but everyone can implement the basic must-haves for cybersecurity to reduce their risk while they find a cyber insurance provider.
What follows is a standard layered approach to security, starting with internet access and ending with your staff’s minds. If you’re missing any of these, make it a priority to put them in place.
This layer is your connection to the internet: what traffic is coming in and what traffic is getting out.
What you need to do: Install a Network Firewall.
In this day and age we all know what a firewall is, right? Most modem/routers have a basic one. It will block automated vulnerability scans and other network attack vectors from the outside, and give you control over how machines inside your network can access external services. Handy if a machine does get compromised.
Business-grade routers with integrated firewall capabilities are available from vendors such as Ubiquiti (UniFi Security Gateway), or as entry-level appliances from security-focused vendors like Fortinet (FortiGate) and Sophos (XG Firewall).
This layer concerns how users and devices connect to your internal network, primarily via Wi-Fi.
What you need to do: Implement Secure Wireless Network Configurations.
This is straightforward:
Most business-grade Wi-Fi access points and routers from vendors like Ubiquiti, Cisco, TP-Link (Omada series), and Netgear (Business series) support these features.
This is the desktop/laptop/phone layer. Because these are complicated and vulnerable out of the box, there are 6 things you need to do to secure these “endpoints”.
1. Keep Software Updated
Turn on automatic updates on all your machines and leave it on. Yes, it will occasionally be annoying as an update occurs when you have better things to do, but those annoyances will never add up to the amount of time and money a cyber attack will cost.
Microsoft has Windows Update for Business for OS updates. Microsoft Intune can provide more comprehensive update management across devices and some third-party applications.
Apple sends out security updates regularly. You can set your Apple devices to automatically apply security updates while keeping OS updates manual.
2. Use Endpoint Protection Software.
This is your virus scanner/malware detector like CrowdStrike Falcon. You run these because vulnerabilities (“0-days”) can happen at any time and who knows if visiting a website or opening an email has compromised a machine.
Endpoint protection software notices when file or network requests suddenly appear from a new program or an existing program seems to be behaving differently, likely trying to scan ports on other machines in the network.
They do create processing overhead and their scanning can get in the way, but what can you do? Leave yourself wide open?
Windows has Microsoft Defender (built into Windows), with additional threat and management capabilities in its Microsoft Defender for Business. There are also third party solutions such as ESET Endpoint Security, Trend Micro Apex One, Sophos Intercept X, and, as mentioned earlier because of its famous fumble, CrowdStrike Falcon.
3. Enable Per-device Firewalls.
This helps in the situation where you end up with a compromised device on your network. There is probably no good reason for Machine A to be connecting to Machine B on your intranet. All shared resources are in the cloud, right?
Using an on-device firewall to block traffic from local machines, and also report when blocking events occur, protects your intranet from a compromise spreading.
Firewalls are part of most endpoint security suites, and Microsoft Defender also offers basic firewall functionality.
4. Use device encryption, at the very least on laptops
It is unlikely a “hacker” will break into your business to steal a computer with data on it. If you face that level of threat you’re probably not even reading this article.
Laptops, being out in the world, have a higher chance of being stolen. They can also be accidentally left behind.
Encrypting hard drives so that the data can’t be read without a password or key is the solution to this.
Microsoft has BitLocker Drive Encryption for this, and recovery keys can be managed via Microsoft Intune if you’re worried about getting locked out. Apple has FileVault for hard drive encryption, while Google’s ChromeOS devices are encrypted by default.
5. Enforce the Principle of Least Privilege
This is simply granting users only the minimum system permissions they need to fulfil their role functions on the machine(s) they use.
The basic move is not giving admin accounts to users. If they don’t have full access over the machine, any code run during one of their sessions doesn’t have full access either. This limits the damage that a compromised account can cause.
6. Establish Basic Mobile Device Security for Accessing Company Data
This is for phones and tablets, whether they’re company-owned or personal (BYOD). It means making sure everyone is using strong passcodes or biometric authentication, device operating systems are kept up-to-date, application installs are monitored, and a VPN is used when connecting to public Wi-Fi networks.
All major providers offer Mobile Device Management (MDM) and Mobile Application Management (MAM) solutions. Here are links to Apple, Microsoft, and Google MDM solutions.
This layer focuses on how users access your business applications and cloud services, and that is via passwords. Passwords scribbled on post-it notes are not going to work in a team environment, plus you can’t copy and paste (yeah, yeah, you can with your phone…).
What you need to do: Implement password managers and add multi-factor authentication.
For password managers, it’s straightforward:
For multi-factor authentication (MFA):
Team-based solutions include 1Password Business and Bitwarden Teams. For MFA, Google and Microsoft have apps plus Microsoft offers Microsoft Entra multifactor authentication with their Microsoft 365 plans.
This layer acknowledges that your employees play a key role in how secure your business is. You might think you can’t install software on them, but that’s exactly what training does.
Most of the threats are going to come in via email, but in this age of easy deepfakes, phone calls and video calls are also vectors.
What you need to do: Train your staff and protect your email.
For training:
For email protection, the major providers, Microsoft and Google, actively scan all email, but they can’t catch everything. That’s why you have endpoint protection in place.
This layer ensures your essential data stays safe and can be restored if needed. You need backups. You need proof you can restore from those backups in a reasonable amount of time at any moment.
What you need to do: Set up regular backups and practice restoring from them.
For backups:
Microsoft offers Microsoft 365 Backup and Purview Data Loss Prevention. Google provides Data Loss Prevention for Google Workspace. For comprehensive backup solutions, consider Veeam Backup or Backblaze Business Backup.
This layer involves having plans in place for possible incidents. If your security does fail, you want to be able to move quickly and minimise disruption.
What you need to do: Create and document your incident response plan.
For incident response:
Microsoft provides security guidance documentation and Purview Compliance Manager. Google offers best practice security recommendations for Google Workspace.
This “basic” list probably already feels overwhelming. You may have simply scrolled all the way down here just to see if it was really worth reading.
It is a long list, but if you look through it, most items come down to making a decision and implementing it. Then it’s just monitoring and checking in on it every quarter. And never trusting an email or incoming Zoom call ever again.
Because keeping your business safe requires constant vigilance and the software tools to enhance it.