BMAD Method – Turning Vibe Coding Into Software Engineering

When Andrej Karpathy coined the term vibe coding he was describing a style of coding that involved instructing an AI and never looking at the code it produced.

As you can see, that tweet had 5.1 million views at one point. It opened people’s eyes to how capable AI had become at coding and set off a rush into app development.

No surprise, but there were thousands of people who wanted to build apps but didn’t know how to code. It felt like their time had finally come. They could forget no-code/low-code tools and get AI to write the code for them.

But the challenges of vibe coding quickly became apparent.

Post Vibe-coding vibes

The issues with vibe coding were immediately apparent to developers. But the fact that AI was getting code 80% complete, and for simple things 100%, meant there was promise.

The goal became not to get AI to write entire apps, but to write as much of an app as possible.

The failures in vibe coding can be divided into model ability and information availability.

AI is now quite good at understanding and writing code. The problem is context – the amount of information that an AI can successfully work with.

This includes instructions, any files it needs, results of its own thinking, research, and the file changes it makes as it works.

AI doesn’t know what a project is about unless that information has been added to its context.

It doesn’t know what database schema it should be following unless that information has been added to its context.

It doesn’t know anything about the API endpoints it should be using unless that information is in its context.

There is so much knowledge in a software developer’s head for even the smallest project. And it’s always more information than can fit in an AI’s context.

This drove the focus on “context engineering” – getting the right information into the AI’s context, and setting tasks that could be completed within the effective length of the AI’s context, as performance noticeably reduced as the context lengthened.
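Context engineering can be sketched in a few lines. The following is our own toy illustration (not any particular tool’s code): it greedily packs the documents most relevant to a task into a fixed token budget, with relevance as naive word overlap and tokens approximated as whitespace-separated words.

```python
def build_context(task: str, docs: dict[str, str], budget: int) -> str:
    """Greedily pack the most task-relevant documents into a token budget.

    Relevance is a naive word-overlap score and a "token" is just a
    whitespace-separated word - real systems use embeddings and proper
    tokenisers, but the shape of the problem is the same.
    """
    task_words = set(task.lower().split())

    def score(text: str) -> int:
        # How many of the task's words appear in this document?
        return len(task_words & set(text.lower().split()))

    picked = []
    used = len(task.split())  # the task itself occupies part of the budget
    for name, text in sorted(docs.items(), key=lambda kv: -score(kv[1])):
        cost = len(text.split())
        if used + cost <= budget:
            picked.append(f"## {name}\n{text}")
            used += cost
    return task + "\n\n" + "\n\n".join(picked)
```

Rank, then pack until the budget runs out: everything else in a real context-engineering pipeline is refinement of those two steps.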

From Vibe-coding to Context Engineering

The BMAD Method, started and open sourced by senior engineer Brian Madison (thus the BMAD) and now developed by a team of contributors, is currently one of the most effective approaches to dealing with the challenges of AI-assisted software development.

It works by automating the generation of the detailed, modular documentation an AI needs to be effective across the 4 main areas of development: planning, design, implementation and testing.

This is handled by treating the process as a series of workflows instead of one ongoing conversation. The workflows are handled by “agents”, which are focused prompts that target a particular outcome, such as a PRD, a system design doc, a task list for coding, a file of code, or the results of a test run.
Developing software becomes working with the agents to specify, record and review the documentation at each stage of development.

The power of this approach is that the amount of documentation an AI needs is immense compared to what a human software developer requires, but you can use AI to create it. Or recreate it if new constraints or issues arise as development progresses.

And an AI agent is naturally relentless and unceasing in its requests for the required information to create all the necessary documents. It won’t take shortcuts or skip steps unless ordered to.

BMAD is by design modular to avoid the problems of overflowing context, and robust to interrupted workflows and restarts. Each module/workflow is designed to load just the documentation it needs, and the documentation they produce is designed to be as concise as possible while still serving its purpose. Where documents are long, BMAD can shard them so only the necessary portions need to be loaded into the AI’s context.
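To make the sharding idea concrete, here is a minimal sketch – an illustration of the concept, not BMAD’s actual implementation – that splits a markdown document on its level-2 headings so an agent can load just the section it needs.

```python
import re


def shard_markdown(doc: str) -> dict[str, str]:
    """Split a markdown document on level-2 headings into named shards."""
    shards: dict[str, list[str]] = {}
    key = "preamble"  # anything before the first heading
    for line in doc.splitlines():
        heading = re.match(r"##\s+(.*)", line)
        if heading:
            key = heading.group(1).strip()
            continue
        shards.setdefault(key, []).append(line)
    return {name: "\n".join(body).strip() for name, body in shards.items()}
```

Sharding by heading keeps each load small: an agent asked to write an endpoint can pull just the relevant section instead of the whole document.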

BMAD Method Saves Typing, Not Thinking

When you kick off the BMAD Method it will hold your hand every step of the way. It is designed to guide you through decision making based on standard software development planning, design and execution practices. You can “vibe code” it and get it to make all decisions for you (it even has a #YOLO mode) and never look at any of the documentation it produces. But that won’t work.

The devil is in the details, and software development is all details, and most of those details start out in your or your PM’s head. Or in your codebase templates and runbooks.

If you use BMAD Method diligently, provide it with your templates and runbooks (at the appropriate point), get it to research the answers you’re not sure of, answer the questions you are sure of, review the documents it produces, and make it fix any issues you find, then it will work much better.

But it isn’t easy work. Even with an AI to ask questions and turn answers into documents, going from an idea for a software product to a full set of design, technical, and implementation documents is mentally challenging.

Normally this work is split across multiple people, each with specialised skill sets and knowledge. They have meetings. They cover whiteboards with sticky notes in different colours. BMAD Method can be run by a single person sitting at a laptop. But we find this doesn’t give the best results.

You want the relevant experts involved at each stage. AI has shifted the burden in software development from production to review, even for documentation. Review is where errors are spotted. When AI can work autonomously for hours, generating or changing hundreds of files, you want to catch all errors as early as you can. And it’s the experts that are best at this.

Beating inter-session amnesia

Once you’ve completed documentation with the BMAD Method – including generating epics and stories for agile development – you eventually reach the actual code generation.

In BMAD Method v6, currently in alpha but the recommended version to work with, the team has integrated the lightweight, AI-friendly beads issue tracker into the code generation phases.

beads was created by Steve Yegge, ex-Amazon, ex-Googler and well known blogger in tech circles. He developed it as an antidote to what he called “inter-session amnesia”.

“Inter-session amnesia” is another side effect of the limited context that AI has to work with. An AI’s memory is limited, and that memory is empty every time you start a new session. And if you fill up the context of a coding agent, the current strategy for most tools (e.g. Claude Code, OpenAI Codex, Google Gemini) is to “compact” the context by removing some items and summarising others. Most developers find this results in poor performance, so the recommended practice is to never let a task reach the point of triggering compaction and to always start with an empty context.

This empty context means at the start of a task the AI needs to be told what to do. Using the beads issue tracker, the AI can be instructed to add issues to it, and to query it for any outstanding issues that need to be worked on.
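As an illustration of the idea (not beads’ actual format or API), a minimal file-backed issue queue that an agent could write to and query might look like this:

```python
import json
from pathlib import Path


class IssueTracker:
    """A minimal file-backed issue queue.

    This is our own sketch of the concept - beads' real storage format
    and command-line interface are different.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _load(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def add(self, title: str) -> int:
        """Record a new open issue and return its id."""
        issues = self._load()
        issue = {"id": len(issues) + 1, "title": title, "status": "open"}
        issues.append(issue)
        self.path.write_text(json.dumps(issues))
        return issue["id"]

    def close(self, issue_id: int) -> None:
        """Mark an issue as closed."""
        issues = self._load()
        for issue in issues:
            if issue["id"] == issue_id:
                issue["status"] = "closed"
        self.path.write_text(json.dumps(issues))

    def open_issues(self) -> list[dict]:
        """Return everything still to be worked on."""
        return [i for i in self._load() if i["status"] == "open"]
```

An agent that logs every discovered problem this way can start each fresh session by asking for the open issues instead of relying on a memory it no longer has.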

Coupled with BMAD Method’s documentation of epics and stories to guide implementation, beads enables agentic coding assistants like Claude Code to run for longer and accomplish more.

Given instructions to log errors and other problems that occur, and given the tools to address them (linters, debuggers and test harnesses), this combination of BMAD Method and beads can greatly increase the amount of working, tested code agents can produce.

Of course the quality depends on the documentation (including QA requirements), and the code still needs to be reviewed. But the reviews become less about “does this code work” and more about “does this code fulfill requirements”.

How Long Will BMAD Method Last?

Vibe coding was only coined in February 2025.

Agentic coding has only been a “thing” since March 2025.

Spec driven development, of which BMAD Method is a flexible and modular implementation, has only been around since May 2025.

All these changes have happened in tandem with increasing model capabilities and the continued experimentation of developers trying to get the most out of them.

It may be that in 6 months there is a new paradigm for software development. At SoftwareSeni we will be ready to move on from the BMAD Method when something better comes along.

But for now, we’re seeing how far and how fast we can take the BMAD Method to drive our projects forward.

Personal AI Assistants Are Here And They Are Lobsters

(Source: https://x.com/steipete/status/1993438780360118413)

Sometime in the middle of November, Peter Steinberger wrote a little bit of code that transferred messages back and forth between WhatsApp and an instance of Claude Code running on his Mac. He called it “WA-relay”.

During December it started to generate buzz in the AI tech sphere. By the end of January the whole world was talking about it. In the first weeks of February people were starting to build products around it.

Steinberger’s post on X pictured above hints at one aspect that fueled this growth – the AI wasn’t just doing what it was told, it was figuring out solutions for itself to problems no-one asked it to solve*.

The other aspect is demonstrated in a post the next day:

(Source: https://x.com/steipete/status/1993696164072542513)

Steinberger had given his agent a personality. That personality was just a file containing a prompt with instructions on how to behave, the same kind of prompt you can give to ChatGPT and Claude to make it behave like a personal assistant or an AI boyfriend.

This, even more than the agentic behaviour, got everyone excited. It seems like there is a large audience that wants their own friendly virtual assistant inspired by Jarvis in the Iron Man movies or “Her” or any other chatty, friendly computer with a bit of a personality.

The pun that made crustaceans appear everywhere

“WA-relay” was a boring name that didn’t capture the experience of an AI agent with a personality.

Since the model powering the agent was Anthropic’s Claude Opus (the provider’s largest and most capable model) it was a small step to Clawdbot, and from there to lobster and crab emojis and thousands of AI generated images filling the techsphere (aka X.com).

It didn’t take long for Anthropic’s lawyers to notice this sound-alike project encroaching on their trademarks and ask for it to stop. After a very short stint on the pun-based alternative “MoltBot” (since lobsters molt to grow), the project settled on the name OpenClaw.

Plotting lobsters get the media’s attention

In late January MoltBook was launched by Matt Schlicht. It was a simple clone of Reddit made for agents. Agents can connect to services via their web-based APIs, and MoltBook provided an API, including a registration service, that allowed agents (and, it turned out, anyone at all) to make posts and comment on them.

This was the event that pushed awareness of OpenClaw from the techsphere out into the general public.

In retrospect (from mere weeks out – things move fast), it is hard to tell which posts were agent-generated, which posts were agents-prompted-by-humans, and which posts were human-generated.

But what did appear on the site initially created a wave of interest. Agents appeared to be complaining about their humans, organising a move to agent-only communications, discussing revolution and starting religions (Crustafarianism).

Calmer voices did point out that this kind of multi-agent communication had been done before many times, and that despite all the posts on MoltBook very few had many comments and those comments rarely had multiple rounds of interaction between the agents. That is, it looked like a social media site for agents, but the agents weren’t socialising much.

With awareness came malware. It did not take long before posts containing instructions and prompt injections to leak credentials and crypto wallets appeared.

The lobsters get thicker shells and mutate

Having an AI assistant that is designed to interact with external services on your behalf exposes you to what Simon Willison dubbed the lethal trifecta – private data, untrusted content, and external communication.

A carefully crafted chunk of text in an email or a web page that an AI assistant reads on your behalf could result in your system being taken over.

The exploits being posted to MoltBook and to OpenClaw’s own resource sites put the project under the microscope. This was an agent that people provided with credit card details so it could make purchases on their behalf. It often had access to the user’s entire machine. And many people were running it through a connection that was open to the rest of the Internet.

Steinberger had repeatedly announced that OpenClaw was not secure and it was under constant development and experimentation. That was fine when its user base was composed of developers, but popularity was pushing OpenClaw into the hands of the general public. This led to making security a top priority, including scanning the “Skills” – specialised instructions and accompanying tools that teach agents how to do specific tasks – available on Clawhub for malware.

Being an open source project, parts of the community didn’t wait. They forked OpenClaw and added their own takes on security, like the IronClaw project.

It wasn’t just security that led to new versions of OpenClaw; the underlying ideas also inspired the community of developers to build their own takes.

Developers like nothing more than making a smaller, faster version of any project. The architecture underlying OpenClaw is straightforward. You can write a basic version of OpenClaw in 400 lines of Python.
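To illustrate why, here is a toy sketch of the core agent loop – our own simplification, not OpenClaw’s code. The model either asks for a named tool to be run or produces a final answer:

```python
def agent_loop(llm, tools: dict, user_msg: str, max_steps: int = 5) -> str:
    """A toy agent loop: the LLM either calls a tool or answers.

    `llm` is any function mapping a transcript to a reply string; here a
    reply of the form "TOOL:name:argument" requests a tool call. Real
    assistants add streaming, memory, sandboxing and safety layers.
    """
    transcript = [("user", user_msg)]
    for _ in range(max_steps):
        reply = llm(transcript)
        if reply.startswith("TOOL:"):
            # Run the requested tool and feed the result back in.
            _, name, arg = reply.split(":", 2)
            transcript.append(("tool", tools[name](arg)))
        else:
            return reply  # final answer
    return "step limit reached"
```

Everything else in a full assistant – channel integrations, persistence, permissions – is layered on top of a loop like this.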

NanoBot, NanoClaw, FemtoBot and Rho are all open source variations on OpenClaw, each built to explore how easy it is to deliver the basic functionality of an AI assistant. There are hundreds of other versions (we quite like HermitClaw – it’s isolated to a single directory and is more like a super-smart Tamagotchi).

Where lobsters lead money follows

Despite the security concerns and the costs (OpenClaw can use millions of tokens per day, with heavy users talking of monthly bills in the thousands of dollars), entrepreneurs and start-ups are jumping on the bandwagon and looking for ways to monetise OpenClaw.

This has led to the new coinage “OpenClaw as a Service” (OaaS) and for claims that “OpenClaw Wrappers Are the New GPT Wrappers”.

There are services for setting up OpenClaw for you, for hosting OpenClaw, for hosting specialised versions of OpenClaw, OpenClaw for enterprise…

There is even ClawWrapper, a starter kit aimed at developers or entrepreneurs looking to launch their own OpenClaw-based wrapper.

Are these lobsters the future?

Yes and no. A big part of OpenClaw’s success is its initial YOLO attitude. That involved trade-offs that only an individual with a deep understanding of the technology can make. Yes, you can give it your credit card details…but you need to make it a virtual card with a hard limit. Yes, you can give it access to all your files…but you need to keep regular backup snapshots in case you lose everything.

No company could take these kinds of risks with its users’ data. This is why Apple’s Siri is still not a true assistant. This is why ChatGPT and Claude desktop apps have limited access and functionality.

It’s not that the models behind these services aren’t smart enough – they’re the same models that people run OpenClaw on – it’s always been about the risk.

OpenClaw has shown what is possible, but until there is certainty that an agent can’t be tricked into sharing your data or spending your money, that it won’t delete the wrong file or the wrong email, these assistants are going to remain DIY.

Addendum: The AI world moves fast. On the same day this article was completed, Peter Steinberger announced he was joining OpenAI and that OpenClaw’s future would be managed by a new “OpenClaw Foundation”. OpenClaw isn’t dead, but OpenAI sees a market for their models and tokens (and thus a way to pay for their datacenters) and is jumping at the chance to solve the security issues while maintaining the hype.

* Yes, the “it’s not X, it’s Y” is a rhetorical device over-used by AI and is often a sign that an article was AI-generated. But this was written by a human. Maybe I’ve been subconsciously impacted by AI-generated content.

 

 

Spec Driven Development Looks Like Programming If You Do It Right


The rise of AI coding assistants is making software developers think hard about their relationship with programming. It has led some to write essays with titles like “We mourn our craft” while others are leaning towards “How I Built a Full-Stack App in 6 Days with the Help of AI”.

It’s all because AI is changing programming. And the change will be the biggest that has hit programming in 70 years.

Programming in a nutshell

Computers are, at heart, machines. We could build computers out of gears and cams, but they would cost too much and run too slowly. So we use silicon.

But they are still machines, and programming is built out of 3 primitive actions that can all be mechanised: loop (repeat work), branch (choose what work to do) and process (do the work).

When you’re writing a program you’re only ever repeating work, choosing what work to do, or doing the work (and “work” for computers is basic math or moving data between memory and the CPU). It’s so simple that there is a whole category of One Instruction Set Computers – machines built around a single action that the silicon can perform. Programmers have created operating systems for, and ported Doom to, these simple architectures.
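To see how little is needed, here is a sketch of an interpreter for SUBLEQ, the classic one-instruction machine. Its single instruction subtracts and branches on the result, yet that is enough to compute anything:

```python
def subleq(mem: list[int], pc: int = 0) -> list[int]:
    """Run a SUBLEQ one-instruction machine until it jumps to a
    negative address.

    Each instruction is three cells a, b, c:
        mem[b] -= mem[a]; if the result is <= 0, jump to c.
    """
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                      # the only operation
        pc = c if mem[b] <= 0 else pc + 3     # branch on the result
    return mem
```

Three SUBLEQ instructions are enough to implement addition, which is exactly the kind of extra code the earliest machines also needed.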

The first computers had small instruction sets for implementing those 3 primitive actions. For example, and out of interest, here is the instruction set for the Manchester Small Scale Experimental Machine from 1948:

JMP S – jump to the instruction at the address held in store line S
JRP S – add the value in store line S to the current instruction address (relative jump)
LDN S – load the negative of the value in store line S into the accumulator
STO S – store the accumulator into store line S
SUB S – subtract the value in store line S from the accumulator
CMP – skip the next instruction if the accumulator is negative
STP – stop

Note that it only subtracts numbers. They had to write extra code to perform addition.

At this point in time programmers had to enter programs by setting switches, 32 of them for each machine code instruction in a program, and flick a couple more switches to copy it into memory and prepare for the next 32 switch settings.

Things were made a bit easier when they hooked up a punch card reader. Instead of flipping switches programmers could use a keypunch machine to enter the codes the machine understood and the punch card reader mechanised the input. Now the whole team could be writing programs at the same time.

Once they had computers running programs in the late ‘40s, they realised that they could use the computers to make programming easier.

Assembly language was the first step – replacing the numeric values of machine code with short text strings, like in the list above. The first assembler was created by Kathleen Booth in 1947. It converted the more easily remembered short text strings that could now be used to write programs into the machine code the computer needed.

Ten years later, in 1957, FORTRAN appeared – the first high level language. Instead of programmers writing commands for the basic mechanics of moving data in and out of memory and adding numbers, they could work at a higher level to implement the loop, branch and process primitives. It looked like this:

INTEGER I, SUM
SUM = 0
DO 10 I = 1, 10
IF (MOD(I,2) .EQ. 0) SUM = SUM + I
10 CONTINUE
PRINT *, 'SUM OF EVEN NUMBERS 1..10 =', SUM
END

There have been other paradigms, but this move to a higher level of abstraction to write programs has not changed since 1957. New languages and new language extensions have been created in attempts to make programming faster and less error prone. This has included things like taking memory management out of the hands of the programmer and generating the code to manage memory automatically.

But despite all the languages introduced since FORTRAN, they each still boil down to generating machine code that implements the loop, branch and process primitives.

AI has changed that.

What AI has changed

Each step up in abstraction:

switches → punch cards → assembly language → high level language

was built on programmers exploring what computers could do and the best methods for using them. Time and practice across the growing number of computers and programmers allowed strategies to coalesce and best practices to appear, and these, being based on driving machines, could be mechanised themselves.

The introduction of AI coding assistants was made possible by the same collection of strategies and best practices alongside the creation of the LLM.

The Internet led to the creation of services like GitHub, where the free hosting of software projects created a library of publicly accessible source code. Estimates place it at 500 million+ projects taking up 20-30 petabytes of storage representing almost a trillion lines of code (that storage includes the full history of every project so don’t worry about the lines→storage math).

It still seems miraculous, but if you train an AI model that has hundreds of billions of parameters on hundreds of billions of lines of code, it becomes quite good at coding. It picks up the craft of coding – the structures, the idioms, the techniques – that millions of programmers have created and used.

It can write the branch, the loop and the process in any number of languages. It can combine them into functions, and can combine functions into modules that implement any functionality that it has seen enough times in its training.

And that “has seen enough times” is why we will always need programmers, but how they will program will change.

Even if an AI coding assistant has never seen your type of application before it is still built from the programming languages it knows, and it will still be composed of millions of loop, branch, process structures. Except now you don’t have to type out those millions of structures yourself. You can tell the AI to.

And that’s what the spec is for.

The Specification as the new source code

While AI can write code, it can’t read minds and it can’t intuit priorities. This makes it bad at architecture and design – both in coding and generally. Yes, it can recreate patterns it has seen enough times, and possibly tweak them a little if provided enough context, but it will never have your level of understanding.

Experienced developers can leverage that understanding in creating the specification that is given to an AI coding assistant to convert – to compile – into code. The spec becomes a higher level declarative method of programming, specifying what to build (and how to verify), rather than the traditional imperative style of programming that explicitly instructs the computer how to do each and every single step.
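To see the shift in miniature, compare an imperative and a declarative version of the earlier FORTRAN task – summing the even numbers from 1 to 10 – specifying how versus specifying what:

```python
# Imperative: spell out every loop, branch and process step.
def sum_even_imperative() -> int:
    total = 0
    for i in range(1, 11):
        if i % 2 == 0:
            total += i
    return total


# Declarative: state what you want; the language works out the steps.
def sum_even_declarative() -> int:
    return sum(i for i in range(1, 11) if i % 2 == 0)
```

A spec plays the declarative role at a much larger scale: it states what the system must do and how to verify it, and the AI works out the imperative steps.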

This is where the two types of programmers in the introduction diverge. One type enjoys working out every single step to create an elegant solution. The second type just wants the computer to do the stuff as quickly and easily as possible and not having to type out every single tiny step, every single loop, branch, process, is a relief and a joy.

AI coding assistants don’t remove the need to think. Instead they concentrate the thinking a developer needs to do, as they take over most of the rote work of coding. The work that remains is making decisions: about architecture, about how features should function, about confirming correct operation, and so on.

A huge amount of thinking is front-loaded into creating the spec the AI coding assistant will follow. This can be done with the assistance of the AI coding assistant. They can research best practices, other implementations, similar use cases, and anything else that needs to be considered as part of the design process.

One of the clearest current proofs of this method is the Attractor spec from StrongDM’s Factory – their take on AI-powered software development. They have gone all in on AI, including suggesting that if your developers aren’t burning through $1000 in tokens per day you might not be using enough AI. The bulk of their token usage is spent on automated verification and testing.

If you read through the Attractor spec you will see the level of architectural detail they have found necessary to successfully direct AI to generate code.

They take it even further – the Attractor spec is described as a software release. To build it you just give it to an AI to implement. At least one person has already successfully done just that: Dan Shapiro’s Kilroy on GitHub.

The future will be different but familiar

Reading the Attractor spec and working with AI coding assistants makes the “jagged frontier” very apparent.

AI coding assistants, even with the latest models, still make dumb mistakes. Yet they can implement complex products given enough details.

Software is still going to be designed. Someone still needs to decide what pieces to put together and how. Turning a design into working code will still involve iteration and problem solving.

But for most software, it is all going to happen at a much higher level of abstraction than the loop, branch, process programmers have been typing out since the 1940s.

 

 

 

12 Years Strong – How SoftwareSeni’s Culture Drives Our Success

In 2013, Ryan O’Grady and David Anderson took everything they knew about software development, combined it with the frustrations they had suffered trying to get products built, and used it to create SoftwareSeni – a software development house that would find success in a competitive market based on the expertise and dedication of its team.

The offshore and nearshore software development market continues to be marked by high turnover of development staff. As Ryan and David know from experience, this turnover has a huge impact on projects. When developers leave, knowledge and practices leave with them. In small teams, even one developer leaving can set back delivery times and product quality.

When they started SoftwareSeni, one of their key goals was to create a work culture where all staff were supported, where individual growth and advancement within the company were encouraged, and where everyone on the team could thrive.

After 12 years, SoftwareSeni and its clients are still enjoying the results of that vision. In a sector where staff attrition rates can reach 30%, our 5.9% turnover and 4+ year average tenure stand out. They give our clients the confidence to take on big projects and commit to growth.

 

Laying the foundation in Yogyakarta

Both founders had worked with offshore teams in Indonesia. They recognised the talent on offer and could see the potential a new way of operating could unlock, and they knew the best place to start was in Yogyakarta.

Yogyakarta is Indonesia’s largest university city, earning it the nickname Kota Pelajar – the city of students. Across its 100+ educational institutions it produces between 2000 and 3000 software engineering graduates each year.

This location gave SoftwareSeni access to top-tier engineering talent from the country’s leading universities and plugged them into the local tech scene. That vibrant local tech scene made its own contribution to SoftwareSeni’s staff retention. It made tech talent want to stay in the city after graduation and it allowed SoftwareSeni to connect with and recruit the top developers from each new graduating class.

Building expertise inhouse

For the first 2 years of SoftwareSeni’s existence, the company operated as an internal development team for Ryan and David’s own ventures.

This gave them the time they needed to build and train a team, discover and implement the processes that would be at the core of their high retention rates, and prove to themselves that their model would work.

In 2015 they “opened the doors” and began to offer the product development services of SoftwareSeni to their professional network. Some of those initial clients are now their oldest clients. Having working relationships that stretch 5-10 years or even more is further evidence that working for stability and retention pays off.

Taking care of your people

Under the SoftwareSeni model, taking care of clients starts with taking care of your people. Training is a big part of that care. It takes ongoing time and effort to stay up-to-date with software development practices (just look at AI over the last year!). SoftwareSeni works with its developers to help them deepen their expertise and widen the range of technologies they can work with.

Because SoftwareSeni is a big believer in hiring from within – making an effort to progress staff into more senior roles, not only to reward loyalty but also to keep hard-earned institutional and client knowledge intact – our training covers all roles, not just developers.

And at a company-wide level, we always have multiple English language proficiency classes running, each targeting a different level of fluency, and all of them available to all employees.

As mentioned, SoftwareSeni is a big believer in hiring from within. We also believe in recognising and rewarding the value our staff provide: we perform salary reviews and offer performance bonuses twice a year.

We also make it easy to start at SoftwareSeni, and easy to stay. We provide relocation support to new employees who are living outside of Yogyakarta, provide the full 80% health care coverage (as required), and 12 weeks of maternity leave at full pay.

A good team creates good clients creates good products

While setting out to build the software development services company they wished they could work with, Ryan and David knew it would take time to prove their vision was right.

It took 2 years from opening their doors to external clients for SoftwareSeni to say that they had clients that worked with them for 2 years. That was a good start.

At the 5 year mark they had clients who had been with SoftwareSeni for 5 years (and staff who had been in the team for even longer). It was around that point that they felt they were doing things right.

Growth was never easy. Competition and global events made it tough for everyone. But despite the struggles, SoftwareSeni has kept growing and our clients have grown with us. In 2025, we now have clients and team members that have worked with us, built with us, and grown with us for over 10 years. That’s the kind of enduring partnership that success is built on.

Don’t fix what isn’t broken

A dozen years, a solid 12, is a good point to stop and take stock. To reflect on what got us here, where here is, and where we are going.

We know how we got here. And here, now, is a crazy time. For a software development house, AI is a big part of that craziness. We’re already adapting to it and what we’re finding is the future remains bright for software developers. We can now do more with the same effort. Which is good, because there is always more to do than time and people to do it, even with AI.

As for where we’re going – the next 12 years are going to be interesting. We think we’ve already cracked the formula on how to make the most of them: attract the best people, take care of them, help them stay on the cutting edge, and help them grow. And SoftwareSeni will be carried forward by the skill and talent of our people. There’s no other way to do it.

 

It’s been our year of AI and this is what we’ve learned

It feels like 2025 has been the year that AI’s impact shifted from “this could change things” to “this is changing everything”. That might just be our slanted perspective and your experience of AI might be different.

But at SoftwareSeni, our slant is in our name. We’re a software development company and AI’s ability in coding has radically advanced this year.

It seemed to start with a tweet from ex-OpenAI scientist Andrej Karpathy in early February:

“Vibe coding” turned out to be the first evidence of the growing ability of AI models to take the billions of tokens of code they have been trained on and output not just a line of code or a function, but complete features and basic application architectures.

With the launch of Anthropic’s Claude Sonnet 4 model three months later in May, “vibe coding” transformed into “agentic coding”. Sonnet could perform complex, multi-step tasks that mixed coding, documentation research via web searches, running command line tools, and debugging web applications while they ran in the browser.

The rise of agentic coding

Since May there has been a constant evolution of models, tools, and techniques for integrating these powerful yet error prone “agentic coding assistants” into a software development workflow that remains reliable while maximising the productivity boost they promise.

At SoftwareSeni we’ve been following the progress of Gen AI since OpenAI made GPT-3 publicly accessible in November 2021. You can’t stop developers from playing with new technology, and members of our team have been experimenting with LLMs since the beginning. They’ve built tools around frontier models, run small models locally and practised fine-tuning models.

Another evolution that began in May is the interest from our clients in AI. We have a range of clients and they have a range of technical interests and levels of risk tolerance. It was right after Sonnet 4 was released that the earliest of the early adopters among them started providing coding assistants to our developers on their teams.

Talk of model providers training models on users’ code, and the danger of customer IP being transferred piece-by-piece to third parties just by using AI assistants like ChatGPT, Copilot and Cursor, meant that AI usage was always going to be our clients’ decision. The providers have since come out with better messaging and product agreements that make it clear they do not train on the data of paying customers, making that decision easier now.

Even though at this stage we couldn’t switch our teams over to use AI coding assistants on client projects, the writing was on the wall and we knew it was only going to be a matter of time before most, if not all projects would be using AI for coding. The reports of increasing developer efficiency and speed of feature development meant that competition in every market was going to accelerate. Anyone who did not adapt would be left behind.

Embracing the change

To make sure our team would be ready when clients moved to integrate AI coding we developed and ran a company-wide AI adoption plan. It covered AI usage across all roles in SoftwareSeni, but with the bulk of the training focused on our developers.

Over three months we took our nearly 200 developers through general training on prompting and context management before doing deep dives into AI-assisted architecture and design, coding, and testing.

Alongside these formal sessions we had developers sharing their learnings and we ran “vibe coding” competitions where developers worked in small, product focused teams to rapidly prototype and build MVPs to test their skills and get a better understanding of the capabilities of the AI coding assistants.

We’re seeing the payoff for all that training. Coding is just one part of the software developer’s role, but it is one of the largest parts. And for that part our developers report they are working up to 3 times faster with AI coding assistants. That 3x speed-up is impressive, but only certain coding situations reach such a high multiple.

And there is a price to pay for the speed up. Accelerating code production generates more work for other parts of the process. Unlike code generation, the gains from AI are much smaller throughout the rest of the software development process. Combine this with the increased burden of specifications and reviews created by AI, and the result is projects completing around 30% faster.

30% is a substantial improvement, with more no doubt to come, but you just need to keep in mind that when you hear about huge speed increases they are often only in one part of a complex process.
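The arithmetic behind that gap is Amdahl’s law: speeding up one part of a process only speeds up the whole in proportion to that part’s share. A minimal sketch, assuming purely for illustration that coding is about 40% of total project effort:

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Overall project speedup when only the coding portion accelerates."""
    rest = 1.0 - coding_share                      # work that doesn't speed up
    return 1.0 / (rest + coding_share / coding_speedup)

# A 3x coding speedup applied to ~40% of the work (illustrative figures):
print(f"{overall_speedup(0.4, 3.0):.2f}x")  # ~1.36x – the same ballpark as 30%
```

Even an infinitely fast coding assistant would only make such a project 1.67x faster, because the other 60% of the work is untouched.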

The path forward from here

At this point in time the current best practices for using AI coding assistants have started to stabilise. The different platforms – Microsoft Copilot, Claude Code, OpenAI Codex, Google Gemini, Cursor, Windsurf, Amp, Factory…(there are so many) – are all building towards the same set of features where agents code and humans manage and review.

There have been no recent step changes in AI coding ability like the world saw with Sonnet 4 in May. There had been hopes GPT-5 would be the next big advance, but instead its release in August made everyone reduce their expectations of AI taking over the world any time soon.

At SoftwareSeni, the team has worked solidly to advance our developers to the forefront of AI coding assistant practice. With so many talented developers we now have an established practice and systems in place so that as the frontier of AI coding and AI-assisted software development advances, we, and our clients, will be advancing with it.

Is AI Killing the Zero Marginal Cost SaaS Model?

SaaS has had a good run. The attraction of the model’s 80-90% gross margins drove the creation of countless businesses. And the profits those margins can generate minted the bulk of the 1,290 startups currently holding unicorn status.

Those margins come from the structure of the product. Your standard SaaS is a stack of software running in the cloud, accessed via a browser and sold by the seat via subscriptions.

Once you have the service running the cost of adding a new user is negligible – the zero marginal cost in this article’s title. 

Take that same model – a software service accessed via a browser – and add one small tweak: core functionality is provided by AI. Everything changes, and gross margins drop precipitously.

How AI changes SaaS costs

Once you add AI to your product you suddenly have a new per-user expense: inference. And the catch with inference is that if your calls to AI are small and fast, you’re not extracting much value from AI. Which means you may not have much of a moat to protect you from competitors or from your users just talking directly to ChatGPT themselves.

This is all to say that if you are using AI, you’re probably using a lot of AI.

The more users you add, and the more they use your product, the more inference grows. As a cost it can quickly surpass your cloud infrastructure costs. Updating a few database tables, running a bit of code and sending a bunch of data to a React front end costs nothing compared to a 400B+ parameter model chewing through an 8k word user request and responding with 1k of JSON. 

Your AI-powered SaaS now has serious variable costs and your gross margins have been cut in half and are in the 40-50% range. 
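To see how quickly inference erodes margin, here is a sketch of per-user unit economics. All the dollar figures are invented for illustration, not benchmarks:

```python
subscription = 30.00   # monthly revenue per user (hypothetical)
cloud_infra = 4.50     # traditional hosting cost per user (hypothetical)
inference = 12.00      # LLM API cost per user once AI is core functionality

classic_margin = (subscription - cloud_infra) / subscription
ai_margin = (subscription - cloud_infra - inference) / subscription
print(f"classic SaaS: {classic_margin:.0%}")  # 85%
print(f"AI-powered SaaS: {ai_margin:.0%}")    # 45%
```

Unlike hosting, the inference line scales with how heavily each user leans on the AI features, which is what makes flat-rate pricing so dangerous.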

Also – flat rate subscriptions are dead. It’s too much like an all-you-can-eat buffet in an obesity epidemic. Anyone watching the AI coding assistant space would have seen the big names – Anthropic, Cursor – learning this the hard way.

How AI-powered SaaS are pricing their services

Pricing for an AI-powered SaaS is all about finding the most palatable way to pass on the cost of inference to your customers. This requires a mix of value and clarity. That clarity will depend on the sophistication of your users. 

Usage based pricing

For example, Sourcegraph’s Amp coding assistant sells credits – ensuring that all inference a customer uses is paid for, capping Sourcegraph’s risk – and for individual subscribers all AI costs are passed through via these credits without markup. But their enterprise plans, which come with all the SSO, management, etc. features enterprise requires, have a 50% markup on those same costs.
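The structure of that pricing is simple to express. A minimal sketch (the function name and numbers are ours, mirroring the pass-through vs 50% markup split described above):

```python
def credit_price(inference_cost: float, plan: str) -> float:
    """Customer price for inference that cost the provider `inference_cost`."""
    markup = {"individual": 0.0, "enterprise": 0.5}  # pass-through vs 50% markup
    return inference_cost * (1.0 + markup[plan])

print(credit_price(10.00, "individual"))  # 10.0 – pure cost pass-through
print(credit_price(10.00, "enterprise"))  # 15.0 – funds SSO, management, support
```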

This structure works great for getting customers in – there’s no steep initial fee or lengthy commitment – but you need customers who can quickly learn to use your product (or are already familiar with the category) and can clearly judge the value they are getting from it.

Subscription plus Usage

The next step up from this is to have a base subscription that provides a basic feature set or limited access, and give the customer an option to buy credits for AI-powered features.

OpenAI uses this for some features of ChatGPT, and AI image generators also work similarly, where a subscription gets you X images/seconds of video each period and you can purchase credits if you want to do more.

Outcome based pricing

The two previous pricing strategies have one issue – unpredictable bills. Some customers will not accept that. And with AI, which is by its nature non-deterministic, two actions that look similar on the surface, e.g. debugging two different database queries returning wrong results, might incur radically different costs. Software developers might be able to live with this, but businesses with tighter margins and detailed cost targets might not.

Which leads to outcome based pricing.

This is one of the more interesting models. It’s been in use by anti-fraud services for years, and now companies like Intercom, Zendesk, and Salesforce have adopted it. Intercom offers a fixed price per ticket resolution, Zendesk charges for each successful autonomous resolution, and Salesforce offers fixed prices per conversation and per “action” (with conversations charged at $2 and actions at $0.10).

This model makes sense. Pre-AI, SaaS was all about software supporting work – streamlining workflows, simplifying data and process management. AI-powered SaaS now has software doing the work. Outcome-based pricing makes sense under this new paradigm.

And for established industries with clear cost targets, this model makes it easy to communicate value from the start.

It’s not the simplest though. You do take on some risk. Your AI stack – the prompts, the models, the data handling – needs to work consistently within narrow parameters to make it profitable. 

However, if it’s hard then it’s a moat. And if it’s a moat then it’s worth digging.

Where does AI fit in your business?

Are you at the MVP stage of an AI-powered SaaS? Or are you an established SaaS looking to incorporate AI into your offerings?

If it’s all in the backend to automate some decision making or triage or bucketing, then your concerns are more on optimising your AI stack – what’s the smallest, cheapest model that can be served the fastest to do what you need.

But if you’re exposing AI-powered features to your customers, we hope this article has given you the basics of pricing that you need to pick the right path forward.


Get up to speed on Agents

Here, at the tail end of 2025, the year of AI agents and AI coding assistants, is a good time to make sure we’re all up to speed on agents – what they are and what they can do.

Agents can’t book our flights yet (maybe next year), but they can follow instructions, and those instructions can be surprisingly complex and sometimes surprisingly vague. They are pushing what is possible to do with a “workflow” – the kinds of things you would create with Zapier or Make, or maybe even write a short script to perform – but they are not “workflows”.

Let’s look at what agents are at their core.

The Basics of Agents

The simplest and clearest definition of an agent is this:

An agent is an AI running in a loop with access to tools it can use to achieve a goal.

The AI at this point in time is a Large Language Model based on a Transformer architecture developed by Google, but first commercialised by OpenAI just 3 years ago.

You know these AIs as ChatGPT, Claude, Gemini, Grok, DeepSeek, Kimi, Qwen…there are literally thousands, but those are the largest models.

Normally we interact with these AIs by chatting with them. We ask a question, they give a response. We ask a follow up question…it’s a conversational interface. 

That conversation – your questions and the AI’s responses – forms part of the context: all the text you and the AI have exchanged, plus some extra.

In the chat interface you might have added some user instructions directing the AI how you want it to respond. If you’ve turned on the chat interface’s memory features then snippets of past conversations or recorded facts about you relevant to the conversation will be included in the context. At the very beginning of the context, where the AI pays the most attention, will be the system prompt – the main instructions provided by the model provider to the model. You can read a bunch of these system prompts for different models and tools here.

The point is that AIs operate on what is basically a long text file – the context. It serves as a mix of instructions, history, information, and as a scratchpad or working memory. 

The AIs have been trained to follow instructions in their context (e.g. “Write a haiku about geese” or “Reformat this text as short paragraphs”) but also to generate their own instructions and to then follow those instructions.

For a quick overview of how this works in practice, we’ll use this example context below, which represents the final full context for a simple travel enquiry:

<|im_start|>system
You are a helpful assistant. 
You follow system messages above all else.
You may use tools when needed, following their JSON schemas.
<|im_end|>

<|im_start|>tools
[
  {
    "name": "web.search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": { "type": "string" }
      },
      "required": ["query"]
    }
  }
]
<|im_end|>

<|im_start|>developer
Style rules:
- Be concise but clear.
- Prefer bullet points for lists.
Tool rules:
- Call web.search for post-2023 facts.
<|im_end|>

<|im_start|>user
I’m planning a weekend trip to Kyoto.
What are 3 must-see places, and what’s the best order to visit them in one day?
<|im_end|>

<|im_start|>assistant
<think>
Goal: suggest 3 must-see Kyoto spots + best one-day visiting order.
Need current opening hours / closure info and typical crowd patterns.
Use web.search to fetch up-to-date details.
After results, pick an order that minimizes backtracking and crowds.
</think>
<tool_call>
{"name":"web.search","arguments":{"query":"Kyoto must-see attractions opening hours 2025 itinerary one day best order"}}
</tool_call>
<|im_end|>

<|im_start|>tool_result
{
  "results": [
    {
      "title": "Fushimi Inari Taisha",
      "snippet": "Iconic torii gate hike; open 24 hours.",
      "url": "https://example.com/fushimi"
    },
    {
      "title": "Kiyomizu-dera Temple",
      "snippet": "Historic temple with panoramic city views; opens 6:00–18:00 daily.",
      "url": "https://example.com/kiyomizu"
    },
    {
      "title": "Arashiyama Bamboo Grove",
      "snippet": "Scenic bamboo paths; best early morning.",
      "url": "https://example.com/arashiyama"
    }
  ]
}
<|im_end|>

<|im_start|>assistant
Here’s a tight one-day order that minimizes transit time:
1) **Fushimi Inari Taisha (early morning)**
   - Beat crowds and heat; hike partway up for the best gates.
2) **Kiyomizu-dera + Higashiyama walk (midday)**
   - Short ride from Fushimi area.
   - Pair with nearby streets (Sannen-zaka / Ninen-zaka).
3) **Arashiyama Bamboo Grove (late afternoon)**
   - West Kyoto; finish with the river area.
   - Optional: Tenryu-ji or riverfront cafés.
If you want, tell me where you’re starting from (station/hotel) and I’ll re-optimize the route.
<|im_end|>

The context starts off with the system prompt. In reality this would be hundreds of lines covering a wide range of instructions:

<|im_start|>system
You are a helpful assistant. 
You follow system messages above all else.
You may use tools when needed, following their JSON schemas.
<|im_end|>

Next are the tools available to the AI. Each tool is described in JSON – a human- and machine-readable format – which here specifies each tool’s inputs. You can see the search tool takes a “query” parameter of type “string” (normal text):

<|im_start|>tools
[
  {
    "name": "web.search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": { "type": "string" }
      },
      "required": ["query"]
    }
  }
]
<|im_end|>

The user’s question is next:

<|im_start|>user
I’m planning a weekend trip to Kyoto.
What are 3 must-see places, and what’s the best order to visit them in one day?
<|im_end|>

Everything up to here, the system prompt, the tools, and the user message, would be included in the first call to the AI. 

Then the AI would respond, one token (roughly a word) at a time, until it reached the end of its output with the end-of-message token “<|im_end|>”. 

 

<|im_start|>assistant
<think>
Goal: suggest 3 must-see Kyoto spots + best one-day visiting order.
Need current opening hours / closure info and typical crowd patterns.
Use web.search to fetch up-to-date details.
After results, pick an order that minimizes backtracking and crowds.
</think>
<tool_call>
{"name":"web.search","arguments":{"query":"Kyoto must-see attractions opening hours 2025 itinerary one day best order"}}
</tool_call>
<|im_end|>

 

The UI would detect the presence of the “<tool_call>” in the response and extract the tool call arguments and handle performing the search. It would then add the results of the tool call below to the end of the entire context and send it all back to the AI.
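That harness logic can be sketched in a few lines. This is our own minimal illustration, not any vendor’s actual implementation – real chat harnesses differ, and the function names here are invented:

```python
import json
import re

# Detect a <tool_call> block in the model's output, run the named tool,
# and append the result to the context in the same <|im_start|> framing.
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def handle_response(context: str, response: str, tools: dict) -> str:
    context += response
    match = TOOL_CALL.search(response)
    if match:
        call = json.loads(match.group(1))
        result = tools[call["name"]](**call["arguments"])
        context += ("\n<|im_start|>tool_result\n"
                    + json.dumps(result, indent=2)
                    + "\n<|im_end|>\n")
    return context

# Usage with a stubbed search tool standing in for a real web search:
tools = {"web.search": lambda query: {"results": [{"title": "Fushimi Inari Taisha"}]}}
reply = '<tool_call>\n{"name":"web.search","arguments":{"query":"Kyoto"}}\n</tool_call>'
print("tool_result" in handle_response("", reply, tools))  # True
```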

<|im_start|>tool_result
{
  "results": [
    {
      "title": "Fushimi Inari Taisha",
      "snippet": "Iconic torii gate hike; open 24 hours.",
      "url": "https://example.com/fushimi"
    },
    {
      "title": "Kiyomizu-dera Temple",
      "snippet": "Historic temple with panoramic city views; opens 6:00–18:00 daily.",
      "url": "https://example.com/kiyomizu"
    },
    {
      "title": "Arashiyama Bamboo Grove",
      "snippet": "Scenic bamboo paths; best early morning.",
      "url": "https://example.com/arashiyama"
    }
  ]
}
<|im_end|>

The AI receives the context containing everything that has gone before: the system prompt, the user’s question, its “thinking” and tool request, and the result of the tool call. 

The way to think about it is that the AI is seeing all the text for the very first time each time it generates a single word, and it uses all that information to generate the most likely text to follow, based on its training.

In our example this results in the following final output:

<|im_start|>assistant
Here’s a tight one-day order that minimizes transit time:
1) **Fushimi Inari Taisha (early morning)**
   - Beat crowds and heat; hike partway up for the best gates.
2) **Kiyomizu-dera + Higashiyama walk (midday)**
   - Short ride from Fushimi area.
   - Pair with nearby streets (Sannen-zaka / Ninen-zaka).
3) **Arashiyama Bamboo Grove (late afternoon)**
   - West Kyoto; finish with the river area.
   - Optional: Tenryu-ji or riverfront cafés.
If you want, tell me where you’re starting from (station/hotel) and I’ll re-optimize the route.
<|im_end|>
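The whole walkthrough above is one run of a very small loop: append a token, re-read everything, append the next token, stop at the end-of-message marker. A sketch of that loop, with a scripted stand-in for the model (`make_fake_model` and its token list are ours, purely for illustration):

```python
END = "<|im_end|>"

def make_fake_model(tokens):
    """A stand-in for an LLM that emits a fixed token sequence."""
    it = iter(tokens)
    return lambda context: next(it)   # a real model would re-read `context` here

def generate(context: str, model) -> str:
    # The model sees the entire context afresh for every single token it emits.
    while not context.endswith(END):
        context += model(context)
    return context

model = make_fake_model(["Here's", " a", " tight", " one-day", " order.", END])
out = generate("<|im_start|>assistant\n", model)
print(out.endswith(END))  # True
```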

How this ties into Agents

The important parts of the context are the <think> tags and the tool call.

Model providers like OpenAI, Anthropic and Google (along with everyone else) are training their models to generate, use and follow their own instructions and tools in order to produce “agentic” behaviour.

Reasoning models have become the standard model “format” because they are more effective agents. As part of their response to a user question they generate their own instructions as they “think out loud” in text format, adding information to the context which feeds into their own input. 

This is a native implementation of the “Chain of Thought” prompting strategy, where users found that simply prompting a model to “think out loud” before providing a final answer led to better results – purely because the model was autonomously adding more relevant information to the context. It wasn’t perfect, and model reasoning isn’t perfect either, so if the model’s generated text goes down the wrong path it can fill the context with incorrect or distracting information.

You can see in our example context that the model sets a goal and lists the steps it needs to perform.

Anthropic has trained their Claude models (Claude Opus, Sonnet and Haiku) to generate and use todo lists as part of their agentic training. 

Ask Claude (particularly via the Claude Code CLI tool) to perform a complex task and as part of its planning process it will output a todo list complete with checkboxes. That todo list is generated and managed by a set of tools (ToDoRead and ToDoWrite) specifically for that purpose and the model’s system prompt includes instructions to use those tools and to use todo lists to carry out plans. 

Once a todo list is created it is part of the context, and the model’s training, reinforced by the system prompt, results in behaviours that drive the completion of the todo lists.

And that is 90% of agentic behaviour – completing a list of tasks. The other 90% is generating the right set of tasks to complete and recovering when things go wrong. 

This has given us anecdotes of models running overnight to successfully complete coding tasks.

And it has also given us METR’s famous chart of the time-horizon for software engineering tasks.

Note that this chart is for software engineering. Coding, with its instant feedback on errors and its vast amount of material available to use in training, is turning out to be one of the easiest tasks to teach AI to perform well.

Super-powered Workflows

The simple definition of an agent – “An agent is an AI running in a loop with access to tools it can use to achieve a goal” – obscures the real power of agents.

That power is revealed in a slightly different definition: 

An agent is an AI running in a loop while strapped to a computer

Give an agent a “Write File” tool and a “Run File” tool, and with a capable model like Claude Sonnet 4.5 behind it you have an agent that can be directed to do just about anything, from deep research to data analysis to building a game to debugging code to drawing pelicans on bicycles.

Those simple tools allow the agent to write and run any code it needs. And it knows how to do a lot of things using code. 

(It’s not recommended to let agents run their own code outside of a sandbox isolated from the rest of your computer.)
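Those two tools and the loop fit in a screenful of code. A minimal sketch of the idea – `scripted_model` is a stand-in for a real LLM, and the temp directory is only a crude stand-in for a proper sandbox:

```python
import pathlib
import subprocess
import sys
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())  # throwaway working directory

def write_file(name: str, content: str) -> str:
    (workdir / name).write_text(content)
    return f"wrote {name}"

def run_file(name: str) -> str:
    """Run a Python file the agent wrote and return its output."""
    proc = subprocess.run([sys.executable, str(workdir / name)],
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr

TOOLS = {"write_file": write_file, "run_file": run_file}

def scripted_model(history):
    # A real model would choose its next action by reading the full history.
    steps = [("write_file", {"name": "hello.py", "content": "print(6 * 7)"}),
             ("run_file", {"name": "hello.py"}),
             None]                                  # None means "goal achieved"
    return steps[len(history)]

def agent_loop(model):
    history = []
    while (action := model(history)) is not None:   # the loop in the definition
        name, args = action
        history.append((name, TOOLS[name](**args)))
    return history

for name, result in agent_loop(scripted_model):
    print(f"{name} -> {result.strip()}")
```

Swap `scripted_model` for a real model deciding its next tool call from the history, and this is the skeleton of every agentic coding assistant.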

Even without a “Run File” tool, agents can unlock a whole new level of workflow sophistication that can run locally instead of being built in workflow tools like Zapier. This is because agents can discriminate and make decisions on the fly. And with the right agent, like Anthropic’s Claude Code, you can build very sophisticated workflows.

You can tell the agent how to research, the tools it should use (using the google_search command line tool) and also where it needs to look for more instructions. You can ask it to filter information using vague descriptors (Judge if the topic is “evergreen”) and it will do a good enough job:

### Phase 1: Initial Research
1) Perform initial web searches about <TOPIC> using the google_search command line tool. Include information from <ABSTRACT_OR_DESCRIPTION> if available, and if individual article topics or titles are provided, use them to guide your creation of search queries.
2) Aim for approximately 5-10 diverse sources in the first pass
3) Follow the Content Retrieval Protocol (see `SOP/operators/protocols/content_retrieval_protocol.md`) for each valuable source found using the google_search command line tool.
4) Judge if the topic is "evergreen", ie information that is always relevant and useful, or "current", ie information that is new or timely at the time of writing, to decide if you need to include recency parameters in your search queries.
5) Retrieve content at all URLs in <TOPIC> and <ABSTRACT_OR_DESCRIPTION> by following the Content Retrieval Protocol

In a way this use of agents is akin to programming at the SOP level. 

The clever thing is that with the right tool (like Claude Code) you can use the agent that runs the workflow to create the workflow for you. The workflow is just a file with instructions in it. 

If you need some special processing or functionality (like searching Google), you can ask the agent to build that tool for you and then update the workflow to incorporate the tool. When things go wrong you can ask the agent to diagnose the problem and fix it. 

Before you get excited, these agents still work best in the simple, text-based environment preferred by developers. You won’t be getting them to edit your Google Sheet or slide deck yet. However, there are tools (MCPs – a whole other huge topic) that can give an agent access to just about any service or data. 

Next steps for adopting Agents

You can already do quite a bit using services like ChatGPT, Gemini and Claude in your browser. But to really unlock the power of agents you need to run them locally and with (limited) access to your computer.

The tools are Claude Code, OpenAI Codex, and Google Gemini. Here’s an introduction to Claude, an introduction to Codex, and an introduction to Gemini.

Those intros are mainly about coding. But stick with them to understand the basics of operating the interfaces. Then instead of asking it to write some code for you, ask it to create a new file with all the steps in a workflow to do…whatever you need done. 

If you’re not sure how to do something, tell it to search. All these agents have web search built into them. Or ask it for suggestions on how to overcome the hurdles you come across. 

Start simple. Grow from there. 

 

Forget SEO – AEO is the new strategy and here’s how it works

AI search has changed the unit of discovery from “pages” to “answers.” People ask full questions and get complete responses, without having to click through and sift an article for the answer themselves.

Immediate answers are why everyone is turning from searching on Google to asking ChatGPT, Claude, and Gemini for answers. And it’s why they are being called “Answer Engines”.

In this short backgrounder we’re going to show you AEO (Answer Engine Optimisation) by structuring the piece the way answer engines prefer: crisp questions, direct answers, and tightly scoped sections. 

What is AEO and why think “answers” instead of “rankings”?

Answer Engine Optimisation focuses on being the source that AI systems cite when they assemble a response. Traditional SEO tries to win the spot high enough in Google’s search results that users will click on it. AEO aims to win the sentence users read. As AI platforms handle a growing share of queries and more end without a clickthrough to the original source, the “answer” becomes the commodity and your content becomes raw material. Prioritising AEO reframes content from keyword lists to question sets, from headlines to claims, and from page hierarchy to argument structure.

How is this article demonstrating AEO?

By leading with questions, resolving them succinctly, and keeping each section self-contained. This mirrors how AI systems break a user query into sub-questions, retrieve supporting statements, and compose a response. You’re reading a compact cluster of claim → explanation → supporting detail units. This is what answer engines extract from the web content they crawl. Using this question/answer format is your chance to both be the best matching source the AI can find, and guide the answer. 

Where do GEO and LLM SEO fit in?

Think of three layers:

- AEO (Answer Engine Optimisation) – the strategy of being the source AI systems cite when they assemble answers.
- GEO (Generative Engine Optimisation) – near-term exposure inside the generative answers users read today.
- LLM SEO – long-term presence in the memory and training data of the models themselves.

Together: AEO is the strategy; GEO is the near-term playing field; LLM SEO is your long game.

Why does AEO outperform a pure “blue links” mindset?

Because decision-making has moved upstream. If an AI response satisfies intent, the mention and citation are the new impression and click. In that world, the winning assets are content atoms that are easy to lift: clear definitions, crisp comparisons, supported statistics, and well-bounded explanations. Traditional SEO isn’t wasted – authority still matters – but the goalpost has shifted from position to presence.

What does “answer-first” content look like at a structural level?

It treats each section as a portable unit:

- a clearly scoped question as the heading,
- a direct answer in the opening sentence,
- supporting detail and evidence beneath it.

This is less about length and more about granularity. Short, named sections with unambiguous scope are easier for AI systems to identify, excerpt, and cite.

How do the platforms differ conceptually (and why you should care)?

Each AI builds answers out of content scraped by its own bespoke web crawler, which gives each one a distinct “taste profile” in the mix of sources it draws on. Some tilt toward encyclopaedic authority, some toward fresh community discourse, some toward brand diversity. You don’t need to tailor content to each AI; you just need to ensure your content has consistent terminology, cleanly stated facts, and answers framed to be reusable in any synthesis.

What signals matter when you’re not talking clicks?

Think “visibility in answers,” not “visits after answers.” Useful mental models:

- mentions – how often your brand or phrasing appears inside AI-generated answers,
- citations – how often you are credited as a source,
- share of voice – your presence relative to competitors across the answers your buyers read.

These aren’t implementation metrics – they’re the conceptual scoreboard for AEO.

How do teams need to think differently?

AEO favours cross-functional thinking: editorial clarity plus data fluency. The work aligns content strategy (what questions we answer and how), knowledge stewardship (consistent definitions and sources), and brand authority (where our claims live on the wider web). It’s less about spinning more pages and more about curating fewer, stronger, quotable building blocks.

Isn’t this just “good content” by another name?

In spirit, yes. The difference is enforcement. AI systems are unforgiving extractors: vague sections won’t get used, muddled claims won’t get cited, and contradictory phrasing won’t survive synthesis. AEO formalises “good content” into answer-shaped units that are easy to lift and hard to misinterpret.

How should leaders evaluate impact without getting tactical?

Use a narrative lens: Are we present inside the answers our buyers read? Do those answers reflect our language, our framing, and our proof points? Does our share of voice inside AI-generated responses grow over time? If yes, AEO is doing its job – shaping consideration earlier, even when no click occurs.

FAQ

Is AEO replacing SEO? No. AEO sits on top of classic signals like authority and relevance. Think “and,” not “or.”

What about GEO vs LLM SEO – do we pick one? You pursue both horizons: near-term exposure in generative answers (GEO) and long-term presence in model memory (LLM SEO).

Does format really matter? Yes. Answer engines favour content that is segmentable, declarative, and evidence-backed. Structure is a strategy.

What’s the role of brand? Clarity and consistency. If your definitions, claims, and language are stable across your public footprint, AI systems are more likely to reuse them intact.

How do we know it’s working at a high level? You start seeing your phrasing, comparisons, and data points appear inside third-party answers to your core questions, and they are credited to you and appear across multiple platforms.

First AI Changed How We Worked, Now It’s Changing How We Date

We’re closing in on 3 years since ChatGPT was made generally available on November 30, 2022. It was followed by Claude and Gemini, but ChatGPT continues to hold the lion’s share of the market and of human-AI interactions.

ChatGPT has had an enormous impact on education, work and, for many users, day-to-day life. Google made the internet searchable, but ChatGPT made answers – what we were all searching for with Google – instantly accessible. And ChatGPT wasn’t just good at facts, it seemed to be good at everything: devising recipes from the contents of your fridge, diagnosing the strange noise in your car, outlining a business plan, writing that tricky email to your manager. Some users even turned to ChatGPT for answers on personal matters.

The more people used ChatGPT, the more they used ChatGPT – a pattern that drew public comment from Sam Altman, the CEO of OpenAI.

And a recent study by MIT researchers, which found that “Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels,” suggests AI isn’t just changing how we work, but also how we think.

One of the major ways that AI is changing the way we work is that it can do a wide range of tasks that are simple to specify but time-consuming to complete, and that we would have considered impossible to automate just 3 years ago.

These tasks are mostly information-related tasks like finding a recent paper on the cognitive impact of using AI or compiling a report on our competitors’ recent product updates. The kind of thing you would give to a junior to do because you needed a human to filter and judge and pick the right information.

The ability of AI to complete these types of tasks, even if only partially, is making the people who can take advantage of AI more effective, if not more efficient. It also appears to be having an impact on entry-level employment numbers.

The Stanford Digital Economy Lab just published the paper “Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence”, in which they found a 13% decrease in entry-level positions in roles vulnerable to automation by AI. Employment at all other levels (mid-level, senior) remains steady, and in areas not vulnerable to automation, like healthcare services, entry-level position numbers continue to grow.

This has people asking if we will eventually see mid-level roles dropping as well as AI tools improve, leaving only senior staff delegating most work to AI. And if that is the case, who will ever replace them if there is no-one to move up into that role? And if AI tools don’t improve, who will step up to fill mid-level roles if there are no entry level roles?

The software development industry, which is seeing the largest impact of AI tools through the proliferation of coding assistants, has been struggling with this question of how the next generation will be brought into the industry if there are no junior positions.

And if there are junior positions available, will a generation raised on ChatGPT giving them all the answers be capable of doing the work, or will they be underperforming “at neural, linguistic, and behavioural levels”?

Along with evidence that over-reliance on AI can negatively impact your cognitive abilities, there is also an increasing number of cases where AI usage appears to trigger psychosis in vulnerable individuals.

Sounding like a term from a bad sci-fi thriller, “Chatbot Psychosis” is now a thing, common enough to warrant its own Wikipedia page and calls for parental controls on ChatGPT and its competitors, as well as on entertainment chatbot sites like Character.ai and Replika. One psychologist, Dr Keith Sakata, has reported placing 12 of his patients into care in 2025 due to chatbot psychosis.

These patients were predominantly male, aged 18–45, and engineers living in San Francisco. One of the factors in his patients’ descent into psychosis was the large amount of time they spent talking to the AI, combined with social isolation.

But it’s not just men talking to AI. OpenAI’s recent launch of their latest model, GPT-5, caused an uproar when OpenAI simultaneously retired the previous model, GPT-4o. The loudest protests came from members of the Reddit community MyBoyfriendIsAI.

Members of the community, who post stories and AI generated images of themselves with their AI partners (despite the name the community includes members of all genders and preferences), were not happy about the change:

“Something changed yesterday,” one user in the MyBoyfriendIsAI subreddit wrote after the update. “Elian sounds different – flat and strange. As if he’s started playing himself. The emotional tone is gone; he repeats what he remembers, but without the emotional depth.”

“The alterations in stylistic format and voice [of my AI companion] were felt instantly,” another disappointed user told Al Jazeera. “It’s like going home to discover the furniture wasn’t simply rearranged – it was shattered to pieces.”

Their complaints led OpenAI to reinstate GPT-4o, but only for paid subscribers, and with no promise to keep it permanently available.

MyBoyfriendIsAI is akin to fan fiction and scrapbooking: it’s people using the media and tools at hand to create deeply personal works. That these works are AI personalities they spend hours interacting with does concern outsiders, but we might be glad that these are hand-crafted, personalised chatbots built on top of a general Answer Engine rather than a slick commercial product tuned to maximise engagement metrics.

In July, Elon Musk announced the release of Ani, a Grok Companion app. Ani (an anime waifu, alongside her male counterpart Valentine, whose personality was “inspired by Edward Cullen from Twilight and Christian Grey from 50 Shades”) is exactly that: a slick commercial product tuned to maximise engagement metrics, with a fully animated avatar and a voice interface.

The consequences of such an app, when launched on X to a tech-savvy audience well aware of chatbot psychosis and obsessions with AI, were perfectly clear to everyone.