Remember a long time ago (2024) when prompt engineering was the frontier of AI and people were landing 6 figure roles to engineer prompts? Prompt engineering was followed by context engineering, and that by harness engineering. The gap between each “engineering” growing shorter each time.
The latest engineering causing the tech internet to lose its mind is loop engineering. The mind-losing was triggered by this tweet from OpenClaw creator Peter Steinberger:

In an industry already dealing with the fallout of having machines do the very thing that was the core of their career and for many the soul of their identity, implying everyone should get over unanswered questions on the quality of AI-generated code and move on to embrace these vague “loops” created panic in everyone who didn’t know what these “loops” were and friction between the code purists and the agent maximalists.
It also amplified the anxiety over the increasing costs of agentic coding. Peter Steinberger works for OpenAI and posts about burning through billions of tokens a month. The other headliner talking about loops was Boris Cherny, head of Claude Code at Anthropic. He has said in interviews “my job is to write loops”. He also has access to unlimited tokens.
If loops were where the future of agentic coding was heading, who could afford to follow? And the silent corollary – if you can’t follow how can you compete?
So, we’re going to give a quick breakdown of what loop engineering is and some examples of how and where it can be used. As for affording to follow – that’s a bit of a case-by-case $$$ vs infrastructure questions which we will touch on briefly at the end.
What is loop engineering
Loop engineering is automation that relies on the judgment and abilities of agents. Judgement here includes how to do a task/solve a problem, if a solution is correct, and if the result needs to be escalated to a human.
This opens up a whole new universe of possibilities that deterministic, software-based automation can never touch.
At its most basic level it is an orchestration pattern that combines:
-
Scheduled or triggered execution
-
Isolate workspaces – so agents don’t overwrite other agents’ work (because you will of course be running hundreds of agents in parallel)
-
Verification – this is the key to making loops viable: provable success
-
Persistent memory – this could be an issue tracker or a database
Verification falls into two categories – deterministic and agent-based. Deterministic verification is things like build systems, linters, benchmarks, test suites – code that runs and returns actionable information: errors that need to be corrected, numbers that aren’t low enough/high enough yet. Agent-based metrics leverage the judgement of agents to verify that outputs meet complex or subtle criteria, or criteria that the developer can’t be bothered building into deterministic verification steps.
Agent-based verification works best if a separate agent with no shared context, and preferably from a different model family, is used. So use GPT-5.5 to judge the outputs of Opus 4.8 and vice versa. This is used particularly for coding oriented loops where what is being verified is complex in itself.
What loop engineering is used for
The success of AI-based coding is due to the simple fact that solutions to coding problems can be tested and verified, making training and improvement in the domain possible. This extends to the kinds of “loops” that can be engineered.
These examples have been collected and summarised from discussions across the tech internet. They are broad and they are not exhaustive.
Improving code quality
Set agents to work on (a copy of) your codebase with a simple goal like “simplify the code” or, more specifically:
“Reduce the cyclomatic complexity of the code while maintaining correctness. Run the test suite after each change and fix any errors that are reported. Keep going until you’ve revised every source file.”
Performance optimisation
This is much the same as improving code quality, but instead of your own test suite you might have an industry or inhouse benchmark that you want to impove on.
Processing logs
“Logs” here can mean anything from server logs to mailboxes. This is where a scheduled job might kick off at 7am every morning to go through each item and triage them.
For example, your product’s CI failures from the day before – you can a set a sub-agent per failure to work on a fix in its own worktree, then have a second sub-agent follow it up with a code review and running it against your product’s test suite. Perhaps a second round of failures at this stage results in it being elevated to a human.
Keeping documentation up to date
Every day, or every hour, you can have agents go through your codebase, or pull information from your CMS or a projects directory or whatever set of documents you need to have correct and current information available to your organisation.
The agents make sure sources and documents are in sync and all the information is structured, formatted and styled to match your inhouse standards.
Other loop ideas
People are out there trying to find where they can leverage loops for their own work. You can find a collection of sometimes quite specific loops at this site.
Don’t be surprised if it feels like prompt engineering all over again.
How to build loops
Tools for building loops are already present in the Anthropic’s Claude Code tool and in OpenAI’s Codex App (not to be mistaken for OpenAI’s Codex cli tool).
Claude is ahead here at the moment for ease of building loops.
Both have “/goal <condition>” (Anthropic / OpenAI) – tell the agent what you want to achieve and what success looks like and it will keep running until it reaches it, you stop it, or you exhaust your subscription, or your token credits (you did set limits on your billing, right?).
People have reported Opus and GPT-5.5 working over days to complete a goal successfully.
Claude also has “/loop <interval> <prompt>” command. It uses Anthropics scheduling infrastructure behind the scenes. Interval can be 5 minutes, it can be hours, and if you don’t suggest it, Claude will decide itself based on the prompt how often it should run.
You can also use Anthropic’s scheduling infrastructure directly via the “/schedule” command in Claude Code as well as in the Claude app.
OpenAI has automations available in the Codex app. You can prompt GPT-5.5 from a chat to set up schedules for any work you want to automate.
Of course, you can always write your own agent loops, or get the agents to write the loops for you.
The caveats of loop engineering
Infrastructure is infrastructure – it still needs to be created, documented and maintained. On top of that, loop engineering gives you the opportunity to wake up to immense invoices or at least exhausted subscriptions. There are ways around this.
We don’t just mean setting limits. We mean exploring alternative providers to OpenAI and Anthropic, if your product/vertical/nation allows it. The last few months have seen increasingly effective open weight models becoming available. They are much cheaper than Opus and GPT-5.5, and are being served by multiple neo-clouds, and available via OpenRouter.
Your entire loop doesn’t need to run on these models. You can have your Opus/GPT-5.5 agents delegate the legwork to agents running on these cheaper models. All it takes is a little more set up and a little more testing for permanent cost decreases.
And, if your organisation is technically proficient and likes a challenge, you can investigate running models locally for parts of your workflow. Agents promise to do a lot and they will do it by burning compute. So owning the compute and being able to burn tokens 24/7 to update that documentation, clean up that code, etc, might be what makes them viable for you.
The human cost of loop engineering
When you can launch hundreds of agents at the click of a mouse the human becomes the bottleneck and, because agents are imperfect and the world is messy so escalation to humans is common, the workload can become inhumane. And once the workload becomes more than what a person can diligently attend to, slop begins to accumulate, and once it accumulates it becomes load bearing.
So the only strategy here is to scale the number of agents to match the humane review rate. Scaling beyond this rate will lead to dropping standards, not to mention miserable team members looking for better options. And letting agents work ahead of reviews means burning tokens on code that suddenly needs to be rewritten because agents made and propagated mistakes or poor decisions about architecture.
Do you have loops worth engineering?
Loop engineering is really about asking two questions. The first is “Where are we wasting the time and intelligence of a human in X% of the cases?”. Do you need a human to implement and test a 15 line code fix? Do you need a human to triage emails?
The second question is “What automated improvements/optimisations will pay dividends for us?”.
Once you have answers to either of those questions, it is often just a matter of asking an agent to implement a prototype of the solution and growing from there.