The tail end of 2025, the year of AI agents and AI coding assistants, is a good time to make sure we’re all up to speed on agents: what they are and what they can do.
Agents can’t book our flights yet (maybe next year), but they can follow instructions, and those instructions can be surprisingly complex and sometimes surprisingly vague. They are pushing what is possible to do with a “workflow” – the kinds of things you would create with Zapier or Make, or maybe even write a short script to perform – but they are not “workflows”.
Let’s look at what agents are at their core.
The Basics of Agents
The simplest and clearest definition of an agent is this:
“An agent is an AI running in a loop with access to tools it can use to achieve a goal.”
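To make that definition concrete, here is a minimal Python sketch of the loop. It illustrates the idea under stated assumptions and does not show how any particular product works. `fake_model` is a hypothetical stand-in for a real LLM API call, and `web_search` is a hypothetical tool. The model either asks for a tool or gives an answer. The loop runs the requested tool, adds the result to the context and asks again, with a step limit as a safety net.

```python
from typing import Callable

# Hypothetical stand-in for a real LLM call: it sees the whole context and
# returns either a tool request or a final answer.
def fake_model(context: list) -> dict:
    if not any(msg["role"] == "tool_result" for msg in context):
        return {"tool": "web.search", "arguments": {"query": "Kyoto must-see places"}}
    return {"answer": "Visit Fushimi Inari, Kiyomizu-dera, then Arashiyama."}

# Hypothetical tool; a real one would call a search API.
def web_search(query: str) -> str:
    return f"3 results for {query!r}"

TOOLS: dict = {"web.search": web_search}

def run_agent(goal: str, model: Callable[[list], dict] = fake_model, max_steps: int = 10) -> str:
    context = [{"role": "user", "content": goal}]
    for _ in range(max_steps):               # the loop
        reply = model(context)               # the AI
        if "answer" in reply:                # goal reached, so stop
            return reply["answer"]
        tool = TOOLS[reply["tool"]]          # the tools
        result = tool(**reply["arguments"])
        # Both the request and its result go back into the context.
        context.append({"role": "assistant", "content": str(reply)})
        context.append({"role": "tool_result", "content": result})
    return "Stopped: step limit reached."

print(run_agent("Plan a day in Kyoto"))
```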
The AI in that definition is, today, a Large Language Model built on the Transformer architecture. Google developed the Transformer, and OpenAI brought these models to the mass market with ChatGPT just three years ago.
You know these AIs as ChatGPT, Claude, Gemini, Grok, DeepSeek, Kimi, Qwen… There are literally thousands, but those are the best known.
Normally we interact with these AIs by chatting with them. We ask a question, they give a response. We ask a follow up question…it’s a conversational interface.
That conversation (your questions and the AI’s responses) forms part of the context. The context is everything you and the AI have exchanged, meaning all the text in your messages and its replies, plus some extra.
In the chat interface you might have added some user instructions telling the AI how you want it to respond. If you’ve turned on the chat interface’s memory features, the context will also include snippets of past conversations or recorded facts about you that are relevant to the conversation. The system prompt sits at the very beginning of the context, where models tend to pay the most attention. It contains the main instructions the model provider gives the model. You can read a bunch of these system prompts for different models and tools here.
The point is that AIs operate on what is basically a long text file – the context. It serves as a mix of instructions, history, information, and as a scratchpad or working memory.
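As a rough sketch of that idea, the Python below flattens a system prompt, some user instructions, a few memory snippets and the conversation into one block of text. The `<|im_start|>`/`<|im_end|>` markers follow the ChatML-style format used in the example later in this article. Where and how real chat apps insert custom instructions and memories is an assumption here, and each provider uses its own template.

```python
def render_context(messages: list) -> str:
    """Join (role, content) pairs into one ChatML-style text block."""
    return "\n".join(f"<|im_start|>{role}\n{content}\n<|im_end|>" for role, content in messages)

def build_context(system_prompt: str, user_instructions: str,
                  memories: list, conversation: list) -> str:
    # The provider's system prompt goes first, at the very top.
    messages = [("system", system_prompt)]
    # Hypothetical placement for custom instructions and memory snippets.
    if user_instructions:
        messages.append(("system", "User instructions: " + user_instructions))
    if memories:
        messages.append(("system", "Memory:\n" + "\n".join(f"- {m}" for m in memories)))
    # Then the conversation itself: questions and answers, in order.
    messages.extend(conversation)
    return render_context(messages)

context = build_context(
    "You are a helpful assistant.",
    "Answer in short paragraphs.",
    ["Lives in Sydney", "Prefers metric units"],
    [("user", "Write a haiku about geese."),
     ("assistant", "Grey wings stitch the sky...")],
)
print(context)
```

The result is one long string. On every turn the model reads this whole text and nothing else.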
The AIs have been trained to follow instructions in their context (eg “Write a haiku about geese” or “Reformat this text as short paragraphs”) but also to generate their own instructions and to then follow those instructions.
For a quick overview of how this works in practice, we’ll use the example context below. It shows the complete final context for a simple travel enquiry:
<|im_start|>system
You are a helpful assistant. You follow system messages above all else. You may use tools when needed, following their JSON schemas.
<|im_end|>
<|im_start|>tools
[
  {
    "name": "web.search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }
]
<|im_end|>
<|im_start|>developer
Style rules:
- Be concise but clear.
- Prefer bullet points for lists.
Tool rules:
- Call web.search for post-2023 facts.
<|im_end|>
<|im_start|>user
I’m planning a weekend trip to Kyoto. What are 3 must-see places, and what’s the best order to visit them in one day?
<|im_end|>
<|im_start|>assistant
<think>
Goal: suggest 3 must-see Kyoto spots + best one-day visiting order.
Need current opening hours / closure info and typical crowd patterns.
Use web.search to fetch up-to-date details.
After results, pick an order that minimizes backtracking and crowds.
</think>
<tool_call>
{"name":"web.search","arguments":{"query":"Kyoto must-see attractions opening hours 2025 itinerary one day best order"}}
</tool_call>
<|im_end|>
<|im_start|>tool_result
{
  "results": [
    { "title": "Fushimi Inari Taisha", "snippet": "Iconic torii gate hike; open 24 hours.", "url": "https://example.com/fushimi" },
    { "title": "Kiyomizu-dera Temple", "snippet": "Historic temple with panoramic city views; opens 6:00–18:00 daily.", "url": "https://example.com/kiyomizu" },
    { "title": "Arashiyama Bamboo Grove", "snippet": "Scenic bamboo paths; best early morning.", "url": "https://example.com/arashiyama" }
  ]
}
<|im_end|>
<|im_start|>assistant
Here’s a tight one-day order that minimizes transit time:
1) **Fushimi Inari Taisha (early morning)**
   - Beat crowds and heat; hike partway up for the best gates.
2) **Kiyomizu-dera + Higashiyama walk (midday)**
   - Short ride from Fushimi area.
   - Pair with nearby streets (Sannen-zaka / Ninen-zaka).
3) **Arashiyama Bamboo Grove (late afternoon)**
   - West Kyoto; finish with the river area.
   - Optional: Tenryu-ji or riverfront cafés.
If you want, tell me where you’re starting from (station/hotel) and I’ll re-optimize the route.
<|im_end|>
The context starts with the system prompt. In reality this would run to hundreds of lines covering a wide range of instructions:
<|im_start|>system
You are a helpful assistant. You follow system messages above all else. You may use tools when needed, following their JSON schemas.
<|im_end|>
Next come the tools available to the AI. Each tool is described in JSON, a format that both humans and machines can read, using a schema that specifies the tool’s inputs. Here you can see that the search tool takes a “query” parameter of type “string” (normal text):
<|im_start|>tools
[
  {
    "name": "web.search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }
]
<|im_end|>
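To show what the schema is for, here is a deliberately minimal Python check that the application running the AI might apply to the model’s requested arguments before calling the tool. The `check_arguments` helper is hypothetical and only understands the handful of JSON Schema keywords used in this example. A real harness would use a full validator (such as the `jsonschema` package) or rely on the provider’s API.

```python
import json

# The web.search tool definition from the example context, parsed as JSON.
TOOL_SPEC = json.loads("""
{
  "name": "web.search",
  "description": "Search the web for up-to-date information.",
  "parameters": {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"]
  }
}
""")

# Map JSON Schema type names to Python types (only the few needed here).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}

def check_arguments(spec: dict, args: dict) -> list:
    """Return a list of problems with `args`; an empty list means they look valid."""
    schema = spec["parameters"]
    problems = [f"missing required '{k}'" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            problems.append(f"unknown argument '{key}'")
        elif not isinstance(value, JSON_TYPES[prop["type"]]):
            problems.append(f"'{key}' should be of type {prop['type']}")
    return problems

print(check_arguments(TOOL_SPEC, {"query": "Kyoto must-see attractions"}))  # []
print(check_arguments(TOOL_SPEC, {"q": 42}))
```

Problems like these can be sent back to the model as text so it can correct its tool call on the next pass.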
After a developer message with some style and tool rules, the user’s question comes next:
<|im_start|>user
I’m planning a weekend trip to Kyoto. What are 3 must-see places, and what’s the best order to visit them in one day?
<|im_end|>
Everything up to this point (the system prompt, the tools, the developer rules and the user message) would be included in the first call to the AI.
The AI would then respond one token (roughly a word or part of a word) at a time until it reached the end of its output, marked by the end-of-message token “<|im_end|>”:
<|im_start|>assistant
<think>
Goal: suggest 3 must-see Kyoto spots + best one-day visiting order.
Need current opening hours / closure info and typical crowd patterns.
Use web.search to fetch up-to-date details.
After results, pick an order that minimizes backtracking and crowds.
</think>
<tool_call>
{"name":"web.search","arguments":{"query":"Kyoto must-see attractions opening hours 2025 itinerary one day best order"}}
</tool_call>
<|im_end|>
The UI (or whatever application is running the AI) would detect the “<tool_call>” in the response, extract the tool call arguments and perform the search. It would then append the results of the tool call to the end of the entire context and send it all back to the AI:
<|im_start|>tool_result
{
  "results": [
    { "title": "Fushimi Inari Taisha", "snippet": "Iconic torii gate hike; open 24 hours.", "url": "https://example.com/fushimi" },
    { "title": "Kiyomizu-dera Temple", "snippet": "Historic temple with panoramic city views; opens 6:00–18:00 daily.", "url": "https://example.com/kiyomizu" },
    { "title": "Arashiyama Bamboo Grove", "snippet": "Scenic bamboo paths; best early morning.", "url": "https://example.com/arashiyama" }
  ]
}
<|im_end|>
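The application’s side of this step can be sketched in a few lines of Python. The sketch assumes, as in the example context, that tool calls arrive as JSON wrapped in `<tool_call>` tags. Most provider APIs actually return tool calls as structured fields, so no text parsing is needed there. `fake_search` and `handle_turn` are hypothetical names used for illustration.

```python
import json
import re

# The assistant turn from the example, as the application would receive it.
assistant_text = """<|im_start|>assistant
<think>
Goal: suggest 3 must-see Kyoto spots + best one-day visiting order.
</think>
<tool_call>
{"name":"web.search","arguments":{"query":"Kyoto must-see attractions opening hours 2025 itinerary one day best order"}}
</tool_call>
<|im_end|>"""

# Capture the JSON between the tags (DOTALL lets '.' span newlines).
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def fake_search(query: str) -> dict:
    # Hypothetical stand-in for a real search API.
    return {"results": [{"title": "Fushimi Inari Taisha", "snippet": "open 24 hours."}]}

HANDLERS = {"web.search": fake_search}

def handle_turn(context: str, reply: str) -> tuple:
    """Append the model's reply; if it requested a tool, run it and append the result.
    Returns (new_context, needs_another_model_call)."""
    context += reply
    match = TOOL_CALL.search(reply)
    if match is None:
        return context, False   # plain answer: the turn is over
    call = json.loads(match.group(1))
    result = HANDLERS[call["name"]](**call["arguments"])
    context += f"\n<|im_start|>tool_result\n{json.dumps(result)}\n<|im_end|>"
    return context, True        # send the whole context back to the model

new_context, call_again = handle_turn("", assistant_text)
print(call_again)  # True
```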
The AI receives the context containing everything that has gone before: the system prompt, the user’s question, its “thinking” and tool request, and the result of the tool call.
The way to think about it is that the AI sees all the text for the very first time each time it generates a single token. It uses all that information to generate the most likely text to follow, based on its training.
In our example this results in the following final output:
<|im_start|>assistant
Here’s a tight one-day order that minimizes transit time:
1) **Fushimi Inari Taisha (early morning)**
   - Beat crowds and heat; hike partway up for the best gates.
2) **Kiyomizu-dera + Higashiyama walk (midday)**
   - Short ride from Fushimi area.
   - Pair with nearby streets (Sannen-zaka / Ninen-zaka).
3) **Arashiyama Bamboo Grove (late afternoon)**
   - West Kyoto; finish with the river area.
   - Optional: Tenryu-ji or riverfront cafés.
If you want, tell me where you’re starting from (station/hotel) and I’ll re-optimize the route.
<|im_end|>
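The token-by-token process described above can be illustrated with a toy Python loop. `next_token` is a hypothetical stand-in that replays a fixed script, whereas a real model scores every possible next token. What matters is the shape of the loop: the entire context goes in, one token comes out and is appended, and generation stops at the end-of-message token.

```python
# Scripted tokens for the toy model (roughly how a tokenizer might split text).
SCRIPT = ["Here", "'s", " a", " tight", " one", "-day", " order", ".", "<|im_end|>"]

def next_token(context: str) -> str:
    """Toy stand-in for the model: it is handed the ENTIRE context every time."""
    written = context.rsplit("<|im_start|>assistant\n", 1)[-1]
    emitted = ""
    for token in SCRIPT:
        if written == emitted:      # found our place in the script
            return token
        emitted += token
    return "<|im_end|>"

def generate(context: str, max_tokens: int = 100) -> str:
    context += "<|im_start|>assistant\n"
    for _ in range(max_tokens):
        token = next_token(context)  # full context in, one token out
        context += token             # the new token becomes part of the input
        if token == "<|im_end|>":    # end-of-message token: stop generating
            break
    return context

print(generate("<|im_start|>user\nPlan my day in Kyoto.\n<|im_end|>\n"))
```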
How this ties into Agents
The important parts of the context are the <think> tags and the tool call.
Model providers like OpenAI, Anthropic and Google (along with everyone else) are training their models to generate, use and follow their own instructions and tools in order to produce “agentic” behaviour.
Reasoning models have become the standard model “format” because they are more effective agents. As part of their response to a user question they generate their own instructions as they “think out loud” in text format, adding information to the context which feeds into their own input.
This is a native implementation of the “Chain of Thought” prompting strategy. Users found that simply prompting a model to “think out loud” before giving its final answer led to better results. The improvement came purely from the model autonomously adding more relevant information to the context. It wasn’t perfect, and model reasoning isn’t perfect either. If the model’s generated text goes down the wrong path, it can fill the context with incorrect or distracting information.
You can see in our example context that the model sets a goal and lists the steps it needs to perform.
Anthropic has trained their Claude models (Claude Opus, Sonnet and Haiku) to generate and use todo lists as part of their agentic training.
Ask Claude (particularly via the Claude Code CLI tool) to perform a complex task, and as part of its planning process it will output a todo list complete with checkboxes. A pair of tools (TodoRead and TodoWrite) built specifically for that purpose generates and manages the list. The model’s system prompt includes instructions to use those tools and to use todo lists to carry out plans.
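Below is a rough Python sketch of what such a pair of tools might look like. The class, field names and checkbox format are assumptions for illustration, not Anthropic’s actual tool schema. The key idea is that reading the list renders it as plain text, so it lands in the context where the model will keep seeing it.

```python
from dataclasses import dataclass, field

@dataclass
class TodoList:
    """Hypothetical todo tools loosely modelled on TodoWrite/TodoRead."""
    items: list = field(default_factory=list)

    def write(self, todos: list) -> str:
        """'TodoWrite': replace the whole list (the model re-sends it with updated statuses)."""
        self.items = todos
        return "Todos updated."

    def read(self) -> str:
        """'TodoRead': render the list as checkboxes so it lands in the context."""
        return "\n".join(
            f"[{'x' if t['status'] == 'completed' else ' '}] {t['content']}" for t in self.items
        )

    def remaining(self) -> list:
        """What is still left to do: the part that drives the agent onward."""
        return [t["content"] for t in self.items if t["status"] != "completed"]

todos = TodoList()
todos.write([
    {"content": "Search for Kyoto attractions", "status": "completed"},
    {"content": "Check opening hours", "status": "in_progress"},
    {"content": "Propose a visiting order", "status": "pending"},
])
print(todos.read())
```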
Once a todo list is created it is part of the context, and the model’s training, reinforced by the system prompt, results in behaviours that drive the completion of the todo lists.
And that is 90% of agentic behaviour – completing a list of tasks. The other 90% is generating the right set of tasks to complete and recovering when things go wrong.
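This has given us anecdotes of models running overnight and successfully completing coding tasks.
And it has also given us METR’s famous chart of AI “time horizons”. The chart plots the length of software engineering tasks (measured by how long they take skilled humans) that models can complete with 50% reliability.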
Note that this chart is for software engineering. Coding, with its instant feedback on errors and its vast amount of material available to use in training, is turning out to be one of the easiest tasks to teach AI to perform well.
Super-powered Workflows
The simple definition of an agent: “An agent is an AI running in a loop with access to tools it can use to achieve a goal” obscures the real power of agents.
That power is revealed in a slightly different definition:
“An agent is an AI running in a loop while strapped to a computer”
Give an agent a “Write File” tool and a “Run File” tool, and with a capable model like Claude Sonnet 4.5 behind it you have an agent that can be directed to do just about anything, from deep research to data analysis to building a game to debugging code to drawing pelicans on bicycles.
Those simple tools allow the agent to write and run any code it needs. And it knows how to do a lot of things using code.
(It’s not recommended to let agents run their own code outside of a sandbox isolated from the rest of your computer.)
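Here is a minimal Python sketch of hypothetical “Write File” and “Run File” tools. It confines paths to a temporary working folder, adds a timeout, and returns errors as text so the model can read them and try again. A temporary folder is not a sandbox, though. The code still runs with your permissions, so real isolation needs something like a container or virtual machine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway working folder for the agent. NOT a real sandbox: code run here
# still has your user's permissions, network access, etc.
WORKDIR = Path(tempfile.mkdtemp(prefix="agent-")).resolve()

def _resolve(path: str) -> Path:
    """Map a relative path into WORKDIR, refusing '../' style escapes."""
    target = (WORKDIR / path).resolve()
    if not target.is_relative_to(WORKDIR):  # Python 3.9+
        raise ValueError(f"path outside working directory: {path}")
    return target

def write_file(path: str, content: str) -> str:
    """Hypothetical 'Write File' tool."""
    try:
        target = _resolve(path)
    except ValueError as err:
        return f"Error: {err}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"Wrote {len(content)} characters to {path}"

def run_file(path: str, timeout: int = 30) -> str:
    """Hypothetical 'Run File' tool: runs a Python file and returns its output, errors included."""
    try:
        target = _resolve(path)
        proc = subprocess.run(
            [sys.executable, str(target)],
            cwd=WORKDIR, capture_output=True, text=True, timeout=timeout,
        )
    except ValueError as err:
        return f"Error: {err}"
    except subprocess.TimeoutExpired:
        return f"Error: timed out after {timeout}s"
    return f"exit code {proc.returncode}\n{proc.stdout}{proc.stderr}"

print(write_file("hello.py", "print(2 + 2)"))
print(run_file("hello.py"))
```

Returning stderr alongside stdout is the important design choice. It gives the model the same instant error feedback that makes coding such a good fit for agents.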
Even without a “Run File” tool, agents unlock a whole new level of workflow sophistication, and those workflows can run locally instead of being built in tools like Zapier. This is because agents can make judgement calls and decisions on the fly. With the right agent, like Anthropic’s Claude Code, you can build very sophisticated workflows.
You can tell the agent how to research, which tools it should use (here, the google_search command line tool) and where it needs to look for more instructions. You can ask it to filter information using vague descriptors (Judge if the topic is “evergreen”) and it will do a good enough job:
### Phase 1: Initial Research
1) Perform initial web searches about <TOPIC> using the google_search command line tool. Include information from <ABSTRACT_OR_DESCRIPTION> if available, and if individual article topics or titles are provided, use them to guide your creation of search queries.
2) Aim for approximately 5-10 diverse sources in the first pass
3) Follow the Content Retrieval Protocol (see `SOP/operators/protocols/content_retrieval_protocol.md`) for each valuable source found using the google_search command line tool.
3.5) Judge if the topic is "evergreen", ie information that is always relevant and useful, or "current", ie information that is new or timely at the time of writing, to decide if you need to include recency parameters in your search queries.
4) Retrieve content at all URLs in <TOPIC> and <ABSTRACT_OR_DESCRIPTION> by following the Content Retrieval Protocol
In a way, this use of agents is akin to programming at the level of standard operating procedures (SOPs).
The clever thing is that with the right tool (like Claude Code) you can use the agent that runs the workflow to create the workflow for you. The workflow is just a file with instructions in it.
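If you want to launch the same workflow file with different inputs, one hypothetical approach is to fill its `<TOPIC>`-style placeholders in a script before handing the text to the agent. The Python below is a sketch under that assumption, and the `fill_workflow` helper is invented for illustration. It also reports any placeholders left unfilled, which you can supply yourself or leave for the agent to notice.

```python
import re

# Matches placeholders like <TOPIC> or <ABSTRACT_OR_DESCRIPTION>.
PLACEHOLDER = re.compile(r"<([A-Z_]+)>")

def fill_workflow(template: str, values: dict) -> tuple:
    """Swap in known <PLACEHOLDER>s and report any that are still unfilled."""
    filled = PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
    missing = sorted(set(PLACEHOLDER.findall(filled)))
    return filled, missing

workflow = (
    "### Phase 1: Initial Research\n"
    "1) Perform initial web searches about <TOPIC> using the google_search command line tool. "
    "Include information from <ABSTRACT_OR_DESCRIPTION> if available.\n"
)
prompt, missing = fill_workflow(workflow, {"TOPIC": "AI agents in 2025"})
print(prompt)
print("Still to fill:", missing)
```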
If you need some special processing or functionality (like searching Google), you can ask the agent to build that tool for you and then update the workflow to incorporate the tool. When things go wrong you can ask the agent to diagnose the problem and fix it.
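The article doesn’t show how its google_search command line tool works, so the Python below is only a hypothetical version of the kind of tool an agent might build for you. It assumes Google’s Custom Search JSON API, with an API key and a Programmable Search Engine ID stored in the made-up `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` environment variables. It maps a `--recent` flag to the API’s `dateRestrict` parameter, which is one way to support the “recency parameters” in step 3.5 of the workflow above. Check Google’s current documentation, as access and pricing for this API may have changed.

```python
import argparse
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query: str, api_key: str, engine_id: str,
                     recent: Optional[str] = None, num: int = 10) -> str:
    """Build a Custom Search JSON API request URL."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    if recent:  # e.g. "w1" = past week, "m6" = past six months
        params["dateRestrict"] = recent
    return f"{API_URL}?{urllib.parse.urlencode(params)}"

def search(query: str, recent: Optional[str] = None) -> list:
    # GOOGLE_API_KEY / GOOGLE_CSE_ID are hypothetical environment variable names.
    url = build_search_url(query, os.environ["GOOGLE_API_KEY"],
                           os.environ["GOOGLE_CSE_ID"], recent)
    with urllib.request.urlopen(url, timeout=20) as resp:
        data = json.load(resp)
    # Keep just what the agent needs, as compact JSON it can read.
    return [{"title": item["title"], "url": item["link"], "snippet": item.get("snippet", "")}
            for item in data.get("items", [])]

def main(argv: Optional[list] = None) -> None:
    parser = argparse.ArgumentParser(prog="google_search",
                                     description="Web search for agents; prints JSON results.")
    parser.add_argument("query", nargs="?", help="what to search for")
    parser.add_argument("--recent", help='only recent results, e.g. "w1" or "m6"')
    args = parser.parse_args(argv)
    if not args.query:
        parser.print_help()
        return
    print(json.dumps(search(args.query, args.recent), indent=2))

if __name__ == "__main__":
    main()
```

Saved as `google_search.py`, it could be run as `python google_search.py "Kyoto temples" --recent m6`. It prints a JSON list of results that the agent can read directly.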
Before you get excited, these agents still work best in the simple, text-based environment preferred by developers. You won’t be getting them to edit your Google Sheets or Slides yet. However, there are tools (MCP servers, built on the Model Context Protocol, which is a whole other huge topic) that can give an agent access to just about any service or data.
Next steps for adopting Agents
You can already do quite a bit using services like ChatGPT, Gemini and Claude in your browser. But to really unlock the power of agents you need to run them locally and with (limited) access to your computer.
The tools are Claude Code, OpenAI Codex and Google’s Gemini CLI. Here’s an introduction to Claude Code, an introduction to Codex, and an introduction to Gemini CLI.
Those intros are mainly about coding, but stick with them to understand the basics of operating the interfaces. Then, instead of asking the agent to write some code for you, ask it to create a new file with all the steps in a workflow to… whatever you need done.
If you’re not sure how to do something, tell it to search. All these agents have web search built into them. Or ask it for suggestions on how to overcome the hurdles you come across.
Start simple. Grow from there.