Foundations · Updated June 25, 2026
An AI agent works by running a loop: it takes a goal, gathers context, uses a large language model to reason about the next step, calls a tool to act on the world, observes the result, then repeats — planning, acting, and self-correcting across many steps until the goal is met or a limit stops it.
| Step | What the agent does | Example: fixing a failing test |
|---|---|---|
| 1. Goal intake | Receives an objective in plain language and, if needed, breaks it into sub-goals. | "Make the checkout test pass." |
| 2. Perceive / gather context | Pulls in the information it needs — files, search results, API responses, prior memory. | Reads the test file and the error output from the last run. |
| 3. Reason / plan | The LLM core thinks through the situation and decides the next action. | Concludes the bug is a null check in the price formatter. |
| 4. Act (tool call) | Calls a tool — edit a file, run code, browse, send a request — to change the world. | Edits the formatter and runs the test suite. |
| 5. Observe | Reads the result of the action and feeds it back into the loop. | Sees the test still fails on a second assertion. |
| 6. Iterate or finish | Repeats steps 3–5 with the new information, or stops when the goal is met or a limit is hit. | Fixes the second case, re-runs, sees green, reports done. |
The single most important idea in how AI agents work is the loop. A plain language model is a one-shot system: you send a prompt, it returns text, and it stops. An agent wraps that model in a control loop that runs many times for a single task. Each pass through the loop does four things — gather context, reason about what to do next, take one concrete action through a tool, then read the result — and the result of one pass becomes the input to the next. The loop keeps turning until the goal is reached, the agent decides it cannot proceed, or a guardrail (a step limit, a budget, or a human checkpoint) stops it.
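The control loop described above can be sketched in a few lines. This is a minimal illustration, not any product's implementation: `run_agent`, `make_demo`, and the toy `edit_file` and `run_tests` tools are hypothetical names, and a scripted `decide` function stands in for the language model.

```python
def run_agent(goal, decide, tools, max_steps=10):
    """Run a reason-act-observe loop until the goal is met or the step limit hits."""
    transcript = [("goal", goal)]            # context replayed on every pass
    for step in range(1, max_steps + 1):
        action = decide(transcript)          # reason: pick the next action
        if action["type"] == "finish":
            return {"status": "done", "steps": step, "answer": action["answer"]}
        result = tools[action["tool"]](**action["args"])   # act through a tool
        transcript.append((action["tool"], result))        # observe the result
    return {"status": "step_limit", "steps": max_steps, "answer": None}


def make_demo():
    """Build a scripted stand-in for the model plus two toy tools (assumptions)."""
    state = {"fixes": 0}

    def edit_file():
        state["fixes"] += 1
        return "edited"

    def run_tests():
        # The toy suite only passes once two separate fixes have been applied.
        return "pass" if state["fixes"] >= 2 else "fail"

    def decide(transcript):
        _, last_result = transcript[-1]
        if last_result == "pass":
            return {"type": "finish", "answer": "tests are green"}
        if last_result == "fail":
            return {"type": "call", "tool": "edit_file", "args": {}}
        return {"type": "call", "tool": "run_tests", "args": {}}

    return decide, {"edit_file": edit_file, "run_tests": run_tests}


decide, tools = make_demo()
print(run_agent("make the checkout test pass", decide, tools))
```

The step limit is the guardrail: calling `run_agent` with `max_steps=3` on a fresh demo returns a `step_limit` status instead of running to completion.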
This looping is exactly why an agent can do things a chatbot cannot. Because it observes the outcome of each action before choosing the next one, it can recover from mistakes: run a command, see it error, and try a different approach without being told. The table above traces a single real pattern — a coding agent fixing a failing test — but the same six steps describe a research agent gathering and synthesising sources, a support agent looking up an order and issuing a refund, or a browser agent filling a multi-page form. The domain changes; the loop does not.
Across the 38 verified agents in our index, this loop is the common machinery underneath very different products. What varies is how many times the loop runs (a simple lookup might finish in one or two passes; a complex build can take dozens), how much autonomy the agent has before it must check in, and which tools it is allowed to call. Understanding the loop is the fastest way to predict what any agent can and cannot do.
At the centre of nearly every modern agent is a large language model — Claude, GPT, Gemini, or an open model — acting as the reasoning core. The model is not the agent; it is the part that decides. On each pass of the loop, the agent hands the model the goal, the relevant context, and a list of the tools it is allowed to use. The model responds either with a final answer or with a structured request to call a specific tool with specific arguments. That mechanism — the model emitting a machine-readable action rather than just prose — is called function calling or tool use, and it is what lets a text model reach out and change the world.
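To make the tool-use mechanism concrete, here is a small sketch of the surrounding program's side of it. The JSON shape (`tool` and `arguments` keys), the `TOOLS` registry, and `handle_model_output` are illustrative assumptions and do not reproduce any vendor's actual API format.

```python
import json

# Hypothetical tool registry: names mapped to plain Python callables.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "add": lambda a, b: a + b,
}


def handle_model_output(raw):
    """Return ('final', text) for prose or ('tool_result', value) for a tool request."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return ("final", raw)            # not JSON: treat it as the final answer
    if not isinstance(msg, dict) or "tool" not in msg:
        return ("final", raw)            # JSON, but not a tool request
    func = TOOLS[msg["tool"]]            # raises KeyError for an unknown tool
    return ("tool_result", func(**msg.get("arguments", {})))


print(handle_model_output('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
print(handle_model_output("All done."))
```

The model only ever produces text; the surrounding program is what turns a structured request into an executed action and passes the result back.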
The decision-making itself usually follows a reason-then-act pattern often called ReAct (short for Reasoning + Acting, from a widely cited 2022 research paper). Before each tool call, the model produces a short chain of reasoning — "the error points to the formatter, so I should open that file" — and only then chooses the action. Interleaving private reasoning with public actions measurably improves reliability, because the model commits to a rationale before acting and can notice when an observation contradicts its plan. More capable agents add an explicit planning stage up front, drafting a multi-step plan and then executing it step by step, revising the plan when reality diverges from it.
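One way to picture the reason-then-act pattern is as a text format in which the model writes a thought followed by an action, and the harness parses out the action. The `Thought:` / `Action: tool[argument]` layout below is an assumed format in the spirit of ReAct-style prompts, and `parse_react_step` is a hypothetical helper.

```python
import re

# Assumed layout: "Thought: <free text>" followed by "Action: tool_name[argument]".
STEP = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<tool>\w+)\[(?P<arg>.*?)\]",
    re.S,
)


def parse_react_step(text):
    """Split one model reply into (thought, tool name, argument)."""
    match = STEP.search(text)
    if match is None:
        raise ValueError("no Thought/Action pair found")
    return match.group("thought"), match.group("tool"), match.group("arg")


reply = (
    "Thought: the error points to the formatter, so I should open that file\n"
    "Action: open_file[src/formatter.py]"
)
print(parse_react_step(reply))
```

Because the rationale is written before the action, the harness or a reviewer can log it and compare it with the observation that follows.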
Crucially, the agent does not 'think independently' in any human sense. Every decision is the model predicting the most useful next action given its training and the context in front of it. It has no goals of its own beyond the objective you gave it, and its judgement is only as good as the information it gathered and the tools it was given. This is why context quality and tool design often matter more than raw model intelligence in determining whether an agent succeeds.
A concrete example makes the loop tangible. Give a coding agent like Claude Code or Devin the goal "the checkout test is failing; make it pass." Pass one: it reads the failing test and the error trace (perceive), reasons that the price formatter is returning null on an empty cart (reason), edits the formatter (act), and runs the full suite (act). Pass two: the suite still fails on a different assertion (observe), so it reasons about the second case, patches it, and re-runs. Pass three: green. It reports what it changed and stops. No step was scripted in advance; each was chosen from the result of the last.
A research agent such as GPT Researcher runs the same loop with different tools. Goal: "summarise the competitive landscape for X." It plans a set of sub-questions, then loops: search the web (act), read the top results (perceive), decide whether it has enough to answer each sub-question (reason), and search again where it is thin. Once coverage is sufficient, it synthesises a cited report. A browser agent like Browser Use loops over clicks and form fields; an autonomous generalist like Manus loops over a mix of code, browsing, and file tools. Same engine, different instruments.
The pattern also explains the visible 'working…' behaviour you see in agentic products: the pauses are the loop running tool calls and waiting for results — a test executing, a page loading, an API responding — not the model 'thinking' for minutes. Each of those round trips is one turn of the cycle in the table above.
A language model is stateless: on its own it remembers nothing between calls. For an agent to make progress across many loop passes, it needs memory. Short-term, that memory is the context window — the running transcript of the goal, every action taken, and every result observed, all fed back into the model on each pass so it knows what it has already tried. Longer-running agents add external memory: a scratchpad, a vector store of past findings, or a database, so they can carry knowledge across sessions without stuffing everything into one prompt.
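The two kinds of memory can be sketched as one small class. This is a simplified illustration under stated assumptions: `AgentMemory` is a hypothetical name, a plain dictionary with keyword matching stands in for a vector store or database, and the short-term window is counted in turns rather than tokens.

```python
class AgentMemory:
    """Toy memory: a recent-turn transcript plus a keyword-matched note store."""

    def __init__(self, max_turns=4):
        self.turns = []            # short-term: transcript replayed each pass
        self.notes = {}            # long-term: stand-in for a scratchpad or database
        self.max_turns = max_turns

    def record(self, action, result):
        self.turns.append(f"{action} -> {result}")

    def remember(self, key, fact):
        self.notes[key] = fact

    def build_context(self, goal, query_words=()):
        """Assemble what the model would see on the next pass."""
        recent = self.turns[-self.max_turns:]       # keep only the newest turns
        recalled = [fact for key, fact in self.notes.items()
                    if any(word in key for word in query_words)]
        return "\n".join([f"Goal: {goal}", *recalled, *recent])


memory = AgentMemory(max_turns=2)
memory.record("read test file", "assertion on empty cart")
memory.record("edit formatter", "saved")
memory.record("run tests", "1 failure")
memory.remember("checkout currency", "Prices are stored in cents")
print(memory.build_context("make the checkout test pass", ["checkout"]))
```

Only the two most recent turns survive the window here, while the saved note is recalled because its key matches the query word.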
This is also the honest answer to 'can agents learn from experience?' Within a single task, yes — the agent adapts continuously, because each observation updates the context it reasons over. Across tasks, most production agents do not retrain themselves; the underlying model's weights are fixed, and any lasting improvement comes from saved memory, better instructions, or a future model update — not from the agent rewriting itself on the fly.
Taking many small steps instead of answering in one shot is a feature, not inefficiency. Decomposing a goal, acting, and checking the result is what lets an agent handle problems too large or too uncertain to solve blind. The trade-off is cost: every loop pass is at least one model call, and a long task can mean dozens of calls plus the growing context replayed each time — which is why complex agent runs consume far more tokens than a single chatbot reply, and why teams cap steps and prune context to control spend.
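A rough calculation shows why replaying a growing context is expensive. The sketch assumes a fixed base prompt and a fixed number of tokens added per pass, both invented figures for illustration; real runs vary widely and output tokens are not counted here.

```python
def total_input_tokens(base, per_step, steps):
    """Sum input tokens when the whole transcript is replayed on every pass."""
    # Pass i (counting from 0) sends the base prompt plus i earlier steps.
    return sum(base + i * per_step for i in range(steps))


def closed_form(base, per_step, steps):
    """Same total as an arithmetic series: n*base + per_step*n*(n-1)/2."""
    return steps * base + per_step * steps * (steps - 1) // 2


# Illustrative numbers only: 1,000-token base prompt, 500 tokens added per pass.
for n in (1, 5, 20):
    print(n, total_input_tokens(1000, 500, n), closed_form(1000, 500, n))
```

Under these assumptions the total grows roughly with the square of the step count, which is the arithmetic behind capping steps and pruning context.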
Knowing the mechanism also tells you how agents fail. The most common failure is looping without progress: the agent repeats a doomed action, or oscillates between two fixes, never converging. A second is the hallucinated or malformed tool call — the model invents an argument or calls a tool that does not exist, and the action does nothing useful. A third is context overflow: on a long task the transcript grows until the most relevant detail is buried or pushed out of the window, and the agent 'forgets' a constraint it was told early on. A fourth is acting on bad perception — drawing a confident conclusion from a misread file or an out-of-date search result.
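The first failure mode, looping without progress, can be caught with a simple heuristic: flag the run when the same action produces the same observation more than once in a recent window. `is_stuck`, its window size, and its threshold are hypothetical choices for illustration, not a standard algorithm.

```python
from collections import Counter


def is_stuck(history, window=6, max_repeats=2):
    """Flag a run when one (action, observation) pair recurs in the recent window."""
    recent = history[-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


healthy = [
    ("edit formatter", "saved"), ("run tests", "fails: empty cart"),
    ("edit totals", "saved"), ("run tests", "passes"),
]
oscillating = [
    ("apply fix A", "saved"), ("run tests", "fails: case 1"),
    ("apply fix B", "saved"), ("run tests", "fails: case 2"),
    ("apply fix A", "saved"), ("run tests", "fails: case 1"),
]
print(is_stuck(healthy), is_stuck(oscillating))
```

Pairing each action with its observation matters: re-running the tests is normal, but re-running them and seeing the identical failure again is a sign the agent is not converging.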
Well-built agents add guardrails around the loop to contain these failures. Step and time limits stop runaway loops. Schema validation rejects malformed tool calls before they execute. Context management (summarising old turns, retrieving only what is relevant) keeps the working memory focused. Human-in-the-loop checkpoints pause the agent before high-stakes, hard-to-reverse actions such as spending money, deleting data, or sending external messages, as Robinhood's agentic trading kill switch and Claude Code's permission prompts do.
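Two of these guardrails, schema validation and a human checkpoint, can be sketched as a single pre-execution check. The tool names, the `TOOL_SCHEMAS` table, and `check_call` are hypothetical; a production system would more likely use a full schema language than bare Python types.

```python
# Hypothetical schemas: each tool maps argument names to expected Python types.
TOOL_SCHEMAS = {
    "read_file": {"path": str},
    "send_refund": {"order_id": str, "amount_cents": int},
}
# Tools that are hard to reverse and therefore gated on a human decision.
NEEDS_APPROVAL = {"send_refund"}


def check_call(tool, args, approved=False):
    """Return 'ok' or a rejection reason before anything executes."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return f"unknown tool: {tool}"               # hallucinated tool name
    if set(args) != set(schema):
        return f"expected arguments {sorted(schema)}, got {sorted(args)}"
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            return f"argument {name!r} must be {expected.__name__}"
    if tool in NEEDS_APPROVAL and not approved:
        return "needs human approval"                # pause for a checkpoint
    return "ok"


print(check_call("read_file", {"path": "cart.py"}))
print(check_call("delete_repo", {}))
print(check_call("send_refund", {"order_id": "A1", "amount_cents": 500}))
```

Returning the rejection reason as text is a deliberate choice in this sketch: it can be fed back to the model as an observation so the next pass can correct the call.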
The takeaway: an AI agent is not magic; it is a disciplined reason-act-observe loop wrapped around a language model, with tools for hands and memory for state. Its power comes from iterating toward a goal; its risks come from the same loop running unchecked. Designing the tools, the context, and the guardrails well is most of what separates an agent that reliably finishes the job from one that spins.
Real, verified agents from our index that illustrate the concept above.
Claude Code: Terminal-native autonomous coding agent from Anthropic
GPT Researcher: Autonomous open-source agent producing cited research reports
Manus: General AI agent that plans and executes whole tasks in the cloud
Browser Use: Open-source framework that lets any LLM operate a browser
An AI agent's large language model core decides each step. On every loop pass it receives the goal, current context, and available tools, then reasons briefly and outputs the next action — usually a tool call. It is predicting the most useful next move from context, not thinking independently.
The agent loop is the repeating cycle that drives every AI agent: gather context, reason about the next step, act through a tool, observe the result, then repeat. One pass feeds the next, so the agent self-corrects across many steps until the goal is met or a limit stops it.
Agents act through tool use, also called function calling. The language model outputs a structured request (which tool to call and with what arguments) instead of plain text. A surrounding program executes that call (editing a file, running code, browsing, hitting an API) and feeds the result back into the loop.
Almost all modern AI agents use an LLM such as Claude, GPT, or Gemini as their reasoning core, because language models are flexible enough to plan and choose tools. The model alone is not the agent, though — the tools, memory, and control loop wrapped around it are what create agency.
On each loop pass the model judges whether the goal is met based on what it has observed — tests passing, all sub-questions answered, the form submitted. When it concludes the objective is satisfied it returns a final result instead of another tool call. Step or budget limits can also stop it early.
Within a single task, agents do learn from experience: each observation updates the context the agent reasons over, so it adapts as it goes. Across tasks, most agents do not retrain themselves; the model's weights stay fixed, and lasting improvement comes from saved memory, better instructions, or a new model version.
Agents take many small steps because acting and then checking the result lets them handle problems too large or uncertain to solve blind. Decomposing a goal into small reason-act-observe steps is what enables recovery from errors. The cost is more model calls and tokens, which is why long agent runs are far more expensive than one chatbot reply.
Common failures are looping without progress, hallucinated or malformed tool calls, context overflow where it forgets an early constraint, and acting on misread information. Good agents add guardrails — step limits, schema validation, context pruning, and human checkpoints before risky actions — to contain these.