Multi-turn agent loops consume tokens at a fundamentally different scale than single-shot prompts. Here’s how production teams are adapting.
- Control lever — What it limits — When to use it
- Token budget per run — Total spend per task — High-stakes or long-horizon tasks
- Context compression — Per-step input size — Non-critical intermediate results
- Step limit / early exit — Runaway loops — Ambiguous or open-ended tasks
- Prompt caching — Repeated static prefix cost — Stable system prompts shared across runs
- Tiered model routing — Frontier model overuse — Mixed-complexity step pipelines
The Problem With Agentic Loops
When teams first adopt agentic AI — systems where a model plans, acts, observes, and re-plans in a loop — the token cost surprise comes quickly. A user query that costs a few hundred tokens in a chat interface can balloon into tens of thousands of tokens when the agent makes multiple tool calls, reads intermediate results, and reasons across several turns.
This isn’t a quirk of a particular model or framework. It’s structural: every agent step re-sends accumulated context. The agent that “reads” five documents and reasons about them together isn’t reading them once — it’s re-reading them on every subsequent step.
Why Single-Shot Cost Intuitions Break Down
Engineers used to REST-style API pricing often anchor on “input tokens per request.” But in a multi-step agent, each step’s input includes the entire prior context. A task with ten steps and an average 4,000-token context per step costs 40,000 input tokens — not 4,000.
Add tool-call outputs (search results, code execution logs, API responses) that get appended verbatim to context, and costs compound further. Teams that didn’t budget for this have seen production bills that were difficult to justify to stakeholders.
Practical Cost Control Patterns
1. Token Budgets Per Agent Run
Set a hard ceiling on total tokens an agent can consume in a single task. When a run approaches the budget, the agent either wraps up or escalates to a human. This requires instrumenting the token count at the orchestration layer, but it’s the most reliable safeguard.
2. Context Compression Between Steps
Instead of appending raw tool outputs, summarize them before adding to context. A 3,000-token search result that gets compressed to 400 tokens of relevant findings reduces downstream step costs by roughly 65%. The tradeoff is latency and the risk of summarization errors, so this works best for non-critical intermediate observations.
3. Step Limits and Early Exit
Cap the number of steps an agent can take before it must produce a final answer. This prevents runaway loops caused by ambiguous tasks or tool failures that keep the agent circling without progress.
4. Prompt Caching for Static Context
Many agent runs share a large, stable system prompt — instructions, tool definitions, background knowledge. Models that support prompt caching (like Claude’s cache_control parameter) can reuse these tokens across calls, significantly reducing per-step cost when the static prefix dominates context size.
5. Tiered Model Routing
Not every step requires the most capable model. Routine tool-call parsing or classification steps can be routed to smaller, cheaper models, reserving frontier model capacity for the reasoning-heavy steps that actually benefit from it.
Organizational Implications
Cost-aware agentic architectures require treating token usage as a first-class operational concern — tracked, budgeted, and reviewed. Teams that treat it as a pure engineering detail often find the real constraints surfaced by finance or product leadership rather than engineering planning.
The teams managing this well tend to share one habit: they measure agent token consumption in production from day one, before costs become a problem they’re reacting to.
Sources
- https://platform.claude.com/docs/en/build-with-claude/prompt-caching — Anthropic — Prompt caching docs: reusing stable prefixes across agent turns.
- https://platform.claude.com/docs/en/about-claude/pricing — Anthropic — Per-token pricing reference for cost modeling.
- https://www.langchain.com/langgraph — LangGraph — agent framework documentation showing why per-step budgets matter.