agent

Paper: AI agents can spend unpredictably on coding tasks

A research item on why token usage in coding agents varies dramatically and does not reliably track accuracy.

Published 2026-04-28 | Source: arXiv

Why it matters

This is one of the most concrete sources because it moves the discussion from vibes to measured agent behavior on coding work.

Tokenmaxxing read

The paper supports a core site thesis: agent token burn can vary sharply, and more tokens do not automatically mean better results.

Source takeaway

A primary source for agent-burn pages, especially when explaining why budgets, traces, and evals matter for long-running coding agents.
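To make the budget-and-trace point concrete, here is a minimal sketch, not taken from the paper or from any agent framework. `TokenBudget`, `BudgetExceeded`, and `tokens_per_success` are hypothetical names. The sketch assumes each agent step reports its own token count and each run reports pass or fail.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run spends more tokens than its budget allows."""


@dataclass
class TokenBudget:
    # Hypothetical helper for illustration only.
    limit: int                                  # max total tokens for one run
    trace: list = field(default_factory=list)   # (step_name, tokens) pairs

    @property
    def spent(self) -> int:
        # Total tokens recorded so far across all steps.
        return sum(tokens for _, tokens in self.trace)

    def record(self, step: str, tokens: int) -> None:
        """Log one step's usage, then stop the run if the budget is exceeded."""
        self.trace.append((step, tokens))
        if self.spent > self.limit:
            raise BudgetExceeded(
                f"{self.spent} tokens spent, limit is {self.limit}"
            )


def tokens_per_success(runs):
    """Total tokens across all runs divided by the number of passing runs.

    `runs` is an iterable of (tokens, passed) pairs.
    """
    runs = list(runs)
    passed = sum(1 for _, ok in runs if ok)
    total = sum(tokens for tokens, _ in runs)
    return total / passed if passed else float("inf")
```

The trace shows where the tokens went when a run is cut off. The second helper scores a batch of runs by cost per passing result rather than by raw token count, which fits the claim that more tokens do not automatically mean better results.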

Topic links

research, coding-agents, token-consumption

Related projects

Tools that match this angle

#4 In spirit

Agents

LangGraph

langchain-ai/langgraph

A framework for building resilient stateful agents with explicit graphs, persistence, human-in-the-loop flows, and controllable execution.

37K | 6.2K | MIT

agents, state, workflows

#5 Direct

Evaluation

promptfoo

promptfoo/promptfoo

A CLI and CI workflow for testing prompts, agents, and RAG systems across models, with evals and red-team style checks.

23.1K | 2.1K | MIT

prompt-evals, ci, rag

#6 In spirit

Evaluation

DSPy

stanfordnlp/dspy

A framework for programming and optimizing language-model pipelines rather than hand-tuning one prompt at a time.

36K | 3.1K | MIT

optimization, programming, evals

Related feed

More source-linked context

news | 2026-06-16

Anthropic "pauses" token-based billing for its Claude Agent SDK

Anthropic paused its plan to move Claude Agent SDK power users onto metered API pricing, updating its billing page to put the rollout on hold while it reworks how heavy agent usage is charged on subscription plans.

tokenmaxxing, coding-agents, agents

news | 2026-06-09

Claude Fable 5 and Claude Mythos 5 - Anthropic

Anthropic shipped Claude Fable 5 (GA, with classifier safeguards) and Claude Mythos 5 (safeguards lifted, vetted partners only) on June 9 — $10 per million input tokens, $50 per million output, under half the Mythos Preview price.

agents, coding-agents, pricing

news | 2026-05-24

AI Cost Crisis Emerges as Claude Usage and Agentic Coding Bills Spiral

Yahoo Finance flags an emerging AI cost crunch: agentic coding and heavy Claude usage can spike bills fast, forcing leaders to rethink budgeting and ROI.

tokenmaxxing, coding-agents, agents
