Spend tracking fails when it starts at the invoice. The useful unit is the traced model call with enough metadata to explain who triggered it, why it ran, what it cost, and whether the result survived review.
Start with attribution
Every request should carry metadata that identifies the product surface, workflow, model, user or agent, prompt version, and environment. Without attribution, a cost dashboard can only report that money was spent somewhere.
- Minimum tags: workflow, owner, model, prompt version, environment.
- Useful extras: customer tier, feature flag, route, and task category.
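
One way to enforce the minimum tags is to check each request's metadata before the call is sent. The sketch below is illustrative only: the tag names mirror the list above, while `REQUIRED_TAGS`, `missing_tags`, and the dictionary shape are assumptions, not part of any particular tracing library.

```python
# Hypothetical tag names, taken from the "minimum tags" list above.
REQUIRED_TAGS = ("workflow", "owner", "model", "prompt_version", "environment")


def missing_tags(metadata: dict) -> list:
    """Return the required tag names that are absent or empty in metadata."""
    return [tag for tag in REQUIRED_TAGS if not metadata.get(tag)]


def is_attributable(metadata: dict) -> bool:
    """True when every minimum tag is present and non-empty."""
    return not missing_tags(metadata)
```

A caller could reject or log any request where `missing_tags` returns a non-empty list, so untagged spend is caught when it happens rather than at invoice time.
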
Record the cost inputs
Track input tokens, output tokens, cached tokens where available, retries, tool calls, latency, and model price at the time of the request. Preserve the pricing source or snapshot date so the calculation can be reproduced after prices change.
- Separate input and output tokens because pricing usually differs.
- Keep retry count and tool-call count visible.
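
As a rough sketch of the calculation, the cost of one call can be derived from its token counts and a stored price snapshot. Everything here is an assumption made for illustration: the `PriceSnapshot` fields, the per-million-token pricing unit, and the treatment of cached tokens as a separate count billed at its own rate. Real providers may count and bill cached tokens differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSnapshot:
    """Prices in force when the request ran (hypothetical structure)."""
    source: str             # where the prices came from, e.g. a pricing page or contract
    snapshot_date: str      # ISO date on which the prices were recorded
    input_per_mtok: float   # price per million input tokens
    output_per_mtok: float  # price per million output tokens
    cached_per_mtok: float  # price per million cached input tokens


def call_cost(input_tokens: int, output_tokens: int, cached_tokens: int,
              price: PriceSnapshot) -> float:
    """Cost of one model call; cached_tokens is assumed NOT to be included in input_tokens."""
    total = (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
        + cached_tokens * price.cached_per_mtok
    )
    return total / 1_000_000


def total_cost(attempts: list, price: PriceSnapshot) -> float:
    """Sum the cost of every attempt, so retries show up in the total."""
    return sum(call_cost(a["input"], a["output"], a["cached"], price) for a in attempts)
```

Storing the snapshot alongside the trace, rather than looking up the current price at report time, keeps old cost figures stable when prices change.
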
Attach outcome state
Token data becomes operational when paired with whether the output was accepted, edited, rejected, or escalated. That one field separates cost accounting from productivity theater.
- Counting accepted outputs makes cost per completed task measurable.
- Edited or rejected output exposes prompts and routes that need repair.
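
To make the pairing concrete, here is a minimal sketch of cost per accepted result. It assumes each traced call is a dictionary with a `cost` and an `outcome` field whose value is one of the four states named above; those field names are hypothetical.

```python
OUTCOMES = {"accepted", "edited", "rejected", "escalated"}


def cost_per_accepted(calls: list):
    """Total spend divided by the number of accepted outputs.

    Returns None when nothing was accepted, since the ratio is undefined.
    """
    for call in calls:
        if call["outcome"] not in OUTCOMES:
            raise ValueError(f"unknown outcome: {call['outcome']!r}")
    total_spend = sum(call["cost"] for call in calls)
    accepted = sum(1 for call in calls if call["outcome"] == "accepted")
    return total_spend / accepted if accepted else None
```

The numerator includes rejected and edited calls on purpose: spend on output that did not survive review is still part of what each accepted result cost.
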
Build outlier views
The first useful dashboards are not elaborate executive scoreboards. They are outlier views: highest-cost workflows, sudden jumps, high retry rates, expensive agents, and low-acceptance prompts.
- Sort by total spend and by cost per accepted result.
- Review the trace before changing the model or prompt.
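
An outlier view can start as a grouped summary sorted two ways. The sketch below assumes traced calls are dictionaries with `workflow`, `cost`, `outcome`, and an optional `retries` field; these names and the `workflow_summary` and `top_by` helpers are illustrative, not a prescribed schema.

```python
from collections import defaultdict


def workflow_summary(calls: list) -> list:
    """Aggregate spend, acceptance, and retries per workflow."""
    groups = defaultdict(lambda: {"spend": 0.0, "calls": 0, "accepted": 0, "retries": 0})
    for call in calls:
        group = groups[call["workflow"]]
        group["spend"] += call["cost"]
        group["calls"] += 1
        group["retries"] += call.get("retries", 0)
        if call["outcome"] == "accepted":
            group["accepted"] += 1

    rows = []
    for name, group in groups.items():
        rows.append({
            "workflow": name,
            "spend": group["spend"],
            "acceptance_rate": group["accepted"] / group["calls"],
            "retries_per_call": group["retries"] / group["calls"],
            # Workflows with no accepted output sort to the top as infinitely expensive.
            "cost_per_accepted": (
                group["spend"] / group["accepted"] if group["accepted"] else float("inf")
            ),
        })
    return rows


def top_by(rows: list, key: str, n: int = 5) -> list:
    """Return the n rows with the largest value for the given key."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]
```

Calling `top_by(rows, "spend")` and `top_by(rows, "cost_per_accepted")` on the same summary gives the two sorts suggested above; a workflow that appears near the top of both lists is a reasonable first trace to review.
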

