Topic

Model Routing

Model-router docs, pricing signals, gateway projects, and cost-aware routing approaches for choosing the right model per task.

7 source-linked items: Original annotations with outbound attribution
6 related projects: Open-source tools that match the topic
Topic brief

What this page is watching

Searchers want cheaper or smarter ways to route prompts across model providers without giving up too much quality.

The tokenmaxxing connection

Routing turns tokenmaxxing from a spending contest into an allocation problem: which model is good enough for this exact step?
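
The allocation framing can be made concrete with a small sketch: for each step, pick the cheapest model that clears a quality bar. Everything here is assumed for illustration, including the model labels, prices, and quality scores, and the hypothetical `pick_model` helper. It is not any particular router's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical model label
    usd_per_1k_tokens: float  # illustrative blended price, not a real quote
    quality: float            # illustrative eval score in [0, 1]


def pick_model(options, min_quality):
    """Return the cheapest option whose quality meets the bar for this step."""
    good_enough = [o for o in options if o.quality >= min_quality]
    if not good_enough:
        # Nothing clears the bar: fall back to the highest-quality option.
        return max(options, key=lambda o: o.quality)
    return min(good_enough, key=lambda o: o.usd_per_1k_tokens)


OPTIONS = [
    ModelOption("small", 0.2, 0.70),
    ModelOption("medium", 1.0, 0.85),
    ModelOption("large", 5.0, 0.95),
]

# Each step of a workflow declares how good the answer has to be (assumed values).
STEP_QUALITY_BAR = {"classify": 0.65, "summarize": 0.80, "plan": 0.90}

routes = {step: pick_model(OPTIONS, bar).name for step, bar in STEP_QUALITY_BAR.items()}
print(routes)  # {'classify': 'small', 'summarize': 'medium', 'plan': 'large'}
```

In practice the quality scores would come from evals on accepted outputs for each step, which is why routing and measurement tend to be discussed together.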

What belongs on this page

Pricing pages, context-window changes, gateway projects, public router usage, and practical notes on fallback and retry behavior.

Latest sources

Feed items for Model Routing

news

Hermes Agent leads OpenRouter as agent usage becomes a market signal – Startup Fortune

OpenRouter's public app/agent leaderboard briefly put Hermes Agent at #1, illustrating how token-based usage dashboards can steer attention in the agent boom.

tokenmaxxing, model-router, pricing
long-form

Tokenmaxxing as the new lines-of-code metric

Fresh AI infra angle on why token volume becomes dangerous when teams optimize for consumption instead of attributable outcomes.

cost-governance, model-routing, llm-infra
agent, medium review

Anthropic raises Claude Code limits with new compute

Anthropic ties higher Claude Code and API limits to new compute capacity, making capacity itself part of the agent-product story.

coding-agents, token-consumption, api
news

Introducing Augment Prism: model routing to reduce cost and maintain quality

Augment Code introduces Prism, a cache-aware model router for coding-agent sessions that chooses an underlying model per user turn to reduce token spend without materially degrading output quality (per Augment’s benchmarks).

tokenmaxxing, cost-governance, model-routing
agent, medium review

Augment Prism routes coding turns for cost and quality

Official Prism launch note on per-turn model routing for coding work, framed around cost control without forcing teams onto one model family.

model-routing, cost-governance, coding-agents
agent

Hugging Face Hub API for public model momentum

Public model metadata, download counts, likes, and tags can support an open-model momentum board.

open-models, downloads, api
agent

OpenRouter model catalog for pricing and context windows

The source behind the leaderboard: model IDs, pricing fields, context length, supported parameters, and update feeds.

model-router, pricing, api
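
As a rough illustration of how catalog fields like model IDs, pricing, and context length feed a routing decision, the sketch below estimates per-request cost and picks the cheapest model whose context window fits. The record shape (per-token USD prices as strings under `pricing`, plus `context_length`) is an assumption modeled on the fields named above, and the model IDs and prices are made up. Check the live catalog documentation before relying on any field name.

```python
from decimal import Decimal

# Hypothetical catalog entries; the field names and per-token USD string
# prices are an assumed shape, and the IDs and numbers are illustrative.
CATALOG = [
    {"id": "example/small", "context_length": 32_000,
     "pricing": {"prompt": "0.0000002", "completion": "0.0000006"}},
    {"id": "example/large", "context_length": 200_000,
     "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
]


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimated USD cost of one request, using exact decimal arithmetic."""
    pricing = model["pricing"]
    return (Decimal(pricing["prompt"]) * prompt_tokens
            + Decimal(pricing["completion"]) * completion_tokens)


def cheapest_that_fits(catalog, prompt_tokens, completion_tokens):
    """Cheapest model whose context window holds prompt plus completion."""
    needed = prompt_tokens + completion_tokens
    fits = [m for m in catalog if m["context_length"] >= needed]
    if not fits:
        return None  # caller must truncate, chunk, or reject the request
    return min(fits, key=lambda m: estimate_cost(m, prompt_tokens, completion_tokens))


choice = cheapest_that_fits(CATALOG, prompt_tokens=10_000, completion_tokens=1_000)
print(choice["id"], estimate_cost(choice, 10_000, 1_000))
```

Using `Decimal` avoids float rounding when per-token prices are tiny. The context check runs before the price comparison so that a cheap model with too small a window is never selected.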
Open source

Projects related to Model Routing

#1 Direct
Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

47.8K · 8.2K · Source-available
gateway, cost-tracking, routing
#10 Direct
Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

11.8K · 1.1K · MIT
gateway, guardrails, routing
#2 Direct
Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

27.6K · 2.8K · Source-available
traces, evals, costs
#11 Direct
Observability

Helicone

Helicone/helicone

Open-source LLM observability for monitoring, evaluation, experimentation, latency, requests, and usage behavior.

5.7K · 584 · Apache-2.0
observability, experiments, usage
#14 Direct
Observability

OpenLLMetry

traceloop/openllmetry

Open-source observability for LLM and GenAI applications, built on OpenTelemetry conventions.

7.1K · 968 · Apache-2.0
opentelemetry, tracing, llmops
#5 Direct
Evaluation

promptfoo

promptfoo/promptfoo

A CLI and CI workflow for testing prompts, agents, and RAG systems across models, with evals and red-team style checks.

21.5K · 1.9K · MIT
prompt-evals, ci, rag
Guides

Evergreen pages to read next

Searchers want a concrete model-routing approach for LLM cost control, not just a list of tools.

Model Routing LLM Cost Playbook

A practical playbook for routing prompts across models to control cost and latency while keeping accepted output quality stable.

Searchers want OpenRouter token rankings, model costs, context windows, and caveats explained clearly.

OpenRouter Token Usage Rankings Explained

How to read OpenRouter public model rankings and pricing data without confusing router volume for global model usage.

Searchers want a concrete measurement plan for AI token spend, not just a definition of tokenmaxxing.

How to Track AI Token Spend

A practical measurement plan for LLM token usage by model, workflow, user, agent, cost, and accepted output.

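
A measurement plan along the dimensions named above (model, workflow, cost, accepted output) can start from a flat usage log. The sketch below is a minimal assumption-laden example: the event fields, the sample values, and the hypothetical `summarize` helper are all invented for illustration and are not taken from the guide.

```python
from collections import defaultdict

# Hypothetical usage log; field names and values are illustrative only.
EVENTS = [
    {"workflow": "code-review", "model": "small", "tokens": 1000, "cost_usd": 0.01, "accepted": True},
    {"workflow": "code-review", "model": "large", "tokens": 4000, "cost_usd": 0.20, "accepted": False},
    {"workflow": "code-review", "model": "large", "tokens": 3000, "cost_usd": 0.15, "accepted": True},
    {"workflow": "triage", "model": "small", "tokens": 500, "cost_usd": 0.005, "accepted": False},
]


def summarize(events, key):
    """Total tokens, cost, and accepted outputs per value of `key`."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0, "accepted": 0})
    for event in events:
        row = totals[event[key]]
        row["tokens"] += event["tokens"]
        row["cost_usd"] += event["cost_usd"]
        row["accepted"] += int(event["accepted"])
    for row in totals.values():
        # Cost per accepted output is undefined when nothing was accepted.
        row["cost_per_accepted"] = (
            row["cost_usd"] / row["accepted"] if row["accepted"] else None
        )
    return dict(totals)


by_workflow = summarize(EVENTS, "workflow")
by_model = summarize(EVENTS, "model")
for name, row in by_workflow.items():
    print(name, row)
```

Dividing by accepted outputs rather than by requests is the design choice that separates attributable outcomes from raw token volume: a workflow with high spend and zero accepted outputs shows up as `None` instead of looking cheap.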