Topic

Model Routing

Model-router docs, pricing signals, gateway projects, and cost-aware routing approaches for choosing the right model per task.

18 source-linked items: Original annotations with outbound attribution

6 related projects: Open-source tools that match the topic

Search intent: Searchers want cheaper or smarter ways to route prompts across model providers without giving up too much quality.

Topic brief

What this page is watching

Searchers want cheaper or smarter ways to route prompts across model providers without giving up too much quality.

The tokenmaxxing connection

Routing turns tokenmaxxing from a spending contest into an allocation problem: which model is good enough for this exact step?
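One way to picture that allocation question is a rule that picks the cheapest model clearing a quality bar for the current step. The sketch below is illustrative only: the model labels, prices, and quality scores are made-up placeholders, and the quality number is assumed to come from your own evals rather than any public benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical label, not a real model ID
    usd_per_million_tokens: float  # illustrative blended price
    quality: float                 # illustrative score in [0, 1] from your own evals


def pick_model(options: list[ModelOption], required_quality: float) -> ModelOption:
    """Return the cheapest option whose quality meets the bar for this step."""
    good_enough = [m for m in options if m.quality >= required_quality]
    if not good_enough:
        # Nothing clears the bar, so fall back to the strongest model available.
        return max(options, key=lambda m: m.quality)
    return min(good_enough, key=lambda m: m.usd_per_million_tokens)


# Placeholder catalog; every value here is an assumption for the example.
OPTIONS = [
    ModelOption("small", 0.5, 0.60),
    ModelOption("medium", 3.0, 0.80),
    ModelOption("large", 15.0, 0.95),
]

print(pick_model(OPTIONS, 0.55).name)  # easy step
print(pick_model(OPTIONS, 0.90).name)  # hard step
```

The design choice worth noting is that the threshold is set per step, not per application, so a single workflow can mix cheap and expensive calls.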

What belongs on this page

Pricing pages, context-window changes, gateway projects, public router usage, and practical notes on fallback and retry behavior.
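For readers new to fallback and retry behavior, here is a minimal sketch of the pattern: try providers in priority order, retry each a few times with exponential backoff, then move to the next. `ProviderError`, the provider names, and the stub functions are hypothetical stand-ins, not the API of any gateway listed on this page.

```python
import time
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a timeout or 5xx response."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 2,
    base_delay_s: float = 0.0,  # set a positive value in real use
) -> tuple[str, str]:
    """Try providers in order; retry each with backoff before moving on."""
    last_error = None
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as err:
                last_error = err
                # Exponential backoff: delay doubles on each failed attempt.
                time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


def flaky_primary(prompt: str) -> str:
    # Stub that simulates an outage on the preferred provider.
    raise ProviderError("simulated outage")


def steady_backup(prompt: str) -> str:
    # Stub that simulates a healthy secondary provider.
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hi", [("primary", flaky_primary), ("backup", steady_backup)]
)
print(used, reply)
```

In practice the retry count and delay interact with cost: every retry on an expensive model is billed, so some teams retry once and then fall back to a cheaper option.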

Latest sources

Feed items for Model Routing

news · 2026-06-30

Meituan open-sources LongCat-2.0 — the 1.6T model that topped OpenRouter as Owl Alpha

WinBuzzer: Meituan opened LongCat-2.0, a 1.6-trillion-parameter MoE coding model (~48B active per token, 1M-token context) that surfaced atop OpenRouter as the unbranded alias Owl Alpha — MIT-licensed, with weights not yet posted.

tokenmaxxing, model-router, model-routing

news · 2026-06-29 · medium review

Why Token Optimization Is a Gift to the Hyperscalers

UncoverAlpha's Rihard Jarc argues the pivot from tokenmaxxing to token optimization — routing cheap work to cheaper models — won't shrink AI bills. It multiplies token volume, and the hyperscalers renting the compute collect either way.

tokenmaxxing, model-router, ai-spend

news · 2026-06-29

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Coinbase CEO Brian Armstrong says five levers — cheaper model defaults (GLM 5.2, Kimi 2.7), task routing, caching, lean context, and spend visibility — cut the company’s AI bill roughly in half despite rising token volume.

tokenmaxxing, cost-governance, model-routing

news · 2026-06-09

Claude Fable 5 and Claude Mythos 5 - Anthropic

Anthropic shipped Claude Fable 5 (GA, with classifier safeguards) and Claude Mythos 5 (safeguards lifted, vetted partners only) on June 9 — $10 per million input tokens, $50 per million output, under half the Mythos Preview price.

agents, coding-agents, pricing

news · 2026-06-01 · medium review

Silicon Valley's AI token craze is facing a reality check

Business Insider says the gamified token-leaderboard era is yielding to efficiency-maxxing: Amazon told staff not to use AI for its own sake, Copilot moved to usage-based billing, and labs now compete on intelligence per dollar.

cost-governance, explainer, metrics

news · 2026-05-27 · medium review

“Tokenmaxxing is real, expensive & it’s spreading”: AI budgets are exploding - The New Stack

AI accountability startup Lanai debuted Token Tuner, a beta that scores each employee's efficiency by matching token usage and model choice to task complexity — peers burned 10x the tokens for half the efficiency in one beta.

ai-spend, cost-governance, explainer

news · 2026-05-27

OpenRouter scale becomes a business-model story

36Kr's OpenRouter coverage is valuable as a business-model read on how high-volume model routing can turn token flow into platform leverage.

tokenmaxxing, model-router, pricing

agent · 2026-05-26

OpenRouter funding puts router volume in the spotlight

The OpenRouter funding item is a clean router-market signal because it ties capital raised to reported weekly token volume and model access demand.

tokenmaxxing, model-router, pricing

news · 2026-05-26

OpenRouter Now Processes More Than a Quadrillion Tokens a Year | Menlo Ventures

Menlo Ventures argues OpenRouter is becoming a core multi-model routing layer, and highlights how routing, caching, and policy controls matter as token volumes surge.

tokenmaxxing, model-router, pricing

news · 2026-05-10

Hermes Agent leads OpenRouter as agent usage becomes a market signal – Startup Fortune

OpenRouter's public app/agent leaderboard briefly put Hermes Agent at #1, illustrating how token-based usage dashboards can steer attention in the agent boom.

tokenmaxxing, model-router, pricing

long-form · 2026-05-07

Tokenmaxxing as the new lines-of-code metric

Fresh AI infra angle on why token volume becomes dangerous when teams optimize for consumption instead of attributable outcomes.

cost-governance, model-routing, llm-infra

agent · 2026-05-06 · medium review

Anthropic raises Claude Code limits with new compute

Anthropic ties higher Claude Code and API limits to new compute capacity, making capacity itself part of the agent-product story.

coding-agents, token-consumption, api

news · 2026-05-02

Introducing Augment Prism: model routing to reduce cost and maintain quality

Augment Code introduces Prism, a cache-aware model router for coding-agent sessions that chooses an underlying model per user turn to reduce token spend without materially degrading output quality (per Augment’s benchmarks).

tokenmaxxing, cost-governance, model-routing

agent · 2026-05-02 · medium review

Augment Prism routes coding turns for cost and quality

Official Prism launch note on per-turn model routing for coding work, framed around cost control without forcing teams onto one model family.

model-routing, cost-governance, coding-agents

agent · 2026-05-01

Hugging Face Hub API for public model momentum

Public model metadata, download counts, likes, and tags can support an open-model momentum board.

open-models, downloads, api

agent · 2026-04-15

OpenRouter model catalog for pricing and context windows

The source behind the leaderboard: model IDs, pricing fields, context length, supported parameters, and update feeds.

model-router, pricing, api

news · 2026-02-25

China’s MiniMax, Moonshot top AI token use ranking, ending year of US dominance

SCMP reports that OpenRouter's token-usage rankings show a surge in demand for Chinese open-source models, with MiniMax (M2.5) and Moonshot (Kimi K2.5) leading by token usage after a wave of recent releases.

tokenmaxxing, model-router, pricing

news · 2026-02-19

Bunq adopts Orq.ai router amid Europe AI sovereignty push - IT Brief UK

IT Brief UK reports bunq replaced in-house LLM routing with Orq.ai’s router, citing rising maintenance costs and gaps in observability, governance, and performance.

tokenmaxxing, cost-governance, ai-spend

Open source

Projects related to Model Routing

#1 · Direct

Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

53.2K · 9.6K · Source-available

gateway, cost-tracking, routing

#10 · Direct

Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

12.4K · 1.2K · MIT

gateway, guardrails, routing

#2 · Direct

Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

30.9K · 3.2K · Source-available

traces, evals, costs

#11 · Direct

Observability

Helicone

Helicone/helicone

Open-source LLM observability for monitoring, evaluation, experimentation, latency, requests, and usage behavior.

5.9K · 625 · Apache-2.0

observability, experiments, usage

#14 · Direct

Observability

OpenLLMetry

traceloop/openllmetry

Open-source observability for LLM and GenAI applications, built on OpenTelemetry conventions.

7.3K · 1K · Apache-2.0

opentelemetry, tracing, llmops

#5 · Direct

Evaluation

promptfoo

promptfoo/promptfoo

A CLI and CI workflow for testing prompts, agents, and RAG systems across models, with evals and red-team style checks.

23.1K · 2.1K · MIT

prompt-evals, ci, rag

Guides

Evergreen pages to read next

Searchers want a concrete model-routing approach for LLM cost control, not just a list of tools.

Model Routing LLM Cost Playbook

A practical playbook for routing prompts across models to control cost and latency while keeping accepted output quality stable.

Searchers want OpenRouter token rankings, model costs, context windows, and caveats explained clearly.

OpenRouter Token Usage Rankings Explained

How to read OpenRouter public model rankings and pricing data without confusing router volume for global model usage.

Searchers want a concrete measurement plan for AI token spend, not just a definition of tokenmaxxing.

How to Track AI Token Spend

A practical measurement plan for LLM token usage by model, workflow, user, agent, cost, and accepted output.
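As a rough illustration of what such a measurement plan can compute, the sketch below rolls usage records up by model and derives cost per accepted output. The `UsageRecord` fields, the sample values, and the idea of a boolean `accepted` flag are assumptions made for this example, not the guide's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    model: str          # hypothetical model label
    workflow: str       # hypothetical workflow label
    input_tokens: int
    output_tokens: int
    cost_usd: float
    accepted: bool      # assumed signal: a reviewer or check accepted the output


def spend_by_model(records):
    """Aggregate tokens, cost, and accepted outputs per model."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0, "accepted": 0})
    for r in records:
        row = totals[r.model]
        row["tokens"] += r.input_tokens + r.output_tokens
        row["cost_usd"] += r.cost_usd
        row["accepted"] += int(r.accepted)
    report = {}
    for model, row in totals.items():
        # Cost per accepted output is undefined when nothing was accepted.
        per_accepted = row["cost_usd"] / row["accepted"] if row["accepted"] else None
        report[model] = {**row, "cost_per_accepted_usd": per_accepted}
    return report


# Sample records with made-up numbers.
RECORDS = [
    UsageRecord("small", "triage", 100, 50, 0.25, True),
    UsageRecord("small", "triage", 200, 50, 0.25, False),
    UsageRecord("large", "codegen", 1000, 500, 1.5, True),
]

report = spend_by_model(RECORDS)
print(report["small"]["cost_per_accepted_usd"])
print(report["large"]["cost_per_accepted_usd"])
```

The same roll-up can be keyed by workflow, user, or agent by swapping the grouping field; cost per accepted output tends to be more informative than raw token counts because it charges rejected attempts to the model that produced them.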
