Open source

Top tokenmaxxing tools

Open-source tools that help people tokenmax in spirit: route cheaper models, trace usage, retrieve tighter context, evaluate prompts, cache repeated calls, and make agents less chaotic.

13 tools: ranked by tokenmaxxing fit and public open-source signal

351.6K stars tracked: GitHub metadata last checked May 11, 2026

7 direct fits: routing, observability, token counting, caching, and evals

Shortlist

The top three are practical, not ceremonial.

#1 · Direct

Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

The most direct tokenmaxxing fit: route calls, track spend, enforce budgets, and stop pretending every prompt deserves the priciest model.

53.2K stars · 9.6K forks · Source-available

Tags: gateway, cost-tracking, routing
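
To make "route calls, track spend, enforce budgets" concrete, here is a minimal sketch in plain Python. It is not LiteLLM's API: the model names, prices, the 2,000-token threshold, and the `Budget`, `pick_model`, and `route` helpers are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical model tiers and per-1K-token prices, for illustration only.
PRICES_PER_1K = {"small-model": 0.0005, "large-model": 0.01}


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the call instead of silently overspending.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise RuntimeError("budget exceeded")
        self.spent_usd += cost_usd


def pick_model(prompt_tokens: int, needs_reasoning: bool) -> str:
    # Only long or reasoning-heavy prompts go to the expensive tier.
    if needs_reasoning or prompt_tokens > 2000:
        return "large-model"
    return "small-model"


def route(prompt_tokens: int, needs_reasoning: bool, budget: Budget) -> str:
    # Pick a tier, price the prompt, and charge the budget before "calling".
    model = pick_model(prompt_tokens, needs_reasoning)
    cost = prompt_tokens / 1000 * PRICES_PER_1K[model]
    budget.charge(cost)
    return model
```

A real gateway would also price output tokens and handle retries; the point here is only that the routing decision and the budget check happen before the expensive call, not after the invoice.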

#2 · Direct

Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

Turns token burn into something you can inspect: traces, costs, regressions, and evals instead of vibes and surprise invoices.

30.9K stars · 3.2K forks · Source-available

Tags: traces, evals, costs
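
As a rough illustration of what "inspectable token burn" means, the sketch below rolls per-call usage up into cost per trace. It does not use Langfuse; the record fields, trace names, and prices are hypothetical, and it assumes one blended price for input and output tokens.

```python
from collections import defaultdict

# Hypothetical call records; a tracing tool would collect these for you.
calls = [
    {"trace": "checkout-bot", "model": "small-model", "input_tokens": 1200, "output_tokens": 300},
    {"trace": "checkout-bot", "model": "large-model", "input_tokens": 2000, "output_tokens": 1000},
    {"trace": "search", "model": "small-model", "input_tokens": 500, "output_tokens": 500},
]

# Hypothetical blended prices per 1K tokens.
PRICE_PER_1K = {"small-model": 0.001, "large-model": 0.01}


def cost_by_trace(records, prices):
    """Sum the estimated cost of every call, grouped by trace name."""
    totals = defaultdict(float)
    for r in records:
        tokens = r["input_tokens"] + r["output_tokens"]
        totals[r["trace"]] += tokens / 1000 * prices[r["model"]]
    return dict(totals)


totals = cost_by_trace(calls, PRICE_PER_1K)
top_spender = max(totals, key=totals.get)
```

Once spend is grouped like this, the expensive trace is a lookup rather than a guess.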

#3 · In spirit

Retrieval

LlamaIndex

run-llama/llama_index

A data and document-agent framework for connecting LLM apps to files, structured data, retrieval systems, and agent workflows.

Good retrieval is tokenmaxxing in disguise: send the model the useful context, not a suitcase full of maybe-relevant text.

50.8K stars · 7.7K forks · MIT

Tags: rag, agents, context
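
The "useful context, not a suitcase" idea can be sketched without any framework. The example below is not LlamaIndex code: it uses a crude keyword-overlap score and a word count as a stand-in for tokens, and the `score` and `select_context` helpers are hypothetical.

```python
def score(query: str, chunk: str) -> int:
    # Count distinct query words that also appear in the chunk.
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def select_context(query: str, chunks: list, max_words: int) -> list:
    """Keep the most relevant chunks that fit inside a word budget."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        size = len(chunk.split())
        # Skip irrelevant chunks and anything that would blow the budget.
        if score(query, chunk) == 0 or used + size > max_words:
            continue
        picked.append(chunk)
        used += size
    return picked
```

A production retriever would use embeddings and a real tokenizer, but the budget logic stays the same: rank first, then stop adding context when the window is full.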

Full board

Open-source ways to get more useful work per token

#4 · In spirit

Agents

LangGraph

langchain-ai/langgraph

A framework for building resilient stateful agents with explicit graphs, persistence, human-in-the-loop flows, and controllable execution.

Stateful graphs help keep agents from wandering through expensive loops. Fewer accidental tool calls, more deliberate context.

37K stars · 6.2K forks · MIT

Tags: agents, state, workflows
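
A toy version of the "explicit graph with a leash" idea, in plain Python. This is not LangGraph's API: the `run_graph` helper, the node names, the `"END"` marker, and the step cap are hypothetical, and real agent nodes would call models or tools instead of bumping a counter.

```python
from typing import Callable, Dict

State = Dict[str, int]
Node = Callable[[State], str]  # a node updates state and names the next node


def run_graph(nodes: Dict[str, Node], start: str, state: State, max_steps: int = 10) -> State:
    """Walk the graph until a node returns "END" or the step cap is hit."""
    current, steps = start, 0
    while current != "END":
        if steps >= max_steps:
            raise RuntimeError(f"stopped after {max_steps} steps")
        current = nodes[current](state)
        steps += 1
    state["steps"] = steps
    return state


def plan(state: State) -> str:
    state["attempts"] = 0
    return "act"


def act(state: State) -> str:
    state["attempts"] += 1  # stands in for one (expensive) tool or model call
    return "check"


def check(state: State) -> str:
    # Loop back to "act" until two attempts have been made.
    return "END" if state["attempts"] >= 2 else "act"


NODES = {"plan": plan, "act": act, "check": check}
```

Because every transition is explicit, the loop between `act` and `check` is visible in the code, and the cap turns a runaway agent into an error instead of a bill.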

#5 · Direct

Evaluation

promptfoo

promptfoo/promptfoo

A CLI and CI workflow for testing prompts, agents, and RAG systems across models, with evals and red-team style checks.

A bad prompt can spend tokens forever and still be wrong. Evals let you find the cheap-enough prompt before production does.

23.1K stars · 2.1K forks · MIT

Tags: prompt-evals, ci, rag
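
Finding the "cheap-enough prompt" is mostly a filter and a minimum once eval results exist. The sketch below is not promptfoo's config or output format; the result records, prompt names, and the 0.9 pass-rate threshold are made up for illustration.

```python
# Hypothetical eval results for three prompt variants on the same test set.
results = [
    {"prompt": "verbose", "avg_tokens": 900, "passed": 10, "total": 10},
    {"prompt": "concise", "avg_tokens": 300, "passed": 10, "total": 10},
    {"prompt": "terse", "avg_tokens": 120, "passed": 7, "total": 10},
]


def cheapest_passing(results, min_pass_rate=0.9):
    """Return the lowest-token prompt that still clears the quality bar."""
    good = [r for r in results if r["passed"] / r["total"] >= min_pass_rate]
    if not good:
        return None  # nothing is good enough; do not ship the cheapest failure
    return min(good, key=lambda r: r["avg_tokens"])["prompt"]
```

The ordering matters: quality is a gate, cost is the tiebreaker. Picking by cost first is how the terse prompt that fails three cases in ten ends up in production.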

#6 · In spirit

Evaluation

DSPy

stanfordnlp/dspy

A framework for programming and optimizing language-model pipelines rather than hand-tuning one prompt at a time.

Optimization beats prompt superstition: measure the task, tune the pipeline, and spend tokens where they actually move quality.

36K stars · 3.1K forks · MIT

Tags: optimization, programming, evals

#7 · Direct

Tokenization

tiktoken

openai/tiktoken

A fast BPE tokenizer for OpenAI models, useful for counting and estimating token usage before requests go out.

You cannot manage what you do not count. Token counting is the basic meter that makes practical spend estimates possible.

18.7K stars · 1.5K forks · MIT

Tags: token-counting, budgeting, openai
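
A spend estimate is just a token count times a price. The sketch below keeps the counter pluggable so it runs without dependencies: `rough_count` is a crude stand-in that assumes about four characters per token for English text, and the prices passed in are hypothetical.

```python
from typing import Callable


def rough_count(text: str) -> int:
    # Crude stand-in for a tokenizer: assume ~4 characters per token.
    return max(1, len(text) // 4)


def estimate_cost(
    text: str,
    price_per_1k: float,
    count_tokens: Callable[[str], int] = rough_count,
) -> float:
    """Estimate the input cost of sending `text`, before the request goes out."""
    return count_tokens(text) / 1000 * price_per_1k
```

With tiktoken installed, passing `lambda t: len(tiktoken.get_encoding("cl100k_base").encode(t))` as `count_tokens` would swap the heuristic for a real BPE count; which encoding matches a given model is something to check against the library's documentation.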

#8 · In spirit

Retrieval

Qdrant

qdrant/qdrant

A vector database and vector search engine for AI search, semantic retrieval, filtering, and hybrid-search applications.

Retrieval infrastructure helps swap bloated prompts for targeted context windows by sending the most relevant chunks first.

33.1K stars · 2.5K forks · Apache-2.0

Tags: vector-db, search, rag
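
What a vector search engine does at its core can be shown with a brute-force sketch: filter by metadata, rank by cosine similarity, keep the top few. This is not Qdrant's client API; the `search` helper, the point layout, and the `must` filter argument are hypothetical, and a real engine would use an approximate index rather than scanning every point.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, points, top_k=2, must=None):
    """Return ids of the top_k most similar points that match the filter."""
    must = must or {}
    hits = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must.items())
    ]
    hits.sort(key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["id"] for p in hits[:top_k]]
```

Only the returned chunks go into the prompt, which is where the token savings come from.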

#9 · In spirit

Retrieval

Chroma

chroma-core/chroma

Search infrastructure for AI applications, commonly used as a retrieval layer for agents, RAG apps, and local prototypes.

A practical way to keep context nearby and queryable instead of force-feeding the model everything every turn.

28.8K stars · 2.4K forks · Apache-2.0

Tags: retrieval, agents, search

#10 · Direct

Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

Model routing plus guardrails is the grown-up version of tokenmaxxing: pick the right route, then keep the call inside policy.

12.4K stars · 1.2K forks · MIT

Tags: gateway, guardrails, routing
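
"Pick the right route, then keep the call inside policy" might look something like the sketch below. It is not Portkey's API: `call_with_fallback`, the provider callables, and the guardrail function are hypothetical stand-ins for real provider clients and policy checks.

```python
def call_with_fallback(prompt, providers, guardrail):
    """Try providers in order; return the first reply that passes the guardrail.

    `providers` is a list of (name, callable) pairs and `guardrail` is a
    function that returns True when a reply is acceptable.
    """
    errors = []
    for name, call in providers:
        try:
            reply = call(prompt)
        except Exception as exc:  # provider outage, timeout, rate limit, ...
            errors.append(f"{name}: {exc}")
            continue
        if guardrail(reply):
            return name, reply
        errors.append(f"{name}: blocked by guardrail")
    raise RuntimeError("; ".join(errors))
```

Listing the cheapest acceptable provider first means the pricier ones are only paid for when something actually goes wrong.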

#11 · Direct

Observability

Helicone

Helicone/helicone

Open-source LLM observability for monitoring, evaluation, experimentation, latency, requests, and usage behavior.

A clean feedback loop for where tokens are going, which calls are slow, and which experiments are worth keeping.

5.9K stars · 625 forks · Apache-2.0

Tags: observability, experiments, usage

#13 · In spirit

Structured output

Outlines

dottxt-ai/outlines

A structured-output toolkit for constraining generation with formats like JSON, regex, and grammars.

Structured outputs reduce repair prompts and retry loops. Fewer malformed responses means fewer wasted follow-up calls.

14.4K stars · 765 forks · Apache-2.0

Tags: json, constrained-generation, retries
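
Constrained generation works by refusing to pick tokens the format does not allow. The toy below shows the mechanism with one regex per output position. It is a heavy simplification and not Outlines' API: real libraries compile a whole regex or grammar into a state machine over the model's vocabulary, and the scores here are invented.

```python
import re


def constrained_decode(step_scores, pattern_by_step):
    """At each step, pick the best-scoring token that the format allows."""
    out = []
    for scores, pattern in zip(step_scores, pattern_by_step):
        # Mask out every candidate token that would break the format.
        allowed = {tok: s for tok, s in scores.items() if re.fullmatch(pattern, tok)}
        if not allowed:
            raise ValueError("no token satisfies the format at this step")
        out.append(max(allowed, key=allowed.get))
    return "".join(out)


# Hypothetical per-step token scores from a model asked for a latency value.
step_scores = [
    {"about": 0.6, "4": 0.3, "x": 0.1},
    {" ": 0.5, "2": 0.4, "7": 0.1},
    {"milliseconds": 0.7, "ms": 0.2, "s": 0.1},
]
# Target format: two digits followed by a unit.
pattern_by_step = [r"\d", r"\d", r"(ms|s)"]

value = constrained_decode(step_scores, pattern_by_step)
```

Unconstrained greedy decoding would have started with "about", which a downstream parser rejects and a repair prompt then has to fix. Masking at generation time skips that round trip.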

#14 · Direct

Observability

OpenLLMetry

traceloop/openllmetry

Open-source observability for LLM and GenAI applications, built on OpenTelemetry conventions.

Useful for teams that already live in telemetry and want token behavior next to the rest of production reality.

7.3K stars · 1K forks · Apache-2.0

Tags: opentelemetry, tracing, llmops

How to read this

Not every project is literally about token counts.

The board favors projects that reduce waste, make spend visible, or improve context quality. That means routers and observability tools sit next to retrieval, memory, eval, caching, and structured-output tools.

Ranking rule

Editorial rank comes from tokenmaxxing relevance first, then GitHub activity and adoption. Stars are useful signal, but the page is not a raw popularity contest.
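
Read as a sort, the ranking rule is a two-part key: relevance first, stars as the tiebreaker. The sketch below uses star counts from the board, but the numeric relevance scores are hypothetical; the page does not publish how relevance is scored.

```python
# (name, relevance, stars in thousands). Relevance values are hypothetical.
tools = [
    ("Chroma", 2, 28.8),
    ("tiktoken", 3, 18.7),
    ("Qdrant", 2, 33.1),
]


def rank(tools):
    """Sort by relevance first, then by stars, both descending."""
    return sorted(tools, key=lambda t: (-t[1], -t[2]))


ranked_names = [name for name, _, _ in rank(tools)]
```

Under these assumed scores tiktoken outranks both vector databases despite having fewer stars, and stars only decide the order between Qdrant and Chroma.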