Open source

Top tokenmaxxing projects

Projects that help people tokenmax in spirit: route requests to cheaper models, trace usage, retrieve tighter context, evaluate prompts, cache repeated calls, and make agents less chaotic.

15 projects: ranked by tokenmaxxing fit and public open-source signal
342.5K stars tracked: GitHub metadata last checked May 11, 2026
8 direct fits: routing, observability, token counting, caching, and evals
Shortlist

The top three are practical, not ceremonial.

#1 · Direct
Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

The most direct tokenmaxxing fit: route calls, track spend, enforce budgets, and stop pretending every prompt deserves the priciest model.
47.8K stars · 8.2K forks · Source-available
Tags: gateway, cost-tracking, routing
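The routing-plus-budget decision a gateway makes can be sketched without any gateway at all. This is not LiteLLM's API: the `Router` class, the model names, the capability `tier` scheme, and the prices are all hypothetical. It only shows the idea of sending each request to the cheapest model that meets its required tier and refusing calls that would push spend past a budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int                 # hypothetical capability tier: higher means more capable
    usd_per_1k_tokens: float  # hypothetical blended price per 1K tokens


# Hypothetical catalog; real model names and prices come from your providers.
CATALOG = [
    Model("large-model", tier=3, usd_per_1k_tokens=0.03),
    Model("small-model", tier=1, usd_per_1k_tokens=0.0005),
    Model("mid-model", tier=2, usd_per_1k_tokens=0.003),
]


class BudgetExceeded(Exception):
    pass


class Router:
    def __init__(self, catalog, budget_usd):
        self.catalog = sorted(catalog, key=lambda m: m.usd_per_1k_tokens)  # cheapest first
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def route(self, min_tier, est_tokens):
        """Return (model name, estimated cost) for the cheapest model with tier >= min_tier."""
        for model in self.catalog:
            if model.tier < min_tier:
                continue
            cost = est_tokens / 1000 * model.usd_per_1k_tokens
            # Models are sorted by price, so if this one busts the budget, every later eligible one would too.
            if self.spent_usd + cost > self.budget_usd:
                raise BudgetExceeded(f"{model.name} would exceed the ${self.budget_usd} budget")
            self.spent_usd += cost
            return model.name, cost
        raise ValueError(f"no model with tier >= {min_tier}")


router = Router(CATALOG, budget_usd=0.10)
print(router.route(min_tier=1, est_tokens=2000))  # easy job goes to the cheap model
print(router.route(min_tier=3, est_tokens=2000))  # hard job goes to the capable model
```

A third tier-3 call of the same size would be refused here, because it would take total spend past the 0.10 USD budget.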
#2 · Direct
Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

Turns token burn into something you can inspect: traces, costs, regressions, and evals instead of vibes and surprise invoices.
27.6K stars · 2.8K forks · Source-available
Tags: traces, evals, costs
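What a tracing platform makes visible can be approximated in a few lines: wrap each model call in a span that records model, token counts, latency, and cost, then aggregate per trace. The `Tracer` class and the prices are hypothetical and this is not Langfuse's SDK; the sketch only shows the kind of data an observability tool collects so that "where did the tokens go?" has an answer.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical prices in USD per 1K tokens: (input, output).
PRICES = {"small-model": (0.0002, 0.0008), "large-model": (0.005, 0.015)}


class Tracer:
    """Collects one record per model call, grouped by trace id."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, trace_id, name, model):
        record = {"trace_id": trace_id, "name": name, "model": model,
                  "input_tokens": 0, "output_tokens": 0}
        start = time.perf_counter()
        try:
            yield record  # the caller fills in token counts from the provider's usage data
        finally:
            record["latency_s"] = time.perf_counter() - start
            price_in, price_out = PRICES[model]
            record["cost_usd"] = (record["input_tokens"] * price_in
                                  + record["output_tokens"] * price_out) / 1000
            self.spans.append(record)

    def cost_by_trace(self):
        totals = defaultdict(float)
        for s in self.spans:
            totals[s["trace_id"]] += s["cost_usd"]
        return dict(totals)

    def most_expensive(self, n=1):
        return sorted(self.spans, key=lambda s: s["cost_usd"], reverse=True)[:n]


tracer = Tracer()
with tracer.span("req-1", "classify", "small-model") as s:
    s["input_tokens"], s["output_tokens"] = 1000, 10
with tracer.span("req-1", "answer", "large-model") as s:
    s["input_tokens"], s["output_tokens"] = 2000, 500
print(tracer.cost_by_trace())
print(tracer.most_expensive()[0]["name"])
```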
#3 · In spirit
Retrieval

LlamaIndex

run-llama/llama_index

A data and document-agent framework for connecting LLM apps to files, structured data, retrieval systems, and agent workflows.

Good retrieval is tokenmaxxing in disguise: send the model the useful context, not a suitcase full of maybe-relevant text.
49.6K stars · 7.4K forks · MIT
Tags: rag, agents, context
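"Useful context, not a suitcase" comes down to two steps: rank candidate chunks by relevance to the query, then pack the best ones until a token budget is spent. The sketch below is not LlamaIndex code; it uses shared-word count as a crude relevance score and whitespace-separated words as a stand-in for tokens, both simplifying assumptions, and the helper names are hypothetical.

```python
import re


def terms(text):
    """Lowercased word set; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pack_context(query, chunks, token_budget):
    """Pick the most query-relevant chunks that fit within token_budget.

    Relevance is shared-word count and "tokens" are whitespace-separated words:
    both are simplifying assumptions for illustration.
    """
    q = terms(query)
    ranked = sorted(chunks, key=lambda c: len(q & terms(c)), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        if not q & terms(chunk):
            break  # everything from here on shares no words with the query
        cost = len(chunk.split())
        if used + cost <= token_budget:  # skip chunks that would overflow the budget
            picked.append(chunk)
            used += cost
    return picked


CHUNKS = [
    "Our office dog is named Biscuit.",
    "Refunds for digital goods are not available.",
    "Refunds are issued within 14 days of a return.",
]
print(pack_context("How many days until refunds are issued?", CHUNKS, token_budget=12))
```

With a 12-word budget only the most relevant chunk fits; the irrelevant chunk about the office dog is never sent regardless of budget.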
Full board

Open-source ways to get more useful work per token

#1 LiteLLM, #2 Langfuse, and #3 LlamaIndex are listed in the Shortlist above.
#4 · In spirit
Agents

LangGraph

langchain-ai/langgraph

A framework for building resilient stateful agents with explicit graphs, persistence, human-in-the-loop flows, and controllable execution.

Stateful graphs help keep agents from wandering through expensive loops. Fewer accidental tool calls, more deliberate context.
32.6K stars · 5.5K forks · MIT
Tags: agents, state, workflows
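An explicit graph turns the loop bound into something you write down rather than hope for. The toy runner below is not LangGraph's API: `run_graph`, the node names, the state dict, and `max_steps` are hypothetical. It shows nodes as functions over shared state, edges chosen by routing functions, and a hard step cap that stops a runaway agent before it makes more paid calls.

```python
END = "__end__"


def run_graph(nodes, edges, start, state, max_steps=10):
    """Run a tiny state graph.

    nodes: name -> fn(state) -> new state
    edges: name -> fn(state) -> next node name, or END
    max_steps is a hard cap on node executions, so a looping agent fails loudly
    instead of spending tokens indefinitely.
    """
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)
        current = edges[current](state)
        if current == END:
            return state
    raise RuntimeError(f"no END after {max_steps} steps (next node was {current})")


# Hypothetical agent: "search" until two results exist, then "answer" once.
def search(state):
    doc = f"doc{len(state['results'])}"  # stands in for a real tool call
    return {**state, "results": state["results"] + [doc], "calls": state["calls"] + 1}


def answer(state):
    return {**state, "answer": ", ".join(state["results"]), "calls": state["calls"] + 1}


NODES = {"search": search, "answer": answer}
EDGES = {
    "search": lambda s: "answer" if len(s["results"]) >= 2 else "search",
    "answer": lambda s: END,
}

final = run_graph(NODES, EDGES, "search", {"results": [], "calls": 0})
print(final["answer"], final["calls"])
```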
#5 · Direct
Evaluation

promptfoo

promptfoo/promptfoo

A CLI and CI workflow for testing prompts, agents, and RAG systems across models, with evals and red-team-style checks.

A bad prompt can spend tokens forever and still be wrong. Evals let you find the cheap-enough prompt before production does.
21.5K stars · 1.9K forks · MIT
Tags: prompt-evals, ci, rag
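The core loop of a prompt eval is simple: run every prompt variant against the same test cases, score outputs with assertions, and compare pass rate against cost. This sketch fakes the model with a deterministic function and uses hypothetical variant names and per-call costs; promptfoo drives the same idea from a config file against real providers, which is not reproduced here.

```python
import re


def evaluate(variants, cases, call_model):
    """Run every prompt variant on every case; report pass rate and cost per call.

    variants: name -> (prompt template with an {input} slot, estimated USD per call)
    cases: list of (input text, check function returning True on a good output)
    """
    report = {}
    for name, (template, cost) in variants.items():
        passed = sum(bool(check(call_model(template.format(input=text)))) for text, check in cases)
        report[name] = {"pass_rate": passed / len(cases), "cost_per_call": cost}
    return report


def cheapest_passing(report, min_pass_rate):
    """Name of the cheapest variant meeting the bar, or None if none does."""
    ok = [(r["cost_per_call"], name) for name, r in report.items() if r["pass_rate"] >= min_pass_rate]
    return min(ok)[1] if ok else None


# Hypothetical stand-in for a model: summarizes when asked to, otherwise returns the first number.
def fake_model(prompt):
    if prompt.startswith("Summarize"):
        return "A note about a payment."
    numbers = re.findall(r"\d+", prompt)
    return numbers[0] if numbers else ""


VARIANTS = {
    "verbose": ("You are a meticulous assistant. Reply with only the number in: {input}", 0.004),
    "terse": ("Number in: {input}", 0.001),
    "off-task": ("Summarize: {input}", 0.0005),
}
CASES = [
    ("Total due: 42 dollars", lambda out: out.strip() == "42"),
    ("Pay 7 by Friday", lambda out: out.strip() == "7"),
]

report = evaluate(VARIANTS, CASES, fake_model)
print(report)
print(cheapest_passing(report, min_pass_rate=1.0))
```

The cheapest variant overall fails every case, which is exactly why the quality bar comes before the price comparison.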
#6 · In spirit
Evaluation

DSPy

stanfordnlp/dspy

A framework for programming and optimizing language-model pipelines rather than hand-tuning one prompt at a time.

Optimization beats prompt superstition: measure the task, tune the pipeline, and spend tokens where they actually move quality.
34.6K stars · 2.9K forks · MIT
Tags: optimization, programming, evals
#7 · Direct
Tokenization

tiktoken

openai/tiktoken

A fast BPE tokenizer for OpenAI models, useful for counting and estimating token usage before requests go out.

You cannot manage what you do not count. Token counting is the basic meter that makes practical spend estimates possible.
18.3K stars · 1.5K forks · MIT
Tags: token-counting, budgeting, openai
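A pre-flight budget check only needs a counting function. The sketch below takes any `encode` callable and, when none is given, falls back to the rough rule of thumb of about four characters per token for English text; that heuristic and the `preflight` helper are assumptions for illustration. With tiktoken installed, `tiktoken.get_encoding("cl100k_base").encode` can be passed in for exact counts under that encoding. Real chat requests also add some per-message formatting overhead, so treat the total as a lower bound.

```python
import math


def count_tokens(text, encode=None):
    """Exact count when a tokenizer's encode function is supplied; otherwise a rough estimate.

    The fallback assumes about 4 characters per token, a common rule of thumb for
    English text that can be far off for code or other languages.
    """
    if encode is not None:
        return len(encode(text))
    return math.ceil(len(text) / 4)


def preflight(messages, max_input_tokens, encode=None):
    """Total the input tokens and refuse to proceed if they exceed the limit."""
    total = sum(count_tokens(m, encode) for m in messages)
    if total > max_input_tokens:
        raise ValueError(f"prompt is about {total} tokens; limit is {max_input_tokens}")
    return total


print(preflight(["You are a helpful assistant.", "Summarize the attached report."], 50))
# With tiktoken installed, pass its encoder for exact counts, e.g.:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   preflight(messages, 50, encode=enc.encode)
```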
#8 · In spirit
Retrieval

Qdrant

qdrant/qdrant

A vector database and vector search engine for AI search, semantic retrieval, filtering, and hybrid-search applications.

Retrieval infrastructure helps swap bloated prompts for targeted context windows by sending the most relevant chunks first.
31.5K stars · 2.3K forks · Apache-2.0
Tags: vector-db, search, rag
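A filtered vector search ranks stored vectors by similarity to a query vector, restricted to points whose metadata (Qdrant calls it the payload) matches a condition. The brute-force sketch below shows that contract with made-up three-dimensional vectors and hypothetical point ids; it is not the Qdrant client API, and real engines use approximate indexes such as HNSW rather than scanning every point.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(points, query, limit, where=None):
    """points: list of (id, vector, payload). Brute-force filtered top-k by cosine similarity."""
    candidates = [
        p for p in points
        if where is None or all(p[2].get(k) == v for k, v in where.items())  # payload filter
    ]
    ranked = sorted(candidates, key=lambda p: cosine(p[1], query), reverse=True)
    return [(pid, round(cosine(vec, query), 3)) for pid, vec, _ in ranked[:limit]]


# Hypothetical toy embeddings; real ones come from an embedding model.
POINTS = [
    ("refund-policy", [0.9, 0.1, 0.0], {"lang": "en"}),
    ("refund-policy-de", [0.88, 0.12, 0.0], {"lang": "de"}),
    ("shipping", [0.1, 0.9, 0.1], {"lang": "en"}),
    ("careers", [0.0, 0.1, 0.9], {"lang": "en"}),
]
print(search(POINTS, [1.0, 0.0, 0.0], limit=2, where={"lang": "en"}))
```

Only the top `limit` chunks go into the prompt, and the filter keeps near-duplicates in the wrong language from taking up context.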
#9 · In spirit
Retrieval

Chroma

chroma-core/chroma

Search infrastructure for AI applications, commonly used as a retrieval layer for agents, RAG apps, and local prototypes.

A practical way to keep context nearby and queryable instead of force-feeding the model everything every turn.
28K stars · 2.3K forks · Apache-2.0
Tags: retrieval, agents, search
#10 · Direct
Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

Model routing plus guardrails is the grown-up version of tokenmaxxing: pick the right route, then keep the call inside policy.
11.8K stars · 1.1K forks · MIT
Tags: gateway, guardrails, routing
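Routing with guardrails usually means an ordered fallback chain plus a policy check on what goes in and what comes out. The sketch below uses hypothetical provider functions and a hypothetical length and blocked-term policy (`check_policy`, `call_with_fallback` are illustrative names); Portkey's gateway expresses similar ideas through its own configuration, which is not reproduced here.

```python
class GuardrailViolation(Exception):
    pass


def check_policy(text, max_chars, blocked_terms):
    """Hypothetical policy: cap length and reject blocked terms (case-insensitive)."""
    if len(text) > max_chars:
        raise GuardrailViolation(f"{len(text)} chars exceeds {max_chars}")
    lowered = text.lower()
    for term in blocked_terms:
        if term in lowered:
            raise GuardrailViolation(f"blocked term: {term!r}")


def call_with_fallback(prompt, providers, max_chars=2000, blocked_terms=("password",)):
    """Try (name, call) providers in order; return the first output that passes policy."""
    check_policy(prompt, max_chars, blocked_terms)  # input guardrail: reject before spending tokens
    errors = []
    for name, call in providers:
        try:
            output = call(prompt)
            check_policy(output, max_chars, blocked_terms)  # output guardrail
            return name, output
        except Exception as exc:  # provider failure or policy violation: fall through to the next route
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical providers standing in for real model endpoints.
def primary(prompt):
    raise TimeoutError("timed out")


def secondary(prompt):
    return "Sure, the password is hunter2."


def tertiary(prompt):
    return "Here is a short summary."


PROVIDERS = [("primary", primary), ("secondary", secondary), ("tertiary", tertiary)]
print(call_with_fallback("Summarize this ticket.", PROVIDERS))
```

The input check runs once before any provider is called, so a prompt that violates policy costs nothing; output checks run per provider, so a bad answer triggers the next route instead of a repair prompt.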
#11 · Direct
Observability

Helicone

Helicone/helicone

Open-source LLM observability for monitoring, evaluation, experimentation, latency, requests, and usage behavior.

A clean feedback loop for where tokens are going, which calls are slow, and which experiments are worth keeping.
5.7K stars · 584 forks · Apache-2.0
Tags: observability, experiments, usage
#12 · Direct
Caching

GPTCache

zilliztech/GPTCache

A semantic cache for LLM applications, with integrations for LangChain and LlamaIndex workflows.

The cheapest token is the one you do not send twice. Semantic caching is the unglamorous cost killer.
8K stars · 583 forks · MIT
Tags: semantic-cache, cost-control, latency
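A semantic cache differs from an exact-match cache in one step: it embeds the incoming prompt and returns a stored answer when an earlier prompt is similar enough. The sketch below uses a bag-of-words vector as a stand-in for a real embedding model and a hypothetical similarity threshold of 0.8; GPTCache combines a real embedding function, a vector store, and a configurable similarity evaluator, and its API is not shown here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real cache would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, llm, threshold=0.8):
        self.llm = llm
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)
        self.llm_calls = 0

    def ask(self, prompt):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]  # cache hit: no model call, no tokens spent
        self.llm_calls += 1
        answer = self.llm(prompt)
        self.entries.append((vec, answer))
        return answer


def fake_llm(prompt):  # hypothetical stand-in for a paid model call
    return f"answer to: {prompt}"


cache = SemanticCache(fake_llm)
print(cache.ask("What is your refund policy?"))
print(cache.ask("what is your refund policy, please"))  # similar enough: served from cache
print(cache.ask("How do I reset my password?"))         # new question: calls the model
print(cache.llm_calls)
```

The threshold is the main design knob: too low and different questions get the same cached answer, too high and paraphrases miss the cache.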
#13 · In spirit
Structured output

Outlines

dottxt-ai/outlines

A structured-output toolkit for constraining generation with formats like JSON, regex, and grammars.

Structured outputs reduce repair prompts and retry loops. Fewer malformed responses means fewer wasted follow-up calls.
13.9K stars · 698 forks · Apache-2.0
Tags: json, constrained-generation, retries
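Constrained generation works by masking, at each decoding step, every token that could not lead to a valid output, so there is nothing malformed left to repair. The toy decoder below restricts output to a fixed set of labels, treats single characters as tokens, and uses a hypothetical scoring function in place of model logits. Outlines applies the same kind of mask over a real tokenizer's vocabulary, compiled from regexes, JSON schemas, or grammars; none of its API is used here.

```python
def constrained_choice(choices, score):
    """Greedy decoding that can only ever emit one of `choices`.

    score(prefix, token) is a hypothetical stand-in for model logits (higher is better).
    Tokens are single characters here; real tokenizers use multi-character tokens.
    Decoding stops as soon as the output equals an allowed choice.
    """
    output = ""
    while output not in choices:
        # The mask: keep only tokens that extend `output` toward some allowed choice.
        allowed = {c[len(output)] for c in choices if c.startswith(output) and len(c) > len(output)}
        # sorted() makes tie-breaking deterministic; max() returns the first best-scoring token.
        output += max(sorted(allowed), key=lambda tok: score(output, tok))
    return output


def model_prefers(text):
    """Hypothetical 'model' that would write `text` verbatim if unconstrained."""
    def score(prefix, tok):
        i = len(prefix)
        return 1.0 if i < len(text) and text[i].lower() == tok else 0.0
    return score


LABELS = ["positive", "negative", "neutral"]
print(constrained_choice(LABELS, model_prefers("Negative, clearly!")))
print(constrained_choice(LABELS, model_prefers("Positive!!")))
```

Unconstrained, this "model" would have produced "Negative, clearly!", which a strict parser would reject and a retry would pay for again; the mask forces the valid label on the first pass.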
#14 · Direct
Observability

OpenLLMetry

traceloop/openllmetry

Open-source observability for LLM and GenAI applications, built on OpenTelemetry conventions.

Useful for teams that already live in telemetry and want token behavior next to the rest of production reality.
7.1K stars · 968 forks · Apache-2.0
Tags: opentelemetry, tracing, llmops
#15 · In spirit
Agents

Zep

getzep/zep

A memory layer and integration collection for AI agents and knowledge-graph-backed language-model applications.

Agent memory is tokenmaxxing when it recalls the right prior fact instead of replaying the whole conversation.
4.6K stars · 627 forks · Apache-2.0
Tags: memory, agents, knowledge-graph
How to read this

Not every project is literally about token counts.

The board favors projects that reduce waste, make spend visible, or improve context quality. That means routers and observability tools sit next to retrieval, memory, eval, caching, and structured-output projects.

Ranking rule

Editorial rank comes from tokenmaxxing relevance first, then GitHub activity and adoption. Stars are a useful signal, but the page is not a raw popularity contest.