From the team
Technical articles on LLM agent optimization, token cost reduction, and production AI.
Tokenmaxxing Is Dead: How to Cut Your LLM Bill 50–70%
Tokenmaxxing backfired: big bills, no ROI. LLM cost optimization removes the tokens your agent never needed, reducing LLM costs 50–70% without changing your code.
Agent Called a Tool You Didn't Send? Fix Recall Misses
Per-turn tool routing can drop a tool the model still needs. How recall@K catches misses, how session recovery responds, and what Orqen can — and cannot — fix.
Turn 47 Hit the Context Window. Now What?
Long agent sessions outgrow the context window. Why naive truncation drops task context, and how Orqen uses fill-ratio gating, hot/warm/cold history, and summarization.
Context Caching: The LLM Cost Lever Most Agents Skip
Provider prompt caching can cut repeated-context costs by up to 90%. How it works, why agents skip it, and what Orqen does when your requests aren't caching yet.
Why 'Now Update It' Breaks Tool Routing
Routing on the last message alone misroutes follow-ups like 'now update it.' How Orqen uses multi-turn context and session hints — without treating every turn like a fresh chat.
The 50KB JSON Your Agent Sends Every Turn
Bulky tool results dominate agent token cost. How Orqen shrinks role:tool messages with query-aware extraction and structural fallbacks.
Embeddings Aren't Enough for Agent Tool Selection
Cosine similarity misses indirect tool dependencies. Why a second routing pass helps on chained workflows — and why it must fail open when unavailable.
Your Tool Descriptions Are the Bug
Vague tool schemas cause routing failures before the model runs. A routing-quality checklist, Orqen's schema audit, x-orqen-examples, and the dashboard view: what to fix on your side.
MCP Gave Your Agent 50 Tools — Now What?
Connecting MCP servers is easy; stopping your agent from sending every tool on every LLM call is not. Here's why tool sprawl happens, what it costs, and how to route each turn to a small relevant subset.
Stop Hardcoding GPT-4o: Task-Aware Model Routing
Hardcoding one expensive model for every turn overpays on simple lookups and can still underpower hard tasks. Use orqen/auto and its siblings to match model capability to each request.
What Agent Optimization Should Log (No Prompts)
Debug savings, recall, and latency without storing prompts. Orqen logs structured metadata per request — counts, timings, plan decisions, and honest savings math.
Introducing Orqen: Cut Your LLM Bill Without Changing Your Code
Orqen sits between your agent and the LLM provider, removes the tokens the model doesn't need, and sends a smaller request — so you pay less on every call.