LATEST INSIGHT — MAY 2026

From the team

Technical articles on LLM agent optimization, token cost reduction, and production AI.

Tokenmaxxing Is Dead: How to Cut Your LLM Bill 50–70%

Tokenmaxxing backfired: big bills, no ROI. LLM cost optimization cuts the tokens your agent never needed, reducing LLM costs by 50–70% without changing your code.

Orqen Team 30.05.2026//5 MIN READ

Guide

Agent Called a Tool You Didn't Send? Fix Recall Misses

Per-turn tool routing can drop a tool the model still needs. How recall@K catches misses, how session recovery responds, and what Orqen can — and cannot — fix.

21.05.2026//6 MIN READ

Guide

Turn 47 Hit the Context Window. Now What?

Long agent sessions outgrow the context window. Why naive truncation drops task context, and how Orqen uses fill-ratio gating, hot/warm/cold history, and summarization.

21.05.2026//5 MIN READ

Guide

Context Caching: The LLM Cost Lever Most Agents Skip

Provider prompt caching can cut repeated-context costs by up to 90%. How it works, why agents skip it, and what Orqen does when your requests aren't caching yet.

21.05.2026//10 MIN READ

Guide

Why 'Now Update It' Breaks Tool Routing

Routing on the last message alone misroutes follow-ups like 'now update it.' How Orqen uses multi-turn context and session hints — without treating every turn like a fresh chat.

20.05.2026//4 MIN READ

Guide

The 50KB JSON Your Agent Sends Every Turn

Bulky tool results dominate agent token cost. How Orqen shrinks role:tool messages with query-aware extraction and structural fallbacks.

20.05.2026//4 MIN READ

Technical

Embeddings Aren't Enough for Agent Tool Selection

Cosine similarity misses indirect tool dependencies. Why a second routing pass helps on chained workflows — and why it must fail open when unavailable.

19.05.2026//4 MIN READ

Guide

Your Tool Descriptions Are the Bug

Vague tool schemas cause routing failures before the model runs. A routing quality checklist, Orqen's schema audit, x-orqen-examples, and the dashboard view — what you fix on your side.

19.05.2026//4 MIN READ

Guide

MCP Gave Your Agent 50 Tools — Now What?

Connecting MCP servers is easy; stopping your agent from sending every tool on every LLM call is not. Here's why tool sprawl happens, what it costs, and how to route each turn to a small relevant subset.

19.05.2026//7 MIN READ

Guide

Stop Hardcoding GPT-4o: Task-Aware Model Routing

One expensive model for every turn wastes money on lookups and underpowers hard tasks. Use orqen/auto and siblings to match model capability to each request.

18.05.2026//4 MIN READ

Technical

What Agent Optimization Should Log (No Prompts)

Debug savings, recall, and latency without storing prompts. Orqen logs structured metadata per request — counts, timings, plan decisions, and honest savings math.

18.05.2026//4 MIN READ

Product

Introducing Orqen: Cut Your LLM Bill Without Changing Your Code

Orqen sits between your agent and the LLM provider, removes the tokens the model doesn't need, and sends a smaller request — so you pay less on every call.

17.05.2026//9 MIN READ
