Skip to content
All posts
Guide//4 MIN READ

The 50KB JSON Your Agent Sends Every Turn

Bulky tool results dominate agent token cost. How Orqen shrinks role:tool messages with query-aware extraction and structural fallbacks.

O

Orqen Team

orqen.app

Your agent called an internal API. The tool returned 52KB of JSON — user profiles, audit logs, debug HTML, pagination metadata. That payload sits in role: tool messages and gets resent on every subsequent turn. By turn 20, you are paying to re-read the same 50KB eleven times.

Teams optimize tool schemas and forget tool results. Schemas are static; results are dynamic and often huge. One verbose API response can outweigh your entire tool catalog in token cost.

Tool results dominate cost

In a typical multi-turn agent session, input tokens come from:

Payload sectionTurn 1Turn 20 (cumulative resend)
Tool schemas (50 tools)~10K~10K × 20
One 50KB API result~12K~12K × 20
User + assistant prose~500~8K

Routing fixes the schema column (see MCP tool sprawl). Tool result compression fixes the middle column — often the larger problem in data-heavy agents.

Provider caching helps — if the prefix is stable. Cached tool results still occupy context window space and affect model quality. Compression reduces both cost and noise. See context caching.

What bloat looks like

Common bloat patterns in production tool results:

  • Over-fetching APIs — REST endpoints return full objects when the agent needed one field.
  • HTML error pages — scrapers return entire DOM trees instead of extracted text.
  • Pretty-printed JSON — whitespace alone can double size.
  • Debug fields_links, audit_log, raw_response the model never uses.
{
  "status": "ok",
  "data": {
    "users": [
      {
        "id": 8842,
        "email": "sarah@acme.com",
        "profile": { /* 200 lines of nested JSON */ },
        "audit_log": [ /* 50 entries */ ]
      }
    ],
    "meta": { "page": 1, "total": 1, "request_id": "req_abc123" }
  },
  "html_debug": "<html>...</html>"
}

The model only needed email and id. Everything else is token tax on every future turn until the session ends or history is truncated — which risks losing the ID you still need (see context window limits).

Tool result compression

Orqen runs tool result compression as an always-on fast path targeting only role: tool messages — user and assistant turns stay untouched at this stage.

# Tool result compression — always-on fast path for role:tool messages
#
# When query context is available:
#   Keep JSON fields relevant to the user's intent
#
# Fallback:
#   Minify JSON, strip empty keys, HTML→text, truncate oversized payloads

This runs early in the pipeline, before heavy conversation compression. Failures return the original content.

Query-aware field extraction

When routing context (or the last user query) and a warm embedder are available, Orqen parses JSON tool results and scores each top-level field against the user's intent. Irrelevant branches are dropped; nested objects are pruned recursively.

Example: user asked "What's Sarah's email?" — the compressor keeps email and id, drops audit_log, profile.preferences, and html_debug.

On large API/DB responses where most fields are irrelevant to the query, extraction can remove up to 60–90% of tool result characters. Actual savings depend on how much of the payload the current turn needs.

Structural fallback

Without query context or when the result is not JSON, Orqen falls back to structural compression:

  • HTML → text. Strip tags, scripts, and styles; keep visible text content.
  • JSON minify. Remove whitespace, empty arrays/objects, and null fields.
  • String truncation. Long string values clip at a configurable max length per field.
  • Hard cap. Total result size is capped after other passes.
{"status":"ok","data":{"email":"sarah@acme.com","audit_log":[/* 50 entries */]}}

When compression runs

Tool result compression is on by default. Query-aware field extraction adds embedder work only when routing context is already available from the pruning stage — no duplicate embedding pass. Cache-protected tool messages are skipped so provider prefix caches stay valid.

Saved bytes appear in dashboard compression metrics, separate from tool schema pruning savings.

Stack with tool routing

The highest-leverage agent stack addresses both sides of the payload:

  1. Route tools per turn — shrink schemas from 50 tools to 4.
  2. Compress tool results — shrink the 50KB JSON to 2KB.
  3. Tier history at high fill_ratio — summarize old turns without dropping IDs.

Typical workloads see 50–70% fewer prompt tokens from routing alone; tool result compression adds another large chunk on data-heavy agents. The two are independent savings — both show in the dashboard.

Measure tool result savings

If your agent calls APIs, databases, or scrapers that return large payloads:

  1. Sign up for Orqen and route one data-heavy workflow.
  2. Check Usage for compression_tokens_saved and techniques including tool result compression.
  3. Compare prompt_tokens on turn 10+ with and without Orqen on the same session.
Tagged:tool-resultscompressionllm-costagent-optimizationjson
O

Orqen Team

We build the optimization layer for tool-heavy LLM agents. Our goal is to make agent costs predictable as your tool set grows.

Try Orqen free

250K saved tokens per month. Free forever. Two-line integration.

See your savings in the dashboard within seconds of your first request.