Free forever · 250K saved tokens / month

Your LLM bill is inflated by tokens your agent doesn't need. Orqen cuts them out

Every LLM call your agent makes carries tools, schemas, and history the model doesn't need for that turn — and you pay for all of it. Orqen sits between your SDK and the provider: it prunes tools, compresses context, routes each call to the right model for the task, and activates provider caching automatically. Lower bill, better accuracy, zero code changes.

50–70% lower LLM bill · 53.9% less tool context · ~300ms added overhead · 2 lines to integrate · Fail-open: no broken agents

You're paying for tokens the model ignores

Without Orqen

Your agent's full context → LLM

Tools + history + schemas + images + tool results → LLM

→ Prompt grows each turn: stale history, repeated schemas, old results

→ The model sees context it does not need

→ Higher cost. More chances for the model to choose badly.

With Orqen

Your agent's full context → Orqen → LLM

Tools, history, schemas, images → cleaned and routed → LLM

53.9% less tool context · history compressed

→ Schemas deduped · images reduced · tool results trimmed

7,688,105 context tokens removed

→ $30.01 observed provider-bill reduction

Observed live Orqen traffic, May 2026: 437 calls, 379 successful. Recent Bedrock rows went 33 → 15 tools.

This is probably for you if

Your LLM bills are growing and you want to spend less without rewriting your agent

You need to preserve critical IDs, URLs, and constraints while cutting cost

You use the Anthropic SDK, OpenAI SDK, AWS Bedrock, or Groq and want to keep the SDK you know

This is probably not for you if

Your requests are already tiny and never include tools, long history, or multimodal content

You've already handled tool selection, context compression, and cache optimisation in-house — and your bill reflects it

You want to rewrite your agent framework instead of adding a proxy

Sound familiar?

Six reasons your agent costs more than it should

Most teams patch one or two of these. Orqen addresses all six on every request — one plan, before the call leaves your stack, without a single change to your agent code.

“My agent ships every tool on every call.”

Tool routing

Orqen reads what the turn is actually asking for and forwards only the tools it needs — about 53.9% fewer schemas per call, and it still hands the model the tool it would have chosen.

“The prompt grows every turn until it's huge.”

Context compression

Repeated schemas, verbose tool results, and stale images get compacted; older turns are summarized in tiers while recent ones stay verbatim. Prompt size stays flat as the session runs long.

“I'm paying frontier prices for simple tasks.”

Cost-aware model routing

Set model to orqen/auto and Orqen classifies each request — data retrieval, code generation, analysis, and more — then routes simple operations to fast, cheap models and complex ones to capable models. Per-request, automatic, no rules to write.

“Provider caching never activates for my agent.”

Cache optimization

Orqen auto-injects Anthropic cache_control markers, stabilises the tool prefix across turns so the cache actually hits, and recognises recurring context across sessions so caching works from the very first turn — not just the second.

“I can't tell what I'm actually saving.”

Honest accounting

Every request shows what you paid versus what you'd have paid without Orqen — broken down by operation type, model, and provider. Cache savings are tracked separately and never double-counted.

“Every new session starts from scratch.”

Session intelligence

Orqen fingerprints your context across sessions. When a new session reuses the same system prompt and tools, it skips redundant processing, activates caching immediately, and protects stable content from re-compression.
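
As a rough illustration of what fingerprinting a session's stable context could look like, the sketch below hashes a system prompt together with its tool definitions. The function name, the choice of SHA-256, and the canonical JSON form are assumptions made for this example; the page does not describe how Orqen computes its fingerprints.

```python
import hashlib
import json


def context_fingerprint(system_prompt: str, tools: list[dict]) -> str:
    """Return a stable SHA-256 hex digest for a system prompt plus tool set.

    Hypothetical helper: not Orqen's actual fingerprinting scheme.
    """
    # Sort tools by name and serialize with sorted keys so that ordering
    # differences between sessions do not change the fingerprint.
    canonical_tools = sorted(tools, key=lambda tool: tool.get("name", ""))
    payload = json.dumps(
        {"system": system_prompt, "tools": canonical_tools},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two sessions that send the same system prompt and the same tools, in any order, get the same digest, which is the property a cross-session cache lookup would need.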

Trusted by teams building agents

Design partners · Early access teams · Agent builders in production

See your savings

Pro · Dashboard · May 2026
Open your dashboard →
dash.orqen.app

Saved on your LLM bill this month

$30.01

53.9% less tool context per call

Requests

437

this month

Tools forwarded

33→15

avg per call

Tokens saved

7.7M

this month

Pro plan · unlimited optimization · $39 / month · Active

Before & after

Fewer tokens sent means a lower provider bill

Without Orqen

All 33 tools + history + schemas → LLM

With Orqen

~15 tools + cleaned context → LLM

53.9% narrower tool context · ~300ms typical overhead

The integration

Switch the key and endpoint — keep your request shape.

Anthropic SDK, OpenAI SDK, and AWS Bedrock SDK can use their usual request shapes. Point them at Orqen with your Orqen key — your prompts, tools, model names, streaming, and multimodal inputs stay familiar.

Anthropic SDK
import anthropic

# Before
# client = anthropic.Anthropic(api_key="sk-ant-...")

# After — point the client at Orqen
client = anthropic.Anthropic(
    api_key="sk-orq-...",            # your Orqen key
    base_url="https://api.orqen.app",
)

# Your request body stays the same
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "..."}],
    tools=[...],  # Orqen optimizes the request before forwarding
)

Works with major providers and agent frameworks

OpenAI · Anthropic · AWS Bedrock · Google Gemini · xAI (Grok) · Perplexity · Groq · Mistral · DeepSeek · OpenRouter · LangChain · LlamaIndex · MCP tools (✓ dynamic)

Under the hood

One plan per request — in about 300ms

Every call is intercepted, analyzed, and optimized before it reaches your provider — then forwarded with the answer-critical parts intact.

01 · Intercept

Your SDK calls Orqen

Orqen sits in front of your LLM provider. Anthropic, OpenAI, and Bedrock SDKs keep their request shapes — point them at Orqen with your key.

[INCOMING]
↳ tool_schemas: 33 schemas
↳ conversation: 18 turns
↳ est_tokens: ~142K
02 · Analyze

Isolate what this turn needs

The engine examines context, tools, schemas, and budget signals to decide what to keep, what to prune, and which model to route to — typically in ~300ms total.

03 · Deliver

Smaller request, lower bill

The provider receives a smaller request — fewer tools, compressed history, deduped schemas — so you pay for fewer input tokens. Critical IDs and constraints are verified before forwarding.

Tools in: 33
Forwarded: 15 (53.9% less tool context)
Continuous safety

It stays safe while it gets smarter

Orqen tracks which optimizations ran, which tools the LLM actually called, whether critical terms survived, and when recovery widened the next request. The system learns from live traffic without storing raw prompts.

Zero prompt storage
Critical ID verification
Provider cache preserved

Real numbers

Real savings from real traffic — not a synthetic benchmark

These numbers come from live agent traffic through Orqen in May 2026: 437 calls, 379 successful. The savings are measured by comparing what your provider actually bills — not estimates.

7.7M

Tokens removed from provider bills

7,688,105 observed this month

Lower bill per call

live test: 9,235 → 1,605 billed tokens

$30.01

Saved on the provider bill

based on what the provider actually billed

Recent request rows

Bedrock Opus turns

33 → 15 tools

Saved per recent call

15.0K–28.0K tokens

Observed latency

3.5–10.2s

Observed from recent us.anthropic.claude-opus-4-7 requests in the Orqen dashboard. Actual savings depend on your tools, model prices, traffic, and task mix.

Pricing

Free forever. Pro when you scale.

No trial. Free forever — 250K saved tokens per month, resets monthly. Free includes 75k saved tokens / day and 500k saved tokens / week caps. Pro is $39 / month — unlimited optimization, no token caps. Full comparison →

Free

$0

forever

Free forever — 250K saved tokens per month, resets monthly. When you hit the limit, your agents keep working — requests forward directly until the next monthly reset.

  • 250K saved tokens / month — resets monthly
  • Agents keep working at limit (no hard stop)
  • Tool and context optimization
  • Prompt reconstruction and validation traces
  • Routing Quality analysis
  • Session tracking
  • Community support
Start for free
Most popular

Pro

$39

per month

Unlimited optimization, no monthly cap. Cancel any time: if Orqen doesn't save you money, cancelling takes 2 minutes.

  • Everything in Free
  • Unlimited optimization — no monthly cap
  • Advanced compression calibration
  • Advanced reranking and intent enrichment
  • Email support
Upgrade to Pro

Enterprise

Custom

volume pricing

Custom contracts, SLA, and dedicated support.

  • Everything in Pro
  • Custom pricing & volume discounts
  • SSO / SAML (coming soon)
  • Dedicated success manager
  • Custom data retention
Talk to sales

Estimate your savings

Savings calculator

Estimates tool routing, history compression, and deduplication savings on your input token bill.

Selected model

GPT-5.4

$2.50 in · $15.00 out/M

5,000 calls / month

20 tools · 12 pruned (60%)

10 turns · 2 compressed

Monthly savings breakdown

Tool routing · 9.0M input tokens · $22.50
History compression · 2 turns × ~400 tok · $6.00
Prompt dedup · ~60 tok/call · $0.75

LLM bill reduction

11.7M input tokens × $2.5/M

$29.25/mo

Orqen Pro

$39.00

Flat monthly · cancel anytime

Net gain

−$9.75/mo at this volume ($29.25 saved − $39.00 plan)

Start free — upgrade when savings exceed $39/mo

Input-token pricing from provider docs (May 2026). Tiered models use standard ≤200k context rates. Savings use input $/M only; output and reasoning tokens are not modeled. Assumes ~150 tokens/tool, K=8 tools forwarded per call, ~400 tokens/turn, and a 60% history reduction beyond 8 turns.
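
The calculator's arithmetic can be reproduced in a few lines. The sketch below is a reconstruction from the figures and assumptions shown above, not the calculator's actual source; the function and parameter names are made up for illustration.

```python
def estimate_monthly_savings(
    calls: int,
    tools_pruned: int,
    turns_compressed: int,
    input_price_per_million: float,
    tokens_per_tool: int = 150,        # stated assumption: ~150 tokens/tool
    tokens_per_turn: int = 400,        # stated assumption: ~400 tokens/turn
    history_reduction: float = 0.60,   # stated assumption: 60% reduction
    dedup_tokens_per_call: int = 60,   # stated assumption: ~60 tok/call
    plan_price: float = 39.00,         # Orqen Pro, flat monthly
) -> dict:
    """Estimate input tokens removed per month and the resulting bill change."""
    tool_tokens = calls * tools_pruned * tokens_per_tool
    history_tokens = calls * turns_compressed * tokens_per_turn * history_reduction
    dedup_tokens = calls * dedup_tokens_per_call
    total_tokens = tool_tokens + history_tokens + dedup_tokens
    savings = total_tokens / 1_000_000 * input_price_per_million
    return {
        "total_tokens": total_tokens,
        "savings_usd": round(savings, 2),
        "net_usd": round(savings - plan_price, 2),
    }


# Inputs from the example above: 5,000 calls, 12 tools pruned,
# 2 turns compressed, $2.50 per million input tokens.
example = estimate_monthly_savings(5000, 12, 2, 2.50)
```

With those inputs the estimate comes to 11.7M input tokens and $29.25 per month, which is below the $39 plan price. At that volume the free tier is the better fit until savings grow.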

Frequently asked questions

Plain questions, developer answers — browse the man page below.

Orqen-FAQ(1)    Orqen Manual    Orqen-FAQ(1)

Name

orqen-faq - frequently asked questions about cutting your LLM bill

Synopsis

orqen --help [topic]

$ orqen --help "What does Orqen actually do?"

Output

Orqen cuts your LLM bill. Your agent sends tools, history, and schemas on every call — most of which the model doesn't need for that turn. Orqen sits between your SDK and the provider: it prunes irrelevant tools, compresses stale context, auto-routes each request to the right model for the task, and activates provider caching automatically. It also classifies each request by operation type (code generation, data retrieval, analysis, etc.) to show you where your tokens go. Update your API key and endpoint URL; your agent code stays the same.

$ orqen --help "How does Orqen work with provider caching (Anthropic cache_control)?"

Output

Orqen makes caching work better and activates it automatically. For Anthropic and Bedrock, Orqen auto-injects cache_control markers on stable prefixes — you don't need to set them yourself. It stabilises the forwarded tool set across turns so the prefix cache actually hits instead of being invalidated by changing tools. It also fingerprints your context across sessions, so when a new session starts with the same system prompt and tools, caching activates on the very first turn instead of waiting for the second. The dashboard shows your cache hit rate and how much Orqen's injection saved.
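
For reference, this is roughly what setting the markers by hand looks like on an Anthropic Messages request body. The `cache_control` field with `{"type": "ephemeral"}` is part of Anthropic's prompt caching API; the helper name and the choice of breakpoints (last tool, last system block) are assumptions for this sketch, and Orqen's own placement logic is not documented here.

```python
import copy


def add_cache_markers(request: dict) -> dict:
    """Return a copy of a Messages request with cache breakpoints added.

    Hypothetical helper: marks the last tool definition and the last
    system block, so the stable prefix up to each point can be cached.
    """
    marked = copy.deepcopy(request)  # never mutate the caller's request

    tools = marked.get("tools") or []
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}

    system = marked.get("system")
    if isinstance(system, str) and system:
        # A plain-string system prompt has to become a list of text
        # blocks before it can carry a cache_control marker.
        marked["system"] = [
            {"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}
        ]
    elif isinstance(system, list) and system:
        system[-1]["cache_control"] = {"type": "ephemeral"}

    return marked
```

A marker only pays off if the content before it is byte-identical between calls, which is why a stable tool set across turns matters for the hit rate.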

$ orqen --help "What if Orqen drops a tool the model needed?"

Output

Orqen snapshots the original request, reconstructs the outbound prompt, validates critical IDs, URLs, constraints, and tool schemas, and restores context if validation fails. For tools, recall@K shows whether every tool the LLM actually used was in the forwarded set. The system widens tool selection on retries and fail-open paths so production agents stay reliable.
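
The two checks described above can be pictured with a small sketch: a recall measure over tool names, and a validate-or-restore step for critical terms. Both functions are illustrative assumptions, using plain substring matching, and are not Orqen's validation code.

```python
def tool_recall(forwarded: set[str], called: set[str]) -> float:
    """Fraction of the tools the model actually called that were forwarded."""
    if not called:
        return 1.0  # nothing was called, so nothing could have been missed
    return len(called & forwarded) / len(called)


def validate_or_restore(original: str, optimized: str, critical_terms: list[str]) -> str:
    """Return the optimized prompt only if every critical term survived.

    Hypothetical check: a term counts as lost if it appears in the
    original text but not in the optimized text.
    """
    lost = [term for term in critical_terms if term in original and term not in optimized]
    return original if lost else optimized
```

A recall of 1.0 means every tool the model used was in the forwarded set; anything lower is the signal to widen selection on the next request.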

$ orqen --help "How does Orqen detect and recover from quality degradation?"

Output

Orqen monitors each request for signs that compression hurt quality: tool recall misses, HTTP errors, and correction signals in the user's next message (phrases like 'that's wrong', 'you missed', 'as I said'). If a single request shows severe degradation, or two or more bad-quality turns occur within a 10-minute window, the system immediately writes a more conservative compression setting that takes effect on the very next request. It auto-recovers to calibrated levels after roughly two hours. A weekly calibration pass separately sets the long-term aggressiveness baseline for each key based on observed traffic.
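
The trigger and recovery rule above can be sketched as a small state machine. The class below is a minimal illustration under the stated thresholds (two bad turns within 10 minutes or one severe request, recovery after about two hours); the class name, method names, and the use of caller-supplied timestamps are assumptions, not Orqen's implementation.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60         # bad turns are counted within a 10-minute window
RECOVERY_SECONDS = 2 * 60 * 60   # fall back to calibrated levels after ~2 hours


class QualityGuard:
    """Tracks bad-quality turns and decides when to compress conservatively."""

    def __init__(self) -> None:
        self._bad_turns: deque[float] = deque()
        self._conservative_since: float | None = None

    def record(self, now: float, bad: bool, severe: bool = False) -> None:
        """Record the outcome of one request at time `now` (in seconds)."""
        # Forget bad turns that have aged out of the window.
        while self._bad_turns and now - self._bad_turns[0] > WINDOW_SECONDS:
            self._bad_turns.popleft()
        if not (bad or severe):
            return
        self._bad_turns.append(now)
        if severe or len(self._bad_turns) >= 2:
            self._conservative_since = now

    def is_conservative(self, now: float) -> bool:
        """True while the conservative setting should still apply."""
        if self._conservative_since is None:
            return False
        if now - self._conservative_since >= RECOVERY_SECONDS:
            self._conservative_since = None  # auto-recover
            return False
        return True
```

Passing timestamps in, rather than reading the clock inside the class, keeps the rule easy to test against recorded traffic.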

$ orqen --help "What is payload optimization?"

Output

Payload optimization is Orqen's core job: shrink the full agent payload (tools, tool results, history, images, and schemas) while keeping what the model needs for this turn. Orqen snapshots the original request, reconstructs the outbound prompt, validates critical IDs, URLs, constraints, and tool schemas, and restores context if validation fails.

$ orqen --help "What is smart routing?"

Output

Smart routing picks which tools to forward based on what the user is asking for right now — not just keyword matching. Orqen reads the current message and intent, scores your tools, and forwards a smaller relevant set (often 1–4 tools instead of dozens).

$ orqen --help "What is model routing?"

Output

Set model="orqen/auto" and Orqen classifies each request by operation type — code generation, data retrieval, analysis, communication, and more — then picks the right model from your connected providers. Simple operations like file lookups route to fast, cheap models; complex tasks like code generation route to capable ones. You can also force the trade-off with orqen/cheap, orqen/fast, or orqen/capable. The toggle is per-customer in the dashboard.

$ orqen --help "Do I need to change my agent code?"

Output

Update your API key and endpoint URL to point at Orqen. Anthropic SDK, OpenAI SDK, and AWS Bedrock SDK keep their usual request shapes.

$ orqen --help "What's the actual latency cost?"

Output

Orqen adds roughly 300ms to a typical request, mostly network roundtrips and Redis lookups rather than computation. Your LLM call itself takes 2–15 seconds, so the overhead ranges from about 2% of total round-trip time on a 15-second call to about 13% on a 2-second one. The pipeline is fail-open: if processing exceeds budget it forwards the original payload unchanged.
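
A fail-open step of this kind can be sketched with a worker thread and a timeout. Everything here is illustrative: `optimize_fail_open` and its `optimize` callback are hypothetical names, and the 0.3-second default budget is an assumption borrowed from the typical overhead figure, not a documented setting.

```python
from concurrent.futures import ThreadPoolExecutor


def optimize_fail_open(payload, optimize, budget_seconds: float = 0.3):
    """Run `optimize(payload)` but never let it block or break the request.

    Returns the optimized payload if it is ready within the budget;
    otherwise returns the original payload unchanged.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(optimize, payload)
    try:
        return future.result(timeout=budget_seconds)
    except Exception:
        # Covers both a timeout and any error raised by the optimizer.
        return payload
    finally:
        # Do not wait for a slow optimizer; let the request proceed.
        executor.shutdown(wait=False)
```

The design choice is that any failure of the optimization step degrades to the unoptimized request rather than to an error.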

$ orqen --help "Does it work with tool-calling frameworks like LangChain or LlamaIndex?"

Output

Yes. If your framework uses the Anthropic SDK, OpenAI SDK, or AWS Bedrock SDK, it should work with the same key and endpoint change. LangChain, LlamaIndex, Haystack, and custom agents are good fits.

$ orqen --help "Does it work with MCP (Model Context Protocol) tools?"

Output

Yes — and MCP actually makes Orqen more valuable, not less. MCP lets agents connect to many tool servers at once, which means payloads grow quickly. Orqen routes each request to the tools that turn needs, compresses verbose schemas after routing, and caches schema analysis so MCP's dynamic nature adds minimal overhead.

$ orqen --help "Does Orqen summarize long conversations?"

Output

Yes. Orqen manages conversation history in three tiers: recent turns are kept verbatim (hot), older turns are compressed and deduplicated (warm), and for long sessions Orqen summarizes earlier chunks into compact summaries (cold). For conversations with 100+ turns, summaries are merged in a hierarchical pass so no single LLM call sees unbounded input and early context is never silently truncated. This keeps prompt sizes from growing unboundedly as agent sessions run longer — without storing raw prompts.
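
To make the three tiers concrete, the sketch below splits a list of turns into cold, warm, and hot segments by position. The tier sizes (4 hot turns, 8 warm turns) are made-up defaults for illustration; the answer above does not state where Orqen draws the boundaries.

```python
def tier_history(turns: list[str], hot: int = 4, warm: int = 8) -> dict[str, list[str]]:
    """Partition turns (oldest first) into cold, warm, and hot tiers.

    hot:  the most recent `hot` turns, kept verbatim
    warm: the `warm` turns before those, candidates for compression
    cold: everything older, candidates for summarization
    """
    hot_start = max(0, len(turns) - hot)
    warm_start = max(0, hot_start - warm)
    return {
        "cold": turns[:warm_start],
        "warm": turns[warm_start:hot_start],
        "hot": turns[hot_start:],
    }
```

Short conversations land entirely in the hot tier, so nothing is compressed until a session is long enough for it to matter.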

$ orqen --help "Are my provider API keys safe?"

Output

You store them once in your dashboard. They're encrypted with AES-128 before being written to our database. Each request decrypts them in memory to forward to your provider.

$ orqen --help "What happens when I hit the free monthly limit?"

Output

Your agents keep working. When you reach the 250K saved-token limit for the month, requests still forward directly to your LLM provider; payload optimization pauses until the 1st of next month, when the allowance resets. There is no trial clock and no hard cutoff: the free tier is free forever.

$ orqen --help "What is passthrough mode?"

Output

When you've used your free saved-token allowance for the month, Orqen stops optimizing payloads but still forwards requests to your LLM provider normally. Your agent doesn't break — you just don't get savings until the monthly reset. Pro is $39/month for unlimited optimization with no monthly cap.

$ orqen --help "What's included on the free plan?"

Output

Free forever: 250K saved tokens per month, resets monthly. You get payload optimization, model routing, session tracking, and Routing Quality analysis in the dashboard. Pro is $39/month with unlimited optimization and no token caps.

$ orqen --help "What if Orqen goes down?"

Output

For production-critical agents, keep the same provider fallback you would use with any API dependency. Orqen preserves your native request shape, so falling back to your provider directly is straightforward if you need it.
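
One way to wire such a fallback is a thin wrapper around two callables, one that sends the request through Orqen and one that sends it straight to the provider. This is a minimal sketch: `call_with_fallback`, `primary`, and `fallback` are hypothetical names, and the default exception types are Python built-ins standing in for whatever your SDK raises.

```python
def call_with_fallback(
    primary,
    fallback,
    request: dict,
    retriable: tuple = (ConnectionError, TimeoutError),
):
    """Send `request` via `primary`; on a connectivity failure use `fallback`.

    `primary` and `fallback` are any callables that accept the request
    dict, for example thin wrappers around two SDK clients that differ
    only in API key and base URL.
    """
    try:
        return primary(request)
    except retriable:
        # Only connectivity-style failures trigger the fallback;
        # other errors (bad request, auth) propagate to the caller.
        return fallback(request)
```

Because the request shape is the same on both paths, the two callables can share the same request dict. With the Anthropic SDK, connection failures surface as `anthropic.APIConnectionError`, so that is a reasonable type to pass in `retriable`.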

Orqen v1.0    2026-05-14    Orqen-FAQ(1)

Your next agent request will cost more than it needs to.
Orqen fixes that before it hits the provider.

Sign up, connect your provider keys, route your agent through Orqen. Free forever — 250K saved tokens per month, resets monthly, no credit card required.

Get started free

250K saved tokens / month · Free forever · No credit card · Upgrade when ready