Free trial: 1M tokens saved

Your agent is sending way too many tools to the LLM.

When your agent has 30 tools and sends all of them on every call, the LLM gets confused and you pay for tokens that don't help. Orqen fixes this with one line for OpenAI-compatible clients, or a small tool-format adapter for native Anthropic and Bedrock agents.

83% fewer prompt tokens · <20ms added latency · one base URL swap

What's happening right now

Without Orqen

Your agent's message → LLM

Your agent's tools (all 51 of them) → LLM

4,469 prompt tokens (1st call)

→ LLM sees irrelevant tools

→ Higher cost. More confusion.

With Orqen

Your agent's message → Orqen → LLM

Your agent's tools → Orqen prunes to 1 → LLM

654 prompt tokens (1st call)

→ LLM sees only relevant tools

→ ~85% fewer tokens on that call. More accurate.

Real numbers from a live two-call agent (examples/bedrock_multi_tool_agent.py). May 2026. Model: Claude Haiku 4.5 via Bedrock. Orqen logs: 51 tools in → 1 out on round 1.

You're in the right place if

Your agent has 10+ tools and you send them all on every request

Your LLM costs are higher than you expected when you added tools

Your agent occasionally picks the wrong tool when given many options

You use OpenAI, Anthropic, Bedrock, Groq, or any OpenAI-compatible API

You're probably not a fit if

You have fewer than 5 tools (Orqen won't help much here)

You don't use tools at all (this is designed for tool-calling agents)

The integration

One line for OpenAI-compatible clients.

If your agent already speaks Chat Completions, your prompts and tools stay the same. Native Anthropic and Bedrock SDKs use different tool payloads, so the docs show the small mapping.

Python
# before
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
)

# after — swap the key and the base URL; everything else is identical
client = OpenAI(
    api_key="sk-orq-...",                 # your Orqen key
    base_url="https://api.orqen.app/v1",  # this is the only change
)

response = client.chat.completions.create(
    model="gpt-4o", messages=[...], tools=[...]
)
Using native Anthropic Messages or Bedrock Converse tool calls? Keep your provider model, map the tool payload into the OpenAI-compatible shape, then route it through Orqen. See provider migration examples.
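For native SDKs, the adapter is mostly a shape change. A minimal sketch of the mapping, assuming Anthropic's Messages tool format (`name` / `description` / `input_schema`) on the way in and the OpenAI Chat Completions function-tool shape on the way out; the `get_weather` tool here is a made-up example:

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Map an Anthropic Messages tool definition to the OpenAI
    Chat Completions shape: input_schema becomes function.parameters."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get(
                "input_schema", {"type": "object", "properties": {}}
            ),
        },
    }

# Example tool in Anthropic's format
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_tool = anthropic_tool_to_openai(anthropic_tool)
```

Tool results flow the other way (OpenAI `tool` role messages back into Anthropic `tool_result` blocks); the provider migration examples cover the full round trip.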

Works with every major provider and framework

OpenAI · Anthropic · AWS Bedrock · Google Gemini · Groq · Mistral · OpenRouter · LangChain · LlamaIndex

Under the hood

How Orqen decides which tools matter

01

Your agent calls Orqen instead of the LLM directly

Orqen sits in front of your LLM provider. OpenAI-compatible agents send the same messages, tools, and model name; native Anthropic or Bedrock agents send the equivalent OpenAI-compatible shape.

02

Orqen reads the user's message and your tool descriptions

Using a local embedding model, Orqen compares the semantic meaning of the user's query against each tool's description. This takes under 20ms.

03

Only the relevant tools go to the LLM

Orqen forwards the request with a pruned tool list — often a handful instead of dozens; in one live Bedrock run with 51 tools, only 1 was forwarded after routing. The LLM sees what it needs. Nothing more.
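The three steps above boil down to "embed, score, keep the top matches." Here is a toy sketch of that pruning loop; the hashing-based `toy_embed` stands in for a real local embedding model, and the function names and `top_k` default are illustrative, not Orqen's implementation:

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 512) -> list[float]:
    """Stand-in for a real embedding model: hash each word into a
    fixed-size bag-of-words vector, then L2-normalize. Toy only."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def prune_tools(query: str, tools: list[dict], top_k: int = 5) -> list[dict]:
    """Rank tools by similarity between the user's query and each
    tool's description; forward only the top_k."""
    q = toy_embed(query)
    scored = sorted(
        tools,
        key=lambda t: cosine(q, toy_embed(t["function"]["description"])),
        reverse=True,
    )
    return scored[:top_k]

tools = [
    {"function": {"name": "get_weather",
                  "description": "Get current weather for a city"}},
    {"function": {"name": "create_event",
                  "description": "Create a calendar event"}},
]
pruned = prune_tools("current weather in Paris", tools, top_k=1)
```

With a real embedding model the scores capture meaning rather than word overlap, but the shape of the loop is the same: one query embedding, one cached embedding per tool, one ranked cut.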

It gets smarter over time

Orqen tracks which tools the LLM actually calls after routing. Over thousands of requests, it learns which tools your users genuinely use for different query types — and weighs them more heavily. The routing improves automatically without any configuration from you.

Real numbers

From a live test, not a benchmark

We ran the same agent — 51 tools, real weather via Open-Meteo — against Bedrock Claude Haiku directly, then through Orqen. Same model. Same question. Two upstream calls each; very different prompt-token totals.

9,235

Prompt tokens (direct)

both calls; all 51 tools each time

1,605

Prompt tokens (via Orqen)

both calls; down to 1 tool after routing (round 1)

7,630

Prompt tokens saved

83% fewer vs direct (same two calls)

What that means at scale

100 runs/day · GPT-4o

~$1.90/day · ~$695/year

1,000 runs/day · GPT-4o

~$19/day · ~$6,950/year

10,000 runs/day · GPT-4o

~$190/day · ~$69,500/year

Scale rows use GPT-4o input pricing ($2.50/1M) on ~7.6k prompt tokens saved per two-call run (the live Bedrock test above). Actual savings depend on your tools, models, and traffic.
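The arithmetic behind those rows, using the numbers quoted above (7,630 prompt tokens saved per two-call run, $2.50 per 1M input tokens for GPT-4o):

```python
def daily_savings_usd(runs_per_day: int,
                      tokens_saved_per_run: int = 7_630,
                      price_per_million_usd: float = 2.50) -> float:
    """Dollars saved per day: tokens saved times input price."""
    return runs_per_day * tokens_saved_per_run * price_per_million_usd / 1_000_000

# 100 runs/day  -> $1.9075/day; 1,000 -> $19.075/day; 10,000 -> $190.75/day
savings = {runs: daily_savings_usd(runs) for runs in (100, 1_000, 10_000)}
```

Swap in your own tokens-saved-per-run figure; it is the only input that varies with your tool set.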

Pricing

Only pay for what Orqen actually saves you

Start with 1M saved tokens and clear daily/weekly caps. Pro removes the saved-token cap, keeps rate limits transparent, and bills only for the tokens Orqen actually saves.

Free

$0

trial

For proving Orqen works safely before production traffic.

  • 1M tokens saved included
  • 75k saved-token daily cap
  • 500k saved-token weekly cap
  • Semantic tool routing
  • Tool health dashboard
  • Works with all providers
  • Community support
Get started free
Most popular

Pro

$0.20

per 1M tokens saved

No hard saved-token cap

For production agents. Pay only for the value Orqen creates.

  • Everything in Free
  • No hard saved-token cap
  • Higher rate limits
  • Per-key token budgets
  • Intelligent model routing
  • Email support
  • Transparent usage dashboard
Start for free, upgrade when ready

Enterprise

Custom

volume pricing

Dedicated infrastructure and custom contracts.

  • Everything in Pro
  • Custom pricing
  • Dedicated deployment
  • SSO / SAML
  • Dedicated success manager
Talk to sales

Questions

Before you decide

What if Orqen goes down?

Your agents keep running. Orqen has a transparent fallback — if our service is unreachable, requests pass through directly to your LLM provider. We never block your agent.

Do I need to change my agent code?

For OpenAI-compatible clients, change the base_url and swap your LLM API key for an Orqen key. For native Anthropic or Bedrock SDKs, add a small request-shape adapter for messages, tools, and tool results.

What if Orqen picks the wrong tools?

We track recall@K per request — whether every tool the LLM actually used was in the pruned set. If Orqen misses a tool, it shows in your dashboard. The default K is 5–8 tools; you can raise it per API key if your agent needs more.
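Recall@K here is a simple per-request metric: of the tools the LLM actually called, what fraction were in the pruned set? A minimal sketch (the function name and the "no calls means nothing missed" convention are our assumptions for illustration):

```python
def recall_at_k(forwarded: list[str], called: list[str]) -> float:
    """Fraction of tools the LLM actually called that were present in
    the pruned (forwarded) set. 1.0 means routing missed nothing."""
    if not called:
        return 1.0  # the LLM needed no tools, so nothing was missed
    hits = sum(1 for tool in called if tool in forwarded)
    return hits / len(called)

# Routing kept both tools the LLM ended up using: perfect recall.
perfect = recall_at_k(["search", "weather", "email"], ["search", "weather"])
# Routing dropped one of the two tools the LLM wanted: recall 0.5.
missed = recall_at_k(["search"], ["search", "weather"])
```

A sustained dip in this number is the signal to raise K for that API key.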

Do I need to store my OpenAI/Anthropic keys with Orqen?

You store them once in your dashboard. They're encrypted with AES-128 before being written to our database. Each request decrypts them in memory to forward to your provider.

What's the actual latency cost?

Under 20ms on a warm cache. Embedding runs locally (no external API calls), and tool embeddings are cached permanently after they're first seen. The first request for a new tool set takes ~8ms for batch embedding; subsequent requests are typically <5ms.

Does it work with tool-calling frameworks like LangChain or LlamaIndex?

Yes. Orqen is OpenAI-compatible — if your framework calls the OpenAI API, it works through Orqen. LangChain, LlamaIndex, Haystack, and any custom agent all work without changes.

Your next agent request sends 30 tools to the LLM.
It only needs 5.

Sign up, connect your provider keys, route your agent through Orqen. Your free trial includes 1M saved tokens with sensible daily and weekly caps — no credit card required.

Get started free

1M saved-token trial · No credit card · Upgrade when ready