Your agent is sending way too many tools to the LLM.
When your agent has 30 tools and sends all of them on every call, the LLM gets confused and you pay for tokens that don't help. Orqen fixes this with one line for OpenAI-compatible clients, or a small tool-format adapter for native Anthropic and Bedrock agents.
What's happening right now
Without Orqen
Your agent's message → LLM
Your agent's tools (all 51 of them) → LLM
→ 4,469 prompt tokens (1st call)
→ LLM sees irrelevant tools
→ Higher cost. More confusion.
With Orqen
Your agent's message → Orqen → LLM
Your agent's tools → Orqen prunes to 1 → LLM
→ 654 prompt tokens (1st call)
→ LLM sees only relevant tools
→ ~85% fewer tokens on that call. More accurate.
Real numbers from a live two-call agent (examples/bedrock_multi_tool_agent.py). May 2026. Model: Claude Haiku 4.5 via Bedrock. Orqen logs: 51 tools in → 1 out on round 1.
You're in the right place if
Your agent has 10+ tools and you send them all on every request
Your LLM costs are higher than you expected when you added tools
Your agent occasionally picks the wrong tool when given many options
You use OpenAI, Anthropic, Bedrock, Groq, or any OpenAI-compatible API
You're probably not the audience if
You have fewer than 5 tools (Orqen won't help much here)
You don't use tools at all (this is designed for tool-calling agents)
The integration
One line for OpenAI-compatible clients.
If your agent already speaks Chat Completions, your prompts and tools stay the same. Native Anthropic and Bedrock SDKs use different tool payloads, so the docs show the small mapping.
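For an OpenAI-compatible client, the change looks roughly like this. A minimal sketch: the endpoint URL and the `ORQEN_API_KEY` environment variable name are illustrative placeholders, not confirmed values — check your dashboard for the real ones.

```python
# Illustrative sketch of the one-line integration for OpenAI-compatible
# clients. The base_url and env var name below are placeholders.
import os
from openai import OpenAI  # assumes the official openai package

client = OpenAI(
    base_url="https://api.orqen.example/v1",          # was: the default OpenAI endpoint
    api_key=os.environ.get("ORQEN_API_KEY", "sk-…"),  # was: your OPENAI_API_KEY
)

# Everything else stays the same — messages, tools, and model name
# pass through unchanged:
# client.chat.completions.create(model=..., messages=..., tools=...)
```

Your existing `chat.completions.create` calls don't change; only the client construction does.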
Works with every major provider and framework
Under the hood
How Orqen decides which tools matter
Your agent calls Orqen instead of the LLM directly
Orqen sits in front of your LLM provider. OpenAI-compatible agents send the same messages, tools, and model name; native Anthropic or Bedrock agents send the equivalent OpenAI-compatible shape.
Orqen reads the user's message and your tool descriptions
Using a local embedding model, Orqen compares the semantic meaning of the user's query against each tool's description. This takes under 20ms.
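The idea can be sketched in a few lines. This is a toy illustration of semantic pruning, not Orqen's actual code — the bag-of-words `embed()` stands in for a real local embedding model, and the tool names are made up.

```python
# Toy sketch of semantic tool pruning: embed the query and each tool
# description, keep the top-K tools by cosine similarity.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": word counts. A real router uses a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prune_tools(query: str, tools: dict, k: int = 5) -> list:
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]

tools = {
    "get_weather": "fetch the current weather forecast for a city",
    "send_email": "send an email to a recipient",
    "query_db": "run a SQL query against the analytics database",
}
print(prune_tools("what's the weather forecast in Paris?", tools, k=1))
# → ['get_weather']
```

The real system swaps the toy embedding for a proper model; the ranking-and-truncation step is the same shape.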
Only the relevant tools go to the LLM
Orqen forwards the request with a pruned tool list — often a handful instead of dozens; in one live Bedrock run with 51 tools, only 1 was forwarded after routing. The LLM sees what it needs. Nothing more.
It gets smarter over time
Orqen tracks which tools the LLM actually calls after routing. Over thousands of requests, it learns which tools your users genuinely use for different query types — and weighs them more heavily. The routing improves automatically without any configuration from you.
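One way this kind of feedback loop can work — a hypothetical sketch, not Orqen's published algorithm; the blending weight and smoothing are assumptions — is to blend semantic similarity with an observed call rate:

```python
# Hypothetical usage-aware re-ranking: tools the LLM actually called after
# routing earn a prior that nudges future rankings. Weights are illustrative.
from collections import defaultdict

class ToolStats:
    def __init__(self, alpha: float = 0.8):
        self.alpha = alpha                 # weight on semantic similarity
        self.routed = defaultdict(int)     # times a tool was forwarded
        self.called = defaultdict(int)     # times the LLM actually used it

    def record(self, tool: str, was_called: bool) -> None:
        self.routed[tool] += 1
        if was_called:
            self.called[tool] += 1

    def score(self, tool: str, semantic: float) -> float:
        # Laplace-smoothed call rate acts as a learned prior.
        rate = (self.called[tool] + 1) / (self.routed[tool] + 2)
        return self.alpha * semantic + (1 - self.alpha) * rate

stats = ToolStats()
for _ in range(10):
    stats.record("get_weather", was_called=True)
    stats.record("send_email", was_called=False)

# Equal semantic similarity — the usage prior breaks the tie:
print(stats.score("get_weather", 0.5) > stats.score("send_email", 0.5))  # → True
```

With no history, the prior is neutral (rate = 0.5), so routing starts out purely semantic and sharpens as call data accumulates.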
Real numbers
From a live test, not a benchmark
We ran the same agent — 51 tools, real weather via Open-Meteo — against Bedrock Claude Haiku directly, then through Orqen. Same model. Same question. Two upstream calls each; very different prompt-token totals.
9,235
Prompt tokens (direct)
both calls; all 51 tools each time
1,605
Prompt tokens (via Orqen)
both calls; down to 1 tool after routing (round 1)
7,630
Prompt tokens saved
83% fewer vs direct (same two calls)
What that means at scale
100 calls/day · GPT-4o
~$1.90/day · ~$695/year
1,000 calls/day · GPT-4o
~$19/day · ~$6,950/year
10,000 calls/day · GPT-4o
~$190/day · ~$69,500/year
Scale rows use GPT-4o input pricing ($2.50/1M) on ~7.6k prompt tokens saved per two-call run (the live Bedrock test above). Actual savings depend on your tools, models, and traffic.
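The scale rows above fall out of simple arithmetic on the stated inputs (~7,630 saved tokens per run, $2.50 per 1M input tokens):

```python
# Reproducing the scale table from its stated inputs.
PRICE_PER_TOKEN = 2.50 / 1_000_000  # GPT-4o input pricing, $ per token
SAVED_PER_RUN = 7_630               # prompt tokens saved per two-call run

for runs_per_day in (100, 1_000, 10_000):
    daily = runs_per_day * SAVED_PER_RUN * PRICE_PER_TOKEN
    print(f"{runs_per_day:>6} calls/day: ~${daily:.2f}/day · ~${daily * 365:,.0f}/year")
```

This lands within rounding of the table: ~$1.91/day at 100 calls/day, ~$190.75/day at 10,000.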
Pricing
Only pay for what Orqen actually saves you
Start with 1M saved tokens and clear daily/weekly caps. Pro removes the saved-token cap (standard rate limits still apply, transparently) and bills only for tokens Orqen actually saves.
Free
$0
trial
For proving Orqen works safely before production traffic.
- 1M tokens saved included
- 75k saved-token daily cap
- 500k saved-token weekly cap
- Semantic tool routing
- Tool health dashboard
- Works with all providers
- Community support
Pro
$0.20
per 1M tokens saved
No hard saved-token cap
For production agents. Pay only for the value Orqen creates.
- Everything in Free
- No hard saved-token cap
- Higher rate limits
- Per-key token budgets
- Intelligent model routing
- Email support
- Transparent usage dashboard
Enterprise
Custom
volume pricing
Dedicated infrastructure and custom contracts.
- Everything in Pro
- Custom pricing
- Dedicated deployment
- SSO / SAML
- Dedicated success manager
Questions
Before you decide
What if Orqen goes down?
Your agents keep running. Orqen has a transparent fallback — if our service is unreachable, requests pass through directly to your LLM provider. We never block your agent.
Do I need to change my agent code?
For OpenAI-compatible clients, change the base_url and swap your LLM API key for an Orqen key. For native Anthropic or Bedrock SDKs, add a small request-shape adapter for messages, tools, and tool results.
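The tool-shape part of that adapter is a small mapping. The field names below follow the two public formats (Anthropic's `input_schema` vs OpenAI's `function.parameters`); the helper itself is an illustrative sketch, not shipped Orqen code.

```python
# Minimal sketch: convert an Anthropic/Bedrock-style tool definition
# into the OpenAI Chat Completions tool shape.
def anthropic_tool_to_openai(tool: dict) -> dict:
    # Anthropic: {"name", "description", "input_schema"}
    # OpenAI:    {"type": "function", "function": {"name", "description", "parameters"}}
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }

tool = {
    "name": "get_weather",
    "description": "Fetch the forecast for a city",
    "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
}
print(anthropic_tool_to_openai(tool)["function"]["name"])  # → get_weather
```

Messages and tool results need an equivalent mapping in each direction; the docs cover those.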
What if Orqen picks the wrong tools?
We track recall@K per request — whether every tool the LLM actually used was in the pruned set. If Orqen misses a tool, it shows in your dashboard. The default K is 5–8 tools; you can raise it per API key if your agent needs more.
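The metric itself is simple. An illustrative sketch of the recall@K check described above (not Orqen's actual code): recall is 1.0 when every tool the LLM called was in the pruned set.

```python
# recall@K: what fraction of the tools the LLM actually called
# survived pruning into the K-tool set sent upstream?
def recall_at_k(pruned: set, used: set) -> float:
    if not used:
        return 1.0  # no tools called, nothing to miss
    return len(used & pruned) / len(used)

pruned = {"get_weather", "send_email", "query_db"}        # K = 3 forwarded tools
print(recall_at_k(pruned, {"get_weather"}))               # → 1.0 (no miss)
print(recall_at_k(pruned, {"get_weather", "translate"}))  # → 0.5 ("translate" was pruned away)
```

Anything below 1.0 on a request means the router dropped a tool the LLM wanted — exactly the case the dashboard flags.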
Do I need to store my OpenAI/Anthropic keys with Orqen?
You store them once in your dashboard. They're encrypted with AES-128 before being written to our database. Each request decrypts them in memory to forward to your provider.
What's the actual latency cost?
Under 20ms on a warm cache. Embedding runs locally (no external API calls), and tool embeddings are cached permanently after they're first seen. The first request for a new tool set takes ~8ms for batch embedding; subsequent requests are typically under 5ms.
Does it work with tool-calling frameworks like LangChain or LlamaIndex?
Yes. Orqen is OpenAI-compatible — if your framework calls the OpenAI API, it works through Orqen. LangChain, LlamaIndex, Haystack, and any custom agent all work without changes.
Your next agent request sends 30 tools to the LLM.
It only needs 5.
Sign up, connect your provider keys, route your agent through Orqen. Your free trial includes 1M saved tokens with sensible daily and weekly caps — no credit card required.
Get started free
1M saved-token trial · No credit card · Upgrade when ready