Your agent asked for update_invoice. The model tried to call it. But your router only forwarded search_invoices and list_customers. The tool call failed — or the model hallucinated a workaround. The user sends the same message again. Turn 48 begins.
This is a recall miss: the optimization layer removed a tool the model still needed for this turn. Per-turn routing saves tokens, but when recall drops below 1.0, agents loop, users retry, and the savings evaporate into wasted upstream calls.
What a recall miss looks like
Tool routing narrows a large catalog to a small subset each turn. Orqen might receive 51 MCP tools and forward 4 that match the current intent. That is usually correct — most turns need a handful of tools, not the full catalog.
A recall miss happens when the model's actual tool call is not in the forwarded set:
- HTTP 200 with a bad tool call. The model picks a tool name that was pruned. Some providers return a validation error; others let the model invent arguments for a tool it cannot execute.
- Wrong-tool workaround. The model uses a nearby tool (search_invoices instead of update_invoice) and produces a plausible but wrong answer.
- User retry. The user re-sends "no, update it", a signal that the previous turn failed even if no explicit error surfaced.
Recall misses are different from upstream errors. A 503 from your provider is not Orqen's routing fault. Session recovery classifies error types separately so widening only fires when pruning likely caused the failure.
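One way to picture that separation is a small classifier that blames routing only when a called tool was missing from the forwarded set. This is an illustrative sketch, not Orqen's implementation: `classify_turn`, the `TurnOutcome` labels, and the status-code rule are all assumptions made for the example.

```python
from enum import Enum


class TurnOutcome(Enum):
    OK = "ok"
    RECALL_MISS = "recall_miss"        # a called tool had been pruned
    UPSTREAM_ERROR = "upstream_error"  # provider-side failure, not routing


def classify_turn(status_code, called_tools, forwarded_tools):
    """Hypothetical classifier: attribute a failure to pruning only
    when the model called a tool that was not forwarded."""
    # A 5xx from the provider says nothing about the pruned set.
    if status_code >= 500:
        return TurnOutcome.UPSTREAM_ERROR
    # Any called tool missing from the forwarded set is a recall miss,
    # even when the HTTP status is 200.
    missing = set(called_tools) - set(forwarded_tools)
    if missing:
        return TurnOutcome.RECALL_MISS
    return TurnOutcome.OK
```

Under these assumptions, a 503 never triggers widening, while an HTTP 200 that carries a call to a pruned tool does.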
Why agents loop after a miss
Without recovery, a recall miss poisons the next several turns:
- The model still "remembers" it wanted update_invoice from context, but the tool is absent from the schema list.
- The user clarifies ("now update it"), but if routing only reads the last message, the router may still miss the dependency on the prior get_invoice call.
- Each retry resends a large history plus the full tool catalog upstream, burning tokens without progress.
The fix is not "never prune." It is to measure recall, recover fast, and calibrate K so that misses are rare and self-heal when they happen. See why multi-turn routing context matters for the follow-up misroute pattern.
Measuring routing with recall@K
Orqen computes recall@K on every tool-using response: the fraction of tools the model actually called that were present in the pruned set forwarded upstream.
# recall@K = |called_tools ∩ pruned_tools| / |called_tools|
#
# Example: model called ["get_weather", "format_report"]
# Orqen forwarded ["get_weather", "search_files", "format_report"]
# recall@K = 2/2 = 1.0 ✓
#
# Example: model called ["update_invoice"]
# Orqen forwarded ["search_invoices", "list_customers"]
# recall@K = 0/1 = 0.0 → recall miss

| recall@K | Meaning | Typical action |
|---|---|---|
| 1.0 | All called tools were forwarded | Healthy — keep K |
| 0.5 | Half of called tools were missing | Recovery widens next turn |
| 0.0 | Every called tool was pruned | Strong recovery + dashboard alert |
| NULL | No tool calls this turn | Not scored |
Recall@K is stored per request in the dashboard alongside tools_in → tools_out and which tools were called. Aggregated over a session, it tells you whether your routing window (K) is too aggressive for that workflow.
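The formula above is easy to reproduce locally if you want to check your own logs. The sketch below is a minimal version under stated assumptions: the function names are hypothetical, repeated calls to the same tool are counted once, and the session aggregate is a plain mean over scored turns (the article does not say how Orqen aggregates).

```python
from typing import Iterable, Optional


def recall_at_k(called: Iterable[str], forwarded: Iterable[str]) -> Optional[float]:
    """Fraction of called tools that were present in the forwarded set.

    Returns None when no tools were called (the NULL row in the table).
    Duplicate calls to the same tool are counted once.
    """
    called_set = set(called)
    if not called_set:
        return None
    return len(called_set & set(forwarded)) / len(called_set)


def session_recall(turns) -> Optional[float]:
    """Mean recall over scored turns; turns with no tool calls are skipped.

    `turns` is an iterable of (called_tools, forwarded_tools) pairs.
    Using the mean is an assumption, not Orqen's documented aggregate.
    """
    scores = [s for s in (recall_at_k(c, f) for c, f in turns) if s is not None]
    return sum(scores) / len(scores) if scores else None
```

Run against the two examples above, `recall_at_k(["get_weather", "format_report"], ["get_weather", "search_files", "format_report"])` gives 1.0 and `recall_at_k(["update_invoice"], ["search_invoices", "list_customers"])` gives 0.0.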
Session recovery: widen, boost, retry
When Orqen detects a recall miss, it writes short-lived session signals. The next turn's optimization plan consumes them:
# After a recall miss, Orqen stores short-lived session signals:
# which tool was missed
# which tools were pruned at error time
#
# Next turn's plan may:
# widen the routing window
# boost previously removed tools
# run extra intent analysis after repeated misses
# keep more raw history while recovering

- Wider routing window. Borderline tools re-enter the candidate set. Repeated misses escalate the widening.
- Tool boost. The missed tool and tools pruned at error time get score bonuses on the next turn.
- Extra intent analysis. After repeated pruning-related errors, Orqen may run enrichment to disambiguate vague follow-ups.
- Conservative compression. Aggressive context assembly pauses while the session recovers — you keep more raw history until routing stabilizes.
- Decay on success. After consecutive successful turns, recovery signals fade so token savings resume.
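The widen, boost, and decay behaviors listed above can be sketched as a small piece of per-session state. Everything here is hypothetical: the `RecoveryState` class, the base window of 4, the widening step, the score bonus, and the two-success decay threshold are illustrative values, not Orqen's actual parameters.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    """Hypothetical per-session recovery signals (all values are assumptions)."""
    base_k: int = 4
    miss_streak: int = 0
    success_streak: int = 0
    boosted: set = field(default_factory=set)

    def record_miss(self, missed, pruned):
        # Repeated misses escalate widening; the missed tool and the
        # tools pruned at error time are marked for a score boost.
        self.miss_streak += 1
        self.success_streak = 0
        self.boosted |= set(missed) | set(pruned)

    def record_success(self, decay_after=2):
        # After consecutive successful turns, signals fade so savings resume.
        self.success_streak += 1
        if self.success_streak >= decay_after:
            self.miss_streak = 0
            self.boosted.clear()

    def next_k(self, step=2, max_k=16):
        # Widen the window by `step` per consecutive miss, capped at max_k.
        return min(self.base_k + step * self.miss_streak, max_k)

    def score(self, tool, base_score, bonus=0.2):
        # Boosted tools get a flat bonus on top of their routing score.
        return base_score + (bonus if tool in self.boosted else 0.0)
```

In this sketch, one miss moves the window from 4 to 6, a second consecutive miss moves it to 8, and two successful turns in a row reset it to 4 and clear the boosts.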
Recovery is automatic. No SDK changes required. Route through Orqen, run a real multi-tool session, and watch recall@K in Usage — misses should trigger visible widening on the next turn.
Fail-open when routing is uncertain
Orqen's default posture is fail-open on infrastructure:
- If session storage is unavailable, recovery returns empty signals — requests proceed normally, not blocked.
- If the embedder or reranker times out, Stage 1 scores still forward a pruned set; the request never hard-fails because routing hiccuped.
- Small tool sets pass through untouched — no risk of over-pruning a 3-tool agent.
- Free-tier passthrough (monthly savings cap hit) disables optimization but still forwards requests — agents keep running.
Fail-open means optimization is best-effort acceleration, not a single point of failure. It does not mean Orqen sends the full catalog on every turn — pruning still runs. Recovery widens the window after a measured miss, not preemptively.
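A fail-open path of this kind might look like the following sketch. The function names, the `min_catalog` threshold, and the dictionary-based scores are assumptions for illustration; the point is only that each failure degrades to a cheaper behavior and pruning still runs.

```python
def load_signals_fail_open(load_fn, session_id):
    """Return recovery signals, or empty signals if storage is unavailable."""
    try:
        return load_fn(session_id)
    except Exception:
        # Fail open: empty signals mean the request proceeds with
        # normal routing instead of being blocked.
        return {}


def route_fail_open(tools, stage1_scores, rerank_fn, k, min_catalog=5):
    """Hypothetical router: fall back to Stage 1 scores if reranking times out."""
    # Small catalogs pass through untouched.
    if len(tools) <= min_catalog:
        return list(tools)
    try:
        scores = rerank_fn(tools, stage1_scores)
    except TimeoutError:
        # Still prune, using the cheaper Stage 1 scores.
        scores = stage1_scores
    ranked = sorted(tools, key=lambda t: scores.get(t, 0.0), reverse=True)
    return ranked[:k]
```

Note that the timeout branch does not return the full catalog: it forwards a pruned set built from whatever scores are available, which matches the distinction drawn above between failing open and disabling pruning.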
Measuring routing quality over time
Recall@K measures what happened after the fact. Orqen also stores privacy-preserving routing metadata — candidate tool names, policy version, alternate variants — so you can compare policies offline without logging user prompts.
Pro customers can opt into shadow comparison calls that run after the response delivers (zero latency impact on the user). Those calls help calibrate routing — they are an internal quality signal, not something your agent depends on at runtime.
Together, live recall@K + session recovery + offline eval form a closed loop: measure misses, heal the session, tune descriptions and routing policy over time. For tool sprawl context, see MCP Gave Your Agent 50 Tools.
Check recall on your agent
If you route tools per turn and see user retries or mystery tool errors:
- Create a free Orqen account and point your SDK at https://api.orqen.app.
- Run a multi-tool session with your full MCP or function catalog.
- Open Usage and filter for requests with recall_at_k below 1.0, noting which tools were called vs forwarded.
- Retry the failing turn and confirm K widens (more tools_out on the recovery turn).
Next step: Sign up free · Sessions docs · Introducing Orqen