Embeddings Aren't Enough for Agent Tool Selection

"Book a flight to Oslo." Embedding similarity finds search_flights and get_airports. It misses process_payment: booking a flight requires payment, but the two phrases share almost no wording, so their embeddings sit far apart.

Single-stage embedding routers are fast and good enough for direct matches. They fail on indirect dependencies — tool chains where step two is implied but not stated. That is exactly where agent workflows live.

Where embeddings stop

An embedding router compares two independently computed vectors: one for the routing query and one for each tool description. Cosine similarity is symmetric and measures lexical and semantic overlap, not causal structure, so it cannot tell that one tool is a prerequisite for another.
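The comparison fits in a few lines of Python. This is a minimal illustration, not Orqen's code: the three-dimensional vectors are hand-picked toy values standing in for real embeddings, and `cosine` is a hypothetical helper. Each tool is scored against the query on its own, so nothing in the computation links search_flights to process_payment.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two independently computed vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not real embeddings).
query = [0.9, 0.1, 0.0]            # "Book a flight to Oslo."
search_flights = [0.8, 0.2, 0.1]   # close to the query's wording
process_payment = [0.1, 0.1, 0.9]  # needed, but phrased very differently

# Every tool is scored against the query in isolation.
scores = {
    "search_flights": cosine(query, search_flights),
    "process_payment": cosine(query, process_payment),
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")

# Symmetric: swapping the arguments gives the same score.
assert cosine(query, process_payment) == cosine(process_payment, query)
```

The low score for process_payment is not a bug in the math; the vectors simply carry no information about the booking-requires-payment relationship.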

Direct match: "weather in Oslo" → get_weather ✓
Sibling confusion: "search files" vs "search database" have similar embeddings, so the router can pick the wrong tool
Indirect chain: "generate invoice" → needs lookup_customer first — not obvious from embeddings alone
Negative selection: "send email" should not pull in get_weather — embeddings cannot express exclusion well

Stage 1 alone works for narrow catalogs and first-turn queries. At 50+ tools with multi-step intents, you need a second pass that reads query and candidate together — cross-attention, not cosine distance.

Indirect tool dependencies

User intent	Stage 1 picks	Also needed
Book a flight	search_flights	process_payment, select_seat
Generate invoice	create_invoice	lookup_customer, get_line_items
Deploy to prod	run_deploy	check_approval, run_tests
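The table can be scored directly. A minimal sketch, assuming Stage 1 surfaces only the tool in its column: the dictionary is the table as data, and `chain_recall` (a hypothetical helper) computes the share of needed tools Stage 1 found. Every row comes out at one in three.

```python
# The table above as data: intent -> (Stage 1 picks, also needed).
ROUTING_CASES: dict[str, tuple[set[str], set[str]]] = {
    "Book a flight": ({"search_flights"}, {"process_payment", "select_seat"}),
    "Generate invoice": ({"create_invoice"}, {"lookup_customer", "get_line_items"}),
    "Deploy to prod": ({"run_deploy"}, {"check_approval", "run_tests"}),
}

def chain_recall(picked: set[str], also_needed: set[str]) -> float:
    """Fraction of the tools a task needs that Stage 1 actually surfaced."""
    needed = picked | also_needed
    return len(picked & needed) / len(needed)

for intent, (picked, extra) in ROUTING_CASES.items():
    missing = sorted(extra - picked)
    print(f"{intent}: recall {chain_recall(picked, extra):.2f}, missing {missing}")
```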

Multi-turn context helps Stage 1 (see multi-turn routing), but it cannot infer domain chains that never appear in the conversation text. Stage 2 closes that gap.

Stage 1: fast recall

Orqen runs semantic routing inside its own infrastructure — no extra API keys on your side for the default path. It embeds:

Multi-turn routing context (system domain, recent user messages, tools already called — not just the last message)
Each tool's description, parameter docs, and optional x-orqen-examples when present
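The two inputs might be assembled into text roughly like this before embedding. Everything here is a hypothetical sketch: the `Tool` fields, the line-based template, and the three-message window are assumptions for illustration, not Orqen's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    description: str
    params: dict[str, str] = field(default_factory=dict)  # parameter name -> doc
    examples: list[str] = field(default_factory=list)     # e.g. x-orqen-examples, if present

def routing_text(system_domain: str, user_messages: list[str],
                 tools_called: list[str], window: int = 3) -> str:
    """Routing context: domain, the last few user turns, and tools already called."""
    parts = [f"domain: {system_domain}"]
    parts += [f"user: {m}" for m in user_messages[-window:]]
    if tools_called:
        parts.append("called: " + ", ".join(tools_called))
    return "\n".join(parts)

def tool_text(tool: Tool) -> str:
    """Tool side: description, parameter docs, and optional examples."""
    parts = [f"{tool.name}: {tool.description}"]
    parts += [f"param {p}: {doc}" for p, doc in tool.params.items()]
    parts += [f"example: {e}" for e in tool.examples]
    return "\n".join(parts)

payment = Tool(
    name="process_payment",
    description="Charges card for booking",
    params={"amount": "Total to charge"},
    examples=["pay for the flight"],
)
print(routing_text("travel", ["hi", "book a flight", "to Oslo", "for two"], ["search_flights"]))
print(tool_text(payment))
```

Including tools already called is what lets a later turn ("now pay for it") land near process_payment even when the message itself is short.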

Stage 1 narrows large catalogs to a short candidate list quickly. Tool schemas are cached by content hash — static MCP catalogs are not re-processed every turn.

Session hints and intent scoring adjust ranks before candidates pass to Stage 2. Orqen measures recall@K after the model responds — if routing misses too often, session recovery widens the window (see recall misses).
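A minimal sketch of the measurement and one possible recovery rule. It assumes the set of tools the model called (or tried to call) is known once the response arrives; `next_window`, its threshold, and its step size are hypothetical, since the text does not say how Orqen widens the window.

```python
def recall_at_k(ranked: list[str], called: set[str], k: int) -> float:
    """Share of the tools the model called that appeared in the top k routed candidates."""
    if not called:
        return 1.0  # the model called no tools, so routing missed nothing
    return len(called & set(ranked[:k])) / len(called)

def next_window(k: int, recent: list[float], threshold: float = 0.9,
                step: int = 5, max_k: int = 50) -> int:
    """Hypothetical recovery rule: widen K when average recent recall drops below threshold."""
    if recent and sum(recent) / len(recent) < threshold:
        return min(k + step, max_k)
    return k

ranked = ["search_flights", "get_airports", "select_seat", "process_payment"]
r = recall_at_k(ranked, {"search_flights", "process_payment"}, k=3)
print(r)                          # 0.5: process_payment ranked fourth, outside K=3
print(next_window(10, [r, 1.0]))  # 15: average recall 0.75 is below 0.9
```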

Stage 2: reranking pass

For large catalogs and multi-step intents, Orqen can run a second pass that reads the query and each Stage 1 candidate together — cross-attention, not cosine distance. It sees "book flight" and "process_payment: charges card for booking" in the same context window.

# Stage 1 — fast semantic recall
#   Embed routing context + tool descriptions
#   Narrow a large catalog to a short candidate list quickly
#   Strength: fast, cheap, good on direct matches
#   Weakness: "book flight" and "process_payment" sit far apart in embedding space
#
# Stage 2 — optional reranking pass
#   Reads query and candidate tools together
#   Better at indirect dependencies, negative selection, domain chains
#   Falls back to Stage 1 if unavailable or slow — request still succeeds
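The outline above can be turned into a runnable skeleton. This is a sketch under stated assumptions, not Orqen's pipeline: tools are pre-embedded as plain vectors, `pair_score` stands in for a model that reads query and tool text together (a cross-encoder, for example), and the names `stage1`, `stage2`, `route`, `k`, and `keep` are hypothetical. Stage 1 keeps a wider candidate list than the agent will see, so Stage 2 can promote a tool that Stage 1 ranked low.

```python
import math
from typing import Callable, Optional

Vector = list[float]
PairScore = Callable[[str, str], float]  # (query, tool description) -> relevance

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def stage1(query_vec: Vector, tool_vecs: dict[str, Vector], k: int) -> list[str]:
    """Fast recall: score each tool independently by cosine, keep the top k."""
    return sorted(tool_vecs, key=lambda t: cosine(query_vec, tool_vecs[t]), reverse=True)[:k]

def stage2(query: str, candidates: list[str], docs: dict[str, str],
           pair_score: PairScore) -> list[str]:
    """Rerank: score each (query, tool) pair jointly instead of comparing two vectors."""
    return sorted(candidates, key=lambda t: pair_score(query, docs[t]), reverse=True)

def route(query: str, query_vec: Vector, tool_vecs: dict[str, Vector], docs: dict[str, str],
          k: int, keep: int, pair_score: Optional[PairScore] = None) -> list[str]:
    """Two-stage routing: wide recall, optional rerank, then prune to `keep` tools."""
    candidates = stage1(query_vec, tool_vecs, k)
    if pair_score is not None:
        candidates = stage2(query, candidates, docs, pair_score)
    return candidates[:keep]

tool_vecs = {
    "search_flights": [1.0, 0.0],
    "get_airports": [0.9, 0.1],
    "process_payment": [0.5, 0.5],
    "get_weather": [0.0, 1.0],
}
docs = {
    "search_flights": "finds flights between airports",
    "get_airports": "lists airports near a city",
    "process_payment": "charges card for booking",
    "get_weather": "current weather for a city",
}

def toy_pair_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: a hand-written rule, for illustration only.
    if "book" in query and "booking" in doc:
        return 0.9
    return 1.0 if "flights" in doc else 0.1

q = "book a flight to Oslo"
print(route(q, [1.0, 0.0], tool_vecs, docs, k=3, keep=2))
# ['search_flights', 'get_airports']
print(route(q, [1.0, 0.0], tool_vecs, docs, k=3, keep=2, pair_score=toy_pair_score))
# ['search_flights', 'process_payment']
```

The design choice that matters is `k > keep`: reranking can only reorder what Stage 1 recalled, so a tool outside the Stage 1 window is unreachable no matter how good the second pass is.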

Stage 2 uses Orqen's internal infrastructure — not your provider key. It is invisible to the agent; only the final pruned tool list changes.

Latency tradeoff: a reranking pass adds overhead on top of the upstream LLM call. Orqen enables it only when tool count and session context justify the cost — not on every request.

Fail-open by design

Stage 2 is an enhancement, not a dependency. If anything fails — timeout, provider outage, missing internal capacity — the pipeline returns Stage 1 rankings unchanged. The request still succeeds; your agent never sees a routing error.
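Fail-open behavior can be sketched with a timeout around the optional pass. A minimal version using Python's standard `concurrent.futures`; `rerank` is any hypothetical function from a candidate list to a reordered list, and the timeout value is an assumption. Any exception, including the timeout, returns the Stage 1 ranking unchanged.

```python
import concurrent.futures
import time
from typing import Callable

Rerank = Callable[[list[str]], list[str]]

# Shared worker pool for optional reranking calls.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def rerank_fail_open(stage1_ranking: list[str], rerank: Rerank, timeout_s: float) -> list[str]:
    """Try the optional Stage 2 pass; on timeout or any error, keep Stage 1's ranking."""
    future = _pool.submit(rerank, list(stage1_ranking))
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers the timeout and any error raised inside rerank (outage, bad response).
        # cancel() only helps if the task has not started; a running call is abandoned.
        future.cancel()
        return stage1_ranking

def slow(candidates: list[str]) -> list[str]:
    time.sleep(0.5)
    return list(reversed(candidates))

def broken(candidates: list[str]) -> list[str]:
    raise RuntimeError("reranker unavailable")

ranking = ["search_flights", "get_airports", "process_payment"]
print(rerank_fail_open(ranking, lambda c: list(reversed(c)), timeout_s=1.0))  # reranked
print(rerank_fail_open(ranking, slow, timeout_s=0.05))    # Stage 1 ranking
print(rerank_fail_open(ranking, broken, timeout_s=1.0))   # Stage 1 ranking
```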

When reranking helps

Orqen enables the second pass when:

The catalog is large enough that Stage 1 alone leaves ambiguous chains
Your plan tier includes advanced routing features
The reranker is available and within latency budget
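The three conditions can be read as a single gate. This is a hypothetical sketch: the text does not say how Orqen decides a catalog is large enough, so a plain tool-count threshold (`min_tools`, defaulting to 30 to match the trial steps) stands in for "ambiguous chains", and every field name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class RoutingContext:
    tool_count: int
    plan_has_advanced_routing: bool  # plan tier includes advanced routing
    reranker_available: bool
    expected_rerank_ms: float
    latency_budget_ms: float

def should_rerank(ctx: RoutingContext, min_tools: int = 30) -> bool:
    """Run Stage 2 only when all three conditions hold."""
    large_catalog = ctx.tool_count >= min_tools  # proxy for "ambiguous chains"
    within_budget = ctx.reranker_available and ctx.expected_rerank_ms <= ctx.latency_budget_ms
    return large_catalog and ctx.plan_has_advanced_routing and within_budget

print(should_rerank(RoutingContext(45, True, True, 80.0, 150.0)))  # True
print(should_rerank(RoutingContext(8, True, True, 80.0, 150.0)))   # False: small catalog
```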

Dashboard traces show whether reranking ran and how long routing took. Better tool descriptions improve both stages — see the routing quality checklist.

See two-stage routing live

Sign up for Orqen Pro and connect a catalog of 30+ tools that includes chained workflows.
Send multi-step intents ("book flight for two passengers").
Compare tools_out on requests where Stage 2 ran against requests where it did not (dashboard traces show which): payment and booking tools should co-appear when Stage 2 is active.
Check recall@K on chained tasks vs single-tool queries.
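The comparison in step 3 can be computed from exported traces. A sketch only: it assumes each trace is a dict with a `tools_out` list and a boolean `reranked` flag marking whether Stage 2 ran; that flag, and the trace format itself, are hypothetical, and the records below are made up for illustration.

```python
def co_appearance_rate(tools_outs: list[list[str]], a: str, b: str) -> float:
    """Fraction of requests whose tools_out contains both a and b."""
    if not tools_outs:
        return 0.0
    return sum(1 for tools in tools_outs if a in tools and b in tools) / len(tools_outs)

def compare_by_stage(traces: list[dict], a: str, b: str) -> dict[str, float]:
    """Co-appearance rate for requests with and without the Stage 2 pass."""
    with_rerank = [t["tools_out"] for t in traces if t["reranked"]]
    without = [t["tools_out"] for t in traces if not t["reranked"]]
    return {
        "stage2": co_appearance_rate(with_rerank, a, b),
        "stage1_only": co_appearance_rate(without, a, b),
    }

traces = [  # made-up records, for illustration only
    {"reranked": True, "tools_out": ["search_flights", "process_payment", "select_seat"]},
    {"reranked": True, "tools_out": ["search_flights", "get_airports"]},
    {"reranked": False, "tools_out": ["search_flights", "get_airports"]},
]
print(compare_by_stage(traces, "search_flights", "process_payment"))
# {'stage2': 0.5, 'stage1_only': 0.0}
```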

Next step: Sign up free · MCP tool sprawl · Introducing Orqen

