Embeddings Aren't Enough for Agent Tool Selection

Cosine similarity misses indirect tool dependencies. Why a second routing pass helps on chained workflows — and why it must fail open when unavailable.

Orqen Team

orqen.app

"Book a flight to Oslo." Embedding similarity finds search_flights and get_airports. It misses process_payment: booking a flight implies payment, but the request and the payment tool's description share almost no tokens and sit far apart in embedding space.

Single-stage embedding routers are fast and good enough for direct matches. They fail on indirect dependencies — tool chains where step two is implied but not stated. That is exactly where agent workflows live.

Where embeddings stop

Local embeddings compare two independent vectors: the routing query and each tool description. Cosine similarity is symmetric, so it cannot encode a directional relationship such as "booking requires payment". It measures lexical and semantic overlap, not causal structure.

  • Direct match: "weather in Oslo" → get_weather
  • Sibling confusion: "search files" vs "search database" have near-identical embeddings, so the router can easily pick the wrong one
  • Indirect chain: "generate invoice" → needs lookup_customer first — not obvious from embeddings alone
  • Negative selection: "send email" should not pull in get_weather — embeddings cannot express exclusion well

Stage 1 alone works for narrow catalogs and first-turn queries. At 50+ tools with multi-step intents, you need a second pass that reads query and candidate together — cross-attention, not cosine distance.
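To make the limitation concrete, here is a minimal sketch of cosine-based top-K selection. The three-dimensional vectors are hand-made toy values (roughly "travel", "money", "weather"), not output from a real embedding model, and the function names are illustrative rather than Orqen's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, tool_vecs, k):
    """Rank tools by similarity to the query and keep the best k names."""
    ranked = sorted(
        tool_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy vectors (assumption): dimensions are travel, money, weather.
TOOL_VECS = {
    "search_flights": [0.9, 0.1, 0.0],
    "get_airports": [0.8, 0.0, 0.1],
    "process_payment": [0.1, 0.9, 0.0],
    "get_weather": [0.1, 0.0, 0.9],
}
QUERY = [1.0, 0.05, 0.05]  # stands in for "Book a flight to Oslo"

candidates = top_k(QUERY, TOOL_VECS, k=2)
print(candidates)
```

With these toy values the payment tool scores about as low as the weather tool, because each tool is scored against the query in isolation and nothing in the computation represents "booking needs payment".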

Indirect tool dependencies

User intent      | Stage 1 picks  | Also needed
Book a flight    | search_flights | process_payment, select_seat
Generate invoice | create_invoice | lookup_customer, get_line_items
Deploy to prod   | run_deploy     | check_approval, run_tests

Multi-turn context helps Stage 1 (see multi-turn routing), but it cannot infer domain chains that never appear in the conversation text. Stage 2 closes that gap.

Stage 1: fast recall

Orqen runs semantic routing inside its own infrastructure — no extra API keys on your side for the default path. It embeds:

  • Multi-turn routing context (system domain, recent user messages, tools already called — not just the last message)
  • Each tool's description, parameter docs, and optional x-orqen-examples when present

Stage 1 narrows large catalogs to a short candidate list quickly. Tool schemas are cached by content hash — static MCP catalogs are not re-processed every turn.
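Content-hash caching can be sketched as follows. This is an illustration of the general technique, not Orqen's implementation: the canonical-JSON hashing scheme, the EmbeddingCache class, and the fake_embed stand-in are all assumptions made for the example.

```python
import hashlib
import json

def schema_hash(schema: dict) -> str:
    """Hash a tool schema by content, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class EmbeddingCache:
    """Caches embeddings keyed by schema content hash (hypothetical helper)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # counts how often the embedding function actually ran

    def get(self, schema: dict):
        key = schema_hash(schema)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(schema)
        return self._store[key]

def fake_embed(schema: dict):
    """Hypothetical stand-in for a real embedding call."""
    return [float(len(schema.get("description", "")))]
```

An unchanged schema hits the cache on every turn, while any edit to a description or parameter doc produces a new hash and triggers one re-embedding.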

Session hints and intent scoring adjust ranks before candidates pass to Stage 2. Orqen measures recall@K after the model responds — if routing misses too often, session recovery widens the window (see recall misses).
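The post does not spell out how Orqen computes recall@K, so the sketch below uses the standard definition: the fraction of tools the model actually called that appeared in the top K routed candidates. Returning 1.0 when no tools were called is a convention chosen for this example.

```python
def recall_at_k(ranked_tools, tools_called, k):
    """Fraction of called tools that were present in the top-k routed list."""
    called = set(tools_called)
    if not called:
        # Assumption: with no tool calls there is nothing to miss.
        return 1.0
    top = set(ranked_tools[:k])
    hits = len(called & top)
    return hits / len(called)

ranked = ["search_flights", "get_airports", "get_weather", "process_payment"]
called = ["search_flights", "process_payment"]

print(recall_at_k(ranked, called, k=2))  # 0.5: process_payment was cut
print(recall_at_k(ranked, called, k=4))  # 1.0: a wider window recovers it
```

The second call shows why widening the window is a reasonable recovery step when misses pile up.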

Stage 2: reranking pass

For large catalogs and multi-step intents, Orqen can run a second pass that scores the query and each Stage 1 candidate jointly instead of as independent vectors. The reranker sees "book flight" and "process_payment: charges card for booking" in the same context window.

# Stage 1 — fast semantic recall
#   Embed routing context + tool descriptions
#   Narrow a large catalog to a short candidate list quickly
#   Strength: fast, cheap, good on direct matches
#   Weakness: "book flight" and "process_payment" sit far apart in embedding space
#
# Stage 2 — optional reranking pass
#   Reads query and candidate tools together
#   Better at indirect dependencies, negative selection, domain chains
#   Falls back to Stage 1 if unavailable or slow — request still succeeds

Stage 2 uses Orqen's internal infrastructure — not your provider key. It is invisible to the agent; only the final pruned tool list changes.

Latency tradeoff: a reranking pass adds overhead on top of the upstream LLM call. Orqen enables it only when tool count and session context justify the cost — not on every request.

Fail-open by design

Stage 2 is an enhancement, not a dependency. If anything fails — timeout, provider outage, missing internal capacity — the pipeline returns Stage 1 rankings unchanged. The request still succeeds; your agent never sees a routing error.
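A fail-open wrapper can be sketched like this. It is a minimal illustration, assuming a hypothetical rerank_fn(query, candidates) callable and an arbitrary 50 ms timeout; Orqen's actual timeout and internals are not described in this post.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool for reranker calls (size is an arbitrary choice).
_POOL = ThreadPoolExecutor(max_workers=4)

def route(query, stage1_candidates, rerank_fn, timeout_s=0.05):
    """Return reranked tools, or the Stage 1 order if reranking fails."""
    future = _POOL.submit(rerank_fn, query, stage1_candidates)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers timeouts and any error raised inside the reranker.
        # cancel() only stops work that has not started yet.
        future.cancel()
        return list(stage1_candidates)
```

The caller always gets a tool list back; the only observable difference after a failure is that the order matches Stage 1.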

When reranking helps

Orqen enables the second pass when:

  • The catalog is large enough that Stage 1 alone leaves ambiguous chains
  • Your plan tier includes advanced routing features
  • The reranker is available and within latency budget
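The three conditions above can be read as a simple gate. The sketch below is one possible shape for it; the field names, the min_tools threshold, and the latency estimate are hypothetical, since the post does not state Orqen's actual values.

```python
from dataclasses import dataclass

@dataclass
class RerankContext:
    """Inputs to the gating decision (all field names are illustrative)."""
    tool_count: int
    plan_allows_reranking: bool
    reranker_available: bool
    estimated_rerank_ms: float
    latency_budget_ms: float

def should_rerank(ctx: RerankContext, min_tools: int) -> bool:
    """Enable Stage 2 only when every condition holds."""
    if ctx.tool_count < min_tools:
        return False
    if not ctx.plan_allows_reranking:
        return False
    if not ctx.reranker_available:
        return False
    return ctx.estimated_rerank_ms <= ctx.latency_budget_ms

ctx = RerankContext(
    tool_count=64,
    plan_allows_reranking=True,
    reranker_available=True,
    estimated_rerank_ms=40.0,
    latency_budget_ms=100.0,
)
print(should_rerank(ctx, min_tools=30))
```

Checking the cheap conditions first means a small catalog or an unavailable reranker never costs more than a few comparisons.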

Dashboard traces show whether reranking ran and how long routing took. Better tool descriptions improve both stages — see the routing quality checklist.

See two-stage routing live

  1. Sign up for Orqen Pro with a 30+ tool catalog containing chained workflows.
  2. Send multi-step intents ("book flight for two passengers").
  3. Compare tools_out — payment and booking tools should co-appear when Stage 2 is active.
  4. Check recall@K on chained tasks vs single-tool queries.

Tagged: embeddings, tool-calling, routing, reranking, agent-optimization