Guide//4 MIN READ

Your Tool Descriptions Are the Bug

Vague tool schemas cause routing failures before the model runs. This guide covers a routing quality checklist, Orqen's schema audit, x-orqen-examples, and the dashboard view: the parts you can fix on your side.

Orqen Team

orqen.app

You tuned K, added a reranker, and widened recovery — but recall@K is still low on a handful of tools. Look at the descriptions: handle_data, process_request, manage_records. The router cannot distinguish them because the author never wrote distinguishable text.

Tool routing quality is bounded by schema quality. Embeddings, rerankers, and multi-turn context amplify whatever signal exists in the description — they cannot invent scope that was never documented.

Descriptions are routing input

Orqen's embedder and optional reranker read the same fields the model reads: name, description, parameter docs, and optional x-orqen-examples. Weak text produces weak scores across every stage — Orqen cannot rewrite your MCP servers for you.

This is especially painful with MCP tool sprawl: five servers written by five teams, overlapping verbs, zero coordination. Routing fails before the model gets a fair chance.

Fix schemas first. Widening K or disabling pruning treats the symptom. Better descriptions raise recall@K at the same K — fewer tokens, fewer misses.

Common failure patterns

Orqen's Routing Quality audit flags these automatically:

  • Missing description — essentially invisible to semantic routing.
  • Too short (<8 words) — "Query Postgres" with no scope.
  • Restates the function name: get_user described only as "Gets user."
  • Vague verbs without specifics — "handles", "processes", "manages" with no when-to-use clause.
  • Undocumented parameters — properties with no description; model and router both lack argument signal.
  • Interchangeable siblings — three "search" tools with identical scope language.

For example, this schema is too short, leans on a vague verb, and mostly restates its own name:

{
  "name": "handle_data",
  "description": "Handles data operations for the system."
}

Routing quality checklist

Before tuning router knobs, run each tool through this checklist:

  1. When, not just what. Include "Use this when…" and "Do not use for…" clauses. Negative scope disambiguates siblings.
  2. Minimum 15 words. Enough room for domain, trigger phrases, and return shape.
  3. Specific verbs. Replace "handle" with "query", "create", "send", "calculate" — match what the tool actually does.
  4. Parameter docs on every property. Especially enums and IDs the model must fill correctly.
  5. Disambiguate name collisions. Prefer search_workspace_files over search when multiple search tools exist.
  6. Add x-orqen-examples. Positive and negative example phrases — highest ROI metadata for embedding and reranking.
  7. Re-score after MCP updates. Orqen hashes schemas and re-analyses when servers change tools between sessions.
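
To make the checklist concrete, here is one way the handle_data schema shown earlier might be rewritten. The tool's real behavior is unknown, so the new name, scope, parameters, and enum values are all invented for illustration. The schema is written as a Python dict in OpenAI function format.

```python
import json

# Hypothetical rewrite: the name, scope, and parameters are invented for illustration.
export_customer_records = {
    "name": "export_customer_records",
    "description": (
        "Export customer records from the CRM database as CSV or JSON. "
        "Use this when the user asks to download, export, or back up customer data. "
        "Do not use for editing records or for querying a single customer by ID. "
        "Returns a download URL and the number of rows exported."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "format": {
                "type": "string",
                "enum": ["csv", "json"],
                "description": "Output file format for the export.",
            },
            "updated_since": {
                "type": "string",
                "description": (
                    "Optional ISO 8601 date; only records changed on or "
                    "after this date are exported."
                ),
            },
        },
        "required": ["format"],
    },
}

description = export_customer_records["description"]
print(len(description.split()), "words in description")
print(json.dumps(export_customer_records, indent=2))
```

The description carries a positive trigger, a negative scope, and the return shape, and every property has its own description, which covers items 1 through 5 of the checklist.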

Routing Quality scores

Orqen scores every tool schema 0–100 using heuristics — no LLM calls, no added request latency. Analysis runs in the background when new schemas appear. Use the score to prioritize fixes; the checklist above is what actually moves recall@K.

# Routing Quality score bands (heuristic, no LLM):
#   80–100  Excellent
#   60–79   Good — minor improvements possible
#   40–59   Fair — meaningful gains available
#   0–39    Poor — likely hurting recall@K

Each flagged issue includes an actionable suggestion you can paste to the team that owns the MCP server or OpenAPI spec. Orqen surfaces the problem — your team ships the fix.
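
If you export scores for your own reporting, the bands above reduce to a small lookup. This is a minimal sketch: the function name score_band and the sample scores are made up for illustration, and only the band boundaries come from the table above.

```python
def score_band(score: int) -> str:
    """Map a 0-100 Routing Quality score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


# Made-up scores for illustration only; real values come from the audit.
sample_scores = {"handle_data": 22, "search_workspace_files": 85, "process_request": 47}

# Lowest score first, so the tools most likely to hurt recall@K come up top.
for name, score in sorted(sample_scores.items(), key=lambda item: item[1]):
    print(f"{score:>3}  {score_band(score):<9}  {name}")
```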

x-orqen-examples metadata

The optional x-orqen-examples field on a tool schema (in OpenAI function format, or as a sibling of inputSchema in MCP) gives the embedder concrete phrases:

  • use_when — example user queries that should route here
  • not_when — example queries that should route elsewhere

Examples are concatenated into the embedding text and shown to the Stage 2 reranker. They are the fastest way to fix sibling confusion without renaming production APIs.
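
As a sketch of what this might look like, the MCP-style tool below carries x-orqen-examples next to inputSchema. Only the field name and its use_when and not_when keys come from the text above. The tool itself, the example phrases, the assumption that each key holds a list of strings, and the embedding_text helper (one guess at how examples could be concatenated) are illustrative, not Orqen's actual format.

```python
# Hypothetical MCP-style tool definition; only the x-orqen-examples keys
# (use_when, not_when) are taken from the article.
search_workspace_files = {
    "name": "search_workspace_files",
    "description": (
        "Search file names and contents in the current workspace. "
        "Use this when the user wants to find a document or code file. "
        "Do not use for web search or for searching chat history."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords or phrase to search for."}
        },
        "required": ["query"],
    },
    "x-orqen-examples": {
        "use_when": ["find the Q3 budget spreadsheet", "where is the auth middleware defined"],
        "not_when": ["search the web for pricing news", "what did Sam say yesterday in chat"],
    },
}


def embedding_text(tool: dict) -> str:
    """Join name, description, and examples into one string (assumed format)."""
    examples = tool.get("x-orqen-examples", {})
    parts = [tool["name"], tool.get("description", "")]
    if examples.get("use_when"):
        parts.append("Use when: " + "; ".join(examples["use_when"]))
    if examples.get("not_when"):
        parts.append("Not when: " + "; ".join(examples["not_when"]))
    return "\n".join(parts)


print(embedding_text(search_workspace_files))
```

Writing not_when phrases that name the sibling's territory (web search, chat history) is what separates this tool from other "search" tools without renaming anything.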

Routing Quality dashboard

The Routing Quality view in the Orqen dashboard lists tools sorted by score, with issue codes and suggestions. Use it to:

  • Prioritize fixes on tools with score < 40
  • Correlate low scores with low recall@K on specific workflows
  • Track improvement after schema updates (re-hash triggers re-analysis)

Pair dashboard scores with recall@K and two-stage routing — if recall is low but scores are high, widen K or check multi-turn context; if scores are low, fix descriptions first.
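
The triage rule above can be written down as a small decision function. This is a sketch under stated assumptions: the function name next_step and the 0.9 recall target are invented for illustration, and the score floor of 40 mirrors the "score < 40" priority from the list above.

```python
def next_step(recall_at_k: float, quality_score: int,
              recall_target: float = 0.9, score_floor: int = 40) -> str:
    """Suggest what to look at first for one tool or workflow.

    recall_target is an assumed threshold; pick one that fits your workload.
    """
    # Low schema quality comes first: router tuning cannot add missing scope.
    if quality_score < score_floor:
        return "fix descriptions first"
    # Schemas look fine but the right tool still misses the top K.
    if recall_at_k < recall_target:
        return "widen K or check multi-turn context"
    return "no action needed"


print(next_step(recall_at_k=0.62, quality_score=85))
print(next_step(recall_at_k=0.62, quality_score=30))
```

Checking the score before recall keeps the order the article recommends: a low-scoring schema is fixed first even when recall is also low.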

Audit your tool catalog

  1. Sign up for Orqen and send one request with your full tool list.
  2. Open Routing Quality — sort by lowest score.
  3. Fix the bottom five tools using the checklist above.
  4. Re-run the failing workflow and compare recall@K.

Tagged: tool-schemas, routing, mcp, agent-optimization, documentation
Orqen Team

We build the optimization layer for tool-heavy LLM agents. Our goal is to make agent costs predictable as your tool set grows.
