Skip to content
All posts
Guide//7 MIN READ

MCP Gave Your Agent 50 Tools — Now What?

Connecting MCP servers is easy; stopping your agent from sending every tool on every LLM call is not. Here's why tool sprawl happens, what it costs, and how to route each turn to a small relevant subset.

O

Orqen Team

orqen.app

You wired up Model Context Protocol servers — filesystem, browser, database, Slack, GitHub, internal APIs. Each one worked on its own. Then you looked at the tool list your agent sends to the model: fifty tools, every single turn.

That's not a misconfiguration. It's what MCP is designed to do: compose capability by adding servers. The hard part is what happens after — when your LLM provider bills you for processing tool schemas the current message will never use.

How MCP creates tool sprawl

MCP treats each server as a plug-in. A filesystem server might expose a dozen tools (read_file, write_file, list_directory, …). A browser server adds navigation and extraction tools. A database server adds query and schema tools. Connect five servers and you can easily land at 40–80 tools without writing custom integration code.

Most agent frameworks merge those tools into one array and pass them with every chat completion or messages.create call. The model doesn't get "filesystem tools this turn, browser tools next turn." It gets the union of everything — static tools you defined plus every tool every MCP server advertised at connect time.

MCP didn't break your agent. It succeeded at making capabilities composable. The bottleneck moved downstream: the LLM context window and your per-request input token bill.

What the model actually sees

From the provider's perspective, MCP tools are ordinary function definitions. Each one carries a name, description, and JSON schema — typically 150–400 tokens depending on how verbose the server author was.

With 50 tools at ~200 tokens each, you're adding roughly 10,000 input tokens to every request before the user's message, conversation history, or tool results are counted.

SetupTools per requestSchema tokens (approx.)
Single MCP server8–151.5K–3K
3–4 common servers30–506K–10K
Large internal catalog80+15K+

On a busy agent doing thousands of turns a day, most of those tokens are noise: the user asked to summarize a PDF, but the model still ingested your Stripe, Postgres, and calendar tool definitions.

Why 50 tools on every turn hurts

Two problems compound:

  • Cost. Input tokens are priced per million. Ten thousand schema tokens per turn adds up fast — especially on capable models — and it scales linearly with MCP adoption ("let's add one more server").
  • Tool selection quality. Models pick worse tools when the option set is large and many descriptions overlap ("search", "query", "fetch", "get"). MCP servers written by different teams rarely coordinate naming or scope, which makes confusion more likely, not less.

In production workloads with large tool sets, teams often see 50–70% fewer prompt tokens when only a small relevant subset is forwarded per turn — more when the user message is narrow, less when they genuinely need a wide surface. The pattern is consistent: most turns need a handful of tools, not the full catalog.

What doesn't scale

Teams usually try one of these first:

  • Disconnect servers you "don't need right now." Works until the next feature request needs them back — then you're playing whack-a-mole.
  • Manually partition tools per workflow. Fine for two fixed products, brittle for a general assistant that should use whatever MCP exposes.
  • Hope the framework filters tools. Few hosts do semantic filtering by default; most forward the full merged list.

What you actually want is per-turn routing: keep all MCP servers connected for capability, but only send the 3–8 tools that match the current user message and recent context.

Orqen + MCP: same agent, cleaner payload per turn

Orqen sits between your agent and your LLM provider. It doesn't replace MCP, your host, or your framework. It reads each outbound request, scores tools for relevance, and forwards a pruned list. The model's response — including tool calls — is unchanged.

MCP tools are first-class: they're the same JSON schemas your agent already sends. Orqen doesn't require MCP-specific configuration.

  • Cache on first sight. When a new MCP tool schema appears, Orqen analyzes it once and reuses that signal on later requests — you're not paying embedding latency on every turn for the same static catalog.
  • Detect schema changes. MCP servers can update tools between sessions. Orqen keys schemas by content hash and re-processes only what changed.
  • Surface weak descriptions. Vague MCP tool docs ("handles data") hurt routing. The Routing Quality view in the dashboard flags tools that look interchangeable so you can tighten descriptions at the source.

Integration is the same two-line change as any other agent: point your SDK at Orqen and use your Orqen API key. Keep passing the full MCP tool array — pruning happens on the proxy.

Anthropic SDK
import anthropic

client = anthropic.Anthropic(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app",
)

# Your agent still lists every MCP tool on every request —
# Orqen forwards only the subset that matches this turn.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=messages,
    tools=all_mcp_tools,  # 50+ tools from connected servers
)

Your MCP host still discovers servers, negotiates capabilities, and runs tool handlers. Orqen only optimizes what crosses the wire to the model. See the quickstart for provider keys and verification steps.

Dashboard check: after a few routed requests, open Usage and compare tools in → tools out per call. A 50-tool MCP stack often shows single-digit tools forwarded on focused queries.

Write MCP tools that route well

Routing quality is only as good as the schemas you give it. MCP makes it easy to ship tools fast; spending five extra minutes on descriptions pays back immediately.

  • Say when to use the tool, not just what it does. "Query Postgres" is weak. "Run read-only SQL against the analytics warehouse when the user asks for metrics, cohorts, or revenue — not for file or web tasks" is strong.
  • Disambiguate siblings. If two servers expose search, differentiate scope in the name or description so the router (and the model) can tell them apart.
  • Keep parameter docs specific. Empty or generic property descriptions inflate schemas without helping selection.
{
  "name": "search_workspace_files",
  "description": "Search files in the connected workspace. Use when the user asks to find, list, or open project files — not for web search or database queries.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Filename or path fragment" }
    },
    "required": ["query"]
  }
}

Orqen also applies schema compression on large tool definitions — trimming redundant JSON without changing behavior — which helps when MCP servers ship verbose OpenAPI-style payloads.

Try it on your MCP stack

If you already have MCP connected and your tool count is in double digits, you can test routing in one session:

  1. Create a free Orqen account and add your LLM provider key.
  2. Point your SDK at https://api.orqen.app (Anthropic / Bedrock) or https://api.orqen.app/v1 (OpenAI).
  3. Send a real agent turn with your full MCP tool list — no changes to how you discover or invoke MCP tools.
  4. Check the dashboard for payload savings, tool-context reduction, and token savings.

Free includes 250K saved tokens per month; when you hit the cap, requests still forward so your agent keeps running. Pro removes the monthly savings cap and adds prompt compression for long loops.

Already using Orqen without MCP? This post is the same idea in reverse — MCP is one of the fastest ways to grow a tool catalog. If you're new to the product, start with Introducing Orqen for the full picture on payload optimization, compression, and benchmarks.

Tagged:mcptool-callingagent-optimizationllm-costmodel-context-protocol
O

Orqen Team

We build the optimization layer for tool-heavy LLM agents. Our goal is to make agent costs predictable as your tool set grows.

Try Orqen free

250K saved tokens per month. Free forever. Two-line integration.

See your savings in the dashboard within seconds of your first request.