Core Concepts

Payload optimization

Orqen optimizes the full agent payload before it reaches the model. That includes tools, schemas, tool results, history, images, model choice, reconstruction, validation, and recovery.

Orqen snapshots, optimizes, validates, then forwards each request.

One intent-aware plan coordinates every stage

Before forwarding a request, Orqen builds one request plan. The same plan guides tool routing, compression, summarization, reconstruction, validation, and recovery so those stages stay aligned instead of making isolated decisions.

What happens on a request

Snapshot and understand the request

Orqen snapshots the original payload, reads the current goal, and optionally enriches high-value requests before changing anything.

Decide the right cleanup level

Orqen chooses how much history to keep, when to compress, how many tools to forward, how to trim schemas, and how strict validation should be.

Optimize the whole payload

Orqen deduplicates prompts, compresses images and tool results, manages hot/warm/cold history, routes relevant tools, and trims schemas after routing.

Rebuild and validate

Orqen assembles the final model-facing request, checks critical terms and tool schemas, and restores context if validation fails.

Intent-aware tool routing

Tool routing is one part of the payload optimization layer. Orqen does not only compare text similarity; it understands the current request well enough to build a frame such as:

{
  "domain": "weather",
  "action": "forecast",
  "slots": { "location": "Sittingbourne Kent" },
  "side_effect_allowed": false,
  "previous_tool_error": false,
  "confidence": 0.73
}

That frame is matched against tool capability cards derived from function names, descriptions, schemas, required inputs, and optional routing examples.

{
  "type": "function",
  "function": {
    "name": "open_meteo_weather",
    "description": "Get real weather forecast for a city.",
    "x-orqen-examples": [
      "weather in London",
      "forecast for Sittingbourne Kent"
    ],
    "parameters": {
      "type": "object",
      "properties": { "city": { "type": "string" } },
      "required": ["city"]
    }
  }
}

Adaptive K

Orqen chooses how many tools to forward based on confidence, risk, and recovery signals:

Crisp single intent	1 tool	Example: list files, weather in London.
Moderate confidence	2-3 tools	Enough room for close alternatives.
Multi-step or failed retry	up to 4 tools	Protects recall when the agent is recovering.
Side effects	minimum 3 tools	Write/send/execute operations are widened unless confidence is very high.
Timeout or error	all tools	Fail-open behavior keeps customer requests reliable.

Conversation history management

For multi-turn agents, prompt size grows with each exchange. Orqen manages history in three tiers:

Hot — recent turns	Kept verbatim in the forwarded payload.
Warm — older turns	Compressed and deduplicated — semantically equivalent content collapsed.
Cold — early turns	Summarized — replaced by a compact LLM-generated summary of the chunk.

For very long sessions (100+ turns), summaries are merged in a hierarchical pass — each merge call handles exactly two summaries — so no single LLM call sees unbounded input and early context is never silently truncated.

Learning loop

Orqen stores privacy-safe optimization traces: detected intent, selected tools, top candidates, recall, compression strategy, reconstruction strategy, shadow proactive rebuild, and recovery signals. This lets Orqen calibrate the system from real data without storing raw prompts.

x-orqen-tools-input:  51
x-orqen-tools-output: 1
x-orqen-prune-ratio:  1/51
x-orqen-routing:      semantic

Best practice

Write tool descriptions with explicit scope, keep required schema fields accurate, and preserve meaningful IDs/URLs in user-visible turns. Add x-orqen-examples when two tools have similar names or overlapping domains.

Continue to the Chat Completions API reference →