Orqen Docs

API Reference

POST /v1/chat/completions

The primary endpoint. Accepts the same request format as the OpenAI Chat Completions API and returns the same response format. Orqen prunes the tools array before forwarding and adds observability headers to the response.

Native Anthropic Messages and Bedrock Converse tool payloads should be adapted into this OpenAI-compatible shape before calling Orqen. See provider migration examples for the mapping.
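For Anthropic Messages, the tool-definition mapping is mechanical: Anthropic's name/description/input_schema fields correspond to the OpenAI function object. A minimal sketch (the helper name is ours, not part of Orqen):

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Convert an Anthropic Messages tool definition to OpenAI function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic's JSON-Schema "input_schema" maps directly to "parameters".
            "parameters": tool["input_schema"],
        },
    }


anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather conditions for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_tool = anthropic_tool_to_openai(anthropic_tool)
```

Bedrock Converse toolSpec definitions can be converted the same way; only the field names differ.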

POST https://api.orqen.app/v1/chat/completions

Request

Headers

Authorization (string, required)
    Bearer token. Format: Bearer sk-orq-...
Content-Type (string, required)
    Must be application/json

Body parameters

model (string, required)
    The model to use. Accepts any LiteLLM model string (e.g. "gpt-4o", "claude-3-5-sonnet-20241022", "bedrock/...", "groq/llama-3.3-70b-versatile"). Also accepts Orqen routing strings: "orqen/auto", "orqen/cheap", "orqen/fast", "orqen/capable".
messages (array, required)
    Array of message objects. Same format as OpenAI. Orqen uses the last user message (and conversation context) to score tool relevance.
tools (array, optional)
    Array of tool definition objects (OpenAI function calling format). Orqen prunes this array before forwarding. If absent, the request is forwarded unchanged.
stream (boolean, optional)
    If true, responses stream as Server-Sent Events. Default: false.
tool_choice (string | object, optional)
    Tool selection hint. Forwarded to the LLM unchanged.
temperature (number, optional)
    Sampling temperature. Forwarded unchanged.
max_tokens (integer, optional)
    Maximum tokens in the completion. Forwarded unchanged.
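When stream is true, each Server-Sent Event carries an incremental choices[0].delta in the usual OpenAI chunk shape; with the OpenAI SDK you simply iterate the return value of create(stream=True). A sketch of the accumulation logic, using hand-built chunk dicts in place of a live stream:

```python
def accumulate_stream(chunks):
    """Join the incremental delta.content fields of streamed chunks into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)


# Stand-in chunks mimicking the SSE events a streamed completion produces.
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Rainy, "}}]},
    {"choices": [{"delta": {"content": "12 degrees."}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

full_reply = accumulate_stream(chunks)
```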

Orqen-specific extensions

These fields are consumed by Orqen and not forwarded to the LLM:

aws_access_key_id (string)
    AWS access key for Bedrock. Alternative to storing credentials in the dashboard.
aws_secret_access_key (string)
    AWS secret key for Bedrock.
aws_region_name (string)
    AWS region for Bedrock. Default: us-east-1.
aws_session_token (string)
    AWS session token for temporary credentials.
function.x-orqen-examples (array)
    Optional tool-level routing examples. Orqen uses them for pruning and strips them before forwarding upstream.
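A tool definition carrying x-orqen-examples is a normal OpenAI function definition with one extra key inside the function object; a sketch:

```python
# Standard OpenAI function-calling tool, plus Orqen's optional examples key.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # Consumed by Orqen for relevance scoring; stripped before the
        # request is forwarded upstream.
        "x-orqen-examples": [
            "what's the weather in London",
            "will it rain tomorrow in Paris",
        ],
    },
}
```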

Example request

from openai import OpenAI

client = OpenAI(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "What's the weather in London today?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        },
        # ... 30 more tools — Orqen prunes the irrelevant ones
    ],
    tool_choice="auto",
)

print(response.choices[0].message)

Response

The response body is identical to the OpenAI Chat Completions response format. Orqen adds the following headers:

x-orqen-tools-input (integer)
    Number of tools in your original request.
x-orqen-tools-output (integer)
    Number of tools forwarded to the LLM after pruning.
x-orqen-prune-ratio (string)
    Ratio as output/input, e.g. "8/32".
x-orqen-routing (string)
    How tools were selected: "semantic" (embedding-based), "keyword" (fallback), or "none" (no pruning).
X-Request-ID (string)
    Unique request ID. Pass this to support for tracing.
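With the OpenAI SDK, response headers are reachable through the raw-response interface (client.chat.completions.with_raw_response.create(...), whose .headers is a case-insensitive mapping). The extraction helper below is our sketch, exercised here against a sample header dict rather than a live response:

```python
def orqen_metrics(headers) -> dict:
    """Pull Orqen's observability headers out of a response-headers mapping."""
    return {
        "tools_input": int(headers["x-orqen-tools-input"]),
        "tools_output": int(headers["x-orqen-tools-output"]),
        "prune_ratio": headers["x-orqen-prune-ratio"],
        "routing": headers["x-orqen-routing"],
        "request_id": headers["x-request-id"],
    }


# Sample values in the shapes documented above (header lookup on a real
# response object is case-insensitive).
sample_headers = {
    "x-orqen-tools-input": "32",
    "x-orqen-tools-output": "8",
    "x-orqen-prune-ratio": "8/32",
    "x-orqen-routing": "semantic",
    "x-request-id": "req_abc123",
}

metrics = orqen_metrics(sample_headers)
```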
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748823456,
  "model": "gpt-4o-2024-11-20",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 287,
    "completion_tokens": 18,
    "total_tokens": 305
  }
}
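When finish_reason is "tool_calls", each tool call's arguments field is a JSON string that must be parsed before use. A minimal dispatch sketch over a local function registry (the registry and the get_weather handler are ours, for illustration):

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical local implementation of the tool.
    return f"Weather for {city}: rainy"


REGISTRY = {"get_weather": get_weather}


def dispatch_tool_calls(message: dict) -> list:
    """Execute each tool call in an assistant message and return tool-role messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results


# The assistant message from the example response above.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"London\"}"},
        }
    ],
}

tool_messages = dispatch_tool_calls(message)
```

The returned tool-role messages can be appended to the conversation and sent back through the same endpoint.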

Orqen routing model strings

Instead of specifying a model directly, you can use an Orqen routing string and let Orqen pick the best model from your connected providers:

orqen/auto (model string)
    Picks based on task complexity. Simple queries → fast, cheap models. Complex reasoning → capable models.
orqen/cheap (model string)
    Always the cheapest capable model from your connected providers.
orqen/fast (model string)
    Always the lowest-latency model, based on observed latency data.
orqen/capable (model string)
    Always the most capable model, prioritising recall accuracy.