API Reference
POST /v1/chat/completions
The primary endpoint. Accepts the same request format as the OpenAI Chat Completions API and returns the same response format. Orqen prunes the tools array before forwarding and adds observability headers to the response.
Native Anthropic Messages and Bedrock Converse tool payloads should be adapted into this OpenAI-compatible shape before calling Orqen. See provider migration examples for the mapping.
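As an illustration of that mapping, here is a minimal sketch of an Anthropic-to-OpenAI tool adapter (the helper name is ours): Anthropic's `input_schema` field carries the same JSON Schema that OpenAI's format puts under `parameters`.

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Map an Anthropic Messages tool definition to the OpenAI
    function-calling shape that Orqen expects."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic names the JSON Schema "input_schema";
            # OpenAI names it "parameters".
            "parameters": tool["input_schema"],
        },
    }

anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather conditions for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
openai_tool = anthropic_tool_to_openai(anthropic_tool)
```

Bedrock Converse tool specs follow the same idea (the schema lives under `toolSpec.inputSchema.json`); see the provider migration examples for the full mapping.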
POST https://api.orqen.app/v1/chat/completions

Request
Headers
| Header | Type | Description |
|---|---|---|
| `Authorization` (required) | string | Bearer token. Format: `Bearer sk-orq-...` |
| `Content-Type` (required) | string | Must be `application/json` |
Body parameters
| Parameter | Type | Description |
|---|---|---|
| `model` (required) | string | The model to use. Accepts any LiteLLM model string (e.g. `"gpt-4o"`, `"claude-3-5-sonnet-20241022"`, `"bedrock/..."`, `"groq/llama-3.3-70b-versatile"`). Also accepts Orqen routing strings: `"orqen/auto"`, `"orqen/cheap"`, `"orqen/fast"`, `"orqen/capable"`. |
| `messages` (required) | array | Array of message objects, in the same format as OpenAI. Orqen uses the last user message (and conversation context) to score tool relevance. |
| `tools` | array | Array of tool definition objects (OpenAI function-calling format). Orqen prunes this array before forwarding. If absent, the request is forwarded unchanged. |
| `stream` | boolean | If `true`, responses stream as Server-Sent Events. Default: `false`. |
| `tool_choice` | string \| object | Tool selection hint. Forwarded to the LLM unchanged. |
| `temperature` | number | Sampling temperature. Forwarded unchanged. |
| `max_tokens` | integer | Maximum tokens in the completion. Forwarded unchanged. |
Orqen-specific extensions
These fields are consumed by Orqen and not forwarded to the LLM:
| Parameter | Type | Description |
|---|---|---|
| `aws_access_key_id` | string | AWS access key for Bedrock. Alternative to storing credentials in the dashboard. |
| `aws_secret_access_key` | string | AWS secret key for Bedrock. |
| `aws_region_name` | string | AWS region for Bedrock. Default: `us-east-1`. |
| `aws_session_token` | string | AWS session token for temporary credentials. |
| `function.x-orqen-examples` | array | Optional tool-level routing examples. Orqen uses them for pruning and strips them before forwarding upstream. |
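Per the parameter name `function.x-orqen-examples`, the examples field sits inside the `function` object of a tool definition. A sketch of a tool carrying routing examples (the sample utterances are illustrative, not required values):

```python
# A tool definition carrying optional Orqen routing examples.
# Orqen scores the user message against these examples when pruning,
# then strips the field before forwarding the tool upstream.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # Illustrative utterances that should route to this tool.
        "x-orqen-examples": [
            "What's the weather in London today?",
            "Is it raining in Tokyo right now?",
        ],
    },
}
```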
Example request
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in London today?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        },
        # ... 30 more tools — Orqen prunes the irrelevant ones
    ],
    tool_choice="auto",
)

print(response.choices[0].message)
```

Response
The response body is identical to the OpenAI Chat Completions response format. Orqen adds the following headers:
| Header | Type | Description |
|---|---|---|
| `x-orqen-tools-input` | integer | Number of tools in your original request. |
| `x-orqen-tools-output` | integer | Number of tools forwarded to the LLM after pruning. |
| `x-orqen-prune-ratio` | string | Ratio as output/input, e.g. `"8/32"`. |
| `x-orqen-routing` | string | How tools were selected: `"semantic"` (embedding-based), `"keyword"` (fallback), or `"none"` (no pruning). |
| `X-Request-ID` | string | Unique request ID. Pass this to support for tracing. |
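Assuming the openai-python v1 SDK, the headers are reachable through the raw-response interface: call `client.chat.completions.with_raw_response.create(...)`, read `raw.headers.get("x-orqen-prune-ratio")`, and recover the usual completion object with `raw.parse()`. A small helper (the name is ours) for splitting the ratio value:

```python
def parse_prune_ratio(ratio: str) -> tuple[int, int]:
    """Split an x-orqen-prune-ratio header value such as "8/32"
    into (tools_forwarded, tools_in_request)."""
    forwarded, submitted = ratio.split("/")
    return int(forwarded), int(submitted)

forwarded, submitted = parse_prune_ratio("8/32")
```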
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748823456,
  "model": "gpt-4o-2024-11-20",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 287,
    "completion_tokens": 18,
    "total_tokens": 305
  }
}
```

Orqen routing model strings
Instead of specifying a model directly, you can use an Orqen routing string and let Orqen pick the best model from your connected providers:
| Model string | Description |
|---|---|
| `orqen/auto` | Picks based on task complexity. Simple queries → fast, cheap models; complex reasoning → capable models. |
| `orqen/cheap` | Always the cheapest capable model from your connected providers. |
| `orqen/fast` | Always the lowest-latency model, based on observed latency data. |
| `orqen/capable` | Always the most capable model, prioritising recall accuracy. |
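A routing string drops in anywhere a concrete model name would go. As a sketch (the helper name is ours, assuming the openai-python v1 SDK), with the import deferred so the function can be defined without the SDK installed:

```python
def ask(prompt: str, tier: str = "orqen/auto") -> str:
    """Sketch: send a prompt through an Orqen routing string instead of
    a concrete model name; Orqen picks the model from your connected
    providers."""
    # Deferred import: only needed when the helper is actually called.
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-orq-YOUR_KEY",
        base_url="https://api.orqen.app/v1",
    )
    resp = client.chat.completions.create(
        model=tier,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Swap `tier` for `"orqen/cheap"`, `"orqen/fast"`, or `"orqen/capable"` to pin the routing policy per call.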