API Reference
POST /v1/chat/completions
The primary endpoint. Accepts the same request format as the OpenAI Chat Completions API and returns the same response format. Orqen prunes the tools array before forwarding and adds observability headers to the response.
Native Anthropic Messages and Bedrock Converse tool payloads should be adapted into this OpenAI-compatible shape before calling Orqen. See provider migration examples for the mapping.
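As an illustration of that mapping, here is a minimal sketch of an Anthropic-to-OpenAI tool adapter (the helper name is ours): Anthropic's `input_schema` field carries the same JSON Schema that OpenAI's format puts under `parameters`.

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Map an Anthropic Messages tool definition to the OpenAI
    function-calling shape that Orqen expects."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic names the JSON Schema "input_schema";
            # OpenAI names it "parameters".
            "parameters": tool["input_schema"],
        },
    }

anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather conditions for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
openai_tool = anthropic_tool_to_openai(anthropic_tool)
```

Bedrock Converse tool specs follow the same idea (the schema lives under `toolSpec.inputSchema.json`); see the provider migration examples for the full mapping.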
POST https://api.orqen.app/v1/chat/completions

Request
Headers
| Header | Type | Description |
|---|---|---|
| `Authorization` (required) | string | Bearer token. Format: `Bearer sk-orq-...` |
| `Content-Type` (required) | string | Must be `application/json` |
Body parameters
| Parameter | Type | Description |
|---|---|---|
| `model` (required) | string | The model to use. Accepts any LiteLLM model string (e.g. `"gpt-4o"`, `"claude-3-5-sonnet-20241022"`, `"bedrock/..."`, `"groq/llama-3.3-70b-versatile"`). Also accepts Orqen routing strings: `"orqen/auto"`, `"orqen/cheap"`, `"orqen/fast"`, `"orqen/capable"`. |
| `messages` (required) | array | Array of message objects, in the same format as OpenAI. Orqen uses the last user message (and conversation context) to score tool relevance. |
| `tools` | array | Array of tool definition objects (OpenAI function-calling format). Orqen prunes this array before forwarding. If absent, the request is forwarded unchanged. |
| `stream` | boolean | If `true`, responses stream as Server-Sent Events. Default: `false`. |
| `tool_choice` | string \| object | Tool selection hint. Forwarded to the LLM unchanged. |
| `temperature` | number | Sampling temperature. Forwarded unchanged. |
| `max_tokens` | integer | Maximum tokens in the completion. Forwarded unchanged. |
Orqen-specific extensions
These fields are consumed by Orqen and not forwarded to the LLM:
| Parameter | Type | Description |
|---|---|---|
| `aws_access_key_id` | string | AWS access key for Bedrock. Alternative to storing credentials in the dashboard. |
| `aws_secret_access_key` | string | AWS secret key for Bedrock. |
| `aws_region_name` | string | AWS region for Bedrock. Default: `us-east-1`. |
| `aws_session_token` | string | AWS session token for temporary credentials. |
| `function.x-orqen-examples` | array | Optional tool-level routing examples. Orqen uses them for pruning and strips them before forwarding upstream. |
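Per the parameter name `function.x-orqen-examples`, the examples field sits inside the `function` object of a tool definition. A sketch of a tool carrying routing examples (the sample utterances are illustrative, not required values):

```python
# A tool definition carrying optional Orqen routing examples.
# Orqen scores the user message against these examples when pruning,
# then strips the field before forwarding the tool upstream.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # Illustrative utterances that should route to this tool.
        "x-orqen-examples": [
            "What's the weather in London today?",
            "Is it raining in Tokyo right now?",
        ],
    },
}
```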
Example request
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in London today?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        },
        # ... 30 more tools — Orqen prunes the irrelevant ones
    ],
    tool_choice="auto",
)

print(response.choices[0].message)
```

Response
The response body is identical to the OpenAI Chat Completions response format. Orqen adds the following headers:
| Header | Type | Description |
|---|---|---|
| `x-orqen-tools-input` | integer | Number of tools in your original request. |
| `x-orqen-tools-output` | integer | Number of tools forwarded to the LLM after pruning. |
| `x-orqen-prune-ratio` | string | Ratio as output/input, e.g. `"8/32"`. |
| `x-orqen-routing` | string | How tools were selected: `"semantic"` (embedding-based), `"keyword"` (fallback), or `"none"` (no pruning). |
| `X-Request-ID` | string | Unique request ID. Pass this to support for tracing. |
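Assuming the openai-python v1 SDK, the headers are reachable through the raw-response interface: call `client.chat.completions.with_raw_response.create(...)`, read `raw.headers.get("x-orqen-prune-ratio")`, and recover the usual completion object with `raw.parse()`. A small helper (the name is ours) for splitting the ratio value:

```python
def parse_prune_ratio(ratio: str) -> tuple[int, int]:
    """Split an x-orqen-prune-ratio header value such as "8/32"
    into (tools_forwarded, tools_in_request)."""
    forwarded, submitted = ratio.split("/")
    return int(forwarded), int(submitted)

forwarded, submitted = parse_prune_ratio("8/32")
```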
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748823456,
  "model": "gpt-4o-2024-11-20",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 287,
    "completion_tokens": 18,
    "total_tokens": 305
  }
}
```

Orqen routing model strings
Instead of specifying a model directly, you can use an Orqen routing string and let Orqen pick the best model from your connected providers:
| Model string | Description |
|---|---|
| `orqen/auto` | Picks based on task complexity. Simple queries → fast, cheap models; complex reasoning → capable models. |
| `orqen/cheap` | Always the cheapest capable model from your connected providers. |
| `orqen/fast` | Always the lowest-latency model, based on observed latency data. |
| `orqen/capable` | Always the most capable model, prioritising recall accuracy. |
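A routing string drops in anywhere a concrete model name would go. As a sketch (the helper name is ours, assuming the openai-python v1 SDK), with the import deferred so the function can be defined without the SDK installed:

```python
def ask(prompt: str, tier: str = "orqen/auto") -> str:
    """Sketch: send a prompt through an Orqen routing string instead of
    a concrete model name; Orqen picks the model from your connected
    providers."""
    # Deferred import: only needed when the helper is actually called.
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-orq-YOUR_KEY",
        base_url="https://api.orqen.app/v1",
    )
    resp = client.chat.completions.create(
        model=tier,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Swap `tier` for `"orqen/cheap"`, `"orqen/fast"`, or `"orqen/capable"` to pin the routing policy per call.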