API Reference
POST /v1/chat/completions
The OpenAI SDK endpoint. It accepts the OpenAI Chat Completions request format, builds an intent-aware optimization plan, routes relevant tools, compresses schemas and long agent context when useful, validates the outbound payload, and forwards the request to your LLM provider. Orqen adds observability headers to the response and records a trace in the dashboard.
The three main SDK formats are supported directly:
- POST /v1/messages: Anthropic SDK (base_url=https://api.orqen.app)
- POST /v1/chat/completions: OpenAI SDK (base_url=https://api.orqen.app/v1)
- POST /model/{id}/converse: AWS Bedrock boto3 (endpoint_url=https://api.orqen.app)
Use whichever SDK you already have; your message and tool payloads keep their usual shape.
https://api.orqen.app/v1/chat/completions
Request
Headers
| Header | Type | Description |
|---|---|---|
| Authorization (required) | string | Bearer token. Format: Bearer sk-orq-... |
| Content-Type (required) | string | Must be application/json |
| X-Orqen-Session-Id | string | Optional. Pass any string to group requests into a conversation session visible in the dashboard Sessions page. |
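As a minimal sketch of the header set described here, the helper below assembles the headers for a raw HTTP call. The function name `build_headers` and the session ID value are illustrative assumptions, not part of the Orqen API.

```python
from typing import Dict, Optional


def build_headers(api_key: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers for an Orqen call (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer sk-orq-...
        "Content-Type": "application/json",
    }
    # The session header is optional; any string groups requests into one session.
    if session_id is not None:
        headers["X-Orqen-Session-Id"] = session_id
    return headers


# "support-chat-42" is a made-up session ID used only for illustration.
headers = build_headers("sk-orq-YOUR_KEY", session_id="support-chat-42")
print(sorted(headers))
```

If you use the OpenAI Python SDK rather than raw HTTP, the SDK sets Authorization and Content-Type itself, and an extra header such as X-Orqen-Session-Id can typically be supplied through the SDK's `extra_headers` argument on a request.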
Body parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | The model to use. Accepts provider model strings (e.g. "gpt-4o", "claude-sonnet-4-6", "bedrock/...", "groq/llama-3.3-70b-versatile"). Also accepts Orqen routing strings: "orqen/auto", "orqen/cheap", "orqen/fast", "orqen/capable". |
| messages (required) | array | Array of message objects. Same format as OpenAI. Orqen uses the last user message and conversation context to build the optimization plan and preserve critical terms. |
| tools | array | Array of tool definition objects (OpenAI function calling format). Orqen scores, routes, and schema-compresses this array before forwarding. If absent, the request is forwarded unchanged. |
| stream | boolean | If true, responses stream as Server-Sent Events. Default: false. |
| tool_choice | string or object | Tool selection hint. Forwarded to the LLM unchanged. |
| temperature | number | Sampling temperature. Forwarded unchanged. |
| max_tokens | integer | Maximum tokens in the completion. Forwarded unchanged. |
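To make the parameter table concrete, here is a hedged sketch that builds a request body in the OpenAI Chat Completions shape. The helper name `build_chat_body` is an assumption for illustration; only the field names come from the table.

```python
import json
from typing import Any, Dict, List, Optional


def build_chat_body(
    model: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
    stream: bool = False,
    **forwarded: Any,
) -> Dict[str, Any]:
    """Build a Chat Completions request body (hypothetical helper)."""
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        body["tools"] = tools  # OpenAI function calling format
    if stream:
        body["stream"] = True  # omitted otherwise, since the default is false
    # tool_choice, temperature, max_tokens and similar fields pass through as given.
    body.update({k: v for k, v in forwarded.items() if v is not None})
    return body


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_chat_body(
    model="orqen/auto",
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=[weather_tool],
    max_tokens=1024,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body, or its fields can be passed as keyword arguments to an OpenAI-compatible SDK client.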
Orqen-specific extensions
These fields are consumed by Orqen and not forwarded to the LLM. They apply to this OpenAI-compatible endpoint; native boto3 converse() rejects unknown fields such as aws_bearer_token_bedrock.
| Field | Type | Description |
|---|---|---|
| aws_access_key_id | string | AWS access key for Bedrock. Alternative to storing credentials in the dashboard. |
| aws_secret_access_key | string | AWS secret key for Bedrock. |
| aws_region_name | string | AWS region for Bedrock. Default: us-east-1. |
| aws_session_token | string | AWS session token for temporary credentials. |
| aws_bearer_token_bedrock | string | Amazon Bedrock API key bearer token. Use this instead of IAM access key fields. |
| function.x-orqen-examples | array | Optional tool-level routing examples. Orqen uses them for tool routing and strips them before forwarding upstream. |
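The sketch below shows one way these extension fields might be attached to a request body. Both helper names are hypothetical, and the assumption that `x-orqen-examples` holds a list of sample user queries (strings) is mine; the reference only states that the field is an array.

```python
import copy
from typing import Any, Dict, List


def with_bedrock_bearer(
    body: Dict[str, Any], bearer_token: str, region: str = "us-east-1"
) -> Dict[str, Any]:
    """Return a copy of the body with Bedrock bearer-token fields added."""
    out = copy.deepcopy(body)
    out["aws_bearer_token_bedrock"] = bearer_token
    out["aws_region_name"] = region
    return out


def with_routing_examples(tool: Dict[str, Any], examples: List[str]) -> Dict[str, Any]:
    """Return a copy of an OpenAI-format tool with routing examples attached.

    Assumption: each example is a plain string resembling a user query.
    """
    out = copy.deepcopy(tool)
    out["function"]["x-orqen-examples"] = list(examples)
    return out


tool = {
    "type": "function",
    "function": {"name": "get_weather", "description": "Get current weather for a city."},
}
base_body = {
    "model": "bedrock/...",  # elided in the parameter table; substitute your Bedrock model string
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [with_routing_examples(tool, ["Is it raining in Paris?"])],
}
request_body = with_bedrock_bearer(base_body, "YOUR_BEDROCK_API_KEY")
print(sorted(request_body))
```

With the OpenAI Python SDK, non-standard top-level fields like these are usually passed through the `extra_body` argument of `client.chat.completions.create`, which merges them into the JSON body.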
Example request
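import anthropic

# Anthropic SDK variant: with this base_url the SDK posts to /v1/messages.
client = anthropic.Anthropic(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
)

# A reply to a tool-enabled request can contain text blocks, tool_use blocks, or both.
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(block.name, block.input)
Response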
The response body is identical to the OpenAI Chat Completions response format. Orqen adds the following headers:
| Header | Type | Description |
|---|---|---|
| x-orqen-tools-input | integer | Number of tools in your original request. |
| x-orqen-tools-output | integer | Number of tools forwarded to the LLM after tool routing. |
| x-orqen-prune-ratio | string | Ratio as output/input, e.g. "8/32". |
| x-orqen-routing | string | Tool routing when pruning ran: "semantic" (embeddings + optional Stage-2 rerank) or "keyword" (fast path). Omitted when there are no tools or tools were forwarded unchanged. |
| x-orqen-ctx-tokens-in | integer | Estimated context tokens before compression. Present when message/context compression ran. |
| x-orqen-ctx-tokens-out | integer | Estimated context tokens after compression. |
| x-orqen-ctx-tokens-saved | integer | Context tokens saved (in − out). Dashboard logs also store compression_tokens_saved separately from tool pruning savings. |
| X-Request-ID | string | Unique request ID. Pass this to support for tracing. |
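A small sketch of how a client might read these headers from a response. The function names and the sample header values are illustrative assumptions; only the header names and the "output/input" ratio format come from the table.

```python
from typing import Dict, Mapping, Tuple


def parse_prune_ratio(value: str) -> Tuple[int, int]:
    """Split "output/input" (e.g. "8/32") into (tools_forwarded, tools_sent)."""
    out_s, in_s = value.split("/", 1)
    return int(out_s), int(in_s)


def summarize_orqen_headers(headers: Mapping[str, str]) -> Dict[str, object]:
    """Collect Orqen observability headers into a plain dict (hypothetical helper)."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    summary: Dict[str, object] = {"request_id": h.get("x-request-id")}
    if "x-orqen-prune-ratio" in h:
        kept, total = parse_prune_ratio(h["x-orqen-prune-ratio"])
        summary["tools_kept"] = kept
        summary["tools_total"] = total
    if "x-orqen-routing" in h:  # omitted when tools were absent or forwarded unchanged
        summary["routing"] = h["x-orqen-routing"]
    if "x-orqen-ctx-tokens-in" in h and "x-orqen-ctx-tokens-out" in h:
        summary["ctx_tokens_saved"] = int(h["x-orqen-ctx-tokens-in"]) - int(
            h["x-orqen-ctx-tokens-out"]
        )
    return summary


# Made-up header values for illustration only.
sample = {
    "x-orqen-prune-ratio": "8/32",
    "x-orqen-routing": "semantic",
    "x-orqen-ctx-tokens-in": "1200",
    "x-orqen-ctx-tokens-out": "900",
    "X-Request-ID": "req-example",
}
print(summarize_orqen_headers(sample))
```

Every Orqen header except X-Request-ID is treated as optional here, because the table notes that several of them appear only when pruning or compression actually ran.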
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748823456,
  "model": "gpt-4o-2024-11-20",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 287,
    "completion_tokens": 18,
    "total_tokens": 305
  }
}
GET /v1/models
OpenAI-compatible model list built from your connected providers. Same auth as chat completions.
curl https://api.orqen.app/v1/models \
  -H "Authorization: Bearer sk-orq-YOUR_KEY"
Orqen routing model strings
Instead of specifying a model directly, you can use an Orqen routing string and let Orqen pick the best model from your connected providers:
| Routing string | Type | Description |
|---|---|---|
| orqen/auto | model string | Picks based on task complexity. Simple queries → fast cheap models. Complex reasoning → capable models. |
| orqen/cheap | model string | Always the cheapest capable model from your connected providers. |
| orqen/fast | model string | Always the lowest-latency model, based on observed latency data. |
| orqen/capable | model string | Always the most capable model, prioritising recall accuracy. |
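As an illustration, an application could map its own priority setting to one of the four routing strings and pass the result as the model parameter. The priority names ("balanced", "cost", "latency", "quality") and the helper `pick_model` are assumptions made for this sketch; only the routing strings themselves come from the table.

```python
ROUTING_STRINGS = {
    "balanced": "orqen/auto",    # chosen by task complexity
    "cost": "orqen/cheap",       # cheapest capable model
    "latency": "orqen/fast",     # lowest observed latency
    "quality": "orqen/capable",  # most capable model
}


def pick_model(priority: str = "balanced") -> str:
    """Map an application-level priority to an Orqen routing string."""
    try:
        return ROUTING_STRINGS[priority]
    except KeyError:
        valid = ", ".join(sorted(ROUTING_STRINGS))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {valid}") from None


body = {
    "model": pick_model("latency"),
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
}
print(body["model"])
```

Keeping the mapping in one place means a deployment can switch between, say, cost and quality without touching the call sites that build requests.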