Orqen Docs

API Reference

POST /v1/chat/completions

The OpenAI SDK endpoint. It accepts the OpenAI Chat Completions request format, builds an intent-aware optimization plan, routes relevant tools, compresses schemas and long agent context when useful, validates the outbound payload, and forwards the request to your LLM provider. Orqen adds observability headers to the response and records a trace in the dashboard.

The three main SDK formats are supported directly:

  • POST /v1/messages — Anthropic SDK (base_url=https://api.orqen.app)
  • POST /v1/chat/completions — OpenAI SDK (base_url=https://api.orqen.app/v1)
  • POST /model/{id}/converse — AWS Bedrock boto3 (endpoint_url=https://api.orqen.app)

Use whichever SDK you already have; your message and tool payloads keep their usual shape.

POST https://api.orqen.app/v1/chat/completions

Request

Headers

| Header | Required | Type | Description |
|---|---|---|---|
| Authorization | yes | string | Bearer token. Format: Bearer sk-orq-... |
| Content-Type | yes | string | Must be application/json. |
| X-Orqen-Session-Id | no | string | Pass any string to group requests into a conversation session visible on the dashboard Sessions page. |
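As a minimal sketch of how these headers fit together, the helper below builds the header set and reuses one session ID across two turns of the same conversation. The function name `build_headers` is hypothetical, and generating the ID with `uuid4` is only a convenient choice, since any string is accepted.

```python
import uuid


def build_headers(api_key, session_id=None):
    """Return the request headers for Orqen; session_id is optional."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if session_id is not None:
        # Groups this request with others that carry the same value.
        headers["X-Orqen-Session-Id"] = session_id
    return headers


# One ID per conversation, reused on every request that belongs to it.
session_id = str(uuid.uuid4())
first_turn = build_headers("sk-orq-YOUR_KEY", session_id)
second_turn = build_headers("sk-orq-YOUR_KEY", session_id)
```

Requests sent without the header simply omit the session grouping.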

Body parameters

| Parameter | Type | Description |
|---|---|---|
| model (required) | string | The model to use. Accepts provider model strings (e.g. "gpt-4o", "claude-sonnet-4-6", "bedrock/...", "groq/llama-3.3-70b-versatile"). Also accepts Orqen routing strings: "orqen/auto", "orqen/cheap", "orqen/fast", "orqen/capable". |
| messages (required) | array | Array of message objects. Same format as OpenAI. Orqen uses the last user message and conversation context to build the optimization plan and preserve critical terms. |
| tools | array | Array of tool definition objects (OpenAI function calling format). Orqen scores, routes, and schema-compresses this array before forwarding. If absent, the request is forwarded unchanged. |
| stream | boolean | If true, responses stream as Server-Sent Events. Default: false. |
| tool_choice | string or object | Tool selection hint. Forwarded to the LLM unchanged. |
| temperature | number | Sampling temperature. Forwarded unchanged. |
| max_tokens | integer | Maximum tokens in the completion. Forwarded unchanged. |
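When `stream` is true, the client has to reassemble the completion from Server-Sent Events. The sketch below assumes the stream follows the usual OpenAI chunk layout (`data: {json}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`); this page does not spell out the chunk format, so treat that layout as an assumption. The helper name `iter_sse_deltas` and the sample lines are illustrative.

```python
import json


def iter_sse_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel (OpenAI style)
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative lines only, not captured from a real Orqen response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # prints "Hello"
```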

Orqen-specific extensions

These fields are consumed by Orqen and are not forwarded to the LLM. They apply to this OpenAI-compatible endpoint only, because the native boto3 converse() call rejects unknown fields such as aws_bearer_token_bedrock.

| Field | Type | Description |
|---|---|---|
| aws_access_key_id | string | AWS access key for Bedrock. Alternative to storing credentials in the dashboard. |
| aws_secret_access_key | string | AWS secret key for Bedrock. |
| aws_region_name | string | AWS region for Bedrock. Default: us-east-1. |
| aws_session_token | string | AWS session token for temporary credentials. |
| aws_bearer_token_bedrock | string | Amazon Bedrock API key bearer token. Use this instead of IAM access key fields. |
| function.x-orqen-examples | array | Optional tool-level routing examples. Orqen uses them for tool routing and strips them before forwarding upstream. |
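A rough sketch of how these extension fields might be attached to a request body follows. The helper `add_bedrock_auth` is hypothetical, and rejecting a mix of bearer token and IAM keys is this sketch's own choice, made because the bearer token is described as an alternative to the IAM fields. The element type of `x-orqen-examples` is not specified here, so the list of example user queries is an assumption.

```python
def add_bedrock_auth(payload, *, bearer_token=None, access_key_id=None,
                     secret_access_key=None, session_token=None,
                     region_name="us-east-1"):
    """Return a copy of payload with Orqen's Bedrock credential fields set."""
    has_iam = access_key_id is not None or secret_access_key is not None
    if bearer_token is not None and has_iam:
        raise ValueError("use either a bearer token or IAM keys, not both")
    out = dict(payload)
    out["aws_region_name"] = region_name
    if bearer_token is not None:
        out["aws_bearer_token_bedrock"] = bearer_token
    elif access_key_id is not None and secret_access_key is not None:
        out["aws_access_key_id"] = access_key_id
        out["aws_secret_access_key"] = secret_access_key
        if session_token is not None:  # only for temporary credentials
            out["aws_session_token"] = session_token
    else:
        raise ValueError("provide a bearer token or both IAM key fields")
    return out


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        # Assumed shape: a list of example user queries for routing.
        "x-orqen-examples": ["What is the weather in London?"],
    },
}

payload = add_bedrock_auth(
    {
        "model": "bedrock/...",  # placeholder as written in the docs
        "messages": [{"role": "user", "content": "Weather in London?"}],
        "tools": [weather_tool],
    },
    bearer_token="YOUR_BEDROCK_API_KEY",
)
```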

Example request

Anthropic SDK
import anthropic

client = anthropic.Anthropic(
    api_key="sk-orq-YOUR_KEY",
    base_url="https://api.orqen.app",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=[
        {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
)
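# With tools attached, the first content block may be a tool_use block,
# which has no .text attribute, so check each block's type.
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(block.name, block.input)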
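The example above goes through the Anthropic SDK and POST /v1/messages. For this endpoint, the same weather request in Chat Completions format could be sketched as below, using only the standard library so that the raw response headers stay accessible. The helper names `build_chat_request` and `send` are hypothetical, and "gpt-4o" is simply one of the provider model strings listed in the parameters table.

```python
import json
import urllib.request

ORQEN_CHAT_URL = "https://api.orqen.app/v1/chat/completions"


def build_chat_request(api_key, city):
    """Build (but do not send) a Chat Completions request with one tool."""
    payload = {
        "model": "gpt-4o",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": f"What is the weather in {city}?"}
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
    return urllib.request.Request(
        ORQEN_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request; return (response headers, parsed JSON body)."""
    with urllib.request.urlopen(request) as resp:
        return dict(resp.headers), json.load(resp)


request = build_chat_request("sk-orq-YOUR_KEY", "London")
# headers, body = send(request)  # uncomment to perform the network call
```

Note the difference from the Anthropic tool format: the schema sits under `function.parameters` rather than `input_schema`.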

Response

The response body is identical to the OpenAI Chat Completions response format. Orqen adds the following headers:

| Header | Type | Description |
|---|---|---|
| x-orqen-tools-input | integer | Number of tools in your original request. |
| x-orqen-tools-output | integer | Number of tools forwarded to the LLM after tool routing. |
| x-orqen-prune-ratio | string | Ratio as output/input, e.g. "8/32". |
| x-orqen-routing | string | Tool routing when pruning ran: "semantic" (embeddings + optional Stage-2 rerank) or "keyword" (fast path). Omitted when there are no tools or tools were forwarded unchanged. |
| x-orqen-ctx-tokens-in | integer | Estimated context tokens before compression. Present when message/context compression ran. |
| x-orqen-ctx-tokens-out | integer | Estimated context tokens after compression. |
| x-orqen-ctx-tokens-saved | integer | Context tokens saved (in − out). Dashboard logs also store compression_tokens_saved separately from tool pruning savings. |
| X-Request-ID | string | Unique request ID. Pass this to support for tracing. |

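One way to read these headers on the client side is sketched below. The function name `summarize_orqen_headers` and the sample values are illustrative only. The sketch treats every Orqen header as optional, since several are documented as present only when routing or compression ran.

```python
def summarize_orqen_headers(headers):
    """Collect Orqen's observability headers into a small dict."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    summary = {
        "request_id": h.get("x-request-id"),
        "routing": h.get("x-orqen-routing"),  # None when no pruning ran
    }
    ratio = h.get("x-orqen-prune-ratio")
    if ratio:
        # The ratio is written as output/input, e.g. "8/32".
        forwarded, original = (int(n) for n in ratio.split("/"))
        summary["tools_forwarded"] = forwarded
        summary["tools_original"] = original
    if "x-orqen-ctx-tokens-in" in h and "x-orqen-ctx-tokens-out" in h:
        # Saved tokens are defined as in minus out.
        summary["ctx_tokens_saved"] = (
            int(h["x-orqen-ctx-tokens-in"]) - int(h["x-orqen-ctx-tokens-out"])
        )
    return summary


# Illustrative values, not taken from a real response.
example = summarize_orqen_headers({
    "X-Request-ID": "req-123",
    "x-orqen-prune-ratio": "8/32",
    "x-orqen-routing": "semantic",
    "x-orqen-ctx-tokens-in": "1200",
    "x-orqen-ctx-tokens-out": "900",
})
print(example["tools_forwarded"], example["tools_original"], example["ctx_tokens_saved"])
# prints: 8 32 300
```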
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748823456,
  "model": "gpt-4o-2024-11-20",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 287,
    "completion_tokens": 18,
    "total_tokens": 305
  }
}
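In the response above, `function.arguments` is a JSON-encoded string, not an object, so it needs a second decoding step before the tool can be invoked. A minimal sketch, with `extract_tool_calls` as a hypothetical helper name:

```python
import json


def extract_tool_calls(response):
    """Return (call_id, function_name, arguments_dict) for each tool call."""
    calls = []
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        # tool_calls is absent or null on plain text replies.
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            calls.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return calls


# Trimmed copy of the example response body shown above.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_abc123",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\": \"London\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}
print(extract_tool_calls(response))
# prints: [('call_abc123', 'get_weather', {'city': 'London'})]
```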

GET /v1/models

OpenAI-compatible model list built from your connected providers. Same auth as chat completions.

curl https://api.orqen.app/v1/models \
  -H "Authorization: Bearer sk-orq-YOUR_KEY"
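The same call can be sketched in Python with the standard library. Because the list is described as OpenAI-compatible, the sketch assumes the usual `{"object": "list", "data": [{"id": ...}]}` layout; the helper names and the sample listing are illustrative.

```python
import json
import urllib.request


def build_models_request(api_key):
    """Build the GET /v1/models request with the same auth as chat."""
    return urllib.request.Request(
        "https://api.orqen.app/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(listing):
    """Extract model IDs from an OpenAI-style model list."""
    return [model["id"] for model in listing.get("data", [])]


def fetch_model_ids(api_key):
    """Perform the network call and return the available model IDs."""
    with urllib.request.urlopen(build_models_request(api_key)) as resp:
        return model_ids(json.load(resp))


# Illustrative listing, not a real response.
sample_listing = {
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model"},
        {"id": "claude-sonnet-4-6", "object": "model"},
    ],
}
print(model_ids(sample_listing))  # prints: ['gpt-4o', 'claude-sonnet-4-6']
```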

Orqen routing model strings

Instead of specifying a model directly, you can use an Orqen routing string and let Orqen pick the best model from your connected providers:

| Value | Type | Description |
|---|---|---|
| orqen/auto | model string | Picks based on task complexity. Simple queries → fast cheap models. Complex reasoning → capable models. |
| orqen/cheap | model string | Always the cheapest capable model from your connected providers. |
| orqen/fast | model string | Always the lowest-latency model, based on observed latency data. |
| orqen/capable | model string | Always the most capable model, prioritising recall accuracy. |
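Since a routing string goes in the ordinary `model` field, switching strategy is a one-line change in the request body. The small helper below is a hypothetical convenience that guards against typos in the strategy name; it is not part of Orqen.

```python
ROUTING_STRINGS = {
    "auto": "orqen/auto",
    "cheap": "orqen/cheap",
    "fast": "orqen/fast",
    "capable": "orqen/capable",
}


def routing_model(strategy="auto"):
    """Map a short strategy name to its Orqen routing model string."""
    try:
        return ROUTING_STRINGS[strategy]
    except KeyError:
        raise ValueError(f"unknown routing strategy: {strategy!r}") from None


payload = {
    "model": routing_model("cheap"),
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}
print(payload["model"])  # prints: orqen/cheap
```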