Endpoint

POST /api/openai/chat/completions

Auxot implements the OpenAI Chat Completions API specification, so any tool or SDK that works with OpenAI can point at Auxot instead by changing the base URL.

Authenticate with a personal (user.…) or team (team.…) API key from Settings → API Keys on Auxot Server (see authentication).

Request Format

curl http://localhost:8420/api/openai/chat/completions \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'

Supported Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model ID or "auto" for router-selected. |
| messages | array | Conversation messages (system, user, assistant, tool roles). |
| temperature | float | Sampling temperature (0.0–2.0). Default: 1.0. |
| max_tokens | integer | Maximum tokens to generate. |
| stream | boolean | Enable streaming via Server-Sent Events. Default: false. |
| tools | array | Tool definitions for function calling. |
| tool_choice | string/object | Control tool use: "auto", "none", or specific tool. |
| top_p | float | Nucleus sampling parameter. |
| stop | string/array | Stop sequences. |

When model is "auto", Auxot picks the best available provider using priority routing (GPU → CLI → Cloud). You can also specify a concrete model ID like "gpt-4o" or "claude-sonnet-4-20250514" to target a specific model.
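As a rough Python counterpart to the curl request above, the sketch below assembles the same JSON body with the standard library and checks the documented temperature range before anything is sent. The helper name build_chat_request and the client-side range check are illustrative assumptions, not part of Auxot; the URL is the local address used in the curl example.

```python
import json
import urllib.request

# Local URL taken from the curl example; adjust host and port for your server.
AUXOT_URL = "http://localhost:8420/api/openai/chat/completions"


def build_chat_request(api_key, messages, model="auto", temperature=1.0,
                       max_tokens=None, stream=False):
    """Build (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper: the range check mirrors the documented 0.0-2.0
    temperature range and is done client-side purely for illustration.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    body = {
        "model": model,  # "auto" lets the router choose; or pass a concrete model ID
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return urllib.request.Request(
        AUXOT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "<api-key>",
    [{"role": "user", "content": "Explain the difference between TCP and UDP."}],
    max_tokens=1024,
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can be run without a server.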

Response Format

{
  "id": "chatcmpl-auxot-abc123",
  "object": "chat.completion",
  "created": 1710432000,
  "model": "llama-3.3-70b-q4_k_m",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are both transport layer protocols..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 156,
    "total_tokens": 184
  }
}

The model field reports which model actually served the request, which is useful when routing with "auto".
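To make the role of the model field concrete, here is a minimal sketch that pulls the serving model, reply text, and token count out of a decoded response. The function name summarize_completion is hypothetical, and the example dictionary is a trimmed copy of the response shown above.

```python
def summarize_completion(response):
    """Return the serving model, reply text, finish reason, and token total."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "served_by": response["model"],  # the model the router actually used
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the example response above.
example = {
    "id": "chatcmpl-auxot-abc123",
    "object": "chat.completion",
    "model": "llama-3.3-70b-q4_k_m",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "TCP and UDP are..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 156, "total_tokens": 184},
}

summary = summarize_completion(example)
print(f"{summary['served_by']} answered using {summary['total_tokens']} tokens")
```

Logging served_by alongside each request is one way to see how "auto" distributes traffic across providers.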

Streaming

Set "stream": true to receive responses as Server-Sent Events:

curl http://localhost:8420/api/openai/chat/completions \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Write a haiku about servers."}],
    "stream": true
  }'

Response stream:

data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"content":"Silicon"},"index":0}]}

data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"content":" hums"},"index":0}]}

...

data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
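The stream above can be reassembled on the client by concatenating each delta's content until the [DONE] sentinel arrives. The sketch below does this over an iterable of text lines, using the chunks from the example as sample input; the function name collect_stream is an assumption for illustration, and in practice the lines would come from an HTTP response read line by line.

```python
import json


def collect_stream(lines):
    """Join content deltas from SSE lines; return (text, finish_reason, usage)."""
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage


# Sample input copied from the response stream above (elided chunks left out).
sample_lines = [
    'data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"content":"Silicon"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{"content":" hums"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-auxot-abc123","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}',
    '',
    'data: [DONE]',
]

text, finish_reason, usage = collect_stream(sample_lines)
print(repr(text), finish_reason, usage["total_tokens"])
```

Note that the first chunk carries only the role and the last carries only finish_reason and usage, so the parser treats a missing content key as an empty delta rather than an error.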

Tool Calls (Function Calling)

Define tools in the request and the model can call them:

curl http://localhost:8420/api/openai/chat/completions \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

The model responds with a tool call:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Your code runs the tool, then sends the result back in a follow-up request as a message with role: "tool".
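A minimal sketch of that follow-up step is shown below: it decodes the arguments string, runs a local function, and appends a "tool" message to the conversation. The get_weather body, the TOOL_HANDLERS mapping, and the append_tool_results helper are all hypothetical; the tool_call_id field is assumed from the OpenAI Chat Completions specification, which this endpoint follows.

```python
import json


def get_weather(location):
    # Hypothetical stand-in: a real implementation would query a weather service.
    return {"location": location, "conditions": "foggy", "temperature_c": 14}


# Map tool names from the request's "tools" list to local Python callables.
TOOL_HANDLERS = {"get_weather": get_weather}


def append_tool_results(messages, assistant_message):
    """Return a new list: prior messages, the assistant turn, one tool message per call."""
    followup = list(messages) + [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        handler = TOOL_HANDLERS[call["function"]["name"]]
        # "arguments" arrives as a JSON-encoded string, so decode it first.
        arguments = json.loads(call["function"]["arguments"])
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],  # assumed field, per the OpenAI spec
            "content": json.dumps(handler(**arguments)),
        })
    return followup


messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
assistant_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"San Francisco\"}",
        },
    }],
}

followup = append_tool_results(messages, assistant_message)
print(json.dumps(followup[-1], indent=2))
```

The resulting list would be sent as the messages array of the next request, with the same tools definition, so the model can turn the tool output into a final answer.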

Using with the OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8420/api/openai",
    api_key="user.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Using with Cursor (local Auxot)

Cursor can send chat requests through Auxot’s OpenAI-compatible endpoint: set your OpenAI API Key to an Auxot key and override the OpenAI base URL with http://<host>:<port>/api/openai. See Claude Code & Cursor (Local) for the API key, Cursor UI, and Claude Code environment setup on one Auxot instance.

/api/openai/v1/... aliases are still accepted for client compatibility.