Authentication

All API requests (except the unauthenticated /health endpoint) require a bearer token in the Authorization header:

Authorization: Bearer rtr_xxx

The rtr_xxx value is the raw API key generated by auxot-router setup (not the hash). Store it securely — the router only stores the Argon2id hash.


OpenAI-Compatible Endpoints

POST /api/openai/chat/completions

Drop-in replacement for the OpenAI chat completions endpoint. Supports streaming, tool calls, and vision.

curl http://localhost:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain tensor parallelism"}],
    "stream": true
  }'

Use "model": "auto" to route to any available worker, or specify a model name from the registry.

GET /api/openai/models

Returns the list of models currently available from connected workers.

curl http://localhost:8080/api/openai/models \
  -H "Authorization: Bearer rtr_xxx"

Anthropic-Compatible Endpoints

POST /api/anthropic/v1/messages

Drop-in replacement for the Anthropic messages endpoint. Supports streaming, tool use, and thinking tokens.

curl http://localhost:8080/api/anthropic/v1/messages \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "auto",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'

POST /api/anthropic/v1/messages/count_tokens

Count tokens for a messages payload without running inference.
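Assuming the payload mirrors the messages endpoint minus max_tokens (as in Anthropic's count_tokens API), a request looks like:

```shell
curl http://localhost:8080/api/anthropic/v1/messages/count_tokens \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```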

GET /api/anthropic/v1/models

Returns available models in Anthropic API format.
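Mirroring the other GET endpoints, this takes only the bearer token:

```shell
curl http://localhost:8080/api/anthropic/v1/models \
  -H "Authorization: Bearer rtr_xxx"
```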


Tools API

GET /api/tools/v1/list

List all tools registered with the connected tools worker.

curl http://localhost:8080/api/tools/v1/list \
  -H "Authorization: Bearer rtr_xxx"

POST /api/tools/v1/execute

Execute a tool directly without going through the LLM.

curl http://localhost:8080/api/tools/v1/execute \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{"tool": "web_search", "input": {"query": "Go concurrency patterns"}}'

Health Check

curl http://localhost:8080/health

Returns 200 OK with no authentication required.


Streaming (SSE)

Set "stream": true in the request body. On the OpenAI-compatible endpoint, the router returns Server-Sent Events in the standard OpenAI chunk format:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: [DONE]
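A minimal shell sketch of consuming such a stream: strip the "data: " prefix from each event and stop at the [DONE] sentinel. The demo uses a hard-coded sample stream; in practice you would pipe `curl -N` (unbuffered) output into the function instead.

```shell
# Read an OpenAI-style SSE stream on stdin: print each JSON chunk,
# stop at the [DONE] sentinel.
read_sse() {
  while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;
      data:*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}

# Demo on a hard-coded sample stream (stand-in for `curl -N ... | read_sse`):
printf 'data: {"choices":[{"delta":{"content":"Hello"}}]}\n\ndata: [DONE]\n' | read_sse
```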

Reasoning / Thinking Tokens

For models that support extended thinking (e.g. Claude models with a thinking budget), pass the relevant parameters in the request body; the router forwards them to the worker unchanged and relays thinking tokens in the response stream.
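On the Anthropic-compatible endpoint, for example, this might look like the following. The thinking block is Anthropic's extended-thinking parameter (Anthropic requires max_tokens to exceed budget_tokens); whether a given worker honors it depends on the model behind it.

```shell
curl http://localhost:8080/api/anthropic/v1/messages \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "auto",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational"}]
  }'
```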