Authentication
All API requests except /health require a bearer token in the Authorization header:
Authorization: Bearer rtr_xxx
The rtr_xxx value is the raw API key generated by auxot-router setup (not the hash). Store it securely — the router only stores the Argon2id hash.
OpenAI-Compatible Endpoints
POST /api/openai/chat/completions
Drop-in replacement for the OpenAI chat completions endpoint. Supports streaming, tool calls, and vision.
curl http://localhost:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain tensor parallelism"}],
    "stream": true
  }'
Use "model": "auto" to route to any available worker, or specify a model name from the registry.
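For programmatic clients, the curl call above is a plain HTTP POST. A minimal Python sketch using only the standard library (the base URL and rtr_xxx key are deployment-specific placeholders):

```python
import json
import urllib.request

def chat_request(messages, model="auto", stream=False,
                 base_url="http://localhost:8080", api_key="rtr_xxx"):
    """Build a POST request for the router's OpenAI-compatible endpoint."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        base_url + "/api/openai/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = chat_request([{"role": "user", "content": "Explain tensor parallelism"}])
# Against a running router:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official openai Python client should also work if you point its base_url at http://localhost:8080/api/openai, assuming the router tracks the upstream API closely.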
GET /api/openai/models
Returns the list of models currently available from connected workers.
curl http://localhost:8080/api/openai/models \
  -H "Authorization: Bearer rtr_xxx"
Anthropic-Compatible Endpoints
POST /api/anthropic/v1/messages
Drop-in replacement for the Anthropic messages endpoint. Supports streaming, tool use, and thinking tokens.
curl http://localhost:8080/api/anthropic/v1/messages \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "auto",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
POST /api/anthropic/v1/messages/count_tokens
Count tokens for a messages payload without running inference.
GET /api/anthropic/v1/models
Returns available models in Anthropic API format.
Tools API
GET /api/tools/v1/list
List all tools registered with the connected tools worker.
curl http://localhost:8080/api/tools/v1/list \
  -H "Authorization: Bearer rtr_xxx"
POST /api/tools/v1/execute
Execute a tool directly without going through the LLM.
curl http://localhost:8080/api/tools/v1/execute \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{"tool": "web_search", "input": {"query": "Go concurrency patterns"}}'
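A Python sketch of the same direct execution call. The tool name and input mirror the curl example; web_search is only illustrative and must actually be registered with your tools worker:

```python
import json
import urllib.request

def tool_request(tool, tool_input,
                 base_url="http://localhost:8080", api_key="rtr_xxx"):
    """Build a direct tool-execution request (bypasses the LLM entirely)."""
    return urllib.request.Request(
        base_url + "/api/tools/v1/execute",
        data=json.dumps({"tool": tool, "input": tool_input}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = tool_request("web_search", {"query": "Go concurrency patterns"})
# result = json.load(urllib.request.urlopen(req))  # needs a running router
```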
Health Check
curl http://localhost:8080/health
Returns 200 OK with no authentication required.
Streaming (SSE)
Set "stream": true in your request body. The router returns Server-Sent Events in the standard OpenAI SSE format:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: [DONE]
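A stream consumer only needs to strip the data: prefix from each event line, decode the JSON chunk, and stop at the [DONE] sentinel. A minimal parser sketch (field names follow the chunk format shown above):

```python
import json

def iter_deltas(lines):
    """Yield content fragments from OpenAI-format SSE lines.

    `lines` is any iterable of decoded text lines, e.g. an HTTP
    response body read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"lo"},"index":0}]}',
    'data: [DONE]',
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```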
Reasoning / Thinking Tokens
For models that support extended thinking (e.g. Claude with thinking budget), pass the relevant parameters in the request body. The router passes them through to the worker and relays thinking tokens in the response stream.
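For example, an Anthropic-style request with extended thinking enabled might look like the following. The thinking block here follows Anthropic's published request schema; whether a given worker honors it depends on the model behind the router, so treat this as a sketch rather than router-specific documentation:

```python
import json

# Hypothetical extended-thinking request in the Anthropic message format.
body = {
    "model": "auto",
    "max_tokens": 4096,  # must exceed the thinking budget
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}
payload = json.dumps(body)
```

Thinking deltas then arrive in the response stream alongside regular content deltas.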