Prerequisites
- Docker (for Option 1) or a Linux/macOS machine (for Option 2)
- For GPU workers: an NVIDIA, AMD, or Apple Silicon GPU with 8+ GB of VRAM
Option 1: Docker (Fastest)
The easiest way to run the router is with Docker. With no keys configured, the router auto-generates random API keys and prints them to stdout on first startup.
docker run -p 8080:8080 ghcr.io/auxothq/auxot-router:latest
Copy the keys from stdout — you’ll see something like:
auxot-router: no keys configured, generating random keys
AUXOT_ADMIN_KEY=adm_xxxxxxxxxxxxxxxxxxxxxxxx
AUXOT_API_KEY=rtr_xxxxxxxxxxxxxxxxxxxxxxxx
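If you prefer to run the container detached, the generated keys still go to the container's stdout and can be read back with docker logs. A quick sketch (the container name auxot-router is just a convenience, not required):

```shell
# Run detached under a fixed name so the logs are easy to find
docker run -d --name auxot-router -p 8080:8080 ghcr.io/auxothq/auxot-router:latest

# The generated AUXOT_ADMIN_KEY / AUXOT_API_KEY lines appear here
docker logs auxot-router
```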
Set AUXOT_ADMIN_KEY_HASH and AUXOT_API_KEY_HASH to persist these keys across restarts.
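With Docker, one way to do this is to pass the hashes as environment variables. A sketch with placeholder values (how the hash is derived from each key is not covered here):

```shell
docker run -p 8080:8080 \
  -e AUXOT_ADMIN_KEY_HASH=<hash-of-your-adm-key> \
  -e AUXOT_API_KEY_HASH=<hash-of-your-rtr-key> \
  ghcr.io/auxothq/auxot-router:latest
```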
Option 2: Binary
Download the latest auxot-router binary from GitHub Releases:
curl -Lo auxot-router https://github.com/auxothq/auxot/releases/latest/download/auxot-router-$(uname -s)-$(uname -m)
chmod +x auxot-router
Generate and persist your keys:
./auxot-router setup --write-env
source .env
./auxot-router
Connect a GPU Worker
Download auxot-worker and connect it to the router, authenticating with your admin key (the adm_ value from setup). The worker auto-downloads the model from Hugging Face and spawns llama.cpp:
curl -Lo auxot-worker https://github.com/auxothq/auxot/releases/latest/download/auxot-worker-$(uname -s)-$(uname -m)
chmod +x auxot-worker
AUXOT_GPU_KEY=adm_xxx AUXOT_ROUTER_URL=localhost:8080 ./auxot-worker
The first startup downloads the default model and llama.cpp into ~/.auxot/. Subsequent starts use the local cache.
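To keep the worker running after your terminal closes, a plain nohup sketch works (adm_xxx stands in for your admin key, as above):

```shell
export AUXOT_GPU_KEY=adm_xxx
export AUXOT_ROUTER_URL=localhost:8080

# Run in the background and capture output for later inspection
nohup ./auxot-worker > worker.log 2>&1 &
echo "worker PID: $!"
```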
Send Your First Request
curl http://localhost:8080/api/openai/chat/completions \
-H "Authorization: Bearer rtr_xxx" \
-H "Content-Type: application/json" \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "stream": true}'
Use "model": "auto" to route to whatever worker is connected, or specify a model name from the registry.
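Because the endpoint is OpenAI-compatible, a non-streaming response can be reduced to just the reply text with jq. A sketch assuming the standard choices[0].message.content response shape:

```shell
curl -s http://localhost:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "stream": false}' \
  | jq -r '.choices[0].message.content'
```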