Prerequisites

  • Docker (for Option 1) or a Linux/macOS machine (for Option 2)
  • For GPU workers: an NVIDIA, AMD, or Apple Silicon GPU with 8+ GB of VRAM

Option 1: Docker (Fastest)

The easiest way to run the router is with Docker. With no keys configured, the router auto-generates random API keys and prints them to stdout on first startup.

docker run -p 8080:8080 ghcr.io/auxothq/auxot-router:latest

Copy the keys from stdout — you’ll see something like:

auxot-router: no keys configured, generating random keys
  AUXOT_ADMIN_KEY=adm_xxxxxxxxxxxxxxxxxxxxxxxx
  AUXOT_API_KEY=rtr_xxxxxxxxxxxxxxxxxxxxxxxx
  Set AUXOT_ADMIN_KEY_HASH and AUXOT_API_KEY_HASH to persist these keys across restarts.
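If the container is still running, you can recover the generated keys from its log instead of scrolling back through your terminal. A small sketch, assuming the log format shown above and that the container was started with a name (the `--name auxot-router` flag and the `extract_keys` helper are illustrative, not part of the router):

```shell
# Turn the router's startup log into `export` statements so the generated
# keys land in the current shell. Matches the log format shown above.
extract_keys() {
  grep -Eo 'AUXOT_(ADMIN|API)_KEY=[A-Za-z0-9_]+' | sed 's/^/export /'
}

# Usage (assumes `docker run -d --name auxot-router ...`):
#   eval "$(docker logs auxot-router 2>&1 | extract_keys)"
```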

Option 2: Binary

Download the latest auxot-router binary from GitHub Releases:

curl -Lo auxot-router https://github.com/auxothq/auxot/releases/latest/download/auxot-router-$(uname -s)-$(uname -m)
chmod +x auxot-router

Generate and persist your keys:

./auxot-router setup --write-env
source .env
./auxot-router
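Based on the startup log from Option 1, the .env written by `setup --write-env` presumably holds the two hash variables that persist your keys across restarts. Roughly (variable names taken from the log above; values and exact contents are assumptions):

```shell
# .env written by `auxot-router setup --write-env` (sketch; actual contents may differ)
AUXOT_ADMIN_KEY_HASH=...   # persists the adm_ key across restarts
AUXOT_API_KEY_HASH=...     # persists the rtr_ key across restarts
```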

Connect a GPU Worker

Download auxot-worker and connect it to the router. The worker auto-downloads the model from Hugging Face and spawns llama.cpp:

curl -Lo auxot-worker https://github.com/auxothq/auxot/releases/latest/download/auxot-worker-$(uname -s)-$(uname -m)
chmod +x auxot-worker

AUXOT_GPU_KEY=adm_xxx AUXOT_ROUTER_URL=localhost:8080 ./auxot-worker

The first startup downloads the default model and llama.cpp into ~/.auxot/. Subsequent starts use the local cache.
Send Your First Request

curl http://localhost:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello!"}], "stream": true}'

Use "model": "auto" to route to whatever worker is connected, or specify a model name from the registry.
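If you would rather call the endpoint from code, here is a minimal Python sketch using only the standard library. The URL, key prefix, and request body mirror the curl command above; the streamed chunk shape (`data:`-prefixed lines, a `[DONE]` sentinel, content under `choices[0].delta`) is the standard OpenAI SSE streaming format and is assumed to apply to this router:

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8080/api/openai/chat/completions"
API_KEY = "rtr_xxx"  # your AUXOT_API_KEY

def iter_sse_json(lines):
    """Yield the JSON payload of each SSE `data:` line, stopping at [DONE]."""
    for raw in lines:
        line = (raw.decode() if isinstance(raw, bytes) else raw).strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

def chat(prompt):
    """Stream a completion from the router and print it as it arrives."""
    body = json.dumps({
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(ROUTER_URL, data=body, headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        for chunk in iter_sse_json(resp):
            print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
    print()

# chat("Hello!")  # requires a running router and at least one connected worker
```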