Run the open-source inference router

Spin up auxot-router — single Go binary, OpenAI- and Anthropic-shaped APIs — attach auxot-worker for GPU or CLI inference, and curl your first completion without Postgres or agent accounts.

Plus three moves: an Admin Agent tradeoff brief for when you already run Auxot Server, a smoke-test curl that proves routing, and where to look when model: auto picks nothing.

Audience: Developers · Admins
Time: ~15 min
Prerequisites: Docker **or** permission to run downloaded binaries on macOS/Linux. For GPU inference: a machine with a supported GPU (NVIDIA, AMD, or Apple Silicon) per [GPU Workers](/docs/oss/gpu-workers). This path is **not** the same npm installer as [Connect a GPU worker](/tutorials/connect-a-gpu-worker). That tutorial wires `@auxot/worker-cli` into **commercial** Auxot Server; here you run **`auxot-router`** + **`auxot-worker`** from the OSS releases.
You'll end up with: the router listening on a port you chose, at least one worker connected, and a successful `POST /api/openai/chat/completions`, plus a clear sense of when to graduate from OSS-only routing to full Auxot Server ([What is Auxot?](/docs/getting-started/overview)).

When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.

Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.

Why this matters

Auxot ships two related ideas:

  1. Auxot Server (commercial): Postgres-backed accounts, agents, skills, and audit trails. This is the product most tutorials on this site assume.
  2. Open-source router (auxot-router): one static Go binary, optional Redis, no database, no agent framework. It speaks OpenAI and Anthropic HTTP APIs and forwards work to GPU, CLI, or tools workers over WebSockets (Open Source Router).

If you only need “route HTTP inference to silicon we control” (CI sandboxes, edge sites, air-gapped labs, or a sharp prototype before you commit to full governance), the OSS stack is intentionally smaller. You trade agents/context files/RBAC for minutes-to-first-token and Apache 2.0 source you can read end-to-end (github.com/auxothq/auxot).

Most tutorials here assume Auxot Server, from Say hello to the Admin Agent onward. This lesson is the escape hatch when Server is more product than you need today.

Nothing routes itself: you start auxot-router, you connect the workers, and you send the HTTP requests, whether from a shell, cron, or CI.


Quick start

  1. Start auxot-router: Docker one-liner, or download the binary and run ./auxot-router setup --write-env (Quickstart).
  2. Copy keys from stdout / .env: the router prints rtr_… (callers) and adm_… (workers/admin). Persist hashes across restarts per log hints.
  3. Download auxot-worker: same GitHub Releases page as the router; chmod +x.
  4. Connect a GPU worker: AUXOT_GPU_KEY=adm_… AUXOT_ROUTER_URL=host:port ./auxot-worker (first boot downloads GGUF + llama.cpp into ~/.auxot/).
  5. Call the OpenAI-compatible endpoint: POST http://localhost:8080/api/openai/chat/completions with Authorization: Bearer rtr_…, JSON body {"model":"auto","messages":[{"role":"user","content":"Hello!"}]} (add "stream": true if you want SSE).

Done? You should see assistant text (streaming chunks or a single JSON payload) sourced from the worker that registered, not a login wall, because OSS auth is only those hashed keys.
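
If you add "stream": true, the reply arrives as server-sent events instead of a single JSON body. The sketch below assumes the router mirrors the OpenAI streaming convention (each chunk on a `data: ` line, closed by a `data: [DONE]` sentinel). That framing is an assumption about auxot-router, not something this page confirms, and `sse_payloads` is a hypothetical helper name.

```shell
# sse_payloads: read an SSE stream on stdin and print one JSON chunk per line.
# Assumes OpenAI-style framing: "data: <json>" lines, ended by "data: [DONE]".
sse_payloads() {
  tr -d '\r' | while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;                       # end-of-stream sentinel
      "data: "*) printf '%s\n' "${line#data: }" ;;  # strip the SSE field name
    esac                                             # blank separator lines are skipped
  done
}

# Usage (needs a running router and worker; -N turns off curl's output buffering):
#   curl -sN http://localhost:8080/api/openai/chat/completions \
#     -H "Authorization: Bearer rtr_YOUR_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"model":"auto","stream":true,"messages":[{"role":"user","content":"Hello!"}]}' \
#     | sse_payloads
```

Each printed line is one JSON chunk, so you can pipe the output into whatever JSON tool you already use.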


The agent can do that?

1. Decide OSS vs Auxot Server before you split the team

If you also use hosted or self-hosted Auxot Server, paste into Admin Agent chat:

We're weighing the OSS auxot-router (no Postgres agents) vs staying entirely on Auxot Server for [edge POP | regulated lab | CI inference]. List decision bullets — data residency, RBAC, audit, and operator headcount — and say which workloads belong on each side if we run both.

Why it’s non-obvious: Hybrid is common (OSS at the edge, Server where humans need skills), but only if you align accountability. Admin Agent summarizes tradeoffs because you asked; architecture approval stays human.

2. Prove routing before you integrate SDKs

From any shell with curl:

curl -s http://localhost:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Reply with the word pong."}]}'

Why it’s non-obvious: Clients blame “Auxot broke” when the real failure is model: auto with zero workers connected. Curl separates router-level HTTP failures from bugs in your own app.

3. Inspect what workers could serve

On the worker host:

./auxot-worker models list

Why it’s non-obvious: The registry is huge: pinning AUXOT_MODEL or explicit model IDs beats guessing whether VRAM fits (GPU Workers quantization table).


Go deeper

Ports & defaults

OSS quickstarts often use 8080. Commercial Auxot Server commonly listens on 8420 in docs: don’t swap them blindly when copying curl from another tutorial.

CLI workers

The same auxot-worker binary can back Claude Code pass-through when router policy + env vars match (CLI Workers). Economics and OAuth differ from GPU paths.

Tools workers

auxot-tools handles sandboxed code/search/MCP-style backends. It is a third binary that runs beside the router and worker.

OSS vs Connect a GPU worker (worker-cli)
| Piece | Auxot Server + worker-cli | OSS stack |
| --- | --- | --- |
| Router | Auxot Server you already log into | auxot-router binary |
| Worker command | npx @auxot/worker-cli … --router-url …/ws | ./auxot-worker + AUXOT_GPU_KEY |
| Accounts | Org + agents | API keys only |
| Docs anchor | Providers UI | OSS Quickstart |

Pick one story per machine: mixing keys across products wastes weekends.

Troubleshooting
  • 401 from router: wrong rtr_ token or missing Authorization: Bearer.
  • Jobs hang / auto picks nothing: no healthy worker websocket; check worker logs and firewall egress.
  • OOM / slow first boot: model download + VRAM; lower quantization or set AUXOT_MODEL to a smaller GGUF.
  • Need HA: embedded miniredis is single-node; plan external Redis for horizontal router scaling (Deployment).
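
The first two bullets above can be told apart by status code alone. Here is a minimal sketch, assuming the router returns ordinary HTTP status codes. `triage_status` and `smoke_status` are hypothetical helper names, and the 60-second limit is an arbitrary choice (a first boot that is still downloading a model may need longer).

```shell
# triage_status: map an HTTP status code to the matching troubleshooting hint.
# "000" is what curl's %{http_code} prints when no HTTP response arrived.
triage_status() {
  case "$1" in
    200) echo "ok: request succeeded" ;;
    401) echo "auth: wrong rtr_ token or missing Authorization: Bearer header" ;;
    000) echo "no response: router unreachable, or timed out waiting for a worker" ;;
    *)   echo "other ($1): check router and worker logs" ;;
  esac
}

# smoke_status: send the smoke-test request and print only the status code.
# $1 = base URL (e.g. http://localhost:8080), $2 = rtr_ caller key.
# --max-time bounds the wait so a request with no healthy worker cannot hang forever.
smoke_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
    "$1/api/openai/chat/completions" \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d '{"model":"auto","messages":[{"role":"user","content":"Reply with the word pong."}]}'
}

# Usage: triage_status "$(smoke_status http://localhost:8080 rtr_YOUR_KEY)"
```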

Walkthrough

Step 1: Choose Docker or bare binary

Docker (fastest):

docker run -p 8080:8080 ghcr.io/auxothq/auxot-router:latest

Read stdout for generated AUXOT_ADMIN_KEY / AUXOT_API_KEY lines: copy into a secrets file.

Binary:

Follow Quickstart → Option 2: download auxot-router, ./auxot-router setup --write-env, source .env, ./auxot-router.

Step 2: Fetch auxot-worker

Download auxot-worker for your OS/arch from GitHub Releases, then make it executable with chmod +x.

Step 3: Connect GPU worker

export AUXOT_GPU_KEY=adm_xxxxxxxx   # from router setup
export AUXOT_ROUTER_URL=127.0.0.1:8080
./auxot-worker

Wait for model cache warmup: logs should show registration with the router.
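
Because the router issues two kinds of key (rtr_ for callers, adm_ for workers/admin), a prefix check before launch can catch a swapped key early. This is a sketch only: `key_role` and `require_role` are hypothetical helpers, and they inspect nothing beyond the prefix.

```shell
# key_role: classify a key by the prefix the router prints.
# rtr_ keys are for HTTP callers; adm_ keys are for workers/admin.
key_role() {
  case "$1" in
    rtr_?*) echo "caller" ;;
    adm_?*) echo "worker" ;;
    *)      echo "unknown" ;;
  esac
}

# require_role: fail early when the wrong kind of key is wired into a slot.
# $1 = expected role ("caller" or "worker"), $2 = key value.
require_role() {
  actual=$(key_role "$2")
  if [ "$actual" != "$1" ]; then
    echo "expected a $1 key, got: $actual" >&2
    return 1
  fi
}

# Usage before starting the worker:
#   require_role worker "$AUXOT_GPU_KEY" && ./auxot-worker
```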

Step 4: HTTP smoke test

Use the rtr_ key on the caller side:

curl http://127.0.0.1:8080/api/openai/chat/completions \
  -H "Authorization: Bearer rtr_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Say hi in five words."}]}'

Expect assistant content in choices[0].message.content.
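
To check that field from a script rather than by eye, a small filter will do. The sketch below assumes python3 is on the PATH and uses it only for JSON parsing; `extract_reply` is a hypothetical helper name. If you have jq installed, the filter `.choices[0].message.content` does the same job.

```shell
# extract_reply: read a chat-completions JSON body on stdin and print
# choices[0].message.content. Exits non-zero if the field is missing,
# for example when the router returned an error body instead.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Usage (needs a running router and worker):
#   curl -s http://127.0.0.1:8080/api/openai/chat/completions \
#     -H "Authorization: Bearer rtr_xxxxxxxx" \
#     -H "Content-Type: application/json" \
#     -d '{"model":"auto","messages":[{"role":"user","content":"Say hi in five words."}]}' \
#     | extract_reply
```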

Step 5 (optional): Anthropic Messages shape

The same router also serves POST /api/anthropic/v1/messages. It takes the same parameters described in Call the Anthropic-compatible Messages API; just point the request at whichever host/port runs auxot-router.

Step 6: Decide what ships next

If you need agents, Linked Accounts, or MCP governance, stop enlarging shell scripts and plan for Auxot Server (What is Auxot?). If you only need ultra-light routing, stay on OSS and harden your keys and Redis.


What’s next

Reference