Run the open-source inference router
Spin up auxot-router — single Go binary, OpenAI- and Anthropic-shaped APIs — attach auxot-worker for GPU or CLI inference, and curl your first completion without Postgres or agent accounts.
Plus: three moves — Admin-Agent tradeoff brief when you already run Auxot Server, a smoke-test curl that proves routing, and where to look when auto model picks nothing.
| Audience | Developers · Admins |
|---|---|
| Time | ~15 min |
| Prerequisites | Docker **or** permission to run downloaded binaries on macOS/Linux. For GPU inference: a machine with a supported GPU (NVIDIA, AMD, or Apple Silicon) per [GPU Workers](/docs/oss/gpu-workers). This path is **not** the same npm installer as [Connect a GPU worker](/tutorials/connect-a-gpu-worker) — that tutorial wires `@auxot/worker-cli` into **commercial** Auxot Server; here you run **`auxot-router`** + **`auxot-worker`** from the OSS releases. |
| You'll end up with | Router listening on a port you chose, at least one worker connected, and a successful `POST /api/openai/chat/completions` — plus a clear sense of when to graduate from OSS-only routing to full Auxot Server ([What is Auxot?](/docs/getting-started/overview)). |
When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.
Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.
Why this matters
Auxot ships two related ideas:
- Auxot Server (commercial): Postgres-backed accounts, agents, skills, audit trails, the product most tutorials on this site assume.
- Open-source router (
auxot-router): one static Go binary, optional Redis, no database, no agent framework. It speaks OpenAI and Anthropic HTTP APIs and forwards work to GPU, CLI, or tools workers over WebSockets (Open Source Router).
If you only need “route HTTP inference to silicon we control” (CI sandboxes, edge sites, air-gapped labs, or a sharp prototype before you commit to full governance), the OSS stack is intentionally smaller. You trade agents/context files/RBAC for minutes-to-first-token and Apache 2.0 source you can read end-to-end (github.com/auxothq/auxot).
Most tutorials here assume Auxot Server: Say hello to the Admin Agent onward. This lesson is the escape hatch when Server is more product than you need today.
Nothing routes itself without you starting auxot-router, you connecting workers, and you sending HTTP, cron and CI included.
Quick start
- Start
auxot-router— Docker one-liner or download the binary +./auxot-router setup --write-env(Quickstart). - Copy keys from stdout /
.env— router printsrtr_…(callers) andadm_…(workers/admin). Persist hashes across restarts per log hints. - Download
auxot-worker— same GitHub Releases page as the router;chmod +x. - Connect a GPU worker —
AUXOT_GPU_KEY=adm_… AUXOT_ROUTER_URL=host:port ./auxot-worker(first boot downloads GGUF + llama.cpp into~/.auxot/). - Call the OpenAI-compatible endpoint —
POST http://localhost:8080/api/openai/chat/completionswithAuthorization: Bearer rtr_…, JSON body{"model":"auto","messages":[{"role":"user","content":"Hello!"}]}(add"stream": trueif you want SSE).
Done? You should see assistant text (streaming chunks or a single JSON payload) sourced from the worker that registered, not a login wall, because OSS auth is only those hashed keys.
The agent can do that?
1. Decide OSS vs Auxot Server before you split the team
If you also use hosted or self-hosted Auxot Server, paste into Admin Agent chat:
We're weighing the OSS auxot-router (no Postgres agents) vs staying entirely on Auxot Server for [edge POP | regulated lab | CI inference]. List decision bullets — data residency, RBAC, audit, and operator headcount — and say which workloads belong on each side if we run both.
Why it’s non-obvious: Hybrid is common (OSS at the edge, Server where humans need skills), but only if you align accountability. Admin Agent summarizes tradeoffs because you asked; architecture approval stays human.
2. Prove routing before you integrate SDKs
From any shell with curl:
curl -s http://localhost:8080/api/openai/chat/completions \
-H "Authorization: Bearer rtr_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Reply with the word pong."}]}'
Why it’s non-obvious: Clients blame “Auxot broke” when the failure is model: auto with zero workers. Curl isolates HTTP vs your app.
3. Inspect what workers could serve
On the worker host:
./auxot-worker models list
Why it’s non-obvious: The registry is huge: pinning AUXOT_MODEL or explicit model IDs beats guessing whether VRAM fits (GPU Workers quantization table).
Go deeper
Ports & defaults
OSS quickstarts often use 8080. Commercial Auxot Server commonly listens on 8420 in docs: don’t swap them blindly when copying curl from another tutorial.
CLI workers
The same auxot-worker binary can back Claude Code pass-through when router policy + env vars match (CLI Workers). Economics and OAuth differ from GPU paths.
Tools workers
auxot-tools handles sandboxed code/search/MCP-style backends: third binary beside router + worker.
OSS vs Connect a GPU worker (worker-cli)
| Piece | Auxot Server + worker-cli | OSS stack |
|---|---|---|
| Router | Auxot Server you already log into | auxot-router binary |
| Worker command | npx @auxot/worker-cli … --router-url …/ws | ./auxot-worker + AUXOT_GPU_KEY |
| Accounts | Org + agents | API keys only |
| Docs anchor | Providers UI | OSS Quickstart |
Pick one story per machine: mixing keys across products wastes weekends.
Troubleshooting
- 401 from router: wrong
rtr_token or missingAuthorization: Bearer. - Jobs hang / auto picks nothing: no healthy worker websocket; check worker logs and firewall egress.
- OOM / slow first boot: model download + VRAM; lower quantization or set
AUXOT_MODELto a smaller GGUF. - Need HA: embedded miniredis is single-node; plan external Redis for horizontal router scaling (Deployment).
Walkthrough
Step 1: Choose Docker or bare binary
Docker (fastest):
docker run -p 8080:8080 ghcr.io/auxothq/auxot-router:latest
Read stdout for generated AUXOT_ADMIN_KEY / AUXOT_API_KEY lines: copy into a secrets file.
Binary:
Follow Quickstart → Option 2: download auxot-router, ./auxot-router setup --write-env, source .env, ./auxot-router.
Step 2: Fetch auxot-worker
Download auxot-worker for your OS/arch from GitHub Releases, chmod +x.
Step 3: Connect GPU worker
export AUXOT_GPU_KEY=adm_xxxxxxxx # from router setup
export AUXOT_ROUTER_URL=127.0.0.1:8080
./auxot-worker
Wait for model cache warmup: logs should show registration with the router.
Step 4: HTTP smoke test
Use rtr_ key on the caller:
curl http://127.0.0.1:8080/api/openai/chat/completions \
-H "Authorization: Bearer rtr_xxxxxxxx" \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Say hi in five words."}]}'
Expect assistant content in choices[0].message.content.
Step 5 (optional): Anthropic Messages shape
Same router also serves POST /api/anthropic/v1/messages: identical knobs to Call the Anthropic-compatible Messages API, but pointed at whichever host/port runs auxot-router.
Step 6: Decide what ships next
If you need agents, Linked Accounts, MCP governance: stop enlarging shell scripts and plan Auxot Server (What is Auxot?). If you need ultra-light routing only: stay OSS and harden keys + Redis.
What’s next
- → Route Cursor and Claude Code through Auxot. Point editors at the same
auxot-routerorigin you just smoke-tested. - → Call the OpenAI-compatible Chat Completions API. Chat Completions dialect on the same host, pairs with Messages-shaped calls.
- → Run the tools worker (auxot-tools). Third binary for tool calls (fetch/search/code/MCP) once inference routing already works.
- → Connect a GPU worker. Commercial Server path with
npx @auxot/worker-cliwhen org agents should own the workload. - → Call the Anthropic-compatible Messages API. Same Messages JSON shape against whichever host exposes
/api/anthropic/v1. - → Self-host Auxot stage by stage. Full Auxot Server bring-up when OSS router simplicity stops matching org needs.
- → Open Source Router. Diagrams, binaries, links into GPU/CLI/tools docs.
Reference
- Binaries:
auxot-router,auxot-worker, andauxot-tools - Docs: OSS overview, Quickstart, GPU Workers, CLI Workers, Tools Workers, Deployment, OSS API notes
- Commercial product: What is Auxot?
- See also: Call the Anthropic-compatible Messages API, Route Cursor and Claude Code through Auxot, Call the OpenAI-compatible Chat Completions API, Self-host Auxot stage by stage, Run the tools worker (auxot-tools), Connect a cloud AI model (cloud fallback mental model on Server), Generate your first API key (API keys on Server vs hashed router keys: different animals)