These docs cover Auxot Server — the commercial product with agents, skills, teams, and licensing. Looking for the open-source GPU router? Open Source Router →
What is Auxot?
Auxot is a self-hosted AI router that gives you a single, unified API for managing every kind of AI provider your organization uses — GPU workers running open-weight models, CLI-based agents like Claude Code, and cloud APIs from OpenAI and Anthropic. Instead of scattering API keys across teams, juggling provider-specific SDKs, and losing visibility into usage, you deploy Auxot once and route all AI traffic through it.
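Because the router exposes an OpenAI-compatible API, calling it looks like calling any OpenAI-style endpoint with the base URL pointed at your Auxot deployment. The sketch below builds such a request; the port, path, model name, and auth header are assumptions to adjust for your setup, not documented defaults.

```python
# Minimal sketch: building a chat completion request against Auxot's
# OpenAI-compatible endpoint. Base URL, model name, and credentials
# are placeholders -- adjust them for your deployment.
import json
import urllib.request

AUXOT_BASE = "http://localhost:8080/v1"  # assumed; check your server config

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Construct an OpenAI-style chat completion request for the router."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{AUXOT_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_AUXOT_KEY",  # placeholder key
        },
        method="POST",
    )

req = build_chat_request(
    "llama-3-8b",  # hypothetical model name
    [{"role": "user", "content": "Summarize our deploy runbook."}],
)
# urllib.request.urlopen(req) would send it -- requires a running Auxot server.
```

Existing OpenAI SDKs work the same way: point their base URL at the router and keep the rest of your code unchanged.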
Key Concepts
Auxot is built around a small set of composable primitives:
- **Providers** — The compute backends that actually run inference. Auxot supports three provider types: GPU workers (GGUF models via llama.cpp), CLI workers (Claude Code), and cloud APIs (OpenAI, Anthropic). Providers are auto-prioritized: requests route to GPU first, then CLI, then cloud — minimizing cost while maximizing availability.
- **Agents** — AI agents with distinct personalities, skills, and tool access. Every Auxot installation ships with an admin agent that manages the system itself. You create additional agents for domain-specific tasks: a code reviewer, a support agent, a data analyst.
- **Skills** — Reusable behavior instructions that you attach to agents. Skills define how an agent behaves — its rules, workflows, and constraints. They’re scoped to the organization, team, or individual user level and injected into the system prompt at inference time.
- **Tool Worker Policies** — Agents can call external tools through connected tool workers. Policies define which built-in tools (web search, code execution) are enabled and which MCP packages to run — including how credentials are injected (static values, org/team/user secrets, or per-user OAuth tokens). Tool schemas are auto-discovered from connected workers and injected into the conversation context.
- **Context Files** — Persistent knowledge documents (org policies, team runbooks, product specs) that get injected into the system prompt so agents always have the right background information.
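To make the injection model concrete, here is an illustrative sketch (not Auxot's actual code) of how scoped skills and context files could be layered into a system prompt at inference time. The `Skill` type, scope ordering, and section headers are assumptions for illustration.

```python
# Illustrative sketch of system-prompt assembly: skills are applied
# broadest-to-narrowest (org, then team, then user), followed by
# context files. Names and formats here are hypothetical.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    scope: str  # "org", "team", or "user"
    instructions: str

SCOPE_ORDER = {"org": 0, "team": 1, "user": 2}  # broadest scope first

def build_system_prompt(base: str, skills: list, context_files: list) -> str:
    parts = [base]
    # Narrower scopes come last so user-level rules can refine org-wide ones.
    for s in sorted(skills, key=lambda s: SCOPE_ORDER[s.scope]):
        parts.append(f"## Skill: {s.name}\n{s.instructions}")
    for doc in context_files:
        parts.append(f"## Context\n{doc}")
    return "\n\n".join(parts)

prompt = build_system_prompt(
    "You are the support agent.",
    [
        Skill("escalation", "user", "Escalate billing issues to a human."),
        Skill("tone", "org", "Always be concise and polite."),
    ],
    ["Refund policy: refunds within 30 days."],
)
```

The ordering choice matters: putting user-scoped skills after org-scoped ones lets the narrowest instructions take precedence in the final prompt.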
Architecture Overview
Auxot runs as a single Go binary (auxot-server) backed by Postgres for persistent state and Redis for pub/sub and ephemeral caching. Workers — whether GPU or CLI — run separately and connect outbound to the router over HTTPS, so you never need to open inbound ports on your inference machines.
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  GPU Worker  │     │  CLI Worker  │     │  Cloud API   │
│ (llama.cpp)  │     │ (Claude Code)│     │  (OpenAI)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └──────────┬─────────┴────────────────────┘
                  │
           ┌──────▼───────┐
           │ auxot-server │
           │  (Go binary) │
           └──────┬───────┘
                  │
          ┌───────┴───────┐
          │               │
     ┌────▼─────┐    ┌────▼────┐
     │ Postgres │    │  Redis  │
     └──────────┘    └─────────┘
```
Requests arrive via an OpenAI-compatible or Anthropic-compatible API. The router selects a provider based on availability, priority, and model requirements, then streams the response back to the caller.
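The selection step can be sketched as follows. Field names, priority values, and the `Provider` shape are illustrative, not Auxot's actual schema; the logic mirrors the documented policy: try GPU first, then CLI, then cloud, skipping providers that are down or don't serve the requested model.

```python
# Sketch of the routing policy described above: filter providers by
# availability and model support, then pick the highest-priority kind.
# All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

PRIORITY = {"gpu": 0, "cli": 1, "cloud": 2}  # lower value = tried first

@dataclass
class Provider:
    name: str
    kind: str            # "gpu", "cli", or "cloud"
    available: bool
    models: set = field(default_factory=set)

def select_provider(providers: list, model: str) -> Optional[Provider]:
    """Return the best available provider for `model`, or None."""
    candidates = [
        p for p in providers
        if p.available and model in p.models
    ]
    candidates.sort(key=lambda p: PRIORITY[p.kind])
    return candidates[0] if candidates else None

providers = [
    Provider("openai-prod", "cloud", True, {"gpt-4o", "llama-3-8b"}),
    Provider("local-4090", "gpu", True, {"llama-3-8b"}),
    Provider("claude-cli", "cli", False, {"llama-3-8b"}),
]
```

With these providers, a request for `llama-3-8b` lands on the GPU worker; if that worker went offline, the same request would fall through to the cloud provider.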
Who is Auxot For?
- Engineering teams that want to give every developer access to AI without managing a dozen API keys and providers individually.
- Enterprises that need self-hosted AI inference for compliance, data residency, or air-gapped environments.
- Platform engineers building internal AI tooling who want a stable abstraction over rapidly changing model providers.
- Individual developers who run local GPU inference and want a clean API in front of it, with automatic fallback to cloud when their GPU is busy.