These docs cover Auxot Server — the commercial product with agents, skills, teams, and licensing. Looking for the open-source GPU router? Open Source Router →
What is Auxot?
Auxot is a self-hosted AI router that gives you a single, unified API for managing every kind of AI provider your organization uses — GPU workers running open-weight models, CLI-based agents like Claude Code, and cloud APIs from OpenAI and Anthropic. Instead of scattering API keys across teams, juggling provider-specific SDKs, and losing visibility into usage, you deploy Auxot once and route all AI traffic through it.
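Because the router exposes an OpenAI-compatible API, calling it looks like calling any OpenAI-style endpoint with the base URL pointed at your Auxot deployment. The sketch below builds such a request; the port, path, model name, and auth header are assumptions to adjust for your setup, not documented defaults.

```python
# Minimal sketch: building a chat completion request against Auxot's
# OpenAI-compatible endpoint. Base URL, model name, and credentials
# are placeholders -- adjust them for your deployment.
import json
import urllib.request

AUXOT_BASE = "http://localhost:8080/v1"  # assumed; check your server config

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Construct an OpenAI-style chat completion request for the router."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{AUXOT_BASE}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_AUXOT_KEY",  # placeholder key
        },
        method="POST",
    )

req = build_chat_request(
    "llama-3-8b",  # hypothetical model name
    [{"role": "user", "content": "Summarize our deploy runbook."}],
)
# urllib.request.urlopen(req) would send it -- requires a running Auxot server.
```

Existing OpenAI SDKs work the same way: point their base URL at the router and keep the rest of your code unchanged.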
Key Concepts
Auxot is built around a small set of composable primitives:
- **Providers** — The compute backends that actually run inference. Auxot supports three provider types: GPU workers (GGUF models via llama.cpp), CLI workers (Claude Code), and cloud APIs (OpenAI, Anthropic). Providers are auto-prioritized: requests route to GPU first, then CLI, then cloud — minimizing cost while maximizing availability.
- **Agents** — AI agents with distinct personalities, skills, and tool access. Every Auxot installation ships with an admin agent that manages the system itself. You create additional agents for domain-specific tasks: a code reviewer, a support agent, a data analyst.
- **Skills** — Reusable behavior instructions that you attach to agents. Skills define how an agent behaves — its rules, workflows, and constraints. They’re scoped to the organization, team, or individual user level and injected into the system prompt at inference time.
- **Tool Worker Policies** — Agents can call external tools through connected tool workers. Policies define which built-in tools (web search, code execution) are enabled and which MCP packages to run — including how credentials are injected (static values, org/team/user secrets, or per-user OAuth tokens). Tool schemas are auto-discovered from connected workers and injected into the conversation context.
- **Context Files** — Persistent knowledge documents (org policies, team runbooks, product specs) that get injected into the system prompt so agents always have the right background information.
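To make the injection model concrete, here is an illustrative sketch (not Auxot's actual code) of how scoped skills and context files could be layered into a system prompt at inference time. The `Skill` type, scope ordering, and section headers are assumptions for illustration.

```python
# Illustrative sketch of system-prompt assembly: skills are applied
# broadest-to-narrowest (org, then team, then user), followed by
# context files. Names and formats here are hypothetical.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    scope: str  # "org", "team", or "user"
    instructions: str

SCOPE_ORDER = {"org": 0, "team": 1, "user": 2}  # broadest scope first

def build_system_prompt(base: str, skills: list, context_files: list) -> str:
    parts = [base]
    # Narrower scopes come last so user-level rules can refine org-wide ones.
    for s in sorted(skills, key=lambda s: SCOPE_ORDER[s.scope]):
        parts.append(f"## Skill: {s.name}\n{s.instructions}")
    for doc in context_files:
        parts.append(f"## Context\n{doc}")
    return "\n\n".join(parts)

prompt = build_system_prompt(
    "You are the support agent.",
    [
        Skill("escalation", "user", "Escalate billing issues to a human."),
        Skill("tone", "org", "Always be concise and polite."),
    ],
    ["Refund policy: refunds within 30 days."],
)
```

The ordering choice matters: putting user-scoped skills after org-scoped ones lets the narrowest instructions take precedence in the final prompt.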
Architecture Overview
Auxot runs as a single Go binary (auxot-server) backed by Postgres for persistent state and Redis for pub/sub and ephemeral caching. Workers — whether GPU or CLI — run separately and connect outbound to the router over HTTPS, so you never need to open inbound ports on your inference machines.
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  GPU Worker  │     │  CLI Worker  │     │  Cloud API   │
│ (llama.cpp)  │     │ (Claude Code)│     │  (OpenAI)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └──────────┬─────────┴────────────────────┘
                  │
           ┌──────▼───────┐
           │ auxot-server │
           │  (Go binary) │
           └──────┬───────┘
                  │
          ┌───────┴───────┐
          │               │
     ┌────▼─────┐    ┌────▼────┐
     │ Postgres │    │  Redis  │
     └──────────┘    └─────────┘
```
Requests arrive via an OpenAI-compatible or Anthropic-compatible API. The router selects a provider based on availability, priority, and model requirements, then streams the response back to the caller.
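The selection step can be sketched as follows. Field names, priority values, and the `Provider` shape are illustrative, not Auxot's actual schema; the logic mirrors the documented policy: try GPU first, then CLI, then cloud, skipping providers that are down or don't serve the requested model.

```python
# Sketch of the routing policy described above: filter providers by
# availability and model support, then pick the highest-priority kind.
# All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

PRIORITY = {"gpu": 0, "cli": 1, "cloud": 2}  # lower value = tried first

@dataclass
class Provider:
    name: str
    kind: str            # "gpu", "cli", or "cloud"
    available: bool
    models: set = field(default_factory=set)

def select_provider(providers: list, model: str) -> Optional[Provider]:
    """Return the best available provider for `model`, or None."""
    candidates = [
        p for p in providers
        if p.available and model in p.models
    ]
    candidates.sort(key=lambda p: PRIORITY[p.kind])
    return candidates[0] if candidates else None

providers = [
    Provider("openai-prod", "cloud", True, {"gpt-4o", "llama-3-8b"}),
    Provider("local-4090", "gpu", True, {"llama-3-8b"}),
    Provider("claude-cli", "cli", False, {"llama-3-8b"}),
]
```

With these providers, a request for `llama-3-8b` lands on the GPU worker; if that worker went offline, the same request would fall through to the cloud provider.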
Who is Auxot For?
- Engineering teams that want to give every developer access to AI without managing a dozen API keys and providers individually.
- Enterprises that need self-hosted AI inference for compliance, data residency, or air-gapped environments.
- Platform engineers building internal AI tooling who want a stable abstraction over rapidly changing model providers.
- Individual developers who run local GPU inference and want a clean API in front of it, with automatic fallback to cloud when their GPU is busy.