These docs cover Auxot Server — the commercial product with agents, skills, teams, and licensing. Looking for the open-source GPU router? Open Source Router →

What is Auxot?

Auxot is a self-hosted AI router that gives you a single, unified API for managing every kind of AI provider your organization uses — GPU workers running open-weight models, CLI-based agents like Claude Code, and cloud APIs from OpenAI and Anthropic. Instead of scattering API keys across teams, juggling provider-specific SDKs, and losing visibility into usage, you deploy Auxot once and route all AI traffic through it.

Key Concepts

Auxot is built around a small set of composable primitives:

  • Providers — The compute backends that actually run inference. Auxot supports three provider types: GPU workers (GGUF models via llama.cpp), CLI workers (Claude Code), and cloud APIs (OpenAI, Anthropic). Providers are auto-prioritized: requests route to GPU first, then CLI, then cloud — minimizing cost while maximizing availability.

  • Agents — AI agents with distinct personalities, skills, and tool access. Every Auxot installation ships with an admin agent that manages the system itself. You create additional agents for domain-specific tasks: a code reviewer, a support agent, a data analyst.

  • Skills — Reusable behavior instructions that you attach to agents. Skills define how an agent behaves — its rules, workflows, and constraints. They’re scoped to the organization, team, or individual user level and injected into the system prompt at inference time.

  • Tool Worker Policies — Agents can call external tools through connected tool workers. Policies define which built-in tools (web search, code execution) are enabled and which MCP packages to run — including how credentials are injected (static values, org/team/user secrets, or per-user OAuth tokens). Tool schemas are auto-discovered from connected workers and injected into the conversation context.

  • Context Files — Persistent knowledge documents (org policies, team runbooks, product specs) that get injected into the system prompt so agents always have the right background information.

Architecture Overview

Auxot runs as a single Go binary (auxot-server) backed by Postgres for persistent state and Redis for pub/sub and ephemeral caching. Workers — whether GPU or CLI — run separately and connect outbound to the router over HTTPS, so you never need to open inbound ports on your inference machines.
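
A deployment with these three components might look like the following Docker Compose sketch. Everything here — the image name, environment variable names, and port mapping — is a hypothetical illustration of the topology, not taken from Auxot's installation reference:

```yaml
# Hypothetical sketch: image and variable names are assumptions.
services:
  auxot-server:
    image: auxot/auxot-server:latest        # assumed image name
    ports:
      - "443:8443"                          # API exposed over HTTPS
    environment:
      DATABASE_URL: postgres://auxot:secret@postgres:5432/auxot  # assumed
      REDIS_URL: redis://redis:6379                              # assumed
    depends_on: [postgres, redis]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: auxot
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: auxot
  redis:
    image: redis:7
```

Note that no worker services appear here: because workers connect outbound to the router over HTTPS, they can run anywhere with network access to the server and need no inbound ports of their own.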

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  GPU Worker  │     │  CLI Worker  │     │  Cloud API   │
│  (llama.cpp) │     │ (Claude Code)│     │  (OpenAI)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
                            │
                    ┌───────▼──────┐
                    │ auxot-server │
                    │  (Go binary) │
                    └───────┬──────┘
                            │
                  ┌─────────┴─────────┐
                  │                   │
            ┌─────▼─────┐       ┌─────▼─────┐
            │ Postgres  │       │   Redis   │
            └───────────┘       └───────────┘

Requests arrive via an OpenAI-compatible or Anthropic-compatible API. The router selects a provider based on availability, priority, and model requirements, then streams the response back to the caller.

Who is Auxot For?

  • Engineering teams that want to give every developer access to AI without managing a dozen API keys and providers individually.
  • Enterprises that need self-hosted AI inference for compliance, data residency, or air-gapped environments.
  • Platform engineers building internal AI tooling who want a stable abstraction over rapidly changing model providers.
  • Individual developers who run local GPU inference and want a clean API in front of it, with automatic fallback to cloud when their GPU is busy.