Architecture Overview

How Auxot Works

A unified AI platform that routes requests across your GPUs, CLI tools, and cloud APIs — governed by policy, managed by agents, extended with skills.

Platform Architecture

One control plane. Any mix of providers — GPUs, CLI workers, cloud APIs. Every request governed.

[Architecture diagram] Callers (the Chat web UI, coding agents such as Claude Code, internal APIs via REST and webhooks, and custom domain-specific agents) reach the Auxot Server over HTTPS, WSS, or the CLI. The server comprises the Admin Agent (setup and management via conversation), the Unified Provider Router (GPU → CLI → Cloud priority routing), Skills & MCP tools (reusable behaviors, external integrations), Policy & Governance (team roles, model approvals, rate limits), and Audit & Observability (usage tracking, cost allocation, logs). Behind it sit the providers: GPU workers (on-prem, cloud, edge; priority 1), CLI workers (Claude Code; priority 2), and cloud APIs (OpenAI, Anthropic; priority 3). Requests route through the priority cascade: GPU → CLI → Cloud.

[Request-flow diagram] An incoming request is authenticated and policy-checked, then handed to the Unified Provider Router, which tries GPU workers (lowest latency, priority 1), CLI workers (agentic tools, priority 2), and cloud APIs (elastic fallback, priority 3) in order. Mixed-mode support lets all three run simultaneously: GPU inference for speed, CLI for agents, cloud for overflow.

Unified Provider Routing

Auxot routes every request through a priority cascade — trying your fastest, cheapest providers first and falling back automatically.

  • GPU first — On-prem and cloud GPUs get priority for lowest latency and cost
  • CLI fallback — Claude Code workers handle agentic tool-use workloads
  • Cloud overflow — OpenAI and Anthropic APIs provide elastic capacity when local providers are saturated
  • Mixed mode — Run all three provider types simultaneously for different workloads
  • Automatic failover — If a provider goes down, traffic reroutes instantly
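The cascade above can be sketched as a small routing function. This is an illustrative sketch, not Auxot's actual implementation; the provider names, and the idea that health and saturation are simple boolean flags, are assumptions for the example.

```python
# Illustrative sketch of priority-cascade routing with automatic failover.
# Provider names and the health/saturation flags are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Provider:
    name: str
    priority: int          # 1 = GPU, 2 = CLI, 3 = Cloud
    healthy: bool = True
    saturated: bool = False

def route(providers: list[Provider]) -> Optional[Provider]:
    """Try providers in priority order; skip any that are down or saturated."""
    for p in sorted(providers, key=lambda p: p.priority):
        if p.healthy and not p.saturated:
            return p
    return None  # no capacity anywhere: caller sees an overload error

providers = [
    Provider("gpu-rack", priority=1, saturated=True),        # GPUs full
    Provider("claude-code-cli", priority=2, healthy=False),  # CLI worker down
    Provider("anthropic-api", priority=3),                   # elastic fallback
]

print(route(providers).name)  # falls through to the cloud tier
```

With the GPU tier saturated and the CLI worker unhealthy, the request overflows to the cloud tier, mirroring the failover behavior described above.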

Admin Agent Setup

Configure your entire platform through a guided conversation with the Admin Agent — no YAML, no dashboards.

1. Connect Providers: "Add our 4×A100 server at gpu-rack.internal and set up Anthropic as our cloud fallback."

2. Define Teams & Policies: "Create an Engineering team with interactive priority and 50 req/min. Data Science gets batch priority."

3. Configure Routing: "Route coding requests to GPU first, then Claude Code CLI. Use Anthropic API only when GPUs are full."

4. Deploy Agents: "Create a code-review agent with the review skill and connect it to our GitHub MCP server."

5. Go Live: the Admin Agent validates the configuration, generates API keys, and hands the system to your teams.

Agents, Skills & MCP

Auxot's agent architecture separates personality from capability. Build domain-specific agents by composing reusable skills and connecting external tools.

  • Agent = personality + skills + tools — Each agent is a named entity with a defined role, a set of attached skills, and MCP tool connections
  • Skills = reusable behavior — Skills are composable instruction sets that can be shared across Agents. Write once, attach to many agents.
  • MCP = external tools — Model Context Protocol connects Agents to GitHub, Jira, Slack, databases, or any system with an MCP server
  • Admin Agent — The built-in Agent that manages your platform through conversation
  • Custom Agents — Build domain-specific agents for code review, ops triage, data analysis, or any workflow your team needs
[Example agent diagram] Agent: Code Reviewer. Personality: thorough, security-aware. Model preference: Claude 4, GPU-first; routes through the Unified Provider Router. Attached skills: Code Review (PR analysis logic), Security Scan (vulnerability detection), Style Guide (team conventions). MCP tool connections: GitHub (PRs, issues, files), Slack (notifications), Jira (ticket tracking).
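The "personality + skills + tools" composition can be sketched as a data structure. A minimal sketch, assuming nothing about Auxot's real schema; the skill names, MCP server names, and model preference strings mirror the Code Reviewer example and are illustrative only.

```python
# Sketch of agent composition: personality + skills + MCP tools.
# Field names and values are hypothetical, not Auxot's actual schema.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str  # reusable instruction set, shareable across agents

@dataclass
class Agent:
    name: str
    personality: str
    model_pref: str
    skills: list[Skill] = field(default_factory=list)
    mcp_tools: list[str] = field(default_factory=list)

security_scan = Skill("security-scan", "Flag likely vulnerabilities in diffs.")

reviewer = Agent(
    name="code-reviewer",
    personality="Thorough, security-aware",
    model_pref="Claude 4, GPU-first",
    skills=[Skill("code-review", "Analyze PRs for correctness."), security_scan],
    mcp_tools=["github", "slack", "jira"],
)

# The same Skill object attaches to many agents: write once, reuse.
triage = Agent("ops-triage", "Calm, incident-focused", "GPU-first",
               skills=[security_scan], mcp_tools=["slack"])

print(reviewer.skills[1] is triage.skills[0])  # shared skill instance
```

The point of the shared `security_scan` object is the "write once, attach to many agents" property from the list above: skills live independently of any one agent.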
[Example governance diagram] Acme Corp teams: Engineering (interactive priority; GPU + CLI + Cloud; limit 50 req/min), Data Science (normal priority; GPU + Cloud only; limit 20 req/min), QA & Automation (batch priority; cloud fallback only; limit 10 req/min). The Policy Engine enforces all rules at request time; the Provider Pool allocates GPU, CLI, and cloud capacity by policy.

Team Boundaries & Governance

AI access is defined by leadership — not by whoever grabs a key first. Teams, roles, and service classes determine who can use AI, which providers they access, and how much capacity they consume.

  • Organization & team structure — Mirror your org chart in Auxot
  • Role-based access — Admins, members, and service accounts with distinct permissions
  • Provider policies — Control which teams can use GPU, CLI, or cloud providers
  • Usage limits — Per-team rate limits and concurrency caps enforced at runtime
  • Audit trail — Every request logged with team, user, provider, duration, and token usage
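Runtime enforcement of per-team rate limits might look like the sketch below. The team names and caps come from the Acme Corp example; the sliding-window limiter itself is an assumption about how enforcement could work, not Auxot's documented mechanism.

```python
# Hypothetical sliding-window rate limiter enforcing per-team req/min caps.
# Limits match the Acme Corp example; the mechanism is an assumption.
import time
from collections import defaultdict, deque
from typing import Optional

LIMITS = {"engineering": 50, "data-science": 20, "qa": 10}  # requests per minute
_windows: dict[str, deque] = defaultdict(deque)

def allow(team: str, now: Optional[float] = None) -> bool:
    """Admit the request only if the team is under its per-minute cap."""
    now = time.monotonic() if now is None else now
    window = _windows[team]
    while window and now - window[0] >= 60.0:  # evict entries older than 1 min
        window.popleft()
    if len(window) >= LIMITS[team]:
        return False  # over the cap: rejected at request time
    window.append(now)
    return True

# QA is capped at 10 req/min: the 11th request in the same minute is denied.
results = [allow("qa", now=float(i)) for i in range(11)]
print(results.count(True), results[-1])
```

Each decision is made per request against current usage, which is what "enforced at runtime" means in the list above: limits are not advisory quotas reconciled later.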

Standard API Access

Auxot exposes OpenAI-compatible and Anthropic-compatible endpoints. Any tool that speaks these protocols connects without code changes.

  • OpenAI-compatible — /api/openai/v1/chat/completions
  • Anthropic-compatible — /api/anthropic/v1/messages
  • Streaming — Server-Sent Events matching upstream API format exactly
  • Tool calls — Full function calling for agentic workflows

Your developers point Cursor, Claude Code, or any custom integration at Auxot. They use their familiar tools. Leadership controls what runs underneath.
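Because the endpoints are OpenAI-compatible, pointing an existing client at Auxot is mostly a base-URL change. The sketch below builds such a request with the standard library; the host name, API key, and model name are placeholders, and the request is constructed but not sent since no live server is assumed.

```python
# Sketch of an OpenAI-protocol request aimed at an Auxot gateway.
# BASE_URL, API_KEY, and the model name are placeholders, not real values.
import json
import urllib.request

BASE_URL = "https://auxot.example.internal/api/openai/v1"
API_KEY = "auxot-team-key"   # team-scoped key issued by the Admin Agent

payload = {
    "model": "gpt-4o",       # placeholder; Auxot routes it through the cascade
    "messages": [{"role": "user", "content": "Summarize this PR."}],
    "stream": True,          # Server-Sent Events, matching upstream format
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # OpenAI-style bearer auth
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; omitted here (no live server).
print(req.full_url)
```

An Anthropic-protocol client would instead target /api/anthropic/v1/messages with an x-api-key header, per the list above; in both cases policy is enforced at the gateway before any provider is touched.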

[API gateway diagram] Cursor, Claude Code, and custom apps connect to the Auxot API Gateway, where policy is enforced at entry, using either the OpenAI protocol (Bearer token auth) or the Anthropic protocol (x-api-key header).

Your providers. Your agents. Your rules.

Define how AI runs inside your organization.

Built for organizations where AI governance is a leadership responsibility.