Providers Overview – Auxot Docs

What Are Providers?

A provider in Auxot is any backend capable of running AI inference. Rather than treating local GPUs, CLI coding agents, and cloud APIs as fundamentally different systems, Auxot abstracts them behind a unified provider interface. Every provider registers with the router, reports its available models, and receives inference requests through the same protocol.

There are three provider types:

Type	Backend	Use Case
GPU	llama.cpp serving GGUF models	Low-latency, zero-cost-per-token local inference
CLI	Claude Code	Coding agents with tool use and agentic workflows
Cloud	OpenAI, Anthropic APIs	High availability, broadest model selection
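
As a rough illustration of what a unified provider interface could look like, the sketch below defines one abstract base class that a GPU, CLI, or cloud backend would each implement. The class and method names (`Provider`, `list_models`, `infer`, `StaticProvider`) and the model name `model-a` are assumptions made for this example; they are not Auxot's actual API.

```python
from abc import ABC, abstractmethod
from typing import List


class Provider(ABC):
    """Hypothetical common interface; names are illustrative, not Auxot's API."""

    kind: str  # "gpu", "cli", or "cloud"

    @abstractmethod
    def list_models(self) -> List[str]:
        """Report the models this provider can serve."""

    @abstractmethod
    def infer(self, model: str, prompt: str) -> str:
        """Run one inference request and return the generated text."""


class StaticProvider(Provider):
    """Stand-in backend that returns a canned reply, for demonstration only."""

    def __init__(self, kind: str, models: List[str]) -> None:
        self.kind = kind
        self._models = list(models)

    def list_models(self) -> List[str]:
        return list(self._models)

    def infer(self, model: str, prompt: str) -> str:
        if model not in self._models:
            raise ValueError(f"model {model!r} is not served by this provider")
        return f"[{self.kind}:{model}] {prompt}"
```

Because every backend exposes the same two methods, calling code can treat a local GPU worker and a cloud API identically and leave the choice of backend to the router.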

Auto-Routing Priority

When a request arrives and the caller specifies model: "auto" (or a model available on multiple provider types), Auxot routes using a priority cascade:

1. GPU workers are checked first. If a GPU worker is online, healthy, and serves the requested model (or a compatible one), the request goes there. This is the cheapest option because you already own the hardware.
2. CLI workers are checked second. Claude Code offers strong reasoning and tool use. CLI workers are typically provisioned on fixed-cost compute (EC2, ECS), so there is no additional per-token charge once the infrastructure is paid for.
3. Cloud APIs are the final fallback. They are always available (assuming valid API keys) but incur per-token charges.

This priority order minimizes cost automatically. You can override it per-agent or per-request by specifying an explicit model or provider ID.
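
The cascade and its override can be sketched in a few lines. This is a minimal model of the behavior described above, not Auxot's routing code: `ProviderRecord`, `pick_provider`, the `healthy` flag, and the provider IDs and model names in the usage example are all hypothetical, and "compatible model" matching is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Priority order from the text: GPU first, then CLI, then cloud.
PRIORITY = ("gpu", "cli", "cloud")


@dataclass(frozen=True)
class ProviderRecord:
    """Hypothetical view of a registered provider as the router might see it."""
    id: str
    kind: str                 # "gpu", "cli", or "cloud"
    models: FrozenSet[str]
    healthy: bool = True


def pick_provider(
    providers: Iterable[ProviderRecord],
    model: str = "auto",
    provider_id: Optional[str] = None,
) -> Optional[ProviderRecord]:
    """Return the provider that should serve the request, or None if none can."""
    # Keep only healthy providers that can serve the requested model.
    candidates = [
        p for p in providers
        if p.healthy and (model == "auto" or model in p.models)
    ]
    if provider_id is not None:
        # An explicit provider ID bypasses the priority cascade.
        return next((p for p in candidates if p.id == provider_id), None)
    for kind in PRIORITY:
        for p in candidates:
            if p.kind == kind:
                return p
    return None
```

With one unhealthy GPU worker, one CLI worker, and one cloud provider in the pool, `pick_provider(pool)` skips the GPU tier and returns the CLI worker, while passing `provider_id="cloud-1"` pins the request to the cloud provider regardless of priority.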

Provider Lifecycle

Registration

Providers register with the router on startup. GPU and CLI workers call a registration endpoint and report their capabilities (supported models, available VRAM, concurrency limits). Cloud providers are instead configured through the admin agent or the API, using the provider's API key.

Health Monitoring

Every provider type is monitored:

GPU/CLI workers send heartbeats at a configurable interval (default: every 30 seconds). If a worker misses heartbeats for longer than the dead threshold (default: 90 seconds), it’s marked offline and removed from the routing pool.
Cloud APIs are assumed available. Auxot tracks error rates and latency; if a cloud provider returns repeated 5xx errors, it’s temporarily deprioritized.
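
The heartbeat rule for GPU/CLI workers comes down to a timestamp comparison. The sketch below assumes the defaults quoted above (30-second interval, 90-second dead threshold, so roughly three missed heartbeats); `WorkerRegistry` and its methods are hypothetical names, and timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 30.0  # default interval from the text
DEAD_THRESHOLD_S = 90.0      # default dead threshold from the text


@dataclass
class WorkerRegistry:
    """Hypothetical tracker for the last heartbeat seen from each worker."""
    dead_threshold_s: float = DEAD_THRESHOLD_S
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, worker_id: str, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen[worker_id] = now

    def online_workers(self, now: float) -> List[str]:
        """Workers whose last heartbeat is within the dead threshold."""
        return sorted(
            w for w, t in self.last_seen.items()
            if now - t <= self.dead_threshold_s
        )

    def reap(self, now: float) -> List[str]:
        """Remove workers silent for longer than the threshold; return their IDs."""
        dead = [
            w for w, t in self.last_seen.items()
            if now - t > self.dead_threshold_s
        ]
        for w in dead:
            del self.last_seen[w]
        return sorted(dead)
```

A worker that last reported at t = 0 is still routable at t = 90 and is reaped at any later check, which matches "misses heartbeats for longer than the dead threshold".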

Failover

If a selected provider fails mid-request (a worker crashes or a cloud API returns an error), Auxot automatically retries the request on the next available provider in the priority chain. The failover is transparent to the caller, who receives a single response and does not need to handle the retry.
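
One way to picture the retry behavior is a loop over the priority chain that returns the first successful result. This is a simplified sketch under stated assumptions: `ProviderError` and `run_with_failover` are hypothetical names, each provider is modeled as a plain callable, and streaming or partially delivered responses are not handled.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider fails to serve a request."""


def run_with_failover(
    chain: Sequence[Tuple[str, Callable[[str], str]]],
    request: str,
) -> str:
    """Try each (name, call) pair in priority order; return the first success."""
    failed: List[str] = []
    for name, call in chain:
        try:
            return call(request)
        except ProviderError:
            # Remember the failure and fall through to the next provider.
            failed.append(name)
    raise ProviderError(f"all providers failed: {failed}")
```

The caller invokes `run_with_failover` once and gets back a single result; an error surfaces only if every provider in the chain fails.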

Deregistration

Workers deregister gracefully on shutdown. If a worker disappears without deregistering (crash, network loss), the heartbeat timeout handles cleanup automatically.

Viewing Providers

Go to Settings → Providers to see status, heartbeat timestamps, and routing configuration for all connected providers.

Or ask the admin agent:

Show me all providers and their status.