What Are Providers?
A provider in Auxot is any backend capable of running AI inference. Rather than treating local GPUs, CLI coding agents, and cloud APIs as fundamentally different systems, Auxot abstracts them behind a unified provider interface. Every provider registers with the router, reports its available models, and receives inference requests through the same protocol.
There are three provider types:
| Type | Backend | Use Case |
|---|---|---|
| GPU | llama.cpp serving GGUF models | Low-latency, zero-cost-per-token local inference |
| CLI | Claude Code | Coding agents with tool use and agentic workflows |
| Cloud | OpenAI, Anthropic APIs | High availability, broadest model selection |
Auto-Routing Priority
When a request arrives and the caller specifies model: "auto" (or a model available on multiple provider types), Auxot routes using a priority cascade:
- GPU workers are checked first. If a GPU worker is online, healthy, and serves the requested model (or a compatible one), the request goes there. This is the cheapest option, since you already own the hardware.
- CLI workers are checked second. Claude Code offers strong reasoning and tool use. CLI workers are typically provisioned on fixed-cost compute (EC2, ECS), so there is no per-token cost beyond the infrastructure itself.
- Cloud APIs are the final fallback. They are always available (assuming valid API keys) but incur per-token charges.
This priority order minimizes cost automatically. You can override it per agent or per request by specifying an explicit model or provider ID.
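The cascade can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not Auxot's router code: the `Provider` class, the `pick_provider` function, and the model names are all hypothetical, and the "compatible model" matching mentioned above is left out for brevity.

```python
from dataclasses import dataclass
from enum import IntEnum


class ProviderKind(IntEnum):
    # Lower value = checked earlier in the cascade.
    GPU = 0
    CLI = 1
    CLOUD = 2


@dataclass
class Provider:
    provider_id: str          # hypothetical identifier
    kind: ProviderKind
    models: frozenset         # model names this provider reports
    healthy: bool = True


def pick_provider(providers, model="auto", provider_id=None):
    """Return the healthy provider that comes first in GPU, CLI, Cloud order."""
    candidates = [p for p in providers if p.healthy]
    if provider_id is not None:
        # An explicit provider ID overrides the cascade.
        candidates = [p for p in candidates if p.provider_id == provider_id]
    if model != "auto":
        # An explicit model restricts routing to providers that serve it.
        candidates = [p for p in candidates if model in p.models]
    # min() returns None (the default) when no provider qualifies.
    return min(candidates, key=lambda p: p.kind, default=None)
```

With one provider of each type registered, `pick_provider(providers)` selects the GPU worker; marking it unhealthy shifts the choice to the CLI worker, and requesting a model that only the cloud provider serves routes to the cloud.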
Provider Lifecycle
Registration
Providers register with the router on startup. GPU and CLI workers call a registration endpoint with their capabilities (supported models, available VRAM, concurrency limits). Cloud providers are configured through the admin agent or API with an API key.
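As a rough illustration of what a worker might send when it registers, the sketch below builds a JSON body carrying the capabilities listed above. The actual registration schema and endpoint are not shown in this section, so every field name here (`worker_id`, `provider_type`, `vram_mb`, `max_concurrency`) is an assumption made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class WorkerRegistration:
    # All field names are hypothetical; they mirror the capabilities
    # named in the text, not a documented Auxot schema.
    worker_id: str
    provider_type: str              # "gpu" or "cli"
    models: List[str]               # supported models
    vram_mb: Optional[int] = None   # available VRAM; None for CLI workers
    max_concurrency: int = 1        # concurrency limit


def registration_body(reg: WorkerRegistration) -> str:
    """Serialize the registration as a JSON string for an HTTP request body."""
    return json.dumps(asdict(reg), sort_keys=True)
```

A GPU worker would fill in `vram_mb`, while a CLI worker would leave it unset and report only its models and concurrency limit.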
Health Monitoring
Every provider type is monitored:
- GPU/CLI workers send heartbeats at a configurable interval (default: every 30 seconds). If a worker misses heartbeats for longer than the dead threshold (default: 90 seconds), it’s marked offline and removed from the routing pool.
- Cloud APIs are assumed available. Auxot tracks error rates and latency; if a cloud provider returns repeated 5xx errors, it’s temporarily deprioritized.
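The heartbeat rule for GPU/CLI workers can be modeled as a small tracker. This is a minimal sketch assuming the default values quoted above (30-second interval, 90-second dead threshold); the `HeartbeatTracker` class and its method names are hypothetical, and the cloud error-rate tracking is not modeled.

```python
import time

HEARTBEAT_INTERVAL_S = 30   # default send interval, from the text
DEAD_THRESHOLD_S = 90       # default dead threshold, from the text


class HeartbeatTracker:
    def __init__(self, dead_threshold_s=DEAD_THRESHOLD_S, clock=time.monotonic):
        self._dead_threshold_s = dead_threshold_s
        self._clock = clock          # injectable so the logic is testable
        self._last_seen = {}         # worker_id -> time of last heartbeat

    def beat(self, worker_id):
        """Record a heartbeat from a worker."""
        self._last_seen[worker_id] = self._clock()

    def online_workers(self):
        """Workers whose last heartbeat is within the dead threshold."""
        now = self._clock()
        return [w for w, t in self._last_seen.items()
                if now - t <= self._dead_threshold_s]

    def reap(self):
        """Remove workers past the dead threshold and return their IDs."""
        now = self._clock()
        dead = [w for w, t in self._last_seen.items()
                if now - t > self._dead_threshold_s]
        for worker_id in dead:
            del self._last_seen[worker_id]
        return dead
```

With the defaults, a worker has to miss roughly three consecutive heartbeats before it is dropped, which tolerates a single delayed or lost heartbeat without removing a healthy worker from the pool.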
Failover
If a selected provider fails mid-request (for example, a worker crashes or a cloud API returns an error), Auxot automatically retries the request on the next available provider in the priority chain. The failover is invisible to the caller, who receives a single response.
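The retry behavior amounts to walking the priority chain until one provider succeeds. The sketch below assumes each provider is a plain callable that raises a hypothetical `ProviderError` on failure; it does not cover streaming responses or partial output, which a real router would also need to handle.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider fails a request."""


def infer_with_failover(chain, request):
    """Try each (name, call) pair in priority order; return the first success.

    `chain` is an ordered list such as [("gpu", fn), ("cli", fn), ("cloud", fn)].
    """
    failed = []
    for name, call in chain:
        try:
            return call(request)
        except ProviderError:
            # Remember which provider failed and move on to the next one.
            failed.append(name)
    raise ProviderError(f"all providers failed: {failed}")
```

The caller only sees the return value of whichever provider succeeded; an error surfaces only when every provider in the chain has failed.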
Deregistration
Workers deregister gracefully on shutdown. If a worker disappears without deregistering (crash, network loss), the heartbeat timeout handles cleanup automatically.
Viewing Providers
Go to Settings → Providers to see status, heartbeat timestamps, and routing configuration for all connected providers.
Or ask the admin agent:
Show me all providers and their status.