Read provider routing like an operator
Learn Auxot's GPU → CLI → cloud cascade, heartbeat timeouts, and invisible failover — so latency spikes, surprise cloud bills, and gray worker dots tell a story instead of feeling random.
Plus: three Admin-Agent briefings — narrate a symptom against the cascade, draft an on-call checklist from pasted provider rows, and decide when pinning beats leaving model on auto.
| Audience | Admins · Developers |
|---|---|
| Time | ~8 min |
| Prerequisites | Org visibility into **Settings → Providers** and **System Health** ([Take Auxot's pulse in 10 seconds](/tutorials/take-auxots-pulse)). At least one **cloud** provider connected ([Connect a cloud AI model](/tutorials/connect-a-cloud-ai-model)) **or** intent to connect GPU/CLI soon ([Connect a GPU worker](/tutorials/connect-a-gpu-worker), [Connect a CLI provider (Claude Code, Cursor, Codex)](/tutorials/connect-a-cli-provider)). When jobs already failed once, [Trace a failing job end to end](/tutorials/trace-a-failing-job-end-to-end) shows where receipts live — this lesson explains **why** traffic chose a tier. |
| You'll end up with | A mental model of **auto** routing order, worker heartbeats vs cloud assumptions, and failover — plus three glanceable checks you run before rewriting prompts or filing vendor tickets. |
When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.
Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.
Why this matters
Every completion passes through routing. Auxot doesn’t pick providers at random. It follows a priority cascade when you leave model broad (Providers overview):
- GPU: your silicon, no per-token bill from a hyperscaler.
- CLI: Claude Code-style workers on infra you already pay for.
- Cloud: OpenAI/Anthropic APIs, broad catalogs, metered spend.
Workers register and heartbeat; clouds are treated as usually up until error patterns say otherwise. Mid-request failures retry on the next eligible provider, and callers often see success while Jobs and Providers still record which provider actually answered.
Auxot doesn’t reroute on its own between requests. Tiers change when provider health, model availability, or your explicit pins change.
Quick start
- Open Settings → Providers: scan GPU / CLI / cloud rows for status, last heartbeat, and models advertised.
- Recall the cascade: for
auto(or models that exist on multiple tiers), order is GPU → CLI → cloud. - Check worker freshness: GPU/CLI rows depend on heartbeats (defaults are often ~30s interval, ~90s dead threshold; your deployment may vary).
- Watch cloud as fallback: if workers drop out, traffic falls back to cloud, and bills can jump without anyone changing a dropdown.
- Cross-check a job: open Audit Logs → Jobs (Trace a failing job end to end). The model + provider column shows who actually answered.
- Pick auto vs pin: pin model or provider when you need determinism (Pick the right model for the job). Stay
autowhen you want cost-aware failover.
Done? You can explain why Tuesday’s traffic looked cloud-heavy and whether that’s expected, before blaming the agent’s quality.
The agent can do that?
1. Narrate a symptom against the cascade
Chat → Admin Agent:
Symptom: [latency spike / cloud bill jump / users seeing different model quality hour-to-hour]. Providers we run: [GPU yes/no, CLI yes/no, clouds]. Recent worker maintenance: [describe]. Explain how Auxot's routing cascade could produce this — ranked hypotheses.
Why it’s non-obvious: Same user-visible slowness could be heartbeat dropout and failover, cloud rate limits, or model pins. Paste facts you collected first.
2. Turn pasted provider statuses into an on-call checklist
Here's what I see under Settings → Providers (paste names + statuses + last heartbeat ages). Give me a five-step operator checklist I can reuse weekly — green vs yellow vs red signals only.
Why it’s non-obvious: Reading the dashboard beats guessing. The checklist stays yours to run, and the Admin Agent compresses what you paste because you asked.
3. Auto vs explicit routing trade-off
Agent "[name]" handles [task]. Should default stay `auto` or pin [model/provider]? Describe the impact if GPU goes offline at night vs the compliance reason for pinning to a fixed vendor.
Why it’s non-obvious: auto saves money until workers go in and out of healthy state. Pins give you predictable provider attribution. Paste a description of the workload, and you still make the final call before saving.
Go deeper
Heartbeats: what “offline” means
GPU/CLI workers missing heartbeats leave the pool; routing skips them until healthy registration returns. Cloud paths don’t heartbeat the same way; watch error-rate deprioritization instead (Providers overview).
Invisible failover
A failure mid-flight triggers retry on the next eligible provider. Users may not notice. Jobs and token attribution still show which provider answered (Trace a failing job end to end).
Symptom → likely layer
| What you see | First suspicion |
|---|---|
| Gray GPU dot | Worker stopped / network / heartbeat timeout |
| Sudden cloud spend | Workers offline → cascade exhausted local tiers |
| Flappy quality hour-to-hour | Mixed failover and different model strengths |
| Everything slow | Concurrency limits (configuration mentions concurrency env vars on server installs) |
Walkthrough
Step 1: Inventory tiers
List providers grouped by the type column: GPU boxes, CLI runners, cloud entries. Note which models each exposes; mismatched names never route.
Step 2: Read System Health first
Take Auxot’s pulse in 10 seconds; cards summarize live status. Pair with Providers to see who should have answered auto requests this minute.
Step 3: Drill into a representative Job
Pick any recent job → read model + provider. If cloud served while GPU exists, ask: was GPU offline, wrong model string, or pinned cloud?
Step 4: Simulate worker loss mentally
“If GPU dropped now, where would auto land?” If the answer surprises finance, tighten pins or harden workers (Connect a GPU worker).
Step 5: Document team norms
Write three bullets: default auto vs pinned agents, maintenance windows that accept cloud spillover, who owns provider rotations (Rotate credentials without surprising your agents).
What’s next
- → Take Auxot’s pulse in 10 seconds. Live dashboard habits before deep provider lists overwhelm you.
- → Run a five-minute operator health check. A structured System Health review when you need patterns, not just green dots.
- → Trace a failing job end to end. Row-level records when routing suspicion needs proof.
- → Pick the right model for the job. Deliberate pins once cascade behavior is understood.
- → Connect OAuth for MCP tools. Per-user vendor grants for MCP tool calls (a separate layer from model + provider routing).
- → Providers overview. Authoritative cascade and lifecycle copy.
Reference
- Pages in Auxot: Settings → Providers, System Health, Audit Logs → Jobs
- Manual: Providers overview, Configuration
- See also: Run a five-minute operator health check, Connect OAuth for MCP tools, Connect a cloud AI model, Connect a GPU worker, Connect a CLI provider (Claude Code, Cursor, Codex)