Read provider routing like an operator

Learn Auxot's GPU → CLI → cloud cascade, heartbeat timeouts, and invisible failover — so latency spikes, surprise cloud bills, and gray worker dots tell a story instead of feeling random.

Plus: three Admin-Agent briefings — narrate a symptom against the cascade, draft an on-call checklist from pasted provider rows, and decide when pinning beats leaving model on auto.

Audience: Admins · Developers
Time: ~8 min
Prerequisites: Org visibility into **Settings → Providers** and **System Health** ([Take Auxot's pulse in 10 seconds](/tutorials/take-auxots-pulse)). At least one **cloud** provider connected ([Connect a cloud AI model](/tutorials/connect-a-cloud-ai-model)) **or** a plan to connect GPU/CLI soon ([Connect a GPU worker](/tutorials/connect-a-gpu-worker), [Connect a CLI provider (Claude Code, Cursor, Codex)](/tutorials/connect-a-cli-provider)). If jobs have already failed once, [Trace a failing job end to end](/tutorials/trace-a-failing-job-end-to-end) shows where receipts live; this lesson explains **why** traffic chose a tier.
You'll end up with: A mental model of **auto** routing order, worker heartbeats vs cloud assumptions, and failover, plus three glanceable checks to run before rewriting prompts or filing vendor tickets.

When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.

Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.

Why this matters

Every completion passes through routing. Auxot doesn’t pick providers at random. When you leave the model on auto, or name a model that exists on more than one tier, it follows a priority cascade (Providers overview):

  1. GPU: your silicon, no per-token bill from a hyperscaler.
  2. CLI: Claude Code-style workers on infra you already pay for.
  3. Cloud: OpenAI/Anthropic APIs, broad catalogs, metered spend.

GPU and CLI workers register and send heartbeats; cloud providers are treated as up until error patterns say otherwise. A mid-request failure is retried on the next eligible provider, so callers often see success while Jobs and Providers still record which provider actually answered.

Routing doesn’t drift on its own between requests. The tier that serves a request changes only when provider health, model availability, or your explicit pins change.
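
The cascade can be pictured as a small selection function. This is only an illustrative sketch, not Auxot's implementation: the `Provider` shape, the `pick_provider` name, and the provider and model names are all hypothetical, and the real router also weighs signals (such as cloud error rates) that are left out here.

```python
from dataclasses import dataclass

# Tier order as described above; a lower index wins.
TIER_ORDER = ("gpu", "cli", "cloud")


@dataclass(frozen=True)
class Provider:
    name: str            # hypothetical label, e.g. "gpu-box-1"
    tier: str            # "gpu", "cli", or "cloud"
    healthy: bool        # assumed to come from heartbeats or error patterns
    models: frozenset    # model names this provider advertises


def pick_provider(providers, model):
    """Return the healthy provider advertising `model` with the best tier, or None."""
    eligible = [p for p in providers if p.healthy and model in p.models]
    if not eligible:
        return None
    return min(eligible, key=lambda p: TIER_ORDER.index(p.tier))


providers = [
    Provider("cloud-a", "cloud", True, frozenset({"model-x"})),
    Provider("gpu-box-1", "gpu", True, frozenset({"model-x"})),
    Provider("cli-runner", "cli", True, frozenset({"model-x"})),
]
print(pick_provider(providers, "model-x").name)  # gpu-box-1
```

Marking `gpu-box-1` as unhealthy in this sketch moves the same request to `cli-runner`, which is the tier change the paragraph above describes.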


Quick start

  1. Open Settings → Providers: scan GPU / CLI / cloud rows for status, last heartbeat, and models advertised.
  2. Recall the cascade: for auto (or models that exist on multiple tiers), order is GPU → CLI → cloud.
  3. Check worker freshness: GPU/CLI rows depend on heartbeats (defaults are often ~30s interval, ~90s dead threshold; your deployment may vary).
  4. Watch cloud as fallback: if workers drop out, traffic falls back to cloud, and bills can jump without anyone changing a dropdown.
  5. Cross-check a job: open Audit Logs → Jobs (Trace a failing job end to end). The model + provider column shows who actually answered.
  6. Pick auto vs pin: pin model or provider when you need determinism (Pick the right model for the job). Stay auto when you want cost-aware failover.

Done? You can explain why Tuesday’s traffic looked cloud-heavy and whether that’s expected, before blaming the agent’s quality.
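
To put a number on "cloud-heavy", you could tally job rows by the tier that answered. The sketch below assumes you have copied rows out of the Jobs view into dictionaries with a `tier` key; that row shape and the `tier_share` helper are hypothetical, not an Auxot export format.

```python
from collections import Counter


def tier_share(jobs):
    """Return {tier: fraction of jobs served} for rows shaped like {"tier": "cloud"}."""
    counts = Counter(job["tier"] for job in jobs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tier: n / total for tier, n in counts.items()}


# Hypothetical rows transcribed by hand from the model + provider column.
jobs = [{"tier": "gpu"}, {"tier": "cloud"}, {"tier": "cloud"}, {"tier": "cli"}]
share = tier_share(jobs)
print(share["cloud"])  # 0.5
```

Comparing the cloud share for a quiet day against the day in question gives you something concrete to paste into an Admin Agent briefing.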


The agent can do that?

1. Narrate a symptom against the cascade

Chat → Admin Agent:

Symptom: [latency spike / cloud bill jump / users seeing different model quality hour-to-hour]. Providers we run: [GPU yes/no, CLI yes/no, clouds]. Recent worker maintenance: [describe]. Explain how Auxot's routing cascade could produce this — ranked hypotheses.

Why it’s non-obvious: Same user-visible slowness could be heartbeat dropout and failover, cloud rate limits, or model pins. Paste facts you collected first.

2. Turn pasted provider statuses into an on-call checklist

Here's what I see under Settings → Providers (paste names + statuses + last heartbeat ages). Give me a five-step operator checklist I can reuse weekly — green vs yellow vs red signals only.

Why it’s non-obvious: Reading the dashboard beats guessing. The Admin Agent only condenses what you paste; the checklist stays yours to run.

3. Auto vs explicit routing trade-off

Agent "[name]" handles [task]. Should default stay `auto` or pin [model/provider]? Describe the impact if GPU goes offline at night vs the compliance reason for pinning to a fixed vendor.

Why it’s non-obvious: auto saves money as long as workers stay healthy; when they flap between healthy and offline, traffic spills to other tiers. Pins give you predictable provider attribution. Paste a description of the workload; you still make the final call before saving.


Go deeper

Heartbeats: what “offline” means

GPU/CLI workers missing heartbeats leave the pool; routing skips them until healthy registration returns. Cloud paths don’t heartbeat the same way; watch error-rate deprioritization instead (Providers overview).
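
As a rough illustration of the heartbeat rule, the sketch below flags a worker as stale once its last heartbeat is older than a dead threshold. The 90-second value mirrors the "often ~90s" default mentioned in Quick start and is an assumption; `is_stale` is a hypothetical helper, and your deployment's thresholds may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumed default; check your own deployment's setting.
DEAD_AFTER = timedelta(seconds=90)


def is_stale(last_heartbeat, now, dead_after=DEAD_AFTER):
    """True when the worker has been silent for longer than the dead threshold."""
    return now - last_heartbeat > dead_after


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = now - timedelta(seconds=25)    # within one ~30s heartbeat interval
silent = now - timedelta(seconds=120)  # missed several heartbeats
print(is_stale(fresh, now), is_stale(silent, now))  # False True
```

A row whose last-heartbeat age sits between one interval and the dead threshold is worth a second look before it drops out of the pool.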

Invisible failover

A failure mid-flight triggers retry on the next eligible provider. Users may not notice. Jobs and token attribution still show which provider answered (Trace a failing job end to end).
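
The retry behavior can be sketched as a loop over eligible providers that records who finally answered. Everything here is hypothetical (the function names, the provider names, and the use of `RuntimeError` to stand in for a mid-flight failure); it shows the pattern, not Auxot's code.

```python
def complete_with_failover(providers, call):
    """Try providers in order; return (result, provider_name, failed_attempts)."""
    failed = []
    for name in providers:
        try:
            result = call(name)
        except RuntimeError as exc:
            # Remember the failure, then fall through to the next provider.
            failed.append((name, str(exc)))
            continue
        return result, name, failed
    raise RuntimeError(f"all providers failed: {failed}")


def fake_call(name):
    """Stand-in for a completion request; the GPU worker drops mid-request."""
    if name == "gpu-box-1":
        raise RuntimeError("worker dropped mid-request")
    return f"answer from {name}"


result, answered_by, failed = complete_with_failover(
    ["gpu-box-1", "cli-runner", "cloud-a"], fake_call
)
print(answered_by)  # cli-runner
```

The caller only sees `result`; `answered_by` and `failed` play the role of the attribution that the job record keeps.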

Symptom → likely layer
| What you see | First suspicion |
| --- | --- |
| Gray GPU dot | Worker stopped / network / heartbeat timeout |
| Sudden cloud spend | Workers offline → cascade exhausted local tiers |
| Flappy quality hour-to-hour | Mixed failover and different model strengths |
| Everything slow | Concurrency limits (configuration mentions concurrency env vars on server installs) |

Walkthrough

Step 1: Inventory tiers

List providers grouped by the type column: GPU boxes, CLI runners, cloud entries. Note which models each exposes; a requested model name that doesn't match what a provider advertises never routes to that provider.

Step 2: Read System Health first

Start with System Health (Take Auxot’s pulse in 10 seconds); its cards summarize live status. Pair it with Providers to see who should have answered auto requests this minute.

Step 3: Drill into a representative Job

Pick any recent job → read model + provider. If cloud served while a GPU exists, ask: was the GPU offline, was the model string wrong, or was the job pinned to cloud?
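
Those three questions can be walked in a fixed order. The sketch below assumes hand-built dictionaries for the job and the GPU rows; the field names (`pinned_provider`, `online`, `models`) and the `why_cloud` helper are hypothetical and only illustrate the order of checks.

```python
def why_cloud(job, gpu_rows):
    """Suggest why a cloud provider served `job` although GPU rows exist."""
    if job.get("pinned_provider"):
        return "pinned cloud"
    online = [g for g in gpu_rows if g["online"]]
    if not online:
        return "GPU offline"
    if not any(job["model"] in g["models"] for g in online):
        return "wrong model string"
    return "unexplained: check for mid-request failover"


# Hypothetical rows; note the casing mismatch between "model-X" and "model-x".
gpu_rows = [{"name": "gpu-box-1", "online": True, "models": {"model-x"}}]
job = {"model": "model-X", "pinned_provider": None}
print(why_cloud(job, gpu_rows))  # wrong model string
```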

Step 4: Simulate worker loss mentally

“If GPU dropped now, where would auto land?” If the answer surprises finance, tighten pins or harden workers (Connect a GPU worker).
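
The thought experiment can also be run on paper with a few lines of code. This sketch assumes you have written down which models each tier advertises; the `advertised` mapping, the model names, and `landing_tier` are hypothetical, and the tier order is the GPU → CLI → cloud cascade from this lesson.

```python
TIER_ORDER = ("gpu", "cli", "cloud")


def landing_tier(advertised, model, down=()):
    """Return the first tier, in cascade order, that is up and advertises `model`."""
    for tier in TIER_ORDER:
        if tier not in down and model in advertised.get(tier, set()):
            return tier
    return None


# Hypothetical inventory: which models each tier advertises.
advertised = {
    "gpu": {"model-x", "model-y"},
    "cli": {"model-y"},
    "cloud": {"model-x", "model-y"},
}
for model in ("model-x", "model-y"):
    before = landing_tier(advertised, model)
    after = landing_tier(advertised, model, down={"gpu"})
    print(model, before, "->", after)
# model-x gpu -> cloud
# model-y gpu -> cli
```

In this made-up inventory, losing the GPU tier sends `model-x` straight to metered cloud while `model-y` stays on CLI, which is the kind of difference worth flagging before it shows up on a bill.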

Step 5: Document team norms

Write three bullets: default auto vs pinned agents, maintenance windows that accept cloud spillover, who owns provider rotations (Rotate credentials without surprising your agents).


What’s next

Reference