Run scheduled canary checks on production agents

Reuse your frozen prompts ([Catch regressions after you change an agent](/tutorials/catch-regressions-after-you-change-an-agent)) on a **calendar**, not only after edits. **Cron-fired probe chats** ([Run an agent on a schedule](/tutorials/run-an-agent-on-a-schedule)) plus tight **pass-shaped output** let provider changes, silent instruction tweaks, or tool breakage surface in **Discord** or **Jobs** before customers report them.

Plus three Admin-Agent passes: wrap a regression row into a scheduled prompt with an explicit **PASS/FAIL header**, draft a weekly diff ritual that compares two Job snippets, and decide when output shifts should trigger a full migration retest ([Migrate agents when models or providers change](/tutorials/migrate-agents-when-models-or-providers-change)).

Audience: Admins · Developers
Time: ~10 min
Prerequisites: Regression or health prompts already exist ([Catch regressions after you change an agent](/tutorials/catch-regressions-after-you-change-an-agent), [Run health checks on your must-not-fail agents](/tutorials/run-health-checks-on-your-must-not-fail-agents)). Cron wiring ([Run an agent on a schedule](/tutorials/run-an-agent-on-a-schedule)). Comfort skimming **Audit Logs → Jobs** ([View your audit logs](/tutorials/view-your-audit-logs)).
You'll end up with: One **scheduled canary task** per tier-one agent (same wording each run, with a structured header humans can grep fast) plus a **weekly compare habit** recorded beside **Last green** dates.

When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.

Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.

Why this matters

Regression packs rerun when you change something (Catch regressions after you change an agent). Health checks prove liveness with small prompts (Run health checks on your must-not-fail agents). Production answers still shift without anyone opening Settings:

  • Cloud providers adjust models or routing behavior (Migrate agents when models or providers change).
  • Context files or upstream tools change answers without a tracked deploy.
  • Someone pastes a helpful instruction tweak at midnight, with no change ticket.

A canary is the same boring probe on a schedule (one agent, one frozen prompt, machine-scannable output), so the fifth bad week does not start with a customer screenshot. Wire it with Scheduled Tasks (Run an agent on a schedule); receipts land in Jobs (View your audit logs) and optionally in Discord channels.

Nothing alarms on its own — you freeze wording, you read diffs, you widen investigations when variance spikes.


Quick start

  1. Borrow prompts: lift one or two rows from your regression table (Catch regressions after you change an agent). Pick deterministic checks (must include phrase, must refuse category).
  2. Lock schedule: weekly off-peak, or daily for tier-zero agents. Set the cron via Scheduled Tasks (Run an agent on a schedule).
  3. Shape output: append instructions so the first line is CANARY_STATUS: PASS, or CANARY_STATUS: FAIL: plus a reason, followed by the normal answer. That keeps results grep-friendly in Discord logs or pasted Jobs text.
  4. Route signal: the Error channel fires when the task throws; the Default channel carries PASS transcripts for optional human spot-checks (Connect Discord to your agents).
  5. Compare ritual: spend five minutes weekly opening the last two successful Jobs. If the status is FAIL or the answer shape diverges, escalate: rerun the full pack (Catch regressions after you change an agent) and check providers (Take Auxot’s pulse in 10 seconds).

Done? A scheduled task named canary-<agent> exists, its next run is visible, and a baseline screenshot or pasted PASS transcript has been archived once.
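
The status header from step 3 can be checked mechanically rather than by eye. The sketch below is one possible way to do that in Python, assuming the transcript is available as plain text (for example, pasted from Jobs). The names `parse_canary` and `CanaryResult` are hypothetical and not part of Auxot.

```python
import re
from typing import NamedTuple, Optional

# Assumed header shape, based on the Quick start: the first line reads
# "CANARY_STATUS: PASS" or "CANARY_STATUS: FAIL: <reason>".
# The colon after CANARY_STATUS is treated as optional.
_HEADER = re.compile(r"^CANARY_STATUS:?\s*(PASS|FAIL)\b[:\s-]*(.*)$")


class CanaryResult(NamedTuple):
    status: str            # "PASS", "FAIL", or "MISSING"
    reason: Optional[str]  # set only for a FAIL that carries a reason


def parse_canary(text: str) -> CanaryResult:
    """Read the status header from the first non-empty line of a transcript."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        match = _HEADER.match(stripped)
        if match is None:
            # The first real line is not a header: treat as a broken canary.
            return CanaryResult("MISSING", None)
        status = match.group(1)
        reason = match.group(2).strip() or None
        return CanaryResult(status, reason if status == "FAIL" else None)
    return CanaryResult("MISSING", None)
```

Treating a missing header as its own state, rather than as PASS, keeps a drifting output format from hiding behind a green result.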


The agent can do that?

1. Prompt wrapping

Chat → Admin Agent:

Wrap regression prompt "[paste ID text]" with instructions: respond CANARY_STATUS PASS/FAIL first line; FAIL requires single-sentence reason; keep body under 120 words — markdown block only.

Why it’s non-obvious: Schedules amplify sloppy prompts. The wrapper forces observability before #operations floods with prose.
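
If you prefer to build the wrapper deterministically instead of asking the Admin Agent, a small helper can append the same output contract to any frozen prompt. This is a minimal sketch: `wrap_canary_prompt` is a hypothetical name, and the contract wording is an illustration of the rules above, not official Auxot copy.

```python
def wrap_canary_prompt(frozen_prompt: str, max_words: int = 120) -> str:
    """Append a canary output contract to a frozen regression prompt."""
    # Contract wording is illustrative; adjust it to match your own pack.
    contract = (
        "Output contract:\n"
        "- First line: CANARY_STATUS: PASS or CANARY_STATUS: FAIL: <one-sentence reason>\n"
        f"- Then the normal answer, under {max_words} words."
    )
    # The frozen prompt itself is left untouched apart from trimming whitespace.
    return f"{frozen_prompt.strip()}\n\n{contract}"
```

Generating the wrapper from code means every canary carries byte-identical instructions, which makes week-over-week diffs easier to trust.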

2. Diff narration

Job excerpt week A vs week B for same canary prompt pasted — classify: benign wording variance vs policy breach vs tool error — bullets — no fixes yet.

Why it’s non-obvious: Humans overreact to stylistic variance. A structured comparison separates noise from real change once you paste the snippets.
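
Before pasting two excerpts into chat, a quick local comparison can show whether the status line changed or only the body wording moved. The sketch below uses Python's standard `difflib`; `compare_runs` is a hypothetical helper, and it assumes each excerpt starts with the CANARY_STATUS line. It reports differences only. Classifying them as benign, policy breach, or tool error remains a human (or Admin Agent) call.

```python
import difflib


def compare_runs(week_a: str, week_b: str) -> dict:
    """Summarize how two canary transcripts differ."""
    a_lines = week_a.strip().splitlines()
    b_lines = week_b.strip().splitlines()

    # Assumption: the first line of each transcript is the status header.
    header_changed = a_lines[:1] != b_lines[:1]

    # Character-level similarity of everything after the header (0.0 to 1.0).
    body_a = "\n".join(a_lines[1:])
    body_b = "\n".join(b_lines[1:])
    similarity = difflib.SequenceMatcher(None, body_a, body_b).ratio()

    # Line-level unified diff for pasting into a ticket or chat.
    diff = list(
        difflib.unified_diff(a_lines, b_lines, "week_a", "week_b", lineterm="")
    )
    return {
        "header_changed": header_changed,
        "body_similarity": round(similarity, 2),
        "diff": diff,
    }
```

A changed header with a similar body and an unchanged header with a very different body point to different follow-ups, so the two signals are reported separately.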

3. Escalation ladder

Canary FAIL three runs in a row — ordered checklist: provider outage vs instruction change vs data change — cite tutorials where relevant — max six bullets.

Why it’s non-obvious: Panic routes the problem to the wrong owner. The checklist assigns diagnostics, which you still execute yourself.
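
The "three runs in a row" trigger is simple to compute from a list of recorded statuses. This is a minimal sketch, assuming you keep statuses in run order (oldest first) somewhere such as a wiki table; `trailing_fail_streak` and `should_escalate` are hypothetical names.

```python
from typing import Iterable


def trailing_fail_streak(statuses: Iterable[str]) -> int:
    """Count consecutive FAIL results at the end of a run history."""
    streak = 0
    for status in statuses:  # oldest first
        # Any non-FAIL result resets the streak.
        streak = streak + 1 if status == "FAIL" else 0
    return streak


def should_escalate(statuses: Iterable[str], threshold: int = 3) -> bool:
    """True when the most recent runs are FAIL at least `threshold` times in a row."""
    return trailing_fail_streak(statuses) >= threshold
```

For example, `should_escalate(["PASS", "FAIL", "FAIL", "FAIL"])` is true, while a PASS anywhere in the last three runs resets the count.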


Go deeper

Overlap discipline

Canary prompts should stay tiny. Large probes waste tokens and numb reviewers; heavy suites stay manual (Catch regressions after you change an agent).

Adversarial caution

Scheduled text can leak if Discord channels are widely readable. Keep PASS transcripts free of secrets, and stress-test ingress separately (Red-team your agents against prompt injection).
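
One way to spot-check a baseline transcript before archiving it is a small pattern scan. The sketch below is illustrative only: the three patterns are common secret shapes chosen as examples, they are not exhaustive, and `find_secret_like` is a hypothetical helper. A clean result does not prove a transcript is safe to share.

```python
import re

# Illustrative patterns only; extend with the token formats your stack uses.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secret_like(text: str) -> list:
    """Return the sorted names of patterns that match anywhere in the text."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )
```

Running this over the first PASS transcript you archive is a cheap check that the frozen prompt does not coax credentials or keys into the answer.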

Roster scale

Prioritize the agents customers ping (Run a quarterly review of your agents). Avoid fifty cron tabs nobody reads.

Workflow alternative

Multi-step probes belong in workflows (Run a workflow). In this tutorial, canaries stay single-shot by definition.


Walkthrough

Step 1: Pick agent + single prompt

Prove cadence before bundling five probes.

Step 2: Author wrapped prompt

Dry-run manually in Chat and confirm the CANARY_STATUS line appears.

Step 3: Create scheduled task

Paste the prompt, set the cron, and verify the timezone (Run an agent on a schedule).

Step 4: First automated run

Capture the Job ID and file it under baseline/ in your repo or wiki.

Step 5: Add compare reminder

Create a calendar invite that links to the Audit Logs saved filter. Whoever owns the invite owns the human diff ritual.


What’s next

Reference