Run scheduled canary checks on production agents
Reuse your frozen prompts ([Catch regressions after you change an agent](/tutorials/catch-regressions-after-you-change-an-agent)) on a **calendar**, not only after edits — **cron-fired probe chats** ([Run an agent on a schedule](/tutorials/run-an-agent-on-a-schedule)) plus tight **pass-shaped output** so provider changes, silent instruction tweaks, or tool breakage surface in **Discord** or **Jobs** before customers carry the news.
Plus: three Admin-Agent passes — wrap a regression row into a scheduled prompt with explicit **PASS/FAIL header**, draft weekly diff ritual comparing two Job snippets, and decide when output shifts trigger a full migration retest ([Migrate agents when models or providers change](/tutorials/migrate-agents-when-models-or-providers-change)).
| Audience | Admins · Developers |
|---|---|
| Time | ~10 min |
| Prerequisites | Regression or health prompts already exist ([Catch regressions after you change an agent](/tutorials/catch-regressions-after-you-change-an-agent), [Run health checks on your must-not-fail agents](/tutorials/run-health-checks-on-your-must-not-fail-agents)). Cron wiring ([Run an agent on a schedule](/tutorials/run-an-agent-on-a-schedule)). Comfort skimming **Audit Logs → Jobs** ([View your audit logs](/tutorials/view-your-audit-logs)). |
| You'll end up with | One **scheduled canary task** per tier-one agent — same wording each run, structured footer humans grep fast — plus a **weekly compare habit** recorded beside **Last green** dates. |
When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.
Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.
Why this matters
Regression packs rerun when you change something (Catch regressions after you change an agent). Health checks prove liveness with small prompts (Run health checks on your must-not-fail agents). Production answers still shift without anyone opening Settings:
- Cloud providers adjust models or routing behavior (Migrate agents when models or providers change).
- Context files or upstream tools change answers without a tracked deploy.
- Someone pastes a helpful instruction tweak at midnight: no change ticket.
A canary is the same boring probe on a schedule: one agent, one frozen prompt, machine-scannable output: so the fifth bad week does not start with a customer screenshot. Wire it with Scheduled Tasks (Run an agent on a schedule); receipts land in Jobs (View your audit logs) and optionally Discord channels.
Nothing alarms on its own — you freeze wording, you read diffs, you widen investigations when variance spikes.
Quick start
- Borrow prompts — lift one or two rows from your regression table (Catch regressions after you change an agent): pick deterministic checks (must include phrase, must refuse category).
- Lock schedule — weekly off-peak or daily for tier-zero agents: cron via Scheduled Tasks (Run an agent on a schedule).
- Shape output — append instructions: first line
CANARY_STATUS: PASSorFAIL:+ reason, then normal answer: grep-friendly in Discord logs or pasted Jobs text. - Route signal — Error channel when task throws; Default channel carries PASS transcripts for optional human spot-checks (Connect Discord to your agents).
- Compare ritual — five minutes weekly: open last two successful Jobs; if FAIL or answer shape diverges, escalate: rerun full pack (Catch regressions after you change an agent), check providers (Take Auxot’s pulse in 10 seconds).
Done? Scheduled task named canary-<agent>: next run visible: baseline screenshot or pasted PASS archived once.
The agent can do that?
1. Prompt wrapping
Chat → Admin Agent:
Wrap regression prompt "[paste ID text]" with instructions: respond CANARY_STATUS PASS/FAIL first line; FAIL requires single-sentence reason; keep body under 120 words — markdown block only.
Why it’s non-obvious: Schedules amplify sloppy prompts: wrapper forces observability before #operations floods prose.
2. Diff narration
Job excerpt week A vs week B for same canary prompt pasted — classify: benign wording variance vs policy breach vs tool error — bullets — no fixes yet.
Why it’s non-obvious: Humans overreact to stylistic variance: structured comparison separates noise after you paste snippets.
3. Escalation ladder
Canary FAIL three runs in a row — ordered checklist: provider outage vs instruction change vs data change — cite tutorials where relevant — max six bullets.
Why it’s non-obvious: Panic reroutes to wrong owner: checklist assigns diagnostics you still execute.
Go deeper
Overlap discipline
Canary prompts should stay tiny: large probes waste tokens and numb reviewers: heavy suites stay manual (Catch regressions after you change an agent).
Adversarial caution
Scheduled text can leak if Discord channels are wide: keep PASS transcripts devoid of secrets; stress-test ingress separately (Red-team your agents against prompt injection).
Roster scale
Prioritize agents customers ping (Run a quarterly review of your agents): avoid fifty cron tabs nobody reads.
Workflow alternative
Multi-step probes belong in workflows (Run a workflow): canaries stay single-shot by definition here.
Walkthrough
Step 1: Pick agent + single prompt
Prove cadence before bundling five probes.
Step 2: Author wrapped prompt
Dry-run manually in Chat: confirm CANARY_STATUS line appears.
Step 3: Create scheduled task
Paste prompt: set cron: verify timezone (Run an agent on a schedule).
Step 4: First automated run
Capture Job id: file under baseline/ in repo or wiki.
Step 5: Add compare reminder
Calendar invite: link to Audit Logs saved filter: owns human diff ritual.
What’s next
- → Run an agent on a schedule. Cron surface this lesson depends on.
- → Triage pull requests before humans review them. Same evidence discipline on the merge queue once prod probes earn trust.
- → Automate weekly checkups on your agents. Same clock, different question: roster paperwork and attachments; pair with canaries for tier-one agents.
- → Catch regressions after you change an agent. Frozen prompts: canaries reuse those rows on a clock.
- → Run health checks on your must-not-fail agents. Broader health stance; pair with canaries for tier-zero agents.
- → View your audit logs. Jobs receipts for every fired canary.
- → Migrate agents when models or providers change. When variance traces to catalog changes, not ghosts.
Reference
- Manual: Scheduled Tasks, Discord Integration
- Pages in Auxot: Settings → Agents → Scheduled Tasks, Audit Logs → Jobs
- See also: Triage pull requests before humans review them, Use two-person rules for high-impact actions, Take Auxot’s pulse in 10 seconds, Pick the right model for the job