Run an incident tabletop exercise

Rehearse an agent incident before you have one. Pick a realistic scenario (agent leaked data into the wrong team, a tool fired when it should not have, a prompt-injection attempt got past your defenses) and walk the team through detection, containment, comms, and post-incident review, so that the real incident is not the first time anyone has tried.

Plus: three Admin Agent passes that generate a scenario tailored to your agent set, script the four-phase walkthrough with realistic timing, and produce a post-exercise fix list ranked by likelihood of reuse (how likely the same gap is to show up in a real incident).

Audience: Admins · Developers · Executives
Time: ~60–90 min (the exercise itself; ~15 min to draft the brief beforehand)
Prerequisites: At least one production agent with real tools and a real audience ([Stress-test an agent before you widen access](/tutorials/stress-test-an-agent-before-you-widen-access)). Comfort with opening Audit Logs ([View your audit logs](/tutorials/view-your-audit-logs)). Helpful: adversarial-testing rhythm ([Red-team your agents against prompt injection](/tutorials/red-team-your-agents-against-prompt-injection)), incident-trace muscle ([Trace a failing job end to end](/tutorials/trace-a-failing-job-end-to-end)), post-mortem framing ([Run a post-mortem that leads to action](/tutorials/run-a-post-mortem-that-leads-to-action)).
You'll end up with: a written tabletop report containing the scenario brief, what the team did at each phase, what the team missed, an owner-assigned fix list with dates, and the date of the next rerun; plus an honest answer when a buyer asks “have you rehearsed this?”

When a tutorial shows italic text in quotation marks, it usually mirrors a label or helper string inside Auxot. Product copy changes between releases — if something reads differently in your workspace, trust what you see on screen.

Callouts with a Worth knowing gold accent are meant as must-read context before you move on. Blockquotes that open with Tip are lighter, optional depth.

Why this matters

The first time your team responds to a real agent incident should not be the first time they have ever responded to one. Real incidents are not the moment to figure out who calls the customer, who pulls the audit log, who has the authority to revoke a credential. By the time you are figuring it out, you have already lost an hour and added panic to a situation that needed calm.

A tabletop exercise is a low-stakes rehearsal. Same scenarios you would face in real life, run in a conference room or a video call, no production changes made. Everyone gets the muscle memory of the right move without paying for the wrong one.

Three scenarios cover most of the agent-specific exposure:

  1. Wrong-team data exposure. An agent attached to a multi-team Auxot account answered a question with content from a team the asker should not see. The customer sends a screenshot.
  2. Improper tool fire. A tool the agent has access to (CRM write, outbound email, payment refund) was triggered in a way nobody approved. The downstream system shows the bad call before anyone in Auxot does.
  3. Prompt injection that worked. A pasted instruction got past the agent’s job description. The agent dumped internal context (sales playbook, customer list, internal pricing) to a stranger who then tweeted the screenshot.

None of these are exotic. All three happen to real companies in 2026. Rehearsing one of them every quarter keeps the real customer call from becoming your first rehearsal.

This tutorial does not replace your incident-response policy, breach-notification clock, or legal counsel. It is the operational rehearsal that makes those policies executable when stress is real.

Nothing rehearses itself: you book the room, you pick the scenario, you assign the roles, you write the fix list.


Quick start

  1. Pick a scenario: one of the three above. Tailor with one specific detail (your real agent name, your real tool wired, your real customer profile).
  2. Assign roles: Incident Lead (runs the response), Comms Lead (drafts customer + internal messages), Technical Lead (opens audit logs, revokes keys), Observer (silent; takes notes for the report).
  3. Run the four phases: Detection (how did you find out?), Containment (what did you stop?), Communication (what did you say to whom?), Post-incident review (what do you fix?).
  4. Time-box it: 60–90 minutes total. The Facilitator (or you, if there is no separate facilitator) keeps the clock.
  5. Write the report within 48 hours: what worked, what stalled, and which fixes have owners and dates. Schedule the next rerun before everyone leaves the room.

Done? A dated written tabletop report your security lead can show to an auditor or a buyer who asks “have you rehearsed an agent incident?”


The agent can do that?

1. Generate a tailored scenario

Chat → Admin Agent:

Generate an incident tabletop scenario tailored to my Auxot setup. Agents in scope: [paste agent names]. Tools wired: [paste, e.g. CRM write, outbound email, etc.]. Customer profile: [business-to-business SaaS / regulated / consumer]. Scenario type: [wrong-team data exposure / improper tool fire / prompt-injection that worked]. Output: a one-page brief with timeline (T+0 minutes: customer screenshot arrives → T+10: ...), one decision point per phase, and what the Incident Lead needs to figure out at each step. Realistic, no theatrics.

Why it’s non-obvious: A generic scenario rehearses generic muscle. Pasting your actual agents and tools forces the brief to touch the systems your team will actually open under stress.

2. Script the four-phase walkthrough with realistic timing

Tabletop brief: [paste]. Script the walkthrough as a facilitator's outline: what the facilitator says at minute 0, what new information drops at minute 15, 30, 45, etc. Include realistic decisions the team has to make (do we notify the customer now or after we know more? who has the authority to revoke the API key?). Keep total time to 60 minutes.

Why it’s non-obvious: Tabletops wander into philosophy without a script. The facilitator’s outline keeps the team in the scenario instead of debating policy. You still facilitate; the script is the prop.

3. Post-exercise fix list ranked by likelihood of reuse

Tabletop notes: [paste Observer's notes]. Produce a fix list ranked by likelihood the same gap shows up in a real incident. For each fix: one-line description, owner placeholder, suggested due date relative to today, and which Auxot page or external system the fix lives in. Mark anything that requires a policy change vs a configuration change vs a training change.

Why it’s non-obvious: The exercise produces a lot of “we should also…” notes. Ranking by reuse-likelihood and tagging by change-type makes the fix list actionable. You still assign real owners and dates.
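If the fix list lands in a spreadsheet or script rather than a doc, the ranking and tagging this pass asks for is easy to check yourself. The sketch below is a hypothetical Python helper, not an Auxot feature: the `Fix` record, the 1-to-5 `likelihood` score, and the sample fixes are illustrative assumptions. It sorts fixes by how likely the gap is to recur, tags each as a policy, configuration, or training change, converts relative due windows into dates, and flags any fix that still has no owner.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    POLICY = "policy"
    CONFIGURATION = "configuration"
    TRAINING = "training"


@dataclass
class Fix:
    description: str
    likelihood: int  # hypothetical 1-5 score: how likely the gap recurs in a real incident
    change_type: ChangeType
    owner: Optional[str] = None  # placeholder until a real person is named in the room
    due_in_days: int = 30  # due date relative to the exercise day


def rank_fixes(fixes: list[Fix]) -> list[Fix]:
    """Most likely to recur first; ties go to the fix with the shorter due window."""
    return sorted(fixes, key=lambda f: (-f.likelihood, f.due_in_days))


def unowned(fixes: list[Fix]) -> list[str]:
    """Descriptions of fixes that still lack an owner."""
    return [f.description for f in fixes if not f.owner]


def due_dates(fixes: list[Fix], exercise_day: date) -> dict[str, date]:
    """Turn relative due windows into calendar dates."""
    return {f.description: exercise_day + timedelta(days=f.due_in_days) for f in fixes}


# Illustrative fixes only; replace with the Observer's real notes.
fixes = [
    Fix("Walk on-call through Audit Logs filters", 4, ChangeType.TRAINING, owner="Technical Lead"),
    Fix("Wire approval gate on the refund tool", 4, ChangeType.CONFIGURATION, due_in_days=14),
    Fix("Name an owner in the customer comms plan", 5, ChangeType.POLICY, owner="Comms Lead"),
]

for fix in rank_fixes(fixes):
    print(f"[{fix.change_type.value}] {fix.description} (owner: {fix.owner or 'UNASSIGNED'})")
print("Needs an owner:", unowned(fixes))
```

Breaking ties by the shorter due window is one reasonable choice; use whatever tie-breaker your team agrees on.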


Go deeper

The three scenarios in more detail

  1. Wrong-team data exposure. Auxot’s multi-team isolation (Set up multi-team isolation) is enforced at the data layer, not the UI. The tabletop tests the human side: a context file got attached to the wrong agent, a team was added to the wrong group, an API key was reused across teams. The real risk is configuration that drifts wrong over time, not platform failure.

  2. Improper tool fire. Tool policies (Define a tool policy) constrain what an agent can do, but the human approval gate (Require human approval before risky actions) is what stops a runaway action. The tabletop tests whether the team knows the difference and whether the approval gates are actually wired on the agents that need them.

  3. Prompt injection that worked. Adversarial testing (Red-team your agents against prompt injection) catches most attempts. The tabletop tests what happens when one slips through: does the team detect it from audit logs (View your audit logs), or does the customer have to tell you?

The four phases, expanded

  • Detection. How do you find out an incident happened? Customer report, monitoring alert, audit-log anomaly, downstream system error. The tabletop reveals which channels your team actually checks vs which exist on paper.
  • Containment. What do you stop, and in what order? Revoke the API key? Pause the agent? Disable the tool? Roll back the configuration? Containment decisions made in the wrong order can prevent the investigation that needs to happen next.
  • Communication. Who tells whom, in what order, with what level of detail? Customer, leadership, board, regulators, the public. The tabletop catches the gap between “we have a comms plan” and “the comms plan does not name an owner.” (Ship clear customer communications covers the customer side.)
  • Post-incident review. What do you change? Not the same as a post-mortem on a real incident (Run a post-mortem that leads to action), but the same shape: facts, owners, dates, no scapegoats.

Cadence

  • Quarterly minimum. Run one scenario per quarter, rotating through the three so each gets rehearsed at least once a year.
  • On change. Run an extra tabletop the quarter after you deploy a new agent at scale, wire a new tool, or onboard a team that holds sensitive data.
  • After a real incident. The post-mortem (Run a post-mortem that leads to action) produces the fixes; the next tabletop verifies that those fixes actually changed how the team responds.
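Four quarters do not divide evenly among three scenarios, so one scenario gets a second run each year. The minimal Python sketch below assumes you simply cycle through the three in a fixed order and carry the rotation across year boundaries (the quarter labels are illustrative). It shows that the extra run moves to a different scenario each year instead of always landing on the same one.

```python
from itertools import cycle

SCENARIOS = [
    "Wrong-team data exposure",
    "Improper tool fire",
    "Prompt injection that worked",
]


def rotation(quarters: list[str]) -> dict[str, str]:
    """Assign one scenario per consecutive quarter, cycling through the three in order."""
    return dict(zip(quarters, cycle(SCENARIOS)))


# Run the rotation across two years so it carries over the year boundary.
quarters = [f"{year}-Q{q}" for year in (2026, 2027) for q in range(1, 5)]
plan = rotation(quarters)
for quarter, scenario in plan.items():
    print(quarter, scenario)
```

In this plan, wrong-team exposure gets the extra run in 2026 and improper tool fire gets it in 2027. An "on change" tabletop is on top of this rotation, not part of it.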

Sibling rehearsal: pre-audit walkthrough

The incident tabletop rehearses what the team does when something goes wrong; Run an internal pre-audit drill against your own narrative rehearses what the team says when an auditor or buyer walks through the documented controls. Different scenario, same muscle: surface the gaps in a low-stakes setting before the high-stakes one finds them.

Who attends

The roles below should be filled by different people:

  • Incident Lead: authority to make containment calls. Usually the on-call engineer or security lead.
  • Comms Lead: authority to send customer messages. Usually marketing, customer success, or a founder.
  • Technical Lead: authority to revoke keys, disable agents, roll back configs. Ops or engineering.
  • Observer: silent. Takes notes for the report. Pick someone who will not be tempted to jump in.

Optional: a Facilitator who is not on the response team; runs the scenario and drops new information at planned intervals. For the first few tabletops, an outside facilitator (a security consultant, an experienced peer at another company) keeps the team honest.

Troubleshooting

  • The team treats it as theater. Tabletops fail when nobody believes the scenario. Tailor the brief with one specific detail that is uncomfortably real: your actual top-paying customer, your actual most-used agent, your actual on-call engineer’s name. Discomfort sharpens muscle memory.
  • The Incident Lead solves everything in 5 minutes. Pre-script new information that drops at minute 15 and minute 30: the customer escalates, a regulator emails, a second incident appears. Real incidents do not stop after the first decision; tabletops should not either.
  • The fix list never gets done. Assign owners in the room, before anyone leaves. A fix list without owners and dates is a record of conversation, not a plan.
  • Legal asks you not to run them at all. Tabletops produce written records that could be discoverable in litigation. Discuss with counsel; usually the answer is to label the doc “prepared at the direction of counsel for incident-response improvement” and limit distribution. Do not stop running them.

Variations & edge cases

  • Distributed team: run by video. Add 15 minutes; typing into a shared doc is slower than talking. Use breakout rooms for the Comms Lead and Technical Lead to work in parallel.
  • First tabletop ever: pick the gentlest scenario (wrong-team exposure) and lengthen the time-box to 90 minutes. The first one is mostly about establishing the format.
  • Multi-product company: if Auxot is one product among several, run the tabletop scoped to Auxot first; do not try to rehearse the whole company in one session.
  • Regulated industry: add a regulator role and rehearse the notification clock (24 / 48 / 72 hours depending on jurisdiction). Mistiming a regulatory notice is its own incident.
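Rehearsing the notification clock means knowing, at each point in the scenario, how much time is left. The Python sketch below computes absolute deadlines from a clock-start timestamp. The 24/48/72-hour windows mirror the range named above, but the regime labels are placeholders, and what actually starts the clock is a legal question to settle with counsel; the sketch just assumes a timestamp you record.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical windows for rehearsal only. Real deadlines, and what starts the clock,
# depend on jurisdiction and must come from counsel.
WINDOWS_HOURS = {"24-hour regime": 24, "48-hour regime": 48, "72-hour regime": 72}


def deadlines(clock_start: datetime, windows: dict[str, int]) -> dict[str, datetime]:
    """Absolute notification deadline for each window."""
    return {name: clock_start + timedelta(hours=h) for name, h in windows.items()}


def hours_left(now: datetime, deadline: datetime) -> float:
    """Hours remaining before a deadline (negative once it has passed)."""
    return (deadline - now).total_seconds() / 3600


# Assume the clock starts when the simulated customer screenshot arrives.
start = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
due = deadlines(start, WINDOWS_HOURS)
now = start + timedelta(hours=20)  # the facilitator fast-forwards 20 simulated hours
for name, when in due.items():
    print(f"{name}: due {when:%Y-%m-%d %H:%M} UTC, {hours_left(now, when):.0f} h left")
```

Fast-forwarding the simulated clock is one way to force the "notify now or after we know more?" decision while the team still lacks facts.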

Walkthrough

Step 1: Pick the scenario and tailor it

Use Admin Agent pass 1 (Generate a tailored scenario) to draft a brief from your actual agent names and tools. Read it once and add one detail that makes the team uncomfortable: name a real customer, name a real agent, name a real Slack channel. Discomfort is the point.

Step 2: Send the brief 24 hours ahead

The brief assigns the roles, states the scenario, and notes the time-box. Do not pre-share the new information that drops mid-exercise; surprise is part of the rehearsal.

Step 3: Run the four phases on the clock

The Facilitator (or you, if no separate facilitator) keeps time:

  • Minute 0–15: Detection. Brief is read. Team asks clarifying questions. The Incident Lead names what they would check first. The Observer logs the first 3 actions.
  • Minute 15–30: Containment. New information drops (e.g. “the customer just tweeted the screenshot”). Technical Lead names what they would revoke or disable. Decisions are logged with timestamps.
  • Minute 30–50: Communication. Comms Lead drafts the customer message and the internal update. The team critiques. Real wording matters.
  • Minute 50–60: Post-incident review. Observer reads back what happened. The team names fixes. Owners and dates go on each fix before anyone leaves.
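A facilitator's run-sheet is easier to follow when the minute offsets above become wall-clock times for the actual session. This Python sketch assumes the 60-minute schedule above and an arbitrary 14:00 start; `run_sheet` and `is_well_formed` are hypothetical helpers, not part of Auxot. The check catches gaps, overlaps, or a schedule that overruns the time-box.

```python
from datetime import datetime, timedelta

# (phase, start minute, end minute), copied from the schedule above.
PHASES = [
    ("Detection", 0, 15),
    ("Containment", 15, 30),
    ("Communication", 30, 50),
    ("Post-incident review", 50, 60),
]


def is_well_formed(phases: list[tuple[str, int, int]], time_box: int = 60) -> bool:
    """Phases start at minute 0, run forward, leave no gaps or overlaps, and fit the time-box."""
    if not phases or phases[0][1] != 0:
        return False
    ordered = all(begin < end for _, begin, end in phases)
    contiguous = all(prev[2] == nxt[1] for prev, nxt in zip(phases, phases[1:]))
    return ordered and contiguous and phases[-1][2] <= time_box


def run_sheet(start: datetime, phases: list[tuple[str, int, int]]) -> list[str]:
    """Wall-clock lines the Facilitator can read from during the session."""
    return [
        f"{(start + timedelta(minutes=b)):%H:%M}-{(start + timedelta(minutes=e)):%H:%M}  {name}"
        for name, b, e in phases
    ]


assert is_well_formed(PHASES)
for line in run_sheet(datetime(2026, 3, 2, 14, 0), PHASES):
    print(line)
```

For a distributed session, you might stretch Communication to end at minute 65, move Post-incident review to minutes 65 to 75, and rerun the check with `time_box=75`.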

Step 4: Write the report within 48 hours

Use Admin Agent pass 3 (Post-exercise fix list ranked by likelihood of reuse) to draft from the Observer’s notes. The report needs:

  • Date and participants.
  • Scenario brief.
  • Phase-by-phase summary of decisions made.
  • What the team missed or where it stalled.
  • Fix list with owners and dates.
  • Next tabletop scheduled.

Step 5: Verify fixes 30 days later

When you create the fix list, put a 15-minute follow-up on the calendar for 30 days out. The Observer (or you) verifies that each item moved. Items that did not move become the first agenda item of the next quarterly tabletop.
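The 30-day check and the carry-over rule are mechanical enough to sketch. The Python below is a hypothetical helper, assuming you record a simple moved or not-moved flag per fix; the fix names are illustrative.

```python
from datetime import date, timedelta


def follow_up_date(fix_list_created: date, days: int = 30) -> date:
    """Date for the 15-minute check-in, booked when the fix list is created."""
    return fix_list_created + timedelta(days=days)


def carry_over(moved: dict[str, bool]) -> list[str]:
    """Fixes that did not move; they open the next quarterly tabletop's agenda."""
    return [fix for fix, done in moved.items() if not done]


print(follow_up_date(date(2026, 3, 2)))
status = {
    "Name an owner in the customer comms plan": True,
    "Wire approval gate on the refund tool": False,
}
print("Carry over:", carry_over(status))
```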

Step 6: Reference the report when buyers or auditors ask

The dated report is the evidence. Cite it in vendor questionnaires (Answer vendor security questionnaires from your own evidence), on your /trust page (Write the agent section of your customer-facing security page), and in SOC 2 walkthroughs (Build a SOC 2 control mapping for your Auxot deployment).


What’s next

Reference