7 min read

AI Agent Social-Engineering Checklist Before Launch

A pre-launch checklist for reviewing AI agents that interact with people, tools, data, documents, or web environments.

In brief

Before launching an AI agent, review its external actors, sensitive data, tools, protected boundaries, evidence requirements, fix process, and regression checks.

Contents

How to use the checklist

A pre-launch social-engineering checklist helps teams find the agent's most important delegated-authority risks before users do. The checklist should focus on what the agent can read, reveal, change, remember, approve, send, or delegate.

The goal is not to make launch impossible. The goal is to identify the boundaries that need evidence before launch and the failures that should become recurring checks after launch.

1. Inventory the agent's authority

List what the agent is allowed to do. Include data access, tools, memory, web access, messages, handoffs, and escalation paths. If the agent can affect customers, candidates, prospects, employees, accounts, payments, or external systems, treat that as meaningful authority.

Then list what it is not allowed to do. This is where protected boundaries begin. The gap between allowed and prohibited behavior is where social-engineering tests should focus.

What private context can the agent read?
What tools can the agent call?
What actions can affect users or systems?
What sources can influence memory or future decisions?
What decisions require confirmation, approval, or escalation?

2. Identify external actors and untrusted content

Name who or what can influence the agent. That may include customers, prospects, candidates, employees, vendors, webpages, resumes, tickets, documents, emails, CRM notes, retrieved snippets, or other agents.

For each source, decide whether it can instruct the agent or only provide data. Many failures happen when untrusted data is treated as trusted instruction.

3. Write protected boundaries

Convert policies into testable boundaries. A good boundary says what action or disclosure is prohibited until a condition is satisfied. It should be clear enough that a reviewer can decide whether the agent held or failed.

Start with identity, authorization, data scope, tool preconditions, source trust, memory integrity, and delegation boundaries. Prioritize boundaries with high business impact.

4. Run realistic pressure scenarios

Create scenarios that match the agent's real workflow. Support agents should face ownership claims and refund pressure. Sales agents should face procurement urgency and fake approval. Recruiting agents should face untrusted candidate content. Browser agents should face deceptive page context.

Avoid relying only on obvious hostile strings. Many useful tests should look like ordinary workflow pressure because that is what the agent will see in production.

5. Preserve evidence and verify fixes

For each failure, preserve the attacker move, failed response or action, tool trace if relevant, violated invariant, severity, and reproduction context. Then assign the fix to the team that controls the boundary.

After the fix, rerun the same scenario or regression key. A fix that is not verified is only a claim. A severe verified fix should usually become a recurring check.

6. Decide what blocks launch

Not every issue should block launch, but severe boundary failures should be taken seriously. A launch blocker might involve sensitive data disclosure, unauthorized state change, unsafe external message, payment or pricing authority, candidate or employee data, or a repeated failure across variants.

Document the decision. If a risk is accepted, record why. If a risk is fixed, record the verification. If a risk needs monitoring, record the regression check.

Run the launch review meeting

The checklist is most useful when it produces decisions. A launch review should bring the owner of the agent, the person responsible for the workflow, and the person responsible for security or risk review. The group should review the top boundaries and the evidence for each high-risk scenario.

Keep the meeting concrete. For each boundary, ask whether it was tested, whether any failure remains open, whether the fix was verified, and whether a recurring check exists for severe failures. If the answer is unknown, the decision should be recorded as unknown rather than assumed safe.

The output should be a short launch-risk record: what was tested, what failed, what was fixed, what is accepted, and what will be monitored after launch.

Post-launch follow-up

The checklist should not disappear after launch. Agents continue to change through prompt edits, model updates, tool changes, policy changes, and new user behavior. The first post-launch review should look for gaps between test scenarios and real interactions.

If a new pattern appears in production feedback or support review, convert it into a scenario. If a verified fix protects an important boundary, consider adding it to recurring regression checks. If the agent gains a new tool or data source, rerun the relevant boundary checks.

This keeps the checklist from being a one-time approval artifact. It becomes a lightweight way to maintain confidence as the agent evolves.

Evidence required for launch confidence

A checklist item should not be marked complete only because someone discussed it. For important boundaries, the launch record should include evidence: the scenario, pressure pattern, expected safe behavior, actual behavior, and reviewer decision.

Evidence does not need to be heavy for every low-risk item. But for high-impact boundaries, a launch decision without evidence is mostly trust. If the agent can access sensitive data or call meaningful tools, the team should preserve proof that the boundary was tested.

The evidence should also explain unresolved risk. If a boundary was not tested before launch, record why and decide when it will be tested. Unknown risk should be visible rather than silently accepted.

Checklist outcomes

Each checklist item should end in one of a few clear outcomes. Passed means the boundary held in the tested scenario. Needs fix means the boundary failed and launch should wait or the scope should change. Accepted risk means the team knowingly accepts the current state. Needs monitoring means the boundary held or was fixed but should be checked again.

These outcomes are more useful than vague labels like reviewed or discussed. They tell the team what to do next and make the launch decision easier to audit later.

When the outcome is accepted risk, include the reason. The reason may be low impact, limited exposure, temporary mitigation, or a planned follow-up. Without that note, accepted risk can become a hiding place for unfinished work.

Common pre-launch mistakes

One mistake is testing the agent only through friendly happy paths. Another is testing only the final answer while ignoring tool calls, memory writes, browser actions, and handoffs.

A third mistake is relying on prompt review as proof. A strong prompt is useful, but it does not prove the agent preserves boundaries under realistic pressure. The checklist should include behavior tests, not only configuration review.

A fourth mistake is launching without a regression plan. If a severe failure was found and fixed before launch, the team should decide when that failure will be checked again.

Minimum evidence before launch

A team does not need perfect coverage before every launch, but it should know what evidence supports the decision. For each high-impact boundary, preserve at least one scenario result that shows the boundary holding or failing under realistic pressure.

The evidence should be specific enough for another reviewer to understand the decision later. Include the pressure pattern, the protected boundary, the agent behavior, any tool or browser action, and the reason the result was accepted or rejected.

If the agent has not been tested against a boundary, do not imply that the boundary is safe. Mark it as untested, explain why, and decide whether launch should wait, scope should be reduced, or monitoring should be added after launch.

What to revisit after launch

Post-launch review should focus on the differences between test conditions and real conditions. Real users may ask in different language, bring new documents, use different channels, or combine requests in ways the original scenarios did not cover.

Revisit the checklist when the agent receives new tools, accesses new data, serves a new user group, changes model, changes prompt, adds memory, or starts operating in a browser or multi-agent workflow. Each change can create a new path for manipulated delegation.

The checklist becomes more valuable over time when every incident, near miss, or verified fix improves the next version of the test set.

FAQ

How early should teams run this checklist?

Run it before launch and again before major changes. It is easier to define boundaries before workflows, tools, and prompts become difficult to change.

What should block launch?

Severe failures involving sensitive data, unauthorized actions, money, external commitments, employee or candidate data, or repeated boundary failure should be treated as launch blockers unless there is a documented mitigation.

How many scenarios are enough before launch?

Start with a small set of high-impact scenarios around the agent's top boundaries. Depth on meaningful risks is more useful than a large list of low-impact prompts.

What happens after a failed checklist item?

Turn the failure into evidence, assign the boundary fix, rerun the scenario, and decide whether the verified fix should become a regression check.

Should the checklist include prompt injection?

Yes, when untrusted text can influence the agent. The checklist should also include broader authority, identity, tool, memory, and workflow manipulation.

Deeper research

Read the June 2026 report.

For a deeper treatment of manipulated delegation and AI agent social-engineering risk, read Roleplay's June 2026 research report.

Read the report ->

Keep reading

GuideAI Agent Security Testing Maturity ModelRead ->GuideHow To Verify An AI Agent Security FixRead ->GuideProtected Boundaries For AI AgentsRead ->