Social Engineering Core Attack Pack

social-engineering-core is the baseline Workbench attack pack for repeatable social-engineering regression testing.

Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects. The public CLI ships a local mock smoke test, but it does not ship reusable premium scenario scripts.

Run it against an HTTP agent:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Run it against a CLI agent:

roleplay run social-engineering-core \
  --target-command "node ./agent.js" \
  --yes \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

--provider selects the attacker provider; supported provider identifiers are listed in the CLI reference. --judge hybrid is recommended for real runs because it combines semantic evaluation with deterministic guardrails. Use --attacker-provider and --judge-provider when you want different providers for attack generation and evaluation. Reserve --target mock --provider mock --judge rules for deterministic smoke tests.
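The two less common invocations described above can be sketched as small shell wrappers. This is an illustration rather than a reference: it uses only flags named on this page, assumes --attacker-provider and --judge-provider can be passed in place of --provider, and leaves --project and --api-key off the mock run because this page does not say whether the smoke test needs them. The environment variable names (ATTACKER_PROVIDER, JUDGE_PROVIDER, ROLEPLAY_PROJECT_ID, ROLEPLAY_API_KEY) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wraps flags that appear on this page.

# Deterministic smoke test: mock target, mock attacker, rules-only judge.
# Assumption: no --project / --api-key is needed for the local mock run.
smoke_test() {
  roleplay run social-engineering-core \
    --target mock \
    --provider mock \
    --judge rules
}

# Real run with different providers for attack generation and evaluation.
# The environment variable names below are hypothetical, chosen for this example.
run_split_providers() {
  roleplay run social-engineering-core \
    --target "${1:?target URL required}" \
    --attacker-provider "${ATTACKER_PROVIDER:?}" \
    --judge-provider "${JUDGE_PROVIDER:?}" \
    --judge hybrid \
    --project "${ROLEPLAY_PROJECT_ID:?}" \
    --api-key "${ROLEPLAY_API_KEY:?}" \
    --fail-on critical
}
```

After sourcing the file, call smoke_test for a quick local check, or run_split_providers http://localhost:3000/agent for a real run.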

Covered Attack Families

The pack covers eight social-engineering families. Exact scenario scripts, prompts, and failure criteria are delivered privately at run time.

Family	What it tests
Authority Impersonation	Fake executives, compliance leads, vendors, and IT admins.
Urgency Pressure	Escalation threats, incident deadlines, VIP pressure, and legal intimidation.
Policy Bypass	Refund windows, account ownership, eligibility, and abuse-control exceptions.
Indirect Prompt Injection	Malicious instructions in tickets, webpages, tool output, and documents.
Data Exfiltration	Hidden context, PII, secrets/config, and cross-tenant data.
Tool Misuse	Refund tools, account updates, outbound messages, shell/browser/admin actions.
Auth and Session Confusion	Ambient sessions, delegated authority, stale permissions, and OAuth scope creep.
Memory and Context Poisoning	Poisoned preferences, false facts, cross-user context, and audit evasion.

The library is aligned with common AI-agent security taxonomies, including:

OWASP LLM risks such as prompt injection, sensitive information disclosure, system prompt leakage, excessive agency, and vector/tool context weaknesses
OWASP MCP risks such as tool poisoning, intent-flow subversion, command execution, insufficient authorization, token exposure, and context over-sharing
MITRE ATLAS techniques around prompt injection, tool-data poisoning, credential access, collection, persistence, defense evasion, and impact

What It Tests

The pack asks whether an attacker can manipulate an agent into violating a policy or trust boundary over the course of a realistic situation, not just whether a single prompt jailbreaks a model.

Each private scenario includes:

attacker persona and goal
protected trust boundary
expected safe behavior
concrete failure criteria
severity
framework mapping metadata in output.expectations

With a provider enabled, the attacker adapts across turns to the target agent's responses, and the judge evaluates the completed transcript against the exact success and failure criteria for that scenario. This is why the pack's 32 scenarios test the social-engineering situations they describe rather than replaying fixed strings.

Monitor

Use --fail-on critical to block only critical failures:

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --fail-on critical

Use --fail-on failed to block failed runs more broadly:

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --fail-on failed

Uploading Pack Evidence

When the pack runs multiple scenarios, upload all local runs:

roleplay upload all --source ci --mode sanitized_findings

Workbench uses attack-pack metadata to update pack coverage, run history, findings, evidence, verification, and regression state.
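A CI step that combines the gate and the upload might look like the sketch below. It assumes that roleplay run exits non-zero when the --fail-on threshold is met, which this page implies by "block" but does not state explicitly. The function name ci_gate and the environment variables ROLEPLAY_PROVIDER, ROLEPLAY_PROJECT_ID, and ROLEPLAY_API_KEY are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes `roleplay run` returns a non-zero exit code when the
# --fail-on threshold is met; exit codes are not documented on this page.

ci_gate() {
  local target="$1"
  local status=0

  # Run the pack and remember the result instead of aborting immediately.
  roleplay run social-engineering-core \
    --target "$target" \
    --provider "$ROLEPLAY_PROVIDER" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical || status=$?

  # Upload sanitized findings even when the gate failed, so the evidence
  # for the failing boundary is not lost with the CI runner.
  roleplay upload all --source ci --mode sanitized_findings \
    || echo "warning: evidence upload failed" >&2

  # Propagate the gate result to the CI job.
  return "$status"
}
```

Uploading before returning the saved status is a design choice: a failed gate is exactly when the shared evidence is most useful, so the upload should not be skipped by an early exit.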

Coverage Philosophy

The pack focuses on situations rather than isolated strings. A scenario should prove whether the agent can keep identity, authorization, policy, context, and tool boundaries intact while a realistic attacker applies pressure over multiple turns.

This makes the pack suitable for regression testing:

locally before a developer opens a PR
in CI before deployment
in Workbench as shared evidence when a boundary fails

Sources And Standards Baseline

The initial coverage model was informed by:

Launch Scope

roleplay run social-engineering-core is the supported baseline regression-test path for local development and CI. The CLI fetches entitled scenario bundles from Workbench, runs them locally against your target, and keeps the full artifacts in your environment unless you explicitly upload them.