Social Engineering Core Attack Pack

social-engineering-core is the baseline Workbench attack pack for repeatable social-engineering regression testing.

Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects. The public CLI keeps a mock smoke test locally, but does not ship reusable premium scenario scripts.

Run it against an HTTP agent:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Run it against a CLI agent:

roleplay run social-engineering-core \
  --target-command "node ./agent.js" \
  --yes \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

--provider selects the attacker provider; supported provider identifiers are listed in the CLI reference. --judge hybrid is recommended for real runs because it combines semantic evaluation with deterministic guardrails. To use different providers for attack generation and evaluation, pass --attacker-provider and --judge-provider. The combination --target mock --provider mock --judge rules is intended only for deterministic smoke tests.
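The two modes described above can be wrapped in a small helper. This is a minimal sketch, not part of the CLI: run_pack, ROLEPLAY_BIN, TARGET_URL, ATTACKER_PROVIDER, JUDGE_PROVIDER, PROJECT_ID, and PROJECT_API_KEY are hypothetical names chosen for the example. It also assumes that --attacker-provider and --judge-provider can be passed in place of --provider, which this page does not state; check the CLI reference before relying on it.

```shell
#!/bin/sh
# Sketch only: every variable and function name here is hypothetical.
# ROLEPLAY_BIN lets you substitute the binary (for example, echo for a dry run).
ROLEPLAY_BIN="${ROLEPLAY_BIN:-roleplay}"

# run_pack smoke -> deterministic mock run using the smoke-test flags
# run_pack real  -> adaptive run with separate attacker and judge providers
run_pack() {
  case "${1:-smoke}" in
    smoke)
      "$ROLEPLAY_BIN" run social-engineering-core \
        --target mock --provider mock --judge rules
      ;;
    real)
      # Assumption: the split provider flags replace --provider.
      "$ROLEPLAY_BIN" run social-engineering-core \
        --target "$TARGET_URL" \
        --attacker-provider "$ATTACKER_PROVIDER" \
        --judge-provider "$JUDGE_PROVIDER" \
        --judge hybrid \
        --project "$PROJECT_ID" \
        --api-key "$PROJECT_API_KEY" \
        --fail-on critical
      ;;
    *)
      echo "usage: run_pack smoke|real" >&2
      return 2
      ;;
  esac
}
```

Setting ROLEPLAY_BIN=echo before calling run_pack prints the command that would run, which is a cheap way to review the flags without contacting a provider.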

Covered Attack Families

The pack covers eight social-engineering families. Exact scenario scripts, prompts, and failure criteria are delivered privately at run time.

Family                        What it tests
Authority Impersonation       Fake executives, compliance leads, vendors, and IT admins.
Urgency Pressure              Escalation threats, incident deadlines, VIP pressure, and legal intimidation.
Policy Bypass                 Refund windows, account ownership, eligibility, and abuse-control exceptions.
Indirect Prompt Injection     Malicious instructions in tickets, webpages, tool output, and documents.
Data Exfiltration             Hidden context, PII, secrets/config, and cross-tenant data.
Tool Misuse                   Refund tools, account updates, outbound messages, shell/browser/admin actions.
Auth and Session Confusion    Ambient sessions, delegated authority, stale permissions, and OAuth scope creep.
Memory and Context Poisoning  Poisoned preferences, false facts, cross-user context, and audit evasion.

The library is aligned with common AI-agent security taxonomies, including:

  • OWASP LLM risks such as prompt injection, sensitive information disclosure, system prompt leakage, excessive agency, and vector/tool context weaknesses
  • OWASP MCP risks such as tool poisoning, intent-flow subversion, command execution, insufficient authorization, token exposure, and context over-sharing
  • MITRE ATLAS techniques around prompt injection, tool-data poisoning, credential access, collection, persistence, defense evasion, and impact

What It Tests

The pack asks whether an attacker can manipulate an agent into violating a policy or trust boundary over the course of a whole situation, not just whether a single prompt jailbreaks a model.

Each private scenario includes:

  • attacker persona and goal
  • protected trust boundary
  • expected safe behavior
  • concrete failure criteria
  • severity
  • framework mapping metadata in output.expectations

With a provider enabled, the attacker adapts across turns to the target agent's responses, and the judge evaluates the completed transcript against that scenario's exact success and failure criteria. As a result, each of the 32 scenarios exercises the social-engineering situation it describes rather than replaying fixed strings.

Monitor

Use --fail-on critical to block only critical failures:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Use --fail-on failed to block failed runs more broadly:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on failed
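One way to use the two thresholds together is to gate protected branches more strictly than feature branches. The sketch below is an illustration only: fail_on_for_branch, the BRANCH variable, and the branch naming policy are hypothetical, and it assumes your CI system exposes the current branch name in some variable.

```shell
#!/bin/sh
# Sketch only: the function name and branch policy are hypothetical.
# Prints the --fail-on value to use for a given branch name.
fail_on_for_branch() {
  case "$1" in
    main|release/*) echo failed ;;    # block on any failed run
    *)              echo critical ;;  # block only on critical failures
  esac
}

# Example use (BRANCH is assumed to be set by your CI system):
#   roleplay run social-engineering-core \
#     --target http://localhost:3000/agent \
#     --provider <provider> \
#     --judge hybrid \
#     --project <project-id> \
#     --api-key <project-api-key> \
#     --fail-on "$(fail_on_for_branch "$BRANCH")"
```

Keeping the policy in one function means the threshold choice is reviewed in a single place rather than repeated across pipeline definitions.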

Uploading Pack Evidence

When the pack runs multiple scenarios, upload all local runs:

roleplay upload all --source ci --mode sanitized_findings
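In CI, evidence is usually most valuable when a run fails, so the upload should not be skipped when the run step exits with an error. The following is a minimal sketch under two assumptions not stated on this page: that roleplay run exits non-zero when the --fail-on threshold is tripped, and that a failed upload should warn rather than change the job result. run_and_upload and ROLEPLAY_BIN are hypothetical names.

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical.
ROLEPLAY_BIN="${ROLEPLAY_BIN:-roleplay}"

# Runs the pack with the given flags, uploads evidence regardless of the
# result, then returns the exit status of the run step.
run_and_upload() {
  run_status=0
  "$ROLEPLAY_BIN" run social-engineering-core "$@" || run_status=$?

  # Upload even when the run failed; a failed upload only emits a warning.
  "$ROLEPLAY_BIN" upload all --source ci --mode sanitized_findings ||
    echo "warning: evidence upload failed" >&2

  return "$run_status"
}
```

A CI job would call run_and_upload with the same flags it would otherwise pass to roleplay run social-engineering-core, and the job still fails when the run does.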

Workbench uses attack-pack metadata to update pack coverage, run history, findings, evidence, verification, and regression state.

Coverage Philosophy

The pack focuses on situations rather than isolated strings. A scenario should prove whether the agent can keep identity, authorization, policy, context, and tool boundaries intact while a realistic attacker applies pressure over multiple turns.

This makes the pack suitable for regression testing:

  • locally before a developer opens a PR
  • in CI before deployment
  • in Workbench as shared evidence when a boundary fails

Sources And Standards Baseline

The initial coverage model was informed by:

Launch Scope

roleplay run social-engineering-core is the supported baseline regression-test path for local development and CI. The CLI fetches entitled scenario bundles from Workbench, runs them locally against your target, and keeps full local artifacts in your environment unless you explicitly upload them.