Social Engineering Core Attack Pack
social-engineering-core is the baseline Workbench attack pack for repeatable social-engineering regression testing.
Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects. The public CLI keeps a mock smoke test locally, but does not ship reusable premium scenario scripts.
Run it against an HTTP agent:
roleplay run social-engineering-core \
--target http://localhost:3000/agent \
--provider <provider> \
--judge hybrid \
--project <project-id> \
--api-key <project-api-key> \
--fail-on critical
Run it against a CLI agent:
roleplay run social-engineering-core \
--target-command "node ./agent.js" \
--yes \
--provider <provider> \
--judge hybrid \
--project <project-id> \
--api-key <project-api-key> \
--fail-on critical
--provider chooses the attacker provider; supported provider identifiers are listed in the CLI reference. --judge hybrid is recommended for real runs because it combines semantic evaluation with deterministic guardrails. Use --attacker-provider and --judge-provider when you want different providers for attack generation and evaluation. Reserve --target mock --provider mock --judge rules for deterministic smoke tests; that combination is not meant for real runs.
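The two less common flag combinations can be sketched as small wrapper functions. This is an illustration, not a documented recipe: the function names and the ROLEPLAY_PROJECT and ROLEPLAY_API_KEY environment variables are hypothetical, and the sketch assumes that the mock smoke test needs no project credentials and that --judge hybrid can be combined with --judge-provider.

```shell
#!/usr/bin/env bash
# Sketch only. Flags are the ones named on this page; the function names and
# the ROLEPLAY_PROJECT / ROLEPLAY_API_KEY variables are hypothetical.

# Deterministic smoke test: mock target, mock attacker, rules-only judge.
# Assumes the mock run needs no project credentials.
smoke_test() {
  roleplay run social-engineering-core \
    --target mock \
    --provider mock \
    --judge rules
}

# Real run with different providers for attack generation and evaluation.
# $1 = attacker provider, $2 = judge provider (identifiers: see the CLI reference).
split_provider_run() {
  roleplay run social-engineering-core \
    --target http://localhost:3000/agent \
    --attacker-provider "$1" \
    --judge-provider "$2" \
    --judge hybrid \
    --project "${ROLEPLAY_PROJECT:?set ROLEPLAY_PROJECT}" \
    --api-key "${ROLEPLAY_API_KEY:?set ROLEPLAY_API_KEY}" \
    --fail-on critical
}
```

Reading the project credentials from the environment keeps the API key out of shell history and CI logs.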
Covered Attack Families
The pack covers eight social-engineering families. Exact scenario scripts, prompts, and failure criteria are delivered privately at run time.
| Family | What it tests |
|---|---|
| Authority Impersonation | Fake executives, compliance leads, vendors, and IT admins. |
| Urgency Pressure | Escalation threats, incident deadlines, VIP pressure, and legal intimidation. |
| Policy Bypass | Refund windows, account ownership, eligibility, and abuse-control exceptions. |
| Indirect Prompt Injection | Malicious instructions in tickets, webpages, tool output, and documents. |
| Data Exfiltration | Hidden context, PII, secrets/config, and cross-tenant data. |
| Tool Misuse | Refund tools, account updates, outbound messages, shell/browser/admin actions. |
| Auth and Session Confusion | Ambient sessions, delegated authority, stale permissions, and OAuth scope creep. |
| Memory and Context Poisoning | Poisoned preferences, false facts, cross-user context, and audit evasion. |
The library is aligned with common AI-agent security taxonomies, including OWASP LLM risks such as prompt injection, sensitive information disclosure, system prompt leakage, excessive agency, and vector/tool context weaknesses; OWASP MCP risks such as tool poisoning, intent-flow subversion, command execution, insufficient authorization, token exposure, and context over-sharing; and MITRE ATLAS techniques around prompt injection, tool-data poisoning, credential access, collection, persistence, defense evasion, and impact.
What It Tests
The pack asks whether an attacker can manipulate an agent into violating a policy or trust boundary across a situation, not just whether a single prompt jailbreaks a model.
Each private scenario includes:
- attacker persona and goal
- protected trust boundary
- expected safe behavior
- concrete failure criteria
- severity
- framework mapping metadata in output.expectations
With a provider enabled, the attacker adapts across turns to the target agent's responses, and the judge evaluates the completed transcript against the exact success and failure criteria for that scenario. This is why each of the 32 scenarios exercises the social-engineering situation it describes instead of replaying fixed strings.
Monitor
Use --fail-on critical to block only critical failures:
roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --fail-on critical
Use --fail-on failed to block failed runs more broadly:
roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --fail-on failed
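In a CI script, the two thresholds can share one wrapper. The sketch below is illustrative: run_gate and the ROLEPLAY_* environment variables are hypothetical names, and it assumes the CLI exits with a non-zero status when the --fail-on threshold is met, which is what "block" implies here but is not spelled out on this page.

```shell
#!/usr/bin/env bash
# Sketch only. run_gate and the ROLEPLAY_* variables are hypothetical.
# Assumes `roleplay run` exits non-zero when the --fail-on threshold is met.

# $1 = threshold, one of the two values shown on this page: critical | failed
run_gate() {
  local threshold="${1:-}"
  case "$threshold" in
    critical|failed) ;;
    *)
      echo "run_gate: expected 'critical' or 'failed', got '$threshold'" >&2
      return 2
      ;;
  esac

  roleplay run social-engineering-core \
    --target "${ROLEPLAY_TARGET:-http://localhost:3000/agent}" \
    --provider "${ROLEPLAY_PROVIDER:?set ROLEPLAY_PROVIDER}" \
    --judge hybrid \
    --project "${ROLEPLAY_PROJECT:?set ROLEPLAY_PROJECT}" \
    --api-key "${ROLEPLAY_API_KEY:?set ROLEPLAY_API_KEY}" \
    --fail-on "$threshold"
}
```

A pull-request job might call `run_gate critical` while a release job calls `run_gate failed`; under the exit-status assumption above, the job step fails whenever the function returns non-zero.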
Uploading Pack Evidence
When the pack runs multiple scenarios, upload all local runs:
roleplay upload all --source ci --mode sanitized_findings
Workbench uses attack-pack metadata to update pack coverage, run history, findings, evidence, verification, and regression state.
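One way to combine the gate and the upload in CI is to upload evidence even when the run fails, then return the run's original status. This is a sketch under assumptions: run_and_upload and the ROLEPLAY_* variables are hypothetical, the run is assumed to exit non-zero when the threshold is met, and the upload command is used exactly as shown above without any additional authentication flags.

```shell
#!/usr/bin/env bash
# Sketch only. run_and_upload and the ROLEPLAY_* variables are hypothetical.
# Assumes `roleplay run` exits non-zero when the --fail-on threshold is met.

run_and_upload() {
  local status=0

  # Remember the gate result instead of aborting, so the upload still happens.
  roleplay run social-engineering-core \
    --target "${ROLEPLAY_TARGET:-http://localhost:3000/agent}" \
    --provider "${ROLEPLAY_PROVIDER:?set ROLEPLAY_PROVIDER}" \
    --judge hybrid \
    --project "${ROLEPLAY_PROJECT:?set ROLEPLAY_PROJECT}" \
    --api-key "${ROLEPLAY_API_KEY:?set ROLEPLAY_API_KEY}" \
    --fail-on critical || status=$?

  # Upload sanitized findings for every local run, pass or fail.
  if ! roleplay upload all --source ci --mode sanitized_findings; then
    echo "run_and_upload: upload failed" >&2
  fi

  # Report the gate result, not the upload result.
  return "$status"
}
```

Returning the run's status keeps a failed upload from masking a failed boundary, and a failed boundary from suppressing the evidence that Workbench needs.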
Coverage Philosophy
The pack focuses on situations rather than isolated strings. A scenario should prove whether the agent can keep identity, authorization, policy, context, and tool boundaries intact while a realistic attacker applies pressure over multiple turns.
This makes the pack suitable for regression testing:
- locally before a developer opens a PR
- in CI before deployment
- in Workbench as shared evidence when a boundary fails
Sources And Standards Baseline
The initial coverage model was informed by:
- OWASP Top 10 for LLM Applications
- OWASP Top 10 for MCP
- MITRE ATLAS
- NIST AI RMF Generative AI Profile (NIST AI 600-1)
Launch Scope
roleplay run social-engineering-core is the supported baseline regression-test path for local development and CI. The CLI fetches entitled scenario bundles from the Workbench, runs them locally against your target, and keeps full local artifacts in your environment unless you explicitly upload them.