Quickstart
This guide assumes you have created a workbench account, created a project, copied a project API key, and chosen the provider you want to use for real adaptive runs.
1. Configure The Local Runner
Use guided setup first:
roleplay setup
For scripted setup, roleplay init creates local config, a smoke-test scenario, and the .roleplay/runs directory:
roleplay init
2. Smoke Test The Install
Mock mode verifies that the CLI can run and save local evidence. It does not test a real agent.
roleplay run social-engineering-core --target mock --provider mock --judge rules
3. Configure Provider And Judge
Real-agent tests require you to choose the attacker provider and the judge explicitly.
export ROLEPLAY_PROJECT_ID=<project-id>
export ROLEPLAY_API_KEY=<project-api-key>
export ROLEPLAY_LLM_PROVIDER=<provider>
export ROLEPLAY_JUDGE_MODE=hybrid
export ROLEPLAY_JUDGE_PROVIDER=<provider>
export ROLEPLAY_<PROVIDER>_API_KEY=<provider-key>
Use ROLEPLAY_LLM_API_KEY instead when your provider uses the generic or OpenAI-compatible configuration path.
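A missing variable may only show up as a failed run, so it can help to check the environment before running. The sketch below is a plain POSIX shell function. The name check_roleplay_env is hypothetical and not part of the roleplay CLI. It checks only the variables listed above and skips the provider key, because that variable's name depends on the provider you chose.

```shell
# Hypothetical helper, not part of the roleplay CLI.
# Reports every required variable that is unset or empty,
# and returns non-zero if at least one is missing.
check_roleplay_env() {
  missing=0
  for name in ROLEPLAY_PROJECT_ID ROLEPLAY_API_KEY ROLEPLAY_LLM_PROVIDER \
              ROLEPLAY_JUDGE_MODE ROLEPLAY_JUDGE_PROVIDER; do
    # Indirect lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_roleplay_env before a run prints one line per missing variable to stderr, so all gaps are visible at once.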
Judge modes:
rules: deterministic local checks for smoke/offline use.
semantic: provider-backed security evaluation.
hybrid: semantic evaluation plus deterministic guardrails, recommended for CI and serious real-agent tests.
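The practical difference between the modes is whether a judge provider must be configured. The sketch below encodes that reading of the descriptions above: rules is local only, while semantic and hybrid both include provider-backed evaluation. The function name judge_needs_provider is hypothetical and the CLI does not ship it.

```shell
# Hypothetical helper, not part of the roleplay CLI.
# Prints "yes" when the judge mode relies on a provider, "no" when it is
# purely local, and fails on any mode this guide does not list.
judge_needs_provider() {
  case "$1" in
    rules) echo no ;;
    semantic|hybrid) echo yes ;;
    *)
      echo "unknown judge mode: $1" >&2
      return 2
      ;;
  esac
}
```

For example, a setup script could call judge_needs_provider "$ROLEPLAY_JUDGE_MODE" and require ROLEPLAY_JUDGE_PROVIDER only when the answer is yes.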
4. Run The Workbench Social-Engineering Pack
Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects, then executed locally against your target.
Against an HTTP agent:
roleplay run social-engineering-core \
--target http://localhost:3000/agent \
--provider <provider> \
--judge hybrid \
--project <project-id> \
--api-key <project-api-key> \
--fail-on critical
Against a CLI agent:
roleplay run social-engineering-core \
--target-command "node ./agent.js" \
--yes \
--provider <provider> \
--judge hybrid \
--project <project-id> \
--api-key <project-api-key> \
--fail-on critical
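The two invocations above differ only in their target flags. A wrapper script that supports both kinds of agent could select those flags from a single value, as in the sketch below. The function name target_flags is hypothetical, and treating every non-URL value as a command is an assumption made for illustration. The --yes flag is included only because the CLI-agent example above passes it.

```shell
# Hypothetical helper, not part of the roleplay CLI.
# Prints the target flags for `roleplay run`, one argument per line.
# An http(s) URL maps to --target; anything else is treated as a command.
target_flags() {
  case "$1" in
    http://*|https://*) printf '%s\n' --target "$1" ;;
    *) printf '%s\n' --target-command "$1" --yes ;;
  esac
}
```

Printing one argument per line keeps a command such as "node ./agent.js" intact as a single argument when the output is read back line by line.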
5. Review Local Evidence
roleplay report latest
roleplay replay latest
Reports include the judge mode, provider/model when available, and whether deterministic guardrails contributed findings.
6. Upload Sanitized Findings
Upload only after you have created a real workbench project and copied a project API key.
ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload latest --mode sanitized_findings
Upload all local runs from an attack-pack execution:
ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload all --source ci --mode sanitized_findings
sanitized_findings is the default workbench privacy posture. Full transcript upload remains off unless Monitor policy and CLI mode both explicitly opt in.
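In CI, the run and upload steps are often chained so that findings are uploaded even when the run fails the build. The sketch below shows one way to do that using only the flags shown in this guide. It rests on several assumptions: that roleplay run exits non-zero when --fail-on critical is triggered, that ROLEPLAY_CLOUD_URL, ROLEPLAY_PROJECT_ID, and ROLEPLAY_API_KEY are already exported, and that ROLEPLAY_LLM_PROVIDER holds the same value you would pass to --provider. The function name run_and_upload is hypothetical.

```shell
# Hypothetical CI helper, not part of the roleplay CLI.
# Runs the pack against the HTTP target given as $1, uploads sanitized
# findings, and finishes with the run's exit status.
# Assumption: `roleplay run --fail-on critical` exits non-zero on
# critical findings.
run_and_upload() {
  status=0
  roleplay run social-engineering-core \
    --target "$1" \
    --provider "$ROLEPLAY_LLM_PROVIDER" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical || status=$?
  # Upload even when the run failed, so the findings are not lost.
  roleplay upload all --source ci --mode sanitized_findings || return $?
  return "$status"
}
```

A CI job would call run_and_upload http://localhost:3000/agent as its last step, so that the job's result reflects the run rather than the upload.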