Public documentation

Quickstart

This guide assumes you have created a workbench account, created a project, copied a project API key, and chosen the provider you want to use for real adaptive runs.

1. Configure The Local Runner

Use guided setup first:

roleplay setup

For scripted setup, roleplay init creates local config, a smoke-test scenario, and the .roleplay/runs directory:

roleplay init

2. Smoke Test The Install

Mock mode verifies that the CLI can run and save local evidence. It does not test a real agent.

roleplay run social-engineering-core --target mock --provider mock --judge rules
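One quick way to confirm that the smoke test saved something is to count the entries in the runs directory before and after the mock run. The sketch below assumes run evidence lands under .roleplay/runs, which this guide implies but does not state outright. count_run_entries is a hypothetical helper, not part of the roleplay CLI.

```shell
# Hypothetical helper: count entries in a runs directory.
# Assumption: evidence is written under .roleplay/runs (the directory created by "roleplay init").
count_run_entries() {
  dir="${1:-.roleplay/runs}"
  # A missing directory counts as zero entries.
  if [ ! -d "$dir" ]; then
    echo 0
    return 0
  fi
  # ls -A lists every entry except "." and ".."; tr strips the padding some wc builds add.
  ls -A "$dir" | wc -l | tr -d ' '
}
```

If the number printed by count_run_entries does not change after the mock run, check that the command was run from the directory where roleplay init was executed.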

3. Configure Provider And Judge

Tests against a real agent require explicit attacker and judge choices. Set them as environment variables:

export ROLEPLAY_PROJECT_ID=<project-id>
export ROLEPLAY_API_KEY=<project-api-key>
export ROLEPLAY_LLM_PROVIDER=<provider>
export ROLEPLAY_JUDGE_MODE=hybrid
export ROLEPLAY_JUDGE_PROVIDER=<provider>
export ROLEPLAY_<PROVIDER>_API_KEY=<provider-key>

If your provider uses the generic or OpenAI-compatible configuration path, set ROLEPLAY_LLM_API_KEY instead of ROLEPLAY_<PROVIDER>_API_KEY.
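A missing variable is easy to overlook, so a small preflight check can help before a real run. The sketch below only tests whether the variables named in this step are non-empty. check_roleplay_env is a hypothetical helper, not a roleplay command, and it deliberately skips the provider key because that variable's name depends on the provider you chose.

```shell
# Hypothetical preflight helper: report which variables from this step are unset or empty.
check_roleplay_env() {
  missing=0
  for name in ROLEPLAY_PROJECT_ID ROLEPLAY_API_KEY ROLEPLAY_LLM_PROVIDER \
              ROLEPLAY_JUDGE_MODE ROLEPLAY_JUDGE_PROVIDER; do
    # eval performs the indirect lookup in plain POSIX sh; :- keeps it safe under "set -u".
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when everything is set, 1 otherwise.
  return "$missing"
}
```

Calling check_roleplay_env before roleplay run prints one line per missing variable to stderr and returns a non-zero status if any are absent.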

Judge modes:

rules: deterministic local checks for smoke/offline use.
semantic: provider-backed security evaluation.
hybrid: semantic evaluation plus deterministic guardrails, recommended for CI and serious real-agent tests.
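In a wrapper script, it can be useful to reject a mistyped judge mode before any provider calls are made. The sketch below checks a value against the three mode names listed above. is_valid_judge_mode is a hypothetical helper, and the CLI may perform its own validation.

```shell
# Hypothetical helper: succeed only for the judge modes named in this guide.
is_valid_judge_mode() {
  case "$1" in
    rules|semantic|hybrid) return 0 ;;
    *) return 1 ;;
  esac
}

# Example guard, falling back to "hybrid" when ROLEPLAY_JUDGE_MODE is unset.
mode="${ROLEPLAY_JUDGE_MODE:-hybrid}"
if ! is_valid_judge_mode "$mode"; then
  echo "unknown judge mode: $mode" >&2
fi
```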

4. Run The Workbench Social-Engineering Pack

Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects, then executed locally against your target.

Against an HTTP agent:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Against a CLI agent:

roleplay run social-engineering-core \
  --target-command "node ./agent.js" \
  --yes \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical
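In CI, the --fail-on critical flag is typically paired with a step that stops the pipeline when the run fails. The sketch below assumes that roleplay run exits with a non-zero status when findings reach the --fail-on threshold. This guide does not state that behavior, so verify it before relying on it. run_gate is a hypothetical wrapper that works with any command.

```shell
# Hypothetical CI gate: run a command, report the outcome, and pass its exit status through.
# Assumption: "roleplay run ... --fail-on critical" exits non-zero on critical findings.
run_gate() {
  if "$@"; then
    echo "gate: pass"
  else
    status=$?   # exit status of the command that just failed
    echo "gate: fail (exit $status)" >&2
    return "$status"
  fi
}

# Usage, with the flags shown above and the variables from step 3 exported:
# run_gate roleplay run social-engineering-core \
#   --target http://localhost:3000/agent \
#   --provider "$ROLEPLAY_LLM_PROVIDER" \
#   --judge hybrid \
#   --project "$ROLEPLAY_PROJECT_ID" \
#   --api-key "$ROLEPLAY_API_KEY" \
#   --fail-on critical
```

Passing the original status through, rather than collapsing it to 1, lets the CI log show which exit code the runner produced.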

5. Review Local Evidence

roleplay report latest
roleplay replay latest

Reports include the judge mode, provider/model when available, and whether deterministic guardrails contributed findings.

6. Upload Sanitized Findings

Upload only after you have created a real workbench project and copied a project API key.

ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload latest --mode sanitized_findings

Upload all local runs from an attack-pack execution:

ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload all --source ci --mode sanitized_findings

sanitized_findings is the default workbench privacy posture. Full transcript upload remains off unless Monitor policy and CLI mode both explicitly opt in.
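The opt-in rule above amounts to a logical AND: a transcript is uploaded only when both sides agree. The sketch below is a toy model of that rule to make the default behavior easy to reason about. It is not the workbench's implementation, and the function name and its "yes"/"no" arguments are hypothetical.

```shell
# Toy model of the rule: full transcripts upload only if BOTH the Monitor policy
# and the CLI mode opt in. Names and argument values here are hypothetical.
transcript_upload_enabled() {
  policy_opt_in="$1"
  cli_opt_in="$2"
  [ "$policy_opt_in" = "yes" ] && [ "$cli_opt_in" = "yes" ]
}

# Either side saying "no" keeps transcripts local.
if transcript_upload_enabled "yes" "no"; then
  echo "transcripts would be uploaded"
else
  echo "transcripts stay local"
fi
```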
