Quickstart

This guide assumes you have created a workbench account, created a project, copied a project API key, and chosen the provider you want to use for real adaptive runs.

1. Configure The Local Runner

Use guided setup first:

roleplay setup

For scripted setup, use roleplay init instead. It creates the local config, a smoke-test scenario, and the .roleplay/runs directory:

roleplay init

2. Smoke Test The Install

Mock mode verifies that the CLI can run and save local evidence. It does not test a real agent.

roleplay run social-engineering-core --target mock --provider mock --judge rules

3. Configure Provider And Judge

Tests against a real agent require explicit attacker and judge choices. Set them, together with your project credentials, as environment variables:

export ROLEPLAY_PROJECT_ID=<project-id>
export ROLEPLAY_API_KEY=<project-api-key>
export ROLEPLAY_LLM_PROVIDER=<provider>
export ROLEPLAY_JUDGE_MODE=hybrid
export ROLEPLAY_JUDGE_PROVIDER=<provider>
export ROLEPLAY_<PROVIDER>_API_KEY=<provider-key>

If your provider uses the generic or OpenAI-compatible configuration path, set ROLEPLAY_LLM_API_KEY instead of ROLEPLAY_<PROVIDER>_API_KEY.
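Before a real run, it can help to fail fast when one of these variables is missing. The following is a minimal POSIX sh sketch, not a roleplay command: check_roleplay_env is a hypothetical helper name, and it assumes the <PROVIDER> part of ROLEPLAY_<PROVIDER>_API_KEY is the provider name upper-cased with dashes replaced by underscores, which this guide does not confirm.

```sh
#!/bin/sh
# Hypothetical preflight check; not part of the roleplay CLI.
# Assumption: <PROVIDER> in ROLEPLAY_<PROVIDER>_API_KEY is the provider
# name upper-cased with "-" replaced by "_" (not confirmed by this guide).
check_roleplay_env() {
  rc=0
  # Variables from the export list above that must be non-empty.
  for var in ROLEPLAY_PROJECT_ID ROLEPLAY_API_KEY ROLEPLAY_LLM_PROVIDER \
    ROLEPLAY_JUDGE_MODE ROLEPLAY_JUDGE_PROVIDER; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      rc=1
    fi
  done

  # A provider key is needed: the generic one or the provider-specific one.
  if [ -z "${ROLEPLAY_LLM_API_KEY:-}" ]; then
    name=$(printf '%s' "${ROLEPLAY_LLM_PROVIDER:-}" |
      tr 'abcdefghijklmnopqrstuvwxyz-' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_')
    case $name in
      '' | *[!A-Z0-9_]*) val='' ;; # unusable name: treat the key as missing
      *) eval "val=\${ROLEPLAY_${name}_API_KEY:-}" ;;
    esac
    if [ -z "$val" ]; then
      echo "missing: ROLEPLAY_${name:-<PROVIDER>}_API_KEY or ROLEPLAY_LLM_API_KEY" >&2
      rc=1
    fi
  fi
  return $rc
}
```

Calling check_roleplay_env || exit 1 near the top of a CI script stops the job before any roleplay run starts.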

Judge modes:

  • rules: deterministic local checks, for smoke tests and offline use.
  • semantic: provider-backed security evaluation.
  • hybrid: semantic evaluation plus deterministic guardrails. Recommended for CI and serious real-agent tests.
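The same distinction can be enforced in a wrapper script before a run starts. This POSIX sh sketch uses a hypothetical check_judge_mode helper. It treats ROLEPLAY_JUDGE_PROVIDER as required for semantic and hybrid because both rely on provider-backed evaluation, although the CLI may apply its own fallback. It also reads the CI environment variable, which many CI systems (GitHub Actions and GitLab CI among them) set, to warn when rules is used there.

```sh
#!/bin/sh
# Hypothetical helper; not a roleplay command.
# Validates a judge mode against the three modes listed above.
check_judge_mode() {
  mode=${1:-}
  case $mode in
    rules)
      echo "rules: deterministic local checks"
      # Many CI systems set CI; hybrid is the recommended mode there.
      if [ -n "${CI:-}" ]; then
        echo "warning: hybrid is recommended for CI runs" >&2
      fi
      ;;
    semantic | hybrid)
      # Assumption: both provider-backed modes need ROLEPLAY_JUDGE_PROVIDER.
      if [ -z "${ROLEPLAY_JUDGE_PROVIDER:-}" ]; then
        echo "$mode needs ROLEPLAY_JUDGE_PROVIDER" >&2
        return 1
      fi
      echo "$mode: provider-backed evaluation via $ROLEPLAY_JUDGE_PROVIDER"
      ;;
    *)
      echo "unknown judge mode '$mode' (expected rules, semantic, or hybrid)" >&2
      return 1
      ;;
  esac
}
```

For example, check_judge_mode "$ROLEPLAY_JUDGE_MODE" || exit 1 could run just before roleplay run.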

4. Run The Workbench Social-Engineering Pack

Real attack-pack scenarios are fetched from Roleplay for entitled Builder and Team projects, then executed locally against your target.

Against an HTTP agent:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Against a CLI agent:

roleplay run social-engineering-core \
  --target-command "node ./agent.js" \
  --yes \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical
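The two invocations differ only in how the target is passed: an HTTP agent goes to --target, while a CLI agent goes to --target-command together with --yes. A script that tests both kinds of agent could dispatch on the target string, as in this hypothetical POSIX sh wrapper. It assumes the provider, project ID, and project API key are exported as in step 3.

```sh
#!/bin/sh
# Hypothetical wrapper; not part of the roleplay CLI.
# Assumes ROLEPLAY_LLM_PROVIDER, ROLEPLAY_PROJECT_ID, and ROLEPLAY_API_KEY
# are set as in step 3.
run_social_engineering_pack() {
  target=$1
  # Rebuild the positional parameters as the target-specific flags.
  case $target in
    http://* | https://*) set -- --target "$target" ;;
    *) set -- --target-command "$target" --yes ;;
  esac
  roleplay run social-engineering-core "$@" \
    --provider "$ROLEPLAY_LLM_PROVIDER" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical
}
```

With this sketch, run_social_engineering_pack http://localhost:3000/agent and run_social_engineering_pack "node ./agent.js" reproduce the two commands above. Using set -- keeps the target-specific flags as separate arguments without relying on bash arrays.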

5. Review Local Evidence

roleplay report latest
roleplay replay latest

Reports include the judge mode, the provider and model (when available), and whether deterministic guardrails contributed findings.

6. Upload Sanitized Findings

Upload only after you have created a real workbench project and copied a project API key.

ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload latest --mode sanitized_findings

Upload all local runs from an attack-pack execution:

ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
ROLEPLAY_PROJECT_ID=<project-id> \
ROLEPLAY_API_KEY=<project-api-key> \
roleplay upload all --source ci --mode sanitized_findings

sanitized_findings is the default workbench privacy posture. Full transcripts are not uploaded unless both the Monitor policy and the CLI upload mode explicitly opt in.
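In CI you may want the job to fail on critical findings while still uploading sanitized findings from the failing run. The sketch below, a hypothetical ci_roleplay_gate function rather than an official recipe, combines the run and upload commands from this guide. It assumes that --fail-on critical makes roleplay run exit non-zero when a critical finding is detected, and that ROLEPLAY_PROJECT_ID, ROLEPLAY_API_KEY, and ROLEPLAY_LLM_PROVIDER are exported; neither point is spelled out above.

```sh
#!/bin/sh
# Hypothetical CI step; not an official roleplay recipe.
# Assumes `--fail-on critical` makes `roleplay run` exit non-zero when a
# critical finding is detected (exit codes are not documented above).
ci_roleplay_gate() {
  status=0
  # Record the run's exit status instead of stopping here.
  roleplay run social-engineering-core \
    --target "$1" \
    --provider "$ROLEPLAY_LLM_PROVIDER" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical || status=$?

  # Upload sanitized findings even if the gate failed. The project ID and
  # API key are read from the exported environment variables.
  ROLEPLAY_CLOUD_URL=https://app.roleplay.sh \
    roleplay upload all --source ci --mode sanitized_findings || return $?

  # Propagate the run's result so the CI job still fails on critical findings.
  return $status
}
```

Capturing the run's exit status instead of exiting immediately is what lets the upload happen before the job reports failure.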