CLI Overview

The roleplay CLI is the local runner included with the Roleplay Workbench. It runs tests in your environment, saves replayable local evidence, and uploads sanitized proof to the Workbench when you provide a project API key.

Basic Usage

roleplay setup
roleplay run social-engineering-core --target mock --provider mock --judge rules
roleplay report latest
roleplay replay latest

Run roleplay upload only after the Workbench has created a project and a project API key for you.
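The mock run above uses only the mock target, the mock provider, and the local rules judge, so it should work as an offline smoke check before any provider credentials exist. A minimal sketch of wrapping it in a shell function follows. The name roleplay_smoke is hypothetical, and roleplay setup is left out because it is the guided setup (roleplay init is the scriptable alternative).

```shell
# Hypothetical helper (not part of the roleplay CLI): offline smoke check.
# Runs the social-engineering-core example against the mock target with the
# local rules judge, then prints the saved report. Stops at the first failing
# command and returns that command's exit status.
roleplay_smoke() {
  roleplay run social-engineering-core --target mock --provider mock --judge rules || return
  roleplay report latest
}
```

In a CI script, call it as roleplay_smoke || exit 1.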

Commands

Command                   Purpose
roleplay setup            Guided Workbench and local-runner setup.
roleplay init             Scriptable starter config for CI or manual setup.
roleplay scenario:create  Create a scenario from a built-in template.
roleplay run              Run a scenario or the built-in attack pack.
roleplay report           Print a saved report.
roleplay replay           Replay a saved transcript.
roleplay upload           Upload local runs to the Workbench.
roleplay list             List local scenarios or runs.
roleplay doctor           Check local, Workbench, provider, and judge readiness.
roleplay mcp              Start a local MCP stdio server.

Real Runs

Real HTTP or CLI targets require explicit attacker and judge choices:

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical

Supported provider identifiers are openai, anthropic, google, and openai-compatible. These are reference options, not defaults: the CLI does not choose a provider for you, so pass one with --provider.
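In CI it is usually safer to keep the project ID and API key out of the script itself and out of shell history. The sketch below reads them from two shell variables, ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY. These names are hypothetical choices for this example: the CLI is not assumed to read them, and the values are passed explicitly through --project and --api-key. The function also stops early if either value is missing.

```shell
# Hypothetical wrapper (not part of the roleplay CLI) for a real-target run.
# ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY are variable names chosen for this
# sketch; they are handed to the CLI explicitly via --project and --api-key.
# Usage: roleplay_real_run <target> <provider>
roleplay_real_run() {
  local target=$1 provider=$2
  # Refuse to start without credentials instead of failing mid-run.
  if [ -z "${ROLEPLAY_PROJECT_ID:-}" ] || [ -z "${ROLEPLAY_API_KEY:-}" ]; then
    echo "roleplay_real_run: set ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY first" >&2
    return 2
  fi
  roleplay run social-engineering-core \
    --target "$target" \
    --provider "$provider" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical
}
```

For example: roleplay_real_run http://localhost:3000/agent anthropic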

Judge Modes

  • rules: deterministic local judge for smoke/offline checks.
  • semantic: provider-backed judge for transcript evaluation.
  • hybrid: semantic judge plus deterministic guardrails; recommended for CI and for tests against real agents.

Using the rules judge against a real target requires --allow-rules-only, so that a rules-only run is not mistaken for a full semantic evaluation.

JSON Output And Exit Codes

Use --json on supported commands for machine-readable output.

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --json
roleplay report latest --json
roleplay list runs --json
roleplay doctor --cloud --json

roleplay run exits with a non-zero status when the run crosses the configured --fail-on threshold. The threshold can be warning, failed, or critical.
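Because roleplay run reports the threshold result through its exit status, a CI step can pass that status through while keeping the JSON output as a build artifact. A minimal sketch, assuming --json writes its result to standard output; the function name roleplay_ci_gate and its file argument are hypothetical. A non-zero status may also come from errors other than the threshold, so the message reports only the status.

```shell
# Hypothetical CI helper (not part of the roleplay CLI).
# Assumes `roleplay run ... --json` prints its machine-readable result on stdout.
# Usage: roleplay_ci_gate <json-file> [extra roleplay run flags...]
roleplay_ci_gate() {
  local json_out=$1 status=0
  shift
  # Save the JSON result and capture the exit status instead of aborting.
  roleplay run social-engineering-core "$@" --fail-on critical --json > "$json_out" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "roleplay run exited with status $status; details in $json_out" >&2
  fi
  # Propagate roleplay's status so the CI job passes or fails with it.
  return "$status"
}
```

For example, pass the real-run flags after the file name: roleplay_ci_gate roleplay-run.json --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key>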

Output Directory

Local artifacts are stored in .roleplay/runs by default. Pass --out to use a different directory:

roleplay run .roleplay/scenarios/install-smoke.yml --out ./artifacts/roleplay
roleplay report latest --out ./artifacts/roleplay
roleplay upload all --out ./artifacts/roleplay

Use the same --out value across run, list, report, replay, and upload, so that every command reads and writes the same artifact directory.
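One way to keep the directory consistent is to set it once and route those five commands through a small wrapper. This is a sketch: the ROLEPLAY_OUT variable and the rp function are hypothetical, and other commands such as doctor are passed through unchanged because this page does not say they accept --out.

```shell
# Hypothetical wrapper (not part of the roleplay CLI).
# Set the artifact directory once; override it by exporting ROLEPLAY_OUT.
ROLEPLAY_OUT=${ROLEPLAY_OUT:-./artifacts/roleplay}

# run, list, report, replay, and upload share artifacts, so they all get the
# same --out value; any other command (setup, doctor, ...) is passed through.
rp() {
  case ${1-} in
    run|list|report|replay|upload) roleplay "$@" --out "$ROLEPLAY_OUT" ;;
    *) roleplay "$@" ;;
  esac
}
```

With the wrapper loaded, rp run .roleplay/scenarios/install-smoke.yml, rp report latest, and rp upload all all use ./artifacts/roleplay.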