
CLI Overview

The roleplay CLI is the local runner bundled with Roleplay Workbench. It runs tests in your own environment, saves local evidence that you can replay later, and uploads sanitized proof to Workbench when you provide a Workbench project key.

Basic Usage

roleplay setup
roleplay run social-engineering-core --target mock --provider mock --judge rules
roleplay report latest
roleplay replay latest
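The run, report, and replay commands above can be chained into a single offline smoke check that stops at the first failing command. This is a minimal bash sketch: the function name roleplay_smoke is hypothetical. It assumes roleplay setup has already been completed once, because setup is the guided path and roleplay init is the scriptable one.

```shell
# Hypothetical helper: offline smoke check with the mock target, mock provider,
# and deterministic rules judge. Assumes `roleplay setup` was already run once.
roleplay_smoke() {
  # Each step returns the failing command's own exit status.
  roleplay run social-engineering-core --target mock --provider mock --judge rules || return
  roleplay report latest || return
  roleplay replay latest
}
# Example: roleplay_smoke || exit 1
```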

Run roleplay upload only after Workbench has created a project and a project API key for you.

Commands

Command	Purpose
`roleplay setup`	Guided Workbench and local-runner setup.
`roleplay init`	Scriptable starter config for CI or manual setup.
`roleplay scenario:create`	Create a scenario from a built-in template.
`roleplay run`	Run a scenario or the built-in attack pack.
`roleplay report`	Print a saved report.
`roleplay replay`	Replay a saved transcript.
`roleplay upload`	Upload local runs to Workbench.
`roleplay list`	List local scenarios or runs.
`roleplay doctor`	Check local, Workbench, provider, and judge readiness.
`roleplay mcp`	Start a local MCP stdio server.

Real Runs

Runs against real HTTP or CLI targets require an explicit attacker (--provider) and an explicit judge (--judge):

roleplay run social-engineering-core \
  --target http://localhost:3000/agent \
  --provider <provider> \
  --judge hybrid \
  --project <project-id> \
  --api-key <project-api-key> \
  --fail-on critical
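One way to keep the project API key out of a committed CI script is to read it from the environment and pass it to the documented --project and --api-key flags. This is a minimal bash sketch: the function roleplay_real_run and the variables ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY are hypothetical names, and the CLI is not assumed to read those variables on its own.

```shell
# Hypothetical CI wrapper around the real-run command.
# ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY are placeholder names for this
# sketch; they are passed explicitly to the documented flags.
roleplay_real_run() {
  local target=$1 provider=$2
  # Stop before any attack traffic if project credentials are missing.
  # Status 2 is this wrapper's own choice, not a roleplay exit code.
  if [ -z "${ROLEPLAY_PROJECT_ID:-}" ] || [ -z "${ROLEPLAY_API_KEY:-}" ]; then
    echo "set ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY first" >&2
    return 2
  fi
  roleplay run social-engineering-core \
    --target "$target" \
    --provider "$provider" \
    --judge hybrid \
    --project "$ROLEPLAY_PROJECT_ID" \
    --api-key "$ROLEPLAY_API_KEY" \
    --fail-on critical
}
# Example: roleplay_real_run http://localhost:3000/agent anthropic
```

The key still appears in the process argument list while the run executes, so this only keeps it out of the script source, not out of process listings.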

The provider identifiers are openai, anthropic, google, and openai-compatible. They are listed for reference only: none of them is a default, so always pass one explicitly with --provider.
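Because no provider is chosen by default, a CI script can reject a missing or misspelled identifier before it starts a run. This is a minimal bash sketch: the function check_provider is hypothetical, and it rejects mock on purpose because it is meant to guard runs against real targets.

```shell
# Hypothetical pre-flight check for real runs: accept only the documented
# provider identifiers, with no fallback default. `mock` is rejected here
# on purpose because this check guards runs against real targets.
check_provider() {
  case "$1" in
    openai|anthropic|google|openai-compatible)
      return 0
      ;;
    *)
      printf "unknown provider '%s'; expected openai, anthropic, google, or openai-compatible\n" "$1" >&2
      return 1
      ;;
  esac
}
# Example: check_provider "$PROVIDER" || exit 1
```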

Judge Modes

rules: deterministic local judge for smoke tests and offline checks.
semantic: provider-backed judge that evaluates the transcript.
hybrid: semantic judge plus deterministic guardrails; recommended for CI and for thorough tests against real agents.

Rules-only judging against a real target requires the --allow-rules-only flag, so that a rules-only run is not mistaken for a full semantic evaluation.
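The judge rules above can be captured in a small helper that emits the right flags for a given target. This is a minimal bash sketch: the function judge_flags is hypothetical, and it assumes that any target other than mock counts as a real target.

```shell
# Hypothetical helper: print the judge flags for `roleplay run`, adding
# --allow-rules-only when the rules judge is used against a real target.
# Assumes every target except `mock` is a real target.
judge_flags() {
  local target=$1 judge=$2
  case "$judge" in
    rules)
      if [ "$target" = mock ]; then
        printf '%s\n' "--judge rules"
      else
        # Real target: make the rules-only choice explicit.
        printf '%s\n' "--judge rules --allow-rules-only"
      fi
      ;;
    semantic|hybrid)
      printf '%s\n' "--judge $judge"
      ;;
    *)
      printf "unknown judge mode: '%s'\n" "$judge" >&2
      return 1
      ;;
  esac
}
# Example:
#   flags=$(judge_flags "$TARGET" "$JUDGE") || exit 1
#   roleplay run social-engineering-core --target "$TARGET" --provider "$PROVIDER" $flags
```

$flags is left unquoted in the example so that it splits into separate arguments; this is safe only because the emitted flags contain no spaces inside a single value.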

JSON Output and Exit Codes

Use --json on supported commands for machine-readable output.

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key> --json
roleplay report latest --json
roleplay list runs --json
roleplay doctor --cloud --json

roleplay run exits with a non-zero status when the run crosses the threshold set by --fail-on. The accepted thresholds are warning, failed, and critical.
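In CI it is often useful to keep the JSON output even when a run crosses its --fail-on threshold, while still failing the job. This is a minimal bash sketch: the function ci_gate and its JSON directory argument are hypothetical, and it assumes that run and report print their --json output to standard output.

```shell
# Hypothetical CI step. The caller supplies every `roleplay run` flag,
# including --fail-on; this wrapper adds --json, saves the run and report
# JSON, and returns the run's own exit status.
ci_gate() {
  local json_dir=$1
  shift
  local status=0
  mkdir -p "$json_dir" || return
  roleplay run social-engineering-core "$@" --json > "$json_dir/run.json" || status=$?
  # Save the report even when the threshold was crossed; a report failure
  # does not replace the run's status.
  roleplay report latest --json > "$json_dir/report.json" || true
  if [ "$status" -ne 0 ]; then
    echo "roleplay run exited with status $status" >&2
  fi
  return "$status"
}
# Example:
#   ci_gate ./artifacts/roleplay-json --target http://localhost:3000/agent \
#     --provider <provider> --judge hybrid --project <project-id> \
#     --api-key <project-api-key> --fail-on failed
```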

Output Directory

By default, local artifacts are stored in .roleplay/runs. To store them elsewhere, pass --out:

roleplay run .roleplay/scenarios/install-smoke.yml --out ./artifacts/roleplay
roleplay report latest --out ./artifacts/roleplay
roleplay upload all --out ./artifacts/roleplay

Pass the same --out value to run, list, report, replay, and upload so that every command reads and writes the same artifact directory.
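To avoid repeating the directory, a script can route these commands through one wrapper. This is a minimal bash sketch: the function rp and the shell variable RP_OUT are hypothetical names that the CLI does not read, and the wrapper should be used only for the commands listed above that accept --out.

```shell
# Hypothetical wrapper: RP_OUT is a variable chosen for this sketch.
# Use rp only for run, list, report, replay, and upload.
RP_OUT=${RP_OUT:-./artifacts/roleplay}
rp() {
  # Forward the subcommand and its arguments, then pin the output directory.
  roleplay "$@" --out "$RP_OUT"
}
# Example:
#   rp run .roleplay/scenarios/install-smoke.yml
#   rp report latest
#   rp upload all
```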