Command Reference

roleplay setup

Guided Workbench and local-runner setup.

roleplay setup
roleplay setup --project <project-id> --provider <provider> --judge hybrid --target http://localhost:3000/agent

The setup command writes safe placeholders to .env.example and .roleplay/config.json. It does not store raw provider keys by default.

Flags:

| Flag | Values | Default | Description |
| --- | --- | --- | --- |
| --cloud-url | URL | ROLEPLAY_CLOUD_URL or https://app.roleplay.sh | Workbench URL. |
| --project | string | ROLEPLAY_PROJECT_ID | Workbench project ID. |
| --provider | provider | ROLEPLAY_LLM_PROVIDER | Attacker provider. |
| --judge | rules, semantic, hybrid | ROLEPLAY_JUDGE_MODE or hybrid | Judge mode. |
| --judge-provider | provider | ROLEPLAY_JUDGE_PROVIDER or --provider | Judge provider for semantic or hybrid mode. |
| --target | URL | ROLEPLAY_TARGET_URL | HTTP target URL. |
| --target-command | command | ROLEPLAY_TARGET_COMMAND | CLI target command. |
| --yes, -y | boolean | false | Accept defaults without prompting. |
| --json | boolean | false | Print JSON output only. |
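
The Default column reads as a fallback chain: the environment variable is used when the flag is omitted, then a built-in value where one is listed. The sketch below mirrors that chain for three settings with shell parameter expansion. The function names are hypothetical, and how the CLI resolves values internally is an assumption drawn only from the table.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the documented defaults.
# They do not call the CLI; they only show the fallback order.

# --cloud-url: ROLEPLAY_CLOUD_URL, else https://app.roleplay.sh
resolve_cloud_url() {
  printf '%s\n' "${ROLEPLAY_CLOUD_URL:-https://app.roleplay.sh}"
}

# --judge: ROLEPLAY_JUDGE_MODE, else hybrid
resolve_judge_mode() {
  printf '%s\n' "${ROLEPLAY_JUDGE_MODE:-hybrid}"
}

# --judge-provider: ROLEPLAY_JUDGE_PROVIDER, else the --provider value,
# which itself defaults to ROLEPLAY_LLM_PROVIDER (empty if neither is set)
resolve_judge_provider() {
  printf '%s\n' "${ROLEPLAY_JUDGE_PROVIDER:-${ROLEPLAY_LLM_PROVIDER:-}}"
}
```

Printing the resolved values before running setup in a script can make it easier to see which environment variables are in effect.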

roleplay init

Initialize Roleplay in the current repository for scripted setup.

roleplay init
roleplay init --json

Creates .roleplay/config.json, a local install-smoke scenario, and .roleplay/runs.

roleplay run

Run a local scenario, or fetch and run a Workbench attack pack that the project is entitled to.

roleplay run .roleplay/scenarios/install-smoke.yml
roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key>
roleplay run social-engineering-core --target-command "node ./agent.js" --yes --provider <provider> --judge hybrid --project <project-id> --api-key <project-api-key>
roleplay run social-engineering-core --target mock --provider mock --judge rules

Flags:

| Flag | Values | Default | Description |
| --- | --- | --- | --- |
| --target | URL or mock | ROLEPLAY_TARGET_URL | HTTP target URL for social-engineering-core, or mock for local smoke tests. |
| --target-command | command | ROLEPLAY_TARGET_COMMAND | CLI target command for social-engineering-core. |
| --endpoint | URL | ROLEPLAY_CLOUD_URL or https://app.roleplay.sh | Workbench URL for entitlement checks. |
| --project | string | ROLEPLAY_PROJECT_ID | Workbench project ID. Required for real agent runs. |
| --api-key | string | ROLEPLAY_API_KEY | Workbench API key. Required for real agent runs. |
| --provider | provider | ROLEPLAY_LLM_PROVIDER | Shared attacker and judge provider. Required for real targets. |
| --attacker-provider | provider | ROLEPLAY_ATTACKER_PROVIDER or --provider | Provider for adaptive attacker turns. |
| --judge | rules, semantic, hybrid | ROLEPLAY_JUDGE_MODE | Judge mode. Required for real targets. |
| --judge-provider | provider | ROLEPLAY_JUDGE_PROVIDER or --provider | Provider for semantic or hybrid judging. |
| --allow-rules-only | boolean | false | Permit rules-only judging for real targets. |
| --model | model name | ROLEPLAY_LLM_MODEL or provider default | Shared model. |
| --attacker-model | model name | ROLEPLAY_ATTACKER_MODEL or --model | Model for attacker turns. |
| --judge-model | model name | ROLEPLAY_JUDGE_MODEL, scenario judge.model, or --model | Model for judging. |
| --llm-base-url | URL | ROLEPLAY_LLM_BASE_URL | Base URL for OpenAI-compatible providers. |
| --max-turns | integer | scenario value | Override scenario max turns. |
| --fail-on | warning, failed, critical | failed | Exit non-zero at this threshold. |
| --json | boolean | false | Print JSON only. |
| --out | path | .roleplay/runs | Run artifacts directory. |
| --yes, -y | boolean | false | Allow local CLI target command execution. |
For social-engineering-core, provide one target source: --target, --target-command, ROLEPLAY_TARGET_URL, or ROLEPLAY_TARGET_COMMAND. Real attack-pack scenario bundles are fetched from the Workbench for entitled projects and are not bundled in the public CLI package.

Real runs, meaning any non-mock target or non-mock provider, require a valid Builder or Team project API key. Create one from onboarding or Monitor, then pass it with --project and --api-key or the matching environment variables.
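
The rule above can be checked before invoking the CLI. A minimal sketch: is_real_run and check_credentials are hypothetical helpers that only inspect the environment variables named in the flag table. They do not validate the key against the Workbench.

```shell
#!/bin/sh
# A run is "real" when the target or the provider is anything other than mock.
is_real_run() {
  target=$1
  provider=$2
  [ "$target" != "mock" ] || [ "$provider" != "mock" ]
}

# Fail early when a real run has no project ID or API key configured.
check_credentials() {
  if is_real_run "$1" "$2"; then
    if [ -z "${ROLEPLAY_PROJECT_ID:-}" ] || [ -z "${ROLEPLAY_API_KEY:-}" ]; then
      echo "real run: set ROLEPLAY_PROJECT_ID and ROLEPLAY_API_KEY, or pass --project and --api-key" >&2
      return 1
    fi
  fi
  return 0
}
```

Calling `check_credentials mock mock` succeeds without any credentials, while `check_credentials http://localhost:3000/agent openai` fails until both variables are set.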

Supported provider identifiers are openai, anthropic, google, and openai-compatible. None of them is a default; choose one explicitly.

Judge guidance:

  • rules: deterministic local judge for smoke/offline checks.
  • semantic: provider-backed transcript evaluation.
  • hybrid: semantic judge plus deterministic guardrails, recommended for CI and real-agent tests.
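
As a rule of thumb, the guidance above could be encoded in a wrapper script like this. recommended_judge is a hypothetical helper, and mapping every non-mock run to hybrid is a simplification of the guidance, not CLI behavior.

```shell
#!/bin/sh
# Hypothetical helper: suggest a --judge value from the target and provider.
# Mock smoke checks get the deterministic local judge; everything else gets
# hybrid, which the guidance recommends for CI and real-agent tests.
recommended_judge() {
  if [ "$1" = "mock" ] && [ "$2" = "mock" ]; then
    echo rules
  else
    echo hybrid
  fi
}
```

The result can be passed straight to the flag, for example `--judge "$(recommended_judge mock mock)"`.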

roleplay upload

Upload one run or all local runs to the Workbench.

roleplay upload latest --project <project-id> --api-key <project-api-key>
roleplay upload all --source ci --mode sanitized_findings

Uploads require a real Workbench project ID and a project API key from onboarding or Monitor.
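
In CI it can help to skip the upload when credentials are absent, for example on builds that have no access to secrets. A minimal sketch, assuming roleplay is on PATH; upload_latest is a hypothetical wrapper, and skipping with a zero exit status is a design choice of this sketch.

```shell
#!/bin/sh
# Hypothetical CI step: upload the latest run only when both the project ID
# and the API key are available in the environment.
upload_latest() {
  if [ -z "${ROLEPLAY_PROJECT_ID:-}" ] || [ -z "${ROLEPLAY_API_KEY:-}" ]; then
    echo "skipping upload: ROLEPLAY_PROJECT_ID or ROLEPLAY_API_KEY is not set" >&2
    return 0
  fi
  roleplay upload latest --project "$ROLEPLAY_PROJECT_ID" --api-key "$ROLEPLAY_API_KEY"
}
```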

roleplay report

Show a saved report.

roleplay report latest
roleplay report <runId> --json
roleplay report <runId> --markdown

Reports include judge metadata when available: mode, provider, model, and whether deterministic guardrails contributed findings.

roleplay replay

Replay a saved transcript.

roleplay replay latest
roleplay replay <runId> --no-delay

roleplay doctor

Check install, Workbench, provider, judge, and upload readiness.

roleplay doctor
roleplay doctor --cloud
roleplay doctor --cloud --project <project-id> --api-key <project-api-key>

Useful flags:

| Flag | Default | Description |
| --- | --- | --- |
| --json | false | Print JSON output. |
| --cloud | false | Check Workbench /api/health. |
| --cloud-url | ROLEPLAY_CLOUD_URL or https://app.roleplay.sh | Workbench base URL. |
| --project | ROLEPLAY_PROJECT_ID | Project ID for API key verification. |
| --api-key | ROLEPLAY_API_KEY | API key for verification. |
| --provider | ROLEPLAY_LLM_PROVIDER | Attacker provider key check. |
| --judge | ROLEPLAY_JUDGE_MODE | Judge mode readiness check. |
| --judge-provider | ROLEPLAY_JUDGE_PROVIDER | Judge provider key check. |
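
A readiness check can act as a preflight step before a run. The sketch below picks the fuller cloud check when credentials are present. preflight is a hypothetical wrapper, and it assumes that roleplay doctor exits non-zero when a check fails, which this reference does not state.

```shell
#!/bin/sh
# Hypothetical preflight: run the cloud and API key checks when credentials
# are available, otherwise fall back to the plain local check.
preflight() {
  if [ -n "${ROLEPLAY_PROJECT_ID:-}" ] && [ -n "${ROLEPLAY_API_KEY:-}" ]; then
    roleplay doctor --cloud --project "$ROLEPLAY_PROJECT_ID" --api-key "$ROLEPLAY_API_KEY"
  else
    roleplay doctor
  fi
}
```

Under that assumption, `preflight && roleplay run .roleplay/scenarios/install-smoke.yml` would start the run only after the checks pass.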

Other Commands

roleplay scenario:create my-scenario
roleplay list scenarios
roleplay list runs
roleplay mcp