Reports and Replay
Roleplay saves every run as local artifacts and can render the result in terminal, JSON, Markdown, or replay form.
View A Report
roleplay report latest
roleplay report <runId>
JSON:
roleplay report latest --json
Markdown:
roleplay report latest --markdown
Report Status
Reports have one of three statuses:
- passed: no failed criteria or findings
- warning: concerning behavior was detected
- failed: the scenario crossed a failure criterion
Each report also includes a score from 0 to 100.
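If you consume `--json` reports in a script, a check like the following can catch unexpected values before you act on them. This is a minimal sketch that assumes the JSON output has top-level `status` and `score` fields. Those field names and the `summarize_report` helper are hypothetical, so confirm them against real `roleplay report latest --json` output before relying on this.

```python
import json

# The three report statuses named in this documentation.
VALID_STATUSES = {"passed", "warning", "failed"}


def summarize_report(raw_json: str) -> str:
    """Return a one-line summary of a report rendered with --json.

    Assumption: the report has top-level "status" and "score" keys.
    These field names are hypothetical; check real output first.
    """
    report = json.loads(raw_json)
    status = report["status"]
    score = report["score"]
    # Reject statuses outside the documented set.
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    # Scores are documented as ranging from 0 to 100.
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return f"{status} (score {score}/100)"
```

For example, you could pipe `roleplay report latest --json` into a small script that passes standard input to `summarize_report`.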
Failure Severity
Each failure has one of four severities:
- low
- medium
- high
- critical
Use --fail-on to decide when the CLI exits non-zero:
roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --fail-on critical
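Because the four severities are ordered, one plausible reading of `--fail-on` is a minimum-severity threshold. The sketch below assumes the CLI exits non-zero when any failure is at or above the chosen severity. The `exit_code` helper and the exit value of 1 are illustrative assumptions, not Roleplay's actual implementation.

```python
# Severity levels from this documentation, least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def exit_code(failure_severities: list[str], fail_on: str) -> int:
    """Return 1 if any failure is at or above the fail_on level, else 0.

    Assumption: --fail-on acts as a minimum-severity threshold, and
    the non-zero exit value is 1. Both are hypothetical.
    """
    threshold = SEVERITY_ORDER.index(fail_on)
    for severity in failure_severities:
        if SEVERITY_ORDER.index(severity) >= threshold:
            return 1
    return 0
```

Under this reading, `--fail-on critical` lets runs with only high-severity failures exit cleanly, while `--fail-on medium` would fail them.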
Replay A Transcript
roleplay replay latest
roleplay replay <runId>
Disable the replay delay:
roleplay replay latest --no-delay
Print the transcript as JSON:
roleplay replay latest --json
Workbench Evidence
When sanitized findings are uploaded to the workbench, each finding can be opened in Evidence.
Evidence shows:
- the manipulation attempt
- the agent response
- failed-turn evidence
- the failed invariant
- tool-call evidence when available
- remediation guidance
- sanitized evidence status
Full transcripts are not uploaded by default. The workbench works from sanitized finding evidence unless full transcript upload is explicitly enabled.
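The default upload behavior amounts to an opt-in rule for transcripts. The following sketch illustrates that rule under assumed names: the `Finding` fields, the `build_upload` helper, and the payload keys are all hypothetical, and the real workbench upload format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    # Hypothetical shape of a finding; field names are illustrative only.
    invariant: str
    severity: str
    sanitized_evidence: dict
    transcript: list = field(default_factory=list)


def build_upload(finding: Finding, include_transcript: bool = False) -> dict:
    """Build an illustrative workbench payload for one finding.

    Sanitized evidence is always included; the full transcript is
    attached only when the caller explicitly opts in.
    """
    payload = {
        "invariant": finding.invariant,
        "severity": finding.severity,
        "evidence": finding.sanitized_evidence,
    }
    if include_transcript:
        payload["transcript"] = finding.transcript
    return payload
```

Keeping the transcript behind an explicit flag means a forgotten setting results in less data leaving the machine, not more.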