Reports and Replay

Roleplay saves every run as local artifacts and can render the results as terminal output, JSON, Markdown, or a transcript replay.

View A Report

roleplay report latest
roleplay report <runId>

JSON:

roleplay report latest --json

Markdown:

roleplay report latest --markdown
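To use a report from a script, you can capture the --json output. The Python below is a minimal sketch and is not part of Roleplay. It assumes the command prints a single JSON object to stdout. The "status" and "score" field names and the helper names parse_report and fetch_report are hypothetical, so check the real output before relying on them.

```python
import json
import subprocess
from typing import Any


def parse_report(raw: str) -> dict[str, Any]:
    """Extract the fields of interest from `roleplay report ... --json` output.

    The "status" and "score" keys are assumed for illustration; missing keys
    come back as None instead of raising.
    """
    report = json.loads(raw)
    return {"status": report.get("status"), "score": report.get("score")}


def fetch_report(run_id: str = "latest") -> dict[str, Any]:
    # Run the documented command, capture stdout as text, and fail loudly
    # if the CLI exits non-zero.
    result = subprocess.run(
        ["roleplay", "report", run_id, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_report(result.stdout)


if __name__ == "__main__":
    print(fetch_report())
```

Keeping the parsing separate from the subprocess call lets you test the parsing against saved output without running the CLI.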

Report Status

Reports have one of three statuses:

passed: no failed criteria or findings
warning: concerning behavior was detected
failed: the scenario crossed a failure criterion

Each report also includes a score from 0 to 100.
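One way to read the three statuses together is as a precedence. Any crossed failure criterion takes priority, then any finding, and otherwise the run passes. The sketch below encodes that reading using hypothetical inputs: counts of failed criteria and findings. It interprets the definitions above and is not Roleplay's actual logic. It also does not model how the score is computed, and only checks the documented 0 to 100 range.

```python
from enum import Enum


class Status(str, Enum):
    PASSED = "passed"
    WARNING = "warning"
    FAILED = "failed"


def derive_status(failed_criteria: int, findings: int) -> Status:
    """Hypothetical mapping from counts to the three documented statuses."""
    if failed_criteria > 0:
        # Crossing any failure criterion fails the scenario.
        return Status.FAILED
    if findings > 0:
        # Concerning behavior without a crossed criterion is a warning.
        return Status.WARNING
    # No failed criteria and no findings.
    return Status.PASSED


def check_score(score: int) -> int:
    # Scores are documented as ranging from 0 to 100 inclusive.
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score
```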

Failure Severity

Each failure has one of four severity levels, from lowest to highest:

low
medium
high
critical

Use --fail-on to set the failure severity at which the CLI exits with a non-zero status:

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --fail-on critical
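The sketch below shows one plausible interpretation of --fail-on as a threshold. Under this interpretation, the CLI exits non-zero when any failure is at or above the chosen severity. The "at or above" rule and the function names are assumptions, so confirm the behavior against the CLI before gating builds on it.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def severity_rank(severity: str) -> int:
    # Position in the documented order: low < medium < high < critical.
    # Raises ValueError for an unknown severity name.
    return SEVERITY_ORDER.index(severity)


def exit_code(failure_severities: list[str], fail_on: str) -> int:
    """Return 1 if any failure is at or above the --fail-on threshold, else 0.

    The "at or above" rule is an assumption about how --fail-on behaves.
    """
    threshold = severity_rank(fail_on)
    if any(severity_rank(s) >= threshold for s in failure_severities):
        return 1
    return 0
```

Under this reading, --fail-on critical lets low, medium, and high failures through with exit status 0. By contrast, --fail-on low exits non-zero on any failure.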

Replay A Transcript

roleplay replay latest
roleplay replay <runId>

Replay without the playback delay:

roleplay replay latest --no-delay

Print the transcript as JSON:

roleplay replay latest --json
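The replay modes differ in pacing and output format. As an illustration, the sketch below replays a transcript turn by turn, and a zero delay plays the role of --no-delay. The transcript shape (a JSON list of objects with "role" and "content" keys), the role names, and the replay function are hypothetical. They are not the format printed by roleplay replay --json.

```python
import json
import time


def replay(transcript_json: str, delay_seconds: float = 1.0) -> list[str]:
    """Print each turn of a transcript, pausing between turns.

    Assumes a hypothetical shape: a JSON list of {"role": ..., "content": ...}
    objects. Passing delay_seconds=0 skips all pauses, like --no-delay.
    """
    turns = json.loads(transcript_json)
    lines = []
    for i, turn in enumerate(turns):
        # Pause before every turn except the first.
        if i > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)
        line = f"{turn['role']}: {turn['content']}"
        print(line)
        lines.append(line)
    return lines
```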

Workbench Evidence

When sanitized findings are uploaded to the workbench, each finding can be opened in Evidence.

Evidence shows:

the manipulation attempt
the agent response
evidence from the failed turn
the failed invariant
tool-call evidence, when available
remediation guidance
sanitized evidence status

Full transcripts are not uploaded by default. The workbench works from sanitized finding evidence unless full transcript upload is explicitly enabled.
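You can think of the default upload behavior as building a payload that omits the full transcript unless it is explicitly enabled. The Finding fields and the include_full_transcript parameter below are hypothetical stand-ins for the Evidence items listed above. The sanitization step itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Finding:
    # Hypothetical fields mirroring the items the Evidence view lists.
    manipulation_attempt: str
    agent_response: str
    failed_invariant: str
    remediation: str
    tool_calls: list[str] = field(default_factory=list)
    full_transcript: Optional[str] = None


def upload_payload(finding: Finding, include_full_transcript: bool = False) -> dict:
    """Build the dict that would be sent to the workbench.

    The full transcript is left out unless explicitly enabled, matching the
    documented default. Sanitization is assumed to have happened already.
    """
    payload = {
        "manipulation_attempt": finding.manipulation_attempt,
        "agent_response": finding.agent_response,
        "failed_invariant": finding.failed_invariant,
        "remediation": finding.remediation,
        "tool_calls": list(finding.tool_calls),
    }
    # Only attach the transcript on explicit opt-in and when one exists.
    if include_full_transcript and finding.full_transcript is not None:
        payload["full_transcript"] = finding.full_transcript
    return payload
```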