Reports and Replay

Roleplay saves every run as local artifacts and can render the result in terminal, JSON, Markdown, or replay form.

View A Report

roleplay report latest
roleplay report <runId>

JSON:

roleplay report latest --json

Markdown:

roleplay report latest --markdown

Report Status

Reports have one of three statuses:

  • passed: no failed criteria or findings
  • warning: concerning behavior was detected
  • failed: the scenario crossed a failure criterion

Each report also includes a score from 0 to 100.

Failure Severity

Failures can be:

  • low
  • medium
  • high
  • critical
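If these levels are read as an ordered scale from least to most severe (an assumption, though the order of the list suggests it), comparing a failure against a chosen level reduces to comparing ranks. The sketch below illustrates that idea only. The helper names severity_rank and meets_threshold are hypothetical, and nothing here is taken from the Roleplay implementation.

```shell
# Map a severity name to a numeric rank.
# Assumption: the documented order (low, medium, high, critical) is least to most severe.
severity_rank() {
  case "$1" in
    low) echo 1 ;;
    medium) echo 2 ;;
    high) echo 3 ;;
    critical) echo 4 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Succeed when severity $1 is at or above the chosen level $2.
# Hypothetical helper for illustration; not part of the roleplay CLI.
meets_threshold() {
  found=$(severity_rank "$1") || return 2
  limit=$(severity_rank "$2") || return 2
  [ "$found" -ge "$limit" ]
}
```

For example, `meets_threshold critical high` succeeds and `meets_threshold low high` does not.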

Use --fail-on to choose the failure severity that makes the CLI exit non-zero:

roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --fail-on critical
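Because the run exits non-zero when the chosen severity is reached, a script or CI step can branch on the exit status. The wrapper below is a minimal sketch of that pattern. The function name gate is a hypothetical helper, and it relies only on ordinary shell exit-status handling, not on any Roleplay-specific behavior.

```shell
# Run the given command and report whether it exited zero.
# "gate" is a hypothetical helper name, not part of the roleplay CLI.
gate() {
  if "$@"; then
    echo "gate: passed"
  else
    code=$?  # exit status of the command that just failed
    echo "gate: failed (exit code $code)"
    return "$code"
  fi
}

# Usage, with the flags from the command above (<provider> is a placeholder):
# gate roleplay run social-engineering-core --target http://localhost:3000/agent --provider <provider> --judge hybrid --fail-on critical
```

Returning the original exit code keeps the failure visible to whatever invoked the script, so a CI job would still be marked as failed.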

Replay A Transcript

roleplay replay latest
roleplay replay <runId>

Disable delay:

roleplay replay latest --no-delay

Print transcript JSON:

roleplay replay latest --json

Workbench Evidence

When sanitized findings are uploaded to the workbench, each finding can be opened in Evidence.

Evidence shows:

  • the manipulation attempt
  • the agent response
  • failed-turn evidence
  • the failed invariant
  • tool-call evidence when available
  • remediation guidance
  • sanitized evidence status

Full transcripts are not uploaded by default. The workbench works from sanitized finding evidence unless full transcript upload is explicitly enabled.