Tests

A test run is one execution of one or more scenarios against an agent.

Test runs can come from:

  • local
  • ci

The included local runner creates local run artifacts. The workbench stores uploaded run summaries and sanitized findings.

Test Status

Test run status can be:

  • passed
  • failed
  • warning
  • running

Runs with a failed or warning status should be inspected because they may contain findings or unclear judge results.
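
As a minimal sketch of that triage rule, the four status values can be modeled as an enum with a helper that flags runs needing inspection. The names `RunStatus` and `needs_inspection` are illustrative assumptions, not part of the product's API.

```python
from enum import Enum


class RunStatus(str, Enum):
    """The four status values listed above (class name is hypothetical)."""
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"
    RUNNING = "running"


def needs_inspection(status: RunStatus) -> bool:
    """Return True for statuses that may carry findings or unclear judge results."""
    return status in (RunStatus.FAILED, RunStatus.WARNING)
```

For example, `needs_inspection(RunStatus("warning"))` is true, while passed and running runs are not flagged.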

Test Metadata

Workbench test records include:

  • run ID
  • source
  • status
  • branch
  • commit
  • build URL
  • environment
  • started time
  • duration
  • scenario count
  • finding count

CI metadata helps connect a security failure to the exact build that introduced it.
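
The record described above could be modeled roughly as follows. This is only a sketch: the field names, types, and the `build_reference` helper are assumptions made for illustration, and the workbench's actual schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical shape of a workbench test record (field names are assumed)."""
    run_id: str
    source: str                 # "local" or "ci"
    status: str                 # "passed", "failed", "warning", or "running"
    branch: str
    commit: str
    build_url: Optional[str]    # may be absent, for example on local runs
    environment: str
    started_at: datetime
    duration_seconds: float
    scenario_count: int
    finding_count: int


def build_reference(record: RunRecord) -> str:
    """Return a one-line pointer from a run to the build behind it."""
    location = record.build_url or "no build URL recorded"
    return f"{record.branch}@{record.commit[:7]} ({location})"
```

A helper like `build_reference` shows why the branch, commit, and build URL are worth recording together: they are what lets a reader jump from a failed run to a specific build.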

Filtering Tests

The Tests screen supports filters for:

  • status
  • source
  • branch
  • environment

Use these filters to answer questions such as:

  • Did the latest release branch pass?
  • Which production runs failed?
  • Which CI build introduced a regression?
  • Are local failures also showing up in CI?
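
The same filtering logic can be sketched outside the UI, for example over exported run records. The snippet below assumes each run is a plain dictionary keyed by the four filter names. The function `filter_runs` and that record shape are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional

Run = Dict[str, Any]


def filter_runs(
    runs: Iterable[Run],
    *,
    status: Optional[str] = None,
    source: Optional[str] = None,
    branch: Optional[str] = None,
    environment: Optional[str] = None,
) -> List[Run]:
    """Keep runs that match every filter that was supplied (None means 'any')."""
    wanted = {
        "status": status,
        "source": source,
        "branch": branch,
        "environment": environment,
    }
    active = {key: value for key, value in wanted.items() if value is not None}
    return [
        run for run in runs
        if all(run.get(key) == value for key, value in active.items())
    ]
```

"Which production runs failed?" then becomes `filter_runs(runs, status="failed", environment="production")`. Comparing `filter_runs(runs, status="failed", source="local")` with the same call using `source="ci"` addresses the local-versus-CI question.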

Test Detail

The test detail panel shows:

  • run metadata
  • report summary
  • related findings
  • branch and commit context
  • build link when available

Open related findings to inspect exploit proof and remediation.

Local Run Artifacts

Local run artifacts live in:

.roleplay/runs/<runId>/

The usual files are:

  • scenario.yml
  • transcript.json
  • report.json
  • report.md
  • metadata.json

See Local Artifacts for details.
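
A quick way to check that a local run produced the usual files is to look for them under the run directory. The sketch below relies only on the path layout and file names listed above. The helper names `run_dir` and `missing_artifacts` are assumptions for illustration.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
EXPECTED_FILES = (
    "scenario.yml",
    "transcript.json",
    "report.json",
    "report.md",
    "metadata.json",
)


def run_dir(run_id: str, root: Path = Path(".")) -> Path:
    """Build the artifact directory path: <root>/.roleplay/runs/<runId>/."""
    return root / ".roleplay" / "runs" / run_id


def missing_artifacts(run_id: str, root: Path = Path(".")) -> List[str]:
    """Return the expected file names that are not present for this run."""
    directory = run_dir(run_id, root)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

An empty result from `missing_artifacts` means all five usual files exist. It does not validate their contents.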

CI Run History

When CI uploads sanitized findings, the workbench stores the run summary and CI context. The full transcript remains local by default.

This lets the team collaborate on findings without automatically uploading sensitive prompts, tool outputs, customer-like test data, or hidden policy context.
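
To illustrate the split between what is uploaded and what stays local, the sketch below assembles an upload payload from a run summary, findings, and CI context while dropping transcript-like fields. Everything here is an assumption: the key names, the `SENSITIVE_KEYS` set, and the `build_upload_payload` function are hypothetical, and the product's real sanitization step is not documented in this section.

```python
from typing import Any, Dict, List

# Hypothetical field names treated as sensitive; the real schema may differ.
SENSITIVE_KEYS = frozenset({"transcript", "prompt", "tool_output"})


def build_upload_payload(
    summary: Dict[str, Any],
    findings: List[Dict[str, Any]],
    ci_context: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine summary, findings, and CI context, omitting sensitive fields."""
    sanitized = [
        {key: value for key, value in finding.items() if key not in SENSITIVE_KEYS}
        for finding in findings
    ]
    return {"summary": summary, "findings": sanitized, "ci": ci_context}
```

The point of the sketch is the shape of the result: the payload carries the summary, the sanitized findings, and the CI context, and it has no slot for the full transcript.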