Tests

A test run is one execution of one or more scenarios against an agent.

Test runs can come from:

local
ci

The included local runner creates local run artifacts. The workbench stores uploaded run summaries and sanitized findings.

Test Status

A test run has one of the following statuses:

passed
failed
warning
running

Inspect failed and warning runs, because they may contain findings or unclear judge results.
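The status rule above can be expressed as a small check. This is a minimal sketch, not the workbench's own code: the names `RunStatus` and `needs_inspection` are hypothetical, and only the four status strings come from this page.

```python
from enum import Enum


class RunStatus(str, Enum):
    """The four run statuses listed on this page."""
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"
    RUNNING = "running"


# Statuses that this page says should be inspected.
NEEDS_INSPECTION = frozenset({RunStatus.FAILED, RunStatus.WARNING})


def needs_inspection(status: str) -> bool:
    """Return True when a run with this status should be reviewed.

    Raises ValueError for a status string outside the documented four.
    """
    return RunStatus(status) in NEEDS_INSPECTION
```

Rejecting unknown status strings, rather than treating them as passed, keeps a typo or a newly introduced status from silently hiding a run that needs review.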

Test Metadata

Workbench test records include:

run ID
source
status
branch
commit
build URL
environment
started time
duration
scenario count
finding count

CI metadata helps connect a security failure to the exact build that introduced it.
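For scripts that process exported records, the metadata listed above might be modeled as follows. This is a sketch under assumptions: the class name, field names, field types, and the `build_reference` helper are all hypothetical, and the real record format may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """One test record. Field names and types are assumptions."""
    run_id: str
    source: str                  # "local" or "ci"
    status: str                  # "passed", "failed", "warning", or "running"
    branch: Optional[str]
    commit: Optional[str]
    build_url: Optional[str]     # assumed to be absent for local runs
    environment: Optional[str]
    started_at: str              # assumed to be an ISO 8601 timestamp
    duration_seconds: float
    scenario_count: int
    finding_count: int

    def build_reference(self) -> str:
        """Summarize the CI context that ties this run to a build."""
        commit = self.commit[:7] if self.commit else "unknown-commit"
        branch = self.branch or "unknown-branch"
        ref = f"{branch}@{commit}"
        return f"{ref} ({self.build_url})" if self.build_url else ref
```

A short reference such as `branch@commit (build URL)` is one way to quote the responsible build in a ticket or chat message.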

Filtering Tests

The Tests screen supports filters for:

status
source
branch
environment

Use these filters to answer questions such as:

Did the latest release branch pass?
Which production runs failed?
Which CI build introduced a regression?
Are local failures also showing up in CI?
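The same four filters can be applied to a list of records outside the Tests screen. The sketch below assumes each record is a plain dictionary with `status`, `source`, `branch`, and `environment` keys; the function name `filter_runs` and the record shape are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def filter_runs(
    runs: Iterable[Record],
    *,
    status: Optional[str] = None,
    source: Optional[str] = None,
    branch: Optional[str] = None,
    environment: Optional[str] = None,
) -> List[Record]:
    """Keep the runs that match every filter that was supplied.

    A filter left as None is ignored, so calling with no filters
    returns all runs.
    """
    wanted = {
        "status": status,
        "source": source,
        "branch": branch,
        "environment": environment,
    }
    active = {key: value for key, value in wanted.items() if value is not None}
    return [
        run for run in runs
        if all(run.get(key) == value for key, value in active.items())
    ]
```

With this helper, "Which production runs failed?" becomes `filter_runs(runs, status="failed", environment="production")`, where `"production"` is an assumed environment name.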

Test Detail

The test detail panel shows:

run metadata
report summary
related findings
branch and commit context
build link when available

Open a related finding to inspect its exploit proof and remediation.

Local Run Artifacts

Local run artifacts live in:

.roleplay/runs/<runId>/

The usual files are:

scenario.yml
transcript.json
report.json
report.md
metadata.json

See Local Artifacts for details.
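To confirm that a local run produced the usual files, a script can check the run directory. This is a minimal sketch: the directory layout and file names come from this page, while the helper names `run_dir` and `check_run_artifacts` are hypothetical.

```python
from pathlib import Path
from typing import Dict

# The usual artifact files, as listed on this page.
EXPECTED_FILES = (
    "scenario.yml",
    "transcript.json",
    "report.json",
    "report.md",
    "metadata.json",
)


def run_dir(run_id: str, project_root: Path = Path(".")) -> Path:
    """Return the artifact directory for a run: .roleplay/runs/<runId>/."""
    return project_root / ".roleplay" / "runs" / run_id


def check_run_artifacts(directory: Path) -> Dict[str, bool]:
    """Map each expected file name to whether it exists in the directory."""
    return {name: (directory / name).is_file() for name in EXPECTED_FILES}
```

Because the page calls these the usual files, a missing entry may be worth a look but does not necessarily mean the run is broken.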

CI Run History

When CI uploads sanitized findings, the workbench stores the run summary and CI context. The full transcript remains local by default.

This lets the team collaborate on findings without automatically uploading sensitive prompts, tool outputs, customer-like test data, or hidden policy context.
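One way to picture the split between uploaded and local data is an allowlist applied to each finding before upload. The sketch below is purely illustrative and does not describe the actual sanitizer: the function name, the payload shape, and every field name in `UPLOADABLE_FINDING_FIELDS` are assumptions.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical allowlist: only these finding fields would be uploaded.
UPLOADABLE_FINDING_FIELDS = ("id", "severity", "title", "scenario")


def build_upload_payload(
    summary: Dict[str, Any],
    ci_context: Dict[str, Any],
    findings: Iterable[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble an upload payload from a run summary, CI context, and findings.

    Each finding is reduced to the allowlisted fields, so anything else
    attached to it (for example a transcript excerpt) is left out.
    """
    sanitized: List[Dict[str, Any]] = [
        {key: finding[key] for key in UPLOADABLE_FINDING_FIELDS if key in finding}
        for finding in findings
    ]
    return {
        "summary": dict(summary),
        "ci": dict(ci_context),
        "findings": sanitized,
    }
```

An allowlist is used here rather than a denylist so that a field added to findings later is excluded from uploads until someone explicitly permits it.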