Tests
A test run is one execution of one or more scenarios against an agent.
Test runs can come from:
- local
- ci
The included local runner creates local run artifacts. The workbench stores uploaded run summaries and sanitized findings.
Test Status
Test run status can be:
- passed
- failed
- warning
- running
Inspect failed and warning runs, because they may contain findings or unclear judge results.
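As a minimal sketch of the rule above, the four statuses can be modeled as an enum with a helper that flags runs worth inspecting. The names `RunStatus` and `needs_inspection` are illustrative assumptions, not part of the workbench or the local runner.

```python
from enum import Enum


class RunStatus(str, Enum):
    """The four run statuses listed in this section (hypothetical type name)."""
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"
    RUNNING = "running"


def needs_inspection(status: RunStatus) -> bool:
    """Return True for statuses that may carry findings or unclear judge results."""
    return status in {RunStatus.FAILED, RunStatus.WARNING}
```

For example, `needs_inspection(RunStatus("warning"))` is true, while a passed or still-running run is not flagged.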
Test Metadata
Workbench test records include:
- run ID
- source
- status
- branch
- commit
- build URL
- environment
- started time
- duration
- scenario count
- finding count
CI metadata helps connect a security failure to the exact build that introduced it.
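To make the record shape concrete, here is a rough sketch of the metadata fields as a Python dataclass. The field names, types, and the `build_reference` helper are assumptions chosen for illustration. The actual workbench schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical mirror of the metadata fields listed above."""
    run_id: str
    source: str                 # "local" or "ci"
    status: str                 # "passed", "failed", "warning", or "running"
    branch: str
    commit: str
    build_url: Optional[str]    # assumed to be absent for local runs
    environment: str
    started_at: datetime
    duration_seconds: float     # unit is an assumption
    scenario_count: int
    finding_count: int


def build_reference(record: RunRecord) -> str:
    """Describe the build a run belongs to, using branch, commit, and build URL."""
    ref = f"{record.branch}@{record.commit}"
    if record.build_url:
        ref += f" ({record.build_url})"
    return ref
```

A helper like `build_reference` illustrates why the CI fields matter: together they identify the build a failing run came from.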
Filtering Tests
The Tests screen supports filters for:
- status
- source
- branch
- environment
Use these filters to answer questions such as:
- Did the latest release branch pass?
- Which production runs failed?
- Which CI build introduced a regression?
- Are local failures also showing up in CI?
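The same four filters can be approximated offline over exported run records. The sketch below assumes each run is a plain dictionary with `status`, `source`, `branch`, and `environment` keys. That shape and the function name `filter_runs` are hypothetical, not a workbench API.

```python
from typing import Iterable, Optional


def filter_runs(
    runs: Iterable[dict],
    *,
    status: Optional[str] = None,
    source: Optional[str] = None,
    branch: Optional[str] = None,
    environment: Optional[str] = None,
) -> list[dict]:
    """Keep runs that match every filter that was supplied; None means 'any'."""
    criteria = {
        "status": status,
        "source": source,
        "branch": branch,
        "environment": environment,
    }
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        run for run in runs
        if all(run.get(key) == value for key, value in active.items())
    ]
```

For instance, `filter_runs(runs, status="failed", environment="production")` corresponds to the question "Which production runs failed?".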
Test Detail
The test detail panel shows:
- run metadata
- report summary
- related findings
- branch and commit context
- build link when available
Open related findings to inspect exploit proof and remediation.
Local Run Artifacts
Local run artifacts live in:
.roleplay/runs/<runId>/
The usual files are:
- scenario.yml
- transcript.json
- report.json
- report.md
- metadata.json
See Local Artifacts for details.
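A quick way to check a local run directory is to compare its contents against the usual file list. This is a small sketch using only the Python standard library. The helper name `missing_artifacts` is an assumption, and the runner itself may not provide an equivalent.

```python
from pathlib import Path

# The usual files named in this section.
USUAL_FILES = (
    "scenario.yml",
    "transcript.json",
    "report.json",
    "report.md",
    "metadata.json",
)


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the usual artifact file names that are not present in run_dir."""
    return [name for name in USUAL_FILES if not (run_dir / name).is_file()]
```

Calling it with the path to a run directory under `.roleplay/runs/` returns an empty list when all of the usual files are present.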
CI Run History
When CI uploads sanitized findings, the workbench stores the run summary and CI context. The full transcript remains local by default.
This lets the team collaborate on findings without automatically uploading sensitive prompts, tool outputs, customer-like test data, or hidden policy context.
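One way to picture the upload boundary is an allow-list: only explicitly permitted finding fields leave the machine, and everything else stays local. The sketch below is purely illustrative. The key names in `ALLOWED_FINDING_KEYS`, the payload shape, and `build_upload_payload` are assumptions, not the actual sanitization logic.

```python
# Hypothetical allow-list; the real set of uploaded fields is not specified here.
ALLOWED_FINDING_KEYS = {"id", "title", "severity", "remediation"}


def build_upload_payload(run_summary: dict, ci_context: dict, findings: list[dict]) -> dict:
    """Assemble an upload payload that carries only allow-listed finding fields.

    The transcript is never passed in, so it cannot end up in the payload.
    """
    sanitized = [
        {key: value for key, value in finding.items() if key in ALLOWED_FINDING_KEYS}
        for finding in findings
    ]
    return {"summary": run_summary, "ci": ci_context, "findings": sanitized}
```

An allow-list is chosen over a deny-list in this sketch so that any newly added field stays local until someone deliberately permits it.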