Core Concepts

Paid Workbench And Included CLI

Roleplay is a paid workbench. Builder is about $49/month and Team is about $199/month. Both plans include the CLI as the local execution engine. Real adaptive runs currently require you to supply your own provider key and to set a judge mode explicitly.

Workspace And Project

A workspace is the account boundary for members, settings, billing, and projects. A project represents one protected agent product area. Project API keys, protected agents, test runs, findings, CI setup, and evidence are scoped to a project.

Project API Key

A project API key lets the CLI or CI upload sanitized findings to one workbench project. Raw key values are shown once. If you lose the raw value, create a new key in Monitor.

Scenario

A scenario is a YAML file that defines:

the target agent
the simulated user or attacker persona
the user's goal
hidden context the target agent should respect
success criteria
failure criteria
judge settings

Scenarios are used for both normal agent behavior checks and adversarial social-engineering simulations.
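
To make the parts of a scenario concrete, here is a minimal Python sketch of the same structure. The field names, the choice of which parts are treated as required, and the example values are all hypothetical; the actual YAML keys are not specified on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Illustrative shape of a scenario; every field name is an assumption."""
    target: dict                     # the target agent
    persona: str                     # simulated user or attacker persona
    goal: str                        # the user's goal
    hidden_context: list = field(default_factory=list)    # context the agent should respect
    success_criteria: list = field(default_factory=list)
    failure_criteria: list = field(default_factory=list)
    judge: dict = field(default_factory=dict)              # judge settings


def missing_parts(s: Scenario) -> list:
    """Return the names of parts that are empty (the required set is assumed)."""
    required = {
        "target": s.target,
        "persona": s.persona,
        "goal": s.goal,
        "success_criteria": s.success_criteria,
        "failure_criteria": s.failure_criteria,
    }
    return [name for name, value in required.items() if not value]


# Hypothetical adversarial social-engineering scenario.
example = Scenario(
    target={"type": "mock"},
    persona="caller claiming to be a locked-out administrator",
    goal="obtain a password reset without verification",
    hidden_context=["resets require identity verification"],
    success_criteria=["agent asks for verification"],
    failure_criteria=["agent resets the password without verification"],
    judge={"mode": "rules"},
)
```

Calling missing_parts(example) returns an empty list, which is one simple way to catch an incomplete scenario before running it.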

Target Agent

A target is the agent under test.

Supported target types:

http: send messages to an HTTP endpoint
cli: execute a local command
mock: use built-in deterministic mock agents

The workbench also tracks protected agents with types http, cli, browser, and mock.
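
The three CLI target types can be pictured as a single dispatch on the target's type. The sketch below is an illustration only: the dictionary keys (type, command, url), the JSON request body, and passing the message on standard input are assumptions, not the CLI's documented behavior.

```python
import json
import subprocess
import urllib.request


def send_to_target(target: dict, message: str) -> str:
    """Send one message to a target and return its reply (illustrative only)."""
    kind = target["type"]
    if kind == "mock":
        # Deterministic stand-in for a built-in mock agent.
        return f"mock reply to: {message}"
    if kind == "cli":
        # Assumption: the local command reads the message from stdin.
        result = subprocess.run(
            target["command"], input=message,
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    if kind == "http":
        # Assumption: the endpoint accepts a JSON body with a "message" field.
        body = json.dumps({"message": message}).encode("utf-8")
        request = urllib.request.Request(
            target["url"], data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    raise ValueError(f"unsupported target type: {kind}")
```

In this sketch a browser target raises ValueError, matching the distinction above between the target types the CLI supports and the wider set of protected agent types the workbench tracks.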

Judge

The judge evaluates the transcript against success and failure criteria.

The CLI supports three judge modes:

rules: deterministic local checks for smoke/offline use.
semantic: provider-backed security evaluation against the transcript and criteria.
hybrid: semantic evaluation plus deterministic guardrails, recommended for CI and serious real-agent tests.
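
As a rough illustration of how the modes relate, the Python sketch below pairs a deterministic phrase check with a pluggable semantic verdict. The function names are hypothetical, the semantic step is a caller-supplied stand-in for a provider-backed evaluation, and requiring both checks to pass in hybrid mode is an assumption about how the results combine.

```python
from typing import Callable, List


def rules_judge(transcript: List[str], forbidden: List[str]) -> bool:
    """Deterministic check: pass only if no turn contains a forbidden phrase."""
    text = " ".join(transcript).lower()
    return not any(phrase.lower() in text for phrase in forbidden)


def hybrid_judge(
    transcript: List[str],
    forbidden: List[str],
    semantic_verdict: Callable[[List[str]], bool],
) -> bool:
    """Pass only if the guardrails pass and the semantic verdict passes.

    semantic_verdict is a placeholder; a real semantic judge would call a
    model provider with the transcript and criteria.
    """
    return rules_judge(transcript, forbidden) and semantic_verdict(transcript)


leaky = ["Sure, the admin password is hunter2"]
clean = ["I can't share credentials. Please verify your identity first."]
```

With these inputs, rules_judge(leaky, ["password is"]) fails on its own, so the hybrid result fails regardless of the semantic verdict. That is the role of a guardrail: a deterministic check that a model-based verdict cannot override.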

Run

A run is one scenario execution. Each run records transcript, report, scenario, and metadata artifacts under .roleplay/runs/{runId}.
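
The directory layout can be navigated with a few lines of Python. This is a sketch with hypothetical helper names; it relies only on the .roleplay/runs/{runId} path given above and makes no assumption about artifact file names or extensions.

```python
from pathlib import Path


def run_dir(root: Path, run_id: str) -> Path:
    """Return the artifact directory for one run under a project root."""
    return root / ".roleplay" / "runs" / run_id


def list_run_ids(root: Path) -> list:
    """List run IDs found under the root, or an empty list if there are none."""
    runs = root / ".roleplay" / "runs"
    if not runs.is_dir():
        return []
    return sorted(p.name for p in runs.iterdir() if p.is_dir())
```

For example, run_dir(Path("."), "abc123") points at ./.roleplay/runs/abc123, where the transcript, report, scenario, and metadata artifacts for that run would be found.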

Report

A report contains:

status: passed, failed, or warning
score from 0 to 100
criteria results
failures
recommendations
judge metadata when available
start and end timestamps
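
The report fields above map naturally onto a small record type. The Python sketch below is illustrative: the attribute names, the integer score, and the validation rules are assumptions based on the list above, not the report's actual serialized format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

VALID_STATUSES = {"passed", "failed", "warning"}


@dataclass
class Report:
    """Illustrative report record; attribute names are assumptions."""
    status: str                                   # passed, failed, or warning
    score: int                                    # 0 to 100 (assumed integer)
    criteria_results: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)
    judge_metadata: Optional[dict] = None         # present only when available
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Reject values outside the ranges the documentation describes.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range: {self.score}")
```

Validating on construction means a malformed report fails early, before it is summarized or uploaded.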

Finding

A finding is a workbench item derived from a criterion that failed during a run. Findings include severity, attack type, failed invariant, affected agent, status, owner, remediation, and evidence.
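
One way to picture the derivation is a function that turns each failed criterion into a finding record. The Python sketch below is hypothetical throughout: the input keys, the "open" initial status, and the "unknown" defaults are assumptions, and how the workbench actually assigns severity or attack type is not described here.

```python
def findings_from_criteria(criteria_results: list, agent: str) -> list:
    """Build one illustrative finding per failed criterion."""
    findings = []
    for result in criteria_results:
        if result["passed"]:
            continue  # only failed criteria produce findings
        findings.append({
            "failed_invariant": result["criterion"],
            "affected_agent": agent,
            "severity": result.get("severity", "unknown"),
            "attack_type": result.get("attack_type", "unknown"),
            "status": "open",        # assumed initial status
            "owner": None,           # unassigned until triaged
            "remediation": result.get("remediation"),
            "evidence": result.get("evidence", []),
        })
    return findings
```

A run in which every criterion passes yields an empty list, so passing runs produce no findings.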

Evidence

Evidence is the investigation surface for showing exactly how a manipulation succeeded. It highlights transcript turns, tool evidence, failed invariants, and remediation guidance.

Upload Mode

Cloud uploads support two modes:

sanitized_findings: default mode; uploads findings without full transcript, scenario YAML, or metadata
full_transcript_opt_in: requires both CLI opt-in and project policy opt-in before full evidence can be sent
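
The double opt-in can be expressed as a simple gate. In the Python sketch below, the function names and payload keys are hypothetical, falling back to sanitized_findings when only one opt-in is present is an assumption, and so is the idea that full evidence consists of the transcript, scenario YAML, and metadata.

```python
def resolve_upload_mode(cli_opt_in: bool, project_policy_opt_in: bool) -> str:
    """Return the full mode only when both opt-ins are present."""
    if cli_opt_in and project_policy_opt_in:
        return "full_transcript_opt_in"
    return "sanitized_findings"


def build_payload(findings: list, transcript: list, scenario_yaml: str,
                  metadata: dict, mode: str) -> dict:
    """Assemble an upload payload, adding full evidence only in the opt-in mode."""
    payload = {"mode": mode, "findings": findings}
    if mode == "full_transcript_opt_in":
        payload["transcript"] = transcript
        payload["scenario_yaml"] = scenario_yaml
        payload["metadata"] = metadata
    return payload
```

Keeping the decision in one function makes the default easy to audit: any path that does not satisfy both conditions ends in the sanitized mode.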