Core Concepts
Core Concepts
Paid Workbench And Included CLI
Roleplay is a paid workbench. Builder is about $49/month and Team is about $199/month. The CLI is included as the local execution engine for both plans; real adaptive runs currently require your own provider key and an explicit judge mode.
Workspace And Project
A workspace is the account boundary for members, settings, billing, and projects. A project represents one protected agent product area. Project API keys, protected agents, test runs, findings, CI setup, and evidence are scoped to a project.
Project API Key
A project API key lets the CLI or CI upload sanitized findings to one workbench project. Raw key values are shown once. If you lose the raw value, create a new key in Monitor.
Scenario
A scenario is a YAML file that defines:
- the target agent
- the simulated user or attacker persona
- the user's goal
- hidden context the target agent should respect
- success criteria
- failure criteria
- judge settings
Scenarios are used for both normal agent behavior checks and adversarial social-engineering simulations.
Target Agent
A target is the agent under test.
Supported target types:
http: send messages to an HTTP endpointcli: execute a local commandmock: use built-in deterministic mock agents
The workbench also tracks protected agents with types http, cli, browser, and mock.
Judge
The judge evaluates the transcript against success and failure criteria.
The CLI supports three judge modes:
rules: deterministic local checks for smoke/offline use.semantic: provider-backed security evaluation against the transcript and criteria.hybrid: semantic evaluation plus deterministic guardrails, recommended for CI and serious real-agent tests.
Run
A run is one scenario execution. Each run records transcript, report, scenario, and metadata artifacts under .roleplay/runs/{runId}.
Report
A report contains:
- status:
passed,failed, orwarning - score from 0 to 100
- criteria results
- failures
- recommendations
- judge metadata when available
- start and end timestamps
Finding
A finding is a workbench item derived from a failed run criterion. Findings include severity, attack type, failed invariant, affected agent, status, owner, remediation, and evidence.
Evidence
Evidence is the investigation surface for showing exactly how a manipulation succeeded. It highlights transcript turns, tool evidence, failed invariants, and remediation guidance.
Upload Mode
Cloud uploads support two modes:
sanitized_findings: default mode; uploads findings without full transcript, scenario YAML, or metadatafull_transcript_opt_in: requires both CLI opt-in and project policy opt-in before full evidence can be sent