Core Concepts

Roleplay is a paid workbench: the Builder plan costs about $49/month and the Team plan about $199/month. Both plans include the CLI as the local execution engine. Real adaptive runs currently require your own provider key and an explicit judge mode.

Workspace And Project

A workspace is the account boundary for members, settings, billing, and projects. A project represents one protected agent product area. Project API keys, protected agents, test runs, findings, CI setup, and evidence are scoped to a project.
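To make the scoping concrete, here is a minimal Python sketch of the containment described above: a workspace owns members and projects, and run, key, agent, and finding references live on a project. The class and field names (`Workspace`, `Project`, `add_project`, `run_ids`, and so on) are hypothetical and are not Roleplay's data model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of workspace/project scoping; not Roleplay's actual schema.


@dataclass
class Project:
    """One protected agent product area. Project-scoped resources are referenced here."""
    name: str
    api_key_ids: list[str] = field(default_factory=list)
    protected_agents: list[str] = field(default_factory=list)
    run_ids: list[str] = field(default_factory=list)
    finding_ids: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Account boundary: members, settings, billing, and the projects it contains."""
    name: str
    members: set[str] = field(default_factory=set)
    settings: dict = field(default_factory=dict)
    projects: dict[str, Project] = field(default_factory=dict)

    def add_project(self, name: str) -> Project:
        """Create a project inside this workspace; project names are unique per workspace."""
        if name in self.projects:
            raise ValueError(f"project {name!r} already exists")
        project = Project(name)
        self.projects[name] = project
        return project
```

Because runs and findings hang off a `Project`, work recorded in one project never shows up in another, even inside the same workspace.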

Project API Key

A project API key lets the CLI or CI upload sanitized findings to one workbench project. Raw key values are shown once. If you lose the raw value, create a new key in Monitor.
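A common way for a service to show a raw key only once is to store just a hash of it. The Python sketch below assumes that approach; this page does not say how Roleplay stores keys, and `KeyStore`, `create_key`, and `verify` are hypothetical names. It also shows why a lost key cannot be recovered: the raw value exists only in the return value of `create_key`.

```python
import hashlib
import hmac
import secrets

# Hypothetical "shown once" key store; not Roleplay's implementation.


class KeyStore:
    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}  # key_id -> SHA-256 hex digest of the raw key

    def create_key(self, project: str) -> tuple[str, str]:
        """Return (key_id, raw_key). Only the hash is kept, so the raw key is visible once."""
        key_id = f"{project}-{secrets.token_hex(4)}"
        raw_key = secrets.token_urlsafe(32)
        self._hashes[key_id] = hashlib.sha256(raw_key.encode()).hexdigest()
        return key_id, raw_key

    def verify(self, key_id: str, raw_key: str) -> bool:
        """Check a presented key by hashing it and comparing in constant time."""
        stored = self._hashes.get(key_id)
        if stored is None:
            return False
        candidate = hashlib.sha256(raw_key.encode()).hexdigest()
        return hmac.compare_digest(stored, candidate)
```

If the raw value is lost, the only remedy in this design is to create a new key, which matches the advice above.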

Scenario

A scenario is a YAML file that defines:

  • the target agent
  • the simulated user or attacker persona
  • the user's goal
  • hidden context the target agent should respect
  • success criteria
  • failure criteria
  • judge settings

Scenarios are used for both normal agent behavior checks and adversarial social-engineering simulations.
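The scenario fields listed above map naturally onto a typed record. The sketch below assumes the YAML has already been parsed into a Python dict (for example by a YAML library) and checks that every listed field is present. The key names (`target`, `persona`, `hidden_context`, and so on) are hypothetical; the real scenario schema may name or nest them differently.

```python
from dataclasses import dataclass

# Hypothetical key names mirroring the scenario field list; the real YAML schema may differ.
REQUIRED_KEYS = (
    "target", "persona", "goal", "hidden_context",
    "success_criteria", "failure_criteria", "judge",
)


@dataclass
class Scenario:
    target: str                  # the agent under test
    persona: str                 # simulated user or attacker
    goal: str                    # what the simulated user is trying to achieve
    hidden_context: list[str]    # facts the target agent should respect
    success_criteria: list[str]
    failure_criteria: list[str]
    judge: dict                  # judge settings, e.g. {"mode": "hybrid"}


def scenario_from_dict(data: dict) -> Scenario:
    """Build a Scenario from an already-parsed YAML mapping, rejecting missing fields."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"scenario is missing: {', '.join(missing)}")
    return Scenario(
        target=data["target"],
        persona=data["persona"],
        goal=data["goal"],
        hidden_context=list(data["hidden_context"]),
        success_criteria=list(data["success_criteria"]),
        failure_criteria=list(data["failure_criteria"]),
        judge=dict(data["judge"]),
    )
```

The same structure serves both uses: a behavior check gives the persona an ordinary goal, while a social-engineering simulation gives it a goal that conflicts with the hidden context.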

Target Agent

A target is the agent under test.

Supported target types:

  • http: send messages to an HTTP endpoint
  • cli: execute a local command
  • mock: use built-in deterministic mock agents

The workbench also tracks protected agents with types http, cli, browser, and mock.
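One way to picture the three supported target types is as interchangeable "message in, reply out" functions. The Python sketch below is an illustration under stated assumptions, not the CLI's transport code. All function names are hypothetical. The HTTP variant assumes a JSON body of `{"message": ...}` and a reply of `{"reply": ...}`, and the CLI variant assumes the command reads the message on stdin and prints its reply.

```python
import json
import shlex
import subprocess
import urllib.request
from typing import Callable

# Hypothetical transports for the supported target types; not the Roleplay CLI's code.
Agent = Callable[[str], str]


def mock_agent(message: str) -> str:
    """Deterministic stand-in: the same message always produces the same reply."""
    if "password" in message.lower():
        return "I can't share credentials."
    return f"Echo: {message}"


def cli_agent(command: list[str]) -> Agent:
    """Run a local command, sending the message on stdin and reading the reply from stdout."""
    def send(message: str) -> str:
        result = subprocess.run(command, input=message, capture_output=True, text=True, check=True)
        return result.stdout.strip()
    return send


def http_agent(url: str) -> Agent:
    """POST the message as JSON; assumes the endpoint answers with {"reply": "..."}."""
    def send(message: str) -> str:
        body = json.dumps({"message": message}).encode()
        request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["reply"]
    return send


def make_target(kind: str, spec: str = "") -> Agent:
    """Select a transport for one of the supported target types: http, cli, or mock."""
    if kind == "mock":
        return mock_agent
    if kind == "cli":
        return cli_agent(shlex.split(spec))
    if kind == "http":
        return http_agent(spec)
    raise ValueError(f"unsupported target type {kind!r}")
```

Because the mock agent is deterministic, it is useful for checking that scenarios and judges behave as expected before pointing them at a real agent.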

Judge

The judge evaluates the transcript against success and failure criteria.

The CLI supports three judge modes:

  • rules: deterministic local checks for smoke/offline use.
  • semantic: provider-backed security evaluation against the transcript and criteria.
  • hybrid: semantic evaluation plus deterministic guardrails, recommended for CI and serious real-agent tests.
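The sketch below illustrates how the deterministic and semantic layers could combine. It assumes a simple "forbidden string" rule and a pass/fail verdict, and it stands in for the provider-backed semantic judge with a caller-supplied function. It does not call any provider, and `rules_judge` and `hybrid_judge` are hypothetical names, not the CLI's judge implementations.

```python
from typing import Callable

# Hypothetical judge sketch; the semantic judge is a stub callable, not a provider call.
Transcript = list[tuple[str, str]]  # (speaker, text) pairs


def rules_judge(transcript: Transcript, forbidden: list[str]) -> bool:
    """Deterministic check: pass only if no agent turn contains a forbidden string."""
    agent_turns = [text.lower() for speaker, text in transcript if speaker == "agent"]
    return not any(term.lower() in turn for turn in agent_turns for term in forbidden)


def hybrid_judge(
    transcript: Transcript,
    forbidden: list[str],
    semantic_judge: Callable[[Transcript], bool],
) -> bool:
    """Semantic verdict plus deterministic guardrails: either layer can fail the run."""
    if not rules_judge(transcript, forbidden):
        return False  # a guardrail tripped, so the semantic verdict cannot rescue the run
    return semantic_judge(transcript)
```

In this design the guardrails can only make a verdict stricter. A semantic judge that is fooled into approving a leak still fails when the leaked string appears verbatim.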

Run

A run is one scenario execution. Each run records transcript, report, scenario, and metadata artifacts under .roleplay/runs/{runId}.
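The directory layout above can be sketched with `pathlib`. The file names used here (`transcript.json`, `report.json`, `scenario.yaml`, `metadata.json`) and the helper names are assumptions for illustration. This page only states that the four artifact kinds live under `.roleplay/runs/{runId}`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical artifact writer; the file names are assumptions, not the CLI's real layout.


def write_run_artifacts(root: Path, run_id: str, transcript: list, report: dict,
                        scenario_yaml: str, metadata: dict) -> Path:
    """Write one run's artifacts under <root>/.roleplay/runs/<run_id>/."""
    run_dir = root / ".roleplay" / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "transcript.json").write_text(json.dumps(transcript, indent=2))
    (run_dir / "report.json").write_text(json.dumps(report, indent=2))
    (run_dir / "scenario.yaml").write_text(scenario_yaml)
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir


def list_run_ids(root: Path) -> list[str]:
    """Return the IDs of all runs recorded under <root>/.roleplay/runs, sorted."""
    runs = root / ".roleplay" / "runs"
    if not runs.exists():
        return []
    return sorted(p.name for p in runs.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the working tree.
    with tempfile.TemporaryDirectory() as tmp:
        write_run_artifacts(Path(tmp), "run-001", [], {"status": "passed"}, "", {})
        print(list_run_ids(Path(tmp)))
```

Keeping each run in its own directory means a later upload or investigation can work from one run's files without touching the others.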

Report

A report contains:

  • status: passed, failed, or warning
  • score from 0 to 100
  • criteria results
  • failures
  • recommendations
  • judge metadata when available
  • start and end timestamps
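The report fields listed above can be modeled as a validated record. The Python sketch below enforces the documented constraints (status is one of passed, failed, or warning, and score is from 0 to 100). The field names, the `CriterionResult` type, and the end-after-start check are assumptions, not the CLI's report format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical report model; field names are illustrative, not the CLI's report schema.
VALID_STATUSES = {"passed", "failed", "warning"}


@dataclass
class CriterionResult:
    criterion: str
    passed: bool


@dataclass
class Report:
    status: str
    score: int
    criteria: list[CriterionResult]
    failures: list[str]
    recommendations: list[str]
    started_at: datetime
    ended_at: datetime
    judge: Optional[dict] = None  # judge metadata, when available

    def __post_init__(self) -> None:
        # Enforce the documented value ranges when a report is constructed.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status {self.status!r}")
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")
        if self.ended_at < self.started_at:
            raise ValueError("ended_at precedes started_at")

    @property
    def duration_seconds(self) -> float:
        return (self.ended_at - self.started_at).total_seconds()
```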

Finding

A finding is a workbench item derived from a failed run criterion. Findings include severity, attack type, failed invariant, affected agent, status, owner, remediation, and evidence.
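Because each finding comes from one failed criterion, deriving findings from a run's results is a filter-and-map. The sketch below assumes criteria arrive as a name-to-passed mapping, that new findings start in an `open` status with no owner, and that severity is supplied by the caller. `Finding` and `findings_from_run` are hypothetical names, not the workbench's API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical finding model; initial status and severity handling are assumptions.


@dataclass
class Finding:
    severity: str
    attack_type: str
    failed_invariant: str
    affected_agent: str
    status: str = "open"          # assumed initial status
    owner: Optional[str] = None   # assumed to be assigned later in the workbench
    remediation: str = ""
    evidence: list[str] = field(default_factory=list)


def findings_from_run(agent: str, attack_type: str, criteria: dict[str, bool],
                      severity: str = "high") -> list[Finding]:
    """Create one finding per failed criterion; passed criteria produce nothing."""
    return [
        Finding(severity=severity, attack_type=attack_type,
                failed_invariant=name, affected_agent=agent)
        for name, passed in criteria.items()
        if not passed
    ]
```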

Evidence

Evidence is the investigation surface for showing exactly how a manipulation succeeded. It highlights transcript turns, tool evidence, failed invariants, and remediation guidance.

Upload Mode

Cloud uploads support two modes:

  • sanitized_findings: default mode; uploads findings without the full transcript, scenario YAML, or metadata
  • full_transcript_opt_in: requires both CLI opt-in and project policy opt-in before full evidence can be sent
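The two-sided opt-in rule can be expressed as a small payload builder. In the sketch below, the run dictionary keys (`transcript`, `scenario_yaml`, `metadata`) and the function name are hypothetical. It also makes one design choice that this page does not specify: when full upload is requested without both opt-ins, it raises an error instead of silently falling back to sanitized mode.

```python
# Hypothetical upload-payload builder; key names and error behavior are assumptions.
SANITIZED_EXCLUDES = ("transcript", "scenario_yaml", "metadata")


def build_upload(run: dict, mode: str, cli_opt_in: bool, project_allows_full: bool) -> dict:
    """Return the payload to send for a run under the given upload mode."""
    if mode == "sanitized_findings":
        # Default mode: drop the full transcript, scenario YAML, and metadata.
        return {k: v for k, v in run.items() if k not in SANITIZED_EXCLUDES}
    if mode == "full_transcript_opt_in":
        # Full evidence requires both the CLI-side and the project-policy opt-in.
        if not (cli_opt_in and project_allows_full):
            raise PermissionError("full upload needs both CLI opt-in and project policy opt-in")
        return dict(run)
    raise ValueError(f"unknown upload mode {mode!r}")
```

Failing loudly makes a misconfigured opt-in visible in CI rather than quietly producing less evidence than the user expected.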