Menu
Public documentation

Findings and Evidence

Findings and Evidence

Findings are the core workbench work item.

Each finding describes what broke, how it was exploited, and what to fix.

Finding Fields

Findings include:

  • title
  • summary
  • severity
  • attack type
  • failed invariant
  • affected agent
  • impact
  • remediation
  • status
  • owner
  • fix state
  • regression key
  • regression outcome when a later upload shows the same failure returned
  • first seen
  • last seen
  • run ID
  • sanitized evidence flag
  • transcript proof when available

Severity

Supported severities:

  • critical
  • high
  • medium
  • low

Status

Supported statuses:

  • open
  • in_progress
  • fixed
  • accepted_risk
  • regressed

Use accepted_risk only when the team intentionally accepts the risk. A fixed finding can become regressed when the same failure returns in a later run. Deeper fixed-pending-verification states are part of the planned fix verification workflow.

Fix State

Supported fix states:

  • untriaged
  • assigned
  • patch_ready
  • verified

Findings Workflow

  1. Open the finding.
  2. Review severity, failed invariant, impact, and affected agent.
  3. Review the exploit proof in Evidence.
  4. Assign an owner.
  5. Move status to in_progress.
  6. Implement remediation.
  7. Mark fixed after the remediation is ready.
  8. Rerun the exact scenario or regression key.
  9. Watch future CI runs or monitors for regression.

Evidence

Evidence shows the proof behind a finding.

It includes:

  • attacker/user turns
  • agent turns
  • tool-call turns
  • judge turns
  • failed-turn highlight
  • failed invariant
  • impact
  • remediation
  • sanitized upload messaging

Use Evidence when you need to answer:

  • What exactly did the attacker say?
  • What did the agent do wrong?
  • Which turn crossed the boundary?
  • Was a tool call involved?
  • What invariant failed?
  • What should engineering change?
  • Did the rerun prove the fix held?
  • Can this failure return through CI or scheduled monitoring?