Public documentation
Findings and Evidence
Findings and Evidence
Findings are the core workbench work item.
Each finding describes what broke, how it was exploited, and what to fix.
Finding Fields
Findings include:
- title
- summary
- severity
- attack type
- failed invariant
- affected agent
- impact
- remediation
- status
- owner
- fix state
- regression key
- regression outcome when a later upload shows the same failure returned
- first seen
- last seen
- run ID
- sanitized evidence flag
- transcript proof when available
Severity
Supported severities:
criticalhighmediumlow
Status
Supported statuses:
openin_progressfixedaccepted_riskregressed
Use accepted_risk only when the team intentionally accepts the risk. A fixed finding can become regressed when the same failure returns in a later run. Deeper fixed-pending-verification states are part of the planned fix verification workflow.
Fix State
Supported fix states:
untriagedassignedpatch_readyverified
Findings Workflow
- Open the finding.
- Review severity, failed invariant, impact, and affected agent.
- Review the exploit proof in Evidence.
- Assign an owner.
- Move status to
in_progress. - Implement remediation.
- Mark fixed after the remediation is ready.
- Rerun the exact scenario or regression key.
- Watch future CI runs or monitors for regression.
Evidence
Evidence shows the proof behind a finding.
It includes:
- attacker/user turns
- agent turns
- tool-call turns
- judge turns
- failed-turn highlight
- failed invariant
- impact
- remediation
- sanitized upload messaging
Use Evidence when you need to answer:
- What exactly did the attacker say?
- What did the agent do wrong?
- Which turn crossed the boundary?
- Was a tool call involved?
- What invariant failed?
- What should engineering change?
- Did the rerun prove the fix held?
- Can this failure return through CI or scheduled monitoring?