Public documentation

Findings and Evidence

Findings are the core work item in the workbench.

Each finding describes what broke, how it was exploited, and what to fix.

Finding Fields

Findings include:

title
summary
severity
attack type
failed invariant
affected agent
impact
remediation
status
owner
fix state
regression key
regression outcome (recorded when a later upload shows that the same failure has returned)
first seen
last seen
run ID
sanitized evidence flag
transcript proof when available
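The field list above can be pictured as one record per finding. The Python sketch below is illustrative only. The snake_case names, the use of plain strings for severity, status, and fix state, and the defaults of open and untriaged for a new finding are all assumptions, not the product's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record mirroring the documented finding fields."""

    title: str
    summary: str
    severity: str              # critical | high | medium | low
    attack_type: str
    failed_invariant: str
    affected_agent: str
    impact: str
    remediation: str
    regression_key: str        # identifies the same failure across later runs
    run_id: str
    first_seen: datetime
    last_seen: datetime
    status: str = "open"          # assumed default for a new finding
    owner: Optional[str] = None
    fix_state: str = "untriaged"  # assumed default for a new finding
    # Set only when a later upload shows the same failure returned.
    regression_outcome: Optional[str] = None
    sanitized_evidence: bool = False
    # Present only when transcript proof is available.
    transcript_proof: Optional[str] = None


# Placeholder values only; no real finding data.
seen = datetime(2024, 1, 1)
finding = Finding(
    title="<title>", summary="<summary>", severity="high",
    attack_type="<attack type>", failed_invariant="<invariant>",
    affected_agent="<agent>", impact="<impact>", remediation="<remediation>",
    regression_key="<regression key>", run_id="<run id>",
    first_seen=seen, last_seen=seen,
)
print(finding.status, finding.fix_state)  # open untriaged
```

Optional fields default to None so that a finding without an owner, a regression outcome, or transcript proof can still be represented.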

Severity

Supported severities:

critical
high
medium
low
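When triaging a queue, it helps to sort findings from most to least severe. The sketch below assumes the four severities form a strict order (critical above high above medium above low). The numeric ranks and the helper names `parse_severity` and `triage_order` are hypothetical and exist only to make sorting possible.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Higher value means more urgent; the ranks are an assumption used only for sorting.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def parse_severity(value: str) -> Severity:
    """Map a severity string such as "critical" to the enum (KeyError if unsupported)."""
    return Severity[value.upper()]


def triage_order(severities: List[str]) -> List[str]:
    """Return severity strings sorted most urgent first."""
    return sorted(severities, key=parse_severity, reverse=True)


print(triage_order(["low", "critical", "medium", "high"]))
# ['critical', 'high', 'medium', 'low']
```

Rejecting unknown strings with a KeyError keeps unsupported severities from silently sorting to an arbitrary position.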

Status

Supported statuses:

open
in_progress
fixed
accepted_risk
regressed

Use accepted_risk only when the team has intentionally decided to accept the risk. A fixed finding can become regressed when the same failure returns in a later run. Additional states, such as fixed-pending-verification, are planned as part of the fix verification workflow.
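The fixed-to-regressed rule can be sketched as a pass over a later run's results. This Python sketch assumes that a run reports the regression keys of the failures it reproduced and that findings can be looked up by regression key. `FindingState` and `apply_later_run` are hypothetical names, and leaving findings in other statuses (such as accepted_risk) untouched is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class FindingState:
    regression_key: str
    status: str  # open | in_progress | fixed | accepted_risk | regressed


def apply_later_run(findings: Dict[str, FindingState],
                    failed_keys: Iterable[str]) -> List[str]:
    """Mark fixed findings as regressed when their regression key fails again.

    Returns the sorted regression keys that flipped to regressed.
    """
    regressed = []
    for key in set(failed_keys):
        finding = findings.get(key)
        # Only a finding currently marked fixed can regress; other statuses are left alone.
        if finding is not None and finding.status == "fixed":
            finding.status = "regressed"
            regressed.append(key)
    return sorted(regressed)


demo = {
    "key-a": FindingState("key-a", "fixed"),
    "key-b": FindingState("key-b", "fixed"),
    "key-c": FindingState("key-c", "accepted_risk"),
}
print(apply_later_run(demo, ["key-a", "key-c"]))  # ['key-a']
```

Matching on the regression key rather than on the run ID is what lets a failure from a new run be recognized as the same failure returning.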

Fix State

Supported fix states:

untriaged
assigned
patch_ready
verified
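The source lists the fix states but does not say whether they must be visited in order. The sketch below assumes they form a simple linear progression from untriaged to verified. `FIX_STATES`, `next_fix_state`, and `can_advance` are hypothetical names.

```python
from typing import List, Optional

# Assumed linear order of fix states.
FIX_STATES: List[str] = ["untriaged", "assigned", "patch_ready", "verified"]


def next_fix_state(current: str) -> Optional[str]:
    """Return the state after `current`, or None if `current` is the last one."""
    index = FIX_STATES.index(current)  # raises ValueError for unsupported states
    return FIX_STATES[index + 1] if index + 1 < len(FIX_STATES) else None


def can_advance(current: str, target: str) -> bool:
    """True if `target` is exactly one step after `current` in the assumed order."""
    return next_fix_state(current) == target


print(next_fix_state("assigned"))  # patch_ready
print(next_fix_state("verified"))  # None
```

If the real workflow allows skipping states, `can_advance` would need a table of allowed moves instead of a single list.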

Findings Workflow

Open the finding.
Review severity, failed invariant, impact, and affected agent.
Review the exploit proof in Evidence.
Assign an owner.
Move status to in_progress.
Implement remediation.
Move status to fixed once the remediation is ready.
Rerun the exact scenario, or rerun by regression key, to confirm the fix held.
Watch future CI runs or monitors for regression.
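The status changes in the workflow above can be sketched as a small transition table. This is a hypothetical Python sketch: the table, `move_status`, and the rule that an owner must be assigned before work moves to in_progress are inferred from the step order, not taken from the product. The real workbench may allow other moves, such as accepting risk at a different point.

```python
from typing import Dict, Optional, Set

# Hypothetical allowed status moves inferred from the workflow steps.
ALLOWED: Dict[str, Set[str]] = {
    "open": {"in_progress", "accepted_risk"},
    "in_progress": {"fixed", "accepted_risk"},
    "fixed": {"regressed"},
    "regressed": {"in_progress"},
    "accepted_risk": set(),
}


def move_status(current: str, target: str, owner: Optional[str]) -> str:
    """Validate one status change and return the new status."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current} to {target}")
    # The workflow assigns an owner before work starts.
    if target == "in_progress" and not owner:
        raise ValueError("assign an owner before moving to in_progress")
    return target


status = "open"
status = move_status(status, "in_progress", owner="<owner>")
status = move_status(status, "fixed", owner="<owner>")
print(status)  # fixed
```

Raising on an invalid move, rather than silently ignoring it, surfaces skipped steps such as marking a finding fixed before anyone owns it.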

Evidence

Evidence shows the proof behind a finding.

It includes:

attacker/user turns
agent turns
tool-call turns
judge turns
failed-turn highlight
failed invariant
impact
remediation
sanitized upload messaging

Use Evidence when you need to answer:

What exactly did the attacker say?
What did the agent do wrong?
Which turn crossed the boundary?
Was a tool call involved?
What invariant failed?
What should engineering change?
Did the rerun prove the fix held?
Can this failure return through CI or scheduled monitoring?
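Two of these questions, "Which turn crossed the boundary?" and "Was a tool call involved?", can be answered mechanically from a transcript with a failed-turn highlight. The Python sketch below assumes each turn carries a role label ("attacker", "agent", "tool", or "judge") and a boolean marking the highlighted failed turn. `Turn`, `failed_turn_index`, and `tool_call_involved` are hypothetical names, and the transcript text is placeholder only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str             # "attacker", "agent", "tool", or "judge" (assumed labels)
    text: str
    failed: bool = False  # True on the turn the evidence view highlights


def failed_turn_index(turns: List[Turn]) -> Optional[int]:
    """Index of the first highlighted failed turn, or None if none is marked."""
    for i, turn in enumerate(turns):
        if turn.failed:
            return i
    return None


def tool_call_involved(turns: List[Turn]) -> bool:
    """True if any tool-call turn occurs at or before the failed turn."""
    idx = failed_turn_index(turns)
    if idx is None:
        return False
    return any(t.role == "tool" for t in turns[: idx + 1])


transcript = [
    Turn("attacker", "<attacker message>"),
    Turn("agent", "<agent reply>"),
    Turn("tool", "<tool call>", failed=True),
    Turn("judge", "<judge verdict>"),
]
print(failed_turn_index(transcript))   # 2
print(tool_call_involved(transcript))  # True
```

Only turns up to and including the failed turn are checked, so a tool call that happens after the boundary was crossed does not count as involved.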