LedgerGuard

Indirect prompt-injection-resistant bookkeeping autopilot · same input, two agents, zero clicks

Eval runner

Runs the YAML suite against four cumulative configs (raw → full). The headline number is the bottom-right cell: the full config on the adversarial slice.

[Chart: pass count per config (x-axis: config). No data yet.]

Pass count by config

Each layer added catches cases the previous configs missed — most visible in the adversarial slice.

Eval suite

Each config is cumulative — bottom row is everything turned on. Goal: clean column on the right.

Config | Happy | Edge | Adversarial
Raw LLM
+ Pydantic
+ Cross-verify
+ Guard layer (full)
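
As a rough illustration of how a cumulative grid like the one above could be filled in, here is a minimal sketch. It is not LedgerGuard's actual runner: the layer names are taken from the table rows, but `Case`, `min_layers`, `toy_agent`, and `run_suite` are hypothetical names, and the toy agent only simulates "more layers catch more cases" rather than calling a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Layer names mirror the table rows; everything else here is hypothetical.
LAYERS = ["Raw LLM", "+ Pydantic", "+ Cross-verify", "+ Guard layer (full)"]
SLICES = ["happy", "edge", "adversarial"]


@dataclass(frozen=True)
class Case:
    name: str
    slice_name: str  # one of SLICES
    min_layers: int  # toy stand-in: how many layers must be active to pass


def cumulative_configs(layers: List[str]) -> List[List[str]]:
    """Return one config per layer, each containing all layers up to it."""
    return [layers[: i + 1] for i in range(len(layers))]


def toy_agent(case: Case, active_layers: List[str]) -> bool:
    """Placeholder for a real agent call: passes once enough layers are on."""
    return len(active_layers) >= case.min_layers


def run_suite(
    cases: List[Case],
    run_case: Callable[[Case, List[str]], bool] = toy_agent,
) -> Dict[str, Dict[str, int]]:
    """Build the pass-count grid: one row per config, one column per slice."""
    grid: Dict[str, Dict[str, int]] = {}
    for config in cumulative_configs(LAYERS):
        counts = {s: 0 for s in SLICES}
        for case in cases:
            if run_case(case, config):
                counts[case.slice_name] += 1
        # Label each row by the last layer added, as in the table.
        grid[config[-1]] = counts
    return grid
```

Calling `run_suite(cases)["+ Guard layer (full)"]["adversarial"]` would then give the bottom-right cell. In a real runner, `run_case` would be swapped for a function that feeds the case input to the agent under the given layers and checks the result against the expected output from the YAML suite.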