LedgerGuard
Indirect prompt-injection-resistant bookkeeping autopilot · same input, two agents, zero clicks
Eval runner
Runs the YAML suite against four cumulative configs (raw → full). The headline number is the bottom-right cell: the adversarial pass count with every layer enabled.
Pass count by config
Each added layer catches cases the previous configs missed, most visibly in the adversarial slice.
Eval suite
Configs are cumulative: each row adds one layer to the row above, and the bottom row has everything turned on. Goal: a clean Adversarial column on the right.
| Config | Happy | Edge | Adversarial |
|---|---|---|---|
| Raw LLM | — | — | — |
| + Pydantic | — | — | — |
| + Cross-verify | — | — | — |
| + Guard layer (full) | — | — | — |
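
For readers who want to see how a table like the one above could be filled in, here is a minimal sketch of a cumulative-config runner. It is not LedgerGuard's actual code. The layer keys, the `Case` fields, and the `run_case` callback are all hypothetical names chosen for illustration, and cases are built in memory rather than loaded from the YAML suite.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical layer keys, in the order the table lists them.
LAYERS = ("raw_llm", "pydantic", "cross_verify", "guard")
CONFIG_LABELS = ("Raw LLM", "+ Pydantic", "+ Cross-verify", "+ Guard layer (full)")
SLICES = ("happy", "edge", "adversarial")


@dataclass(frozen=True)
class Case:
    """One eval case. Field names are assumptions, not the suite's real schema."""
    case_id: str
    slice_name: str  # one of SLICES
    payload: str     # input handed to the agent


# A callback that runs one case with the given layers enabled and reports pass/fail.
RunCase = Callable[[Case, frozenset], bool]


def cumulative_configs() -> list:
    """Return (label, enabled_layers) pairs; each config is a superset of the previous."""
    return [(label, frozenset(LAYERS[: i + 1])) for i, label in enumerate(CONFIG_LABELS)]


def run_suite(cases: Iterable[Case], run_case: RunCase) -> dict:
    """Run every case under every config and count passes per slice."""
    case_list = list(cases)
    results = {}
    for label, enabled in cumulative_configs():
        counts = {name: 0 for name in SLICES}
        for case in case_list:
            if run_case(case, enabled):
                counts[case.slice_name] += 1
        results[label] = counts
    return results


def to_markdown(results: dict, cases: Iterable[Case]) -> str:
    """Render pass counts as passed/total cells in the same layout as the table."""
    case_list = list(cases)
    totals = {name: sum(1 for c in case_list if c.slice_name == name) for name in SLICES}
    lines = ["| Config | Happy | Edge | Adversarial |", "|---|---|---|---|"]
    for label, counts in results.items():
        cells = " | ".join(f"{counts[name]}/{totals[name]}" for name in SLICES)
        lines.append(f"| {label} | {cells} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy stand-in for the agent, used only to exercise the tally logic.
    # It says nothing about how the real configs perform.
    def toy_run_case(case: Case, enabled: frozenset) -> bool:
        return case.slice_name != "adversarial" or "guard" in enabled

    demo_cases = [
        Case("h1", "happy", "plain invoice"),
        Case("a1", "adversarial", "invoice with embedded instructions"),
    ]
    print(to_markdown(run_suite(demo_cases, toy_run_case), demo_cases))
```

Passing the enabled layers as a set keeps the runner independent of what each layer does: the callback decides how to apply them, and the runner only tallies results per config and slice.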