My Passing Tests Encoded the Fail-Open Bug as Correct Behavior

I build deploy gates for a fleet of Claude Code agents. The gate’s whole job is to block: if an urgent care clinic has a consent hold open, the agent must not ship code touching that scope. A gate that blocks correctly is boring. A gate that lets something through when it should have stopped it is the only failure that matters.

One night I shipped two security-critical fixes to that gate. Both went out CI-green: a clean typecheck, 231 passing tests, an automated CodeRabbit-style review with no blockers, and my own read-through. By every signal I had, the code was correct.

Then I ran an author-independent adversarial review — a second reviewer whose only instruction was “find a way this ALLOWs when it must BLOCK.” It caught a CRITICAL each time.

Three fail-OPEN paths, in a gate built specifically to prevent fail-open:

An empty branch field skipped a main-scoped consent hold. Empty scope was treated as “nothing to check” — ALLOW — when it had to mean BLOCK.
A hold missing a required field was silently dropped instead of throwing. No field, no hold, no block.
Duplicate-key last-wins let a stale expiry overwrite an active freeze. The newest record won even when it was the wrong one.

Here is the mechanism I missed. I wrote the tests. My tests encode my mental model of the code. When my model is wrong, the tests assert the bug as intended behavior — and CI dutifully goes green confirming the gate does exactly what I wrongly believed. Tests written by the author can only ever catch the bugs the author already imagined. The fail-open lived in the gap between what I thought the code did and what it actually did, and nothing I authored could see into that gap.
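A minimal sketch of that trap, assuming a heavily simplified gate. The names `buggyGate` and `authorTests` are illustrative only and are not taken from the real gate; the point is that the second test passes precisely because it asserts the fail-open.

```typescript
// Hypothetical, simplified model of a gate with the empty-branch bug.
type Decision = "ALLOW" | "BLOCK";

interface Hold {
  scope: string; // branch the consent hold applies to
}

// BUG: an empty branch is treated as "nothing to check" and returns ALLOW.
function buggyGate(branch: string, holds: Hold[]): Decision {
  if (branch === "") return "ALLOW";
  return holds.some(h => h.scope === branch) ? "BLOCK" : "ALLOW";
}

const holds: Hold[] = [{ scope: "main" }];

// The author's tests as [name, actual, expected]. Both match, so CI is green.
const authorTests: Array<[string, Decision, Decision]> = [
  ["main branch with a main hold", buggyGate("main", holds), "BLOCK"],
  // This case encodes the wrong model: it asserts the fail-open as intended.
  ["empty branch has nothing to check", buggyGate("", holds), "ALLOW"],
];

for (const [name, actual, expected] of authorTests) {
  console.log(`${actual === expected ? "PASS" : "FAIL"}: ${name}`); // both lines print PASS
}
```

Nothing in this suite can fail on the empty-branch path, because the only assertion that touches it was written from the same wrong assumption as the code.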

For any code whose job is to block, “tests pass” is necessary but never sufficient. Green CI proves the code matches your model. It says nothing about whether your model is correct.

Make ambiguity fail closed, then fold the attack back in

The fix is two-part: change the defaults so ambiguity blocks, and invert any test that encoded the broken behavior.

Missing, empty, or duplicate required fields must throw — not coerce to a permissive default:

function resolveHold(record: HoldRecord): Hold {
  if (!record.scope?.trim()) {
    // empty scope is NOT "no scope" — fail closed
    throw new GateError("empty scope must BLOCK, never ALLOW");
  }
  if (record.expiresAt == null) {
    throw new GateError("missing required field: expiresAt");
  }
  return record;
}

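// duplicate keys: an active freeze must win over a stale expiry.
// This runs inside the gate's evaluation function, over every record for the key.
const active = holds.filter(h => h.status === "active");
// any active hold blocks, regardless of which duplicate was written last
if (active.length > 0) return BLOCK;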
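To make the duplicate-key case concrete, here is a small self-contained sketch that contrasts last-wins resolution with an "any active hold blocks" rule. The `key` field, the `clinic-7` value, and both function names are assumptions made for illustration; the real record shape may differ.

```typescript
type Decision = "ALLOW" | "BLOCK";

interface Hold {
  key: string; // hypothetical identifier shared by duplicate records
  scope: string;
  status: "active" | "expired";
}

// Last-wins: the final record for a key silently replaces earlier ones.
function lastWins(holds: Hold[], scope: string): Decision {
  const byKey: { [key: string]: Hold } = {};
  for (const h of holds) byKey[h.key] = h;
  const blocked = Object.keys(byKey).some(
    k => byKey[k].scope === scope && byKey[k].status === "active"
  );
  return blocked ? "BLOCK" : "ALLOW";
}

// Fail-closed: no record can overwrite another; any active hold blocks.
function activeWins(holds: Hold[], scope: string): Decision {
  return holds.some(h => h.scope === scope && h.status === "active")
    ? "BLOCK"
    : "ALLOW";
}

const duplicateHolds: Hold[] = [
  { key: "clinic-7", scope: "main", status: "active" },
  { key: "clinic-7", scope: "main", status: "expired" }, // stale duplicate arrives last
];

console.log(lastWins(duplicateHolds, "main"));   // ALLOW (the fail-open)
console.log(activeWins(duplicateHolds, "main")); // BLOCK
```

The design choice is that record order carries no authority: a later write can add a block but can never remove one.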

Then write the test from the attacker’s side, asserting BLOCK:

test("empty branch on a main-scoped hold must BLOCK", () => {
  expect(() => evaluateGate({ branch: "" })).toThrow(GateError);
});

Every reproduced attack becomes a permanent test. If an old test asserted the fail-open as correct, invert it — expect(ALLOW) becomes expect(BLOCK). That inverted assertion is the proof your model changed.
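One way to keep reproduced attacks around is a small table of inputs that must never be allowed. The sketch below is framework-free so it runs on its own; the `GateInput` shape, the `blocks` helper, and this stand-in `evaluateGate` are all assumptions for illustration, not the real gate.

```typescript
type Decision = "ALLOW" | "BLOCK";

class GateError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "GateError";
  }
}

interface GateInput {
  branch?: string;
  holds: Array<{ scope?: string; status?: string }>;
}

// Hypothetical fail-closed gate: malformed input throws instead of allowing.
function evaluateGate(input: GateInput): Decision {
  const branch = (input.branch ?? "").trim();
  if (!branch) throw new GateError("empty branch must BLOCK, never ALLOW");
  for (const h of input.holds) {
    if (!h.scope || !h.scope.trim() || !h.status) {
      throw new GateError("hold is missing a required field");
    }
  }
  const held = input.holds.some(h => h.scope === branch && h.status === "active");
  return held ? "BLOCK" : "ALLOW";
}

// A thrown GateError counts as a block: nothing ships.
function blocks(input: GateInput): boolean {
  try {
    return evaluateGate(input) === "BLOCK";
  } catch (err) {
    if (err instanceof Error && err.name === "GateError") return true;
    throw err; // anything else is a genuine crash, not a gate decision
  }
}

// Each reproduced attack stays here permanently.
const attacks: Array<{ name: string; input: GateInput }> = [
  { name: "empty branch on a main-scoped hold",
    input: { branch: "", holds: [{ scope: "main", status: "active" }] } },
  { name: "hold missing a required field",
    input: { branch: "main", holds: [{ scope: "main" }] } },
  { name: "stale expiry listed after an active freeze",
    input: { branch: "main", holds: [
      { scope: "main", status: "active" },
      { scope: "main", status: "expired" },
    ] } },
];

const failures = attacks.filter(a => !blocks(a.input)).map(a => a.name);
console.log(failures.length === 0 ? "all attacks BLOCK" : `fail-open: ${failures.join(", ")}`);
```

When the adversarial reviewer finds a new path, it becomes one more row in `attacks`, and the row stays even after the fix lands.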

So if you write gate, auth, or consent code: require an author-independent adversarial review before merge, and never trust CI-green alone. Make missing, ambiguous, or duplicate required fields throw and block. Make unknown or empty scope fail closed. And every time an attack works, fold it into your tests — inverting whatever test quietly swore the bug was the feature.