On June 20th my CI shipped a visit picker that put patients in the wrong appointment slot. The PR was green, every review comment showed resolved, and the merge robot did exactly what I told it to. The fix that was supposed to land never landed. The robot just thought it had.
Here’s the setup in plain terms. CI — the thing that runs your tests and merges your code — was wired to auto-merge a pull request (a proposed code change) the instant two things were true: tests pass, and zero review comments are still marked “unresolved.” CodeRabbit, a review bot, leaves those comments. A second bot I run reads the comments, writes the fix, pushes it, and replies “done” to close the thread.
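Here is the original gate as code. It checks two conditions and never looks at the code. This is a minimal sketch: the function name and its two inputs are hypothetical stand-ins for whatever your CI actually exposes.

```bash
# Sketch of the original auto-merge condition (hypothetical function name).
# $1: test result, "pass" or "fail"
# $2: number of review threads still marked unresolved
# Note what it never reads: the code on the branch.
naive_gate() {
  if [ "$1" = "pass" ] && [ "$2" -eq 0 ]; then
    echo "merge"
  else
    echo "wait"
  fi
}
```

`naive_gate pass 0` prints `merge` whether or not the fix is on the branch. That blind spot is the whole bug.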
You can probably already see the seam. Closing the thread and pushing the fix are two different actions, and nothing forced them to happen in that order.
On this PR, three CodeRabbit threads flagged correctness problems — one of them the wrong-appointment regression. My fix-bot started working. Its reply node — the part that posts “fixed” and marks the thread resolved — fired before its build node finished compiling and pushing the actual change. So for a few seconds the PR was green and showed zero unresolved threads while the buggy code was still sitting in the branch. The auto-merge gate saw exactly the state it was waiting for, merged, and deleted the branch. The fix push arrived to a branch that no longer existed.
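The sequence above can be replayed deterministically to show that the gate is satisfied by a state in which the fix is not on the branch. This is a sketch only. The event names are hypothetical labels, not real CI or CodeRabbit events, and it assumes tests are already green, as they were on this PR.

```bash
# Deterministic replay of the events on the PR (hypothetical event labels).
#   resolve: reply node posts "fixed" and marks one thread resolved
#   push:    build node pushes the real fix (no-op once the branch is gone)
#   gate:    naive auto-merge check; tests are already green, so it merges
#            as soon as zero threads are unresolved, then deletes the branch
replay() {
  local unresolved=3 fix_on_branch=0 branch_exists=1 merged="" ev
  for ev in "$@"; do
    case "$ev" in
      resolve)
        if [ "$unresolved" -gt 0 ]; then unresolved=$((unresolved - 1)); fi ;;
      push)
        if [ "$branch_exists" -eq 1 ]; then fix_on_branch=1; fi ;;
      gate)
        if [ "$unresolved" -eq 0 ] && [ -z "$merged" ]; then
          if [ "$fix_on_branch" -eq 1 ]; then
            merged="merged with fix"
          else
            merged="merged without fix"
          fi
          branch_exists=0   # merge deletes the branch
        fi ;;
    esac
  done
  echo "${merged:-not merged}"
}
```

`replay resolve resolve resolve gate push` (the June 20th order) prints `merged without fix`. `replay push resolve resolve resolve gate` prints `merged with fix`. The only difference between the two runs is event order, and nothing in the gate enforces it.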
The robot didn’t lie to me. I’d asked it the wrong question. I’d treated “the human (or bot) says it’s handled” as a proxy for “the corrected lines are in main.” Those are not the same fact, and on a fast pipeline the gap between them is exactly long enough to ship.
My first instinct was to add a delay — make the fix-bot wait, resolve threads last. That helps, but it’s a band-aid on a race. Timing fixes fail the day something is slower than you guessed. I didn’t want a gate that depended on my bot being polite about ordering.
So I stopped gating on review-thread state entirely and started gating on the code that’s about to merge. Before auto-merge is allowed to fire, I check the file the fix touches for the fix’s signature: a specific string or assertion the corrected code must contain. Threads can say whatever they want; the bytes on the branch can’t.
The reality-gate that replaced the thread check:
The principle: never trust a status field as a proxy for code state. Grep the code that’s about to merge for the fix signature.
```bash
# Block auto-merge unless the corrected line is actually present
# in the PR head, not just because a thread says "resolved".
# REF defaults to HEAD, assumed here to be the checked-out PR head commit.
REF="${1:-HEAD}"
SIG='slot.start === requested.start'   # the fix's signature
FILE='src/scheduling/visit-picker.ts'

if git show "$REF:$FILE" | grep -qF -- "$SIG"; then
  echo "fix present in PR head: safe to merge"
else
  echo "fix MISSING: block merge + reopen issue" >&2
  exit 1
fi
```

Two changes stuck. First, this signature check runs against the PR head before merge is allowed, so a green build with no fix can’t satisfy the gate. Second, for correctness fixes I stopped letting the fix-bot push into the live PR at all. Instead it opens a fresh issue and a new PR, and I merge that manually only after confirming the signature is in its head. The race needs two writers to the same branch; removing one removes the race.
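The gate above checks one signature, but this PR had three correctness threads. One possible extension checks one signature per flagged thread. It is a sketch that assumes each fix can be pinned to a single grep-able line, and the function name is hypothetical. File content comes from stdin, so the same check works against any ref you pipe in.

```bash
# Sketch: check one signature per flagged thread (hypothetical function name).
# Reads file content from stdin, e.g.:
#   git show "$SHA:src/scheduling/visit-picker.ts" | check_signatures 'sig 1' 'sig 2'
# Signatures must be single lines: grep -F treats a newline as a pattern separator.
# Prints each missing signature to stderr; returns 1 if any is missing.
check_signatures() {
  local content sig missing=0
  content=$(cat)
  for sig in "$@"; do
    if ! printf '%s\n' "$content" | grep -qF -- "$sig"; then
      echo "fix MISSING: $sig" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

To tie the check to the exact bytes that merge, record the head SHA once (for example with `git rev-parse HEAD`) and run the check against that SHA. If the repo is on GitHub, pass the same SHA to `gh pr merge --match-head-commit`, which refuses the merge if the PR head has moved since you checked it.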
The thing I’d generalize: any “resolved / approved / done” flag is a claim about the work, not the work. If your automation acts on the claim, it will eventually act on a claim that’s true a half-second before the reality is. Gate on the artifact — the actual merged bytes — and the half-second stops mattering.
And watch the bots you dispatch to fix things. A fix-bot that resolves a thread is itself a writer in the race. Mine was racing me to my own merge gate, and for a few seconds it won.