My agent printed “3 green / 0 unresolved threads / ready to merge,” and I almost let it ship. Then GitHub refused — the merge button stayed grey, BLOCKED, with every single check green. That disagreement is the whole story.
Here’s the setup in plain terms. I was building a little gate that decides when a pull request — a proposed change to the code — is safe to merge. The gate looked at two things: the automated checks (tests passing), and the review comments left by CodeRabbit, an AI reviewer that flags problems inline. My rule was simple: zero unresolved review comments plus green checks equals go.
To count “unresolved,” I queried the PR’s review threads, which is where CodeRabbit’s comments live, for isResolved == false AND isOutdated == false. That second filter felt obviously right at the time. A thread goes “outdated” when the code around it shifts, so I figured outdated meant already dealt with. Stale. Safe to ignore.
That assumption was the bug.
One thread was marked 🔴 CRITICAL: a fail-open path where an error would silently let a request through instead of blocking it. It was real, it was still live in the current code, and it was flagged isOutdated == true. Not because anyone fixed it. Because a commit pushed after the review added a few lines above it and pushed the line numbers down. The finding hadn’t moved. The numbers had moved. My filter saw “outdated” and erased it from the count. Zero. Green. Ship it.
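The failure is easy to reproduce in a few lines. Below is a minimal Python sketch, not the actual gate: the thread list and the `note` field are made up for illustration, and only `isResolved` and `isOutdated` correspond to real review-thread fields.

```python
# Hypothetical review threads; only isResolved / isOutdated mirror real fields.
threads = [
    {"note": "nitpick: rename variable", "isResolved": True, "isOutdated": False},
    {"note": "CRITICAL: fail-open path", "isResolved": False, "isOutdated": True},
]


def count_unresolved_buggy(threads):
    """The filter described above: outdated threads are skipped entirely."""
    return sum(1 for t in threads if not t["isResolved"] and not t["isOutdated"])


def count_unresolved_all(threads):
    """Count every unresolved thread, whatever its outdated flag says."""
    return sum(1 for t in threads if not t["isResolved"])


print(count_unresolved_buggy(threads))  # 0
print(count_unresolved_all(threads))    # 1
```

The only thread that matters is the one the first function cannot see.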
What saved me was the dumber gate. GitHub’s branch protection has a setting called require_conversation_resolution, and it counts every unresolved thread — it doesn’t know or care what “outdated” means. So it held the PR at BLOCKED while my clever filter said clean. I almost overrode it. I’m glad I didn’t.
Think about why this is the nastiest kind of bug. It wasn’t in the code I was reviewing. It was in the tool I built to do the reviewing. I wrote isOutdated == false because I believed “outdated implies addressed,” and that belief became a filter, and the filter inherited my exact blind spot. A verification tool you author yourself can only catch mistakes you didn’t already make in your head — it shares your assumptions by construction.
The fix, and why it works
When counting unresolved CodeRabbit threads, drop the outdated exclusion entirely. Using the GitHub GraphQL API:
    reviewThreads(first: 100) {
      nodes { isResolved isOutdated }
    }

Count isResolved == false. Full stop. isOutdated describes line-position drift, not resolution state: a CRITICAL finding stays valid even when a later commit shuffles its anchor lines.
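As a rough sketch of that count in Python: the function below takes an already-parsed GraphQL response and counts every thread that is not explicitly resolved. The response shape (`data.repository.pullRequest.reviewThreads`), the extra `pageInfo { hasNextPage }` field, and the function name are assumptions about how the fragment would sit inside a full query, chosen for illustration.

```python
def count_unresolved_threads(response):
    """Count review threads that are not explicitly resolved.

    Assumes `response` is the parsed JSON of a query shaped like
    repository { pullRequest { reviewThreads(first: 100) {
        nodes { isResolved isOutdated } pageInfo { hasNextPage } } } }.
    """
    threads = response["data"]["repository"]["pullRequest"]["reviewThreads"]

    # Fail closed on truncation: if the response reports another page,
    # counting only this one would undercount.
    if (threads.get("pageInfo") or {}).get("hasNextPage"):
        raise RuntimeError("more than one page of review threads; refusing to count")

    # isOutdated is deliberately ignored. A missing or null isResolved counts
    # as unresolved, so malformed data can never lower the count.
    return sum(1 for t in threads["nodes"] if t.get("isResolved") is not True)


# Made-up response for illustration.
sample = {"data": {"repository": {"pullRequest": {"reviewThreads": {
    "pageInfo": {"hasNextPage": False},
    "nodes": [
        {"isResolved": True, "isOutdated": False},
        {"isResolved": False, "isOutdated": True},   # drifted but still open
        {"isResolved": False, "isOutdated": False},
    ],
}}}}}

print(count_unresolved_threads(sample))  # 2
```

Raising on a second page is one possible choice; following the cursor and summing across pages would work too, as long as a partial count is never treated as a complete one.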
Then treat the structural gate as ground truth:
    gh pr view "$PR" --json mergeStateStatus -q .mergeStateStatus
    # proceed ONLY on CLEAN

mergeStateStatus == CLEAN is fail-closed: it stays BLOCKED unless everything clears, including require_conversation_resolution. My hand-rolled filter was fail-open, defaulting to “fine” whenever my logic missed a case. When the two disagree, the coarse fail-closed gate wins, every time.
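One way to wire that into a script, sketched in Python under a few assumptions: `gh` is installed and authenticated, and the helper names `fetch_merge_state` and `may_merge` are hypothetical. The point is that every path that is not an exact `CLEAN` (a different status, an empty string, a failed command) ends in "do not merge".

```python
import subprocess


def fetch_merge_state(pr):
    """Run the gh command shown above; return the status string, or None on any failure."""
    try:
        result = subprocess.run(
            ["gh", "pr", "view", str(pr), "--json", "mergeStateStatus",
             "-q", ".mergeStateStatus"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # gh missing, not authenticated, PR not found, and so on
    return result.stdout.strip()


def may_merge(status):
    """Fail closed: only the exact string CLEAN allows a merge."""
    return status == "CLEAN"


# No network needed to see the decision table:
for status in ["CLEAN", "BLOCKED", "UNKNOWN", "", None]:
    print(repr(status), may_merge(status))
```

In a real script the two would be chained as `may_merge(fetch_merge_state(pr))`, so a broken `gh` call blocks the merge instead of waving it through.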
So the rule I follow now: when a dumb, broad safety check disagrees with my smart, narrow one, I don’t override the dumb one — I go find out why it’s unhappy. The narrow check encodes what I think matters. The broad check just counts. When they split, the gap between them is usually where my assumption is wrong.
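That rule can be written down as a tiny decision function. This is a sketch with hypothetical names (`reconcile`, `narrow_ok`, `broad_ok`), not part of any real tool: it proceeds only when both checks agree the PR is fine, and it labels any disagreement so the gap gets investigated instead of overridden.

```python
def reconcile(narrow_ok, broad_ok):
    """Combine a hand-written check with a coarse structural one.

    Returns (proceed, reason). There is no branch where the narrow
    check alone can unlock a merge.
    """
    if narrow_ok and broad_ok:
        return True, "both gates clear"
    if narrow_ok != broad_ok:
        return False, "gates disagree: investigate before doing anything else"
    return False, "both gates blocked"


# The situation from this post: my filter said clean, branch protection said BLOCKED.
print(reconcile(narrow_ok=True, broad_ok=False))
```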
Distrust the filters you wrote yourself the most. They’re the ones built out of your blind spots.