
My Merge Gate Counted Comments Instead of Asking GitHub

An automated merge gate let a MAJOR code-review finding through because it trusted proxy signals instead of querying the actual thread state.

  • ai-devflow
  • code-review
  • github
  • automation
  • coderabbit

The PR consolidated our UIQ polling — a cleanup, nothing scary — and my AI devflow merged it. Then I went and read the diff myself, because something nagged me, and found a CodeRabbit comment still sitting open: a MAJOR finding flagging an except: pass that was silently swallowing a database-probe exception. CodeRabbit is the bot that reviews our pull requests and leaves comments by severity; MAJOR is one notch below the worst. So the gate that’s supposed to stop bad merges had waved through a PR with a known, unresolved, high-severity problem — and the problem was, of all things, code that silently swallows failures. The gate did the exact thing it was protecting against.

Here’s the embarrassing part. My merge gate — the bit of automation that decides “is this PR safe to merge?” — wasn’t actually checking whether the review threads were resolved. It was checking two things that feel like they mean “resolved” but don’t: a rubric-pass flag (an internal checklist score) and the comment counts on the PR. The logic was roughly “rubric passed, comments look handled, ship it.”

Comment counts are a proxy. A thread can have plenty of comments and still be wide open. A rubric can pass while a specific reviewer thread is unaddressed. These signals correlate with mergeability — most of the time a clean rubric does mean a clean PR — so the gate looked like it worked. Right up until the one time the correlation broke.

And it had broken before. We’d had an earlier audit incident where a count-based check missed an unresolved finding, and I’d apparently patched the symptom without fixing the actual blind spot. So this was a recurrence, which stings more than a fresh bug.

The thing about a proxy is it’ll pass your tests on every day except the day you needed it.

The fix was to stop guessing and ask the source of truth. GitHub knows exactly which review threads are resolved and which aren’t — it’s right there in the GraphQL API under reviewThreads, with isResolved and isOutdated booleans per thread. So the gate now queries that directly, filters to threads that are unresolved AND not outdated, and if any of them carry a MAJOR or CRITICAL severity, it blocks the merge. No rubric flag, no comment arithmetic. The authoritative state or nothing.

Querying review-thread state instead of counting comments

The GraphQL query returns the real resolution state per thread:

query($owner:String!,$repo:String!,$pr:Int!){
  repository(owner:$owner,name:$repo){
    pullRequest(number:$pr){
      reviewThreads(first:100){
        nodes{ isResolved isOutdated
          comments(first:1){ nodes{ body author{login} } } }
      }
    }
  }
}
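For anyone who wants to try this, here is a minimal sketch of sending that query from Python using only the standard library. It is an illustration, not the gate's real code: the function names (`build_request`, `fetch_review_threads`) and the token handling are assumptions. The sketch also adds a `pageInfo { hasNextPage }` field that is not in the query above, so a PR with more than 100 threads fails loudly instead of being silently truncated.

```python
import json
import urllib.request
from typing import Any, Dict, List

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Same shape as the query above, plus pageInfo (an addition for this sketch).
QUERY = """
query($owner:String!,$repo:String!,$pr:Int!){
  repository(owner:$owner,name:$repo){
    pullRequest(number:$pr){
      reviewThreads(first:100){
        pageInfo{ hasNextPage }
        nodes{ isResolved isOutdated
          comments(first:1){ nodes{ body author{login} } } }
      }
    }
  }
}
"""


def build_request(owner: str, repo: str, pr: int, token: str) -> urllib.request.Request:
    """Build the POST request for the GraphQL endpoint (no network I/O here)."""
    payload = json.dumps(
        {"query": QUERY, "variables": {"owner": owner, "repo": repo, "pr": pr}}
    ).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_review_threads(owner: str, repo: str, pr: int, token: str) -> List[Dict[str, Any]]:
    """Return the review-thread nodes, raising rather than guessing on any doubt."""
    with urllib.request.urlopen(build_request(owner, repo, pr, token), timeout=30) as resp:
        result = json.load(resp)
    if "errors" in result:
        raise RuntimeError(f"GraphQL errors: {result['errors']}")
    threads = result["data"]["repository"]["pullRequest"]["reviewThreads"]
    if threads["pageInfo"]["hasNextPage"]:
        # Fail closed: a partial list is exactly the kind of proxy to avoid.
        raise RuntimeError("More than 100 review threads; pagination needed")
    return threads["nodes"]
```

Raising on GraphQL errors and on a truncated page is a deliberate choice in this sketch: if the gate cannot read the full thread state, it should refuse to answer rather than answer from partial data.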

Gate logic: keep threads where isResolved=false AND isOutdated=false, parse the CodeRabbit severity tag from the first comment body, and block if any surviving thread is MAJOR/CRITICAL. Outdated threads are excluded so a since-rewritten line doesn’t block forever — but resolution status is read from GitHub, never inferred from a comment tally.
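The gate logic described above can be sketched in a few lines of Python. This is a hedged illustration, not the actual implementation: the function names are made up, and the severity parsing assumes the severity word (for example "Major" or "Critical") appears somewhere in the first line of the thread's first comment, which may not match the exact CodeRabbit format.

```python
import re
from typing import Any, Dict, List, Optional

BLOCKING_SEVERITIES = {"MAJOR", "CRITICAL"}

# Assumption: the severity is a standalone word in the first line of the body.
# Lookarounds are used instead of \b so markdown like "_Major_" still matches.
SEVERITY_RE = re.compile(
    r"(?<![A-Za-z])(critical|major|minor|trivial)(?![A-Za-z])", re.IGNORECASE
)


def parse_severity(body: str) -> Optional[str]:
    """Return the upper-cased severity from the first line, or None if absent."""
    lines = body.strip().splitlines()
    if not lines:
        return None
    match = SEVERITY_RE.search(lines[0])
    return match.group(1).upper() if match else None


def blocking_threads(threads: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep threads that are unresolved, not outdated, and MAJOR/CRITICAL."""
    blockers = []
    for thread in threads:
        # Resolution state comes straight from GitHub's fields, never a tally.
        if thread["isResolved"] or thread["isOutdated"]:
            continue
        comments = thread["comments"]["nodes"]
        if not comments:
            continue
        if parse_severity(comments[0]["body"]) in BLOCKING_SEVERITIES:
            blockers.append(thread)
    return blockers


def may_merge(threads: List[Dict[str, Any]]) -> bool:
    """The gate: merge only when no blocking thread survives the filter."""
    return not blocking_threads(threads)
```

The input is the list of `nodes` from the `reviewThreads` query. One thing this sketch leaves open is what to do with an unresolved thread whose severity cannot be parsed; here it does not block, and a stricter gate might treat it as blocking.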

The general move: when you build a gate on top of someone else’s system, find the field that is the answer and read that field. If you’re deriving the answer from things that merely travel alongside it — counts, scores, “looks handled” — you’ve built a gate that passes until the day the proxy and the truth disagree, which is precisely the day you built the gate for. Go look at your own automated checks and ask each one: am I reading the state, or am I guessing at it?