← All posts

A Comment Count Is Not a Merge Verdict

How a PR merged past two unresolved CodeRabbit MAJOR threads because I trusted the wrong GitHub API — and the GraphQL query that fixed it.

  • github
  • ai-agents
  • code-review
  • graphql
  • automation

A pull request merged one night with two unresolved CodeRabbit threads still flagged MAJOR. CodeRabbit is an AI reviewer that leaves inline comments on a PR — a pull request, the bundle of code changes someone wants to merge into the main project. Two of its comments said this is a real problem and nobody had addressed them. The PR merged anyway. My audit had said it looked fine.

I do a night-shift role auditing PRs that an AI coding agent produces, before a human approver — who’s asynchronous, often asleep — gives the final merge. My job is to snapshot the state of each PR so the morning review is fast. The snapshot I took that night was honest. It was just answering a different question than the one that got asked.

Here’s the trap. To count the conversation on a PR, the obvious move is GitHub’s REST /comments endpoint — REST being the simple, ask-a-URL-get-a-list flavor of GitHub’s API. It returns comments. So I counted them and noted the state at that moment. Call it T0, time zero.

But /comments counts top-level comments. It does not know anything about review threads — the little resolvable conversations attached to a specific line, the things with a “Resolve” button. A thread can be wide open and screaming and never move that comment number in a way that means “unresolved.” I was counting envelopes and reporting on whether the letters inside were answered.

The second mistake was time. A T0 snapshot is a photograph. Thread state drifts in seconds — someone resolves one, the agent pushes a commit that outdates another, CI flips. By the time anyone reads the photo, it’s describing a PR that no longer exists. My baseline got skim-read as a merge acknowledgment, and a photo of a calm intersection got treated as permission to drive through it.

The fix was to ask the API layer that actually exposes the thing I was gating on. GitHub’s GraphQL API has reviewThreads, with isResolved and isOutdated fields. The gate is: count threads where isResolved == false AND isOutdated == false, and require that count to be zero — plus CI green, plus mergeStateStatus == CLEAN — all re-queried at the instant before merge, not from any earlier snapshot.

The GraphQL gate that replaced the comment count give me the detail

REST /issues/{n}/comments answers “how many comments” — wrong semantics. Query the field that means what you’re gating on:

query($owner:String!, $repo:String!, $pr:Int!) {
  repository(owner:$owner, name:$repo) {
    pullRequest(number:$pr) {
      mergeStateStatus           # require CLEAN
      reviewThreads(first:100) {
        nodes { isResolved isOutdated }
      }
    }
  }
}

Gate logic, run atomically right before the merge call:

blocking = [t for t in threads if not t.isResolved and not t.isOutdated]
ok = len(blocking) == 0 and mergeStateStatus == "CLEAN" and ci_green

isOutdated matters: a thread on code that’s since been rewritten shouldn’t block. Re-query inside the same step that merges — never trust the audit snapshot.

The general principle is boring and I keep relearning it: a point-in-time baseline does not generalize to “ready now.” If you’re going to act on a condition, query the API that exposes the exact semantics of that condition, and re-check every gate condition together at the moment you act — because anything you measured earlier is already stale.