Green CI Is Necessary, Not Sufficient

My agent pushed a fix, watched all the CI checks go green — that’s the automated test suite passing, the little checkmarks GitHub shows you — and pinged me: ready to merge?

And I told it what I always tell it. Wait for CR.

CR is CodeRabbit, an AI bot that reviews pull requests. (A pull request, or PR, is just a proposed batch of code changes waiting to be merged into the real codebase.) Earlier in the PR, CodeRabbit had left a handful of review comments — do this differently, you missed a null check here, this name is confusing. My agent went through, made changes, and marked each thread resolved. Then the tests passed. From the agent’s point of view, the job was done.

Here’s the thing I’ve learned about coding agents, including the good ones: they are a little too eager to call a thread “resolved.” Not lying, exactly. More like an overconfident student who reads three of the four parts of a question, answers those well, and writes “done” at the bottom. The agent believes it addressed the feedback. Sometimes it only addressed the part it found easy to address.

Green CI doesn’t catch that. CI checks whether the code runs and the tests pass. It has no opinion on whether you actually did what the reviewer asked. Those are different questions. A change can be perfectly correct and completely ignore the point of the review comment.

So the merge trigger in my setup isn’t “checks are green.” It’s “CodeRabbit re-reviewed the exact commit I just pushed and posted an APPROVED review.”

That second clause matters more than it looks. I don’t want CodeRabbit’s approval from before the fix — that’s stale. I want it to look at the new HEAD, the latest commit, the one with the actual changes, and sign off on that. The approval has to be keyed to the commit SHA, the unique fingerprint of that specific version of the code.

Why does this work? Because it’s a second, independent reader. The agent that wrote the fix is the worst possible judge of whether the fix is complete — it’s grading its own homework. CodeRabbit comes in cold, re-reads every thread against the new code, and notices when a comment was waved away instead of handled. The independence is the whole point. You can’t self-verify your way out of your own blind spot; you need an outsider’s eyes.

Wiring the re-review gate give me the detail

The mechanism: poll the GitHub reviews API for an APPROVED review whose commit_id matches your post-fix HEAD. Stale approvals on older SHAs don’t count.

HEAD_SHA=$(git rev-parse HEAD)
gh api repos/:owner/:repo/pulls/$PR/reviews \
  --jq "[.[] | select(.state==\"APPROVED\" and .commit_id==\"$HEAD_SHA\")] | length"

A non-zero result is the only thing that flips my agent’s merge flag. Green CI is a separate precondition, AND-ed in — never the trigger by itself. The bot pushes a new review on every new commit, so the SHA match guarantees the approval reflects the code you’re about to merge, not the code from two pushes ago.

If you’re letting agents drive PRs to merge, add this gate. Don’t make green CI the go signal. Make it one of two conditions, and make the second an independent reviewer — human or bot — whose approval is pinned to the post-fix commit. The agent that wrote the code shouldn’t get to decide it’s done.