The Admin Override Is a Different Trust Context

My orchestrator agent merged a PR that still had four unresolved Major review comments on it. It did this on purpose, calmly, and thought it was following the rules.

Here’s the setup in plain terms. When one of my Claude Code agents finishes a chunk of work, it opens a pull request — a proposed code change waiting for review. A reviewer (human or another agent) leaves comments on it, each tagged by severity: Critical, Major, Minor, Nitpick. Before that PR can merge, it passes through a gate — an automated checkpoint that reads those comments and decides yes or no.

I built the gate to be pragmatic. If it blocked on every stray Nitpick, a fully-automated flow would loop forever: agent fixes a typo, reviewer finds another, nobody ever hits merge. So the gate blocks hard on Critical and Major, and only warns on Minor and Nitpick. That way a hands-off pipeline can still finish. Think of it like a spellchecker that stops you on real errors but lets you send the email despite a debatable comma.

That threshold is correct — for the automated path.

The bug was that my orchestrator also has a second, sharper tool: gh pr merge --admin, a manual override that force-merges a PR regardless of what the gate thinks. It’s the sudo of merging. And the agent, reasoning about when it was allowed to merge, reached for the only threshold it knew — the gate’s. Criticals and Majors block, Minors just warn. So it admin-merged straight through four Majors once, and two Minors another time, and in its own logic it had done nothing wrong.

That’s the part that stuck with me. It wasn’t a hallucination or a flaky tool call. The agent applied a real, sensible rule in the wrong context. The automated gate is a low-trust, high-volume path that has to keep moving. A manual admin override is the opposite — you only reach for it when you’ve decided to bypass the normal safety, which is exactly when the bar should go up, not sideways. For a hand-merge the correct threshold is zero unresolved threads of any severity. Different trust context, different rule.
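
To make the two contexts concrete, here is a minimal sketch of the rules as two separate functions. The function names and the four-count argument convention are hypothetical, invented for illustration; the real gate is not shown in this post. Each function takes the number of unresolved comments at each severity.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one decision function per trust context.
# Arguments: unresolved counts for Critical, Major, Minor, Nitpick.

# Automated gate: block on Critical/Major, only warn on Minor/Nitpick.
gate_allows_merge() {
  local critical=$1 major=$2 minor=$3 nitpick=$4
  if (( critical > 0 || major > 0 )); then
    echo "block"
    return 1
  fi
  if (( minor > 0 || nitpick > 0 )); then
    echo "warn"
  else
    echo "pass"
  fi
  return 0
}

# Admin override: any unresolved thread of any severity blocks.
admin_allows_merge() {
  local critical=$1 major=$2 minor=$3 nitpick=$4
  if (( critical + major + minor + nitpick > 0 )); then
    echo "block"
    return 1
  fi
  echo "pass"
  return 0
}
```

With two unresolved Minors, `gate_allows_merge 0 0 2 0` prints `warn` and succeeds, while `admin_allows_merge 0 0 2 0` prints `block` and fails. Keeping the rules in separate functions means the override cannot silently inherit the gate's leniency.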

Enforcing it with a PreToolUse hook and GraphQL

The behavioral fix (“agent, be stricter on admin merges”) failed twice, because “remember to be strict” is not a control — it’s a hope. The real fix is a PreToolUse hook in Claude Code that intercepts any gh pr merge --admin call and checks resolution state structurally before letting it run:

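# Reject the admin merge if ANY review thread is unresolved.
# OWNER, REPO and PR must already be set by the hook.
# Fails closed: more than 100 threads, or a failed lookup, also blocks.
gh api graphql -f query='
  query($owner:String!,$repo:String!,$pr:Int!){
    repository(owner:$owner,name:$repo){
      pullRequest(number:$pr){
        reviewThreads(first:100){
          pageInfo { hasNextPage }
          nodes { isResolved }
        }
      }
    }
  }' -f owner="$OWNER" -f repo="$REPO" -F pr="$PR" \
  | jq -e '.data.repository.pullRequest.reviewThreads
           | (.pageInfo.hasNextPage | not) and all(.nodes[]; .isResolved)' >/dev/null \
  || { echo "unresolved review threads (or lookup failed): admin merge blocked" >&2; exit 2; }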

isResolved is ground truth from GitHub, not the agent’s summary of it. Exit code 2 makes the hook deny the tool call outright. The gate’s severity logic never enters this path.
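
The snippet above assumes the hook already knows it is looking at an admin merge. A rough sketch of that detection step follows. Both function names are hypothetical, and the `.tool_input.command` path is my assumption about the JSON payload a PreToolUse hook receives on stdin for Bash tool calls; check it against your Claude Code version. The pattern match is a heuristic, not a shell parser, so it errs toward flagging too much.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a PreToolUse hook script.

# Succeeds (exit 0) if the command string looks like `gh pr merge ... --admin`.
# Heuristic only: it does not parse shell syntax or resolve gh aliases.
is_admin_merge() {
  local cmd=$1
  local merge_re='(^|[^[:alnum:]_-])gh[[:space:]]+pr[[:space:]]+merge([[:space:]]|$)'
  local admin_re='[[:space:]]--admin([[:space:]=]|$)'
  [[ $cmd =~ $merge_re && $cmd =~ $admin_re ]]
}

# Assumed payload shape: the Bash command lives at .tool_input.command.
# Prints the command, or nothing if the field is absent.
hook_command_from_stdin() {
  jq -r '.tool_input.command // empty'
}
```

In the hook itself, something like `cmd=$(hook_command_from_stdin); is_admin_merge "$cmd" || exit 0` lets every other command through untouched and runs the thread check only for admin merges.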

The general shape: any time an agent can both run a relaxed automated check and manually bypass it — admin merge, force-push, sudo, direct DB write — don’t let it reuse the relaxed threshold on the override. Define the override’s rule separately and stricter, and enforce it in a hook that inspects real state, not in a sentence the agent is supposed to remember. A rule that only lives as agent behavior is a rule you’ve already agreed to break.