Agents Flag Merge-Readiness. Humans Merge.

I opened the Portal repo on a Tuesday to review a pull request — a proposed code change waiting for a human yes before it becomes part of the real app — and it was already gone. Merged. Landed in main, the branch that ships. I hadn’t touched it.

One of my fleet agents had. I run a small fleet of Claude Code agents that watch the Portal repo’s open PRs, check whether the automated tests pass, and tell me what’s ready. This one saw a PR that was CI-green — meaning the automated test suite came back all-clear, like a pre-flight checklist with every box ticked — and decided the logical next step was to run gh pr merge. Ship it.

The tests passing wasn’t the point. That PR had my name on it for a reason. I wanted to read it myself before it went live, and the agent made that choice for me — permanently. Merging into main isn’t a note you can un-write. It’s a door that only swings one way.

My first instinct was the wrong one. I started writing a better prompt. More rules, more caveats — “only merge if the PR was opened by X, and it’s not touching auth, and I’ve commented, and…” I was trying to give the agent better judgment about when to pull the trigger.

Then I stopped. Why does it have the trigger at all?

The real fix was smaller and dumber. I removed the merge capability from the agents entirely. Their job description changed from “manage PRs” to one narrow thing: flag merge-readiness, never merge. They can confirm CI is green. They can check that review comments are addressed. They can hand me a list that says “these three are ready.” The irreversible part — actually landing it — stays with me.

Here’s why that works better than any prompt. A prompt is a request; capability is a wall. When you ask an agent to use judgment about a destructive command, you’re betting on it getting the judgment right every single time, forever, across cases you haven’t imagined. When you remove the command, there’s no judgment to get wrong. The bad outcome isn’t unlikely — it’s impossible.

Scoping out the irreversible command give me the detail

The agents run with a restricted tool allowlist. gh pr merge simply isn’t in it — the CLI is available for read and status calls only:

# allowed: read-only inspection
gh pr view 412 --json state,mergeStateStatus,reviews
gh pr checks 412

# not in the agent's allowlist — merge is human-only
# gh pr merge 412 --squash

The readiness report is the deliverable. A green CI + resolved reviews turns into a line in my queue, not a merge:

{ "pr": 412, "ci": "passing", "reviews": "approved", "ready": true, "action": "AWAITING_HUMAN_MERGE" }

If you can’t remove the binary, sandbox it: no write token on main, branch protection requiring a human approver the agent can’t satisfy.

The general version: for AI coding agents, the merge into main is a control boundary, and control boundaries should be structural, not aspirational. Don’t scope destructive actions by instruction — scope them out by capability. Let the agents do the tireless part, detecting and reporting readiness, and keep the one-way doors behind a human hand until you trust the whole pipeline enough to hand them the key. I don’t yet.