A change went to production that shouldn’t have — an unreviewed migration and a semantics change rode the pipeline straight to live. So we did the responsible-looking thing. We froze deploys. Pipeline disabled, a watchdog set to keep it disabled, the whole flow held while we built a proper gate.
It felt like control. It was the wrong instrument, and I enforced it perfectly for days before I noticed.
How a freeze rots
The freeze did its job at first. Then it did more than its job. It blocked everything — including a copy change that had been reviewed, visually checked, and approved, whose only sin was that a file path happened to contain a word the crude risk-detector matched on. A safe, trivial, wanted change sat blocked behind a blanket halt meant for migrations.
I had optimized for holding the gate correctly and never stepped back to ask whether a durable, blanket gate was the right thing at all. That’s the trap: a freeze is easy to enforce and feels safe, so it stops being a stopgap and becomes the status quo. A freeze with no expiry that blocks safe changes isn’t safety. It’s a defect wearing a safety vest.
Safety belongs in the flow
The default has to be that changes flow to production continuously. The burden of proof is on blocking a deploy, never on allowing one. And the way you make a continuous flow safe is not a halt — it’s checks that run as steps in the pipeline and stop a deploy only when they find real risk:
- Narrow, automated risk detection. Gate on the things that are actually dangerous — schema migrations, data-semantics changes, anything touching sensitive data. Let copy, UI, and ordinary code flow. A detector that blocks a text change because the path contains a scary word is the blanket freeze in miniature; false positives on safe changes are the whole problem, shrunk.
- Real review of the risky surface, with evidence. For anything user-visible, that means actually opening it — not “the page returns 200,” but confirming the real behavior, with a screenshot.
- Impact analysis on the diff, so the blast radius is understood before it ships, not after.
None of those stop the flow by default. They stop one deploy when there’s a real reason.
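To make the default-allow shape concrete, here’s a minimal Python sketch of gates as pipeline steps. Everything in it is an assumption for illustration: the `Change` record, the gate names, and the path conventions (migrations under a `migrations/` directory, user-visible templates ending in `.html`) are hypothetical, not the real pipeline. Each gate returns a reason to block or nothing at all, and a deploy is held only when some gate returns a reason.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Change:
    """Hypothetical summary of one deploy candidate."""
    paths: Sequence[str]
    screenshot_attached: bool = False


# A gate returns a human-readable reason to block, or None to let the change flow.
Gate = Callable[[Change], Optional[str]]


def migration_gate(change: Change) -> Optional[str]:
    # Assumed convention: schema migrations live in a directory named "migrations".
    hits = [p for p in change.paths if "migrations" in p.split("/")[:-1]]
    return f"schema migration needs review: {hits}" if hits else None


def ui_evidence_gate(change: Change) -> Optional[str]:
    # Assumed convention: user-visible templates end in ".html".
    touches_ui = any(p.endswith(".html") for p in change.paths)
    if touches_ui and not change.screenshot_attached:
        return "user-visible change has no screenshot evidence"
    return None


GATES: List[Gate] = [migration_gate, ui_evidence_gate]


def blocking_reasons(change: Change, gates: Sequence[Gate] = GATES) -> List[str]:
    """Collect every reason to hold this deploy; an empty list means ship it."""
    reasons = []
    for gate in gates:
        reason = gate(change)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

The design choice that matters is the return type: there is no "allowed" flag to set, only reasons to block, so a change that no gate objects to flows through without anyone having to approve it.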
When a freeze is legitimate
Almost never, and only as an emergency instrument. If you ever do reach for one, it has to (1) name the specific risk class it blocks, not “all deploys”; (2) carry an explicit expiry or lift condition; and (3) be replaced by an in-flow check as fast as you can build one. A freeze that is missing either a named risk or an expiry is rejected on sight.
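Those three conditions are mechanical enough to check in code. Here’s a rough Python sketch, assuming a freeze is recorded as a small structured object. The `Freeze` fields, the `BLANKET_SCOPES` set, and both function names are hypothetical, not taken from any real tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Scopes that amount to "block everything" and are never acceptable (illustrative list).
BLANKET_SCOPES = {"", "*", "all", "all deploys", "everything"}


@dataclass(frozen=True)
class Freeze:
    risk_class: str                           # (1) e.g. "schema-migration"
    expires_at: Optional[datetime] = None     # (2) explicit expiry...
    lift_condition: Optional[str] = None      # (2) ...or a stated lift condition
    replacement_check: Optional[str] = None   # (3) the in-flow check meant to retire it


def freeze_problems(freeze: Freeze) -> List[str]:
    """Return every reason this freeze should be rejected; empty means acceptable."""
    problems = []
    if freeze.risk_class.strip().lower() in BLANKET_SCOPES:
        problems.append("name the specific risk class")
    if freeze.expires_at is None and not freeze.lift_condition:
        problems.append("add an expiry or a lift condition")
    if not freeze.replacement_check:
        problems.append("name the in-flow check that replaces it")
    return problems


def is_in_force(freeze: Freeze, now: datetime) -> bool:
    """A freeze only holds if it is well-formed and has not expired."""
    return not freeze_problems(freeze) and (
        freeze.expires_at is None or now < freeze.expires_at
    )
```

Under this sketch, the freeze from the start of this story (`Freeze(risk_class="all deploys")`) fails all three checks and is never in force, which is exactly the point.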
The deeper habit this taught me: be suspicious of any default-to-blocking. When the team, or you, reaches for a blunt halt instead of building the safety into the flow, push back. Enforcing the wrong instrument correctly is still wrong. The goal was never “nothing bad ships.” It was “good things ship continuously and bad things get caught on the way.” Those are different goals, and only the first is served by a freeze.
What the in-flow version looks like
Concretely, each gate is a step in the pipeline — for me, a GitHub Actions job that conditions on the content of the diff, not a blanket switch:
# run the risk gate only when the diff actually touches a risky surface
- name: risk gate
  run: ./scripts/risk-gate.sh  # scans changed paths for migrations / data-semantics / PII
# UI changes get a real-browser visual check with a screenshot attached to the PR
# everything else flows straight through to the deploy step
The detector is narrow on purpose: it matches schema migrations, data-semantics changes, and anything touching sensitive fields. It matches on what a change actually is, not on a scary word in a path (the false positive that started this whole story was a copy change blocked because its file path contained the word “consent”). Copy, UI, and ordinary code never trip it. The site you’re reading deploys this way: push to main, Cloudflare Pages builds and ships, and the only thing that can stop a deploy is a check that found a real reason.
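The difference between the crude detector and a narrow one is easy to show side by side. This is a Python sketch under stated assumptions, not the contents of the real `risk-gate.sh`: the word list, the directory names (`migrations`, `pii`), the `.sql` rule, and the example paths are all made up for illustration. Path structure is also only one proxy for what a change is; a fuller detector might inspect the diff itself.

```python
from pathlib import PurePosixPath
from typing import Optional

# The crude approach: flag any path that contains a scary-looking word.
SCARY_WORDS = ("consent", "migration")


def crude_detector(path: str) -> bool:
    return any(word in path.lower() for word in SCARY_WORDS)


# The narrow approach: match on where a file lives and what kind of file it is.
# Both mappings are assumed repository conventions, not real project config.
RISKY_DIRS = {"migrations": "schema-migration", "pii": "sensitive-data"}
RISKY_SUFFIXES = {".sql": "schema-migration"}


def risk_class(path: str) -> Optional[str]:
    """Return the risk class a changed path falls under, or None if it can flow."""
    p = PurePosixPath(path)
    for directory in p.parts[:-1]:  # directories only; the filename is not a signal
        if directory in RISKY_DIRS:
            return RISKY_DIRS[directory]
    return RISKY_SUFFIXES.get(p.suffix.lower())
```

With a hypothetical copy file at `content/copy/consent-banner.md`, `crude_detector` flags it and `risk_class` lets it through, while `db/migrations/0042_add_consent_flag.py` is still caught as a schema migration.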
Built on: GitHub Actions for the pipeline gates · Cloudflare Pages for continuous deploys. The principle is straight out of Accelerate / DORA — elite teams deploy continuously and have lower change-failure rates; the two aren’t in tension, which is the whole point.