← All posts

zsh Doesn't Split Your Variables, and Silence Is Not Success

An 8-hour SSH monitor that ran perfectly and did absolutely nothing, because a bash habit doesn't survive in zsh.

  • zsh
  • shell
  • monitoring
  • ssh
  • debugging

I armed an 8-hour monitor on a remote server to watch an authentication flow, walked away, and came back to heartbeats — the little “still alive, here’s what I see” status pings the monitor emits — full of question marks. Every field it was supposed to fill in read ?. The monitor itself looked healthy. It had rearmed itself three times, right on schedule, cheerfully reporting nothing.

If you’re not a shell person: I’d written a small script that logs into another machine over SSH (a remote terminal connection), runs a check, and reports back every few minutes. The script ran fine. It just wasn’t actually checking anything. And it had no idea.

Here’s the line that betrayed me. I stored the SSH command in a variable to keep things tidy, a pattern I’ve typed a thousand times in bash:

SSH="ssh -o ConnectTimeout=10 user@host"
$SSH "run the real check"

In bash, $SSH gets word-split — the shell chops that string on spaces into ssh, -o, ConnectTimeout=10, and so on, exactly as if I’d typed them. In zsh, it does not. zsh takes the whole unquoted string and goes looking for a single program literally named ssh -o ConnectTimeout=10 user@host, spaces and all. No such program exists. It fails with exit code 127, prints nothing useful, and moves on.

That was the whole disaster. macOS switched its default shell to zsh years ago, so a script I’d have sworn was portable quietly changed meaning the moment it ran in a login shell instead of the bash I tested it in.

The fix is small once you know the rule. zsh has explicit split syntax:

${=SSH} "run the real check"

The = tells zsh yes, actually split this on whitespace. Or skip the variable entirely and write the command inline, or wrap it in a function. Any of those work.

But the variable wasn’t the real bug. The real bug was that my monitor treated silence as success. When the ssh call produced no output, the monitor shrugged and filled the heartbeat with ? instead of screaming. Three rearm cycles — hours — burned before I noticed, because nothing ever went red. A monitor that can’t tell “everything’s fine” from “I checked nothing” isn’t a monitor. It’s a clock.

So now every long-running script I arm starts with a probe: run the real command once, up front, and exit non-zero if the output is garbage. A config bug surfaces in five seconds at startup, loudly, instead of hiding inside the first broken heartbeat. If your health check can pass while doing nothing, it will — and it’ll pick the least convenient moment to tell you.

The startup probe give me the detail

The pattern that saved me is a fail-fast preflight before the monitor loop arms. It runs the exact command the loop will run, and refuses to start on empty or malformed output:

SSH_ARGS=(ssh -o ConnectTimeout=10 user@host)

probe() {
  local out
  out="$("${SSH_ARGS[@]}" 'auth-check --format=json' 2>/dev/null)"
  if [[ -z "$out" || "$out" != \{* ]]; then
    print -u2 "PROBE FAILED: got '${out:-<empty>}'"
    exit 1
  fi
}

probe   # dies in seconds if ssh no-ops

Note the array SSH_ARGS=(...) — in zsh, "${SSH_ARGS[@]}" expands to properly separated words without any of the ${=VAR} splitting ambiguity, and it survives paths with spaces. Arrays are the portable answer; string-splitting is the trap. The != \{* check means “if this doesn’t start with a {, it’s not the JSON I expected” — exit 127 from a failed command produces empty output, which this catches immediately.