The Daemon Was Running. The Socket Wasn't There.

A per-user systemd service can be perfectly alive and still invisible to its clients — because the directory its socket lives in only exists while you're logged in.

  • systemd
  • linux
  • ai-agents
  • devops
  • daemons

I rolled out moshi-hook to about sixteen Linux boxes and watched systemctl tell me, on every single one, that the service was active. Green across the board. Then I tried to approve an agent action from my phone and nothing happened.

moshi-hook is a little daemon — a background program — that I run on each machine to bridge my AI coding agent to a phone app. When Claude Code wants to do something risky and pauses to ask permission, the hook fires, moshi-hook catches it, and I get a tap-to-approve prompt on my phone over a WebSocket (a live two-way connection that stays open, unlike a normal web request that asks once and hangs up). The point is I don’t have to be sitting at the laptop.

So: service running everywhere, and the hooks couldn’t find it anywhere. That’s the worst kind of broken, because every dashboard says you’re fine.

The hooks talk to moshi-hook through a Unix socket: think of it as a private mailbox file on disk that two programs use to pass messages. moshi-hook puts its mailbox at /run/user/<uid>/moshi-hook.sock. That /run/user/<uid> directory is your personal scratch space, and here’s the part I’d forgotten: by default it only exists while you’re actually logged in. Close your last session and systemd-logind (not the kernel) sweeps it away.
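If you want to watch the directory come and go, a small shell sketch like this reports whether it exists for a given uid. The helper name runtime_dir_status and its optional second argument are made up for illustration; neither is part of moshi-hook or systemd.

```shell
#!/bin/sh
# runtime_dir_status UID [BASE]
# Hypothetical helper: reports whether BASE/UID exists as a directory.
# BASE defaults to /run/user; it is a parameter only so the check can be
# pointed at a scratch directory when experimenting.
runtime_dir_status() {
    uid="$1"
    base="${2:-/run/user}"
    if [ -d "$base/$uid" ]; then
        echo "present: $base/$uid"
    else
        echo "missing: $base/$uid"
    fi
}

# Check the current user's runtime directory.
runtime_dir_status "$(id -u)"
```

Run it from an SSH session and it should say "present". The interesting case is running it from a context where that user has no session at all.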

My systemd service started at boot, as the login user, with nobody logged in. So /run/user/<uid> either didn’t exist or was a stripped-down version, and XDG_RUNTIME_DIR — the environment variable that’s supposed to point the daemon at that folder — wasn’t set. moshi-hook fell back to putting its socket somewhere else. The hooks looked in the canonical spot, found nothing, and failed silently.
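The failure mode is easy to reproduce in miniature. This sketch assumes a daemon that falls back to /tmp when XDG_RUNTIME_DIR is unset. Both that fallback path and the function name resolve_socket_path are illustrative assumptions, not moshi-hook’s actual code.

```shell
#!/bin/sh
# Hypothetical sketch of how a daemon might choose its socket path.
# The /tmp fallback is an assumption made for this example only.
resolve_socket_path() {
    # Use XDG_RUNTIME_DIR when it is set and non-empty, otherwise fall back.
    dir="${XDG_RUNTIME_DIR:-/tmp}"
    printf '%s/moshi-hook.sock\n' "$dir"
}

# Variable set: the socket lands in the canonical spot.
( XDG_RUNTIME_DIR=/run/user/1000; resolve_socket_path )
# prints /run/user/1000/moshi-hook.sock

# Variable unset: the same code quietly picks a different path.
( unset XDG_RUNTIME_DIR; resolve_socket_path )
# prints /tmp/moshi-hook.sock
```

Nothing errors in the second case, which is the whole problem: the daemon is happy, and the clients are looking somewhere else.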

I spent a while suspecting the WebSocket, the firewall, the phone. All red herrings. The daemon was healthy. It was just shouting into a mailbox in a different building.

Two changes fixed it. First, loginctl enable-linger <user>: this tells systemd-logind to start that user’s service manager at boot and keep their runtime directory alive even with nobody logged in, like leaving the lights on in an empty office. Second, I pinned Environment=XDG_RUNTIME_DIR=/run/user/<uid> right in the unit file so the daemon stops guessing.
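To confirm lingering actually took effect on a box, you can ask logind directly. A minimal sketch, assuming loginctl is on the PATH; linger_enabled is a hypothetical helper name and youruser is a placeholder.

```shell
#!/bin/sh
# linger_enabled USER
# Hypothetical helper: succeeds only if logind reports Linger=yes for USER.
# If loginctl is missing or does not know the user, the output is empty and
# the check fails, which is the safe answer.
linger_enabled() {
    [ "$(loginctl show-user "$1" --property=Linger 2>/dev/null)" = "Linger=yes" ]
}

# Usage (youruser is a placeholder):
#   linger_enabled youruser && echo "linger on" || echo "linger off"
```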

The unit, and the verify step that actually means something

The fix in the systemd unit:

[Service]
# 1000 is an example uid; substitute the output of `id -u youruser`
Environment=XDG_RUNTIME_DIR=/run/user/1000
ExecStart=/usr/local/bin/moshi-hook

And once, as root:

loginctl enable-linger youruser

The real lesson is the health check. systemctl is-active only proves the process is alive — it tells you nothing about whether the socket exists where clients look. So I check both:

systemctl --user is-active moshi-hook \
  && [ -S /run/user/1000/moshi-hook.sock ] \
  && echo "reachable" || echo "running-but-deaf"

The [ -S ... ] test (-S = “is this a socket?”) is what separates “running” from “actually reachable.” Run it on every box after deploy.
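With sixteen boxes, that check wants a loop. Here is a rough sketch of one, assuming key-based SSH access and a plain text file with one hostname per line. The function name check_hosts, the file name hosts.txt, and the swappable runner argument are all illustrative, not something moshi-hook ships.

```shell
#!/bin/sh
# Same two-part check as above, run on the remote side. Single quotes keep
# $(id -u) from expanding locally, so each box uses its own uid.
REMOTE_CHECK='systemctl --user is-active --quiet moshi-hook && [ -S "/run/user/$(id -u)/moshi-hook.sock" ]'

# check_hosts FILE [RUNNER]
# FILE holds one hostname per line. RUNNER defaults to ssh; it is a parameter
# only so the loop can be dry-run with a stand-in command.
check_hosts() {
    hosts_file="$1"
    runner="${2:-ssh}"
    failed=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue   # skip blank lines
        # </dev/null stops the runner from swallowing the rest of the host list
        if "$runner" "$host" "$REMOTE_CHECK" </dev/null >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
            failed=1
        fi
    done < "$hosts_file"
    return "$failed"
}
```

Call it as check_hosts hosts.txt. It exits non-zero if any box fails, so it can gate a deploy script. Note that "NOT reachable" lumps together a dead service, a missing socket, and an SSH failure; for a post-deploy smoke test I think that is the right level of bluntness.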

The general trap: any user-scoped daemon you run as a system service depends on a runtime directory that login normally creates for you. Take login out of the loop and the floor disappears. So enable lingering, pin XDG_RUNTIME_DIR, and never trust a health check that only asks whether the process is breathing — ask whether anyone can reach it.