The boss came back with a clean little report: 22 panes were stuck. Each one had hit the rate-limit dialog — Claude Code’s “you’ve used up your budget for now” wall — and was sitting there frozen, waiting. The boss laid it all out neatly and asked me what I wanted to do.
I run a fleet of Claude Code sessions, each in its own tmux pane. (tmux is just a way to keep a bunch of terminal windows alive side by side on one machine.) Over the top of them sits a “boss” agent whose whole job is to watch the panes and keep them running. When one hits a limit — like a phone plan that throttles you after you blow through your monthly data — the boss is supposed to rescue it.
So I looked at this tidy report of 22 dead panes and felt something curdle. It had done the detection perfectly. Then it stopped and asked me to bless the obvious.
That was my mistake, baked into how I’d built it. I’d treated “this pane is rate-limited” as a finding to escalate. But there’s nothing to decide here. Dismissing a dialog and relaunching a session isn’t a judgment call — it’s a chore. Asking me to approve it is like a smoke detector that beeps and then waits for you to authorize the alarm.
The fix was to widen the boss’s charter: mechanical recovery happens silently, no permission slip. When a pane is walled, the boss clears the dialog — Escape, then Ctrl-C, then Enter — and relaunches the same session by its UUID, so the agent picks up exactly where it left off. The relaunch goes through a small launcher I call cl, which checks which of my accounts still have budget and routes to one that isn’t capped, falling back to a different provider when they all are.
Here’s the line I now draw on every agent action: is this a decision or a recovery? A decision changes something in the world that I’d want a say in — deleting data, spending money, shipping code. A recovery just restores a known-good state. Decisions escalate. Recoveries execute.
Get this wrong in the cautious direction and you drown yourself in approve-this prompts for things the system already knows how to fix. The whole point of a supervisor is that I don’t watch it. The moment it asks me to confirm a chore, I’m back to babysitting — which is the job I built it to take.
The recovery loop and account-balancing launch give me the detail
The boss polls each pane’s captured output for the rate-limit string. On a match, it runs the dismiss-and-resume sequence rather than reporting:
# clear the dialog, then resume the exact session
tmux send-keys -t "$pane" Escape
tmux send-keys -t "$pane" C-c
tmux send-keys -t "$pane" Enter
sleep 1
tmux send-keys -t "$pane" "cl --resume $SESSION_UUID" Entercl wraps the launch with an account check before it hands off:
# balance --why prints each account's remaining 5h/weekly budget
acct=$(balance --why | awk '$2=="ok"{print $1; exit}')
[ -z "$acct" ] && acct=$(fallback_provider) # all capped → switch provider
exec claude --resume "$SESSION_UUID" --account "$acct"Resuming by UUID matters — a fresh claude loses the conversation; --resume keeps the agent’s working context intact across the account swap.