The Boss Agent's Context Window Is the Most Expensive Thing in the Fleet

A few weeks ago an investigation landed on my desk. I was playing “President,” the boss agent coordinating a fleet of Claude agents across a bunch of servers, and a service was throwing 403s. So I did what felt natural: I fired off five parallel SSH probes, ran journalctl, grepped logs, authenticated to the production box myself, and captured the raw 403 response body. Solved it. Felt great.

It was the wrong move, and it took me a while to see why.

Quick orientation if you don’t live in this world: I run a small swarm of AI agents that each sit in their own terminal session, talking to each other through a little message-routing layer — think of it as a group chat where every member is a separate Claude. One agent wears the “boss” hat: it doesn’t do the work, it decides who does. The boss’s only real resource is its context window — the amount of conversation and information it can hold in its head at once before it runs out of room. Like working memory. Once it’s full, the boss gets dumber.

Here’s the trap. When a hard problem escalates to the boss, the boss is usually the most capable agent in the room. So running the probes itself feels efficient. Why route a job to a worker when you could just do it?

Because every SSH transcript, every wall of log output, every captured 403 body I pulled into my own context was eating the one resource the whole fleet depends on. I was the coordinator burning coordination capacity to do legwork any peer could’ve done. A worker agent that fills its context investigating one bug is cheap — I can spin up another. A boss whose context is clogged with grep output can’t coordinate anymore. That’s the bad trade.

The deliverable of a coordinator is WHO, not HOW. My job was “this is a networking issue, route it to the host-owner agent and tell me the verdict.” Not the script. Not the SSH session. Not the raw output.

Persistent peers vs. one-off subagents

Two flavors of worker, and the distinction matters more than I expected.

Persistent peers live in their own tmux session and keep state across messages — I route to them through a Send.sh mailbox layer. Use these for multi-round investigations: “check the logs,” then “now try with auth,” then “capture the body.” They remember what they already found.

One-off subagents spin up with cold context per call and return a single distilled answer. Great for “what’s the systemd status of X?” — useless for iterative debugging, because each call forgets the last.
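
To make the difference concrete, here’s a toy Python sketch. It is not the real tmux or Send.sh setup; `PersistentPeer` and `one_off_subagent` are made-up names, and the “agents” just report how much earlier conversation they can still see.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentPeer:
    """Toy stand-in for a long-lived agent: history survives between messages."""
    name: str
    history: list = field(default_factory=list)

    def ask(self, message: str) -> str:
        # Each new message is added to the same running history.
        self.history.append(message)
        earlier = len(self.history) - 1
        return f"{self.name}: remembering {earlier} earlier step(s)"


def one_off_subagent(question: str) -> str:
    """Toy stand-in for a cold-start call: a fresh history on every invocation."""
    history = [question]  # nothing carries over from previous calls
    earlier = len(history) - 1
    return f"subagent: remembering {earlier} earlier step(s)"


peer = PersistentPeer("host-owner")
peer.ask("check the logs")
print(peer.ask("now try with auth"))          # the peer still has step one
print(one_off_subagent("now try with auth"))  # the subagent starts from zero
```

The second question only makes sense to whoever saw the first one, which is why multi-round work goes to the peer.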

The hard rule I gave the boss agent, roughly:

Before writing any Bash block, count the probe commands.
If >= 5, OR if it requires SSH/auth to prod:
  → STOP. Route to the domain-owner peer via Send.sh.
  → Your output is the routing decision + the verdict.
  → Never paste raw command output into your own context.

If you’re about to write a five-command Bash block as the orchestrator, that’s the smell. Hand it off.
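
The rule is mechanical enough to sketch as a pre-flight check. This is a rough Python illustration, not something wired into the actual fleet: `should_delegate` is a hypothetical helper, and treating “the command starts with ssh” as a proxy for “requires SSH/auth to prod” is my simplifying assumption.

```python
import shlex

PROBE_LIMIT = 5  # threshold taken from the rule above


def should_delegate(commands):
    """Return True if the orchestrator should route this work instead of running it."""
    # Rule 1: too many probe commands in one block.
    if len(commands) >= PROBE_LIMIT:
        return True
    # Rule 2 (assumed proxy): any command that opens an SSH session.
    for cmd in commands:
        tokens = shlex.split(cmd)
        if tokens and tokens[0] == "ssh":
            return True
    return False


print(should_delegate(["systemctl status myservice"]))           # one local probe
print(should_delegate(["ssh prod-box journalctl -u myservice"]))  # touches a remote box
```

A real check would need a better definition of “prod” than the first token of a command line, but even this crude version would have stopped my five parallel SSH probes.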

The mechanism is just scarcity accounting. Workers are fungible and their context is disposable; the coordinator’s context is the bottleneck for the entire fleet, so anything that fills it is more expensive there than anywhere else — even when the coordinator is the best one for the job.
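
A back-of-the-envelope version of that accounting, in Python. Every number here is invented for illustration (the context budget, the size of a probe transcript, the size of a verdict); only the shape of the comparison matters.

```python
def investigations_before_full(budget_tokens, tokens_per_investigation):
    """How many investigations fit in a context budget at a given cost each."""
    return budget_tokens // tokens_per_investigation


BUDGET = 200_000          # hypothetical coordinator context budget, in tokens
RAW_OUTPUT = 5 * 4_000    # hypothetical: five probes, about 4k tokens of output each
VERDICT = 300             # hypothetical: a routed task plus a one-paragraph verdict

do_it_myself = investigations_before_full(BUDGET, RAW_OUTPUT)
delegate = investigations_before_full(BUDGET, VERDICT)

print(do_it_myself)  # 10
print(delegate)      # 666
```

Under these made-up numbers the boss gets through 10 investigations doing the legwork itself versus several hundred when it only holds verdicts. The worker pays the same raw-output cost either way, but its context is the one I can throw away.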

“You own it” tripped me up because I read “own” as “do.” For a boss agent, own means own the verdict and the coordination. The legwork belongs to someone whose memory you can afford to throw away.