I run a multi-account load balancer for Claude Code: a launcher called cl that routes agent sessions across several Anthropic accounts based on available headroom. When one account’s rate limit is close to exhausted, the next session goes to a different account. The constraint that shapes everything is that Anthropic rate limits are per-account. So for every API error, the balancer has to work out what the error means for routing: wait on this account, move to another, retry, or give up.
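For orientation, here is a minimal sketch of a headroom-based pick. It assumes each account tracks how much of its current rate-limit window is used. The `Account` type, its fields, `pick_account`, and the `min_headroom` threshold are hypothetical names for illustration, not cl’s actual internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    # Hypothetical per-account state; cl's real bookkeeping may differ.
    name: str
    limit: int  # requests (or tokens) allowed in the current window
    used: int   # how much of that window is already consumed

    @property
    def headroom(self) -> float:
        # Fraction of the window still available, clamped at zero.
        if self.limit <= 0:
            return 0.0
        return max(0.0, (self.limit - self.used) / self.limit)


def pick_account(accounts: List[Account], min_headroom: float = 0.1) -> Optional[Account]:
    """Route the next session to the account with the most headroom.

    Returns None when every account is below min_headroom, so the caller
    can wait instead of piling onto a nearly exhausted account.
    """
    eligible = [a for a in accounts if a.headroom >= min_headroom]
    if not eligible:
        return None
    return max(eligible, key=lambda a: a.headroom)
```

Returning None rather than the least-bad account is a deliberate choice in this sketch. It leaves the "wait or push through" decision to the caller.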
The bug class that kept biting me: trusting the HTTP status code to make that decision.
Here is the specific trap. The Anthropic API returns a 402 in two situations that call for opposite responses. If an account is genuinely out of credits, the body says something like “insufficient credits”, and the right move is to rotate that session to a different account. But a 402 also comes back for a periodic rate limit, where the body says “try again in 30 seconds.” Rotating accounts on that one is exactly wrong. You burn a context switch and load a fresh account, when in thirty seconds the original account would have been fine. I was treating every 402 as a billing failure.
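When a rate-limit body carries a retry phrase, it often says how long to wait as well. This small sketch pulls the delay out of text shaped like “try again in 30 seconds”. The regex wording, the `retry_delay` name, and the 30-second default are assumptions; check them against bodies you have actually captured.

```python
import re
from typing import Optional

# Matches phrasings like "try again in 30 seconds" or "retry after 5s".
# This wording is an assumption; verify it against real captured bodies.
_DELAY = re.compile(
    r"(?:try again|retry)(?:\s+(?:in|after))?\s+(\d+(?:\.\d+)?)\s*(?:s\b|sec|second)",
    re.IGNORECASE,
)
_RETRY_PHRASE = re.compile(r"try again|retry.?after", re.IGNORECASE)


def retry_delay(body: str, default: float = 30.0) -> Optional[float]:
    """Return seconds to wait before reusing this account, or None.

    None means the body has no retry phrase at all, so it is not a rate
    limit and a billing check should decide instead.
    """
    text = body or ""
    match = _DELAY.search(text)
    if match:
        return float(match.group(1))
    if _RETRY_PHRASE.search(text):
        return default  # retry phrase present, but no number we could parse
    return None
```

A None here is what makes rotation safe to consider: only a 402 with no retry phrase should go on to the billing check.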
The mirror trap is 5xx. A 500 is usually a transient server blip — retry and it’ll pass. But some 5xx responses are deterministic: the body says unsupported_parameter. That request will fail identically every time you send it. Retrying it is just a tight loop hammering the API with a request it will always reject. My balancer was patiently retrying those.
The fix in both cases is the same: read the body before you decide. The status code is a bucket. The body is the actual reason. In the multi-account balancer, this distinction is load-bearing — wrong classification means a wasted rotation or an infinite retry, not just a failed call.
Three rules the balancer now follows:
- A 402 is not a rotation signal until the body confirms it’s a billing error. If the body has a retry phrase, it’s a rate limit — back off, wait, stay on the same account.
- Not every 5xx is retryable. If the body names a bad parameter, fail fast and surface it rather than looping.
- Match deterministic-rejection patterns before retry-eligible ones. Error strings overlap in the wild: a phrase like max_tokens can appear in both a “malformed request” error and a “context too long” error. If you test the retryable pattern first, the permanent failure sneaks through as transient.
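To see the third rule bite, this toy first-match check runs in both orders against one body that carries both kinds of phrase. The body text and both patterns are made up for illustration; they are not any provider’s real wording.

```python
import re

# A made-up body that names a bad parameter and also contains a retry phrase.
BODY = "invalid_request: max_tokens is not supported for this model; try again"

PERMANENT = (r"invalid_request|not supported", "FAIL")
TRANSIENT = (r"try again|temporarily", "RETRY")


def first_match(rules, body):
    # Return the action of the first pattern that matches, else None.
    for pattern, action in rules:
        if re.search(pattern, body.lower()):
            return action
    return None


wrong = first_match([TRANSIENT, PERMANENT], BODY)  # retryable tested first
right = first_match([PERMANENT, TRANSIENT], BODY)  # permanent tested first
print(wrong, right)  # RETRY FAIL
```

Same body, same patterns: only the order changes, and it flips a permanent failure into a retry loop.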
The implementation that avoids the overlap trap is a short ordered table: (pattern, action) pairs where deterministic-rejection rules come first and you return on the first hit. The balancer uses this to decide between ROTATE, BACKOFF, RETRY, and FAIL before touching any routing logic.
```python
import re

# Ordered (pattern, action) table. First match wins, so deterministic
# rejections come before anything retry-eligible.
RULES = [
    # deterministic: never retry, even on a 5xx
    (r"unsupported_parameter|invalid_request|not supported", "FAIL"),
    # rate limits: back off, stay on this account
    (r"rate.?limit|try again|retry.?after|temporarily", "BACKOFF"),
    # genuine billing: now it's safe to rotate to another account
    (r"insufficient|quota exceeded|out of credits", "ROTATE"),
]


def classify(status: int, body: str) -> str:
    text = (body or "").lower()
    for pattern, action in RULES:  # order matters: FAIL must beat BACKOFF
        if re.search(pattern, text):
            return action
    # No rule matched: treat an unexplained 5xx as transient, anything else as fatal.
    return "RETRY" if 500 <= status < 600 else "FAIL"
```

Order is load-bearing here. A single body can match more than one rule, as with max_tokens turning up in both malformed-request and context-overflow errors. So the FAIL rule has to win the tie. Otherwise a context overflow, which fails identically every time you resend it, looks retryable and you loop forever.
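The label only pays off if the caller acts on it. Below is one way a send loop could consume it. It assumes a `send` callable that takes an account name and returns `(status, body)`, with a classifier like the one above passed in. `send_with_policy`, the attempt cap, and the backoff numbers are hypothetical, not cl’s actual routing code.

```python
import time
from typing import Callable, Sequence, Tuple

Send = Callable[[str], Tuple[int, str]]  # account name -> (status, body)


def send_with_policy(
    send: Send,
    classify: Callable[[int, str], str],
    accounts: Sequence[str],
    max_attempts: int = 5,
    backoff: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int, str]:
    """Return (account, status, body) for the first success, or raise."""
    if not accounts:
        raise ValueError("no accounts to route to")
    idx = 0
    for attempt in range(max_attempts):
        account = accounts[idx]
        status, body = send(account)
        if status < 400:
            return account, status, body
        action = classify(status, body)
        if action == "FAIL":
            # Deterministic rejection: resending cannot help, so surface it now.
            raise RuntimeError(f"{status} on {account}: {body}")
        if action == "ROTATE":
            # Billing failure on this account: move to the next one without waiting.
            idx += 1
            if idx >= len(accounts):
                raise RuntimeError("every account is out of credits")
        elif action == "BACKOFF":
            # Rate limit: stay on the same account and wait it out.
            sleep(backoff)
        else:
            # RETRY (transient 5xx): short exponential pause, same account.
            sleep(min(2 ** attempt, backoff))
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Injecting `sleep` keeps the policy testable without real waits. The attempt cap keeps even a misclassified error from turning into the tight loop described earlier.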
One practical note: don’t write these patterns from the documentation. Capture the actual bodies your provider sends with curl -i during real errors, pin a few as fixtures, and assert against them. Every vendor words these differently, and the strings are all you’ve got.
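Here is a sketch of the fixture idea. It assumes each captured body is saved as its own JSON file holding the status, the raw body, and the action you expect. The file layout, `load_fixtures`, and `check_fixtures` are hypothetical. The two inline samples paraphrase examples from this post and are placeholders, not real captured responses.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_fixtures(directory: str) -> List[Dict]:
    # Each file holds one case, e.g. {"status": 402, "body": "...", "expect": "BACKOFF"}
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def check_fixtures(classify: Callable[[int, str], str], fixtures: List[Dict]) -> List[str]:
    """Return a description of every fixture the classifier gets wrong."""
    failures = []
    for fx in fixtures:
        got = classify(fx["status"], fx["body"])
        if got != fx["expect"]:
            failures.append(f'{fx["status"]} {fx["body"][:60]!r}: expected {fx["expect"]}, got {got}')
    return failures


# Placeholders paraphrased from this post; replace with bodies captured via curl -i.
SAMPLE = [
    {"status": 402, "body": "Rate limited: try again in 30 seconds", "expect": "BACKOFF"},
    {"status": 402, "body": "Insufficient credits on this account", "expect": "ROTATE"},
]
```

In a test suite, a line like `assert not check_fixtures(classify, load_fixtures("tests/fixtures/errors"))` fails with a readable list of mismatches whenever a pattern change breaks a captured case. The path here is hypothetical.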
The through-line applies whether you’re routing agent sessions across accounts or just calling any third-party API: a status code tells you the bucket, not the decision. Errors in the same bucket can need opposite responses. One wants a patient wait, another an immediate stop, and a third a full rotation to another account. The body is where the difference lives. Parse it before you route.