The Universal Algorithm: How One Framework Scales from Bug Fixes to Building Companies

I was staring at a tmux pane running Claude Code, watching an agent loop through the same bug fix for the fourth time (re-observe, re-think, re-plan), when I realized it was following the exact same sequence I’d used three hours earlier to design a database migration. Same pattern. Different scope. The only thing that changed was how many times it looped.

That was the moment the Universal Algorithm stopped being a whiteboard idea and became something I could actually build.

I’d spent six months on Personal AI Infrastructure (PAI) v2.1—an agent execution system that runs the same algorithm whether you’re fixing a typo in a config file or laying out a multi-year product strategy. The breakthrough wasn’t inventing a new algorithm. It was noticing that the algorithm was already there, hiding in plain sight inside how any competent person actually solves problems. I just had to scrape the barnacles off and make it explicit.

I didn’t set out to build a universal algorithm

I set out to stop my Claude Code agents from getting stuck in loops. They’d lose context mid-task, forget what they’d already tried, and re-execute the same wrong approach. I needed a way to track what “done” meant—not “feels done” or “looks good enough,” but mechanically verifiable.

So I started writing down what I did when I debugged something. Observe the error. Understand the code. Plan a fix. Implement it. Verify it works. Learn something for next time.

Then I wrote down what I did when I designed a feature. Observe the requirements. Think through constraints. Plan the architecture. Build components. Verify integration. Learn what surprised me.

It was the same list. Just at a different zoom level.

Split “implement” into two steps, BUILD and EXECUTE, and that list became the seven-phase loop: OBSERVE → THINK → PLAN → BUILD → EXECUTE → VERIFY → LEARN. Nothing about it is novel; it’s the scientific method, restated for software. What made it useful was attaching a scoring system to it.

The gap that matters is between current and ideal

The core of it is stupidly simple: every task is a gap between where you are and where you want to be. You close that gap through verifiable iteration. That’s it.

Each iteration runs seven phases:

```mermaid
graph LR
    OBSERVE[OBSERVE<br/>Current State] --> THINK[THINK<br/>Analyze Gap]
    THINK --> PLAN[PLAN<br/>Define Steps]
    PLAN --> BUILD[BUILD<br/>Create Artifacts]
    BUILD --> EXECUTE[EXECUTE<br/>Take Action]
    EXECUTE --> VERIFY[VERIFY<br/>Measure Progress]
    VERIFY --> LEARN[LEARN<br/>Update Context]
    LEARN --> OBSERVE
```

OBSERVE is about ruthless honesty. Not where you think you are—where you actually are. “The API is slow” is not an observation. “API p95 latency is 2.3 seconds, target is 500ms” is an observation. The difference is that one of those you can verify, and one of those is a feeling you have.
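To make the distinction concrete, here is a minimal Python sketch that assumes a hypothetical Observation record (not something PAI is described as using). It only accepts a metric with a measured value and a target, so a feeling like “the API is slow” has nowhere to go:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A measurable statement about current state (hypothetical record type)."""
    metric: str
    current: float
    target: float
    unit: str

    def gap(self) -> float:
        """Distance still to close, assuming lower is better; 0.0 once the target is hit."""
        return max(0.0, self.current - self.target)


api = Observation(metric="API p95 latency", current=2.3, target=0.5, unit="s")
print(f"{api.metric}: {api.current}{api.unit}, target {api.target}{api.unit}, gap {api.gap():.1f}{api.unit}")
# → API p95 latency: 2.3s, target 0.5s, gap 1.8s
```

The gap() method assumes lower is better, which fits latency; a throughput metric would flip the subtraction.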

THINK surfaces options without making decisions. Root cause analysis, constraint mapping, risk assessment. The trap I kept falling into early on was treating THINK as the decision phase—I’d land on the first plausible explanation and race into building. THINK should produce a menu, not an order.

PLAN converts the menu into steps with explicit success criteria. Three questions: what are we building, how do we know it works, and how much effort is this going to take. Plans nest: a company-building plan contains system-architecture plans, which contain feature-development plans, which contain bug-fix plans. Same structure the whole way down.

BUILD creates the artifacts—code, tests, configs, documentation, pitch decks. It’s separate from EXECUTE because I learned the hard way that writing a migration script and running it against production are different activities that belong in different phases. Build things, then deploy them.

EXECUTE is the only phase that changes external state. Everything else operates on information. You run the migration here. You ship the code here. You send the email here.

VERIFY is where most agent systems fall apart. Every criterion from PLAN is a boolean condition—it either passes or it doesn’t. No “probably done.” VERIFY produces a gap analysis: which criteria are met, which aren’t, what’s the delta. If your criteria were any good, this step takes about thirty seconds.
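A gap analysis can be sketched as one small function over criteria. This Python version rests on assumptions: Criterion and gap_analysis are hypothetical names, and each criterion compares one measured number to a threshold. The figures are the p99 and throughput numbers from the microservice case study later in this article:

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One ISC entry: a measured value checked against a threshold (hypothetical type)."""
    name: str
    measured: float
    threshold: float
    passes: Callable[[float, float], bool]  # e.g. operator.lt for "under"

    def met(self) -> bool:
        return self.passes(self.measured, self.threshold)

    def delta(self) -> float:
        """Signed distance from the threshold (positive means above it)."""
        return self.measured - self.threshold


def gap_analysis(criteria: list[Criterion]) -> dict[str, list[str]]:
    """Split criteria into met and unmet, annotating each unmet one with its delta."""
    report: dict[str, list[str]] = {"met": [], "unmet": []}
    for c in criteria:
        if c.met():
            report["met"].append(c.name)
        else:
            report["unmet"].append(f"{c.name} (off by {c.delta():+g})")
    return report


iteration_16 = [Criterion("p99 under 100 ms", 120, 100, operator.lt)]
iteration_25 = [
    Criterion("p99 under 100 ms", 88, 100, operator.lt),
    Criterion("throughput at least 50k events/s", 52_000, 50_000, operator.ge),
]
print(gap_analysis(iteration_16))  # → {'met': [], 'unmet': ['p99 under 100 ms (off by +20)']}
print(gap_analysis(iteration_25))  # → {'met': ['p99 under 100 ms', 'throughput at least 50k events/s'], 'unmet': []}
```

Because each criterion carries its own comparison (operator.lt for latency, operator.ge for throughput), the report can say not just that a criterion failed but by how much.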

LEARN converts the iteration’s results into knowledge the next iteration can use. What worked? What surprised you? Which assumption was wrong? If you skip this phase you will re-make the same mistake in iteration N+1, and I can tell you from experience that it’s infuriating.

ISC: the thing that makes “done” provable

Ideal State Criteria (ISC) is the mechanism that makes this more than a productivity framework. Every task defines upfront what “done” means as a set of boolean conditions.

Here’s a real ISC set from a database migration I ran:

New schema deployed to staging (checked via SQL query)
All user records migrated (row count matches source table)
Data integrity checks pass (zero mismatches found)
Rollback procedure verified (returns data to original state)

After each iteration, I evaluate all criteria and update the tally. Progress is mechanical: completed criteria divided by total criteria equals percentage done. When that number hits 100%, the task is done. Not “probably done.” Provably complete, because every success condition you defined at the start is now true.
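The tally is simple enough to write down. Here is a minimal Python sketch that assumes each criterion is represented as a zero-argument check returning a boolean. The checks below are hard-coded stand-ins (two passing, two not yet), not results from the migration above:

```python
from typing import Callable

# Each ISC entry maps a criterion name to a check that returns True or False.
# Real checks would run the SQL query, row-count comparison, and rollback drill.
ISC = dict[str, Callable[[], bool]]


def progress(isc: ISC) -> float:
    """Completed criteria divided by total criteria, as a percentage."""
    done = sum(1 for check in isc.values() if check())
    return 100.0 * done / len(isc)


def is_done(isc: ISC) -> bool:
    """The task is complete only when every criterion evaluates to True."""
    return all(check() for check in isc.values())


migration: ISC = {
    "new schema deployed to staging": lambda: True,
    "all user records migrated": lambda: True,
    "data integrity checks pass": lambda: False,
    "rollback procedure verified": lambda: False,
}
print(f"{progress(migration):.0f}% done, complete={is_done(migration)}")  # → 50% done, complete=False
```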

I call the loop wrapper the Ralph Loop—named after the principle of relentless pursuit until verified completion. One full cycle of seven phases, then check ISC. If criteria remain, loop again. It continues until either all ISC is satisfied (success), an unresolvable blocker appears (escalate to human), or max iterations is reached (safety valve for time-boxed tasks). For what I call DETERMINED tasks—building a company, say—max iterations is effectively infinite. The loop persists across days, weeks, months. However long it takes.
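The wrapper’s three exit conditions translate directly into control flow. Here is a minimal Python sketch. It assumes a hypothetical run_iteration callback that performs one full seven-phase cycle and raises a hypothetical Blocked exception when it hits something it cannot resolve:

```python
from typing import Callable, Optional


class Blocked(Exception):
    """Raised by an iteration that hits a blocker it cannot resolve (hypothetical)."""


def ralph_loop(
    run_iteration: Callable[[int], None],
    isc_satisfied: Callable[[], bool],
    max_iterations: Optional[int] = None,  # None: no cap, as for DETERMINED tasks
) -> str:
    """Run full iterations, checking ISC after each, until success, a blocker, or the cap."""
    n = 0
    while max_iterations is None or n < max_iterations:
        n += 1
        try:
            run_iteration(n)  # OBSERVE → THINK → PLAN → BUILD → EXECUTE → VERIFY → LEARN
        except Blocked:
            return f"escalate to human after iteration {n}"
        if isc_satisfied():
            return f"success after {n} iterations"
    return f"stopped at max_iterations={max_iterations}"


# Simulated task: each iteration verifies one more of four criteria.
verified = {"count": 0}


def one_iteration(n: int) -> None:
    verified["count"] += 1


print(ralph_loop(one_iteration, lambda: verified["count"] >= 4, max_iterations=10))
# → success after 4 iterations
```

Passing max_iterations=None models the effectively unbounded DETERMINED case; a loop that spans days or months would also need to persist its state between runs.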

Effort classification actually controls system behavior

Not every task needs the same horsepower. PAI defines five effort levels, and they’re not abstract—they determine which AI model the system picks for each phase, how often it checkpoints state, and when multi-agent coordination kicks in.

The mechanism: memory tiers, hooks, and a concrete snippet

The three memory tiers aren’t metaphorical — each maps to a real storage layer chosen for its access characteristics:

Hot (Redis, short TTL) — current iteration state, active ISC status. Sub-millisecond reads keep the phase loop tight. A simple HSET task:{id} phase THINK isc_done 2 isc_total 4 is enough for TRIVIAL tasks.
Warm (PostgreSQL, indexed) — the last N iterations’ LEARN outputs plus the full decision log. Querying SELECT learning FROM iterations WHERE task_id = $1 ORDER BY n DESC LIMIT 10 gives the agent its rolling short-term memory on resume.
Cold (pgvector or a graph store like Neo4j) — completed task archives and pattern libraries, retrieved by semantic similarity. A pgvector cosine search over LEARN-phase summaries lets a new task inherit the optimization strategy from a structurally similar past task without the agent re-discovering it.
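The cold-tier lookup is nearest-neighbour search by cosine similarity. Here is a self-contained Python sketch in which toy three-dimensional vectors stand in for real embeddings (the task names and vectors are invented for illustration). In PostgreSQL, pgvector’s <=> operator returns cosine distance, so the same ranking becomes an ORDER BY embedding <=> $1 query:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], archive: dict[str, list[float]], k: int = 1) -> list[str]:
    """Names of the k archived tasks whose embeddings point closest to the query."""
    return sorted(archive, key=lambda name: cosine(query, archive[name]), reverse=True)[:k]


# Toy embeddings of LEARN-phase summaries (invented; real ones come from an embedding model).
archive = {
    "orders-table index fix": [0.9, 0.1, 0.0],
    "event service p99 tuning": [0.7, 0.6, 0.1],
    "pitch deck rewrite": [0.0, 0.1, 0.9],
}
print(most_similar([0.8, 0.3, 0.0], archive))  # → ['orders-table index fix']
```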

The hook system is a thin middleware chain around each phase transition — think Express-style middleware but for algorithm steps. A pre-EXECUTE hook can gate production changes behind a human-approval check; a post-VERIFY hook can push ISC progress to an external dashboard. Hooks are registered per effort tier: TRIVIAL tasks skip the approval gate, THOROUGH and above require it.

Try it yourself. The ISC tracking pattern is easy to prototype with any key-value store:

```shell
# Track ISC progress in a Redis hash; works with redis-cli or any Redis client
redis-cli HSET task:demo isc_total 4 isc_done 0
redis-cli HINCRBY task:demo isc_done 1          # after each verified criterion
redis-cli HGET task:demo isc_done               # → 1
# Loop condition: the task is done when isc_done == isc_total.
# This Lua script returns 1 when the task is complete and 0 otherwise.
redis-cli EVAL "return tonumber(redis.call('HGET', KEYS[1], 'isc_done')) >= tonumber(redis.call('HGET', KEYS[1], 'isc_total')) and 1 or 0" 1 task:demo
```

Run this alongside any iterative task and you get mechanical progress tracking — no “feels done”, just a ratio.

Three case studies, same algorithm

The best way to show this works is to trace it at radically different scales.

A bug fix: 4 iterations, ~2 hours

API requests timing out intermittently under load.

Iteration 1 — OBSERVE: p95 latency 8 seconds, timeout set at 5 seconds. The errors clustered under load, which pointed toward a slow query. THINK: hypothesis—missing database index causing full table scans. PLAN: four ISC criteria (identify slow query, add index, verify p95 under 500ms, zero timeouts across one hour of traffic). BUILD: slow-query analysis script, index schema design. EXECUTE: ran analysis against production logs. VERIFY: ISC 1 done (found the culprit: orders table, no index on created_at, scanning 2M rows). LEARN: write cost will increase slightly, but the trade-off is worth it.

Iteration 2 — OBSERVE: confirmed the gap. THINK: compound index on (user_id, created_at). PLAN: test in staging first. BUILD: migration script with rollback. EXECUTE: ran migration in staging. VERIFY: ISC 2 done. LEARN: staging shows 95% latency drop (8s → 400ms).

Iteration 3 — Deploy to production during low-traffic window. ISC 3 done (p95 now 420ms in production).

Iteration 4 — One hour of monitoring data. Zero timeouts across 2,847 successful requests. ISC 4 done. Task complete.
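ISC 3 and 4 are mechanical checks on monitoring data. Here is a Python sketch that uses the nearest-rank definition of a percentile, one of several common definitions (monitoring tools may interpolate instead). The latency samples are synthetic, not the monitoring data from this fix:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def bug_fix_isc(latencies_ms: list[float], timeouts: int) -> dict[str, bool]:
    """ISC 3 and 4 from the bug fix above, evaluated against one monitoring window."""
    return {
        "p95 under 500ms": percentile(latencies_ms, 95) < 500,
        "zero timeouts": timeouts == 0,
    }


# Synthetic window: 90 fast requests, 5 medium, 5 slow.
samples = [120.0] * 90 + [300.0] * 5 + [900.0] * 5
print(percentile(samples, 95))           # → 300.0
print(bug_fix_isc(samples, timeouts=0))  # → {'p95 under 500ms': True, 'zero timeouts': True}
```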

The whole thing took about two hours. Nothing heroic—just structured iteration with a scoreboard.

A microservice: 28 iterations, 3 weeks

Now the scale jumps: designing a real-time event processing service.

Early iterations (1–5) were pure OBSERVE and THINK: 50k events/second peak, p99 latency under 100ms, 99.99% uptime, 30-day data retention, integration with existing Kafka, PostgreSQL, and Redis. THINK surfaced a constraint I hadn’t initially factored in: the team knew Node.js but had zero operational experience with anything else, so introducing a new language would create learning overhead that ate into the timeline.

Iterations 6–10 were PLAN and BUILD: architecture diagrams, OpenAPI specs, data models, Terraform for infrastructure, CI/CD pipeline. Seven ISC criteria covering throughput, latency, uptime, persistence, failover, backward compatibility, and observability.

Iterations 11–20 executed incrementally: Kafka consumer first, then state management, then the REST API layer, then monitoring. Each batch verified a subset of ISC. After iteration 13, events were persisting. After iteration 16, p99 was at 120ms—close, but not under 100ms. Not done yet.

Iterations 21–25 were optimization. VERIFY had found the real bottlenecks: database writes were blocking the event loop, and the Kafka consumer had serialization overhead. Switched to batch writes and Protocol Buffers, added consumer parallelism. By iteration 25: 52k events/sec, p99 at 88ms. Criteria 1 and 2 met.
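The batch-write change follows a common pattern: instead of one database round trip per event, buffer events and write them in groups. The service itself was Node.js; this synchronous Python version only shows the shape of the pattern, and BatchWriter and the batch size of 500 are hypothetical:

```python
from typing import Callable


class BatchWriter:
    """Buffers events and hands them to one bulk write per max_batch events (hypothetical)."""

    def __init__(self, write_batch: Callable[[list[dict]], None], max_batch: int = 500) -> None:
        self.write_batch = write_batch
        self.max_batch = max_batch
        self.buffer: list[dict] = []

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []  # start a new list; the written batch is left untouched


write_sizes: list[int] = []  # size of each simulated bulk write
writer = BatchWriter(lambda batch: write_sizes.append(len(batch)), max_batch=500)
for i in range(1_200):
    writer.add({"event_id": i})
writer.flush()  # drain the remainder
print(write_sizes)  # → [500, 500, 200]
```

Here 1,200 events cost three writes instead of 1,200. A real service would also flush on a timer, so a quiet stream never leaves events waiting in the buffer for long.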

Extended testing over iterations 26–28: seven-day load test at production volume, simulated node failures, security audit. Final results: 99.993% uptime (exceeding the 99.99% target), zero data loss across ten simulated failovers. All ISC met. Three weeks total.
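Uptime percentages are easier to reason about as a downtime budget. Here is the arithmetic in Python, assuming uptime was measured over the seven-day load test (the window isn’t stated explicitly):

```python
def downtime_budget_s(uptime_pct: float, window_days: float) -> float:
    """Seconds of downtime a given uptime percentage allows over a window."""
    return (1 - uptime_pct / 100) * window_days * 24 * 3600


print(round(downtime_budget_s(99.99, 7), 1))   # → 60.5  (the target allows about a minute)
print(round(downtime_budget_s(99.993, 7), 1))  # → 42.3  (the measured result)
```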

Same algorithm. More iterations.

A company: 200+ iterations, 12 months

This is where it gets interesting. I applied the same loop to building a company from idea to launch. Healthcare scheduling—a $2 billion market where 40% of providers were unhappy with existing tools.

Iterations 1–50 were market research, customer interviews, competitive analysis. The THINK phase identified a real opening: existing solutions were built pre-mobile, pre-AI. The PLAN defined five company-level ISC: five paying customers at launch, $50k monthly recurring revenue, sub-10% monthly churn, Net Promoter Score above 50, validated product-market fit. EXECUTE meant fundraising, incorporating, hiring. VERIFY after this phase: $2M seed round closed, three engineers hired.

Then the algorithm recursed. Each major feature became its own child task with its own ISC and Ralph Loop. The core scheduling engine (35 iterations), the mobile app (42 iterations), the integration platform (18 iterations). Each child’s ISC rolled up to the parent company ISC.
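The recursion can be sketched as a tree of tasks. The article doesn’t spell out the roll-up rule, so this Python sketch assumes the simplest one: a parent is complete only when its own ISC and every child task are complete. The company values mirror the go-to-market checkpoint below (eight customers, $28k MRR, 7% churn, NPS 62); product-market fit is left out, and each child’s ISC is collapsed into one placeholder flag:

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A task with its own ISC; child tasks roll up into the parent (hypothetical structure)."""
    name: str
    isc: dict[str, bool] = field(default_factory=dict)
    children: list["Task"] = field(default_factory=list)

    def complete(self) -> bool:
        # Assumed roll-up rule: own criteria met AND every child task complete.
        return all(self.isc.values()) and all(c.complete() for c in self.children)

    def open_items(self, prefix: str = "") -> list[str]:
        """Unmet criteria anywhere in the tree, labelled with their task path."""
        path = prefix + self.name
        items = [f"{path}: {name}" for name, met in self.isc.items() if not met]
        for child in self.children:
            items += child.open_items(prefix=path + " / ")
        return items


children = [Task(n, {"child ISC met": True}) for n in ("scheduling engine", "mobile app", "integration platform")]
company = Task(
    "company",
    {"5 paying customers": True, "$50k MRR": False, "churn under 10%": True, "NPS above 50": True},
    children,
)
print(company.complete())    # → False
print(company.open_items())  # → ['company: $50k MRR']
```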

Iterations 101–150 were go-to-market. LEARN from beta users: they loved the mobile app but were frustrated by integration gaps. So we pivoted the PLAN—prioritized integration expansion, deferred advanced scheduling features. Eight paying customers at launch (exceeded the target of five), 7% monthly churn (under the 10% target), NPS of 62. Revenue was at $28k—below the $50k target—but user satisfaction was high. The bottleneck wasn’t the product. It was distribution.

Iterations 151–200 invested in marketing, sales process, referral program. Final VERIFY: $54k MRR, product-market fit validated (80% of users would be “very disappointed” without the product).

Same seven phases. Same boolean verification. The only difference between fixing a database index in two hours and building a company in twelve months was scope and persistence. The algorithm doesn’t care what you point it at.

Memory that knows what to forget

As tasks scale, context accumulates. The three-tier memory system prevents the agent from drowning in its own history.

Hot memory stays in Redis—current iteration state, recent decisions, active ISC. Sub-millisecond reads, one-hour TTL. This is what the agent has in its face right now.

Warm memory lives in PostgreSQL—the last ten iterations of learnings, the decision log, connections to related tasks. This is what the agent reaches for when it resumes after an interruption.

Cold memory sits in a vector database—completed task archives, pattern libraries, reusable components. Retrieved by semantic similarity, not by timestamp. A new optimization task can inherit the strategy from a structurally similar past task without the agent rediscovering it.

Frequently accessed warm memories get promoted to hot. Unused hot memories expire to warm. Cold memories get retrieved only when something looks similar enough to matter.
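The promotion and expiry rules can be sketched without any infrastructure. This Python class is an in-process stand-in for the Redis and PostgreSQL tiers, not PAI’s implementation. The one-hour TTL comes from the text; promotion after a fixed number of warm reads is an assumed policy:

```python
import time
from typing import Any, Callable, Optional


class TieredMemory:
    """Hot tier with a TTL, warm tier without; expiry demotes, repeated reads promote."""

    def __init__(
        self,
        hot_ttl_s: float = 3600,  # one-hour hot TTL
        promote_after: int = 3,   # assumed policy: warm reads before promotion
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.hot: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.warm: dict[str, Any] = {}
        self.warm_reads: dict[str, int] = {}
        self.hot_ttl_s = hot_ttl_s
        self.promote_after = promote_after
        self.clock = clock

    def put_hot(self, key: str, value: Any) -> None:
        self.hot[key] = (value, self.clock() + self.hot_ttl_s)

    def expire(self) -> None:
        """Demote hot entries past their TTL to warm instead of discarding them."""
        now = self.clock()
        for key in [k for k, (_, expires_at) in self.hot.items() if expires_at <= now]:
            value, _ = self.hot.pop(key)
            self.warm[key] = value

    def get(self, key: str) -> Optional[Any]:
        self.expire()
        if key in self.hot:
            return self.hot[key][0]
        if key in self.warm:
            self.warm_reads[key] = self.warm_reads.get(key, 0) + 1
            value = self.warm[key]
            if self.warm_reads[key] >= self.promote_after:  # frequently used: promote
                del self.warm[key]
                del self.warm_reads[key]
                self.put_hot(key, value)
            return value
        return None  # cold lookups would go through similarity search instead


# A fake clock makes the one-hour TTL testable without waiting.
now = [0.0]
mem = TieredMemory(promote_after=2, clock=lambda: now[0])
mem.put_hot("iter-7-learning", "example LEARN output")
now[0] = 4000.0  # past the one-hour TTL
mem.expire()
print("iter-7-learning" in mem.warm)  # → True
mem.get("iter-7-learning")
mem.get("iter-7-learning")
print("iter-7-learning" in mem.hot)   # → True
```

In Redis itself an expired key is simply deleted, so demotion to warm means writing the value to PostgreSQL before the TTL runs out rather than relying on expiry alone.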

Hooks: middleware for algorithm phases

Hooks are a thin middleware layer around each phase transition, similar to how Express lets you stack middleware before a route handler. You can register hooks for pre-phase (load domain knowledge before THINK), post-phase (update a dashboard after OBSERVE), full iterations (checkpoint state), and task lifecycle (generate reports on completion).

This means you get compliance gates, observability injection, cost controls, and domain-specific logic without touching the core algorithm. Hooks compose—stack multiple per phase, each adding a layer of capability.

A pre-EXECUTE hook can require human approval before production changes. A post-VERIFY hook can push progress to an external dashboard. TRIVIAL tasks skip the approval gate. THOROUGH tasks and above require it. The effort classification drives hook behavior, not just model selection.
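Here is a minimal Python sketch of the hook chain, assuming a hypothetical PhaseRunner class. It is loosely Express-style: hooks run in registration order, and any hook can halt the phase by raising. The Effort ordering is also an assumption. Only four of the five levels are named in this article, so the fifth is omitted, and QUICK is placed below THOROUGH:

```python
from enum import IntEnum
from typing import Callable


class Effort(IntEnum):
    # Assumed ordering of the four levels named in the article; the fifth is omitted.
    TRIVIAL = 1
    QUICK = 2
    THOROUGH = 3
    DETERMINED = 4


Context = dict  # shared state passed to hooks and phase bodies
Hook = Callable[[Context], None]


class ApprovalRequired(Exception):
    pass


class PhaseRunner:
    """Runs a phase body wrapped in its registered pre- and post-hooks (hypothetical API)."""

    def __init__(self) -> None:
        self.pre: dict[str, list[Hook]] = {}
        self.post: dict[str, list[Hook]] = {}

    def before(self, phase: str, hook: Hook) -> None:
        self.pre.setdefault(phase, []).append(hook)

    def after(self, phase: str, hook: Hook) -> None:
        self.post.setdefault(phase, []).append(hook)

    def run(self, phase: str, body: Hook, ctx: Context) -> None:
        for hook in self.pre.get(phase, []):
            hook(ctx)  # raising here stops the phase before it starts
        body(ctx)
        for hook in self.post.get(phase, []):
            hook(ctx)


def approval_gate(ctx: Context) -> None:
    """Pre-EXECUTE: THOROUGH and above need human sign-off; lower tiers pass through."""
    if ctx["effort"] >= Effort.THOROUGH and not ctx.get("approved", False):
        raise ApprovalRequired(f"{ctx['task']}: approval needed before EXECUTE")


def push_progress(ctx: Context) -> None:
    """Post-VERIFY: stand-in for sending ISC progress to an external dashboard."""
    ctx.setdefault("dashboard", []).append(f"{ctx['isc_done']}/{ctx['isc_total']}")


runner = PhaseRunner()
runner.before("EXECUTE", approval_gate)
runner.after("VERIFY", push_progress)

ctx = {"task": "fix config typo", "effort": Effort.TRIVIAL, "isc_done": 1, "isc_total": 1}
runner.run("EXECUTE", lambda c: c.update(deployed=True), ctx)
runner.run("VERIFY", lambda c: None, ctx)
print(ctx["deployed"], ctx["dashboard"])  # → True ['1/1']
```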

What makes it actually universal

Three properties, and I’ve tested all three:

Scale invariance. The algorithm works identically whether the task takes 30 seconds or 3 years. Fix a typo, build a company—same seven phases. The number of iterations scales, the structure doesn’t change.

Domain independence. Nothing in the algorithm is specific to software. Writing code: OBSERVE the codebase, THINK about architecture, PLAN the solution. Writing essays: OBSERVE the research, THINK about arguments, PLAN the outline. Cooking: OBSERVE the ingredients, THINK about technique, PLAN the sequence. Running companies: OBSERVE the market, THINK about opportunity, PLAN the strategy. The phases are fundamental to problem-solving itself.

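Verifiable termination. ISC provides objective completion criteria. A task counts as complete when all criteria are met, not before and not after; a blocker or an iteration cap can also end the loop, but as an escalation rather than a completion. This kills both premature shipping and infinite tinkering.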

Things I got wrong

After running this across more than 100 tasks, the failure patterns were consistent:

Vague ISC is the most common trap. “Make the API faster” isn’t verifiable. “p95 latency under 500ms measured over 1,000 requests” is.

Skipping OBSERVE is the second. Jumping straight to solution without measuring current state means you’re navigating without knowing where you started. No measurement, no iteration.

BUILD/EXECUTE confusion—making production changes while still developing the solution. The fix was an explicit transition gate: BUILD produces artifacts, EXECUTE deploys them. They’re different phases for a reason.

Undershooting the effort classification. Using QUICK for complex architecture means you run out of iterations before the ISC is met. Erring toward a higher effort level is almost always cheaper than re-running.

Ignoring LEARN is the silent killer. Iterations repeat mistakes because nothing was captured. Making LEARN artifacts mandatory and reviewing them before the next OBSERVE fixed this.
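The BUILD/EXECUTE transition gate described above can be sketched as a small state machine. In this Python version (hypothetical names throughout), phases must advance in order, artifacts can only be recorded during BUILD, and EXECUTE refuses to start without at least one:

```python
PHASES = ["OBSERVE", "THINK", "PLAN", "BUILD", "EXECUTE", "VERIFY", "LEARN"]


class TransitionError(Exception):
    pass


class PhaseMachine:
    """Enforces the seven-phase order plus a BUILD → EXECUTE artifact gate."""

    def __init__(self) -> None:
        self.current = "LEARN"  # so the first legal move is OBSERVE
        self.artifacts: list[str] = []

    def add_artifact(self, name: str) -> None:
        if self.current != "BUILD":
            raise TransitionError("artifacts can only be created during BUILD")
        self.artifacts.append(name)

    def advance(self, to: str) -> None:
        expected = PHASES[(PHASES.index(self.current) + 1) % len(PHASES)]
        if to != expected:
            raise TransitionError(f"illegal move {self.current} → {to}; expected {expected}")
        if to == "EXECUTE" and not self.artifacts:
            raise TransitionError("EXECUTE needs at least one artifact from BUILD")
        if to == "BUILD":
            self.artifacts = []  # each iteration builds fresh artifacts
        self.current = to


m = PhaseMachine()
for phase in ["OBSERVE", "THINK", "PLAN", "BUILD"]:
    m.advance(phase)
m.add_artifact("migration_with_rollback.sql")
m.advance("EXECUTE")  # allowed: BUILD produced an artifact
print(m.current)      # → EXECUTE
```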

Where this goes next

Multi-agent orchestration is the active frontier. Right now, one agent runs one task. The next step is coordinating multiple agents on DETERMINED tasks—one agent on backend architecture, another on frontend, a third on infrastructure, each running its own algorithm instance, coordinated through shared ISC and dependency tracking.

The ISC system itself generates training data: task description → plan → ISC → outcome. Over time the system can learn which criteria predict success, which effort classification minimizes iterations, and which model selection strategy optimizes cost against quality.
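A record of that shape might look like the following Python sketch. The TaskRecord schema is hypothetical. The example values are the bug fix case study’s four criteria and four iterations, with the plan steps paraphrased from its iteration log:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TaskRecord:
    """One training example: task description → plan → ISC → outcome (hypothetical schema)."""
    description: str
    plan: list[str]
    isc: list[str]
    iterations: int
    isc_met: int

    @property
    def succeeded(self) -> bool:
        return self.isc_met == len(self.isc)


def to_jsonl(records: list[TaskRecord]) -> str:
    """One JSON object per line, with the derived success label added."""
    return "\n".join(json.dumps({**asdict(r), "succeeded": r.succeeded}) for r in records)


bug_fix = TaskRecord(
    description="API requests timing out intermittently under load",
    plan=["analyze slow queries", "add compound index in staging", "deploy in low-traffic window", "monitor for one hour"],
    isc=["slow query identified", "index added", "p95 under 500ms", "zero timeouts over one hour"],
    iterations=4,
    isc_met=4,
)
print(to_jsonl([bug_fix]))
```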

And as cold memory accumulates completed tasks, cross-domain patterns emerge. Optimization tasks typically need three to five iterations. Database migrations need explicit rollback verification. API design benefits from prototyping in the PLAN phase. These patterns transfer across domains because the algorithm doesn’t care what domain you’re in.

The Universal Algorithm isn’t a productivity hack. It’s a formalization of how effective problem-solving actually works, made explicit and verifiable. Every task is a gap between current and ideal. Every gap closes through iteration. Every iteration follows the same seven phases. Progress is measured mechanically through ISC.

This framework has handled everything from typos to distributed systems to products. The algorithm didn’t change. Only the scope and the number of loops changed.

That’s what makes it universal.

Jon Roosevelt builds production agent systems and writes about making AI infrastructure practical, measurable, and reliable—turning conceptual frameworks into working systems that ship.