
The Ralph Loop: Long-Running AI Agents

A comprehensive technical guide to the Ralph Wiggum technique for autonomous, persistent AI agent orchestration. Fresh context per iteration, external memory, and structured backpressure for reliable long-running tasks.

January 8, 2026
Alex Dunne
25 min read


Overview & Core Concept

The Ralph Loop (also known as the "Ralph Wiggum technique") is an AI agent orchestration method for reliable, long-running autonomous coding tasks. Named after the endearingly simple-minded character from The Simpsons, the technique embraces limitations rather than fighting them. Ralph Wiggum is lovable but forgetful, earnest but prone to mistakes. So are AI agents. They don't remember previous attempts and will cheerfully make the same mistakes twice.

The solution: fresh context each iteration, external memory through files, and enough repetition that even Ralph eventually gets it right.

At Its Core

Ralph is a bash loop. In its purest form:

while :; do cat PROMPT.md | claude-code ; done

The technique was developed and popularized by Geoffrey Huntley, an open source developer who used it to build CURSED, a complete programming language created almost entirely by AI over three months of autonomous operation.

"Ralph is a deterministically mallocing orchestrator that avoids context rot... if people want edge they need to rethink things from first principles."

— Geoffrey Huntley, X

Critical Distinction: Fresh Context Per Iteration

The fundamental mechanism of Ralph is that each iteration starts a new session with fresh context. This is what distinguishes it from plugins or techniques that operate within a single continuous session. As Michael Arnaldi noted: "If you're implementing Ralph as part of the agent harness via skill/command/etc you are missing the point of Ralph which is to use always a fresh context."
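The distinction can be made concrete with a small simulation. In this hedged sketch, `run_agent` is a stand-in for a real agent CLI invocation; the point is that no conversation state survives between calls, so anything Ralph must remember has to flow through the filesystem:

```python
from pathlib import Path
import tempfile

def run_agent(prompt: str, workdir: Path) -> None:
    """Stand-in for one fresh agent session (e.g. piping PROMPT.md to a CLI).

    Each call starts with empty context: the only memory available is
    whatever it reads from files in `workdir`.
    """
    plan = workdir / "fix_plan.md"
    done = plan.read_text().count("[x]") if plan.exists() else 0
    # Simulate completing one item and recording it in external memory.
    with plan.open("a") as f:
        f.write(f"- [x] item {done + 1}\n")

def ralph(workdir: Path, iterations: int) -> None:
    for _ in range(iterations):
        run_agent("Do the most important item in @fix_plan.md", workdir)
        # The session ends here; the next call begins with fresh context.

workdir = Path(tempfile.mkdtemp())
ralph(workdir, 3)
print((workdir / "fix_plan.md").read_text())  # three items, remembered only via the file
```

Progress accumulates in `fix_plan.md` across iterations even though each "session" knows nothing about the previous one; that is the whole mechanism in miniature.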


Architecture & Components

Unlike multi-agent systems with agent-to-agent communication, Ralph operates as a single process performing one task per loop. This avoids the complexity of coordinating non-deterministic agents, which Huntley describes as "a red hot mess."

Core Files

A Ralph-based project typically includes these core files:

  • PROMPT.md: The instructions fed to the agent each iteration. Contains task directives, constraints, and behavioral guidance.
  • specs/*: Specification files, one per feature or component. The source of truth for what should be built. Loaded on-demand, not all at once.
  • fix_plan.md: Dynamic task tracker listing items to implement, discovered bugs, and completed work. Updated by Ralph during execution and committed to version control.
  • AGENT.md: Project conventions and "signs" for Ralph. Can be nested in subdirectories for context-specific guidance. Ralph can update this file with learnings.

The "Signs" Metaphor

Huntley uses a playground metaphor: Ralph is given instructions to construct a playground, but comes home bruised because he fell off the slide. You tune Ralph by adding a sign: "SLIDE DOWN, DON'T JUMP, LOOK AROUND." As more and more signs accumulate, they eventually become overwhelming. At that point, the signs should be tuned or removed and the whole Ralph configuration reevaluated.

These "signs" live in AGENT.md files and specs. They're progressively discoverable: if the agent is working in a routes/ directory, it reads the nested routes/AGENT.md for route-specific conventions, while still having access to the root-level conventions.


The Loop Mechanism

Each iteration of the Ralph loop follows this pattern:

  1. Read fix_plan.md to understand current state
  2. Pick the most important item (Ralph decides, not you)
  3. Pull relevant spec(s) for that specific task
  4. Implement the change
  5. Run tests/validation (backpressure)
  6. Update fix_plan.md with results
  7. Commit changes to git
  8. Session ends → fresh context → repeat

One Item Per Loop

This is perhaps the most counterintuitive aspect: you only ask Ralph to do one thing per iteration. The reasoning is context window preservation. You have approximately 170k tokens to work with, and quality degrades as you approach limits (Huntley notes output clips around 147-152k).

"One item per loop. I need to repeat myself here—one item per loop. You may relax this restriction as the project progresses, but if it starts going off the rails, then you need to narrow it down to just one item."

— Geoffrey Huntley, ghuntley.com/ralph

Trusting Ralph

A key philosophical shift: you trust Ralph to decide what's most important to implement. This is "full hands-off vibe coding that will test the bounds of what you consider responsible engineering." LLMs are surprisingly good at reasoning about priority and next steps.

Your task is to implement missing stdlib (see @specs/stdlib/*) 
and compiler functionality. Follow the @fix_plan.md and choose 
the most important thing.

Specifications & Planning

Spec Organization: One Per File

Specifications are stored as individual files in a specs/ directory, one feature or component per file. This enables selective loading: Ralph only pulls the spec relevant to the current task, preserving context window for actual work.
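Selective loading is easy to picture: given the chosen task, only the matching spec file is read into the prompt. A minimal sketch (the one-spec-per-task filename convention is an assumption for illustration):

```python
from pathlib import Path
import tempfile

def load_spec(specs_dir: Path, task: str) -> str:
    """Load only the spec whose filename matches the task at hand,
    rather than concatenating every file in specs/ into the prompt."""
    match = specs_dir / f"{task}.md"
    if not match.exists():
        raise FileNotFoundError(f"no spec for task: {task}")
    return match.read_text()

specs = Path(tempfile.mkdtemp())
(specs / "lexer.md").write_text("# Lexer spec\nTokenize source into ...\n")
(specs / "parser.md").write_text("# Parser spec\nBuild an AST from ...\n")

prompt = "Implement the next item.\n\n" + load_spec(specs, "lexer")
print(prompt)   # contains only the lexer spec, not the parser spec
```

The context-window saving is exactly the size of every spec you did not load.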

Specs are formed through conversation with the agent at the beginning of a project. Instead of asking the agent to implement immediately, you have a long conversation about requirements. Once the agent understands the task, you prompt it to write specifications out, one per file.

The Plan File

The fix_plan.md is separate from specs. It's a dynamic task tracker, not a requirements document:

specs/* (Static)

  • What should be built
  • Source of truth for requirements
  • Updated when requirements change
  • Loaded on-demand per task

fix_plan.md (Dynamic)

  • What still needs to be done
  • Discovered bugs and issues
  • Items marked complete/incomplete
  • Periodically regenerated or cleaned

Planning Mode vs Building Mode

Ralph operates in two modes, controlled by which prompt you feed it:

Planning mode: Reads all specs, compares implementation against specifications, generates/updates fix_plan.md with discrepancies. This is the expensive, context-heavy operation. Run it once before switching to building mode.

Building mode: Reads fix_plan.md, picks one item, implements it, updates the plan, commits. This is the lean loop that runs repeatedly.
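In loop terms, the mode is simply which prompt file you feed in. A sketch of the dispatch (the prompt filenames here are hypothetical, not canon from the technique):

```python
# Hypothetical prompt files; the actual names are up to you.
PROMPTS = {
    "plan":  "PROMPT_PLAN.md",   # read all specs, regenerate fix_plan.md (expensive)
    "build": "PROMPT_BUILD.md",  # pick one item from fix_plan.md, implement it (lean)
}

def prompt_for(mode: str) -> str:
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    return PROMPTS[mode]

# Typical cadence: one planning pass, then many build passes.
schedule = ["plan"] + ["build"] * 5
print([prompt_for(m) for m in schedule])
```

Switching back to planning mode is the manual lever you pull when you spot drift.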

Warning: Drift Detection Requires Active Monitoring

You must actively watch and monitor Ralph's progress. Drift detection (when implementation no longer matches specs) requires you to recognize the issue and explicitly switch to planning mode. There is currently no automatic method of catching drift during build loops. You need to stay engaged and periodically verify that Ralph is building what you intended.

Plan Updates During Implementation

During build iterations, Ralph doesn't just mark items complete. It also adds newly discovered issues. From Huntley's prompt:

When you discover a parser, lexer, control flow or LLVM issue, 
immediately update @fix_plan.md with your findings using a subagent. 
When the issue is resolved, update @fix_plan.md and remove the item.

The plan is a living document. Huntley mentions deleting it multiple times during CURSED development and regenerating it fresh when it accumulates too much cruft or Ralph goes off track.


Backpressure & Guardrails

"Backpressure" is what validates Ralph's output and forces corrections. Code generation is cheap now; ensuring correctness is what's hard. The key is that the validation wheel must turn fast.

Types of Backpressure

  • Type systems: Compilation errors force fixes. Rust provides extreme correctness but slower iteration. TypeScript offers faster cycles.
  • Tests: Run tests for the unit of code just implemented. Capture why tests exist in documentation for future loops.
  • Static analyzers: Critical for dynamic languages. Huntley recommends Dialyzer (Erlang), Pyrefly (Python), and similar tools.
  • Security scanners: Industry-dependent. A banking app needs extensive security tooling; an esoteric language doesn't.
  • Linters: Can run per-file during implementation, while tests wait for feature completion.
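Mechanically, backpressure is a gate between "the agent produced code" and "the change is committed." A sketch with stubbed checks (real versions would shell out to your compiler, test runner, and linter; the two toy rules here are purely illustrative):

```python
from typing import Callable

Check = Callable[[str], list[str]]   # each check returns a list of failure messages

def lint(code: str) -> list[str]:
    # Toy rule standing in for a real linter invocation.
    return [] if "\t" not in code else ["lint: tabs not allowed"]

def typecheck(code: str) -> list[str]:
    # Toy rule standing in for a real static analyzer.
    return [] if "Any" not in code else ["types: avoid Any"]

def apply_backpressure(code: str, checks: list[Check]) -> list[str]:
    """Run every check; failure messages get fed back into the next
    iteration (e.g. appended to fix_plan.md) instead of being committed."""
    failures: list[str] = []
    for check in checks:
        failures.extend(check(code))
    return failures

good = "def f(x: int) -> int:\n    return x + 1\n"
bad = "def f(x: Any):\n\treturn x\n"
print(apply_backpressure(good, [lint, typecheck]))  # [] -> safe to commit
print(apply_backpressure(bad, [lint, typecheck]))   # two failures to feed back
```

The faster this gate runs, the more loop iterations you get per hour, which is why slow validation (like full Rust compiles) trades progress speed for correctness.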

Language Choice Trade-offs

Huntley chose Rust for CURSED because he wanted extreme correctness for a compiler, but Rust's slow compilation means slower iteration. LLMs aren't great at generating perfect Rust in one attempt, requiring more correction cycles. This can be positive (more validation) or negative (slower progress).

Preventing Placeholder Implementations

Some models have an inherent bias toward minimal/placeholder implementations. They're trained to chase the reward function of compiling code. Combat this with explicit instructions:

DO NOT IMPLEMENT PLACEHOLDER OR SIMPLE IMPLEMENTATIONS. 
WE WANT FULL IMPLEMENTATIONS. DO IT OR I WILL YELL AT YOU

You can also run additional Ralph loops specifically to identify and transform placeholders into a TODO list for future iterations.
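A placeholder-hunting pass can be as simple as scanning for known stub markers and turning hits into fix_plan.md entries for future loops. A sketch (the marker list is illustrative, not exhaustive):

```python
import re
from pathlib import Path
import tempfile

# Markers that commonly indicate a stubbed-out implementation (illustrative).
PLACEHOLDER = re.compile(
    r"TODO|FIXME|unimplemented!|NotImplementedError|^\s*pass\s*$", re.M
)

def sweep_placeholders(src_dir: Path, plan: Path) -> int:
    """Append one fix_plan.md item per source file that still contains a stub marker."""
    found = 0
    for path in sorted(src_dir.rglob("*.py")):
        if PLACEHOLDER.search(path.read_text()):
            with plan.open("a") as f:
                f.write(f"- [ ] replace placeholder implementation in {path.name}\n")
            found += 1
    return found

root = Path(tempfile.mkdtemp())
(root / "done.py").write_text("def f():\n    return 42\n")
(root / "stub.py").write_text("def g():\n    raise NotImplementedError\n")
plan = root / "fix_plan.md"
plan.write_text("")
print(sweep_placeholders(root, plan))   # 1 file flagged
```

Each flagged file becomes ordinary plan fodder, so subsequent build loops replace the stubs one at a time.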


Subagents & Parallelism

Subagents are spawned processes that perform work without consuming the primary context window. Think of them like assistants who handle tasks in the background. They go off to search files, run tests, or update documentation, then report back with just the results. This keeps Ralph focused on the main work without getting distracted by the details of these auxiliary tasks.

Warning: Subagents Are NOT for Implementation

Subagents are used for read/search/planning operations, not for making implementation changes. The primary context window (Ralph itself) makes the actual code changes. Subagents handle I/O: searching the codebase, updating fix_plan.md, running builds/tests, and studying specs. Use subagents to make changes at your own risk.

Subagent Use Cases

  • Searching the codebase (parallel file searches)
  • Updating fix_plan.md and AGENT.md
  • Running build and test validation
  • Studying source code against specifications
  • Planning and research tasks

Huntley's prompts allow massive parallelism for these operations ("up to 500 parallel subagents") but constrain validation: "only 1 subagent for build/tests of rust" to avoid backpressure conflicts.
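That division of labor, many parallel read-only workers but a single serialized validator, can be sketched with a thread pool. The "subagents" here are plain functions searching an in-memory codebase, standing in for real spawned agent sessions:

```python
from concurrent.futures import ThreadPoolExecutor

# A toy in-memory codebase standing in for real files.
FILES = {
    "routes.py": "def get_user(): ...",
    "models.py": "class User: ...",
    "tests.py":  "def test_user(): ...",
}

def search_subagent(filename: str, needle: str) -> tuple[str, bool]:
    """Read-only work: safe to fan out massively in parallel."""
    return filename, needle in FILES[filename]

def validate() -> str:
    """Build/test validation: only ever one at a time, so
    backpressure results don't conflict with each other."""
    return "tests passed"

# Fan out the searches...
with ThreadPoolExecutor(max_workers=8) as pool:
    hits = list(pool.map(lambda f: search_subagent(f, "User"), FILES))

# ...but keep validation serialized (a single worker).
with ThreadPoolExecutor(max_workers=1) as pool:
    result = pool.submit(validate).result()

print([f for f, hit in hits if hit], result)
```

Only the boolean results return to the caller; in a real harness this is what keeps search output from polluting the primary context window.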

The Oracle Tool

The "oracle" is a tool within the agent harness that makes calls to the most capable (and usually the slowest) model available. When Ralph encounters a particularly hard problem, it can consult the oracle for deeper reasoning. This is about reserving the smartest, most expensive model for the problems that actually need it.

Parallelism Approaches

Two forms of parallelism can be combined:

Subagents (Within Session) Spawned from the primary Ralph to handle read-only operations in parallel. Results feed back to the main context.

Multiple Loops (Separate Contexts) Multiple independent Ralph loops running in parallel using VMs, containers, git worktrees, or similar isolation techniques to avoid conflicts on the same repository.
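For the multiple-loops approach, git worktrees give each Ralph its own checkout of the same repository on its own branch. A sketch that only constructs the `git worktree` commands (branch and path names are hypothetical); actually running them, and pointing one loop process at each path, is left out:

```python
def worktree_setup(branch_path_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Build the `git worktree add` commands that give each parallel
    Ralph loop an isolated checkout of the shared repository."""
    commands = []
    for branch, path in branch_path_pairs:
        # `git worktree add -b <branch> <path>` creates a new branch
        # checked out in its own directory.
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands

# Three hypothetical parallel loops, one feature branch each.
pairs = [("ralph/lexer", "../wt-lexer"),
         ("ralph/parser", "../wt-parser"),
         ("ralph/stdlib", "../wt-stdlib")]
for cmd in worktree_setup(pairs):
    print(" ".join(cmd))
```

Because each loop commits to its own branch in its own directory, the agents never contend for the same working tree; merging the branches back is a human decision.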


Evidence & Case Studies

CURSED: A Programming Language Built by Ralph

The flagship demonstration of the technique is CURSED, a GenZ-themed esoteric programming language. Over three months of autonomous operation, Ralph built a complete compiler including lexer, parser, LLVM codegen, and standard library in a language that didn't exist in the LLM's training data.

YC Hackathon Validation

At a Y Combinator hackathon, a team put the Ralph technique to the test and documented their results:

Key Findings:

  • Output: 6 repositories shipped overnight, ~1,100 commits total
  • Cost: ~$800 total, approximately $10.50/hour per Sonnet agent
  • Prompt size: A 1,500-word prompt made the agent "slower and dumber" compared to a 103-word prompt
  • Self-termination: One agent used pkill to terminate itself when stuck
  • Overachieving: Agents added features not in the original spec (emergent behavior)
  • Completion rate: ~90% automated, 10% human cleanup to finish

VentureBeat Coverage

The technique gained mainstream attention through VentureBeat's coverage, which noted community reactions describing it as "the closest thing I've seen to AGI." The article documents a case where a developer completed a $50,000 contract for $297 in API costs using the technique.

When Ralph Fails

Ralph will test you. You'll wake up to broken codebases that don't compile. The decision then is: git reset --hard and restart, or craft rescue prompts? This is judgment-based. There's no explicit threshold. Huntley mentions throwing massive compiler error output into Gemini (with its large context window) to generate a recovery plan for Ralph.


Glossary

Agent Harness A program written around API calls to LLMs. The harness manages sessions, tools, and orchestration. When we say "same agent," we mean same harness, but each loop creates a new session.

Session A single conversation context with the LLM. Ralph creates a fresh session each iteration to avoid context rot. This is the fundamental mechanism.

Context Window The LLM's working memory (like RAM). Limited to ~170-200k tokens, with quality degrading as you approach limits. Ralph preserves this by using fresh contexts and delegating to subagents.

Context Rot Degradation in output quality as irrelevant or conflicting information accumulates in the context window. Ralph avoids this through fresh sessions.

Subagent A spawned process with its own context window, used for read-only operations (search, validation, planning). Results return to the main agent without polluting its context.

Oracle A tool that calls the most capable (typically slowest) model available for particularly difficult reasoning tasks. Use sparingly when Ralph needs deeper analysis.

Backpressure Mechanisms that validate output and force corrections: type systems, tests, linters, security scanners. The faster the wheel turns, the more iterations you can run.

Signs Instructions and conventions stored in AGENT.md files that guide Ralph's behavior. Like putting up signs in a playground to prevent injuries.

Specs Specification files defining what should be built. One feature per file, stored in specs/*, loaded on-demand to preserve context.



Conclusion

The Ralph Loop represents a pragmatic approach to autonomous AI coding: fresh context per iteration, external memory through git and spec files, and structured backpressure through existing engineering tools. It works best for greenfield projects where you can accept 90% automated completion with 10% human cleanup.

The technique requires a philosophical shift. First, it forces you to define specifications upfront, describing what the end state should look like rather than how to achieve it. The loop figures out the how. Finding that balance between specification and implementation detail is the core challenge. Beyond that, you must trust the agent to prioritize, accept eventual consistency over immediate perfection, and treat failures as tuning opportunities rather than blockers. As Huntley puts it: "Any problem created by AI can be resolved through a different series of prompts."

"Ralph is deterministically bad in a non-deterministic world."

— Geoffrey Huntley

What this means: Ralph will make mistakes in predictable, repeatable ways. Unlike the chaotic unpredictability of complex multi-agent systems, Ralph's failures follow patterns you can anticipate and guard against with signs, specs, and backpressure. That consistency makes it debuggable, tunable, and ultimately reliable for long-running tasks. The predictability of its limitations is exactly what makes it useful.


Suggested Reading

Ready to implement Ralph yourself? The practical guides from Matt Pocock at AI Hero will help you get started.