Table of Contents
- Overview & Core Concept
- Architecture & Components
- The Loop Mechanism
- Specifications & Planning
- Backpressure & Guardrails
- Subagents & Parallelism
- Evidence & Case Studies
- Glossary
- References
- Conclusion
Overview & Core Concept
The Ralph Loop (also known as the "Ralph Wiggum technique") is an AI agent orchestration method for reliable, long-running autonomous coding tasks. Named after the endearingly simple-minded character from The Simpsons, the technique embraces limitations rather than fighting them. Ralph Wiggum is lovable but forgetful, earnest but prone to mistakes. So are AI agents. They don't remember previous attempts and will cheerfully make the same mistakes twice.
The solution: fresh context each iteration, external memory through files, and enough repetition that even Ralph eventually gets it right.
At Its Core
Ralph is a bash loop. In its purest form:
while :; do cat PROMPT.md | claude-code ; done
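In practice, most runs wrap that one-liner in a small amount of plumbing. A minimal sketch, where the logging, iteration cap, and logs/ directory are illustrative additions rather than part of Huntley's original loop:

```bash
#!/usr/bin/env bash
# Same idea as the one-liner above, with illustrative plumbing added:
# each pass starts a brand-new agent session, logs its output, and the
# run stops after a cap so it can be left unattended overnight.
MAX_ITERATIONS=${MAX_ITERATIONS:-100}
mkdir -p logs

for ((i = 1; i <= MAX_ITERATIONS; i++)); do
  echo "=== Ralph iteration $i/$MAX_ITERATIONS at $(date -u +%FT%TZ) ==="
  # Fresh context every time: nothing carries over except what lives in
  # the repo (specs/, fix_plan.md, AGENT.md) and git history.
  cat PROMPT.md | claude-code | tee "logs/iteration-$i.log"
  sleep 5  # brief pause between sessions
done
```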
The technique was developed and popularized by Geoffrey Huntley, an open source developer who used it to build CURSED, a complete programming language created almost entirely by AI over three months of autonomous operation.
"Ralph is a deterministically mallocing orchestrator that avoids context rot... if people want edge they need to rethink things from first principles."
— Geoffrey Huntley, X
Critical Distinction: Fresh Context Per Iteration
The fundamental mechanism of Ralph is that each iteration starts a new session with fresh context. This is what distinguishes it from plugins or techniques that operate within a single continuous session. As Michael Arnaldi noted: "If you're implementing Ralph as part of the agent harness via skill/command/etc you are missing the point of Ralph which is to use always a fresh context."
Architecture & Components
Unlike multi-agent systems with agent-to-agent communication, Ralph operates as a single process performing one task per loop. This avoids the complexity of coordinating non-deterministic agents, what Huntley describes as "a red hot mess."
Core Files
A Ralph-based project typically includes these core files:
| File | Purpose |
|---|---|
| PROMPT.md | The instructions fed to the agent each iteration. Contains task directives, constraints, and behavioral guidance. |
| specs/* | Specification files, one per feature/component. The source of truth for what should be built. Loaded on-demand, not all at once. |
| fix_plan.md | Dynamic task tracker. Lists items still to implement, bugs discovered along the way, and work already completed. Updated by Ralph during execution and committed to version control. |
| AGENT.md | Project conventions and "signs" for Ralph. Can be nested in subdirectories for context-specific guidance. Ralph can update this file with learnings. |
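Taken together, a Ralph-ready repository might be laid out like this (a sketch; the individual spec filenames and the src/routes/ directory are hypothetical):

```
.
├── PROMPT.md          # instructions fed to the agent each iteration
├── fix_plan.md        # dynamic task tracker, committed alongside the code
├── AGENT.md           # root-level conventions ("signs")
├── specs/
│   ├── lexer.md       # one spec per feature/component
│   ├── parser.md
│   └── stdlib.md
└── src/
    └── routes/
        └── AGENT.md   # nested, route-specific conventions
```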
The "Signs" Metaphor
Huntley uses a playground metaphor: Ralph is given instructions to construct a playground, but comes home bruised because he fell off the slide. You tune Ralph by adding a sign: "SLIDE DOWN, DON'T JUMP, LOOK AROUND." As more and more signs are added over time, they eventually become overwhelming. At that point, tune or remove the signs and reevaluate the whole Ralph configuration.
These "signs" live in AGENT.md files and specs. They're progressively discoverable: if the agent is working in a routes/ directory, it reads the nested routes/AGENT.md for route-specific conventions, while still having access to the root-level conventions.
The Loop Mechanism
Each iteration of the Ralph loop follows this pattern:
- Read fix_plan.md to understand current state
- Pick the most important item (Ralph decides, not you)
- Pull relevant spec(s) for that specific task
- Implement the change
- Run tests/validation (backpressure)
- Update fix_plan.md with results
- Commit changes to git
- Session ends → fresh context → repeat
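A hedged sketch of how a building-mode PROMPT.md might encode those steps (illustrative wording, not Huntley's actual prompt):

```
Study @fix_plan.md and pick the single most important unfinished item.
Read only the spec under @specs/ that covers that item.
Implement it fully. Do not write placeholder or stub code.
Run the build and the tests for the code you just changed.
Update @fix_plan.md: mark the item complete and record any new issues you found.
Commit your work to git with a descriptive message.
```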
One Item Per Loop
This is perhaps the most counterintuitive aspect: you only ask Ralph to do one thing per iteration. The reasoning is context window preservation. You have approximately 170k tokens to work with, and quality degrades as you approach limits (Huntley notes output clips around 147-152k).
"One item per loop. I need to repeat myself here—one item per loop. You may relax this restriction as the project progresses, but if it starts going off the rails, then you need to narrow it down to just one item."
— Geoffrey Huntley, ghuntley.com/ralph
Trusting Ralph
A key philosophical shift: you trust Ralph to decide what's most important to implement. This is "full hands-off vibe coding that will test the bounds of what you consider responsible engineering." LLMs are surprisingly good at reasoning about priority and next steps.
Your task is to implement missing stdlib (see @specs/stdlib/*)
and compiler functionality. Follow the @fix_plan.md and choose
the most important thing.
Specifications & Planning
Spec Organization: One Per File
Specifications are stored as individual files in a specs/ directory, one feature or component per file. This enables selective loading: Ralph only pulls the spec relevant to the current task, preserving context window for actual work.
Specs are formed through conversation with the agent at the beginning of a project. Instead of asking the agent to implement immediately, you have a long conversation about requirements. Once the agent understands the task, you prompt it to write specifications out, one per file.
The Plan File
The fix_plan.md is separate from specs. It's a dynamic task tracker, not a requirements document:
| specs/* (Static) | fix_plan.md (Dynamic) |
|---|---|
| What should be built | What still needs to be done |
| Source of truth for requirements | Discovered bugs and issues |
| Updated when requirements change | Items marked complete/incomplete |
| Loaded on-demand per task | Periodically regenerated or cleaned |
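To make the distinction concrete, a hypothetical mid-project fix_plan.md might look something like this (the items are invented for illustration):

```markdown
# fix_plan.md

## To do
- [ ] Parser: nested string interpolation fails on escaped quotes (specs/parser.md)
- [ ] Stdlib: implement string split/join (specs/stdlib.md)

## Discovered issues
- [ ] Lexer panics on unterminated block comments
- [ ] Codegen: closures capture by value; spec says by reference

## Done
- [x] Lexer: numeric literal tokenization
```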
Planning Mode vs Building Mode
Ralph operates in two modes, controlled by which prompt you feed it:
Planning mode: Reads all specs, compares implementation against specifications, generates/updates fix_plan.md with discrepancies. This is the expensive, context-heavy operation. Run it once before switching to building mode.
Building mode: Reads fix_plan.md, picks one item, implements it, updates the plan, commits. This is the lean loop that runs repeatedly.
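One way to wire the two modes together is simply to use two prompt files. A sketch, assuming hypothetical PROMPT_PLAN.md and PROMPT_BUILD.md filenames (Huntley's post doesn't prescribe this exact arrangement):

```bash
#!/usr/bin/env bash
# One expensive planning pass, then a run of lean building iterations.
cat PROMPT_PLAN.md | claude-code      # reads all specs, regenerates fix_plan.md

for i in $(seq 1 50); do              # iteration count is arbitrary
  cat PROMPT_BUILD.md | claude-code   # one item per loop, fresh context each time
done
```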
Warning: Drift Detection Requires Active Monitoring
You must actively watch and monitor Ralph's progress. Drift detection (when implementation no longer matches specs) requires you to recognize the issue and explicitly switch to planning mode. There is currently no automatic method of catching drift during build loops. You need to stay engaged and periodically verify that Ralph is building what you intended.
Plan Updates During Implementation
During build iterations, Ralph doesn't just mark items complete. It also adds newly discovered issues. From Huntley's prompt:
When you discover a parser, lexer, control flow or LLVM issue,
immediately update @fix_plan.md with your findings using a subagent.
When the issue is resolved, update @fix_plan.md and remove the item.
The plan is a living document. Huntley mentions deleting it multiple times during CURSED development and regenerating it fresh when it accumulates too much cruft or Ralph goes off track.
Backpressure & Guardrails
"Backpressure" is what validates Ralph's output and forces corrections. Code generation is cheap now; ensuring correctness is what's hard. The key is that the validation wheel must turn fast.
Types of Backpressure
- Type systems: Compilation errors force fixes. Rust provides extreme correctness but slower iteration. TypeScript offers faster cycles.
- Tests: Run tests for the unit of code just implemented. Capture why tests exist in documentation for future loops.
- Static analyzers: Critical for dynamic languages. Huntley recommends Dialyzer (Erlang), Pyrefly (Python), and similar tools.
- Security scanners: Industry-dependent. A banking app needs extensive security tooling; an esoteric language doesn't.
- Linters: Can run per-file during implementation, while tests wait for feature completion.
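Backpressure can also be applied from outside the session. In the loop as described above, Ralph itself runs tests and commits; a wrapper that refuses to keep an iteration's work unless the project still builds is one hedged variant (the cargo commands assume a Rust project; substitute your own toolchain):

```bash
#!/usr/bin/env bash
# Hedged sketch: keep an iteration's work only if the build and tests pass.
cat PROMPT.md | claude-code

if cargo build && cargo test; then
  git add -A
  git commit -m "ralph: iteration passed build and tests"
else
  echo "backpressure: build or tests failed, discarding this iteration"
  git checkout -- .
  git clean -fd
fi
```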
Language Choice Trade-offs
Huntley chose Rust for CURSED because he wanted extreme correctness for a compiler, but Rust's slow compilation means slower iteration. LLMs aren't great at generating perfect Rust in one attempt, requiring more correction cycles. This can be positive (more validation) or negative (slower progress).
Preventing Placeholder Implementations
Some models have an inherent bias toward minimal/placeholder implementations. They're trained to chase the reward function of compiling code. Combat this with explicit instructions:
DO NOT IMPLEMENT PLACEHOLDER OR SIMPLE IMPLEMENTATIONS.
WE WANT FULL IMPLEMENTATIONS. DO IT OR I WILL YELL AT YOU
You can also run additional Ralph loops specifically to identify and transform placeholders into a TODO list for future iterations.
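A dedicated placeholder-hunting pass can be as simple as a grep whose output feeds the next planning loop. A sketch, where the markers searched for are common conventions rather than an official list:

```bash
#!/usr/bin/env bash
# Surface likely placeholders so a later loop (or a human) can turn them
# into fix_plan.md items. Adjust the file glob and markers to your project.
grep -rn --include='*.rs' -E 'todo!|unimplemented!|TODO|FIXME|placeholder' src/ \
  | tee placeholder_report.txt

echo "Found $(wc -l < placeholder_report.txt) possible placeholders"
```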
Subagents & Parallelism
Subagents are spawned processes that perform work without consuming the primary context window. Think of them like assistants who handle tasks in the background. They go off to search files, run tests, or update documentation, then report back with just the results. This keeps Ralph focused on the main work without getting distracted by the details of these auxiliary tasks.
Warning: Subagents Are NOT for Implementation
Subagents are used for read/search/planning operations, not for making implementation changes. The primary context window (Ralph itself) makes the actual code changes. Subagents handle I/O: searching the codebase, updating fix_plan.md, running builds/tests, and studying specs. Use subagents to make changes at your own risk.
Subagent Use Cases
- Searching the codebase (parallel file searches)
- Updating fix_plan.md and AGENT.md
- Running build and test validation
- Studying source code against specifications
- Planning and research tasks
Huntley's prompts allow massive parallelism for these operations ("up to 500 parallel subagents") but constrain validation: "only 1 subagent for build/tests of rust" to avoid backpressure conflicts.
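A hedged reconstruction of how those constraints might be phrased in a prompt (paraphrased around the quoted fragments; not Huntley's verbatim wording):

```
Use subagents for searching the codebase, studying specs, and updating
@fix_plan.md and @AGENT.md. You may fan out many parallel subagents for
read-only research. Use only one subagent at a time for builds and tests
so validation results don't conflict. Subagents must not edit
implementation code; make all code changes yourself.
```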
The Oracle Tool
The "oracle" is a tool within the agent harness that makes calls to the most capable (and usually the slowest) model available. When Ralph encounters a particularly hard problem, it can consult the oracle for deeper reasoning. This is about using the smartest, most expensive techniques when they're actually needed.
Parallelism Approaches
Two forms of parallelism can be combined:
Subagents (Within Session): Spawned from the primary Ralph to handle read-only operations in parallel. Results feed back to the main context.
Multiple Loops (Separate Contexts): Multiple independent Ralph loops running in parallel, using VMs, containers, git worktrees, or similar isolation techniques to avoid conflicts on the same repository.
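For the second form, git worktrees are a lightweight way to give each loop its own checkout. A sketch, where the task names and branch naming are illustrative:

```bash
#!/usr/bin/env bash
# One Ralph loop per git worktree so parallel agents never write to the
# same checkout. Each worktree gets its own branch; merge them later.
for task in stdlib parser codegen; do
  git worktree add "../ralph-$task" -b "ralph/$task"
  (
    cd "../ralph-$task"
    while :; do cat PROMPT.md | claude-code; done
  ) &
done
wait   # the loops run until killed; stop them manually when done
```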
Evidence & Case Studies
CURSED: A Programming Language Built by Ralph
The flagship demonstration of the technique is CURSED, a GenZ-themed esoteric programming language. Over three months of autonomous operation, Ralph built a complete compiler including lexer, parser, LLVM codegen, and standard library in a language that didn't exist in the LLM's training data.
YC Hackathon Validation
At a Y Combinator hackathon, a team put the Ralph technique to the test and documented their results:
Key Findings:
- Output: 6 repositories shipped overnight, ~1,100 commits total
- Cost: ~$800 total, approximately $10.50/hour per Sonnet agent
- Prompt size: A 1,500-word prompt made the agent "slower and dumber" compared to a 103-word prompt
- Self-termination: One agent used pkill to terminate itself when stuck
- Overachieving: Agents added features not in the original spec (emergent behavior)
- Completion rate: ~90% automated, 10% human cleanup to finish
VentureBeat Coverage
The technique gained mainstream attention through VentureBeat's coverage, which noted community reactions describing it as "the closest thing I've seen to AGI." The article documents a case where a developer completed a $50,000 contract for $297 in API costs using the technique.
When Ralph Fails
Ralph will test you. You'll wake up to broken codebases that don't compile. The decision then is: git reset --hard and restart, or craft rescue prompts? This is judgment-based. There's no explicit threshold. Huntley mentions throwing massive compiler error output into Gemini (with its large context window) to generate a recovery plan for Ralph.
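A hedged sketch of the rescue-prompt route (the RESCUE_PROMPT.md and rescue_plan.md names are hypothetical, and the cargo command assumes a Rust project):

```bash
#!/usr/bin/env bash
# Capture the full build failure so it can be handed to a large-context
# model, or fed to a one-off rescue iteration, to produce a recovery plan.
cargo build 2>&1 | tee build_errors.log

# Option 1: abandon the broken work and restart from the last good commit
# git reset --hard HEAD

# Option 2: feed the errors plus a rescue prompt into a one-off session
cat RESCUE_PROMPT.md build_errors.log | claude-code | tee rescue_plan.md
```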
Glossary
Agent Harness: A program written around API calls to LLMs. The harness manages sessions, tools, and orchestration. When we say "same agent," we mean same harness, but each loop creates a new session.
Session: A single conversation context with the LLM. Ralph creates a fresh session each iteration to avoid context rot. This is the fundamental mechanism.
Context Window: The LLM's working memory (like RAM). Limited to ~170-200k tokens, with quality degrading as you approach limits. Ralph preserves this by using fresh contexts and delegating to subagents.
Context Rot: Degradation in output quality as irrelevant or conflicting information accumulates in the context window. Ralph avoids this through fresh sessions.
Subagent: A spawned process with its own context window, used for read-only operations (search, validation, planning). Results return to the main agent without polluting its context.
Oracle: A tool that calls the most capable (typically slowest) model available for particularly difficult reasoning tasks. Use sparingly when Ralph needs deeper analysis.
Backpressure: Mechanisms that validate output and force corrections: type systems, tests, linters, security scanners. The faster the wheel turns, the more iterations you can run.
Signs: Instructions and conventions stored in AGENT.md files that guide Ralph's behavior. Like putting up signs in a playground to prevent injuries.
Specs: Specification files defining what should be built. One feature per file, stored in specs/*, loaded on-demand to preserve context.
References
Primary Sources
- ghuntley.com/ralph/ - Geoffrey Huntley's main blog post on the Ralph Wiggum technique
- github.com/ghuntley/cursed - CURSED programming language repository (flagship demo)
- ghuntley.com/subagents/ - Huntley's post on subagent patterns and context window management
- ghuntley.com/gutter/ - "Autoregressive queens of failure" - context window degradation patterns
Case Studies & Coverage
- YC Hackathon Writeup (repomirror) - "We Put a Coding Agent in a While Loop and It Shipped 6 Repos Overnight"
- VentureBeat Article - "How Ralph Wiggum went from 'The Simpsons' to the biggest name in AI right now"
- Matt Pocock - Tips for AI Coding with Ralph Wiggum - AI Hero tutorial on applying the technique
- Matt Pocock - Getting Started with Ralph - AI Hero getting started guide
Social Media Sources
- Huntley confirming fresh context mechanism - "Ralph is a deterministically mallocing orchestrator that avoids context rot"
- Michael Arnaldi on fresh context requirement - "If you're implementing Ralph as part of the agent harness via skill/command/etc you are missing the point"
- Huntley endorsing subagent/fan-out patterns - "King ralph!"
Conclusion
The Ralph Loop represents a pragmatic approach to autonomous AI coding: fresh context per iteration, external memory through git and spec files, and structured backpressure through existing engineering tools. It works best for greenfield projects where you can accept 90% automated completion with 10% human cleanup.
The technique requires a philosophical shift. First, it forces you to define specifications upfront, describing what the end state should look like rather than how to achieve it. The loop figures out the how. Finding that balance between specification and implementation detail is the core challenge. Beyond that, you must trust the agent to prioritize, accept eventual consistency over immediate perfection, and treat failures as tuning opportunities rather than blockers. As Huntley puts it: "Any problem created by AI can be resolved through a different series of prompts."
"Ralph is deterministically bad in a non-deterministic world."
— Geoffrey Huntley
What this means: Ralph will make mistakes in predictable, repeatable ways. Unlike the chaotic unpredictability of complex multi-agent systems, Ralph's failures follow patterns you can anticipate and guard against with signs, specs, and backpressure. That consistency makes it debuggable, tunable, and ultimately reliable for long-running tasks. The predictability of its limitations is exactly what makes it useful.
Suggested Reading
Ready to implement Ralph yourself? These practical guides from Matt Pocock at AI Hero will help you get started:
- Getting Started with Ralph - Step-by-step setup guide
- Tips for AI Coding with Ralph Wiggum - Advanced techniques and best practices