# Recursive Language Models

 By Stu Kennedy · May 2026 · AI, Architecture, Cloudflare

 RAG is a filing cabinet. An RLM is a reader. Where RAG pre-fetches and dumps context, a Recursive Language Model loops — reading, deciding what else to read, then reading again — until it has enough to answer.

## The Problem with RAG

 Retrieval-Augmented Generation has become the default pattern for giving LLMs access to external knowledge. You chunk documents, embed them, store vectors, retrieve the top-k at query time, and stuff them into the context window. It works. But it has a fundamental flaw: the model has no agency over what it reads.

  **The RAG bottleneck:** All retrieval decisions happen before the model reasons. The model can't ask follow-up questions of the knowledge base mid-thought.
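 To make the one-shot nature of that pipeline concrete, here is a minimal sketch. It is not a production setup: `embed` is a bag-of-words stand-in for a real embedding model, and every name in it (`Chunk`, `retrieveTopK`, `buildPrompt`, the sample corpus) is hypothetical. The point is only that retrieval finishes before the model is ever involved.

```typescript
// Toy one-shot RAG pipeline. All names are illustrative assumptions;
// embed() is a bag-of-words stand-in for a real embedding model.
type Chunk = { id: string; text: string };

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Stand-in embedding: a sparse term-frequency vector.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const tok of tokenize(text)) vec.set(tok, (vec.get(tok) ?? 0) + 1);
  return vec;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  a.forEach((v, k) => {
    dot += v * (b.get(k) ?? 0);
    na += v * v;
  });
  b.forEach((v) => {
    nb += v * v;
  });
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Retrieval happens exactly once, before the model sees anything.
function retrieveTopK(query: string, chunks: Chunk[], k: number): Chunk[] {
  const q = embed(query);
  return chunks
    .map((c) => ({ c, score: cosine(q, embed(c.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

// The retrieved chunks are stuffed into the prompt; the model cannot ask for more.
function buildPrompt(query: string, chunks: Chunk[], k: number): string {
  const context = retrieveTopK(query, chunks, k)
    .map((c) => c.text)
    .join("\n");
  return `Context:\n${context}\n\nQuestion: ${query}`;
}

const corpus: Chunk[] = [
  { id: "a", text: "Durable Objects provide stateful storage at the edge." },
  { id: "b", text: "Vector stores hold embeddings for retrieval." },
  { id: "c", text: "Bananas are yellow." },
];
const prompt = buildPrompt("How do durable objects store state?", corpus, 1);
```

 Whatever `retrieveTopK` returns is all the model gets. If the first chunk points to another document, nothing in this flow can fetch it.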

## What is an RLM?

 A Recursive Language Model interleaves reading and reasoning in a loop — query, reason, read, evaluate, loop or answer:

 Fig 1. The RLM read-reason loop: USER QUERY → REASON → READ → ENOUGH? If no, read more and loop; if yes, ANSWER. Max depth: configurable.
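 The loop in Fig 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: the model is replaced by a synchronous `decide` callback (a real model call would be async), and `Step`, `Decide`, `runRlm`, and the sample store are hypothetical names.

```typescript
// Minimal read-reason loop. `Decide` stands in for the model: given the query
// and the notes gathered so far, it either asks for a chunk or answers.
type Step =
  | { kind: "read"; chunkId: string }
  | { kind: "answer"; text: string };

type Decide = (query: string, notes: string[]) => Step;

type RlmResult = { answer: string | null; reads: string[]; exhausted: boolean };

function runRlm(
  query: string,
  store: Map<string, string>,
  decide: Decide,
  maxDepth: number,
): RlmResult {
  const notes: string[] = [];
  const reads: string[] = [];
  for (let depth = 0; depth < maxDepth; depth++) {
    const step = decide(query, notes); // REASON
    if (step.kind === "answer") {
      // ENOUGH? yes
      return { answer: step.text, reads, exhausted: false };
    }
    // ENOUGH? no: READ the requested chunk and loop again.
    reads.push(step.chunkId);
    notes.push(store.get(step.chunkId) ?? `(no chunk named ${step.chunkId})`);
  }
  // Depth budget spent without an answer.
  return { answer: null, reads, exhausted: true };
}

// Hypothetical two-chunk store where the first chunk points at the second.
const store = new Map<string, string>([
  ["index", "Plans are described in the pricing chunk."],
  ["pricing", "The Pro plan costs 20 dollars per month."],
]);

// Scripted stand-in for the model's judgment.
const scripted: Decide = (_query, notes) => {
  if (notes.length === 0) return { kind: "read", chunkId: "index" };
  const last = notes[notes.length - 1];
  if (last.indexOf("pricing chunk") !== -1) return { kind: "read", chunkId: "pricing" };
  return { kind: "answer", text: last };
};

const result = runRlm("How much is Pro?", store, scripted, 5);
const shallow = runRlm("How much is Pro?", store, scripted, 1);
```

 With a depth of 5, the scripted policy reads `index`, follows the pointer to `pricing`, and then answers. With a depth of 1, it runs out of budget after the first read, which is the trade-off the configurable max depth controls.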

## RLMs on Cloudflare Durable Objects

 A Durable Object serves as the REPL environment for the loop. Each Durable Object (DO) instance is a stateful session with its own storage, its own LLM loop, and its own inverted keyword index, so no external vector store is needed.

```
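import { DurableObject } from "cloudflare:workers";
import { generateText } from "ai";

class RecursiveMemory extends DurableObject {
  async runLoop(query: string, model: string): Promise<string> {
    // Tools the model may call mid-reasoning (simplified shape).
    const tools = [
      { name: 'search', fn: (q: string) => this.keywordSearch(q) },
      { name: 'read', fn: (id: string) => this.readChunk(id) },
    ];
    // maxSteps caps the read-reason loop at the configured depth.
    const { text } = await generateText({
      model,
      prompt: query,
      tools,
      maxSteps: this.config.maxDepth,
    });
    return text;
  }
}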
```
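 The article does not show the inverted keyword index behind `keywordSearch` and `readChunk`, so the following is only a guess at its general shape. `KeywordIndex` and its methods are hypothetical, and the maps live in memory here; a real Durable Object would presumably persist them in its own storage.

```typescript
// Hypothetical in-memory inverted index: term -> set of chunk ids.
class KeywordIndex {
  private postings = new Map<string, Set<string>>();
  private chunks = new Map<string, string>();

  private static tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  }

  // Store the chunk and record it under every term it contains.
  add(id: string, text: string): void {
    this.chunks.set(id, text);
    for (const term of KeywordIndex.tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(id);
    }
  }

  // Rank chunks by how many distinct query terms they contain;
  // ties are broken by chunk id so the order is deterministic.
  search(query: string, limit: number = 5): string[] {
    const hits = new Map<string, number>();
    const terms = new Set<string>(KeywordIndex.tokenize(query));
    terms.forEach((term) => {
      const ids = this.postings.get(term);
      if (!ids) return;
      ids.forEach((id) => {
        hits.set(id, (hits.get(id) ?? 0) + 1);
      });
    });
    return Array.from(hits.entries())
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, limit)
      .map((entry) => entry[0]);
  }

  read(id: string): string | undefined {
    return this.chunks.get(id);
  }
}

const index = new KeywordIndex();
index.add("c1", "Durable Objects keep state next to compute.");
index.add("c2", "Each Durable Object has its own storage.");
index.add("c3", "RAG retrieves chunks before reasoning.");
const ids = index.search("durable object storage");
```

 Exact-term matching like this has no stemming, so `object` does not match `objects`. That is cheaper than embeddings, and the loop can compensate by letting the model rephrase and search again.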

## RLM vs RAG

| Dimension | RAG | RLM |
| --- | --- | --- |
| Retrieval timing | Pre-query, one-shot | Mid-reasoning, iterative |
| Model agency | None | Full: the model decides what to read |
| Latency | Low | Higher (multiple steps) |
| Best for | Simple Q&A | Complex reasoning, multi-hop |

## The Hypermedia Connection

 There's a reason [html.surf](/p/surfable-web) is built the way it is. HTML pages linked by `<a>` tags are the natural substrate for RLM navigation. An agent reading this article can follow the link to [Free The Web](/p/free-the-web), decide it needs more context, and follow another link, exactly the way a human researcher would. The link graph _is_ the retrieval index.

 This is the thesis of [Free The Web](/p/free-the-web): [HTML is readable by humans and machines alike](/p/why-html-wins). An RLM navigating html.surf pages is indistinguishable from a human doing research. That's the goal.
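 As a rough illustration of "the link graph is the retrieval index", the sketch below walks a tiny in-memory set of pages by following `<a href>` links. It rests on several assumptions: pages come from a `Map` rather than HTTP, links are pulled out with a simple regular expression instead of a real HTML parser, the `wants` callback stands in for the model's judgment, and `/p/rlm` is a made-up path for this article.

```typescript
// Sketch: treat <a href> links as the retrieval index.
// The regex only handles double-quoted href attributes.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const re = /<a\s[^>]*href="([^"]+)"/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

// Breadth-first walk from `start`, following only links that `wants`
// accepts, and stopping after `maxPages` pages have been read.
function surf(
  pages: Map<string, string>,
  start: string,
  wants: (href: string) => boolean,
  maxPages: number,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0 && visited.length < maxPages) {
    const path = queue.shift() as string;
    const html = pages.get(path);
    if (html === undefined) continue; // dead link: skip it
    visited.push(path);
    for (const href of extractLinks(html)) {
      if (!seen.has(href) && wants(href)) {
        seen.add(href);
        queue.push(href);
      }
    }
  }
  return visited;
}

// Hypothetical page contents; "/p/rlm" is an assumed path for this article.
const pages = new Map<string, string>([
  ["/p/rlm", '<p>See <a href="/p/free-the-web">Free The Web</a> and <a href="/p/edge-first">Edge-First</a>.</p>'],
  ["/p/free-the-web", '<p>Read <a href="/p/why-html-wins">Why HTML Wins</a>.</p>'],
  ["/p/why-html-wins", "<p>HTML is readable by humans and machines.</p>"],
  ["/p/edge-first", "<p>Where this all runs.</p>"],
]);

// The stand-in policy decides that the edge-first page is not relevant.
const visited = surf(pages, "/p/rlm", (href) => href !== "/p/edge-first", 10);
```

 No index was built ahead of time. The pages the agent ends up reading depend entirely on which links it chooses to follow and on the page budget.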

## Further Reading

- [Free The Web](/p/free-the-web) — five principles for the next web

- [Why HTML Wins](/p/why-html-wins) — the technical case for hypermedia

- [The Surfable Web](/p/surfable-web) — knowledge graphs as websites

- [Edge-First Architecture](/p/edge-first) — where this all runs

 Published on [html.surf](/p/index) — surfable knowledge for humans + AI