Recursive Language Models

By Stu Kennedy  ·  May 2026  ·  AI  ·  Architecture  ·  Cloudflare

RAG is a filing cabinet. An RLM is a reader. Where RAG pre-fetches and dumps context, a Recursive Language Model loops — reading, deciding what else to read, then reading again — until it has enough to answer.

The Problem with RAG

Retrieval-Augmented Generation has become the default pattern for giving LLMs access to external knowledge. You chunk documents, embed them, store vectors, retrieve the top-k at query time, and stuff them into the context window. It works. But it has a fundamental flaw: the model has no agency over what it reads.
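The one-shot nature of that pipeline is easy to see in code. The sketch below is a toy, not a production setup: `embed` is a bag-of-words counter standing in for a real embedding model, and `buildIndex`, `buildPrompt`, and the sample documents are hypothetical names chosen for illustration.

```typescript
// Toy stand-in for an embedding model: a bag-of-words count vector.
type Vec = Map<string, number>;

function embed(text: string): Vec {
  const v: Vec = new Map();
  for (const w of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    v.set(w, (v.get(w) ?? 0) + 1);
  }
  return v;
}

function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (const [w, x] of a) {
    dot += x * (b.get(w) ?? 0);
    na += x * x;
  }
  for (const x of b.values()) nb += x * x;
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

interface Chunk { id: string; text: string; vec: Vec; }

// Index time: embed every chunk once and keep the vectors.
function buildIndex(docs: Record<string, string>): Chunk[] {
  return Object.keys(docs).map((id) => ({ id, text: docs[id], vec: embed(docs[id]) }));
}

// Query time: a single retrieval pass, after which the prompt is fixed.
// The model never gets to ask for anything else.
function buildPrompt(index: Chunk[], query: string, k: number): string {
  const q = embed(query);
  const best = index
    .slice()
    .sort((a, b) => cosine(q, b.vec) - cosine(q, a.vec))
    .slice(0, k);
  const context = best.map((c) => `[${c.id}] ${c.text}`).join('\n');
  return `Context:\n${context}\n\nQuestion: ${query}`;
}

const ragIndex = buildIndex({
  a: 'Durable Objects provide stateful storage',
  b: 'Vectors are stored in an external database',
  c: 'HTML pages link to each other',
});
const ragPrompt = buildPrompt(ragIndex, 'where are vectors stored', 1);
console.log(ragPrompt);
```

With `k = 1` the prompt contains only chunk `b`. Whatever the model would have wanted to look up next is simply not there.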

The RAG bottleneck: All retrieval decisions happen before the model reasons. The model can't ask follow-up questions of the knowledge base mid-thought.

What is an RLM?

A Recursive Language Model interleaves reading and reasoning in a loop — query, reason, read, evaluate, loop or answer:

[Figure: USER QUERY → REASON → READ → ENOUGH? (no: read more; yes: ANSWER). Max depth: configurable.]
Fig 1. The RLM read-reason loop
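As a rough sketch of the loop in Fig 1, stripped of any real model: the `Policy` function below is a hypothetical stand-in for the LLM call, and `rlmLoop`, `corpus`, and `scripted` are illustrative names, not part of any library.

```typescript
// Each step, the policy either asks to read something or commits to an answer.
type Decision =
  | { kind: 'read'; id: string }
  | { kind: 'answer'; text: string };

// Hypothetical stand-in for the LLM: sees the query and everything read so far.
type Policy = (query: string, notes: string[]) => Decision;

function rlmLoop(
  query: string,
  read: (id: string) => string,
  decide: Policy,
  maxDepth: number,
): string {
  const notes: string[] = [];
  for (let depth = 0; depth < maxDepth; depth++) {
    const d = decide(query, notes);          // REASON
    if (d.kind === 'answer') return d.text;  // ENOUGH? yes
    notes.push(read(d.id));                  // READ, then loop again
  }
  // Depth budget exhausted before the policy chose to answer.
  return 'Stopped at max depth with: ' + notes.join(' | ');
}

// Toy corpus where the first chunk points at the second.
const corpus: Record<string, string> = {
  intro: 'Pricing details are in chunk "pricing".',
  pricing: 'The plan costs 5 dollars.',
};

// Scripted policy: read intro, follow its pointer, then answer.
const scripted: Policy = (_query, notes) => {
  if (notes.length === 0) return { kind: 'read', id: 'intro' };
  if (notes.length === 1) return { kind: 'read', id: 'pricing' };
  return { kind: 'answer', text: notes[notes.length - 1] };
};

const answer = rlmLoop('How much is the plan?', (id) => corpus[id] ?? '', scripted, 5);
console.log(answer);
```

The second read only happens because of what the first read contained, which is the step a one-shot retriever cannot take. Lowering `maxDepth` to 1 makes the loop stop before it reaches the pricing chunk.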

RLMs on Cloudflare Durable Objects

A Durable Object is the REPL environment. Each DO instance is a stateful session with its own storage, its own LLM loop, and its own inverted keyword index. No external vector store needed.

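// Sketch assuming the Vercel AI SDK (v4-style `tool`, `maxSteps`) and zod;
// keywordSearch, readChunk, and config live elsewhere on the class.
import { DurableObject } from 'cloudflare:workers';
import { generateText, tool, type LanguageModel } from 'ai';
import { z } from 'zod';
class RecursiveMemory extends DurableObject {
  async runLoop(query: string, model: LanguageModel): Promise<string> {
    const tools = {
      search: tool({
        description: 'Keyword search over the inverted index',
        parameters: z.object({ q: z.string() }),
        execute: async ({ q }) => this.keywordSearch(q),
      }),
      read: tool({
        description: 'Read one chunk by id',
        parameters: z.object({ id: z.string() }),
        execute: async ({ id }) => this.readChunk(id),
      }),
    };
    const { text } = await generateText({
      model, prompt: query, tools,
      maxSteps: this.config.maxDepth, // caps the read-reason iterations
    });
    return text;
  }
}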
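The inverted keyword index mentioned above can be very small. Here is a minimal in-memory sketch of what could sit behind `keywordSearch` and `readChunk`. The `KeywordIndex` class and its method names are assumptions for illustration, and persistence to the Durable Object's storage is left out entirely.

```typescript
// Minimal inverted index: term -> set of chunk ids containing that term.
class KeywordIndex {
  private postings = new Map<string, Set<string>>();
  private chunks = new Map<string, string>();

  private tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  }

  add(id: string, text: string): void {
    this.chunks.set(id, text);
    for (const term of this.tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(id);
    }
  }

  // Rank chunk ids by how many distinct query terms each one contains.
  search(query: string): string[] {
    const hits = new Map<string, number>();
    for (const term of new Set(this.tokenize(query))) {
      const ids = this.postings.get(term);
      if (!ids) continue;
      for (const id of ids) hits.set(id, (hits.get(id) ?? 0) + 1);
    }
    return Array.from(hits.entries())
      .sort((a, b) => b[1] - a[1])
      .map(([id]) => id);
  }

  read(id: string): string | undefined {
    return this.chunks.get(id);
  }
}

const memory = new KeywordIndex();
memory.add('a', 'Durable Objects store state');
memory.add('b', 'Durable Objects run an LLM loop');
console.log(memory.search('llm loop durable'));
```

Chunk `b` matches three query terms and chunk `a` matches one, so the search returns `b` before `a`. Term-overlap counting is the simplest possible ranking. A scheme such as TF-IDF or BM25 would slot into `search` without changing the interface the model sees.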

RLM vs RAG

Dimension         RAG                         RLM
Retrieval timing  Before reasoning, one-shot  Mid-reasoning, iterative
Model agency      None                        Full (model decides what to read)
Latency           Low                         Higher (multiple steps)
Best for          Simple Q&A                  Complex reasoning, multi-hop

The Hypermedia Connection

There's a reason html.surf is built the way it is. HTML pages linked by <a href> tags are the natural substrate for RLM navigation. An agent reading this article can follow the link to Free The Web, decide it needs more context, and follow another link — exactly the way a human researcher would. The link graph is the retrieval index.
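To make "the link graph is the retrieval index" concrete, here is a small sketch that walks links across an in-memory set of pages. Everything in it is an assumption for illustration: the `site` map and its paths are invented, `isEnough` stands in for the model's own judgment, and breadth-first order stands in for the model choosing which link to follow.

```typescript
// Pull href targets out of anchor tags. A regex is enough for a sketch;
// real pages deserve a proper HTML parser.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const re = /<a\s[^>]*href="([^"]+)"/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

// Walk the link graph breadth-first, stopping at the first page that
// satisfies `isEnough` or when the page budget runs out.
function followLinks(
  fetchPage: (url: string) => string | undefined,
  start: string,
  isEnough: (html: string) => boolean,
  maxPages: number,
): string[] {
  const visited: string[] = [];
  const queue: string[] = [start];
  while (queue.length > 0 && visited.length < maxPages) {
    const url = queue.shift() as string;
    if (visited.indexOf(url) !== -1) continue; // already read
    const html = fetchPage(url);
    if (html === undefined) continue;          // dead link
    visited.push(url);
    if (isEnough(html)) break;
    queue.push(...extractLinks(html));
  }
  return visited;
}

// Hypothetical two-page site held in memory.
const site: Record<string, string> = {
  '/rlm': '<p>See <a href="/free-the-web">Free The Web</a>.</p>',
  '/free-the-web': '<p>HTML is readable by humans and machines alike.</p>',
};

const trail = followLinks(
  (url) => site[url],
  '/rlm',
  (html) => html.indexOf('machines') !== -1,
  10,
);
console.log(trail);
```

The walk reads `/rlm`, finds it insufficient, follows its one link, and stops at `/free-the-web`. No embedding or index was built beforehand; the anchors in the page did the retrieval.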

This is the thesis of Free The Web: HTML is readable by humans and machines alike. An RLM navigating html.surf pages is indistinguishable from a human doing research. That's the goal.

Further Reading