# DistributedPrompt

Horizontally scalable prompts for Recursive Language Models (RLMs).

DistributedPrompt is a drop-in replacement for Python's `str` that stores
prompt data across fixed-size shards on disk or S3. Slicing
(`prompt[n:n+k]`) performs an O(1) shard lookup, so only the needed data is
fetched. This lets RLMs operate on prompts with 100M+ characters without
holding them in memory.
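
For intuition, here is a minimal sketch of the arithmetic behind that O(1)
lookup. `shards_for_slice` is a hypothetical helper written for illustration,
not part of the library's API:

```python
# Illustrative only, assuming fixed-size shards: the shards covering
# prompt[start:stop] follow from two integer divisions.
def shards_for_slice(start: int, stop: int, shard_size: int) -> range:
    first = start // shard_size                # shard holding the first character
    last = max(stop - 1, start) // shard_size  # shard holding the last character
    return range(first, last + 1)

# A 1,000-char slice with 10M-char shards usually touches one shard...
print(list(shards_for_slice(10_000_000, 10_001_000, 10_000_000)))  # [1]
# ...and two when it straddles a shard boundary.
print(list(shards_for_slice(9_999_900, 10_000_100, 10_000_000)))   # [0, 1]
```

Because shard size is fixed, the lookup cost is independent of prompt length;
no index scan is needed.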

Install with uv:

```bash
uv sync --all-groups    # installs Python 3.12, dev & test deps
uv tool install -e .    # installs the dprompt CLI on your PATH
```

Shard a large prompt file from the CLI, then inspect the result:

```bash
uv run dprompt ingest ../mega_prompt.md --output ./shards/ --shard-size 10_000_000
uv run dprompt info ./shards/
```

Open the shards as a string-like object:

```python
from distributed_prompt import DistributedPrompt, FileBackend
backend = FileBackend("./shards/")
prompt = DistributedPrompt(backend)
# O(1) shard lookup; only the shard(s) covering this range are fetched
print(prompt[10_000_000:10_000_000 + 1000])
print(len(prompt)) # no I/O — reads from metadata
print(repr(prompt)) # DistributedPrompt(length=..., shards=..., shard_size=...)
# str-like protocol
"keyword" in prompt # scans shard-by-shard with overlap
prompt.find("needle") # returns character offset or -1from distributed_prompt.backends.s3_backend import S3Backend, ingest_to_s3

Shards can also live in S3-compatible object storage (MinIO in this example):

```python
from distributed_prompt.backends.s3_backend import S3Backend, ingest_to_s3

# Upload local shards to MinIO
ingest_to_s3("./shards/", bucket="prompts", prefix="corpus-v1",
             endpoint_url="http://minio:9000")

# Read from S3
backend = S3Backend(bucket="prompts", prefix="corpus-v1",
                    endpoint_url="http://minio:9000")
prompt = DistributedPrompt(backend)
print(prompt[0:100])
```

In Zhang & Khattab's (2025) "Recursive Language Models" paper, the agent loop
evaluates Python in a REPL where `prompt` is a plain string. Replace it with a
DistributedPrompt:

```python
# Before (fails for large prompts):
prompt = open("huge_file.txt").read()
# After:
from distributed_prompt import DistributedPrompt, FileBackend
prompt = DistributedPrompt(FileBackend("./shards/"))
prompt[n:n+k]  # works identically, via O(1) shard fetches
```

The RLM's generated code can slice `prompt` as usual; the DistributedPrompt
object transparently fetches only the needed shards. LM inference latency
(seconds) dominates shard-fetch latency (milliseconds), so the indirection is
effectively free.
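
To sanity-check that claim on your own shards, time a cold slice; numbers vary
with disk, network, and shard size:

```python
import time

from distributed_prompt import DistributedPrompt, FileBackend

prompt = DistributedPrompt(FileBackend("./shards/"))

start = time.perf_counter()
_ = prompt[10_000_000:10_001_000]  # cold fetch of the covering shard(s)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"shard fetch: {elapsed_ms:.1f} ms")
```

Even a handful of such fetches per REPL step is dwarfed by a single LM call.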

Run the test suite:

```bash
uv run pytest tests/ -v
```

References:

- Zhang, Alex and Khattab, Omar (2025). "Recursive Language Models." https://alexzhang13.github.io/blog/2025/rlm/