ruosh is a full-text search library for Python. It has a Whoosh-like API, but the search engine underneath is Tantivy, which is written in Rust. The goal is to give Python code something familiar to work with while getting meaningful performance out of a real search engine.
pip install ruosh
Python 3.10 or newer is required. No Rust toolchain is needed; wheels are pre-built for Linux, macOS, and Windows.
from ruosh import Schema, TEXT, ID, create_in
schema = (
    Schema()
    .add("doc_id", ID(stored=True, unique=True))
    .add("title", TEXT(stored=True))
    .add("body", TEXT(stored=True))
)
idx = create_in("my-index", schema)
w = idx.writer()
w.add_document(doc_id="1", title="Getting started", body="A short introduction to full-text search.")
w.add_document(doc_id="2", title="Advanced queries", body="Boolean, phrase, and field-specific queries.")
w.commit()
for hit in idx.search("introduction"):
    print(hit["doc_id"], hit["title"], hit.score)

String queries search all text fields:
results = idx.search("full-text search", limit=10)

Structured queries let you be specific:
from ruosh.query import Term, And, Or, Phrase
# must contain both terms in the body field
results = idx.search(And(Term("body", "search"), Term("body", "fast")), limit=10)
# exact phrase
results = idx.search(Phrase("body", ["full-text", "search"]), limit=10)

# second page of results: skip the first 10 hits
page2 = idx.search("search", limit=10, offset=10)
print(f"{page2.total} total hits, showing {len(page2)}")

Pass snippet_fields to get highlighted excerpts back with each hit:
results = idx.search("search", limit=10, snippet_fields=["body"])
for hit in results:
    print(hit.snippet("body"))  # returns HTML with <b> tags around matches

Add a numeric field with sortable=True and pass sort_by:
from ruosh import NUMERIC
schema = Schema().add("doc_id", ID(stored=True)).add("body", TEXT(stored=True)).add("rank", NUMERIC(stored=True, sortable=True))
# ...index documents...
results = idx.search("search", sort_by="rank", sort_desc=False)

ruosh maintains a lightweight sidecar that tracks approximate term frequencies across the corpus. It is updated at write time and loaded once, so repeated statistics lookups add very little overhead.
# how often does "search" appear across documents?
stats = idx.term_stats("body", "search")
# {"term": "search", "estimated_doc_freq": 1840, "very_common": True}
# what are the most common terms in this field?
terms = idx.frequent_terms("body", limit=20)
# [{"term": "search", "estimated_doc_freq": 1840}, ...]

These are estimates, not exact counts. They are useful for things like query planning, stopword detection, and building tag clouds. Neither SQLite FTS5 nor Whoosh exposes a top-terms API at all.
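As an illustration of the stopword-detection use case, the sketch below filters a list shaped like the `frequent_terms` output above. The sample values, the `stopword_candidates` helper, and its `total_docs` and `min_fraction` parameters are hypothetical and not part of ruosh; the corpus size is something you would supply yourself.

```python
# Sample data in the shape returned by idx.frequent_terms(); the values are made up.
frequent_terms = [
    {"term": "search", "estimated_doc_freq": 1840},
    {"term": "the", "estimated_doc_freq": 1990},
    {"term": "index", "estimated_doc_freq": 420},
    {"term": "tantivy", "estimated_doc_freq": 35},
]


def stopword_candidates(terms, total_docs, min_fraction=0.8):
    """Return terms whose estimated document frequency covers at least
    min_fraction of the corpus. Hypothetical helper, not a ruosh API."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    threshold = min_fraction * total_docs
    return [t["term"] for t in terms if t["estimated_doc_freq"] >= threshold]


# Assumes a corpus of 2,000 documents.
candidates = stopword_candidates(frequent_terms, total_docs=2000)
print(candidates)
```

Because the frequencies are estimates, a threshold like this is best treated as a way to produce candidates for review, not a final stopword list.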
# replace a document by its unique field
w = idx.writer()
w.update_document(doc_id="1", title="Updated title", body="New body text.")
w.commit()
# delete by field value
w = idx.writer()
w.delete_by_term("doc_id", "1")
w.commit()

# reopen an existing index from disk
from ruosh import open_dir
idx = open_dir("my-index")

ruosh trades raw single-query speed for richer features. Each search call crosses the Python-Rust boundary, which adds a few milliseconds of fixed overhead. For workloads that need very fast single-keyword lookups over small corpora, SQLite FTS5 will be faster. ruosh is a better fit when you need structured queries, pagination with correct totals, highlighted snippets, or corpus statistics.
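One way to judge whether the fixed per-call overhead matters for your workload is to time the call directly. The following is a minimal sketch of such a measurement; `time_calls` and `fake_search` are hypothetical names, and `fake_search` is only a stand-in so the example runs without an index.

```python
import statistics
import time


def time_calls(fn, repeats=50):
    """Call fn() repeatedly and return (median_ms, worst_ms) wall-clock latency."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def fake_search():
    # Stand-in workload; swap in a real query, for example
    # lambda: idx.search("introduction", limit=10)
    return sum(range(1000))


median_ms, worst_ms = time_calls(fake_search)
print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms")
```

The median is usually the more useful figure here, since the first few calls can include one-off warm-up costs.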
Compared to Whoosh, ruosh is faster in every scenario measured. On a 100,000-document corpus:
| Scenario | ruosh | Whoosh |
|---|---|---|
| Keyword search | 8.9 ms | 43 ms |
| Boolean AND | 13 ms | 201 ms |
| Phrase query | 48 ms | 371 ms |
| Paginated results | 30 ms | 193 ms |
| Snippet extraction | 10 ms | 41 ms |
| Corpus intelligence | 0.7 ms | 5.7 ms |
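To read the table as relative speedups, divide the Whoosh time by the ruosh time for each row. The short sketch below does that arithmetic using the figures copied from the table; the variable names are illustrative.

```python
# (ruosh_ms, whoosh_ms) per scenario, copied from the table above.
timings_ms = {
    "Keyword search": (8.9, 43),
    "Boolean AND": (13, 201),
    "Phrase query": (48, 371),
    "Paginated results": (30, 193),
    "Snippet extraction": (10, 41),
    "Corpus intelligence": (0.7, 5.7),
}

# Speedup factor = Whoosh time / ruosh time.
speedups = {name: whoosh / ruosh for name, (ruosh, whoosh) in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

On these numbers the ratio ranges from roughly 4x (snippet extraction) to roughly 15x (Boolean AND).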
Building from source requires Python 3.11, a Rust toolchain, and uv:
git clone https://github.com/biswarupghosh/ruosh
cd ruosh
uv venv .venv --python 3.11
uv sync --extra dev
uv run maturin develop
uv run pytest
To run the full feature benchmark:
uv sync --extra dev --extra bench
uv run python scripts/benchmark_features.py
License: MIT