ruosh

ruosh is a full-text search library for Python. It has a Whoosh-like API, but the search engine underneath is Tantivy, which is written in Rust. The goal is to give Python code something familiar to work with while getting meaningful performance out of a real search engine.

Install

pip install ruosh

Python 3.10 or newer is required. No Rust toolchain is needed: pre-built wheels are available for Linux, macOS, and Windows.

Quick start

from ruosh import Schema, TEXT, ID, create_in

schema = (
    Schema()
    .add("doc_id", ID(stored=True, unique=True))
    .add("title",  TEXT(stored=True))
    .add("body",   TEXT(stored=True))
)

idx = create_in("my-index", schema)

w = idx.writer()
w.add_document(doc_id="1", title="Getting started", body="A short introduction to full-text search.")
w.add_document(doc_id="2", title="Advanced queries", body="Boolean, phrase, and field-specific queries.")
w.commit()

for hit in idx.search("introduction"):
    print(hit["doc_id"], hit["title"], hit.score)

Queries

String queries search all text fields:

results = idx.search("full-text search", limit=10)

Structured queries let you be specific:

from ruosh.query import Term, And, Or, Phrase

# must contain both terms in the body field
results = idx.search(And(Term("body", "search"), Term("body", "fast")), limit=10)

# exact phrase
results = idx.search(Phrase("body", ["full-text", "search"]), limit=10)
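To make the difference between Term, And, and Phrase concrete, here is a plain-Python sketch of what each one matches, built on a tiny in-memory positional index. It illustrates query semantics only and is not how ruosh or Tantivy implement them. The helpers tokenize, build_index, term, and_, or_, and phrase are hypothetical. The sketch also assumes a simplified analyzer (lowercase, split on non-word characters), which turns "full-text" into two tokens, so its phrase uses three terms. How ruosh actually tokenizes is not documented here.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Simplified analyzer (an assumption): lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())

def build_index(docs: dict[str, str]):
    # Positional inverted index: term -> {doc_id: [positions]}.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok][doc_id].append(pos)
    return index

def term(index, t: str) -> set[str]:
    # Documents containing the term at least once (.get avoids creating entries).
    return set(index.get(t, {}))

def and_(*doc_sets: set[str]) -> set[str]:
    # Boolean AND: a document must match every clause.
    return set.intersection(*doc_sets)

def or_(*doc_sets: set[str]) -> set[str]:
    # Boolean OR: a document may match any clause.
    return set.union(*doc_sets)

def phrase(index, terms: list[str]) -> set[str]:
    # Candidates contain every term; a match also needs them at consecutive positions.
    candidates = and_(*(term(index, t) for t in terms))
    matches = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches

docs = {
    "1": "A short introduction to full-text search.",
    "2": "Fast search over text that is full of noise.",
}
index = build_index(docs)
print(sorted(and_(term(index, "search"), term(index, "fast"))))  # ['2']
print(sorted(phrase(index, ["full", "text", "search"])))         # ['1']
```

Both documents contain "full", "text", and "search", but only document 1 has them in that order and adjacent, which is the extra check a phrase query adds on top of AND.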

Pagination

page2 = idx.search("search", limit=10, offset=10)
print(f"{page2.total} total hits, showing {len(page2)}")
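offset counts hits to skip, so with a fixed limit, page n (counting from 1) starts at offset (n - 1) * limit. The total stays the number of matches across all pages, while len() is only the size of the current page. The plain-Python sketch below models that contract. Page, paginate, and offset_for are hypothetical stand-ins, not ruosh's result type.

```python
from dataclasses import dataclass

@dataclass
class Page:
    hits: list
    total: int  # matches across all pages, not just this one

    def __len__(self) -> int:
        return len(self.hits)

def offset_for(page: int, limit: int) -> int:
    # Page numbers are 1-based here; page 1 starts at offset 0.
    return (page - 1) * limit

def paginate(ranked_hits: list, limit: int = 10, offset: int = 0) -> Page:
    # Skip `offset` hits and return at most `limit`, but report the full match count.
    return Page(hits=ranked_hits[offset:offset + limit], total=len(ranked_hits))

hits = [f"doc{i}" for i in range(25)]  # 25 ranked matches
page3 = paginate(hits, limit=10, offset=offset_for(3, 10))
print(f"{page3.total} total hits, showing {len(page3)}")  # 25 total hits, showing 5
```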

Snippets

Pass snippet_fields to get highlighted excerpts back with each hit:

results = idx.search("search", limit=10, snippet_fields=["body"])
for hit in results:
    print(hit.snippet("body"))  # returns HTML with <b> tags around matches
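Because the snippet is HTML, the text around the <b> tags matters if you render it in a web page. The sketch below shows match highlighting in that style. It escapes the document text before adding tags so content cannot inject markup. The README does not say whether ruosh's snippets are already escaped, so check before inserting them into HTML. highlight is a hypothetical helper. Real snippet generators also pick a short window of text around the best matches, which this sketch skips.

```python
import html
import re

def highlight(text: str, terms: list[str]) -> str:
    """Return `text` as HTML with whole-word, case-insensitive matches of `terms` in <b> tags."""
    escaped = html.escape(text)  # escape first so document text cannot inject markup
    if not terms:
        return escaped
    # Escape the terms the same way so they still match inside the escaped text.
    alternatives = "|".join(re.escape(html.escape(t)) for t in terms)
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", escaped)

print(highlight("Full-text search & more", ["search"]))
# Full-text <b>search</b> &amp; more
```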

Sorting

Add a numeric field with sortable=True and pass sort_by:

from ruosh import Schema, ID, TEXT, NUMERIC

schema = (
    Schema()
    .add("doc_id", ID(stored=True))
    .add("body",   TEXT(stored=True))
    .add("rank",   NUMERIC(stored=True, sortable=True))
)

# ...index documents...

# sort_desc=False returns hits in ascending rank order
results = idx.search("search", sort_by="rank", sort_desc=False)
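In the call above, sort_desc=False orders hits from the smallest rank value to the largest. The plain-Python sketch below shows those semantics on hits that have already been retrieved. sort_hits is a hypothetical helper and assumes every hit carries the sort field. It says nothing about how ruosh sorts inside the engine or what its default direction is.

```python
def sort_hits(hits: list[dict], sort_by: str, sort_desc: bool) -> list[dict]:
    # sorted() is stable, also with reverse=True: hits with equal values
    # keep their incoming (e.g. relevance) order.
    return sorted(hits, key=lambda hit: hit[sort_by], reverse=sort_desc)

hits = [{"doc_id": "a", "rank": 3}, {"doc_id": "b", "rank": 1}, {"doc_id": "c", "rank": 2}]
print([h["doc_id"] for h in sort_hits(hits, "rank", sort_desc=False)])  # ['b', 'c', 'a']
print([h["doc_id"] for h in sort_hits(hits, "rank", sort_desc=True)])   # ['a', 'c', 'b']
```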

Corpus intelligence

ruosh maintains a lightweight sidecar that tracks approximate term frequencies across the corpus. The sidecar is updated at write time and loaded once, so repeated statistics lookups are cheap.

# how often does "search" appear across documents?
stats = idx.term_stats("body", "search")
# {"term": "search", "estimated_doc_freq": 1840, "very_common": True}

# what are the most common terms in this field?
terms = idx.frequent_terms("body", limit=20)
# [{"term": "search", "estimated_doc_freq": 1840}, ...]

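These are estimates, not exact counts. They are useful for query planning, stopword detection, and building tag clouds. SQLite FTS5 and Whoosh can report term statistics too (through FTS5's fts5vocab virtual table and Whoosh's IndexReader.most_frequent_terms()), but neither maintains a precomputed sidecar like this one.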
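The README does not say which data structure backs the sidecar. The sketch below shows one standard way to keep approximate document frequencies and a top-terms list in bounded memory at write time: the Space-Saving algorithm, written in plain Python. It is not necessarily what ruosh uses. SpaceSaving and add_document_terms are hypothetical names, tokenization is simplified, and how ruosh decides "very_common" is not documented here.

```python
import re

class SpaceSaving:
    """Approximate top-k counter (Space-Saving algorithm).

    Tracks at most `capacity` terms. A tracked term's estimate never undercounts
    and overcounts by at most the smallest tracked count.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: dict[str, int] = {}

    def add(self, term: str) -> None:
        if term in self.counts:
            self.counts[term] += 1
        elif len(self.counts) < self.capacity:
            self.counts[term] = 1
        else:
            # Evict the least-counted term; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[term] = self.counts.pop(victim) + 1

    def estimate(self, term: str) -> int:
        return self.counts.get(term, 0)

    def top(self, limit: int) -> list[tuple[str, int]]:
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]

def add_document_terms(sketch: SpaceSaving, text: str) -> None:
    # Count each distinct term once per document, so counts approximate document frequency.
    for term in dict.fromkeys(re.findall(r"\w+", text.lower())):
        sketch.add(term)

bodies = ["full-text search", "search engines", "fast search", "text"]
sketch = SpaceSaving(capacity=1000)
for body in bodies:
    add_document_terms(sketch, body)
print(sketch.estimate("search"))  # 3
print(sketch.top(2))              # [('search', 3), ('text', 2)]
```

With a capacity larger than the vocabulary the counts are exact. Once the vocabulary outgrows the capacity, memory stays fixed and the counts become upper-bound estimates, which is the trade-off that makes a write-time sidecar cheap.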

Updating and deleting documents

# replace a document by its unique field
w = idx.writer()
w.update_document(doc_id="1", title="Updated title", body="New body text.")
w.commit()

# delete by field value
w = idx.writer()
w.delete_by_term("doc_id", "1")
w.commit()
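Conceptually, update_document acts like deleting any existing document with the same unique field value and then adding the new one. Whoosh defines it that way, and the "replace a document by its unique field" comment above suggests ruosh follows the same model. This is also why doc_id is declared with unique=True: it is the key used to find the old version. The in-memory sketch below shows the effect on a plain list. delete_by_term and update_document here are hypothetical stand-ins, and the sketch has no commit step, whereas a real index applies changes on commit().

```python
def delete_by_term(docs: list[dict], field: str, value: object) -> int:
    """Remove every document whose `field` equals `value`; return how many were removed."""
    before = len(docs)
    docs[:] = [doc for doc in docs if doc.get(field) != value]
    return before - len(docs)

def update_document(docs: list[dict], unique_field: str, **fields: object) -> None:
    """Replace by key: drop any document sharing the unique field's value, then add the new one."""
    delete_by_term(docs, unique_field, fields[unique_field])
    docs.append(dict(fields))

docs = [
    {"doc_id": "1", "title": "Getting started"},
    {"doc_id": "2", "title": "Advanced queries"},
]
update_document(docs, "doc_id", doc_id="1", title="Updated title", body="New body text.")
print([d["title"] for d in docs])              # ['Advanced queries', 'Updated title']
print(delete_by_term(docs, "doc_id", "1"))     # 1
```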

Opening an existing index

from ruosh import open_dir

idx = open_dir("my-index")

Performance

ruosh trades raw single-query speed for richer features. Each search call crosses the Python-Rust boundary, which adds a few milliseconds of fixed overhead. For workloads that need very fast single-keyword lookups over small corpora, SQLite FTS5 will be faster. ruosh is a better fit when you need structured queries, pagination with correct totals, highlighted snippets, or corpus statistics.
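If you are unsure whether that fixed per-call overhead matters for your workload, measure it on your own queries. The helper below is a minimal, library-agnostic timing sketch using only the standard library. median_latency_ms is a hypothetical name, and this is not the project's benchmark script. It reports the median rather than the mean so occasional pauses (garbage collection, disk cache misses) do not skew the result.

```python
import statistics
import time
from typing import Callable

def median_latency_ms(run_query: Callable[[], object], repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of `run_query` in milliseconds."""
    for _ in range(warmup):  # let caches and lazy loading settle before measuring
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, against an index built as in the quick start, median_latency_ms(lambda: idx.search("introduction", limit=10)) gives a per-call figure to compare with the same query run through SQLite FTS5 or Whoosh.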

Compared to Whoosh, ruosh is faster across the board. On a 100,000-document corpus:

Scenario               ruosh (ms)   Whoosh (ms)
Keyword search         8.9          43
Boolean AND            13           201
Phrase query           48           371
Paginated results      30           193
Snippet extraction     10           41
Corpus intelligence    0.7          5.7
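Expressing the published timings as ratios makes them easier to compare across scenarios: speedup is Whoosh latency divided by ruosh latency. The short script below only does that arithmetic on the numbers in the table. It does not re-run any benchmark.

```python
# (ruosh ms, Whoosh ms) per scenario, copied from the benchmark table.
timings_ms = {
    "Keyword search": (8.9, 43),
    "Boolean AND": (13, 201),
    "Phrase query": (48, 371),
    "Paginated results": (30, 193),
    "Snippet extraction": (10, 41),
    "Corpus intelligence": (0.7, 5.7),
}

# Speedup = Whoosh latency / ruosh latency (higher means ruosh is faster).
speedups = {name: whoosh / ruosh for name, (ruosh, whoosh) in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name:<20} {ratio:5.1f}x")
```

By this measure the gap ranges from about 4x (snippet extraction, keyword search) to about 15x (Boolean AND).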

Development

Requirements: Python 3.11, Rust toolchain, uv.

git clone https://github.com/biswarupghosh/ruosh
cd ruosh
uv venv .venv --python 3.11
uv sync --extra dev
uv run maturin develop
uv run pytest

To run the full feature benchmark:

uv sync --extra dev --extra bench
uv run python scripts/benchmark_features.py

License

MIT
