ir

Local semantic search engine for markdown knowledge bases. Rust port of qmd with three key differences:

Per-collection SQLite — each collection is an independent file; no shared global index
Persistent daemon — models stay loaded between queries; first search auto-starts it
Dual LLM cache — expander outputs and reranker scores are persisted; repeated queries are instant

Search quality is benchmarked on 4 BEIR datasets; LLM reranking adds up to +14.5% nDCG@10 over hybrid (BM25 + vector) retrieval.

Features

Hybrid search — BM25 probe → score fusion (0.80·vec + 0.20·bm25) → LLM reranking
Query expansion — typed sub-queries (lex/vec/hyde) when expander model is present
Strong-signal shortcut — skips expansion when top BM25 score ≥ 0.75 with gap ≥ 0.10
Daemon mode — keeps models warm between queries; auto-starts on first search
Dual LLM cache — expander outputs cached globally; reranker scores cached per-collection
Per-collection SQLite — independent WAL journals, isolated backup, zero cross-collection contention
Content-addressed storage — identical files deduplicated by SHA-256 within a collection
FTS5 injection-safe — all user input escaped before FTS5 query construction
Metal GPU — all layers offloaded to Metal on macOS by default; IR_GPU_LAYERS=N to override
Auto-download — models fetched from HuggingFace Hub on first use; HF_HUB_OFFLINE=1 to disable
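The README does not say how FTS5 escaping is done. One common approach is to quote every user term as an FTS5 string, doubling any embedded double quote, so that `OR`, `NEAR`, `*` or column filters typed by a user are matched as literal text. The sketch below assumes that approach; `quote_fts5_term` and `escape_fts5_query` are hypothetical names, and ir's real escaping may differ.

```rust
/// Quote one user term as an FTS5 string literal.
/// Inside an FTS5 string, a double quote is escaped by doubling it.
fn quote_fts5_term(term: &str) -> String {
    format!("\"{}\"", term.replace('"', "\"\""))
}

/// Hypothetical helper (assumption, not ir's code): turn raw user input into
/// an FTS5 MATCH expression. Each whitespace-separated word becomes a quoted
/// string; FTS5 joins adjacent strings with an implicit AND.
fn escape_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(quote_fts5_term)
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this scheme, `rust OR "unsafe` becomes `"rust" "OR" """unsafe"`: the word OR is searched for literally instead of acting as an operator, and the stray quote can no longer break the query.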

Installation

Homebrew (macOS):

brew install vlwkaos/tap/ir

From source:

cargo install --path .

Requires Rust 1.80+. On macOS, links llama.cpp with Metal automatically.

Quick start

ir collection add notes ~/notes   # register a collection
ir update notes                   # scan files → extract text → populate FTS5 index (BM25)
ir embed notes                    # chunk text → run embedding model → store vectors (enables vector + hybrid search)
ir search "memory safety in rust" # search (daemon auto-starts)

ir update is fast (no models, pure text processing). ir embed is slow on first run (model inference per chunk) but only re-embeds changed content on subsequent runs. BM25 search works after update alone; vector and hybrid search require embed.

Models

Models are downloaded automatically from HuggingFace Hub on first use and cached in ~/.cache/huggingface/. No manual setup required.

Model	HF Repo	Required for
EmbeddingGemma 300M	`ggml-org/embeddinggemma-300M-GGUF`	`ir embed`, vector search, hybrid
Qwen3.5-0.8B	`unsloth/Qwen3.5-0.8B-GGUF`	unified expand + rerank (optional)
Qwen3.5-2B	`unsloth/Qwen3.5-2B-GGUF`	unified expand + rerank (optional)
Qwen3-Reranker 0.6B	`ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF`	reranking only (optional)
qmd-query-expansion 1.7B	`tobil/qmd-query-expansion-1.7B`	expansion only (optional)
BGE-M3 568M	`ggml-org/bge-m3-Q8_0-GGUF`	Korean embedding alternative (optional)

BM25 search works without any models. When IR_COMBINED_MODEL is set (or a Qwen3.5 GGUF is found in ~/local-models/), it replaces both the expander and reranker.

Local models:

export IR_MODEL_DIRS="$HOME/my-models"
export IR_COMBINED_MODEL="$HOME/local-models/Qwen3.5-2B-Q4_K_M.gguf"   # unified
export IR_EMBEDDING_MODEL="$HOME/my-models/embeddinggemma-300M-Q8_0.gguf"
export IR_RERANKER_MODEL="$HOME/my-models/qwen3-reranker-0.6b-q8_0.gguf"
export IR_EXPANDER_MODEL="$HOME/my-models/qmd-query-expansion-1.7B-q4_k_m.gguf"

Search order: env → IR_MODEL_DIRS → ~/local-models/ → ~/.cache/ir/models/ → ~/.cache/qmd/models/ → HF Hub auto-download.

IR_*_MODEL env vars accept a path to a .gguf file, a directory containing a known model file, or a HuggingFace repo ID (owner/name). Unrecognized values error immediately instead of silently loading the default.

Known HF repo IDs: ggml-org/embeddinggemma-300M-GGUF, ggml-org/bge-m3-Q8_0-GGUF, ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF, tobil/qmd-query-expansion-1.7B, unsloth/Qwen3.5-0.8B-GGUF, unsloth/Qwen3.5-2B-GGUF.

Compatibility aliases: QMD_EMBEDDING_MODEL, QMD_RERANKER_MODEL, QMD_EXPANDER_MODEL, QMD_MODEL_DIRS.
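The three accepted value forms, plus the fail-fast rule, can be pictured as a small classifier. This is a sketch under assumptions: the check order (file, then directory, then repo ID), the names `ModelSource` and `classify_model_value`, and the injected `is_dir` predicate are illustrative, and the directory case does not verify that a known model file is inside. The repo list is the one documented above.

```rust
use std::path::{Path, PathBuf};

/// HF repo IDs documented as known to ir.
const KNOWN_REPOS: &[&str] = &[
    "ggml-org/embeddinggemma-300M-GGUF",
    "ggml-org/bge-m3-Q8_0-GGUF",
    "ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF",
    "tobil/qmd-query-expansion-1.7B",
    "unsloth/Qwen3.5-0.8B-GGUF",
    "unsloth/Qwen3.5-2B-GGUF",
];

#[derive(Debug, PartialEq)]
enum ModelSource {
    File(PathBuf),  // explicit .gguf file
    Dir(PathBuf),   // directory expected to hold a known model file
    HfRepo(String), // owner/name on HuggingFace Hub
}

/// Hypothetical classifier (assumed check order: file, dir, repo).
/// `is_dir` is injected for testability; pass `|p| p.is_dir()` in real use.
fn classify_model_value(value: &str, is_dir: impl Fn(&Path) -> bool) -> Result<ModelSource, String> {
    let path = Path::new(value);
    if value.ends_with(".gguf") {
        return Ok(ModelSource::File(path.to_path_buf()));
    }
    if is_dir(path) {
        return Ok(ModelSource::Dir(path.to_path_buf()));
    }
    if KNOWN_REPOS.contains(&value) {
        return Ok(ModelSource::HfRepo(value.to_string()));
    }
    // Fail loudly instead of silently falling back to a default model.
    Err(format!("unrecognized model value: {value}"))
}
```

The design point the README describes is the final `Err`: a typo in a repo ID surfaces immediately rather than quietly loading a different model than the one requested.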

Config directory:

export IR_CONFIG_DIR="~/vault/.config/ir"   # portable across machines

IR_CONFIG_DIR sets the directory for config, collection DBs, and daemon files. Supports ~ and $VAR expansion, so the value is safe to use in MCP configs synced across machines. Precedence: IR_CONFIG_DIR → XDG_CONFIG_HOME/ir (deprecated) → ~/.config/ir.
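The precedence and expansion rules above can be sketched as follows. Assumptions, not taken from ir's source: only a leading `~` and bare `$VAR` names (letters, digits, `_`) are expanded, unset variables expand to an empty string, and environment lookup is injected as a closure so the logic can be tested without touching the real environment. `expand` and `resolve_config_dir` are hypothetical names.

```rust
use std::path::PathBuf;

/// Expand a leading `~` and `$VAR` references (assumed rules, see above).
fn expand(value: &str, home: &str, lookup: &impl Fn(&str) -> Option<String>) -> String {
    let with_home = if value == "~" {
        home.to_string()
    } else if let Some(rest) = value.strip_prefix("~/") {
        format!("{home}/{rest}")
    } else {
        value.to_string()
    };
    let mut out = String::new();
    let mut chars = with_home.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n.is_ascii_alphanumeric() || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break;
            }
        }
        if name.is_empty() {
            out.push('$'); // a lone `$` is kept literally
        } else {
            // Assumption: unset variables expand to an empty string.
            out.push_str(&lookup(&name).unwrap_or_default());
        }
    }
    out
}

/// Documented precedence: IR_CONFIG_DIR, then $XDG_CONFIG_HOME/ir (deprecated), then ~/.config/ir.
/// In real use, pass `|k| std::env::var(k).ok()` as `lookup`.
fn resolve_config_dir(home: &str, lookup: impl Fn(&str) -> Option<String>) -> PathBuf {
    if let Some(dir) = lookup("IR_CONFIG_DIR") {
        return PathBuf::from(expand(&dir, home, &lookup));
    }
    if let Some(xdg) = lookup("XDG_CONFIG_HOME") {
        return PathBuf::from(xdg).join("ir");
    }
    PathBuf::from(home).join(".config").join("ir")
}
```

Because expansion happens inside ir rather than in the shell, a literal `"~/vault/.config/ir"` in a synced MCP config resolves to each machine's own home directory.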

GPU:

IR_GPU_LAYERS=0 ir search "query"   # force CPU
IR_GPU_LAYERS=32 ir search "query"  # partial offload

Usage

Collections:

ir collection add notes ~/notes
ir collection add code  ~/code
ir collection ls
ir collection rm notes
ir status                    # index health per collection

Index and embed:

ir update                    # index all collections
ir update notes              # one collection
ir update notes --force      # full re-index from scratch

ir embed                     # embed all unembedded documents
ir embed notes --force       # re-embed everything

Search:

ir search "memory safety in rust"
ir search "sqlite architecture" --mode bm25
ir search "async patterns"     --mode vector
ir search "error handling"     --mode hybrid -c notes --min-score 0.4

# Output formats
ir search "ownership" --json
ir search "ownership" --md
ir search "ownership" --files       # paths only
ir search "ownership" --full        # include full document content in results
ir search "ownership" --chunk       # include best-matching chunk text (vector results)
ir search "ownership" --quiet       # suppress stderr (progress, logs) — for scripting

# Filter by field (-f/--filter, repeatable; all clauses ANDed)
ir search "design" -f "modified_at>=2026-01-01"
ir search "design" -f "meta.tags=rust"
ir search "design" -f "path~notes/"
ir search "design" -f "modified_at>=2025-01-01" -f "meta.author=vlwkaos"

Retrieve documents:

ir get "2026/Daily/04/2026-04-07.md"           # collection-relative path
ir get "Notes/2026/Daily/04/2026-04-07.md"     # vault-root path (strips collection dir prefix)
ir get "2026-04-07" -c periodic                # substring match, scoped to collection
ir get "some/path.md" --json                   # full metadata as JSON
ir get "some/path.md" --section "Installation" # extract named heading section only
ir get "some/path.md" --max-chars 3000         # first 3000 chars
ir get "some/path.md" --offset 1000 --max-chars 2000  # chars 1000–3000

ir multi-get "file1.md" "file2.md" "file3.md"  # batch fetch
ir multi-get "file1.md" "file2.md" --json       # {found: [...], not_found: [...]}
ir multi-get "file1.md" "file2.md" --files      # paths only (found ones)
ir multi-get "file1.md" "file2.md" --max-chars 2000  # truncate each doc

Path matching order: exact → suffix (%/path) → substring. Vault-root paths (where the first component matches the collection's directory name) are resolved before the normal match.
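A sketch of that matching order over a list of collection-relative paths. Assumptions: once the first query component equals the collection directory's name it is always stripped, and each tier returns the first hit in list order. `resolve_path` is a hypothetical name; ir performs this lookup against SQLite, not an in-memory slice.

```rust
/// Hypothetical resolver: vault-root prefix strip, then exact, then suffix
/// (`%/path`), then substring. `collection_dir` is the collection directory's
/// name, e.g. "Notes" for ~/vault/Notes.
fn resolve_path<'a>(query: &str, collection_dir: &str, paths: &[&'a str]) -> Option<&'a str> {
    // Vault-root form: drop the first component when it names the collection dir.
    // Assumption: the stripped form is always used once the prefix matches.
    let query = match query.split_once('/') {
        Some((first, rest)) if first == collection_dir => rest,
        _ => query,
    };
    let suffix = format!("/{query}");
    paths
        .iter()
        .copied()
        .find(|p| *p == query) // 1. exact
        .or_else(|| paths.iter().copied().find(|p| p.ends_with(&suffix))) // 2. suffix
        .or_else(|| paths.iter().copied().find(|p| p.contains(query))) // 3. substring
}
```

Checking the tiers in this order means a precise path always wins over a looser substring hit such as a date fragment.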

Filter syntax (-f/--filter):

Each clause is a string FIELD OP VALUE. Multiple -f flags are ANDed together.

Field	Description
`path`	Document path (relative to collection root)
`modified_at`	File modification time (UTC RFC3339)
`created_at`	File creation time (UTC RFC3339)
`meta.<name>`	Frontmatter field (e.g. `meta.tags`, `meta.author`)

Op	Meaning
`=` / `!=`	Equal / not equal (case-sensitive)
`>` / `>=` / `<` / `<=`	Lexicographic compare (dates normalize to UTC RFC3339)
`~` / `!~`	Contains / not-contains (case-insensitive)

Date values for modified_at, created_at, and meta.date are normalized to UTC RFC3339 (YYYY-MM-DD becomes YYYY-MM-DDT00:00:00Z). Multi-valued frontmatter fields (e.g. tag arrays) match if any element satisfies the clause — including !=. A doc tagged ["rust", "go"] passes meta.tags!=rust because "go" satisfies the condition. Documents with no metadata rows always fail meta.* clauses.
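The clause rules above fit in a short sketch: split at the first operator character, prefer two-character operators, normalize bare dates on date fields, and let a multi-valued field pass when any value satisfies the clause. The names `parse_clause`, `normalize_date` and `eval` are hypothetical, and ir evaluates filters in SQL rather than in Rust like this.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op { Eq, Ne, Gt, Ge, Lt, Le, Contains, NotContains }

#[derive(Debug, PartialEq)]
struct Clause { field: String, op: Op, value: String }

/// Parse `FIELD OP VALUE`; the operator starts at the first operator character.
fn parse_clause(s: &str) -> Option<Clause> {
    let start = s.find(|c: char| "=!<>~".contains(c))?;
    let (field, rest) = s.split_at(start);
    // Two-character operators are tried before their one-character prefixes.
    let ops = [(">=", Op::Ge), ("<=", Op::Le), ("!=", Op::Ne), ("!~", Op::NotContains),
               ("=", Op::Eq), (">", Op::Gt), ("<", Op::Lt), ("~", Op::Contains)];
    let (tok, op) = ops.iter().find(|(t, _)| rest.starts_with(*t))?;
    if field.is_empty() { return None; }
    Some(Clause { field: field.to_string(), op: *op, value: rest[tok.len()..].to_string() })
}

/// `YYYY-MM-DD` becomes `YYYY-MM-DDT00:00:00Z`; other values are unchanged.
fn normalize_date(v: &str) -> String {
    let b = v.as_bytes();
    let is_day = b.len() == 10 && b[4] == b'-' && b[7] == b'-'
        && b.iter().enumerate().all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit());
    if is_day { format!("{v}T00:00:00Z") } else { v.to_string() }
}

fn matches_one(op: Op, actual: &str, want: &str) -> bool {
    match op {
        Op::Eq => actual == want,                 // case-sensitive
        Op::Ne => actual != want,
        Op::Gt => actual > want,                  // lexicographic
        Op::Ge => actual >= want,
        Op::Lt => actual < want,
        Op::Le => actual <= want,
        Op::Contains => actual.to_lowercase().contains(&want.to_lowercase()),
        Op::NotContains => !actual.to_lowercase().contains(&want.to_lowercase()),
    }
}

/// A clause passes if ANY stored value satisfies it (including `!=`);
/// with no stored values it fails.
fn eval(clause: &Clause, values: &[&str]) -> bool {
    let is_date = matches!(clause.field.as_str(), "modified_at" | "created_at" | "meta.date");
    let want = if is_date { normalize_date(&clause.value) } else { clause.value.clone() };
    values.iter().any(|v| {
        let actual = if is_date { normalize_date(v) } else { v.to_string() };
        matches_one(clause.op, &actual, &want)
    })
}
```

Date normalization is what makes plain string comparison correct here: once both sides are UTC RFC3339, lexicographic order equals chronological order.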

Note: Collection DBs are upgraded to schema version 2 on first use after this release. The one-time backfill (populating document_metadata from existing frontmatter) is fast (<1s for <10k docs).

Daemon:

ir daemon start              # start (auto-started on first search)
ir daemon stop
ir daemon status

The daemon keeps models warm in memory. Subsequent queries over the Unix socket skip model loading entirely (~30ms round-trip vs 3s cold).

Incremental Indexing

ir re-processes only new and changed files: every file is hashed with SHA-256, and content is stored by its hash (content-addressed).

How it works:

Change detection: Files are hashed (SHA-256) and compared against stored hashes
Smart updates: Only modified or new files are re-processed
Deletion handling: Removed files are marked as inactive (soft delete)
Deduplication: Identical content within a collection shares storage
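The change-detection step can be sketched as a comparison of two path-to-hash maps: the active documents already in the index and the files just scanned. The hashes are assumed to be hex SHA-256 strings computed elsewhere; `diff` and `UpdateSummary` are hypothetical names, and the summary text mirrors the `ir update` output shown in this section.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, Default, PartialEq)]
struct UpdateSummary { added: usize, updated: usize, deactivated: usize }

/// `stored`: path -> hash for active documents in the index.
/// `scanned`: path -> hash for files currently on disk.
fn diff(stored: &HashMap<String, String>, scanned: &HashMap<String, String>) -> UpdateSummary {
    let mut s = UpdateSummary::default();
    for (path, hash) in scanned {
        match stored.get(path) {
            None => s.added += 1,                        // new file
            Some(old) if old != hash => s.updated += 1,  // content changed
            Some(_) => {}                                // unchanged: skipped
        }
    }
    // In the index but gone from disk: soft delete (mark inactive).
    s.deactivated = stored.keys().filter(|p| !scanned.contains_key(*p)).count();
    s
}

impl fmt::Display for UpdateSummary {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} added, {} updated, {} deactivated", self.added, self.updated, self.deactivated)
    }
}
```

Unchanged files cost only a hash comparison, which is why repeated `ir update` runs stay cheap.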

Index operations:

# Regular incremental update (default)
ir update                    # all collections
ir update notes              # specific collection

# Force full re-index from scratch
ir update notes --force      # rebuilds entire index

# Check what changed (see the summary)
ir update notes
# Output: "2 added, 1 updated, 0 deactivated"

Embedding operations:

# Incremental embedding (only new/changed documents)
ir embed                     # embeds unembedded content
ir embed notes               # specific collection

# Force re-embedding everything
ir embed notes --force       # re-computes all vectors
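A sketch of how the work list for `ir embed` could be chosen, assuming (not confirmed by this README) that it is the set of active content hashes that have no vectors yet, with `--force` selecting everything. Because storage is content-addressed, identical files share one hash and are embedded once. `hashes_to_embed` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hashes of active documents that still need embedding.
/// `embedded` holds content hashes that already have chunk vectors.
/// Assumption: selection is a plain set difference; `force` re-embeds all.
fn hashes_to_embed<'a>(active_hashes: &[&'a str], embedded: &HashSet<&str>, force: bool) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    active_hashes
        .iter()
        .copied()
        .filter(|h| seen.insert(*h)) // identical content is embedded once
        .filter(|h| force || !embedded.contains(*h))
        .collect()
}
```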

Performance characteristics:

Initial indexing: fast (no models, pure text extraction)
Incremental updates: only processes changed files
Hash comparison: instant even for thousands of files
Embedding: slow first time, fast incremental updates

Example workflow:

# Monday: initial setup
ir collection add notes ~/notes
ir update notes              # indexes 500 files
ir embed notes               # computes 500 embeddings (slow)

# Tuesday: added 3 files, modified 2
ir update notes              # Output: "3 added, 2 updated, 0 deactivated"
ir embed notes               # only embeds 5 documents (fast)

# Wednesday: deleted 1 file
ir update notes              # Output: "0 added, 0 updated, 1 deactivated"
# No embedding needed for deletions

The incremental approach means you can run ir update frequently without performance penalty — only changed content is processed.

MCP server — Claude Desktop / Claude Code

ir mcp runs a Model Context Protocol server so Claude can search your indexed documents directly.

Claude Desktop (claude_desktop_config.json; on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "ir": {
      "command": "ir",
      "args": ["mcp"]
    }
  }
}

Claude Code (.mcp.json in project root or ~/.claude/mcp.json):

{
  "mcpServers": {
    "ir": {
      "command": "ir",
      "args": ["mcp"]
    }
  }
}

Five tools are exposed:

Tool	Description
`search`	Hybrid BM25+vector search. Returns path, title, score, snippet. Params: `mode`, `limit`, `min_score`, `collections`, `full` (include full doc text), `include_chunk` (include best-matching chunk text), `filter` (array of `{field, op, value}` objects, ANDed).
`get`	Retrieve document text by path (exact → suffix → substring match). Params: `collections`, `section` (heading text, case-insensitive), `offset` (char offset), `max_chars` (truncate).
`multi_get`	Batch document retrieval. Params: `paths[]`, `collections`, `max_chars` (truncate each doc). Returns `found` and `not_found`.
`status`	Index health — collection names, doc counts, DB sizes, daemon status.
`update`	Re-index collections after file changes. Accepts `collection` and `force` params.

The filter array accepts structured clauses: {"field": "modified_at", "op": ">=", "value": "2024-01-01"}. Fields: path, modified_at, created_at, meta.<name>. Ops: =, !=, >, >=, <, <=, ~ (contains), !~ (not-contains).

HTTP mode (for remote access or multi-client setups):

ir mcp --http 3620    # serve on all interfaces, port 3620

Configure clients to point at http://<host>:3620/mcp. The daemon starts automatically on first search tool call.

Security note: HTTP mode is unauthenticated and binds to all interfaces. Only expose it on trusted networks. The update tool can trigger re-indexing, so treat it like any other local write-access service.

Preprocessors — Korean / Japanese / Chinese

Preprocessors tokenize text before BM25 indexing. Without one, agglutinated words ("이스탄불의", "東京都") are treated as single FTS tokens and never match morpheme-level queries. The same preprocessor runs at index time and query time.

Korean (lindera, Mode::Decompose):

ir preprocessor install ko          # downloads official lindera CLI + ko-dic, registers as "ko"
                                    # shows collection picker to bind immediately
ir collection add wiki ~/wiki       # add collection (if not yet added)
ir preprocessor bind ko wiki        # wire "ko" to collection and re-index
ir search "서울 지하철" -c wiki

ir preprocessor install ko downloads the official lindera CLI binary and ko-dic dictionary from lindera's GitHub releases. Supported platforms: macOS (arm64, x86_64) and Linux (x86_64, aarch64). No system deps, no Rust toolchain required. The install step shows an interactive picker so you can bind to collections right away.

Other languages:

ir preprocessor install ja    # Japanese (Lindera + ipadic)
ir preprocessor install zh    # Chinese (Lindera + jieba)

Manage:

ir preprocessor list          # shows registered + available bundled preprocessors
ir preprocessor remove ko     # unregister (keeps binary)
ir preprocessor remove ko -d  # unregister and delete binary

The protocol is stdin/stdout line-by-line: one UTF-8 line in, zero or one tokenized line out (zero if all tokens are filtered), process stays alive between lines. The subprocess must pass ASCII-only single-word lines through unchanged — ir uses an internal sentinel token to detect when a line produces no output. Any executable following this protocol can be registered.
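Because the protocol is plain line-delimited stdin/stdout, a compliant preprocessor can be very small. The sketch below is a toy, not lindera: it splits on non-alphanumeric characters as a stand-in for real morphological analysis, and `tokenize_line` and `run` are hypothetical names. It echoes ASCII-only single-word lines unchanged (which is what ir's sentinel relies on), prints nothing when every token is filtered out, and flushes after each line so the long-lived process never leaves ir waiting on buffered output.

```rust
use std::io::{self, BufRead, Write};

/// Toy tokenizer: one input line -> Some(space-joined tokens) or None.
fn tokenize_line(line: &str) -> Option<String> {
    // Protocol requirement: ASCII-only single-word lines pass through unchanged.
    if line.is_ascii() && !line.is_empty() && !line.contains(char::is_whitespace) {
        return Some(line.to_string());
    }
    // Stand-in for morphological analysis (assumption): split on non-alphanumerics.
    let tokens: Vec<&str> = line
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .collect();
    if tokens.is_empty() { None } else { Some(tokens.join(" ")) }
}

/// Serve the protocol until EOF. A real binary would call
/// `run(io::stdin().lock(), io::stdout().lock())` from `main`.
fn run<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        if let Some(out) = tokenize_line(&line?) {
            writeln!(output, "{out}")?;
        }
        output.flush()?; // never leave ir blocked on buffered output
    }
    Ok(())
}
```

Any executable with this loop shape can then be registered; only `tokenize_line` needs to change for a real language.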

Lindera throughput: ~5,600 Korean docs/s · 1.8 MB/s on M-series Mac. Near-zero cold start (Rust binary, embedded dictionary).

Korean BM25 benchmark (MIRACL-Korean, 213 queries):

preprocessor	nDCG@10	note
none	0.0009	agglutinated tokens never match
lindera	0.0460	50× gain from morphological tokenization
lindera hybrid+rerank	0.8411	near-ceiling on 2,835 passages

Compound decompounding benchmark (50 queries targeting compound sub-components):

preprocessor	nDCG@10	note
none	0.0000	sub-parts absent from FTS index
lindera	0.6326	Mode::Decompose splits compounds

See research/experiment.md for full results and rationale.

Korean embedding models: For Korean-optimized dense retrieval, BGE-M3 can replace the default embedding model via IR_EMBEDDING_MODEL. Filename auto-detection handles pooling and formatting. See README.ko.md for setup. Switching models requires ir embed --force (vector dimensions auto-adapt).

Search Pipeline

Query → BM25 probe → score fusion (0.80·vec + 0.20·bm25) → reranking

Strong-signal shortcut (BM25 score ≥ 0.75, gap ≥ 0.10) skips all LLM work. With expander: expand → lex/vec/hyde sub-queries → RRF → rerank top-20. All LLM outputs cached in SQLite — repeated queries skip inference entirely.
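The scoring rules above can be stated compactly in code. This sketch rests on assumptions the README does not confirm: BM25 and vector scores are already normalized to [0, 1]; a document missing from one list contributes 0 to the fused score; the "gap" is the difference between the top two BM25 scores; and RRF uses the conventional constant k = 60. `strong_signal`, `fuse` and `rrf` are hypothetical names.

```rust
use std::collections::HashMap;

/// Strong-signal shortcut: skip all LLM work when the top BM25 hit is high
/// (>= 0.75) and clearly ahead of the runner-up (gap >= 0.10).
/// `bm25` must be sorted descending. Assumption: a lone hit needs only the threshold.
fn strong_signal(bm25: &[f64]) -> bool {
    match bm25 {
        [] => false,
        [top] => *top >= 0.75,
        [top, second, ..] => *top >= 0.75 && top - second >= 0.10,
    }
}

/// Score fusion without an expander: 0.80 * vec + 0.20 * bm25, best first.
fn fuse<'a>(vector: &HashMap<&'a str, f64>, bm25: &HashMap<&'a str, f64>) -> Vec<(&'a str, f64)> {
    let mut ids: Vec<&'a str> = vector.keys().chain(bm25.keys()).copied().collect();
    ids.sort();
    ids.dedup();
    let mut fused: Vec<(&'a str, f64)> = ids
        .into_iter()
        .map(|id| {
            let v = vector.get(id).copied().unwrap_or(0.0); // assumption: missing = 0
            let b = bm25.get(id).copied().unwrap_or(0.0);
            (id, 0.80 * v + 0.20 * b)
        })
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}

/// Reciprocal rank fusion over the ranked lists of each sub-query:
/// score(d) = sum over lists of 1 / (k + rank), ranks starting at 1.
fn rrf<'a>(lists: &[Vec<&'a str>], k: f64) -> Vec<(&'a str, f64)> {
    let mut scores: HashMap<&'a str, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut ranked: Vec<(&'a str, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(b.0)));
    ranked // the caller would keep the top 20 for reranking
}
```

RRF uses only ranks, so it can merge lex, vec and hyde result lists whose raw scores live on different scales; the weighted fusion needs scores that are already comparable.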

See research/pipeline.md for staged async daemon design.

Benchmark — BEIR (4 datasets, nDCG@10)

Models: EmbeddingGemma 300M (embeddings), qmd-query-expansion 1.7B (expander), Qwen3-Reranker 0.6B (reranker).

Dataset	BM25	Vector	Hybrid	+Reranker	LLM gain (vs Hybrid)
NFCorpus (323q)	0.2046	0.3898	0.3954	0.4001	+1.2%
SciFact (300q)	0.0500	0.7847	0.7873	0.7797	−1.0%
FiQA (648q)	0.0298	0.4324	0.4266	0.4567	+7.1%
ArguAna (1406q)	0.0012	0.4264	0.4263	0.4879	+14.5%

BM25 fusion provides no statistically significant lift over pure vector (paired t-test). Reranker gains are largest on conversational/argument retrieval.
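The LLM gain column is the relative change from the Hybrid column to the +Reranker column (not from Vector), which can be checked against the rows of the table:

```latex
\text{LLM gain} = \frac{\mathrm{nDCG@10}_{\text{+Reranker}} - \mathrm{nDCG@10}_{\text{Hybrid}}}{\mathrm{nDCG@10}_{\text{Hybrid}}}
\qquad
\text{FiQA: } \frac{0.4567 - 0.4266}{0.4266} \approx +7.1\%,
\qquad
\text{SciFact: } \frac{0.7797 - 0.7873}{0.7873} \approx -1.0\%
```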

See research/experiment.md for reproduction steps.

vs qmd

ir is a Rust port of qmd with a different storage model and a persistent daemon.

	qmd	ir
Storage	Single SQLite for all collections	Per-collection SQLite — `rm name.sqlite` to delete
Concurrent writes	Shared WAL journal	Independent WAL per collection
sqlite-vec	Dynamically loaded `.so`	Statically compiled in
Process model	Spawns per query	Daemon keeps models warm
LLM cache	Reranker scores (per-collection)	Reranker scores + expander outputs (global)
Quality (NFCorpus nDCG@10)	No published numbers	0.4001

Performance (macOS M4 Max, same models and query):

	ir	qmd	Ratio
Cold (no cache)	3.0s	9.5s	3×
Warm (daemon + caches hot)	30ms	840ms	28×

Cold difference: ir caps reranking at 20 candidates vs qmd's 40. Warm difference: qmd pays ~800ms process spawn + JS runtime per invocation; ir's daemon round-trip is 30ms (embed + kNN only).

Development

cargo build                  # debug build
cargo build --release        # release build
cargo test                   # unit tests (no models required)
cargo test -- --ignored      # model-dependent tests (requires models)
cargo run --bin eval -- --data test-data/nfcorpus --mode all

Schema

Each collection database (~/.config/ir/collections/<name>.sqlite):

content          — hash → full text (content-addressed)
documents        — path, title, hash, active flag
documents_fts    — FTS5 virtual table (porter tokenizer)
vectors_vec      — sqlite-vec kNN (768d cosine, EmbeddingGemma format)
content_vectors  — chunk metadata (hash, seq, pos, model)
llm_cache        — reranker score cache (sha256(model+query+doc) → score)
meta             — collection metadata (name, schema version)
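vectors_vec hands nearest-neighbour search to sqlite-vec. As a mental model only, an exact (brute-force) cosine kNN over stored chunk vectors looks like the sketch below; sqlite-vec's own query path and distance convention are not reproduced, and `cosine` and `knn_cosine` are hypothetical names. Real EmbeddingGemma vectors would be 768-dimensional.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k by cosine similarity; returns (row index, similarity), best first.
fn knn_cosine(query: &[f32], rows: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, r)| (i, cosine(query, r)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```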

Global cache (~/.config/ir/expander_cache.sqlite):

expander_cache   — sha256(model+query) → JSON Vec<SubQuery>

Triggers keep documents_fts in sync with documents on insert/update/delete.
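Both caches follow the same pattern: hash the inputs that determine the model's output, look the key up, and run inference only on a miss. The sketch keeps the reranker cache in a `HashMap` and uses std's `DefaultHasher` purely as a stand-in; per the schema above, ir keys rows by SHA-256 and stores them in SQLite. `ScoreCache` and `get_or_score` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for the llm_cache table.
#[derive(Default)]
struct ScoreCache {
    scores: HashMap<u64, f32>,
}

impl ScoreCache {
    /// Stand-in key over (model, query, doc); ir uses SHA-256 instead.
    fn key(model: &str, query: &str, doc: &str) -> u64 {
        let mut h = DefaultHasher::new();
        (model, query, doc).hash(&mut h);
        h.finish()
    }

    /// Return the cached score, or run `score` (the reranker call) once and store it.
    fn get_or_score(&mut self, model: &str, query: &str, doc: &str, score: impl FnOnce() -> f32) -> f32 {
        *self.scores.entry(Self::key(model, query, doc)).or_insert_with(score)
    }
}
```

Including the model in the key means switching rerankers never serves stale scores from a different model.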
