#bloom-filter #probabilistic #probabilistic-data-structures

quickbloom

An industry-grade, highly scalable Bloom filter with lock-free concurrency, blocked layout, and automatic persistence

2 unstable releases

0.2.0 Apr 8, 2026
0.1.0 Apr 4, 2026

#693 in Data structures

MIT license

46KB
538 lines

quickbloom

Crates.io License GitHub Workflow Status

quickbloom is an industry-grade Bloom filter library for Rust featuring:

  • 🚀 ahash — fastest non-cryptographic hasher for short keys
  • 🧠 Enhanced double hashingh(i) = h1 + i·h2 + reduces bit clustering at high fill ratios
  • 🗂️ Blocked (cache-friendly) layout — all k hashes for one item stay inside a single 64-byte CPU cache line
  • ⚛️ Lock-free AtomicBloomFilter — insert and query from any number of threads simultaneously with zero locking
  • 📈 ScalableBloomFilter — grows automatically when saturated; false-positive rate never deteriorates
  • 💾 Automatic persistence — attach a file path and the filter saves itself on Drop

Installation

[dependencies]
quickbloom = "0.2.0"

Quick Start

use quickbloom::{BloomConfig, BloomFilter, BloomMode};

// Mathematically optimal size for 1M items at 1% false-positive rate
let config = BloomConfig::new(1_000_000, 0.01);
let (size, hashes) = config.parameters();

let mut filter = BloomFilter::with_mode(size, hashes, BloomMode::Blocked);

filter.insert(&"alice");
assert!(filter.contains(&"alice")); // true  – definitely present
assert!(!filter.contains(&"bob")); // false – definitely absent

Types at a Glance

Type Best for Concurrency Grows? Persistence
BloomFilter Single-threaded, general use No No
ScalableBloomFilter Unbounded / unknown data sets No (wrap in ConcurrentBloomFilter)
ConcurrentBloomFilter Read-heavy multi-thread workloads RwLock Depends on F Depends on F
AtomicBloomFilter Write-heavy lock-free hot paths (BETA) Atomic No Manual

BloomFilter

The standard single-threaded Bloom filter. Supports two memory layouts:

Standard layout (default)

Hash positions are spread uniformly across the entire bit array. Simple and accurate.

use quickbloom::BloomFilter;

let mut f = BloomFilter::new(100_000, 7);
f.insert(&"hello");
assert!(f.contains(&"hello"));

Blocked layout (cache-friendly)

All k hashes for one item are confined to a single 64-byte block (CPU cache line). On large filters this often doubles throughput with a negligible increase in false-positive rate.

use quickbloom::{BloomFilter, BloomMode};

let mut f = BloomFilter::with_mode(1_000_000, 7, BloomMode::Blocked);
f.insert(&"cache_friendly");

Persistence

Attach a path to enable automatic saving when the filter is dropped:

use quickbloom::BloomFilter;

{
    let mut f = BloomFilter::new(100_000, 7).with_persistence("my_filter.bin");
    f.insert(&"persist_me");
} // ← auto-saved here

// Load it back next time:
let f = BloomFilter::load_or_new("my_filter.bin", 100_000, 7);
assert!(f.contains(&"persist_me"));

You can also call .save() manually to flush at any point:

f.save().expect("I/O error");

Sizing with BloomConfig

Instead of guessing raw bit counts, use BloomConfig to compute optimal parameters:

use quickbloom::BloomConfig;

let config = BloomConfig::new(500_000, 0.005); // 500K items, 0.5% FP rate
let (bits, hashes) = config.parameters();
println!("bits={bits}, hashes={hashes}");

ScalableBloomFilter

Automatically provisions new layers when the current one exceeds 50% fill, preserving the configured false-positive rate as items grow without bound.

use quickbloom::{BloomConfig, ScalableBloomFilter};

let config = BloomConfig::new(1_000, 0.01);
let mut f = ScalableBloomFilter::new(config);

for i in 0..5_000u64 {
    f.insert(&i);
}

println!("layers: {}", f.layers()); // > 1 once the first layer fills
assert!(f.contains(&0u64));
assert!(f.contains(&4_999u64));

Custom growth parameters

use quickbloom::{BloomConfig, ScalableBloomFilter};

let config = BloomConfig::new(1_000, 0.01);
let f = ScalableBloomFilter::with_parameters(
    config,
    0.85, // tightening ratio – each new layer's FP target is multiplied by this
    4,    // growth factor   – each new layer is 4× larger than the previous
);

Persistence

use quickbloom::{BloomConfig, ScalableBloomFilter};

let config = BloomConfig::new(1_000, 0.01);

{
    let mut f = ScalableBloomFilter::new(config.clone())
        .with_persistence("scalable.bin");
    for i in 0..2_000u64 { f.insert(&i); }
} // ← auto-saved

let f = ScalableBloomFilter::load_or_new("scalable.bin", config);
assert!(f.contains(&0u64));

ConcurrentBloomFilter

A generic Arc<RwLock<F>> wrapper. Use it when you need concurrent access to a filter that also supports persistence or dynamic scaling.

use quickbloom::{BloomFilter, ConcurrentBloomFilter};
use std::sync::Arc;

let f = ConcurrentBloomFilter::new(BloomFilter::new(100_000, 7));

// Share across threads
let f2 = f.clone();
std::thread::spawn(move || {
    f2.write(|inner| inner.insert(&"from_thread"));
}).join().unwrap();

assert!(f.read(|inner| inner.contains(&"from_thread")));

Tip: For write-heavy workloads at high concurrency, prefer AtomicBloomFilter which has zero lock overhead.


AtomicBloomFilter (BETA)

[!WARNING] AtomicBloomFilter is currently in Beta. While functionally complete and high-performance, the API and internal representation may evolve. Use with caution in critical production paths.

A fully lock-free Bloom filter backed by AtomicU8 bytes. Any number of threads can insert and contains simultaneously without blocking.

use quickbloom::{AtomicBloomFilter, BloomConfig};
use std::sync::Arc;

let config = BloomConfig::new(500_000, 0.01);
let filter = Arc::new(AtomicBloomFilter::from_config(&config));

let handles: Vec<_> = (0..8)
    .map(|i| {
        let f = Arc::clone(&filter);
        std::thread::spawn(move || {
            f.insert(&format!("thread_{}", i));
        })
    })
    .collect();

for h in handles { h.join().unwrap(); }

assert!(filter.contains(&"thread_0".to_string()));
assert!(filter.contains(&"thread_7".to_string()));

Monitoring saturation

println!("fill ratio: {:.2}", filter.fill_ratio());
// When fill_ratio > 0.5, false-positive rate rises quickly.
// At that point, create a new AtomicBloomFilter with a larger size.

Benchmarking

Run the included benchmarks to compare the three approaches on your hardware:

cargo bench

The benchmark suite measures:

  • insert/standard vs insert/blocked vs insert/atomic_lock_free
  • contains/standard vs contains/blocked vs contains/atomic_lock_free
  • concurrent/atomic_threads at 1, 4, and 8 threads

Running Examples

cargo run --example basic_usage

Running Tests

cargo test

Hashing Details

quickbloom uses ahash with seeded RandomState for the two base hashes, and enhanced double hashing for index derivation:

h(i) = h1 + i × h2 +

The quadratic term eliminates the bit-clustering that can appear in pure double hashing when fill ratios exceed ~30%, resulting in a lower real-world false-positive rate.


File Format

The binary persistence format is versioned (v3). Files saved by an older version of quickbloom will not load in v3 (a None is returned and a fresh filter is created). The format stores:

  • A 2-byte header: [version][type]
  • For each filter: [mode][size][hashes][items][raw_byte_count][raw_bytes…]
  • For scalable filters: config parameters, growth factors, then each layer as above.

Benchmarks

To ensure industry-grade performance, we benchmarked quickbloom v0.2.0 against the established fastbloom v0.9.0 crate.

Test Environment

  • Hardware: Apple M1 Max
  • Task: 1,000,000 operations on strings ("user_12345")
  • Parameters: 1,000,000 bits, 7 hash functions
Crate Mode Avg Latency (Insert)
quickbloom v0.2.0 Blocked (Cache-Line) 14.78 ns
quickbloom v0.2.0 Standard 16.17 ns
fastbloom v0.9.0 Blocked (512-bit) 18.56 ns
fastbloom v0.9.0 Standard 18.77 ns

[!TIP] With the switch to ahash and our optimized Blocked Layout, quickbloom now provides top-tier throughput that is competitive with and often exceeds established high-performance implementations.

Authors

Rasesh Shetty

License

MIT — see LICENSE

Dependencies

~2MB
~35K SLoC