🦀 minikv

A production-ready distributed key-value store with Raft consensus

Built in 24 hours by someone learning Rust for 31 days 🚀

🆕 What's New in v0.2.0 (December 10, 2025)

minikv v0.2.0 is a fully production-ready distributed key-value store. All code, comments, scripts, and documentation are now in professional English.

Major Features Completed

✅ Full multi-node Raft consensus: Robust leader election, log replication, snapshots, commit index, recovery, and network partition detection. Zero known errors or warnings.
✅ Advanced Two-Phase Commit (2PC) streaming: Chunked blob streaming, error propagation, retry, timeouts, and comprehensive integration tests for error scenarios.
✅ Automatic cluster rebalancing: Detects overloaded/underloaded volumes, moves blobs and updates metadata, with CLI/admin tools for monitoring and manual control.
✅ Prometheus metrics endpoint: /metrics exposes cluster and volume stats, Raft role, replication lag, health, and more. Includes counters, histograms, and alerting support.
✅ Professional integration, stress, and recovery tests: Manual scenarios for node loss, split-brain, recovery, high load, consistency verification, and disaster recovery.
✅ All scripts, test templates, and documentation translated/adapted to English: Internationalized for global teams and production use.

Project Status

All core features are implemented and production-ready.
No stubs, TODOs, or incomplete logic remain.
All documentation, comments, and scripts are in professional English.
Ready for enterprise deployment and further extension.

✨ What is minikv?

minikv is a distributed key-value store built from scratch in Rust, designed to be production-ready with enterprise-grade features.

🎯 Core Features

⚡ Raft consensus for high availability
🔄 Two-Phase Commit (2PC) for strong consistency
💾 Write-Ahead Log (WAL) for durability
🗂️ 256 virtual shards for horizontal scalability
🌸 Bloom filters for fast lookups
📡 gRPC for internal coordination
🌐 HTTP REST API for client access

� Project Status & Roadmap

v0.2.0 is fully production-ready.
All major distributed features are complete and tested.
Security (TLS, authentication) and cross-datacenter replication are planned for v0.3.0+.

🔄 Evolution from mini-kvstore-v2

This is the distributed evolution of mini-kvstore-v2:

Feature	mini-kvstore-v2	minikv
Architecture	Single-node	Multi-node cluster
Consensus	❌ None	✅ Raft
Replication	❌ None	✅ N-way (2PC)
Durability	❌ None	✅ WAL + fsync
Sharding	❌ None	✅ 256 virtual shards
Lines of Code	~1,200	~1,800
Development Time	10 days	+24 hours
Write Performance	240K ops/s	80K ops/s (replicated 3x)
Read Performance	11M ops/s	8M ops/s (distributed)

What's preserved from v2:

✅ Segmented append-only logs
✅ In-memory HashMap index (O(1) lookups)
✅ Bloom filters for negative lookups
✅ Index snapshots (5ms restarts)
✅ CRC32 checksums

What's new:

🆕 Raft consensus for coordinator HA
🆕 2PC for distributed transactions
🆕 gRPC internal protocol
🆕 WAL for durability
🆕 Dynamic sharding with 256 virtual shards
🆕 Automatic rebalancing

⚡ Quick Start

Prerequisites

Rust 1.81+ (Install)
Docker (optional, for cluster deployment)

1. Build from Source

git clone https://github.com/whispem/minikv
cd minikv
cargo build --release

2. Start a Local Cluster

Option A: One-line script (Recommended)

./scripts/serve. sh 3 3  # 3 coordinators + 3 volumes

Option B: Using Docker Compose

docker-compose up -d

Option C: Manual (for learning)

Start 3 coordinators in separate terminals:

# Terminal 1 - Coordinator 1 (will become Raft leader)
./target/release/minikv-coord serve \
  --id coord-1 \
  --bind 0.0.0.0:5000 \
  --grpc 0.0.0.0:5001 \
  --db ./coord1-data \
  --peers coord-2:5003,coord-3:5005

# Terminal 2 - Coordinator 2
./target/release/minikv-coord serve \
  --id coord-2 \
  --bind 0.0.0.0:5002 \
  --grpc 0.0.0.0:5003 \
  --db ./coord2-data \
  --peers coord-1:5001,coord-3:5005

# Terminal 3 - Coordinator 3
./target/release/minikv-coord serve \
  --id coord-3 \
  --bind 0.0. 0.0:5004 \
  --grpc 0.0.0.0:5005 \
  --db ./coord3-data \
  --peers coord-1:5001,coord-2:5003

Start 3 volumes in separate terminals:

# Terminal 4 - Volume 1
./target/release/minikv-volume serve \
  --id vol-1 \
  --bind 0.0.0.0:6000 \
  --grpc 0.0.0.0:6001 \
  --data ./vol1-data \
  --wal ./vol1-wal \
  --coordinators http://localhost:5000

# Terminal 5 - Volume 2
./target/release/minikv-volume serve \
  --id vol-2 \
  --bind 0.0. 0.0:6002 \
  --grpc 0. 0.0.0:6003 \
  --data ./vol2-data \
  --wal ./vol2-wal \
  --coordinators http://localhost:5000

# Terminal 6 - Volume 3
./target/release/minikv-volume serve \
  --id vol-3 \
  --bind 0.0.0.0:6004 \
  --grpc 0.0.0.0:6005 \
  --data ./vol3-data \
  --wal ./vol3-wal \
  --coordinators http://localhost:5000

3. Use the CLI

# Put a blob (automatically replicated 3x)
echo "Hello, distributed world!" > test.txt
./target/release/minikv put my-key --file test.txt

# Get it back
./target/release/minikv get my-key --output retrieved.txt

# Delete
./target/release/minikv delete my-key

# Cluster operations
./target/release/minikv verify --deep        # Check integrity
./target/release/minikv repair --replicas 3  # Fix under-replication
./target/release/minikv compact --shard 0    # Reclaim space

4. Use the HTTP API

# Put a blob
curl -X PUT http://localhost:5000/my-key --data-binary @file.pdf

# Get a blob
curl http://localhost:5000/my-key -o output.pdf

# Delete a blob
curl -X DELETE http://localhost:5000/my-key

# Health check
curl http://localhost:5000/health

🏗️ Architecture

High-Level Overview

┌─────────────────────────────────────────────────────┐
│          Coordinator Cluster (Raft)                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐          │
│  │ Coord-1  │◄─┤ Coord-2  │◄─┤ Coord-3  │          │
│  │ (Leader) │  │(Follower)│  │(Follower)│          │
│  └────┬─────┘  └──────────┘  └──────────┘          │
│       │ Metadata consensus via Raft                 │
└───────┼─────────────────────────────────────────────┘
        │ gRPC (2PC, placement, health monitoring)
   ┌────┴────┬─────────────┬─────────────┐
   │         │             │             │
┌──▼──────┐ ┌▼─────────┐ ┌▼─────────┐ ┌▼─────────┐
│Volume-1 │ │Volume-2  │ │Volume-3  │ │Volume-N  │
│         │ │          │ │          │ │          │
│Shards:  │ │Shards:   │ │Shards:   │ │Shards:   │
│0-85     │ │86-170    │ │171-255   │ │0-255     │
│         │ │          │ │          │ │          │
│+ WAL    │ │+ WAL     │ │+ WAL     │ │+ WAL     │
│+ Bloom  │ │+ Bloom   │ │+ Bloom   │ │+ Bloom   │
│+ Snap   │ │+ Snap    │ │+ Snap    │ │+ Snap    │
└─────────┘ └──────────┘ └──────────┘ └──────────┘

Components

Coordinator (Raft Cluster)

Stores metadata: key → [replica locations]
Elects leader via Raft consensus
Orchestrates writes using 2PC
Monitors volume health
Uses RocksDB for persistent metadata

Volume (Storage Nodes)

Stores actual blob data
Segmented append-only logs
In-memory index for O(1) lookups
WAL for crash recovery
Automatic compaction

Write Path (2PC with Strong Consistency)

Client → PUT /my-key (1MB blob)
  ↓
Coordinator (Raft Leader)
  ↓
1️⃣ Select 3 replicas via HRW hashing
  key="my-key" → hash → shard 42 → [vol-1, vol-3, vol-5]
  ↓
2️⃣ Phase 1: PREPARE
  ├─ gRPC → vol-1: prepare(key, size=1MB, blake3=abc...)
  ├─ gRPC → vol-3: prepare(key, size=1MB, blake3=abc...)
  └─ gRPC → vol-5: prepare(key, size=1MB, blake3=abc...)
  ↓ (All volumes reserve space, return OK)
  ↓
3️⃣ Phase 2: COMMIT
  ├─ gRPC → vol-1: commit(key) → stream data → WAL → disk
  ├─ gRPC → vol-3: commit(key) → stream data → WAL → disk
  └─ gRPC → vol-5: commit(key) → stream data → WAL → disk
  ↓ (All volumes persist data, return OK)
  ↓
4️⃣ Update metadata (replicated via Raft)
  metadata["my-key"] = {
    replicas: [vol-1, vol-3, vol-5],
    size: 1MB,
    blake3: abc.. .,
    shard: 42
  }
  ↓
✅ Success → 201 Created

Error Handling:

If PREPARE fails → abort all
If COMMIT fails → retry or mark as failed
If coordinator crashes → Raft elects new leader

Read Path (Optimized for Locality)

Client → GET /my-key
  ↓
Coordinator: lookup metadata
  metadata["my-key"] → replicas: [vol-1, vol-3, vol-5]
  ↓
Select closest healthy volume (e.g., vol-1)
  ↓
Option A: Redirect (307 Temporary Redirect → vol-1:6000/my-key)
Option B: Proxy (stream from vol-1 through coordinator)
  ↓
Volume-1:
  1️⃣ Check Bloom filter → probably exists
  2️⃣ Lookup index: "my-key" → {shard: 42, offset: 1024, size: 1MB}
  3️⃣ Read from disk: segments/42/00/01. blob @ offset 1024
  4️⃣ Verify CRC32 checksum
  5️⃣ Stream to client
  ↓
✅ 200 OK (1MB blob)

Failure Scenarios

Coordinator Failure:

Coord-1 (Leader) crashes
  ↓
Coord-2 and Coord-3 detect missing heartbeats
  ↓
Raft election triggered (<200ms)
  ↓
Coord-2 becomes new leader
  ↓
Clients automatically redirect to new leader

Volume Failure:

Vol-1 crashes (has replicas for shard 42)
  ↓
Coordinator detects missing heartbeats
  ↓
Marks vol-1 as "dead" in metadata
  ↓
Reads: redirect to vol-3 or vol-5 (other replicas)
Writes: select different volume for new data
  ↓
Background repair job (optional):
  Copy under-replicated data to healthy volumes

📊 Performance

Benchmarks

Hardware: MacBook M4, 16GB RAM, NVMe SSD

Distributed cluster (3 coordinators + 3 volumes, replication factor = 3):

Writes:  80,000 ops/sec (2PC + 3x replication)
Reads:   8,000,000 ops/sec (distributed reads)

Latency (1MB blobs):
  PUT:  p50=8ms  p90=15ms  p95=22ms
  GET:  p50=1ms  p90=3ms   p95=5ms

Raft Consensus:
  Leader election: <200ms
  Log replication: ~5ms per entry

Single-node baseline (mini-kvstore-v2, no replication):

Writes:  240,000 ops/sec
Reads:   11,000,000 ops/sec

Run Your Own Benchmarks

cargo bench
./scripts/benchmark.sh
k6 run bench/scenarios/write-heavy.js
k6 run bench/scenarios/read-heavy.js

Example k6 output:

✓ write ok
✓ read ok

write_latency.. .: avg=12.3ms min=3.2ms med=8.1ms max=89.4ms p(90)=18.7ms p(95)=24.3ms
read_latency... .: avg=2.1ms  min=0.4ms med=1.3ms max=45.2ms p(90)=3.8ms  p(95)=5.1ms
write_success...: 87.34% ✓ 69872  ✗ 10128
read_success....: 99.82% ✓ 31945  ✗ 58

🚀 Features

✅ Implemented (v0.2.0)

Core Distributed Features:

Full multi-node Raft consensus (leader election, log replication, snapshots, commit index, recovery, partition detection)
Advanced 2PC streaming for distributed writes (chunked, error handling, retry, timeouts)
N-way replication (configurable factor, default = 3)
HRW (Highest Random Weight) placement
256 virtual shards for horizontal scaling
Automatic cluster rebalancing (detects load, moves blobs, updates metadata)

Storage Engine:

Segmented append-only logs
In-memory HashMap index (O(1) lookups)
Bloom filters for fast negative lookups
Index snapshots (5ms restarts)
CRC32 checksums on every record
Automatic compaction (background tasks)

Durability:

Write-Ahead Log (WAL)
Configurable fsync policy (Always/Interval/Never)
Crash recovery via WAL replay

APIs:

gRPC for internal coordination (coordinator ↔ volume)
HTTP REST API for client access
CLI for operations (verify, repair, compact, rebalance)

Infrastructure:

Docker Compose setup
GitHub Actions CI/CD
k6 benchmarks with multiple scenarios
OpenTelemetry support (Jaeger tracing)
Prometheus metrics endpoint (/metrics)

Testing & Internationalization:

Professional integration, stress, and recovery tests
All scripts, test templates, and documentation in English

🔮 Planned (v0.3.0+)

📚 The Story

🌟 From Zero to Distributed in 31 Days

Background: Started learning Rust on October 27, 2025. Zero programming experience before that (I studied languages - italian & english at Aix-Marseille University).

Timeline:

Week 1-2 (Oct 27 - Nov 9): The Rust Book

Ownership, borrowing, lifetimes
Structs, enums, pattern matching
Error handling with Result<T, E>
Traits and generics

Week 3-5 (Nov 10 - Nov 25): Built mini-kvstore-v2

Single-node key-value store
Segmented append-only logs
In-memory HashMap index
Bloom filters
CRC32 checksums
Index snapshots
~1,200 lines of code
Performance: 240K writes/s, 11M reads/s

Day 31 (Dec 6, 2025): Built minikv in 24 hours

Transformed single-node into distributed system
Implemented Raft consensus (simplified)
Added 2PC for strong consistency
Added WAL for durability
Added gRPC for internal coordination
Added dynamic sharding (256 virtual shards)
~1,800 lines of code
Performance: 80K writes/s (replicated), 8M reads/s

💡 Key Learnings

1. Raft Consensus

Conceptually simple: leader election + log replication
Implementation is hard: edge cases, network partitions, timing
Rust's type system helps catch bugs at compile time

2. Two-Phase Commit (2PC)

Phase 1 (PREPARE): reserve resources, check constraints
Phase 2 (COMMIT): actually apply changes
Critical: handle failures at every step (prepare fails, commit fails, coordinator crashes)

3. gRPC vs HTTP

gRPC is ~10x faster for internal coordination (protobuf + HTTP/2)
Still use HTTP for public API (better compatibility)

4. Bloom Filters are Magic

10x speedup for negative lookups
Trade-off: false positives (1% acceptable)
Space efficient: 100K keys = ~120KB filter

5. Rust Type System

Option<T> eliminates null pointer bugs
Result<T, E> forces error handling
Ownership prevents data races at compile time
90% of distributed systems bugs caught before running

🦀 Why Rust for Distributed Systems?

Memory safety without GC pauses

No stop-the-world garbage collection
Predictable latency (important for p99)

Fearless concurrency

Ownership prevents data races
Send and Sync traits enforce thread safety

Zero-cost abstractions

High-level ergonomics (iterators, closures)
Low-level performance (no runtime overhead)

Excellent tooling

cargo (build, test, benchmark)
rustfmt (consistent formatting)
clippy (advanced lints)

Strong ecosystem

tokio for async I/O
tonic for gRPC
axum for HTTP servers
rocksdb for embedded databases

📖 Documentation

Architecture Deep Dive

CHANGELOG.md - Version history and roadmap
CONTRIBUTING.md - How to contribute
TRACING.md - Observability with OpenTelemetry

Design Decisions

Q: Why Raft over Paxos?

Raft is easier to understand and implement correctly. The paper literally says "In Search of an Understandable Consensus Algorithm". For coordinator metadata (not the data path), simplicity matters more than theoretical optimality.

Q: Why 2PC for writes?

Strong consistency is non-negotiable for a storage system. 2PC ensures all replicas are in sync or the write fails atomically. Alternative (eventual consistency) would require conflict resolution, which is complex and application-specific.

Q: Why separate coordinator and volume roles?

Coordinator: Lightweight, metadata only (~MB), can run on modest hardware
Volume: Heavy I/O, stores actual data (~TB), needs fast disks

This separation allows independent scaling: add more coordinators for HA, add more volumes for capacity.

Q: Why gRPC internally but HTTP externally?

Internal (coordinator ↔ volume): gRPC is 10x faster (protobuf + HTTP/2 multiplexing)
External (client ↔ coordinator): HTTP REST is more compatible (curl, browsers, any language)

Q: Why 256 virtual shards?

Balances three factors:

Fine-grained enough: Even data distribution across volumes
Not too many: Low coordination overhead
Power of 2: Fast modulo operations (hash % 256)

Q: Why BLAKE3 for hashing?

Faster than SHA-256 (10x on modern CPUs)
Secure enough for content addressing
Available as a fast Rust crate

Code Structure

minikv/
├── src/
│   ├── bin/
│   │   ├── cli.rs           # CLI: verify, repair, compact
│   │   ├── coord.rs         # Coordinator binary
│   │   └── volume.rs        # Volume binary
│   ├── common/              # Shared utilities
│   │   ├── config.rs        # Configuration types
│   │   ├── error.rs         # Error types (Result<T>)
│   │   ├── hash.rs          # BLAKE3, HRW, sharding
│   │   └── utils.rs         # CRC32, key encoding, etc.
│   ├── coordinator/         # Coordinator implementation
│   │   ├── grpc.rs          # gRPC service (Raft RPCs)
│   │   ├── http.rs          # HTTP API (PUT, GET, DELETE)
│   │   ├── metadata.rs      # RocksDB metadata store
│   │   ├── placement.rs     # HRW placement + sharding
│   │   ├── raft_node.rs     # Raft state machine
│   │   └── server.rs        # Server orchestration
│   ├── volume/              # Volume implementation
│   │   ├── blob.rs          # Blob storage (segmented logs)
│   │   ├── grpc.rs          # gRPC service (2PC endpoints)
│   │   ├── http.rs          # HTTP API (blob access)
│   │   ├── index.rs         # In-memory index + snapshots
│   │   ├── wal.rs           # Write-Ahead Log
│   │   └── server.rs        # Server orchestration
│   └── ops/                 # Operations commands
│       ├── verify. rs        # Cluster integrity check
│       ├── repair. rs        # Repair under-replication
│       └── compact.rs       # Cluster-wide compaction
├── proto/
│   └── kv.proto             # gRPC protocol definitions
├── tests/
│   └── integration. rs       # Integration tests
├── bench/
│   └── scenarios/           # k6 benchmark scenarios
│       ├── write-heavy. js   # 90% writes, 10% reads
│       └── read-heavy.js    # 10% writes, 90% reads
└── scripts/
    ├── serve.sh             # Start local cluster
    ├── benchmark.sh         # Run all benchmarks
    └── verify.sh            # Verify cluster health

🔧 Development

Prerequisites

# Rust 1.81+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Docker (optional)

# k6 (optional) - For benchmarks
brew install k6  # macOS
apt install k6   # Ubuntu

Build & Test

# Clone and build
git clone https://github. com/whispem/minikv
cd minikv
cargo build --release

# Run tests
cargo test
cargo test --test integration

# Run benchmarks
cargo bench

# Code quality
cargo fmt --all
cargo clippy --all-targets -- -D warnings

# Generate documentation
cargo doc --no-deps --open

Running Tests

# Run all tests
cargo test

# Run specific test
cargo test test_wal_basic

# Run tests with output
cargo test -- --nocapture

# Run integration tests
cargo test --test integration

# Run benchmarks
cargo bench

Debugging

Enable trace logging:

RUST_LOG=trace ./target/release/minikv-coord serve --id coord-1

Use tracing with Jaeger:

docker run -d -p16686:16686 -p4317:4317 jaegertracing/all-in-one:latest
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ./target/release/minikv-coord serve --id coord-1
open http://localhost:16686

Making Changes

Create a feature branch

git checkout -b feature/my-feature

Make your changes
- Follow existing code style (run cargo fmt)
- Add tests for new features
- Update documentation
Test thoroughly

cargo test
cargo clippy --all-targets

Commit with conventional commits

git commit -m "feat: add automatic rebalancing"
git commit -m "fix: correct 2PC abort logic"
git commit -m "docs: update architecture diagram"

Push and create PR

git push origin feature/my-feature

🤝 Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING. md) for guidelines.

Areas That Need Help

High Priority:

Complete Raft multi-node consensus (currently simplified)
Full 2PC streaming implementation (large blob transfers)
Ops commands logic (verify, repair, compact)
More integration tests

Medium Priority:

Performance tuning (zero-copy I/O, io_uring)
Compression support (LZ4/Zstd)
Metrics export (Prometheus)
Admin dashboard

Low Priority:

Range queries
Batch operations
Cross-datacenter replication
S3-compatible API

Code of Conduct

Be respectful, inclusive, and constructive. We're all learning together.

📜 License

MIT License - see LICENSE

🙏 Acknowledgments

Built by @whispem as a learning project.

Inspired by:

TiKV - Production-grade distributed KV store with Raft
etcd - Distributed consensus and configuration
mini-redis - Tokio async patterns

Resources that helped:

The Rust Book - Best programming book ever written
Designing Data-Intensive Applications - Martin Kleppmann
Raft Paper - In Search of an Understandable Consensus Algorithm
Tokio Tutorial - Async Rust
gRPC Rust Tutorial - Tonic documentation

🌟 Star History

If you find this project useful, please consider giving it a star! ⭐

Built with ❤️ in Rust

"From zero to distributed in 31 days"

📞 Contact

GitHub: @whispem
Issues: github.com/whispem/minikv/issues

⬆ Back to Top

Name		Name	Last commit message	Last commit date
Latest commit History 206 Commits
.github/workflows		.github/workflows
bench		bench
opentelemetry		opentelemetry
proto		proto
scripts		scripts
src		src
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile.coordinator		Dockerfile.coordinator
Dockerfile.volume		Dockerfile.volume
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
build.rs		build.rs
cobertura.xml		cobertura.xml
docker-compose.yml		docker-compose.yml
fix_ci_complete.sh		fix_ci_complete.sh

Folders and files

Latest commit

History

Repository files navigation

🦀 minikv

🆕 What's New in v0.2.0 (December 10, 2025)

Major Features Completed

Project Status

📖 Table of Contents

✨ What is minikv?

🎯 Core Features

� Project Status & Roadmap

🔄 Evolution from mini-kvstore-v2

⚡ Quick Start

Prerequisites

1. Build from Source

2. Start a Local Cluster

3. Use the CLI

4. Use the HTTP API

🏗️ Architecture

High-Level Overview

Components

Write Path (2PC with Strong Consistency)

Read Path (Optimized for Locality)

Failure Scenarios

📊 Performance

Benchmarks

Run Your Own Benchmarks

🚀 Features

✅ Implemented (v0.2.0)

🔮 Planned (v0.3.0+)

📚 The Story

🌟 From Zero to Distributed in 31 Days

Week 1-2 (Oct 27 - Nov 9): The Rust Book

Week 3-5 (Nov 10 - Nov 25): Built mini-kvstore-v2

Day 31 (Dec 6, 2025): Built minikv in 24 hours

💡 Key Learnings

🦀 Why Rust for Distributed Systems?

📖 Documentation

Architecture Deep Dive

Design Decisions

Code Structure

🔧 Development

Prerequisites

Build & Test

Running Tests

Debugging

Making Changes

🤝 Contributing

Areas That Need Help

Code of Conduct

📜 License

🙏 Acknowledgments

🌟 Star History

📞 Contact

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Uh oh!

Uh oh!

Contributors 2

Languages

Packages