Skip to content

avijeetpandey/JQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

JQL — LSM-Tree Key-Value Storage Engine

A persistent, high-performance key-value storage engine built in Java 21+ using the Log-Structured Merge-Tree (LSM-Tree) architecture. No external database libraries—pure Java.


Table of Contents


Features

  • Concurrent reads and writes — Lock-free Memtable (CAS), thread-safe WAL and store
  • Durability — Write-ahead log (WAL) with crash recovery
  • Efficient lookups — In-memory Memtable, Bloom filters, and sparse indexes on SSTables
  • Background compaction — Size-tiered compaction on a virtual thread with K-way merge
  • Range scans — Ordered iteration over key ranges
  • Tombstones — Deletes via tombstones with correct recovery semantics

Architecture Overview

                    ┌─────────────────────────────────────────────────┐
                    │                    LSMStore                      │
                    │  (coordinates WAL → Memtable, reads, compaction) │
                    └─────────────────────────────────────────────────┘
                                          │
         ┌────────────────────────────────┼────────────────────────────────┐
         ▼                                ▼                                ▼
   ┌───────────┐                  ┌─────────────┐                  ┌─────────────┐
   │    WAL    │                  │  Memtable   │                  │  SSTables   │
   │ (append-   │                  │ (SkipList, │                  │ (immutable │
   │  only)    │ ── replay ──►     │  in-memory) │ ── flush ──►     │  on disk)   │
   └───────────┘                  └─────────────┘                  └─────────────┘
         │                                │                                │
         │                                │                                │
         ▼                                ▼                                ▼
   current.wal                    Sorted K/V                    sstable_1.sst
   (binary log)                   (~64MB then                   sstable_2.sst
                                   immutable)                    + Bloom + index
  • WAL: Every put/delete is appended before updating the Memtable; on startup the WAL is replayed to reconstruct the Memtable.
  • Memtable: In-memory concurrent SkipList; when it exceeds a size threshold it becomes immutable and is flushed to a new SSTable.
  • SSTables: Immutable sorted files with a sparse index and Bloom filter; reads check Memtable first, then SSTables (newest first).
  • Compaction: A background virtual thread merges the oldest N SSTables into one using a K-way merge to limit the number of on-disk files.

Prerequisites

  • Java 21 or later (for Virtual Threads, Records, and modern APIs)
  • Gradle 8.x+ (or use the included wrapper; no global install required)

Check your Java version:

java -version
# Should show version 21 or higher

Setup Guide

1. Clone or unpack the project

cd /path/to/your/workspace
# If using Git:
# git clone <repository-url> JQL
# cd JQL

2. Ensure Java 21 is available

  • macOS (Homebrew):
    brew install openjdk@21
    Then set JAVA_HOME if needed, e.g.
    export JAVA_HOME=$(/usr/libexec/java_home -v 21)

  • Linux (apt):
    sudo apt install openjdk-21-jdk

  • Windows:
    Install Eclipse Temurin 21 or Oracle JDK 21 and set JAVA_HOME.

3. Generate the Gradle wrapper (if missing)

If the project doesn’t already have gradlew and gradlew.bat:

gradle wrapper --gradle-version 9.3

If you don’t have Gradle installed, download the binary from gradle.org or use SDKMAN: sdk install gradle.

4. Verify setup

./gradlew --version
./gradlew tasks

You should see the Java version (21+) and a list of available tasks.


Building & Testing

Command Description
./gradlew build Compile, run tests, and build (no publishing)
./gradlew compileJava Compile main sources only
./gradlew test Run all tests
./gradlew clean Remove build/
./gradlew clean build Clean and full build

Run tests (recommended after setup):

./gradlew test

Tests include: put/get, overwrites, deletes, range scans, restart recovery, concurrent reads/writes, and flush-to-SSTable behavior.


Usage

Basic API

import core.LSMStore;
import java.nio.file.Path;

Path dataDir = Path.of("/var/data/jql-store");

try (LSMStore store = new LSMStore(dataDir)) {

    // Put
    store.put("user:1000".getBytes(), "Alice".getBytes());

    // Get
    byte[] value = store.get("user:1000".getBytes());
    // value != null => "Alice"

    // Delete (tombstone)
    store.delete("user:1000".getBytes());
    store.get("user:1000".getBytes()); // null

    // Range scan [from, to] (inclusive; null = unbounded)
    var entries = store.scan("user:".getBytes(), "user:~".getBytes());
    for (var e : entries) {
        System.out.println(new String(e.getKey()) + " -> " + new String(e.getValue()));
    }
}
// store.close() called automatically; WAL and SSTables are on disk

Restart and recovery

Data survives process restarts. Open the same directory again to recover from the WAL and existing SSTables:

// First run
try (LSMStore store = new LSMStore(dataDir)) {
    store.put("k".getBytes(), "v".getBytes());
}

// After restart: open same directory
try (LSMStore store = new LSMStore(dataDir)) {
    byte[] v = store.get("k".getBytes());
    // v equals "v"
}

Custom memtable size threshold

Default flush threshold is 64 MiB. You can override it:

long thresholdBytes = 32 * 1024 * 1024; // 32 MiB
try (LSMStore store = new LSMStore(dataDir, thresholdBytes)) {
    // ...
}

Configuration

Component Default / behavior
Memtable threshold 64 MiB (DEFAULT_MEMTABLE_THRESHOLD_BYTES in LSMStore)
WAL file current.wal in the store directory
SSTable naming sstable_<generation>.sst (e.g. sstable_1.sst)
Compaction Runs in a virtual thread; when there are ≥ 4 SSTables, the oldest 4 are merged into one
Sparse index Every 16th key is stored in the in-memory index (SSTable.SPARSE_INDEX_INTERVAL)
Bloom filter 1% target false positive rate for SSTables

To change compaction threshold or index interval, edit the constants in LSMStore and SSTable respectively.


Project Structure

JQL/
├── build.gradle.kts          # Gradle build (Kotlin DSL)
├── settings.gradle.kts
├── README.md                 # This file
├── gradlew / gradlew.bat     # Gradle wrapper
└── src/
    ├── main/java/core/
    │   ├── LSMStore.java     # Main entry: WAL → Memtable, reads, compaction
    │   ├── WAL.java          # Write-ahead log (binary format, replay)
    │   ├── Memtable.java     # In-memory concurrent SkipList
    │   ├── SSTable.java      # Immutable on-disk sorted table + index + Bloom
    │   ├── BloomFilter.java  # BitSet-based probabilistic filter
    │   └── ByteArrayComparator.java
    └── test/java/core/
        └── LSMTest.java      # Unit and integration tests (including restart)

Design Notes

  • SkipList (not Red-Black tree): Lock-free concurrent inserts via VarHandle CAS; no rebalancing; natural ordering for range scans.
  • WAL format: Key Size (4B) | Key | Value Size (4B, -1 = tombstone) | Value for simple parsing and recovery.
  • SSTables are immutable: Simplifies concurrency and compaction; updates and deletes are handled by newer tables and tombstones.
  • Size-tiered compaction: Merges several small SSTables into one larger one to bound read amplification and file count.
  • Virtual thread for compaction: Background work doesn’t block the main request path; Java 21 virtual threads keep overhead low.

License

This project is provided as-is for reference and learning. Use and modify according to your needs.

About

Small DB replication

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors