
Rivet

Declarative SQL pipelines with multi-engine execution, quality checks, and built-in testing.

Rivet is a framework for building data pipelines around a strict separation of concerns: define your pipeline once and run it on DuckDB, Polars, PySpark, Postgres, or any other supported engine without changing your logic.

🧠 The Mental Model

Rivet pipelines are built on three foundational pillars:

| Concept | Rivet Abstraction | Description |
| --- | --- | --- |
| What to compute | Joints | Named, declarative units of computation (SQL, Python, Source, Sink). |
| How to compute | Engines | Deterministic compute engines that execute the logic. |
| Where data lives | Catalogs | Named references to data locations like filesystems, databases, or object stores. |

This architecture lets you build portable pipelines. Adjacent SQL joints assigned to the same engine are automatically fused into a single query to reduce memory pressure and avoid unnecessary data movement.
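
To make the fusion idea concrete, here is a minimal sketch of how two adjacent SQL steps can collapse into one query via a CTE. This uses plain SQLite for portability, not Rivet's internals; all names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL, status TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                [("2024-01-01", 10.0, "completed"), ("2024-01-01", 5.0, "cancelled")])

# Two logically separate SQL joints ...
joint_a = "SELECT order_date, amount FROM raw_orders WHERE status = 'completed'"
joint_b = "SELECT order_date, SUM(amount) AS revenue FROM joint_a GROUP BY order_date"

# ... fused into a single query, so the intermediate result is never materialized.
fused = f"WITH joint_a AS ({joint_a}) {joint_b}"
print(con.execute(fused).fetchall())  # [('2024-01-01', 10.0)]
```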


✨ Key Features

  • 🔄 Multi-Engine Execution: Swap compute engines without rewriting pipelines.
  • 🛠️ Declarative Flexibility: Define joints using SQL, YAML, or Python.
  • 🛡️ Ironclad Data Quality:
    • Assertions run pre-write on computed data to catch errors before they reach your target.
    • Audits run post-write by reading back from the target catalog to verify state.
  • 🧪 Built-in Offline Testing: Validate your transformation logic using offline fixture data without needing a live database.
  • 💻 Interactive REPL: Use rivet repl for a full-screen terminal UI to explore data, run ad-hoc queries, and iterate on pipeline logic.
  • 🔀 Advanced Write Strategies: Supports 7 write modes including append, replace, merge, and scd2 (Slowly Changing Dimensions).
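
For intuition: `append` adds rows, `replace` swaps out the table's contents, and `merge` upserts on a key. Here is a minimal sketch of merge semantics in plain SQLite (the concept only, not Rivet's implementation; table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_revenue (order_date TEXT PRIMARY KEY, revenue REAL)")
con.execute("INSERT INTO daily_revenue VALUES ('2024-01-01', 10.0)")

# A merge write updates rows with matching keys and inserts new ones (an "upsert").
new_rows = [("2024-01-01", 12.5), ("2024-01-02", 7.0)]
con.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?) "
    "ON CONFLICT(order_date) DO UPDATE SET revenue = excluded.revenue",
    new_rows,
)
print(sorted(con.execute("SELECT * FROM daily_revenue").fetchall()))
# [('2024-01-01', 12.5), ('2024-01-02', 7.0)]
```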

⚡ Quick Start

1. Install

Install Rivet with all plugins:

pip install 'rivetsql[all]'

Or install only what you need:

pip install 'rivetsql[duckdb]'    # recommended for local dev

2. Initialize a Project

Scaffold a new project with the required directory structure:

rivet init my_pipeline
cd my_pipeline

3. Run the Pipeline

Compile and execute your DAG:

rivet run

💡 Example: A Complete Pipeline

Three files. Source → Transform → Sink. That's it.

1. Read raw data from a catalog:

-- sources/raw_orders.sql
-- rivet:name: raw_orders
-- rivet:type: source
-- rivet:catalog: local
-- rivet:table: raw_orders
select * from raw_orders

2. Transform with plain SQL:

-- joints/daily_revenue.sql
-- rivet:name: daily_revenue
-- rivet:type: sql
SELECT
    order_date,
    SUM(amount) AS revenue
FROM raw_orders
WHERE status = 'completed'
GROUP BY order_date
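
Because the transform is plain SQL, the offline-testing idea amounts to running it against fixture data instead of a live database. A sketch of that idea using SQLite (not Rivet's test runner):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL, status TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", [
    ("2024-01-01", 10.0, "completed"),
    ("2024-01-01", 2.5, "completed"),
    ("2024-01-02", 99.0, "refunded"),   # should be filtered out
])

# The same SQL as joints/daily_revenue.sql, run against the fixture.
daily_revenue = """
SELECT order_date, SUM(amount) AS revenue
FROM raw_orders
WHERE status = 'completed'
GROUP BY order_date
"""
assert con.execute(daily_revenue).fetchall() == [("2024-01-01", 12.5)]
```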

3. Write results with quality checks:

-- sinks/daily_revenue_out.sql
-- rivet:name: daily_revenue_out
-- rivet:type: sink
-- rivet:upstream: [daily_revenue]
-- rivet:catalog: warehouse
-- rivet:table: daily_revenue
-- rivet:write_strategy: replace
-- rivet:assert: not_null(revenue)
-- rivet:assert: row_count(min=1)

Run the pipeline:

$ rivet run
✓ compiled 3 joints in 38ms
  raw_orders          ✓ OK (1200 rows)
  daily_revenue       ✓ OK (90 rows)
  daily_revenue_out   ✓ OK (90 rows)

  38ms | 3 joints | 1 group | 0 failures

If an assertion like not_null fails, the write is completely aborted, keeping your target clean.
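
The pre-write assertion flow boils down to: compute, validate, and only then write. A hedged sketch of that sequencing (illustrative only; `checked_write` and the assertion checks are hypothetical, not Rivet's source):

```python
import sqlite3

def checked_write(con, table, rows):
    """Validate computed rows before they touch the target (pre-write assertions)."""
    assert len(rows) >= 1, "row_count(min=1) failed"
    assert all(r[1] is not None for r in rows), "not_null(revenue) failed"
    con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_revenue (order_date TEXT, revenue REAL)")
checked_write(con, "daily_revenue", [("2024-01-01", 12.5)])

try:
    checked_write(con, "daily_revenue", [("2024-01-02", None)])  # assertion fails
except AssertionError:
    pass  # nothing was written; the target stays clean
print(con.execute("SELECT COUNT(*) FROM daily_revenue").fetchone())  # (1,)
```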


🧩 Rich Plugin Ecosystem

Rivet is fully extensible through plugins.

| Package | Engine Type | Catalog Type | Best For |
| --- | --- | --- | --- |
| rivet-duckdb | duckdb | duckdb | Local analytics and fast SQL on files. |
| rivet-polars | polars | | In-process DataFrame transforms. |
| rivet-pyspark | pyspark | | Large-scale distributed processing. |
| rivet-postgres | postgres | postgres | PostgreSQL databases as sources and sinks. |
| rivet-aws | | s3, glue | AWS S3 object storage and Glue Data Catalog. |
| rivet-databricks | databricks | unity, databricks | Databricks SQL warehouses and Unity Catalog. |

📚 Documentation

Start here:


🤝 Contributing

Pull requests are welcome! Check out our Contribution Guidelines.

git clone https://github.com/rivetsql/rivetsql

Built for data engineers who love SQL, demand quality, and value flexibility.
