Rivet is a framework that revolutionizes data pipelines by strictly separating concerns. It lets you define your pipeline once and run it on DuckDB, Polars, PySpark, Postgres, or any other supported engine without changing your logic.
Rivet pipelines are built on three foundational pillars:
| Concept | Rivet Abstraction | Description |
|---|---|---|
| What to compute | Joints | Named, declarative units of computation (SQL, Python, Source, Sink). |
| How to compute | Engines | Deterministic compute engines that execute the logic. |
| Where data lives | Catalogs | Named references to data locations like filesystems, databases, or object stores. |
This architecture lets you build portable pipelines. Adjacent SQL joints assigned to the same engine are automatically fused into a single query to reduce memory pressure and avoid unnecessary data movement.
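The three pillars typically come together in project configuration. As a minimal sketch of how that might look — the file name (`rivet.yaml`), keys, and values below are illustrative assumptions, not documented Rivet schema:

```yaml
# Hypothetical project config — key names are assumptions for illustration.
engines:
  local_duck:
    type: duckdb                      # engine that executes SQL joints

catalogs:
  local:                              # a catalog a source could reference by name
    type: duckdb
    path: ./data/dev.duckdb
  warehouse:                          # a catalog a sink could reference by name
    type: postgres
    dsn: postgresql://localhost:5432/analytics
```

Joints reference engines and catalogs by name, which is what makes swapping an engine a configuration change rather than a logic change.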
- 🔄 Multi-Engine Execution: Swap compute engines without rewriting pipelines.
- 🛠️ Declarative Flexibility: Define joints using SQL, YAML, or Python.
- 🛡️ Ironclad Data Quality:
  - Assertions run pre-write on computed data to catch errors before they hit your target.
  - Audits run post-write by reading back from the target catalog to verify state.
- 🧪 Built-in Offline Testing: Validate your transformation logic using offline fixture data without needing a live database.
- 💻 Interactive REPL: Use `rivet repl` for a full-screen terminal UI to explore data, run ad-hoc queries, and iterate on pipeline logic.
- 🔀 Advanced Write Strategies: Supports 7 write modes including `append`, `replace`, `merge`, and `scd2` (Slowly Changing Dimensions).
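To make the write strategies concrete, here is a sketch of a `merge` sink using the same header-comment convention as the quick-start files below. Note that `merge_keys` is a hypothetical annotation name introduced for illustration; Rivet's actual key for merge columns may differ.

```sql
-- sinks/orders_merge.sql  (illustrative sketch; merge_keys is a hypothetical annotation)
-- rivet:name: orders_merge
-- rivet:type: sink
-- rivet:upstream: [orders_dedup]
-- rivet:catalog: warehouse
-- rivet:table: orders
-- rivet:write_strategy: merge
-- rivet:merge_keys: [order_id]
```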
Install Rivet with all plugins:

```shell
pip install 'rivetsql[all]'
```

Or install only what you need:

```shell
pip install 'rivetsql[duckdb]'  # recommended for local dev
```

Scaffold a new project with the required directory structure:

```shell
rivet init my_pipeline
cd my_pipeline
```

Compile and execute your DAG:

```shell
rivet run
```

Three files. Source → Transform → Sink. That's it.
1. Read raw data from a catalog:

```sql
-- sources/raw_orders.sql
-- rivet:name: raw_orders
-- rivet:type: source
-- rivet:catalog: local
-- rivet:table: raw_orders
select * from raw_orders
```

2. Transform with plain SQL:
```sql
-- joints/daily_revenue.sql
-- rivet:name: daily_revenue
-- rivet:type: sql
SELECT
    order_date,
    SUM(amount) AS revenue
FROM raw_orders
WHERE status = 'completed'
GROUP BY order_date
```

3. Write results with quality checks:
```sql
-- sinks/daily_revenue_out.sql
-- rivet:name: daily_revenue_out
-- rivet:type: sink
-- rivet:upstream: [daily_revenue]
-- rivet:catalog: warehouse
-- rivet:table: daily_revenue
-- rivet:write_strategy: replace
-- rivet:assert: not_null(revenue)
-- rivet:assert: row_count(min=1)
```

Run it:

```
$ rivet run
✓ compiled 3 joints in 38ms
  raw_orders         ✓ OK (1200 rows)
  daily_revenue      ✓ OK (90 rows)
  daily_revenue_out  ✓ OK (90 rows)
38ms | 3 joints | 1 group | 0 failures
```

If an assertion like `not_null` fails, the write is completely aborted, keeping your target clean.
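The feature list notes that joints can also be defined in YAML. As a hedged sketch of what the transform step above might look like in that form — the field names here are assumptions, not confirmed Rivet syntax:

```yaml
# joints/daily_revenue.yaml — illustrative only; field names are assumptions
name: daily_revenue
type: sql
query: |
  SELECT order_date, SUM(amount) AS revenue
  FROM raw_orders
  WHERE status = 'completed'
  GROUP BY order_date
```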
Rivet is fully extensible through plugins.
| Package | Engine Type | Catalog Type | Best For |
|---|---|---|---|
| `rivet-duckdb` | `duckdb` | `duckdb` | Local analytics and fast SQL on files. |
| `rivet-polars` | `polars` | — | In-process DataFrame transforms. |
| `rivet-pyspark` | `pyspark` | — | Large-scale distributed processing. |
| `rivet-postgres` | `postgres` | `postgres` | PostgreSQL databases as sources and sinks. |
| `rivet-aws` | — | `s3`, `glue` | AWS S3 object storage and Glue Data Catalog. |
| `rivet-databricks` | `databricks` | `unity`, `databricks` | Databricks SQL warehouses and Unity Catalog. |
Pull requests are welcome! Check out our Contribution Guidelines. Start here:

```shell
git clone https://github.com/rivetsql/rivetsql
```

Built for data engineers who love SQL, demand quality, and value flexibility.