<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://duckdb.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://duckdb.org/" rel="alternate" type="text/html" /><updated>2026-04-23T10:02:05+00:00</updated><id>https://duckdb.org/feed.xml</id><title type="html">DuckDB</title><subtitle>DuckDB is an in-process SQL database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bindings for C/C++, Python, R, Java, Node.js, Go and other languages.</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Announcing DuckDB 1.5.2</title><link href="https://duckdb.org/2026/04/13/announcing-duckdb-152.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.5.2" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/13/announcing-duckdb-152</id><content type="html" xml:base="https://duckdb.org/2026/04/13/announcing-duckdb-152.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.5.2, the second patch release in <a href="/2026/03/09/announcing-duckdb-150.html">DuckDB's v1.5 line</a>.
You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.2">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="data-lake-and-lakehouse-formats">Data Lake and Lakehouse Formats</h2>

<h3 id="ducklake">DuckLake</h3>

<p>We are proud to release a stable, production-ready lakehouse specification and its reference implementation in DuckDB.</p>

<p>We published a <a href="https://ducklake.select/2026/04/13/ducklake-10/">detailed blog post on the DuckLake site</a> but here's a quick summary: DuckLake v1.0 ships dozens of bugfixes and guarantees backward compatibility. It also adds a number of new features: <a href="https://ducklake.select/2026/04/02/data-inlining-in-ducklake/">data inlining</a>, sorted tables, bucket partitioning, and deletion buffers stored as Iceberg-compatible Puffin files. More on these in the <a href="https://ducklake.select/2026/04/13/ducklake-10/">announcement blog post</a>.</p>

<h3 id="iceberg">Iceberg</h3>

<p>The <a href="/docs/current/core_extensions/iceberg/overview.html">Iceberg extension</a> ships a number of new features. It now supports the following:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">GEOMETRY</code> type</li>
  <li><code class="language-plaintext highlighter-rouge">ALTER TABLE</code> statement</li>
  <li>Updates and deletes from <a href="https://iceberg.apache.org/docs/latest/partitioning/">partitioned tables</a></li>
  <li>Truncate and bucket partition transforms</li>
</ul>

<p>Last week, DuckDB Labs engineer Tom Ebergen gave a talk at the <a href="https://www.icebergsummit.org/">Iceberg Summit</a> titled <a href="/library/building-duckdb-iceberg-exploring-the-iceberg-ecosystem/">“Building DuckDB-Iceberg: Exploring the Iceberg Ecosystem”</a>, where he shared his experiences with Iceberg.</p>

<h2 id="preliminary-jepsen-test-results">Preliminary Jepsen Test Results</h2>

<p>To make DuckDB as robust as possible, we started a collaboration with <a href="https://jepsen.io/">Jepsen</a>. The preliminary test suite is available at <a href="https://github.com/duckdb/duckdb-jepsen">https://github.com/duckdb/duckdb-jepsen</a>.</p>

<p>The test suite uncovered a bug triggered by <code class="language-plaintext highlighter-rouge">INSERT INTO</code> statements that perform conflict resolution on a primary key; a <a href="https://github.com/duckdb/duckdb/pull/21489">fix shipped</a> in this release.</p>
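<p>For illustration, the affected class of statements looks like the following upsert, which resolves a primary-key conflict at insert time (the table and column names are made up for this sketch):</p>

```sql
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);

INSERT INTO accounts VALUES (1, 100);

-- Conflict resolution on the primary key: statements of this
-- shape exercised the code path that was fixed.
INSERT INTO accounts VALUES (1, 150)
ON CONFLICT (id) DO UPDATE SET balance = excluded.balance;
```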

<h2 id="new-online-shell">New Online Shell</h2>

<p>The online <a href="/docs/current/clients/wasm/overview.html">WebAssembly</a> shell at <a href="https://shell.duckdb.org/"><code class="language-plaintext highlighter-rouge">shell.duckdb.org</code></a> received a complete overhaul.
A highlight of the new shell is the ability to store and list files using the <code class="language-plaintext highlighter-rouge">.files</code> dot command and its variants.</p>

<p>Using the file storage feature, you can turn your browser session into a workbench: you can drag and drop files from your local file system to upload them, create new files using DuckDB's <a href="/docs/current/sql/statements/copy.html#copy--to"><code class="language-plaintext highlighter-rouge">COPY ... TO</code> statement</a>, and download the results. For more information on this feature, use the <code class="language-plaintext highlighter-rouge">.help</code> command.</p>
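<p>As a quick sketch (the file names are illustrative), a typical round trip queries a file you dropped into the session and writes a result file that you can then download from the shell:</p>

```sql
-- Read an uploaded file into a table:
CREATE TABLE trips AS FROM 'trips.csv';

-- Write an aggregation result to a new file, ready for download:
COPY (
    SELECT pickup_zone, count(*) AS num_trips
    FROM trips
    GROUP BY pickup_zone
) TO 'trips_per_zone.csv' (FORMAT csv, HEADER);
```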

<p><img src="/images/blog/online-shell-example.png" alt="Example use of the new online shell at shell.duckdb.org" width="800" /></p>

<p>The new shell comes with a few built-in datasets: you're welcome to try them out and experiment.
Your old links to <code class="language-plaintext highlighter-rouge">shell.duckdb.org</code> should still work, but if you experience any problems, please submit an issue in the <a href="https://github.com/duckdb/duckdb-wasm"><code class="language-plaintext highlighter-rouge">duckdb-wasm</code> repository</a>.</p>

<h2 id="benchmarks">Benchmarks</h2>

<p>We benchmarked DuckDB using the Linux v7 kernel on an <a href="https://instances.vantage.sh/aws/ec2/r8gd.8xlarge?currency=USD">r8gd.8xlarge</a> instance with 32 vCPUs, 256 GiB RAM, and an NVMe SSD.
We first ran the scale factor 300 test on Ubuntu 24.04 LTS, then upgraded to Ubuntu 26.04 beta.
We observed a <strong>~10% improvement</strong> in the composite TPC-H metric (QphH), which jumped from 778,041 to 854,676.</p>

<h2 id="coming-up">Coming Up</h2>

<p>This quarter, we have quite a few exciting events lined up.</p>

<p><strong>DuckCon #7.</strong> On June 24, we'll host our next user conference, <a href="/events/2026/06/24/duckcon7/">DuckCon #7</a>, in Amsterdam's beautiful <a href="https://www.kit.nl/about-us/">Royal Tropical Institute</a>. If you have been building cool things with DuckDB, consider submitting a talk by April 22. Registrations are also open – and free!</p>

<p><strong>AI Council Talk.</strong> On May 12, DuckDB co-creator Hannes Mühleisen will give a talk at AI Council 2026 titled <a href="/library/super-secret-next-big-thing-for-duckdb/">“Super-Secret Next Big Thing for DuckDB”</a>. Well, at this point, we cannot tell you more than that he will present the super-secret next big thing for DuckDB. But if you cannot make it, don't worry: we'll publish the presentation afterwards.</p>

<p><strong>Ubuntu Summit Talk.</strong> We already talked about performance on Ubuntu. In late May, Gábor Szárnyas of DuckDB Labs will give a talk titled <a href="/library/duckdb-not-quack-science/">“DuckDB: Not Quack Science”</a> at the <a href="https://ubuntu.com/summit">Ubuntu Summit</a>.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This post is a short summary of the changes in v1.5.2. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.2">full release notes on GitHub</a>.</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version v1.5.2, a patch release with bugfixes and performance improvements, and support for the DuckLake v1.0 lakehouse format.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-2.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckLake v1.0: The Lakehouse Format Built on SQL Reaches Production-Readiness</title><link href="https://duckdb.org/2026/04/13/ducklake-10.html" rel="alternate" type="text/html" title="DuckLake v1.0: The Lakehouse Format Built on SQL Reaches Production-Readiness" /><published>2026-04-13T00:00:00+00:00</published><updated>2026-04-13T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/13/ducklake-10</id><content type="html" xml:base="https://duckdb.org/2026/04/13/ducklake-10.html"><![CDATA[]]></content><author><name>The DuckDB team</name></author><category term="extensions" /><summary type="html"><![CDATA[We released the DuckLake v1.0 standard!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/ducklake-1-0.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/ducklake-1-0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Data Inlining in DuckLake: Unlocking Streaming for Data Lakes</title><link href="https://duckdb.org/2026/04/02/data-inlining-in-ducklake.html" rel="alternate" type="text/html" title="Data Inlining in DuckLake: Unlocking Streaming for Data Lakes" 
/><published>2026-04-02T00:00:00+00:00</published><updated>2026-04-02T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/02/data-inlining-in-ducklake</id><content type="html" xml:base="https://duckdb.org/2026/04/02/data-inlining-in-ducklake.html"><![CDATA[]]></content><author><name>Pedro Holanda</name></author><category term="deep dive" /><summary type="html"><![CDATA[DuckLake’s data inlining stores small updates directly in the catalog, eliminating the “small files problem” and making continuous streaming into data lakes practical. Our benchmark shows 926× faster queries and 105× faster ingestion when compared to Iceberg.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/ducklake-inlining.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/ducklake-inlining.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckDB Now Speaks Dutch!</title><link href="https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch.html" rel="alternate" type="text/html" title="DuckDB Now Speaks Dutch!" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch</id><content type="html" xml:base="https://duckdb.org/2026/04/01/duckdb-now-speaks-dutch.html"><![CDATA[<p>Historically speaking, SQL queries have always been formulated in English. The initial name of the language was even Structured <strong>English</strong> Query Language (SEQUEL), before it became SQL. Now, what if the Dutch hadn't traded away New Amsterdam (present-day New York)? Would we all have been writing SQL in Dutch instead?</p>

<p>Well, wonder no longer. Today we're releasing <a href="/community_extensions/extensions/eenddb.html"><strong>EendDB</strong></a>: a DuckDB extension that brings you the <strong>Gestructureerde Zoektaal</strong>, or GZT for short.</p>

<p>Want joins? We've got <code class="language-plaintext highlighter-rouge">SAMENVOEGEN</code>. Aggregates? <code class="language-plaintext highlighter-rouge">GROEP PER</code>. Window functions? Those work too — though you'll have to look up the Dutch keywords in the repository yourself.</p>

<p>You can try it out right now in <a href="/2026/03/23/announcing-duckdb-151.html">DuckDB v1.5.1</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> eenddb</span> <span class="k">FROM</span> <span class="n">community</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> eenddb</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">enable_dutch_parser</span><span class="p">();</span>

<span class="k">MAAK</span> <span class="k">TABEL</span> <span class="n">eend</span> <span class="p">(</span>
    <span class="n">id</span>        <span class="nb">GEHEEL_GETAL</span><span class="p">,</span>
    <span class="n">naam</span>      <span class="nb">TEKST</span><span class="p">,</span>
    <span class="n">leeftijd</span>  <span class="nb">GEHEEL_GETAL</span><span class="p">,</span>
    <span class="n">gewicht</span>   <span class="nb">KOMMAGETAL</span><span class="p">,</span>
    <span class="n">soort</span>     <span class="nb">TEKST</span>
<span class="p">);</span>

<span class="k">TOEVOEGEN</span> <span class="k">AAN</span> <span class="n">eend</span> <span class="k">WAARDEN</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Donald'</span><span class="p">,</span>  <span class="mi">29</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Daffy'</span><span class="p">,</span>   <span class="mi">35</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="s1">'Zwarte eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Daisy'</span><span class="p">,</span>   <span class="mi">27</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Scrooge'</span><span class="p">,</span> <span class="mi">75</span><span class="p">,</span> <span class="mf">1.8</span><span class="p">,</span> <span class="s1">'Wilde eend'</span><span class="p">);</span>

<span class="k">SELECTEER</span> <span class="o">*</span>
<span class="k">VAN</span> <span class="n">eend</span>
<span class="k">WAARBIJ</span> <span class="n">gewicht</span> <span class="o">&gt;</span> <span class="mf">1.2</span> <span class="k">EN</span> <span class="n">naam</span> <span class="k">ZOALS</span> <span class="s1">'%D%'</span>
<span class="k">VOLGORDE</span> <span class="nb">PER</span> <span class="n">leeftijd</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬─────────┬──────────┬─────────┬─────────────┐
│  id   │  naam   │ leeftijd │ gewicht │    soort    │
│ int32 │ varchar │  int32   │  float  │   varchar   │
├───────┼─────────┼──────────┼─────────┼─────────────┤
│     2 │ Daffy   │       35 │     1.5 │ Zwarte eend │
└───────┴─────────┴──────────┴─────────┴─────────────┘
</code></pre></div></div>

<p>Of course, no query language is complete without joins and aggregates. Let's create a second table and count the ducks per <em>soort:</em></p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">MAAK</span> <span class="k">TABEL</span> <span class="n">soorten</span> <span class="p">(</span><span class="n">soort</span> <span class="nb">TEKST</span><span class="p">,</span> <span class="n">leefgebied</span> <span class="nb">TEKST</span><span class="p">);</span>

<span class="k">TOEVOEGEN</span> <span class="k">AAN</span> <span class="n">soorten</span> <span class="k">WAARDEN</span>
    <span class="p">(</span><span class="s1">'Wilde eend'</span><span class="p">,</span>  <span class="s1">'Meren en rivieren'</span><span class="p">),</span>
    <span class="p">(</span><span class="s1">'Zwarte eend'</span><span class="p">,</span> <span class="s1">'Kustgebieden'</span><span class="p">);</span>

<span class="k">SELECTEER</span> <span class="n">s.leefgebied</span><span class="p">,</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">ALS</span> <span class="n">aantal_eenden</span>
<span class="k">VAN</span> <span class="n">eend</span> <span class="k">ALS</span> <span class="n">e</span>
<span class="k">LINKS</span> <span class="k">SAMENVOEGEN</span> <span class="n">soorten</span> <span class="k">ALS</span> <span class="n">s</span> <span class="k">OP</span> <span class="n">e.soort</span> <span class="o">=</span> <span class="n">s.soort</span>
<span class="k">GROEP</span> <span class="nb">PER</span> <span class="n">s.leefgebied</span>
<span class="k">VOLGORDE</span> <span class="nb">PER</span> <span class="n">aantal_eenden</span> <span class="k">AFLOPEND</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────────────┬───────────────┐
│    leefgebied     │ aantal_eenden │
│      varchar      │     int64     │
├───────────────────┼───────────────┤
│ Meren en rivieren │             3 │
│ Kustgebieden      │             1 │
└───────────────────┴───────────────┘
</code></pre></div></div>

<p>After we are done playing around, we obviously have to clean up after ourselves. Rather than <code class="language-plaintext highlighter-rouge">DROP</code> a table, in Dutch we like to throw it away (“weggooien”):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">GOOI_WEG</span> <span class="k">TABEL</span> <span class="n">eend</span><span class="p">;</span>
<span class="k">GOOI_WEG</span> <span class="k">TABEL</span> <span class="n">soorten</span><span class="p">;</span>
</code></pre></div></div>

<p>Under the hood, the parser uses DuckDB's <a href="/2026/03/09/announcing-duckdb-150.html#peg-parser">new experimental parser</a>, based on a <a href="/2024/11/22/runtime-extensible-parsers.html">Parsing Expression Grammar</a>.</p>

<p>For more examples, check out the <a href="https://github.com/Dtenwolde/eenddb/">repository on GitHub</a>.</p>]]></content><author><name>Daniël ten Wolde</name></author><category term="extensions" /><summary type="html"><![CDATA[DuckDB now speaks Dutch! Load the EendDB community extension and start writing your queries in het Nederlands.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-now-speaks-dutch.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-now-speaks-dutch.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.5.1</title><link href="https://duckdb.org/2026/03/23/announcing-duckdb-151.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.5.1" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://duckdb.org/2026/03/23/announcing-duckdb-151</id><content type="html" xml:base="https://duckdb.org/2026/03/23/announcing-duckdb-151.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.5.1, the first patch release in <a href="/2026/03/09/announcing-duckdb-150.html">DuckDB's v1.5 line</a>.
You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.1">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="data-lake-and-lakehouse-formats">Data Lake and Lakehouse Formats</h2>

<h3 id="lance-support">Lance Support</h3>

<p>Thanks to a collaboration with LanceDB, DuckDB now supports reading and writing the <a href="https://github.com/lance-format/lance/">Lance lakehouse format</a> through the <a href="/docs/current/core_extensions/lance.html"><code class="language-plaintext highlighter-rouge">lance</code> core extension</a>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> lance</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> lance</span><span class="p">;</span>
</code></pre></div></div>

<p>You can write to Lance as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">COPY</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="mi">1</span><span class="p">::</span><span class="nb">BIGINT</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'a'</span><span class="p">::</span><span class="nb">VARCHAR</span> <span class="k">AS</span> <span class="n">s</span>
    <span class="nb">UNION</span> <span class="k">ALL</span>
    <span class="k">SELECT</span> <span class="mi">2</span><span class="p">::</span><span class="nb">BIGINT</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">::</span><span class="nb">VARCHAR</span> <span class="k">AS</span> <span class="n">s</span>
<span class="p">)</span> <span class="k">TO</span> <span class="s1">'example.lance'</span> <span class="p">(</span><span class="k">FORMAT</span> <span class="k">lance</span><span class="p">);</span>
</code></pre></div></div>

<p>And read it like so:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="s1">'example.lance'</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            2 │
└──────────────┘
</code></pre></div></div>

<blockquote>
  <p>Lance support is also available for DuckDB v1.4.4 LTS and v1.5.0.</p>
</blockquote>

<h3 id="iceberg-support">Iceberg Support</h3>

<p>We extended support for <a href="https://iceberg.apache.org/spec/#version-3">Iceberg v3</a> tables, including:</p>

<ul>
  <li>the <a href="https://github.com/duckdb/duckdb-iceberg/pull/474"><code class="language-plaintext highlighter-rouge">VARIANT</code></a> and <a href="https://github.com/duckdb/duckdb-iceberg/pull/765"><code class="language-plaintext highlighter-rouge">TIMESTAMP_NS</code></a> types</li>
  <li><a href="https://iceberg.apache.org/spec/#default-values">default values</a></li>
  <li><a href="https://github.com/duckdb/duckdb-iceberg/pull/728">deletion vectors</a> (enabling deletes and updates on v3 tables)</li>
  <li><a href="https://github.com/duckdb/duckdb-iceberg/pull/744">inserting into a partitioned table</a></li>
  <li><a href="https://github.com/duckdb/duckdb-iceberg/pull/744">creating a partitioned table</a></li>
  <li><a href="https://github.com/duckdb/duckdb-iceberg/pull/765">support for Parquet <code class="language-plaintext highlighter-rouge">COPY</code> options</a></li>
</ul>

<h2 id="configuration-options">Configuration Options</h2>

<p>The <a href="/docs/current/core_extensions/httpfs/overview.html"><code class="language-plaintext highlighter-rouge">httpfs</code> extension</a> has a <a href="https://github.com/duckdb/duckdb-httpfs/pull/285">new setting</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="n">force_download_threshold</span> <span class="o">=</span> <span class="mi">2_000_000</span><span class="p">;</span>
</code></pre></div></div>

<p>This forces a full download of any file smaller than the threshold, here 2 MB.
The default value is 0, but we may revisit this default in a future release.</p>

<h2 id="fixes">Fixes</h2>

<h3 id="globbing-performance">Globbing Performance</h3>

<p>Users reported (thanks!) that S3 globbing performance had degraded in certain cases; this has now been <a href="https://github.com/duckdb/duckdb-httpfs/pull/284">addressed</a>.</p>

<h3 id="non-interactive-shell">Non-Interactive Shell</h3>

<p>On Linux and macOS, DuckDB's new CLI had an issue executing the input received through a <a href="https://github.com/duckdb/duckdb/issues/21243">non-interactive shell</a>.
In practice, this meant that scripts piped into DuckDB were not executed.
For v1.5.0, there was a <a href="/docs/current/guides/troubleshooting/command_line.html">simple workaround available</a>.
We fixed the issue in v1.5.1, so there is no need for a workaround.</p>

<h3 id="indexes">Indexes</h3>

<p>This release ships <a href="https://github.com/duckdb/duckdb/pull/21270">two</a> <a href="https://github.com/duckdb/duckdb/pull/21427">fixes</a> for <a href="/docs/current/sql/indexes.html">ART indexes</a>.
If you are using indexes in your workload (directly or through primary key or unique constraints), we recommend updating to v1.5.1 as soon as possible.</p>
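<p>Note that you may be relying on ART indexes even without an explicit <code class="language-plaintext highlighter-rouge">CREATE INDEX</code> statement, since key and unique constraints are backed by them as well (the schema below is illustrative):</p>

```sql
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,  -- backed by an ART index
    email VARCHAR UNIQUE,       -- backed by an ART index
    name  VARCHAR
);

-- An explicitly created index is also an ART index:
CREATE INDEX users_name_idx ON users (name);
```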

<h2 id="landing-page-improvements">Landing Page Improvements</h2>

<p>We are shipping a new section of the landing page that showcases all the technologies DuckDB can run on… or in! <a href="/#ecosystem">Check it out!</a></p>

<h2 id="conclusion">Conclusion</h2>

<p>This post is a short summary of the changes in v1.5.1. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.1">full release notes on GitHub</a>.</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version 1.5.1, a patch release with bugfixes, performance improvements and support for the Lance lakehouse format.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-1.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-1.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckDB.ExtensionKit: Building DuckDB Extensions in C#</title><link href="https://duckdb.org/2026/03/20/duckdb-extensionkit-csharp.html" rel="alternate" type="text/html" title="DuckDB.ExtensionKit: Building DuckDB Extensions in C#" /><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://duckdb.org/2026/03/20/duckdb-extensionkit-csharp</id><content type="html" xml:base="https://duckdb.org/2026/03/20/duckdb-extensionkit-csharp.html"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>DuckDB has a flexible extension mechanism that allows extensions to be loaded dynamically at runtime. This makes it easy to extend DuckDB’s main feature set without adding everything to the main binary. Extensions can add support for new file formats, introduce custom types, or provide new scalar and table functions. A significant part of DuckDB’s functionality is actually implemented using this extension mechanism in the form of core extensions, which are developed alongside the engine itself by the DuckDB team. For example, DuckDB can read and write JSON files via the <code class="language-plaintext highlighter-rouge">json</code> extension and integrate with PostgreSQL using the <code class="language-plaintext highlighter-rouge">postgres</code> extension.</p>

<p>DuckDB also has a thriving ecosystem of <a href="/community_extensions/">community extensions</a>, i.e., third-party extensions, maintained by community members, covering a wide range of use cases and integrations. For example, you can expose additional cryptographic functionality through the <code class="language-plaintext highlighter-rouge">crypto</code> community extension.</p>

<h2 id="how-extensions-are-built-today">How Extensions Are Built Today</h2>

<p>Today, developers can use the same C++ API that the core extensions use for developing extensions. A template for creating extensions is available in the <a href="https://github.com/duckdb/extension-template/"><code class="language-plaintext highlighter-rouge">extension-template</code> repository</a>. While powerful, the C++ extension API is tightly coupled to DuckDB’s internal APIs, so it can (and often will) change between DuckDB versions. Additionally, using it requires building the whole DuckDB engine and its documentation is not as complete as that of the C API.</p>

<p>To solve these issues, DuckDB also provides an <a href="https://github.com/duckdb/extension-template-c">experimental template</a> for C/C++ based extensions that link with the <strong>C Extension API</strong> of DuckDB. This API provides a stable, backwards-compatible interface for developing extensions and is designed to allow extensions to work across different DuckDB versions. Because it is a C-based API, it can also be used from other programming languages such as Rust.</p>

<p>Even with the C API, writing extensions still means working at a low level, performing manual memory management, and writing a lot of boilerplate code. While the C API solves stability and compatibility, it doesn’t solve <em>developer experience</em> for higher-level ecosystems. This is where DuckDB.ExtensionKit comes in, aiming to make extension development more accessible to developers working in the .NET ecosystem. By building on top of the DuckDB C Extension API and compiling extensions using the <a href="https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/">.NET Native AOT (Ahead-of-Time) compilation</a>, DuckDB.ExtensionKit offers the best of both worlds: native DuckDB extensions that integrate like any other extension, combined with the productivity and rich library ecosystem of C# and .NET.</p>

<h2 id="duckdbextensionkit">DuckDB.ExtensionKit</h2>

<p>DuckDB.ExtensionKit provides a set of C# APIs and build tooling for implementing DuckDB extensions. It exposes the low-level DuckDB C Extension API as C# methods, and also provides type-safe, higher-level APIs for defining scalar and table functions, while still producing native DuckDB extensions. The toolkit also includes a source generator that automatically generates the required boilerplate code, including the native entry point and API initialization.</p>

<p>With DuckDB.ExtensionKit, building an extension closely resembles building a regular C# library. Extension authors create a C# project that references the ExtensionKit runtime and implements functions using the provided, type-safe APIs that expose DuckDB concepts.</p>

<p>At build time, the source generator emits the required boilerplate, including the native entry point and extension initialization. The project is then compiled using .NET Native AOT, producing a native DuckDB extension binary that can be loaded and used by DuckDB like any other extension, without requiring a .NET runtime.</p>

<p>To give a concrete example of this process, the following snippet shows a small DuckDB extension implemented using DuckDB.ExtensionKit that exposes both a scalar function and a table function for working with JWTs (JSON Web Tokens). At a high level, writing an extension with DuckDB.ExtensionKit involves defining a C# type that represents the extension and registering functions explicitly. In the example below, this is done by creating a <code class="language-plaintext highlighter-rouge">partial</code> class annotated with the <code class="language-plaintext highlighter-rouge">[DuckDBExtension]</code> attribute and implementing the <code class="language-plaintext highlighter-rouge">RegisterFunctions</code> method. The implementation makes use of the <code class="language-plaintext highlighter-rouge">System.IdentityModel.Tokens.Jwt</code> NuGet package, illustrating how extensions can easily take advantage of existing .NET libraries.</p>

<p>We'll add two functions: a scalar function for extracting <em>a single claim</em> from a JWT and a table function for extracting <em>multiple claims</em>.</p>

<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">static</span> <span class="k">partial</span> <span class="k">class</span> <span class="nc">JwtExtension</span>
<span class="p">{</span>
  <span class="k">private</span> <span class="k">static</span> <span class="k">void</span> <span class="nf">RegisterFunctions</span><span class="p">(</span><span class="n">DuckDBConnection</span> <span class="n">connection</span><span class="p">)</span>
  <span class="p">{</span>
    <span class="n">connection</span><span class="p">.</span><span class="n">RegisterScalarFunction</span><span class="p">&lt;</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">?&gt;(</span><span class="s">"extract_claim_from_jwt"</span><span class="p">,</span> <span class="n">ExtractClaimFromJwt</span><span class="p">);</span>

    <span class="n">connection</span><span class="p">.</span><span class="nf">RegisterTableFunction</span><span class="p">(</span><span class="s">"extract_claims_from_jwt"</span><span class="p">,</span> <span class="p">(</span><span class="kt">string</span> <span class="n">jwt</span><span class="p">)</span> <span class="p">=&gt;</span> <span class="nf">ExtractClaimsFromJwt</span><span class="p">(</span><span class="n">jwt</span><span class="p">),</span>
                                     <span class="n">c</span> <span class="p">=&gt;</span> <span class="k">new</span> <span class="p">{</span> <span class="n">claim_name</span> <span class="p">=</span> <span class="n">c</span><span class="p">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">claim_value</span> <span class="p">=</span> <span class="n">c</span><span class="p">.</span><span class="n">Value</span> <span class="p">});</span>
  <span class="p">}</span>

  <span class="k">private</span> <span class="k">static</span> <span class="kt">string</span><span class="p">?</span> <span class="nf">ExtractClaimFromJwt</span><span class="p">(</span><span class="kt">string</span> <span class="n">jwt</span><span class="p">,</span> <span class="kt">string</span> <span class="n">claim</span><span class="p">)</span>
  <span class="p">{</span>
    <span class="kt">var</span> <span class="n">jwtHandler</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">JwtSecurityTokenHandler</span><span class="p">();</span>
    <span class="kt">var</span> <span class="n">token</span> <span class="p">=</span> <span class="n">jwtHandler</span><span class="p">.</span><span class="nf">ReadJwtToken</span><span class="p">(</span><span class="n">jwt</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">token</span><span class="p">.</span><span class="n">Claims</span><span class="p">.</span><span class="nf">FirstOrDefault</span><span class="p">(</span><span class="n">c</span> <span class="p">=&gt;</span> <span class="n">c</span><span class="p">.</span><span class="n">Type</span> <span class="p">==</span> <span class="n">claim</span><span class="p">)?.</span><span class="n">Value</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">private</span> <span class="k">static</span> <span class="n">Dictionary</span><span class="p">&lt;</span><span class="kt">string</span><span class="p">,</span> <span class="kt">string</span><span class="p">&gt;</span> <span class="nf">ExtractClaimsFromJwt</span><span class="p">(</span><span class="kt">string</span> <span class="n">jwt</span><span class="p">)</span>
  <span class="p">{</span>
    <span class="kt">var</span> <span class="n">jwtHandler</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">JwtSecurityTokenHandler</span><span class="p">();</span>
    <span class="kt">var</span> <span class="n">token</span> <span class="p">=</span> <span class="n">jwtHandler</span><span class="p">.</span><span class="nf">ReadJwtToken</span><span class="p">(</span><span class="n">jwt</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">token</span><span class="p">.</span><span class="n">Claims</span><span class="p">.</span><span class="nf">ToDictionary</span><span class="p">(</span><span class="n">c</span> <span class="p">=&gt;</span> <span class="n">c</span><span class="p">.</span><span class="n">Type</span><span class="p">,</span> <span class="n">c</span> <span class="p">=&gt;</span> <span class="n">c</span><span class="p">.</span><span class="n">Value</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In just 25 lines, we have built an extension that adds <code class="language-plaintext highlighter-rouge">extract_claim_from_jwt</code> and <code class="language-plaintext highlighter-rouge">extract_claims_from_jwt</code> functions to DuckDB. We can call these functions just like any other function. For example, to extract the <code class="language-plaintext highlighter-rouge">name</code> claim from a JWT, we can run:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">extract_claim_from_jwt</span><span class="p">(</span>
    <span class="s1">'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImExZmIyY2NjN2FiMjBiMDYyNzJmNGUxMjIwZDEwZmZlIn0.eyJpc3MiOiJodHRwczovL2lkcC5sb2NhbCIsImF1ZCI6Im15X2NsaWVudF9hcHAiLCJuYW1lIjoiR2lvcmdpIERhbGFraXNodmlsaSIsInN1YiI6IjViZTg2MzU5MDczYzQzNGJhZDJkYTM5MzIyMjJkYWJlIiwiYWRtaW4iOnRydWUsImV4cCI6MTc2NjU5MTI2NywiaWF0IjoxNzY2NTkwOTY3fQ.N7h2xc4rgS4oPo8IO9wyG1lnr2wqTUC80YudWTXp7rXmU2JdsUiweKmuYVVbygdJAR4PJmbQtak4_VuZg2fZFILVpzDyLvGITfUW_18XuDQ_SIm3VlfAuHOVHfruuvvSAfjUkTW2Jlrv3ihFYgusV58vjhcVFHssOGMEbtMNo10Jf62dczVVGNZXh_OOLS0nTLffhY94sZddqQIE56W8xhLK5YMO4gO8voMzhUwDwucnVvyNfui38MPDNdTSKjn3Ab0hG8jzOVhbYSCHf0eQsbxPzGtXUCJobScWDb78IphFWec6W4ugIYp5CMh3C_noQi94NYjQg2P-AJ5FLCKzKA'</span><span class="p">,</span>
    <span class="s1">'name'</span>
<span class="p">);</span>
</code></pre></div></div>

<p>This returns <code class="language-plaintext highlighter-rouge">Giorgi Dalakishvili</code>. Let's test the table function:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="nf">extract_claims_from_jwt</span><span class="p">(</span>
    <span class="s1">'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImExZmIyY2NjN2FiMjBiMDYyNzJmNGUxMjIwZDEwZmZlIn0.eyJpc3MiOiJodHRwczovL2lkcC5sb2NhbCIsImF1ZCI6Im15X2NsaWVudF9hcHAiLCJuYW1lIjoiR2lvcmdpIERhbGFraXNodmlsaSIsInN1YiI6IjViZTg2MzU5MDczYzQzNGJhZDJkYTM5MzIyMjJkYWJlIiwiYWRtaW4iOnRydWUsImV4cCI6MTc2NjU5MTI2NywiaWF0IjoxNzY2NTkwOTY3fQ.N7h2xc4rgS4oPo8IO9wyG1lnr2wqTUC80YudWTXp7rXmU2JdsUiweKmuYVVbygdJAR4PJmbQtak4_VuZg2fZFILVpzDyLvGITfUW_18XuDQ_SIm3VlfAuHOVHfruuvvSAfjUkTW2Jlrv3ihFYgusV58vjhcVFHssOGMEbtMNo10Jf62dczVVGNZXh_OOLS0nTLffhY94sZddqQIE56W8xhLK5YMO4gO8voMzhUwDwucnVvyNfui38MPDNdTSKjn3Ab0hG8jzOVhbYSCHf0eQsbxPzGtXUCJobScWDb78IphFWec6W4ugIYp5CMh3C_noQi94NYjQg2P-AJ5FLCKzKA'</span>
<span class="p">);</span>
</code></pre></div></div>

<p>This returns:</p>

<div class="monospace_table"></div>

<!-- markdownlint-disable MD034 -->

<table>
  <thead>
    <tr>
      <th>claim_name</th>
      <th>claim_value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>iss</td>
      <td>https://idp.local</td>
    </tr>
    <tr>
      <td>aud</td>
      <td>my_client_app</td>
    </tr>
    <tr>
      <td>name</td>
      <td>Giorgi Dalakishvili</td>
    </tr>
    <tr>
      <td>sub</td>
      <td>5be86359073c434bad2da3932222dabe</td>
    </tr>
    <tr>
      <td>admin</td>
      <td>true</td>
    </tr>
    <tr>
      <td>exp</td>
      <td>1766591267</td>
    </tr>
    <tr>
      <td>iat</td>
      <td>1766590967</td>
    </tr>
  </tbody>
</table>

<!-- markdownlint-enable MD034 -->

<h2 id="how-duckdbextensionkit-works">How DuckDB.ExtensionKit Works</h2>

<p>DuckDB.ExtensionKit relies on several modern C# language and runtime features to efficiently bridge DuckDB’s C extension API to managed code. These features make it possible to build native extensions in C# without introducing a managed runtime dependency at load time.</p>

<h3 id="function-pointers">Function Pointers</h3>

<p>DuckDB’s C extension API is exposed as a <strong>versioned function table</strong>: a large struct (<a href="https://github.com/duckdb/extension-template-c/blob/152f7fba8df6ef2d3c48caf344fead63aa1e0501/duckdb_capi/duckdb_extension.h#L70-L545">duckdb_ext_api_v1</a>) whose fields are C function pointers (e.g., <code class="language-plaintext highlighter-rouge">duckdb_open</code>, <code class="language-plaintext highlighter-rouge">duckdb_register_scalar_function</code>, <code class="language-plaintext highlighter-rouge">duckdb_vector_get_data</code>, and so on). DuckDB.ExtensionKit mirrors this mechanism in C#. It defines a <a href="https://github.com/Giorgi/DuckDB.ExtensionKit/blob/99e4b91d50c5c840a3c4f69ea92d4fd4e49e7b76/DuckDB.ExtensionKit/DuckDBExtApiV1.cs#L7-L551">C# representation of the struct</a> (<code class="language-plaintext highlighter-rouge">DuckDBExtApiV1</code>), where each field is declared as a <a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/unsafe-code#function-pointers">C# function pointer</a> (<code class="language-plaintext highlighter-rouge">delegate* unmanaged[Cdecl]&lt;...&gt;</code>). This maps the C ABI directly: calling into DuckDB becomes a simple indirect call through a function pointer field, rather than a delegate invocation with runtime marshaling.</p>

<h3 id="entrypoint">Entrypoint</h3>

<p>A DuckDB extension needs to expose an <strong>entrypoint function</strong> following the C calling convention (the symbol exported from the binary is the extension’s name followed by <code class="language-plaintext highlighter-rouge">_init_c_api</code>) so that DuckDB can locate it when the extension is loaded. In the C extension template, this is handled with macros that generate the exported function and the surrounding boilerplate.</p>

<p>DuckDB.ExtensionKit follows the same model, but generates the boilerplate from C# instead of C macros. The source generator emits a native-compatible entrypoint that retrieves the API table (via the <code class="language-plaintext highlighter-rouge">access</code> object) and performs the required initialization, just like the C template does. The generated method is annotated with <code class="language-plaintext highlighter-rouge">[UnmanagedCallersOnly(EntryPoint = "...")]</code>, which instructs the .NET toolchain to <a href="https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/interop#native-exports">export a real native symbol</a> with that name and make it callable from C. With .NET Native AOT, this becomes an actual exported function in the produced binary – allowing DuckDB to load and call into the extension exactly as it would for a C implementation.</p>

<h3 id="native-aot">Native AOT</h3>

<p>Finally, Native AOT is what makes this approach practical for DuckDB extensions. Once the extension code and generated sources are compiled, the project is published using .NET Native AOT. This step produces a native binary with no dependency on a managed runtime at load time. The resulting artifact is a native DuckDB extension that can be loaded and executed in the same way as extensions written in C or C++. From DuckDB’s perspective, there is no difference between an extension built with DuckDB.ExtensionKit and one implemented in a traditional native language.</p>

<h2 id="current-status-and-limitations">Current Status and Limitations</h2>

<p>DuckDB.ExtensionKit, just like the C extension template, is currently experimental. The APIs are still evolving, and not all extension features supported by DuckDB are exposed yet.</p>

<p>The toolkit relies on .NET Native AOT, which means extensions need to be built for specific target platforms (for example, <code class="language-plaintext highlighter-rouge">linux-x64</code>, <code class="language-plaintext highlighter-rouge">osx-arm64</code>, or <code class="language-plaintext highlighter-rouge">win-x64</code>). As with other native extensions, binaries are platform-specific and need to be built accordingly.</p>

<h2 id="build-your-own-extension-in-c">Build Your Own Extension in C#</h2>

<p><a href="https://github.com/Giorgi/DuckDB.ExtensionKit">DuckDB.ExtensionKit</a> is available as an open-source project on GitHub under the MIT license. The project includes example extensions that demonstrate how to define and build DuckDB extensions in C#. The repository contains a JWT-based example extension that showcases both scalar functions and table functions, as well as the full build and publishing workflow using .NET Native AOT.</p>

<p>Feedback, bug reports, and contributions are welcome through <a href="https://github.com/Giorgi/DuckDB.ExtensionKit/issues">GitHub issues</a>.</p>

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>DuckDB’s extension mechanism has proven to be a flexible foundation for extending the system without complicating the core engine. DuckDB.ExtensionKit explores how this mechanism can be made accessible to a broader audience by leveraging the .NET ecosystem, while still producing native extensions that integrate directly with DuckDB.</p>

<p>Although C# is typically viewed as a high-level language, this project demonstrates that it can also be used to implement low-level, ABI-compatible components when needed. By combining modern C# features with DuckDB’s existing extension interface, it is possible to write extensions in a high-level language without giving up control over native boundaries.</p>]]></content><author><name>Giorgi Dalakishvili</name></author><category term="extensions" /><summary type="html"><![CDATA[DuckDB.ExtensionKit brings DuckDB extension development to the .NET ecosystem. By building on DuckDB's stable C Extension API and leveraging .NET Native AOT compilation, it lets C# developers define scalar and table functions, which can be shipped as native DuckDB extensions.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/vortex.svg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/vortex.svg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Big Data on the Cheapest MacBook</title><link href="https://duckdb.org/2026/03/11/big-data-on-the-cheapest-macbook.html" rel="alternate" type="text/html" title="Big Data on the Cheapest MacBook" /><published>2026-03-11T00:00:00+00:00</published><updated>2026-03-11T00:00:00+00:00</updated><id>https://duckdb.org/2026/03/11/big-data-on-the-cheapest-macbook</id><content type="html" xml:base="https://duckdb.org/2026/03/11/big-data-on-the-cheapest-macbook.html"><![CDATA[<p>Apple released the <a href="https://en.wikipedia.org/wiki/MacBook_Neo">MacBook Neo</a> today and there is no shortage of tech reviews explaining whether it's the right device for you if you are a student, a photographer or a writer.
What they <em>don't</em> tell you is whether it fits into our <a href="https://blobs.duckdb.org/merch/duckdb-2024-big-data-on-your-laptop-poster.pdf">Big Data on Your Laptop</a> ethos.
We wanted to answer this <em>using a data-driven approach,</em> so we went to the nearest Apple Store, picked one up and took it for a spin.</p>

<h2 id="whats-in-the-box">What's in the Box?</h2>

<p>Well, not much! If you buy this machine in the EU, there isn't even a charging brick included. All you get is the laptop and a braided USB-C cable. But you likely already have a few USB-C bricks lying around – let's move on to the laptop itself!</p>

<p><img src="/images/blog/macbook-neo/box.jpg" width="600" /></p>

<p>The only part of the hardware specification that you can select is the disk: you can pick either 256 or 512 GB.
As our mission is to deal with alleged “Big Data”, we picked the larger option, which brings the price to $700 in the US or €800 in the EU.
The amount of memory is fixed at 8 GB.
And while there is only a single CPU option, it is quite an interesting one:
this laptop is powered by the 6-core <a href="https://en.wikipedia.org/wiki/Apple_A18#CPU">Apple A18 Pro</a>, originally built for the iPhone 16 Pro.</p>

<p>It turns out that we have already <a href="/2024/12/06/duckdb-tpch-sf100-on-mobile.html#a-song-of-dry-ice-and-fire">tested this phone</a> under some unusual circumstances. Back in 2024, with DuckDB v1.2-dev, we found that the iPhone 16 Pro could complete all <a href="/docs/current/core_extensions/tpch.html">TPC-H</a> queries at scale factor 100 in about 10 minutes when air-cooled and in less than 8 minutes while lying in a box of dry ice. The MacBook Neo should definitely be able to handle this workload – but maybe it can even handle a bit more. Cue the inevitable benchmarks!</p>

<h2 id="clickbench">ClickBench</h2>

<p>For our first experiment, we used <a href="https://benchmark.clickhouse.com/">ClickBench</a>, an analytical database benchmark. ClickBench has 43 queries that focus on aggregation and filtering operations. The queries run on a single wide table with 100M rows, which takes about 14 GB when serialized to Parquet and 75 GB when stored as CSV.</p>

<h3 id="benchmark-environment">Benchmark Environment</h3>

<p>We ported <a href="https://github.com/szarnyasg/ClickBench/tree/duckdb-macos-compatible">ClickBench's DuckDB implementation to macOS</a> and ran it on the MacBook Neo using the freshly minted <a href="/2026/03/09/announcing-duckdb-150.html">v1.5.0 release</a>.
We only applied a small tweak: as suggested in <a href="/docs/current/guides/performance/my_workload_is_slow.html">our performance guide</a>, we slightly lowered the memory limit to 5 GB to reduce reliance on OS swapping and to let DuckDB handle memory management for <a href="/docs/current/guides/performance/how_to_tune_workloads.html#larger-than-memory-workloads-out-of-core-processing">larger-than-memory workloads</a>. This is a common trick in memory-constrained environments where other processes are likely to use more than 20% of the total system memory.</p>

<p><img src="/images/blog/macbook-neo/laptop.jpg" width="600" /></p>

<p>We also re-ran ClickBench with DuckDB v1.5.0 on two cloud instances, yielding the following lineup:</p>

<ul>
  <li>The star of our show, the MacBook Neo with 2 performance cores, 4 efficiency cores and 8 GB RAM</li>
  <li><a href="https://instances.vantage.sh/aws/ec2/c6a.4xlarge">c6a.4xlarge</a> with 16 AMD EPYC vCPU cores and 32 GB RAM. This instance is <a href="https://benchmark.clickhouse.com/#system=-&amp;type=-&amp;machine=+ca4e&amp;cluster_size=-&amp;opensource=-&amp;hardware=+c&amp;tuned=+n&amp;metric=combined&amp;queries=-">popular in ClickBench</a> with about 80 individual results reported.</li>
  <li><a href="https://instances.vantage.sh/aws/ec2/c8g.metal-48xl">c8g.metal-48xl</a> with a whopping 192 Graviton4 vCPU cores and 384 GB RAM. This instance is often at the top of the <a href="https://benchmark.clickhouse.com/">overall ClickBench leaderboard</a>.</li>
</ul>

<p>The benchmark script first loaded the Parquet file into the database. Then, as per <a href="https://github.com/ClickHouse/ClickBench/blob/main/README.md#rules-and-contribution">ClickBench's rules</a>, it ran each query three times to capture both cold runs (the first run when caches are cold) and hot runs (when the system has a chance to exploit e.g. file system caching).</p>

<h3 id="results-and-analysis">Results and Analysis</h3>

<p>Our experiments produced the following aggregate runtimes, in seconds:</p>

<table>
  <thead>
    <tr>
      <th>Machine</th>
      <th style="text-align: right">Cold run (median)</th>
      <th style="text-align: right">Cold run (total)</th>
      <th style="text-align: right">Hot run (median)</th>
      <th style="text-align: right">Hot run (total)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>MacBook Neo</td>
      <td style="text-align: right">0.57</td>
      <td style="text-align: right">59.73</td>
      <td style="text-align: right">0.41</td>
      <td style="text-align: right">54.27</td>
    </tr>
    <tr>
      <td>c6a.4xlarge</td>
      <td style="text-align: right">1.34</td>
      <td style="text-align: right">145.08</td>
      <td style="text-align: right">0.50</td>
      <td style="text-align: right">47.86</td>
    </tr>
    <tr>
      <td>c8g.metal-48xl</td>
      <td style="text-align: right">1.54</td>
      <td style="text-align: right">169.67</td>
      <td style="text-align: right">0.05</td>
      <td style="text-align: right">4.35</td>
    </tr>
  </tbody>
</table>

<p><strong>Cold run.</strong> The results start with a big surprise: in the cold run, the MacBook Neo is the clear winner with a sub-second median runtime, <em>completing all queries in under a minute!</em> Of course, if we dig deeper into the setups, there is an explanation for this. The cloud instances have network-attached disks, and accessing the database on these dominates the overall query runtimes. The MacBook Neo has a local NVMe SSD, which is far from best-in-class, but still provides relatively quick access on the first read.</p>

<p><strong>Hot run.</strong> In the hot runs, the MacBook's <em>total runtime</em> only improves by approximately 10%, while the cloud machines come into their own, with the c8g.metal-48xl winning by an order of magnitude. However, it's worth noting that on <em>median query runtimes</em> the MacBook Neo can still beat the c6a.4xlarge, a mid-sized cloud instance. And the laptop's <em>total runtime</em> is only about 13% slower despite the cloud box having 10 more CPU threads and 4 times as much RAM.</p>

<h2 id="tpc-ds">TPC-DS</h2>

<p>For our second experiment, we picked the queries of the TPC-DS benchmark. Compared to the ubiquitous TPC-H benchmark, which has 8 tables and 22 queries, TPC-DS has 24 tables and 99 queries, many of which are more complex and include features such as <a href="/docs/current/sql/functions/window_functions.html">window functions</a>. And while TPC-H has been <a href="https://homepages.cwi.nl/~boncz/snb-challenge/chokepoints-tpctc.pdf">optimized to death</a>, there is still some semblance of value in TPC-DS results. Let's see whether the cheapest MacBook can handle these queries!</p>

<p>For this round, we used DuckDB's <a href="/install/?version=lts">LTS version</a>, v1.4.4. We generated the datasets using DuckDB's <a href="/docs/current/core_extensions/tpcds.html"><code class="language-plaintext highlighter-rouge">tpcds</code> extension</a> and set the memory limit to 6 GB.</p>

<p>At SF100, the laptop breezed through most queries with a median query runtime of 1.63 seconds and a total runtime of 15.5 minutes.</p>

<p>At SF300, the memory constraint started to show. While the median query runtime was still quite good at 6.90 seconds, DuckDB occasionally used up to 80 GB of space for <a href="/docs/current/guides/performance/how_to_tune_workloads.html">spilling to disk</a> and it was clear that some queries were going to take a long time. Most notably, <a href="https://github.com/duckdb/duckdb/blob/main/extension/tpcds/dsdgen/queries/67.sql">query 67</a> took 51 minutes to complete. But hardware and software continued to work together tirelessly, and they ultimately passed the test, completing all queries in 79 minutes.</p>

<h2 id="should-you-buy-one">Should You Buy One?</h2>

<p>Here's the thing: if you are running Big Data workloads on your laptop every day, you probably shouldn't get the MacBook Neo. Yes, DuckDB runs on it, and can handle a lot of data by leveraging <a href="/docs/current/guides/performance/how_to_tune_workloads.html#larger-than-memory-workloads-out-of-core-processing">out-of-core processing</a>. But the MacBook Neo's disk I/O is lackluster compared to the Air and Pro models (about 1.5 GB/s compared to 3–6 GB/s), and the 8 GB memory will be limiting in the long run. If you need to process <a href="/2025/09/08/duckdb-on-the-framework-laptop-13.html">Big Data on the move</a> and can pay up a bit, the other MacBook models will serve your needs better and there are also good options for Linux and Windows.</p>

<p>All that said, if you run <a href="/library/duckdb-in-the-cloud/">DuckDB in the cloud</a> and primarily use your laptop as a client, this is a great device. And you can rest assured that if you <em>occasionally</em> need to crunch some data locally, DuckDB on the MacBook Neo will be up to the challenge.</p>]]></content><author><name>{&quot;twitter&quot; =&gt; &quot;none&quot;, &quot;picture&quot; =&gt; &quot;/images/blog/authors/gabor_szarnyas.png&quot;}</name></author><category term="benchmark" /><summary type="html"><![CDATA[How does the latest entry-level MacBook perform on database workloads? We benchmarked it using ClickBench and TPC-DS SF300. We found that it could complete both workloads, sometimes with surprisingly good results.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/macbook-neo.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/macbook-neo.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.5.0</title><link href="https://duckdb.org/2026/03/09/announcing-duckdb-150.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.5.0" /><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://duckdb.org/2026/03/09/announcing-duckdb-150</id><content type="html" xml:base="https://duckdb.org/2026/03/09/announcing-duckdb-150.html"><![CDATA[<p>We are proud to release DuckDB v1.5.0, codenamed “Variegata” after the <em>Paradise shelduck</em> (Tadorna variegata) endemic to New Zealand.</p>

<p>In this blog post, we cover the most important updates for this release around support, features and extensions. As always, there is more: for the complete release notes, see the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">release page on GitHub</a>.</p>

<blockquote>
  <p>To install the new version, please visit the <a href="/install/">installation page</a>. Note that it can take a few days to release some extensions (e.g., the <a href="/docs/current/core_extensions/ui.html">UI</a>) and client libraries (e.g., Go, R, Java) due to the extra changes and review rounds required.</p>
</blockquote>

<p>With this release, we will have two DuckDB releases available: v1.4 (LTS) and v1.5 (current).
The next release – planned for September – will ship a major version, DuckDB v2.0.</p>

<h2 id="new-features">New Features</h2>

<h3 id="command-line-client">Command Line Client</h3>

<p>For users who use DuckDB through the terminal, the highlight of the new release is a rework of the CLI client with a new color scheme, dynamic prompts, a pager and many other convenience features.</p>

<h4 id="color-scheme">Color Scheme</h4>

<p>We shipped a <a href="/docs/current/clients/cli/friendly_cli.html">new color palette</a> and harmonized it with the documentation. The color palette is available in both dark mode and light mode. Both use two shades of gray, and five colors for keywords, strings, errors, functions and numbers. You can find the color palette in the <a href="/design/manual/#color-palette">Design Manual</a>.</p>

<p>You can customize the color scheme using the <code class="language-plaintext highlighter-rouge">.highlight_colors</code> dot command:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">.highlight_colors</span> <span class="n">column_name</span> <span class="n">darkgreen</span> <span class="n">bold_underline</span>
<span class="k">.highlight_colors</span> <span class="n">numeric_value</span> <span class="n">red</span> <span class="n">bold</span>
<span class="k">.highlight_colors</span> <span class="n">string_value</span> <span class="n">purple2</span>
<span class="k">FROM</span> <span class="n">ducks</span><span class="p">;</span>
</code></pre></div></div>

<p><img src="/images/blog/v150/cli-colors-example-light.png" alt="DuckDB CLI light mode" class="lightmode-img" />
<img src="/images/blog/v150/cli-colors-example-dark.png" alt="DuckDB CLI dark mode" class="darkmode-img" /></p>

<h4 id="dynamic-prompts-in-the-cli">Dynamic Prompts in the CLI</h4>

<p>DuckDB v1.5.0 introduces dynamic prompts for the CLI (<a href="https://github.com/duckdb/duckdb/pull/19579">PR #19579</a>). By default, these show the database and schema that you are currently connected to:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">duckdb</span>
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'my_database.duckdb'</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">my_database</span><span class="p">;</span>
<span class="gp">my_database</span> <span class="n">D</span> <span class="k">CREATE</span> <span class="k">SCHEMA</span> <span class="n">my_schema</span><span class="p">;</span>
<span class="gp">my_database</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">my_schema</span><span class="p">;</span>
<span class="gp">my_database.my_schema</span> <span class="n">D</span> <span class="p">...</span>
</code></pre></div></div>

<p>These prompts can be configured using bracket codes to have a maximum length, run a custom query, use different colors, etc. (<a href="https://github.com/duckdb/duckdb/pull/19579">#19579</a>).</p>

<h4 id="tables-and-describe"><code class="language-plaintext highlighter-rouge">.tables</code> and <code class="language-plaintext highlighter-rouge">DESCRIBE</code></h4>

<p>To show the columns of an individual table, use the <a href="/docs/current/sql/statements/describe.html"><code class="language-plaintext highlighter-rouge">DESCRIBE</code> statement</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">DESCRIBE</span> <span class="n">ducks</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────────────────┐
│        ducks         │
│                      │
│ id           integer │
│ name         varchar │
│ extinct_year integer │
└──────────────────────┘
</code></pre></div></div>

<p>The <a href="/docs/current/clients/cli/dot_commands.html"><code class="language-plaintext highlighter-rouge">.tables</code> dot command</a> lists the attached catalogs, the schemas and tables in them, and the columns in each table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/numbers1.db'</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">.tables</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ────────────── animals_db ───────────────
 ───────────────── main ──────────────────
┌─────────────────┐┌──────────────────────┐
│      swans      ││        ducks         │
│                 ││                      │
│ id      integer ││ id           integer │
│ name    varchar ││ name         varchar │
│ species varchar ││ extinct_year integer │
│ color   varchar ││                      │
│ habitat varchar ││        5 rows        │
│                 │└──────────────────────┘
│     3 rows      │
└─────────────────┘
  numbers1
 ── main ──
┌──────────┐
│   tbl    │
│          │
│ i bigint │
│          │
│  2 rows  │
└──────────┘
</code></pre></div></div>

<h4 id="accessing-the-last-result-using-_">Accessing the Last Result Using <code class="language-plaintext highlighter-rouge">_</code></h4>

<p>You can access the last result of a query inline using the underscore character <code class="language-plaintext highlighter-rouge">_</code>. This is not only convenient but also makes it unnecessary to re-run potentially long-running queries:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">ducks</span> <span class="k">WHERE</span> <span class="n">extinct_year</span> <span class="k">IS</span> <span class="k">NOT</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="err">┌───────┬──────────────────┬──────────────┐</span>
<span class="err">│</span>  <span class="n">id</span>   <span class="err">│</span>       <span class="n">name</span>       <span class="err">│</span> <span class="n">extinct_year</span> <span class="err">│</span>
<span class="err">│</span> <span class="n">int32</span> <span class="err">│</span>     <span class="n">varchar</span>      <span class="err">│</span>    <span class="n">int32</span>     <span class="err">│</span>
<span class="err">├───────┼──────────────────┼──────────────┤</span>
<span class="err">│</span>     <span class="mi">1</span> <span class="err">│</span> <span class="n">Labrador</span> <span class="n">Duck</span>    <span class="err">│</span>         <span class="mi">1878</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">3</span> <span class="err">│</span> <span class="n">Crested</span> <span class="n">Shelduck</span> <span class="err">│</span>         <span class="mi">1964</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">5</span> <span class="err">│</span> <span class="n">Pink</span><span class="o">-</span><span class="n">headed</span> <span class="n">Duck</span> <span class="err">│</span>         <span class="mi">1949</span> <span class="err">│</span>
<span class="err">└───────┴──────────────────┴──────────────┘</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">_</span><span class="p">;</span>
<span class="err">┌───────┬──────────────────┬──────────────┐</span>
<span class="err">│</span>  <span class="n">id</span>   <span class="err">│</span>       <span class="n">name</span>       <span class="err">│</span> <span class="n">extinct_year</span> <span class="err">│</span>
<span class="err">│</span> <span class="n">int32</span> <span class="err">│</span>     <span class="n">varchar</span>      <span class="err">│</span>    <span class="n">int32</span>     <span class="err">│</span>
<span class="err">├───────┼──────────────────┼──────────────┤</span>
<span class="err">│</span>     <span class="mi">1</span> <span class="err">│</span> <span class="n">Labrador</span> <span class="n">Duck</span>    <span class="err">│</span>         <span class="mi">1878</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">3</span> <span class="err">│</span> <span class="n">Crested</span> <span class="n">Shelduck</span> <span class="err">│</span>         <span class="mi">1964</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">5</span> <span class="err">│</span> <span class="n">Pink</span><span class="o">-</span><span class="n">headed</span> <span class="n">Duck</span> <span class="err">│</span>         <span class="mi">1949</span> <span class="err">│</span>
<span class="err">└───────┴──────────────────┴──────────────┘</span>
</code></pre></div></div>

<h4 id="pager">Pager</h4>

<p>Last but not least, the CLI now has a pager! It is triggered automatically when a result contains more than 50 rows.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">.maxrows</span> <span class="mi">100</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
</code></pre></div></div>

<p>You can navigate on Linux and Windows using <code class="language-plaintext highlighter-rouge">Page Up</code> / <code class="language-plaintext highlighter-rouge">Page Down</code>. On macOS, use <code class="language-plaintext highlighter-rouge">Fn</code> + <code class="language-plaintext highlighter-rouge">Up</code> / <code class="language-plaintext highlighter-rouge">Down</code>. To exit the pager, press <code class="language-plaintext highlighter-rouge">Q</code>.</p>

<p>The initial implementation of the pager was provided by <a href="https://github.com/tobwen"><code class="language-plaintext highlighter-rouge">tobwen</code></a> in <a href="https://github.com/duckdb/duckdb/pull/19004">#19004</a>.</p>

<h3 id="peg-parser">PEG Parser</h3>

<p>DuckDB v1.5 ships an experimental parser based on PEG (parsing expression grammars). The new parser enables better suggestions and improved error messages, and allows extensions to add their own grammar rules. The PEG parser is currently disabled by default, but you can opt in using:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">enable_peg_parser</span><span class="p">();</span>
</code></pre></div></div>

<p>The PEG parser is already used for generating suggestions. You can cycle through the options using <code class="language-plaintext highlighter-rouge">TAB</code>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">ducks</span> <span class="k">WHERE</span> <span class="n">habitat</span> <span class="k">IS</span> 
<span class="k">IS</span>           <span class="k">ISNULL</span>       <span class="k">ILIKE</span>        <span class="gs">IN</span>           <span class="k">INTERSECT</span>    <span class="k">LIKE</span>
</code></pre></div></div>

<p>We are planning to make the switch to the new parser in the upcoming DuckDB release.</p>

<blockquote>
  <p>As a tradeoff, the parser has a slight performance overhead; however, this is in the range of milliseconds and thus negligible for analytical queries. For more details on the rationale for using a PEG parser and benchmark results, please refer to the <a href="/library/runtime-extensible-parsers/">CIDR 2026 paper</a> by Hannes and Mark, or their <a href="/2024/11/22/runtime-extensible-parsers.html">blog post</a> summarizing the paper.</p>
</blockquote>

<h3 id="variant-type"><code class="language-plaintext highlighter-rouge">VARIANT</code> Type</h3>

<p>DuckDB now natively supports the <a href="https://github.com/duckdb/duckdb/pull/18609"><code class="language-plaintext highlighter-rouge">VARIANT</code> type</a>, inspired by <a href="https://docs.snowflake.com/en/sql-reference/data-types-semistructured">Snowflake's semi-structured <code class="language-plaintext highlighter-rouge">VARIANT</code> data type</a> and available <a href="https://github.com/apache/parquet-format/blob/master/VariantEncoding.md">in Parquet since 2025</a>. Unlike the <a href="/docs/current/data/json/json_type.html">JSON type</a>, which is physically stored as text, <code class="language-plaintext highlighter-rouge">VARIANT</code> stores typed binary data. Each row in a <code class="language-plaintext highlighter-rouge">VARIANT</code> column is self-contained with its own type information. This leads to better compression and query performance. Here are a few examples of using <code class="language-plaintext highlighter-rouge">VARIANT</code>.</p>

<p>Store different types in the same column:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">events</span> <span class="p">(</span><span class="n">id</span> <span class="nb">INTEGER</span><span class="p">,</span> <span class="n">data</span> <span class="nb">VARIANT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">events</span> <span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">42</span><span class="p">::</span><span class="nb">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'hello world'</span><span class="p">::</span><span class="nb">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]::</span><span class="nb">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="p">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Alice'</span><span class="p">,</span> <span class="s1">'age'</span><span class="p">:</span> <span class="mi">30</span><span class="p">}::</span><span class="nb">VARIANT</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">events</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬────────────────────────────┐
│  id   │            data            │
│ int32 │          variant           │
├───────┼────────────────────────────┤
│     1 │ 42                         │
│     2 │ hello world                │
│     3 │ [1, 2, 3]                  │
│     4 │ {'name': Alice, 'age': 30} │
└───────┴────────────────────────────┘
</code></pre></div></div>
<p>Check the underlying type of each row:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">variant_typeof</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="k">AS</span> <span class="n">vtype</span>
<span class="k">FROM</span> <span class="n">events</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬────────────────────────────┬───────────────────┐
│  id   │            data            │       vtype       │
│ int32 │          variant           │      varchar      │
├───────┼────────────────────────────┼───────────────────┤
│     1 │ 42                         │ INT32             │
│     2 │ hello world                │ VARCHAR           │
│     3 │ [1, 2, 3]                  │ ARRAY(3)          │
│     4 │ {'name': Alice, 'age': 30} │ OBJECT(name, age) │
└───────┴────────────────────────────┴───────────────────┘
</code></pre></div></div>

<p>You can extract fields from nested variants using the dot notation or the <code class="language-plaintext highlighter-rouge">variant_extract</code> function:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">data.name</span> <span class="k">FROM</span> <span class="n">events</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">-- or </span>
<span class="k">SELECT</span> <span class="n">variant_extract</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s1">'name'</span><span class="p">)</span> <span class="k">AS</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">events</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────┐
│  name   │
│ variant │
├─────────┤
│ Alice   │
└─────────┘
</code></pre></div></div>

<p>DuckDB also supports reading <code class="language-plaintext highlighter-rouge">VARIANT</code> types from Parquet files, including <em>shredding</em> (storing nested data as flat values).</p>
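<p>For example, assuming a Parquet file <code class="language-plaintext highlighter-rouge">events.parquet</code> (a hypothetical name) containing a Variant-encoded column named <code class="language-plaintext highlighter-rouge">data</code>, a sketch of reading it back directly would look like:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- 'events.parquet' is a hypothetical file with a Variant-encoded column
SELECT variant_typeof(data) AS vtype
FROM read_parquet('events.parquet');
</code></pre></div></div>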

<h3 id="read_duckdb-function"><code class="language-plaintext highlighter-rouge">read_duckdb</code> Function</h3>

<p>The <code class="language-plaintext highlighter-rouge">read_duckdb</code> table function can read DuckDB databases without first attaching them. This can make reading from DuckDB databases more ergonomic – for example, you can use globbing. You can read the <a href="#appendix-example-dataset">example</a> <code class="language-plaintext highlighter-rouge">numbers</code> databases as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">min</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="nf">max</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">FROM</span> <span class="nf">read_duckdb</span><span class="p">(</span><span class="s1">'numbers*.db'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────┬────────┐
│ min(i) │ max(i) │
│ int64  │ int64  │
├────────┼────────┤
│      1 │      5 │
└────────┴────────┘
</code></pre></div></div>

<h3 id="azure-writes">Azure Writes</h3>

<p>You can now <a href="/docs/current/core_extensions/azure.html#writing-to-azure-blob-storage">write to Azure Blob Storage or ADLSv2 storage</a> using the <code class="language-plaintext highlighter-rouge">COPY</code> statement:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Write query results to a Parquet file on Blob Storage</span>
<span class="k">COPY</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">my_table</span><span class="p">)</span>
<span class="k">TO</span> <span class="s1">'az://my_container/path/output.parquet'</span><span class="p">;</span>

<span class="c1">-- Write a table to a CSV file on ADLSv2 Storage</span>
<span class="k">COPY</span> <span class="n">my_table</span>
<span class="k">TO</span> <span class="s1">'abfss://my_container/path/output.csv'</span><span class="p">;</span>
</code></pre></div></div>

<h3 id="odbc-scanner">ODBC Scanner</h3>

<p>We are now shipping an ODBC scanner extension. This allows you to query a remote endpoint as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">LOAD</span><span class="n"> odbc_scanner</span><span class="p">;</span>
<span class="k">SET</span> <span class="k">VARIABLE</span> <span class="n">conn</span> <span class="o">=</span> <span class="nf">odbc_connect</span><span class="p">(</span><span class="s1">'Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger;'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">odbc_query</span><span class="p">(</span><span class="nf">getvariable</span><span class="p">(</span><span class="s1">'conn'</span><span class="p">),</span> <span class="s1">'SELECT SYSTIMESTAMP FROM dual;'</span><span class="p">);</span>
</code></pre></div></div>

<p>In the coming weeks, we'll publish the documentation page and release a follow-up post on the ODBC scanner.
In the meantime, please refer to the <a href="https://github.com/duckdb/odbc-scanner/blob/main/README.md">project's README</a>.</p>

<h2 id="major-changes">Major Changes</h2>

<h3 id="breaking-change-for-datetime-function">Breaking Change for a Datetime Function</h3>

<p>The <a href="/docs/current/sql/functions/timestamptz.html#date_truncpart-timestamptz"><code class="language-plaintext highlighter-rouge">date_trunc</code></a> function, when applied to a <code class="language-plaintext highlighter-rouge">DATE</code>, now returns a <code class="language-plaintext highlighter-rouge">TIMESTAMP</code> instead of a <code class="language-plaintext highlighter-rouge">DATE</code>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- v1.4.4:</span>
<span class="k">SELECT</span> <span class="nf">typeof</span><span class="p">(</span><span class="nf">date_trunc</span><span class="p">(</span><span class="s1">'month'</span><span class="p">,</span> <span class="nb">DATE</span><span class="p">(</span><span class="s1">'2026-03-27'</span><span class="p">)));</span>
<span class="c1">-- returns DATE</span>

<span class="c1">-- v1.5.x:</span>
<span class="k">SELECT</span> <span class="nf">typeof</span><span class="p">(</span><span class="nf">date_trunc</span><span class="p">(</span><span class="s1">'month'</span><span class="p">,</span> <span class="nb">DATE</span><span class="p">(</span><span class="s1">'2026-03-27'</span><span class="p">)));</span>
<span class="c1">-- returns TIMESTAMP</span>
</code></pre></div></div>
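<p>If downstream code depends on the old behavior, casting the result back is a simple workaround (not an official migration path):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT typeof(date_trunc('month', DATE '2026-03-27')::DATE);
-- returns DATE again
</code></pre></div></div>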

<h3 id="lakehouse-updates">Lakehouse Updates</h3>

<p>All of DuckDB’s supported Lakehouse formats have received some updates in DuckDB v1.5.</p>

<h4 id="ducklake">DuckLake</h4>

<p>The main <a href="https://ducklake.select/">DuckLake</a> change for DuckDB v1.5 is updating the DuckLake specification to v0.4.
We are aiming for this to be the same specification that ships with DuckLake v1.0, which will be released in April.
Its main highlights include:</p>

<ul>
  <li>Macro support.</li>
  <li>Sorted tables.</li>
  <li>Deletion inlining and addition of partial delete files.</li>
  <li>Internal rework of DuckLake options.</li>
</ul>

<p>We'll announce more details about these features in the blog post for DuckLake v1.0.</p>

<h4 id="delta-lake">Delta Lake</h4>

<p>For the <a href="/docs/current/core_extensions/delta.html">Delta Lake extension</a>, the team has focused on improving support for writes via <a href="/docs/current/core_extensions/unity_catalog.html">Unity Catalog</a>, idempotent Delta writes, and table <code class="language-plaintext highlighter-rouge">CHECKPOINT</code>s.</p>

<h4 id="iceberg">Iceberg</h4>

<p>For the <a href="/docs/current/core_extensions/iceberg/overview.html">Iceberg extension</a>, the team is working on a larger release for v1.5.1. For v1.5.0, the main feature is the addition of table properties in the <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> statement:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">test_create_table</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INTEGER</span><span class="p">)</span>
<span class="k">WITH</span> <span class="p">(</span>
    <span class="s1">'format-version'</span> <span class="o">=</span> <span class="s1">'2'</span><span class="p">,</span> <span class="c1">-- elevated to the table's format-version field when the table is created</span>
    <span class="s1">'location'</span> <span class="o">=</span> <span class="s1">'s3://path/to/data'</span><span class="p">,</span> <span class="c1">-- elevated to the table's location field when the table is created</span>
    <span class="s1">'property1'</span> <span class="o">=</span> <span class="s1">'value1'</span><span class="p">,</span>
    <span class="s1">'property2'</span> <span class="o">=</span> <span class="s1">'value2'</span>
<span class="p">);</span>
</code></pre></div></div>

<p>Other minor additions have been made to enable passing <code class="language-plaintext highlighter-rouge">EXTRA_HTTP_HEADERS</code> when attaching to an Iceberg catalog, which has unlocked support for <a href="https://cloud.google.com/biglake">Google’s BigLake</a>.</p>

<blockquote>
  <p>Both Delta and DuckLake have implemented the <a href="#variant-type"><code class="language-plaintext highlighter-rouge">VARIANT</code> type</a>. Iceberg’s <code class="language-plaintext highlighter-rouge">VARIANT</code> type will ship in the v1.5.1 release with some other features that are specific to the Iceberg v3 specification.</p>
</blockquote>

<h3 id="network-stack">Network Stack</h3>

<p>The default backend for the <a href="/docs/current/core_extensions/httpfs/overview.html">httpfs extension</a> has changed from <a href="https://github.com/yhirose/cpp-httplib"><code class="language-plaintext highlighter-rouge">httplib</code></a> to <a href="https://curl.se/"><code class="language-plaintext highlighter-rouge">curl</code></a>. As <code class="language-plaintext highlighter-rouge">curl</code> is one of the most popular and well-tested open-source projects, we expect it to provide long-term stability and security for DuckDB. Regardless of the HTTP library used, <code class="language-plaintext highlighter-rouge">openssl</code> remains the backing SSL library, and options such as <code class="language-plaintext highlighter-rouge">http_timeout</code>, <code class="language-plaintext highlighter-rouge">http_retries</code>, etc. work the same as before.</p>
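<p>For instance, the existing options can still be set exactly as before (the value here is purely illustrative):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Same option names as with the httplib backend; 3 is an illustrative value
SET http_retries = 3;
</code></pre></div></div>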

<p>Our community has been <a href="https://github.com/duckdb/duckdb/issues/20977">testing the new network stack</a> for the last few weeks. Still, if you encounter any issues, please submit them to the <a href="https://github.com/duckdb/duckdb-httpfs"><code class="language-plaintext highlighter-rouge">duckdb-httpfs</code> repository</a>.</p>

<details>
  <summary>
If you are interested in more details, click here.
</summary>
  <p>Due to technical reasons, <code class="language-plaintext highlighter-rouge">httplib</code> is still the library we use for downloading the <code class="language-plaintext highlighter-rouge">httpfs</code> extension. When <code class="language-plaintext highlighter-rouge">httpfs</code> is loaded with the (now default) <code class="language-plaintext highlighter-rouge">curl</code> backend, subsequent extension installations go through <code class="language-plaintext highlighter-rouge">https://</code>, with the default endpoint for core extensions pointing to <a href="https://extensions.duckdb.org"><code class="language-plaintext highlighter-rouge">https://extensions.duckdb.org</code></a>.</p>

  <p>All core and community extensions are cryptographically signed, so installing them through <code class="language-plaintext highlighter-rouge">http://</code> does not pose a security risk. However, some users reported issues with <code class="language-plaintext highlighter-rouge">http://</code> extension installs in environments with firewalls.</p>
</details>

<h3 id="lambda-syntax">Lambda Syntax</h3>

<p>Up to DuckDB v1.2, the syntax for defining lambda expressions used the arrow notation <code class="language-plaintext highlighter-rouge">x -&gt; x + 1</code>. While this was a nice syntax, it clashed with the JSON extract operator (<code class="language-plaintext highlighter-rouge">-&gt;</code>) due to operator precedence and led to error messages that some users found difficult to troubleshoot. To work around this, we introduced a new, Python-style <a href="/2025/05/21/announcing-duckdb-130.html#lambda-function-syntax">lambda syntax in v1.3</a>, <code class="language-plaintext highlighter-rouge">lambda x: x + 1</code>.</p>

<p>While DuckDB v1.5 supports both styles of writing lambda expressions, using the deprecated arrow syntax will now throw a warning:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">list_transform</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WARNING:
Deprecated lambda arrow (-&gt;) detected. Please transition to the new lambda syntax, i.e., lambda x, i: x + i, before DuckDB's next release.
</code></pre></div></div>
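<p>The same query in the new syntax runs without the warning:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT list_transform([1, 2, 3], lambda x: x + 1);
-- [2, 3, 4]
</code></pre></div></div>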

<p>You can use the <code class="language-plaintext highlighter-rouge">lambda_syntax</code> configuration option to change this behavior, either suppressing the warning or turning it into an error:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Suppress the warning</span>
<span class="k">SET</span> <span class="py">lambda_syntax</span> <span class="o">=</span> <span class="s1">'ENABLE_SINGLE_ARROW'</span><span class="p">;</span>
<span class="c1">-- Turn the deprecation warning into an error</span>
<span class="k">SET</span> <span class="py">lambda_syntax</span> <span class="o">=</span> <span class="s1">'DISABLE_SINGLE_ARROW'</span><span class="p">;</span>
</code></pre></div></div>

<p>DuckDB 2.0 will disable the single arrow syntax by default; it will only be available if you opt in explicitly.</p>

<h3 id="spatial-extension">Spatial Extension</h3>

<p>The <a href="/docs/current/core_extensions/spatial/overview.html">spatial extension</a> ships several important changes.</p>

<h4 id="breaking-change-flipping-of-axis-order">Breaking Change: Flipping of Axis Order</h4>

<p>Most functions in <code class="language-plaintext highlighter-rouge">spatial</code> operate in Cartesian space and are unaffected by axis order, e.g., whether the <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code> axes represent “longitude” and “latitude” or the other way around. But there are some functions where this matters, and where the assumption, counterintuitively, is that all input geometries use (x = latitude, y = longitude). These are:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">ST_Distance_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Perimeter_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Area_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Distance_Sphere</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_DWithin_Spheroid</code></li>
</ul>

<p>Additionally, <code class="language-plaintext highlighter-rouge">ST_Transform</code> also expects that the input geometries are in the same axis order as defined by the source coordinate reference system, which in the case of e.g., <code class="language-plaintext highlighter-rouge">EPSG:4326</code> is also (x = latitude, y = longitude).</p>

<p>This has been a long-standing source of confusion and numerous issues, as other databases, formats and GIS systems tend to always treat <code class="language-plaintext highlighter-rouge">X</code> as “easting”, “left-right” or “longitude”, and <code class="language-plaintext highlighter-rouge">Y</code> as “northing”, “up-down” or “latitude”.</p>

<p>We are changing how this currently works in DuckDB to be consistent with how other systems operate, hopefully causing less confusion for new users in the future. However, to avoid silently breaking existing workflows that have adapted to this quirk (e.g., by using <code class="language-plaintext highlighter-rouge">ST_FlipCoordinates</code>), we are rolling out this change gradually via a new <code class="language-plaintext highlighter-rouge">geometry_always_xy</code> setting:</p>

<ul>
  <li>In DuckDB v1.5, setting <code class="language-plaintext highlighter-rouge">geometry_always_xy = true</code> enables the new behavior (x = longitude, y = latitude). Without it, affected functions emit a warning.</li>
  <li>In DuckDB v2.0, the warning will become an error. Set <code class="language-plaintext highlighter-rouge">geometry_always_xy = false</code> to preserve the old behavior.</li>
  <li>In DuckDB v2.1, <code class="language-plaintext highlighter-rouge">geometry_always_xy = true</code> will become the default.</li>
</ul>

<p>So to summarize, nothing is changing by default in this release, but to avoid being affected by this change in the future, set <code class="language-plaintext highlighter-rouge">geometry_always_xy</code> explicitly now. Set it to <code class="language-plaintext highlighter-rouge">true</code> to opt into the new behavior, or <code class="language-plaintext highlighter-rouge">false</code> to keep the existing one.</p>
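<p>As a sketch, opting in at the start of a session could look like this (requires the <code class="language-plaintext highlighter-rouge">spatial</code> extension; the coordinates are illustrative and given as longitude, latitude per the new convention):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOAD spatial;
SET geometry_always_xy = true;
-- Points given as (longitude, latitude), e.g., roughly Amsterdam and Berlin
SELECT ST_Distance_Sphere(ST_Point(4.9, 52.37), ST_Point(13.4, 52.52));
</code></pre></div></div>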

<h3 id="geometry-rework">Geometry Rework</h3>

<h4 id="geometry-becomes-a-built-in-type"><code class="language-plaintext highlighter-rouge">GEOMETRY</code> Becomes a Built-In Type</h4>

<p>The <code class="language-plaintext highlighter-rouge">GEOMETRY</code> type has been moved from the <code class="language-plaintext highlighter-rouge">spatial</code> extension into core DuckDB!</p>

<p>Geospatial data is no longer niche. The Parquet standard now treats <code class="language-plaintext highlighter-rouge">GEOMETRY</code> as a first-class column type, and open table formats like Apache Iceberg and DuckLake are moving in the same direction. Many widely used data formats and systems also have geospatial counterparts—GeoJSON, PostGIS, GeoPandas, GeoPackage/Spatialite, and more.</p>

<p>DuckDB already offers extensions that integrate with many of these formats and systems. But there’s a structural problem: as long as <code class="language-plaintext highlighter-rouge">GEOMETRY</code> lives inside the <code class="language-plaintext highlighter-rouge">spatial</code> extension, other extensions that want to read or write geospatial data must either depend on <code class="language-plaintext highlighter-rouge">spatial</code>, implement their own incompatible geometry representation, or force users to handle the conversions themselves.</p>

<p>By moving <code class="language-plaintext highlighter-rouge">GEOMETRY</code> into DuckDB’s core, extensions can now produce and consume geometry values natively, without depending on <code class="language-plaintext highlighter-rouge">spatial</code>. While the <code class="language-plaintext highlighter-rouge">spatial</code> extension still provides most of the functions for working with geometries, the type itself becomes a shared foundation that the entire ecosystem can build on. We’ve already added <code class="language-plaintext highlighter-rouge">GEOMETRY</code> support to the Postgres scanner and GeoArrow conversion for Arrow import and export. Geometry support in additional extensions is coming soon.</p>

<p>This change also enables deeper integration with DuckDB’s storage engine and query optimizer, unlocking new compression techniques, query optimizations, and CRS awareness capabilities that were not possible when <code class="language-plaintext highlighter-rouge">GEOMETRY</code> only existed as an extension type. This is all documented in the new <a href="/docs/current/sql/data_types/geometry.html">geometry page</a> in the documentation, but we will highlight some below.</p>

<h4 id="improved-storage-wkb-and-shredding">Improved Storage: WKB and Shredding</h4>

<p>Geometry values are now stored using the industry-standard little-endian <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary">Well-Known Binary (WKB)</a> encoding, replacing the custom format used by the <code class="language-plaintext highlighter-rouge">spatial</code> extension. However, we are still experimenting with the in-memory representation we want to use in the execution engine, so you should still use the conversion functions (e.g., <code class="language-plaintext highlighter-rouge">ST_AsWKT</code>, <code class="language-plaintext highlighter-rouge">ST_AsWKB</code>, <code class="language-plaintext highlighter-rouge">ST_GeomFromText</code>, <code class="language-plaintext highlighter-rouge">ST_GeomFromWKB</code>) when moving data in and out of DuckDB.</p>
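<p>For example, a round trip through the conversion functions listed above might look like this (a minimal sketch; the constructor and export functions are provided by the <code class="language-plaintext highlighter-rouge">spatial</code> extension):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- parse a WKT literal into a GEOMETRY value
SELECT ST_GeomFromText('POINT (1 2)') AS geom;

-- export the same value as WKT and as WKB
SELECT ST_AsWKT(geom), ST_AsWKB(geom)
FROM (SELECT ST_GeomFromText('POINT (1 2)') AS geom);
</code></pre></div></div>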

<p>We’ve also implemented a new storage technique specialized for <code class="language-plaintext highlighter-rouge">GEOMETRY</code>. When a geometry column contains values that all share the same type and vertex dimensions, DuckDB can additionally apply "shredding": rather than storing opaque blobs, the column is decomposed into primitive <code class="language-plaintext highlighter-rouge">STRUCT</code>, <code class="language-plaintext highlighter-rouge">LIST</code>, and <code class="language-plaintext highlighter-rouge">DOUBLE</code> segments that compress far more efficiently. This can reduce on-disk size by roughly 3x for uniform geometry columns such as point clouds. Shredding is applied automatically for uniform row groups of a certain size, but can be configured via the <code class="language-plaintext highlighter-rouge">geometry_minimum_shredding_size</code> configuration option.</p>
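<p>As a sketch of how this looks in practice (the option name comes from this post; the threshold value and table are purely illustrative):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- illustrative: lower the row group size threshold at which shredding kicks in
SET geometry_minimum_shredding_size = 1000;

-- a uniform column of 2D points is an ideal candidate for shredding
CREATE TABLE points AS
    SELECT ST_GeomFromText('POINT (' || i || ' ' || i || ')') AS geom
    FROM range(100000) t(i);
</code></pre></div></div>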

<h4 id="geometry-statistics-and-query-optimization">Geometry Statistics and Query Optimization</h4>

<p>Geometry columns now track per-row-group statistics, including the bounding box and the set of geometry types and vertex dimensions present. The query optimizer can use these to skip row groups that cannot match a query's spatial predicates, similar to min/max pruning for numeric columns. The <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> (bounding box intersection) operator is the first to benefit; broader support across <code class="language-plaintext highlighter-rouge">spatial</code> functions is in progress.</p>
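<p>A query of the following shape can now prune entire row groups whose bounding boxes cannot intersect the search window (a sketch; the <code class="language-plaintext highlighter-rouge">buildings</code> table and its <code class="language-plaintext highlighter-rouge">geom</code> column are hypothetical):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- row groups whose bounding box statistics fall outside the window are skipped
SELECT count(*)
FROM buildings
WHERE geom &amp;&amp; ST_GeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))');
</code></pre></div></div>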

<h4 id="coordinate-reference-system-support">Coordinate Reference System Support</h4>

<p>The <code class="language-plaintext highlighter-rouge">GEOMETRY</code> type now accepts an optional CRS parameter (e.g., <code class="language-plaintext highlighter-rouge">GEOMETRY('OGC:CRS84')</code>), making CRS part of the type system rather than implicit metadata. Spatial functions enforce CRS consistency across their inputs, catching a common class of silent errors that arises when mixing geometries from different coordinate systems. Only a couple of CRSs are built in by default, but loading the <code class="language-plaintext highlighter-rouge">spatial</code> extension registers over 7,000 CRSs from the EPSG dataset. While CRS support is still experimental, we are planning to develop it further to support, e.g., custom CRS definitions.</p>
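<p>For instance, a CRS can be declared as part of a column type (a sketch; the table is hypothetical, and <code class="language-plaintext highlighter-rouge">EPSG:3857</code> is just an example of a second CRS):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE cities (
    name VARCHAR,
    location GEOMETRY('OGC:CRS84')
);

-- spatial functions enforce CRS consistency across their inputs:
-- combining a GEOMETRY('OGC:CRS84') value with a GEOMETRY('EPSG:3857')
-- value raises an error instead of silently producing garbage
</code></pre></div></div>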

<h3 id="optimizations">Optimizations</h3>

<h4 id="non-blocking-checkpointing">Non-Blocking Checkpointing</h4>

<p>During checkpointing, it's now possible to run concurrent reads (<a href="https://github.com/duckdb/duckdb/pull/19867">#19867</a>), writes (<a href="https://github.com/duckdb/duckdb/pull/20052">#20052</a>), insertions with indexes (<a href="https://github.com/duckdb/duckdb/pull/20160">#20160</a>) and deletes (<a href="https://github.com/duckdb/duckdb/pull/20286">#20286</a>). The rework of checkpointing benefits concurrent RW workloads and increases the TPC-H throughput score on SF100 from 246,115.60 to 287,122.97, a <strong>17% improvement</strong>.</p>
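<p>The quoted improvement follows directly from the two throughput scores; you can check it in DuckDB itself:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT round(100 * (287122.97 / 246115.60 - 1), 1) AS improvement_pct;
-- ≈ 16.7, i.e., the 17% quoted above
</code></pre></div></div>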

<h4 id="aggregates">Aggregates</h4>

<p>Aggregate functions received several optimizations. For example, the <code class="language-plaintext highlighter-rouge">last</code> aggregate function was optimized by community member <a href="https://github.com/xe-nvdk"><code class="language-plaintext highlighter-rouge">xe-nvdk</code></a> to iterate from the end of each vector batch instead of the beginning. In synthetic benchmarks, this results in a <a href="https://github.com/duckdb/duckdb/pull/20567">40% speedup</a>.</p>

<!-- markdownlint-disable MD001 -->

<h2 id="distribution">Distribution</h2>

<h4 id="python-pip">Python Pip</h4>

<p>You can install the DuckDB CLI on any platform where pip is available:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">pip </span>install duckdb-cli
</code></pre></div></div>

<p>You can then launch DuckDB in your virtual environment using:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">duckdb</span>
</code></pre></div></div>

<p>Both DuckDB v1.4 and v1.5 are supported. We are working on shipping extensions as extras using the <code class="language-plaintext highlighter-rouge">duckdb[extension_name]</code> syntax – stay tuned!</p>

<h4 id="windows-install-script-beta">Windows Install Script (Beta)</h4>

<p>On Windows, you can now use an install script:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">powershell</span> <span class="nt">-NoExit</span> iex <span class="o">(</span>iwr <span class="s2">"https://install.duckdb.org/install.ps1"</span><span class="o">)</span>.Content
</code></pre></div></div>

<p>Please note that this is currently in the beta stage. If you have any feedback, please <a href="https://github.com/duckdb/duckdb/issues">let us know</a>.</p>

<h4 id="cli-for-linux-with-musl-libc">CLI for Linux with musl libc</h4>

<p>We are distributing CLI clients that work with <a href="/docs/lts/dev/building/linux.html">musl libc</a> (e.g., for Alpine Linux, commonly used in Docker images). The archives are available <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">on GitHub</a>.</p>

<p>Note that the musl libc CLI client requires <code class="language-plaintext highlighter-rouge">libstdc++</code>. To install this package, run:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">apk </span>add libstdc++
</code></pre></div></div>

<h4 id="extension-sizes">Extension Sizes</h4>

<p>We reworked our build system to make the extension binaries smaller! The DuckLake extension's size was reduced by ~30%, from 17 MB to 12 MB. For smaller extensions such as Excel, the reduction is more than 60%, from 9 MB to 3 MB.</p>

<!-- markdownlint-enable MD001 -->

<h2 id="summary">Summary</h2>

<p>These were a few highlights – but there are many more features and improvements in this release.
There have been over 6500 commits by close to 100 contributors since v1.4. The full <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">release notes can be found on GitHub</a>. We would like to thank our community for providing detailed issue reports and feedback. And again, our special thanks go to external contributors!</p>

<p>PS: If you visited this blog post through a direct link – we also rolled out a new <a href="/">landing page</a>!</p>

<!-- markdownlint-disable MD040 -->

<h2 id="appendix-example-dataset">Appendix: Example Dataset</h2>

<details>
  <summary>
See the code that creates the example databases.
</summary>
  <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'numbers1.db'</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'numbers2.db'</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'animals.db'</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">numbers1.tbl</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">numbers2.tbl</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">animals.ducks</span> <span class="k">AS</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Labrador Duck'</span><span class="p">,</span> <span class="mi">1878</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Mallard'</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Crested Shelduck'</span><span class="p">,</span> <span class="mi">1964</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Wood Duck'</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Pink-headed Duck'</span><span class="p">,</span> <span class="mi">1949</span><span class="p">)</span>
<span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">extinct_year</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">animals.swans</span> <span class="k">AS</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Aurora'</span><span class="p">,</span> <span class="s1">'Mute Swan'</span><span class="p">,</span> <span class="s1">'White'</span><span class="p">,</span> <span class="s1">'European lakes and rivers'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Midnight'</span><span class="p">,</span> <span class="s1">'Black Swan'</span><span class="p">,</span> <span class="s1">'Black'</span><span class="p">,</span> <span class="s1">'Australian wetlands'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Tundra'</span><span class="p">,</span> <span class="s1">'Tundra Swan'</span><span class="p">,</span> <span class="s1">'White'</span><span class="p">,</span> <span class="s1">'Arctic and subarctic regions'</span><span class="p">)</span>
<span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">species</span><span class="p">,</span> <span class="n">color</span><span class="p">,</span> <span class="n">habitat</span><span class="p">);</span>

<span class="k">DETACH</span> <span class="n">numbers1</span><span class="p">;</span>
<span class="k">DETACH</span> <span class="n">numbers2</span><span class="p">;</span>
<span class="k">DETACH</span> <span class="n">animals</span><span class="p">;</span>
</code></pre></div>  </div>
</details>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version 1.5.0, codenamed “Variegata”. This release comes with a friendly CLI (a new, more ergonomic command line client), support for the `VARIANT` type, a built-in `GEOMETRY` type, along with many other features and optimizations. The v1.4.0 LTS line (“Andium”) will keep receiving updates until its end-of-life in September 2026.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-0.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.4.4 LTS</title><link href="https://duckdb.org/2026/01/26/announcing-duckdb-144.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.4.4 LTS" /><published>2026-01-26T00:00:00+00:00</published><updated>2026-01-26T00:00:00+00:00</updated><id>https://duckdb.org/2026/01/26/announcing-duckdb-144</id><content type="html" xml:base="https://duckdb.org/2026/01/26/announcing-duckdb-144.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.4.4, the fourth patch release in <a href="/2025/09/16/announcing-duckdb-140.html">DuckDB's 1.4 LTS line</a>.
The release ships bugfixes, performance improvements and security patches. You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.4">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="fixes">Fixes</h2>

<p>This version ships a number of performance improvements and bugfixes.</p>


<h3 id="correctness">Correctness</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/20008"><code class="language-plaintext highlighter-rouge">#20008</code> Unexpected result when using utility function ALIAS</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20410"><code class="language-plaintext highlighter-rouge">#20410</code> ANTI JOIN produces wrong results with materialized CTEs</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20156"><code class="language-plaintext highlighter-rouge">#20156</code> Streaming window unions produce incorrect results</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20413"><code class="language-plaintext highlighter-rouge">#20413</code> ASOF joins with <code class="language-plaintext highlighter-rouge">predicate</code> fail with different errors for FULL, RIGHT, SEMI, and ANTI join types</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20090"><code class="language-plaintext highlighter-rouge">#20090</code> mode() produces corrupted UTF-8 strings in parallel execution</a></li>
</ul>

<h3 id="crashes-and-internal-errors">Crashes and Internal Errors</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb-python/issues/127"><code class="language-plaintext highlighter-rouge">#20468</code> Segfault in Hive partitioning with NULL values</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20086"><code class="language-plaintext highlighter-rouge">#20086</code> Incorrect results when using positional joins and indexes</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20415"><code class="language-plaintext highlighter-rouge">#20415</code> C API data creation causes segfault</a></li>
</ul>

<h3 id="performance">Performance</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/pull/20252"><code class="language-plaintext highlighter-rouge">#20252</code> Optimize prepared statement parameter lookups</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20284"><code class="language-plaintext highlighter-rouge">#20284</code> dbgen: use TaskExecutor framework to respect the <code class="language-plaintext highlighter-rouge">threads</code> setting</a></li>
</ul>

<h3 id="miscellaneous">Miscellaneous</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/20233"><code class="language-plaintext highlighter-rouge">#20233</code> Function chaining not allowed in QUALIFY</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20339"><code class="language-plaintext highlighter-rouge">#20339</code> Use UTF-16 console output in Windows shell</a></li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>This post was a short summary of the changes in v1.4.4. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.4">full release notes on GitHub</a>.
We would like to thank our contributors for providing detailed issue reports and patches.
In the coming month, we'll release DuckDB v1.5.0.
We'll also keep v1.4 LTS updated until mid-September. We'll announce the release date of v1.4.5 in the <a href="/release_calendar.html">release calendar</a> in the coming months.</p>

<blockquote>
  <p>Earlier today, we pushed an incorrect tag that was visible for a few minutes.
No binaries or extensions were available under this tag and we replaced it as soon as we noticed the issue.
Our apologies for the erroneous release.</p>
</blockquote>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[Today we are releasing DuckDB 1.4.4 with bugfixes and performance improvements.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-4-lts.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-4-lts.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing Vortex Support in DuckDB</title><link href="https://duckdb.org/2026/01/23/duckdb-vortex-extension.html" rel="alternate" type="text/html" title="Announcing Vortex Support in DuckDB" /><published>2026-01-23T00:00:00+00:00</published><updated>2026-01-23T00:00:00+00:00</updated><id>https://duckdb.org/2026/01/23/duckdb-vortex-extension</id><content type="html" xml:base="https://duckdb.org/2026/01/23/duckdb-vortex-extension.html"><![CDATA[<p>I think it is worth starting this intro by talking a little bit about the established format for columnar data. Parquet has done some amazing things for analytics. If you go back to the times where CSV was the better alternative, then you know how important Parquet is. However, even if the  specification has evolved over time, Parquet has some design constraints. A particular limitation is that it is block-compressed and engines need to decompress pages in order to do further operations like filtering, decoding values, etc. For a while, <a href="https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html?#fileformats">researchers and private companies</a> have been working on alternatives to Parquet that could improve on some of Parquet’s shortcomings. Vortex, from the SpiralDB team, is one of them.</p>

<h2 id="what-is-vortex">What is Vortex?</h2>

<p><a href="https://vortex.dev/">Vortex</a> is an extensible, open source format for columnar data. It was created to handle heterogeneous compute patterns and different data modalities. But what does this mean?</p>

<blockquote>
  <p>The project was donated to the Linux Foundation by the <a href="https://spiraldb.com/post/vortex-a-linux-foundation-project">SpiralDB</a> team in August 2025.</p>
</blockquote>

<p>Vortex provides different layouts and encodings for different data types. Some of the most notable are <a href="/library/alp/">ALP</a> for floating-point data and <a href="/2022/10/28/lightweight-compression.html">FSST</a> for strings. This lightweight compression strategy keeps data sizes down while enabling one of Vortex’s most important features: compute functions. By knowing the encoded layout of the data, Vortex is able to run arbitrary expressions on compressed data. This allows a Vortex reader to execute, for example, filter expressions within storage segments without decompressing data.</p>

<p>We mentioned heterogeneous compute to emphasize that Vortex was designed with the idea of having optimized layouts for different data types, including vectors, large text, or even image and audio data, but also to maximize CPU or GPU saturation. The idea is that decompression is deferred all the way to the GPU or CPU, enabling what Vortex calls “late materialization”. The <a href="/library/fastlanes/">FastLanes</a> encoding, a project originating at CWI (like DuckDB), is one of the main drivers behind this feature.</p>

<p>Vortex also supports dynamically loaded libraries (similar to DuckDB extensions) that provide new encodings for specific types as well as specific compute functions, e.g., for geospatial data. Another very interesting feature is the ability to embed WebAssembly compute kernels into the file itself, which the reader can then apply when processing that file.</p>

<p>Besides DuckDB, other engines such as DataFusion, Spark and Arrow already offer integration with Vortex.</p>

<blockquote>
  <p>For more information, check out the <a href="https://spiraldb.com/post/vortex-a-linux-foundation-project">Vortex documentation</a>.</p>
</blockquote>

<h2 id="the-duckdb-vortex-extension">The DuckDB Vortex Extension</h2>

<p>DuckDB is, as the name says, a database, but it is also widely used as an engine to query many different data sources. Through core or community extensions, DuckDB can integrate with:</p>

<ul>
  <li>Databases like Snowflake, BigQuery or PostgreSQL.</li>
  <li>Lakehouse formats like Delta, Iceberg or DuckLake.</li>
  <li>File formats, most notably JSON, CSV, Parquet and most recently Vortex.</li>
</ul>

<blockquote>
  <p>The community has gotten very creative, though, so these days you can even read YAML and Markdown with DuckDB using <a href="/community_extensions/">community extensions</a>.</p>
</blockquote>

<p>All this is possible due to the DuckDB <a href="/docs/lts/extensions/overview.html">extension system</a>, which makes it relatively easy to implement logic to interact with different file formats or external systems.</p>

<p>The SpiralDB team built a <a href="https://github.com/vortex-data/duckdb-vortex">DuckDB extension</a>. Together with the <a href="https://duckdblabs.com/">DuckDB Labs</a> team, we have made the extension available as a <a href="/docs/lts/core_extensions/overview.html">core DuckDB extension</a>, so that the community can enjoy Vortex as a first-class citizen in DuckDB.</p>

<h3 id="example-usage">Example Usage</h3>

<p>Installing and using the Vortex extension is very simple:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> vortex</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> vortex</span><span class="p">;</span>
</code></pre></div></div>

<p>Then, you can easily use it to read and write, similar to other extensions such as Parquet.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">read_vortex</span><span class="p">(</span><span class="s1">'my.vortex'</span><span class="p">);</span>

<span class="k">COPY</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="k">TO</span> <span class="s1">'my.vortex'</span> <span class="p">(</span><span class="k">FORMAT</span> <span class="k">vortex</span><span class="p">);</span>
</code></pre></div></div>
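<p>Predicates can be pushed into the scan as usual, which is where Vortex's ability to evaluate expressions on compressed data pays off (a sketch; the filename and columns follow the TPC-H schema and are illustrative):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT l_orderkey, l_extendedprice
FROM read_vortex('lineitem.vortex')
WHERE l_shipdate &lt;= DATE '1998-09-02';
</code></pre></div></div>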

<h3 id="why-vortex-and-duckdb">Why Vortex and DuckDB?</h3>

<p>Vortex claims to do well primarily at three use cases:</p>

<ul>
  <li>Traditional SQL analytics: Through late decompression and compute expressions on compressed data, Vortex can filter down data within the storage segment, reducing IO and memory consumption.</li>
  <li>Machine learning pre-processing pipelines: By supporting a wide variety of encodings for different data types, Vortex claims to be effective at reading and writing data, whether it is audio, text, images or vectors.</li>
  <li>AI model training: Encodings such as FastLanes allow for a very efficient copy of data to the GPU. Vortex is aiming at being able to copy data directly from S3 object storage to the GPU.</li>
</ul>

<p>The promise of more efficient IO and memory use through late decompression is a good reason to try DuckDB and Vortex for SQL analytics. On another note, if you are looking at running analytics on unified datasets that are used for multiple use cases, including pre-processing pipelines and AI training, then Vortex may be a good candidate since it is designed to fit all of these use cases well.</p>

<h3 id="performance-experiment">Performance Experiment</h3>

<p>For those who are number-hungry, we ran a TPC-H benchmark at scale factor 100 with DuckDB to understand how Vortex performs as a storage format compared to Parquet. We tried to make the benchmark as fair as possible. These are the parameters:</p>

<ul>
  <li>Run on Mac M1 with 10 cores &amp; 32 GB of memory.</li>
  <li>The benchmark runs each query 5 times and the average is used for the final report.</li>
  <li>The DuckDB connection is closed after each query to make runs “colder” and to prevent DuckDB's caching (particularly with Parquet) from influencing the results. OS page caching does influence subsequent runs, but we decided to acknowledge this factor and still keep the first run.</li>
  <li>Each TPC-H table is a single file, which means that lineitem files for Parquet and Vortex are quite large (both around 20 GB). This allows us to ignore the effect of globbing and having many small files.</li>
  <li>Data files used for the benchmark are generated with <a href="https://github.com/clflushopt/tpchgen-rs">tpchgen-rs</a> and are copied out using DuckDB’s Parquet and Vortex extensions.</li>
  <li>We compared Vortex against Parquet v1 and v2. The v2 specification allows for considerably faster reading than the v1 specification but many writers do not support this, so we thought it was worth including both.</li>
</ul>

<p><strong>The results are very good.</strong> With Vortex, the TPC-H benchmark runs 18% faster than with Parquet v2 and 35% faster than with Parquet v1 (comparing geometric means, which is the recommended way to aggregate benchmark timings).</p>

<p>Another interesting result is the standard deviation across runs. There was a considerable difference between the first (and coldest) run of each query and subsequent runs in Parquet, while Vortex performed very well across all runs with a much smaller standard deviation.</p>

<p><img src="/images/blog/duckdb-vortex/tpch_summary.png" alt="summary" /></p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Format</th>
      <th style="text-align: right">Geometric Mean (s)</th>
      <th style="text-align: right">Arithmetic Mean (s)</th>
      <th style="text-align: right">Avg Std Dev (s)</th>
      <th style="text-align: right">Total Time (s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">parquet_v1</td>
      <td style="text-align: right">2.324712</td>
      <td style="text-align: right">2.875722</td>
      <td style="text-align: right">0.145914</td>
      <td style="text-align: right">63.265881</td>
    </tr>
    <tr>
      <td style="text-align: left">parquet_v2</td>
      <td style="text-align: right">1.839171</td>
      <td style="text-align: right">2.288013</td>
      <td style="text-align: right">0.182962</td>
      <td style="text-align: right">50.336281</td>
    </tr>
    <tr>
      <td style="text-align: left">vortex</td>
      <td style="text-align: right">1.507675</td>
      <td style="text-align: right">1.991289</td>
      <td style="text-align: right">0.078893</td>
      <td style="text-align: right">43.808349</td>
    </tr>
  </tbody>
</table>
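<p>The quoted speedups can be recomputed from the geometric means in the table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT
    round(100 * (1 - 1.507675 / 2.324712), 1) AS vs_parquet_v1_pct,  -- ≈ 35.1
    round(100 * (1 - 1.507675 / 1.839171), 1) AS vs_parquet_v2_pct;  -- ≈ 18.0
</code></pre></div></div>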

<blockquote>
  <p>The times did vary across different runs of the same benchmark, and subsequent runs have yielded similar results but with slight variations. The differences between Parquet v2 and Vortex have always been around 12-18% in geometric means and around 8-14% in total times. Benchmarking is very hard!</p>
</blockquote>

<!-- markdownlint-disable MD040 MD046 -->

<details>
  <summary>
Click here to see a more detailed breakdown of the benchmark results.
</summary>

  <p>This figure shows the results per query, including the standard deviation error bar.<br />
<img src="/images/blog/duckdb-vortex/tpch_rowgram.png" alt="mean_per_query" /><br />
The following is a summary of the dataset sizes in GB. Note that both Parquet v1 and v2 use the default compression of the DuckDB Parquet writer, which is Snappy. In this case, Vortex does not use any general-purpose compression but still keeps the data sizes competitive.</p>

  <table>
    <thead>
      <tr>
        <th style="text-align: left">Table</th>
        <th style="text-align: left">parquet_v1</th>
        <th style="text-align: left">parquet_v2</th>
        <th style="text-align: left">vortex</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left">customer</td>
        <td style="text-align: left">1.15</td>
        <td style="text-align: left">0.99</td>
        <td style="text-align: left">1.06</td>
      </tr>
      <tr>
        <td style="text-align: left">lineitem</td>
        <td style="text-align: left">21.15</td>
        <td style="text-align: left">16.02</td>
        <td style="text-align: left">18.14</td>
      </tr>
      <tr>
        <td style="text-align: left">nation</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
      </tr>
      <tr>
        <td style="text-align: left">orders</td>
        <td style="text-align: left">6.02</td>
        <td style="text-align: left">4.54</td>
        <td style="text-align: left">5.03</td>
      </tr>
      <tr>
        <td style="text-align: left">part</td>
        <td style="text-align: left">0.59</td>
        <td style="text-align: left">0.47</td>
        <td style="text-align: left">0.54</td>
      </tr>
      <tr>
        <td style="text-align: left">partsupp</td>
        <td style="text-align: left">4.07</td>
        <td style="text-align: left">3.33</td>
        <td style="text-align: left">3.72</td>
      </tr>
      <tr>
        <td style="text-align: left">region</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
      </tr>
      <tr>
        <td style="text-align: left">supplier</td>
        <td style="text-align: left">0.07</td>
        <td style="text-align: left">0.06</td>
        <td style="text-align: left">0.07</td>
      </tr>
      <tr>
        <td style="text-align: left"><strong>total</strong></td>
        <td style="text-align: left">33.06</td>
        <td style="text-align: left">25.40</td>
        <td style="text-align: left">28.57</td>
      </tr>
    </tbody>
  </table>

</details>

<!-- markdownlint-enable MD040 MD046 -->

<h2 id="conclusion">Conclusion</h2>

<p>Vortex is a very interesting alternative to established columnar formats like Parquet. Its focus on lightweight compression encodings, late decompression and being able to run compute expressions on compressed data makes it very interesting for a wide range of use cases. With regard to DuckDB, we see that Vortex is already very performant for analytical queries, where it is on par or better than Parquet v2 on the TPC-H benchmark queries.</p>

<blockquote>
  <p>Vortex has been <a href="https://docs.vortex.dev/specs/file-format">backwards compatible</a> since version 0.36.0, which was released more than 6 months ago. Vortex is now at version 0.56.0.</p>
</blockquote>]]></content><author><name>Guillermo Sanchez, SpiralDB Team</name></author><category term="benchmark" /><summary type="html"><![CDATA[Vortex is a new columnar file format with a very promising design. SpiralDB and DuckDB Labs have partnered to give you a very fast experience while reading and writing Vortex files!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/vortex.svg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/vortex.svg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>