DuckLake
DuckLake is a 2025 lakehouse format from the DuckDB team. Its central design choice is unusual: instead of storing table metadata as JSON / Avro files alongside the data (the Iceberg / Delta / Hudi pattern), DuckLake puts the catalog metadata in a regular SQL database (Postgres, MySQL, SQLite, or DuckDB itself). Data files remain Parquet on object storage. The result is a single small binary plus a Postgres instance (no HMS, no REST catalog server, no JVM) with full ACID semantics on the lakehouse.
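As a sketch of what this looks like in practice (the paths, catalog name, and connection string below are illustrative, and the extension API may still be evolving):

```sql
-- Load the DuckLake extension inside DuckDB
INSTALL ducklake;
LOAD ducklake;

-- Attach a catalog: metadata in a local DuckDB file, data as Parquet
-- under data_files/. A Postgres-backed catalog would instead use a
-- connection string along the lines of 'ducklake:postgres:dbname=...'.
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data_files/');
USE lake;

-- From here it is ordinary SQL; each table becomes Parquet files on
-- storage plus rows in the catalog database
CREATE TABLE events (id INTEGER, payload VARCHAR);
INSERT INTO events VALUES (1, 'hello');
```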
Key Features:
- SQL-Database Catalog. Metadata lives in real tables, queryable with SQL. Inspecting the table layout means `SELECT * FROM ducklake_metadata`, not parsing JSON files.
- ACID Transactions. Multi-table commits run as a single SQL transaction in the metadata DB; the database supplies the transactional machinery that Iceberg has to rebuild on top of object storage.
- Time Travel & Schema Evolution. Standard lakehouse table-format features.
- Single Binary. DuckDB + DuckLake extension. No JVM, no Spark cluster, no HMS process. Run on a laptop or a Lambda.
- Open Spec. Format and protocol are open; other engines can implement readers.
- Object-Storage-Native. Parquet on S3 / GCS / R2; the metadata DB can live anywhere DuckDB can reach it (e.g. via its postgres, mysql, or sqlite extensions).
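The catalog-in-SQL and transaction points above can be sketched concretely. Table and column names below follow the published DuckLake spec, and the example assumes a catalog attached as `lake` with hypothetical `orders` and `audit_log` tables; treat the details as illustrative since the format is young:

```sql
-- Inspect the catalog by querying the metadata database directly
SELECT snapshot_id, snapshot_time FROM ducklake_snapshot;
SELECT table_id, table_name FROM ducklake_table;

-- A multi-table commit is just one SQL transaction in the metadata DB
BEGIN TRANSACTION;
INSERT INTO lake.orders VALUES (42, 'widget');
INSERT INTO lake.audit_log VALUES (42, now());
COMMIT;

-- Time travel: read a table as of an earlier snapshot
SELECT * FROM lake.orders AT (VERSION => 3);
```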
Why It’s Notable:
The big three open table formats (Iceberg, Hudi, Delta) reinvented database internals on top of object storage — transaction logs, optimistic concurrency, snapshot isolation, all hand-built. DuckLake takes the opposite philosophical approach: object storage holds the data, a real database holds the metadata, and you compose two well-understood systems instead of building a new one. It is arguably the simplest lakehouse design currently shipping.
Trade-offs:
- Pros. Drastically simpler operations, instant ACID via SQL transactions, easier introspection, no metadata-only catalog server to run.
- Cons. Catalog DB becomes a coordination bottleneck at very high write concurrency; ecosystem is brand new (2025); fewer engines support it today.
Use Cases:
- Small-to-medium lakehouses where the operational overhead of Iceberg + Polaris is overkill.
- Embedded / single-node analytics on top of Parquet on S3.
- Greenfield projects starting in 2025+ that want the simplest possible lakehouse stack.
- Data-engineering teams already standardized on DuckDB for local development.
DuckLake is brand new — treat it as a credible architectural option to track, not yet as a default production choice.