Trino & Presto
Trino (formerly PrestoSQL) and Presto (PrestoDB) are distributed SQL query engines designed to run interactive analytical queries over heterogeneous data sources. Both descend from the engine Facebook began building in 2012 to query its Hadoop warehouse with low latency. The original creators left Facebook in 2018 and continued the project as the PrestoSQL fork from early 2019; that fork was renamed Trino in December 2020. Today Trino is the more actively developed branch and the de facto open query engine for the lakehouse.
Key Features:
- ANSI SQL. Broad standard-SQL support, including window functions, CTEs, subqueries, and complex types (arrays, maps, rows).
- Connector Architecture. Connectors for Iceberg, Hudi, Delta Lake, Hive, MySQL, PostgreSQL, Cassandra, MongoDB, Kafka, Elasticsearch, and 30+ more, all queryable and joinable across sources in a single SQL statement.
- Pushdown Optimization. Predicates, projections, and aggregations push into the underlying source when supported (e.g. Postgres for filters, Iceberg for partition pruning).
- In-Memory Execution. Trino is a query engine, not a database: it has no storage of its own. A coordinator plans each query, and workers execute pipelined operators in memory across the cluster.
- Cost-Based Optimizer. Statistics-driven join reordering, plus runtime dynamic filtering that speeds up star-schema queries on the lakehouse.
- Federation. Join data across systems — e.g. an Iceberg fact table to a Postgres dimension — without ETL.
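To make the pushdown idea above concrete, here is a toy model in Python. It is not Trino code and uses no Trino API: `ToySource`, `query`, and the `rows_shipped` counter are hypothetical names invented for illustration. The sketch only shows why filtering at the source reduces the number of rows the engine has to pull across the network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class ToySource:
    """Hypothetical stand-in for a connector-backed table (not a Trino API)."""
    rows: list
    supports_pushdown: bool
    rows_shipped: int = 0  # rows handed from the source to the engine

    def scan(self, predicate: Optional[Predicate] = None) -> Iterable[Row]:
        for row in self.rows:
            # When a predicate is supplied, the source filters before shipping.
            if predicate is not None and not predicate(row):
                continue
            self.rows_shipped += 1
            yield row


def query(source: ToySource, predicate: Predicate) -> list:
    """Apply the filter at the source if it can, otherwise in the engine."""
    if source.supports_pushdown:
        return list(source.scan(predicate))
    # Fallback: pull every row, then filter inside the engine.
    return [row for row in source.scan() if predicate(row)]


orders = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]
is_eu = lambda row: row["region"] == "EU"

pushed = ToySource(orders, supports_pushdown=True)
unpushed = ToySource(orders, supports_pushdown=False)

# Both strategies return the same rows; only the transfer volume differs.
assert query(pushed, is_eu) == query(unpushed, is_eu)
print(pushed.rows_shipped, unpushed.rows_shipped)  # prints: 250 1000
```

In a real deployment the decision is made per connector and per expression, and running `EXPLAIN` on a query is the usual way to check which parts of it were actually pushed down.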
Trino vs. Presto:
- Trino: Faster development cadence and a broader connector ecosystem, backed by Starburst and the open community. The default choice today.
- Presto (PrestoDB): Governed by the Presto Foundation under the Linux Foundation, with Meta as a major contributor; still in wide use at Meta, Uber, and a long tail of older Presto deployments.
- The SQL surface and connector model remain very similar; the differences are mainly operational or in recently added features.
Use Cases:
- Interactive BI on a lakehouse — Tableau, Superset, Metabase against Iceberg / Delta / Hudi tables.
- Ad-hoc analytics across Hive Metastore tables and a cloud RDBMS without a data-movement layer.
- Federated queries that join warehouse data with operational stores in a single SQL statement.
- The query layer in disaggregated lakehouse stacks (Iceberg + Polaris + Trino) — an open alternative to Snowflake / BigQuery / Databricks SQL.
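As a rough illustration of the federated use case, the sketch below joins an Iceberg fact table to a PostgreSQL dimension in one statement and submits it through the `trino` Python client's DB-API interface. The catalog names (`iceberg`, `postgresql`), the schemas, tables, and columns, and the host, port, and user are all assumptions made for the example; catalog names in particular are whatever the cluster's catalog configuration defines.

```python
# Tables are addressed as catalog.schema.table, so one statement can span systems.
# All catalog, schema, table, and column names here are hypothetical.
FEDERATED_SQL = """
SELECT c.segment, sum(o.total_price) AS revenue
FROM iceberg.sales.orders AS o
JOIN postgresql.public.customers AS c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.segment
ORDER BY revenue DESC
"""


def run_federated_query(host="localhost", port=8080, user="analyst"):
    """Submit the query to a Trino coordinator (connection details are assumed)."""
    # Imported lazily so this module loads even without the client installed
    # (the client is published on PyPI as the `trino` package).
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(FEDERATED_SQL)
    return cur.fetchall()
```

Calling `run_federated_query()` against a cluster with both catalogs configured would return one row per customer segment. Whether the date filter is pushed into the Iceberg scan depends on the connector and the table layout.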