Delta Live Tables (DLT) – Slowly Changing Dimensions SCD Type 2
Delta Live Tables (DLT) is a declarative framework in Databricks for building reliable, automated ETL pipelines. It can automatically manage SCD Type 2 logic, tracking complete historical changes for dimension tables.
What SCD Type 2 Means
- SCD Type 2 keeps full history of record changes.
- When a tracked attribute changes, a new row is inserted.
- Old rows are marked as no longer current.
- Useful for customer history, product changes, employee roles, etc.
How DLT Implements SCD Type 2
DLT simplifies SCD2 by using the dlt.apply_changes function, which automatically:
- Detects changes by business key
- Expires the previous “current” row
- Inserts a new version with updated values
- Maintains current/historical indicators and effective dates
Pipeline Structure
- A source table or stream (CDC feed or Silver table)
- A target dimension table in DLT
- A configuration using dlt.apply_changes with SCD Type 2 options
Example Source Table (Customer Changes)
@dlt.table
def customer_changes():
return spark.readStream.table("silver.customer_changes")
- customer_id – business key
- name, address – tracked attributes
- change_ts – timestamp used to order events
Example Target Table (Dimension Seed)
@dlt.table
def dim_customer():
return spark.read.table("dim.dim_customer_seed")
DLT SCD Type 2 Logic
dlt.apply_changes(
target = "dim_customer",
source = "customer_changes",
keys = ["customer_id"],
sequence_by = col("change_ts"),
stored_as_scd_type = "2",
track_history_columns = ["name", "address"]
)
- keys defines the business key
- sequence_by determines the event order
- stored_as_scd_type = "2" activates SCD Type 2 mode
- track_history_columns defines which changes create new versions
Why Use DLT for SCD Type 2
- Simplifies complex MERGE logic
- Automates orchestration (Bronze → Silver → Gold)
- Supports streaming and batch sources
- Provides built-in lineage, monitoring, and data quality
- Handles automatic maintenance of Delta tables
Summary
- DLT reads change data from a source table or stream
- A dimension table is defined as a DLT-managed table
- dlt.apply_changes uses SCD2 logic to insert new versions and expire old ones
- The result is a fully historical, audit-friendly dimension table