Data lineage in Unity Catalog automatically tracks the flow and transformations of data across your organization. It captures relationships between tables, columns, and processes, enabling impact analysis, compliance audits, and debugging.
Unity Catalog automatically captures lineage from:
No additional configuration or instrumentation is required — lineage is captured transparently.
In the Databricks workspace:
For programmatic access to lineage data, query the system tables in your metastore.
-- View upstream dependencies (which tables feed into my_table)
SELECT
upstream_table_id,
upstream_schema_id,
upstream_catalog_id,
downstream_table_id,
creation_time
FROM system.access.table_lineage
WHERE downstream_table_full_name = 'finance_prod.transactions.orders'
ORDER BY creation_time DESC;
-- View column-level dependencies
SELECT
upstream_catalog,
upstream_schema,
upstream_table,
upstream_column,
downstream_catalog,
downstream_schema,
downstream_table,
downstream_column,
created_at
FROM system.access.column_lineage
WHERE downstream_table_full_name = 'finance_prod.transactions.order_summary'
ORDER BY created_at DESC;
-- Identify all tables that depend on a specific source table
SELECT DISTINCT downstream_table_full_name
FROM system.access.table_lineage
WHERE upstream_table_full_name = 'finance_prod.transactions.orders';
When modifying a table schema or transforming data, query lineage to identify all downstream tables that will be affected. Prioritize testing and communication accordingly.
-- "If I change the orders table, what else breaks?"
WITH RECURSIVE impacted AS (
SELECT downstream_table_full_name
FROM system.access.table_lineage
WHERE upstream_table_full_name = 'finance_prod.transactions.orders'
)
SELECT * FROM impacted;
For regulatory requirements (GDPR, CCPA, SOX), lineage proves how sensitive data flows through the organization. Use lineage to demonstrate that only authorized systems access protected data.
When downstream reports show unexpected values, trace lineage backward to identify where corrupted or incorrect data originated. Narrow down which transformation step introduced the issue.
New team members can explore related tables by following lineage chains, understanding data transformations and dependencies without manual documentation.