The metastore is the cloud-backed central repository for all Unity Catalog metadata.
-- Metastores cannot be created with SQL. Create one in the account console
-- or via the Account API, supplying a name, a region, and (optionally) a
-- metastore-level storage root such as 's3://my-bucket/metastore-root'.
Alternatively, use the Databricks account console (accounts.cloud.databricks.com on AWS) under Catalog → Create metastore to create one visually.
A workspace can use only one metastore. All workspaces assigned to the same metastore share catalogs and permissions.
-- Via account console or API
-- Associate the workspace with the metastore
Once assigned, run a test query to verify:
SELECT current_metastore();
Catalogs are top-level containers. Schemas organize tables within catalogs.
-- Create a catalog (requires the CREATE CATALOG privilege on the metastore;
-- metastore admins have it by default)
CREATE CATALOG finance_prod;
-- Create a schema within the catalog
CREATE SCHEMA finance_prod.transactions;
-- Create a table in the schema (3-level path)
CREATE TABLE finance_prod.transactions.orders (
order_id INT,
customer_id INT,
order_date TIMESTAMP,
amount DECIMAL(10, 2)
);
Storage credentials and external locations are required for external tables and external volumes.
-- Storage credentials cannot be created with SQL; create them in Catalog
-- Explorer or via the API, supplying the cloud identity UC should assume,
-- e.g. an IAM role such as arn:aws:iam::123456789012:role/UnityCatalogRole (AWS).
-- Create an external location tied to a storage credential
CREATE EXTERNAL LOCATION s3_data_lake
URL 's3://my-data-bucket/external-data'
WITH (STORAGE CREDENTIAL s3_credentials);
Control who can access catalogs, schemas, and tables using the principle of least privilege.
-- Grant USE CATALOG to a group (enables browsing the catalog)
GRANT USE CATALOG ON CATALOG finance_prod TO `data-team@company.com`;
-- Grant USE SCHEMA to access a specific schema
GRANT USE SCHEMA ON SCHEMA finance_prod.transactions TO `data-team@company.com`;
-- Grant SELECT to read table data
GRANT SELECT ON TABLE finance_prod.transactions.orders TO `data-team@company.com`;
-- Grant MODIFY for write access
GRANT MODIFY ON TABLE finance_prod.transactions.orders TO `data-team@company.com`;
-- Grant ALL PRIVILEGES (full control, use sparingly)
GRANT ALL PRIVILEGES ON CATALOG finance_prod TO `admin-group@company.com`;
-- Show all grants on a catalog
SHOW GRANTS ON CATALOG finance_prod;
-- Show all grants on a specific schema
SHOW GRANTS ON SCHEMA finance_prod.transactions;
-- Show grants for a specific principal
SHOW GRANTS `data-team@company.com` ON SCHEMA finance_prod.transactions;
-- View who has access to a table
SHOW GRANTS ON TABLE finance_prod.transactions.orders;
Grant to groups (e.g., data-engineers, analysts) rather than to individual users for easier management and scalability. Use SHOW GRANTS and audit logs to review who has access to sensitive data.
Unity Catalog organizes data as catalog.schema.table instead of the legacy two-level schema.table (Hive metastore) or database.table. The catalog level is where you typically draw the environment or business-domain boundary (finance_prod, marketing_dev), and grants flow down the hierarchy. The benefit: one workspace can host many isolated data products, each with its own access policies, instead of needing one workspace per environment.
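A quick sketch of the three-level namespace in practice, reusing the catalog and schema created above:

```sql
-- Fully qualified three-level name
SELECT * FROM finance_prod.transactions.orders;

-- Or set session defaults so unqualified names resolve against them
USE CATALOG finance_prod;
USE SCHEMA transactions;
SELECT count(*) FROM orders; -- resolves to finance_prod.transactions.orders
```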
The metastore is an account-level object that holds catalog/schema/table metadata and grants — it lives above any individual workspace. A single metastore is attached to one or more workspaces in the same region, so multiple workspaces share the same data namespace and access policies. This is the architectural shift from the legacy Hive metastore, which was per-workspace; with UC, identity and data permissions are centralized at the account.
An external location is a Unity Catalog object that pairs a cloud storage URI (e.g., s3://bucket/path/) with a storage credential (an IAM role or service principal). It governs read/write access to that path — UC checks READ FILES/WRITE FILES grants before issuing temporary credentials. Use external locations for data that lives in your own buckets and you want to keep control of (regulatory data, data shared with non-Databricks systems); use managed tables when UC can own the storage layout entirely.
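Path-level access on the external location defined earlier might be granted like this (the etl-jobs group is a hypothetical example):

```sql
-- Allow reading files under the external location's path
GRANT READ FILES ON EXTERNAL LOCATION s3_data_lake TO `data-team@company.com`;

-- Allow writing files (hypothetical group name)
GRANT WRITE FILES ON EXTERNAL LOCATION s3_data_lake TO `etl-jobs@company.com`;
```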
Managed tables: UC owns the storage location (a metastore-level managed root) and the file lifecycle — DROP TABLE deletes the data. External tables: you specify a path under an external location, UC tracks the metadata but does not delete files on drop. Default to managed tables for everything UC originates; use external tables only when the data must be readable by non-Databricks consumers, or when migrating from an existing data lake without moving files.
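The difference shows up directly in the DDL; a sketch (the table names and the path under the external location are illustrative):

```sql
-- Managed: no LOCATION clause; UC places files under the managed root,
-- and DROP TABLE deletes them
CREATE TABLE finance_prod.transactions.daily_summary (day DATE, total DECIMAL(12, 2));

-- External: LOCATION points under a registered external location;
-- DROP TABLE removes only the metadata, the files stay
CREATE TABLE finance_prod.transactions.raw_orders (order_id INT, amount DECIMAL(10, 2))
LOCATION 's3://my-data-bucket/external-data/raw_orders';
```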
Permissions are granted to principals (users, groups, service principals) on securables (catalog, schema, table, view, function, volume, external location, model). To query a table you need USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table itself — the chain enforces visibility at every level. Row filters and column masks attach to tables/views as functions for fine-grained control. Always grant to groups, never individuals; UC integrates with SCIM-provisioned identities from your IdP.
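A minimal sketch of a row filter and a column mask on the orders table (the function names and the policy logic are illustrative):

```sql
-- Row filter: admins see all rows, everyone else only small orders
CREATE FUNCTION finance_prod.transactions.amount_filter(amount DECIMAL(10, 2))
RETURN is_account_group_member('admin-group@company.com') OR amount < 100;

ALTER TABLE finance_prod.transactions.orders
SET ROW FILTER finance_prod.transactions.amount_filter ON (amount);

-- Column mask: hide customer_id from non-admins
CREATE FUNCTION finance_prod.transactions.mask_customer(customer_id INT)
RETURN CASE WHEN is_account_group_member('admin-group@company.com')
            THEN customer_id ELSE NULL END;

ALTER TABLE finance_prod.transactions.orders
ALTER COLUMN customer_id SET MASK finance_prod.transactions.mask_customer;
```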
Use UCX (the Databricks-provided migration toolkit) to inventory the legacy hive_metastore catalog, generate an assessment report, and plan moves. The mechanical steps are: create the UC metastore and a target catalog, set up storage credentials and external locations covering the source paths, then for each table either run SYNC for in-place upgrade (external tables) or CREATE TABLE ... DEEP CLONE for a copy. Update jobs and notebooks to use three-level names. Run both metastores in parallel during cutover; retire hive_metastore only after every consumer is migrated.
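The two per-table migration paths might look like this (source and target names are illustrative):

```sql
-- External table: in-place metadata upgrade, files stay where they are
SYNC TABLE finance_prod.transactions.orders
FROM hive_metastore.transactions.orders;

-- Managed table: copy data and metadata into UC
CREATE TABLE finance_prod.transactions.customers
DEEP CLONE hive_metastore.transactions.customers;
```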