Attaching a masking policy to every sensitive column by hand does not scale past a few dozen tables. The tag-based approach inverts the relationship: tag the columns with their data category once (PII, PHI, FINANCIAL), bind the masking policy to the tag, and any future column that inherits or is assigned the tag is masked automatically. Combined with SYSTEM$CLASSIFY for automated discovery, the pipeline becomes scan → classify → tag → policy bind → masked output, with governance reports proving every column tagged PII has a policy attached.
The five-step pipeline below is the canonical pattern. Each step is idempotent and can be re-run on a schedule as new tables are loaded.
┌──────────────────────────────────────────────────────────────────────┐
│ 1. SCAN │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ CALL SYSTEM$CLASSIFY('db.schema.table', cfg) │ │
│ │ → samples values, runs Snowflake-managed classifiers │ │
│ └────────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 2. CLASSIFY │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Output: per-column probability per privacy category │ │
│ │ PII / QUASI_IDENTIFIER / SENSITIVE / IDENTIFIER │ │
│ └────────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 3. TAG │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ CALL SYSTEM$SET_RETURN_TAG() or │ │
│ │ ALTER TABLE … MODIFY COLUMN x SET TAG governance.PII │ │
│ └────────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 4. POLICY BIND │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ALTER TAG governance.PII │ │
│ │ SET MASKING POLICY pii_email_mask │ │
│ │ → policy auto-applies wherever the tag appears │ │
│ └────────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ 5. MASKED OUTPUT │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Query returns masked value, governance reports updated │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
SYSTEM$CLASSIFY samples table contents and emits a JSON object naming each column's most likely privacy category. It uses Snowflake-managed classifiers for common patterns (email, phone, SSN, credit card, IP address) plus regex/dictionary classifiers you can register yourself.
USE ROLE GOVERNANCE_ADMIN;
-- Run on a single table
CALL SYSTEM$CLASSIFY(
'ANALYTICS.PUBLIC.CUSTOMERS',
{'auto_tag': true}
);
-- Iterate the whole schema
DECLARE c CURSOR FOR
SELECT table_schema || '.' || table_name AS qname
FROM ANALYTICS.INFORMATION_SCHEMA.TABLES
WHERE table_type = 'BASE TABLE';
BEGIN
FOR t IN c DO
CALL SYSTEM$CLASSIFY(:t.qname, {'auto_tag': true});
END FOR;
END;
-- Inspect the latest classification per column
SELECT object_database, object_schema, object_name, column_name,
privacy_category, semantic_category, confidence
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_CLASSIFICATION_LATEST;
With auto_tag: true, Snowflake assigns the system-defined tags SNOWFLAKE.CORE.PRIVACY_CATEGORY and SNOWFLAKE.CORE.SEMANTIC_CATEGORY automatically. Most teams pair these with their own governance tags for finer-grained routing.
USE ROLE GOVERNANCE_ADMIN;
CREATE SCHEMA IF NOT EXISTS analytics.governance;
-- Allowed-values tags act as enums
CREATE OR REPLACE TAG analytics.governance.data_class
ALLOWED_VALUES 'PUBLIC', 'INTERNAL', 'CONFIDENTIAL', 'RESTRICTED'
COMMENT = 'Information classification per data handling policy';
CREATE OR REPLACE TAG analytics.governance.pii
ALLOWED_VALUES 'EMAIL', 'PHONE', 'SSN', 'NAME', 'ADDRESS'
COMMENT = 'PII subtype for tag-based masking';
-- Manual assignment
ALTER TABLE customers
MODIFY COLUMN email SET TAG analytics.governance.pii = 'EMAIL',
analytics.governance.data_class = 'CONFIDENTIAL',
COLUMN phone SET TAG analytics.governance.pii = 'PHONE',
analytics.governance.data_class = 'CONFIDENTIAL',
COLUMN ssn SET TAG analytics.governance.pii = 'SSN',
analytics.governance.data_class = 'RESTRICTED';
-- Tag the schema (every object inherits, useful as a default)
ALTER SCHEMA analytics.public
SET TAG analytics.governance.data_class = 'INTERNAL';
Binding a masking policy to a tag is a single DDL statement. Every column that carries the tag — present and future — receives the policy automatically. The masking UDF can dispatch on the tag value to vary masking by PII subtype.
CREATE OR REPLACE MASKING POLICY analytics.governance.pii_mask AS
(val STRING) RETURNS STRING ->
CASE
WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('analytics.governance.pii') = 'EMAIL'
THEN REGEXP_REPLACE(val, '.+@', '****@')
WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('analytics.governance.pii') = 'PHONE'
THEN '***-***-' || RIGHT(val, 4)
WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('analytics.governance.pii') = 'SSN'
THEN 'XXX-XX-' || RIGHT(val, 4)
ELSE '***REDACTED***'
END;
-- Bind once, applies everywhere the tag is set
ALTER TAG analytics.governance.pii
SET MASKING POLICY analytics.governance.pii_mask;
After this, every existing and future column tagged with analytics.governance.pii is masked according to the role of the caller and the tag value of the column. No further ALTER TABLE calls are needed when new tables are tagged.
Tags propagate through views. A column in a view that is derived from a tagged base column inherits the tag, so the policy continues to apply downstream. SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES_WITH_LINEAGE exposes this transitive view.
SELECT object_database, object_schema, object_name, column_name,
tag_database, tag_schema, tag_name, tag_value, level, lineage
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES_WITH_LINEAGE
WHERE tag_name = 'PII'
ORDER BY object_name, column_name;
Two queries every governance program should run on a schedule.
-- 1. Coverage: every PII-tagged column should have a masking policy
SELECT t.object_name, t.column_name, t.tag_value, p.policy_name
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES t
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES p
ON t.object_database = p.ref_database_name
AND t.object_schema = p.ref_schema_name
AND t.object_name = p.ref_entity_name
AND t.column_name = p.ref_column_name
WHERE t.tag_name = 'PII'
ORDER BY p.policy_name NULLS FIRST;
-- 2. Drift: classification suggests PII but no tag attached
SELECT c.object_name, c.column_name, c.privacy_category, c.confidence
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_CLASSIFICATION_LATEST c
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES t
ON c.object_database = t.object_database
AND c.object_schema = t.object_schema
AND c.object_name = t.object_name
AND c.column_name = t.column_name
AND t.tag_name = 'PII'
WHERE c.privacy_category = 'IDENTIFIER'
AND c.confidence > 0.85
AND t.tag_name IS NULL;
The first query proves coverage to auditors. The second flags newly loaded data that should be tagged but is not yet — typical action is to auto-tag and let the bound policy take effect on the next query.