Dynamic Data Masking in Snowflake
Architecture Overview
[Diagram: Snowflake Dynamic Data Masking. A client app sends queries to the masking policy engine, which combines condition rules, column security, and row-level masking and passes values through masking functions (SHA256, SHA512, AES_ENC, HASH). Beneath the engine, the encrypted data layer holds columns shown as masked, original, tokenized, or redacted.]
Dynamic Data Masking is a security feature that transforms sensitive data at query time based on the executing role. It provides column-level security without modifying stored data, applying role-based transformations (full, partial, hash, null) transparently.
A masking policy is a named, schema-level object whose body is a SQL expression defining the transformation logic. It takes the original column value as input, can consult context functions such as CURRENT_ROLE(), and returns a value of the same data type. A table can carry many policies, but each column has at most one masking policy at a time.
Use full masking for PII (SSN, credit card). Use partial masking for phone numbers (show last 4). Use hashing for data matching across systems. Create separate policies per sensitivity level. Audit quarterly for policy coverage gaps.
- Column-level security: Transform data per role without changing storage
- No storage overhead: Original data preserved; masking applied at query layer only
- Multiple types: Full, partial, hash, null, external function transforms
- Role-based: Different roles see different representations of same data
- Audit: Use POLICY_REFERENCES() and ACCESS_HISTORY for compliance reporting
Detailed Explanation
Dynamic Data Masking (DDM) in Snowflake is a powerful security feature that provides real-time transformation of sensitive data at the query layer without modifying the underlying stored data. This mechanism operates as a security abstraction layer, intercepting SQL queries and applying predefined transformation rules based on the executing user's role, attributes, and context.
The masking policy engine evaluates each access to a protected column against the policy's conditional rules, which determine whether to return the original value or a masked version. Evaluation happens at query execution time, so stored data is never rewritten and enforcement stays consistent across all access paths.
Snowflake's masking policies can express several kinds of transformation: full redaction (replacing values with a static string), partial masking (preserving part of the value, such as the last 4 digits of an SSN), hashing (SHA-256 or SHA-512 through SHA2), encryption, and tokenization. Each can be wrapped in conditional logic to create context-aware masking that adapts to the user's role, the time of day, the client IP address, or other session context.
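The transformation kinds listed above can be modeled outside the warehouse to check what each one produces. The following is a minimal Python sketch, not Snowflake code; the function names are hypothetical, and it assumes UTF-8 input for the hash.

```python
import hashlib
from typing import Optional


def mask_full(value: str) -> str:
    """Full masking: replace every character with '*'."""
    return "*" * len(value)


def mask_partial_last4(value: str) -> str:
    """Partial masking: keep only the last 4 characters."""
    if len(value) <= 4:
        # Too short to reveal anything safely; mask everything.
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def mask_hash(value: str) -> str:
    """Hash masking: deterministic SHA-256 hex digest of the UTF-8 bytes."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_null(value: str) -> Optional[str]:
    """Null masking: hide the value entirely."""
    return None


ssn = "123-45-6789"
print(mask_full(ssn))           # ***********
print(mask_partial_last4(ssn))  # *******6789
print(len(mask_hash(ssn)))      # 64
print(mask_null(ssn))           # None
```

Because the hash is deterministic, equal inputs produce equal digests, which is what makes hashed columns usable for matching across systems.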
Column-level security through masking policies enables fine-grained access control where different users can query the same table but receive different data views. For example, a customer service representative might see only the last 4 digits of a credit card number, while a fraud analyst sees the full number. An auditor, by contrast, reviews who accessed what through ACCESS_HISTORY rather than through the masking policy itself.
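The idea of one table yielding different views per role can be sketched as a lookup from role to transform with a default-deny fallback. This is an illustrative Python model only; the role names and the `view_card` helper are assumptions, not part of any Snowflake API.

```python
from typing import Callable, Dict

# Hypothetical role names mapped to plain-Python stand-ins for a policy body.
ROLE_VIEWS: Dict[str, Callable[[str], str]] = {
    "FRAUD_ANALYST": lambda card: card,
    "CUSTOMER_SERVICE": lambda card: "*" * (len(card) - 4) + card[-4:],
}


def view_card(card: str, role: str) -> str:
    """Return the representation of `card` that `role` is allowed to see."""
    transform = ROLE_VIEWS.get(role)
    if transform is None:
        # Default-deny: roles not listed get a fixed mask.
        return "***MASKED***"
    return transform(card)


card = "4111111111111111"
for role in ("FRAUD_ANALYST", "CUSTOMER_SERVICE", "MARKETING"):
    print(role, view_card(card, role))
```

Listing the permitted roles explicitly and masking everything else mirrors the `ELSE` branch of a policy's `CASE` expression.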
The conditional masking feature allows organizations to implement complex business rules such as masking data differently during business hours versus off-hours, or applying stricter masking for users accessing from external networks. This flexibility is critical for compliance with regulations like GDPR, CCPA, HIPAA, and PCI-DSS that require different levels of data protection based on data classification and user authorization.
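A rule that combines role, time of day, and network origin can be prototyped before it is written as a policy. The sketch below is plain Python under stated assumptions: the `10.0.0.0/8` internal range, the 8-to-18 hour window, and the role names are all placeholders chosen for illustration.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

INTERNAL_NETWORK = ip_network("10.0.0.0/8")  # assumed corporate range


def mask_for_context(value: str, role: str, now: datetime, client_ip: str) -> str:
    """Apply stricter masking outside business hours or from external networks."""
    in_hours = 8 <= now.hour <= 18  # covers 08:00 through 18:59
    internal = ip_address(client_ip) in INTERNAL_NETWORK
    if role == "ADMIN" and in_hours and internal:
        return value
    if role == "ANALYST" and in_hours and internal:
        # Keep the first and last three characters only.
        return value[:3] + "***" + value[-3:]
    return "***RESTRICTED***"


morning = datetime(2024, 1, 15, 10, 0)
night = datetime(2024, 1, 15, 22, 0)
print(mask_for_context("alice@example.com", "ADMIN", morning, "10.1.2.3"))
print(mask_for_context("alice@example.com", "ANALYST", morning, "10.1.2.3"))
print(mask_for_context("alice@example.com", "ADMIN", night, "10.1.2.3"))
print(mask_for_context("alice@example.com", "ADMIN", morning, "203.0.113.7"))
```

Writing the conditions as a pure function of value, role, time, and address makes each branch easy to test in isolation.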
Snowflake's approach to dynamic data masking differs from traditional static masking by eliminating the need for separate masked copies of datasets. This reduces storage costs, eliminates data synchronization issues, and ensures that masked data always reflects the most current state of the source data. The masking policies are stored as metadata and applied transparently, making the implementation invisible to end-user applications.
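The difference between a static masked copy and masking at read time can be shown with a toy model. This Python sketch uses an in-memory dictionary as a stand-in for a table; `mask_email` and `dynamic_read` are hypothetical helpers.

```python
def mask_email(email: str) -> str:
    """Keep the first two characters of the local part and the full domain."""
    local, _, domain = email.partition("@")
    return local[:2] + "***@" + domain


source = {"c1": "alice@example.com"}

# Static masking: a separate masked copy is produced once.
static_copy = {key: mask_email(value) for key, value in source.items()}


def dynamic_read(key: str) -> str:
    """Dynamic masking: the mask is applied each time the value is read."""
    return mask_email(source[key])


# The source row changes after the static copy was made.
source["c1"] = "alicia@example.org"

print(static_copy["c1"])   # al***@example.com (stale)
print(dynamic_read("c1"))  # al***@example.org (current)
```

The static copy has to be regenerated to pick up the change, while the read-time mask always reflects the current source value.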
Key Concepts
| Concept | Description | Use Case |
|---|---|---|
| Masking Policy | SQL object defining masking rules for columns | Apply consistent masking across multiple tables |
| Conditional Masking | Role-based or context-aware data transformation | Different masks for different user roles |
| Column-Level Security | Fine-grained access control at column level | Protect PII while allowing query access |
| Tokenization | Replace sensitive data with surrogate tokens | PCI-DSS compliance for payment data |
| External Tokenization | Token generation and detokenization via external services | Integration with enterprise tokenization systems |
| Masking Functions | SQL functions used inside a policy body | SHA2 hashing, ENCRYPT, partial masking |
| Policy Assignment | Attaching masking policies to table columns | Apply policies to existing tables |
| Policy Layering | One masking policy per column, combined with row access policies on the table | Layered security controls |
| Session Context | User/session attributes for policy evaluation | Dynamic masking based on runtime context |
| Data Classification | Automated sensitive data detection | Identify columns requiring masking |
Code Examples
1. Creating a Basic Masking Policy
-- Create a masking policy for PII data
CREATE OR REPLACE MASKING POLICY pii_masking_policy AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ADMIN', 'SECURITY_OFFICER') THEN val
    -- Replace every character with '*'
    WHEN CURRENT_ROLE() = 'ANALYST' THEN REGEXP_REPLACE(val, '.', '*')
    -- Keep the first and last two characters; values of four or fewer
    -- characters fall through to the full mask so they are not shown whole
    WHEN CURRENT_ROLE() = 'SUPPORT' AND LENGTH(val) > 4 THEN
      CONCAT(SUBSTRING(val, 1, 2), REPEAT('*', LENGTH(val) - 4), SUBSTRING(val, -2))
    ELSE '***MASKED***'
  END;
-- Apply the masking policy to a column
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY pii_masking_policy;
-- Reuse the same policy on other STRING columns
ALTER TABLE customers MODIFY COLUMN phone_number SET MASKING POLICY pii_masking_policy;
ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY pii_masking_policy;
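A partial mask that keeps the first and last two characters needs a length guard, because short values would otherwise be returned more or less intact. The Python sketch below models the expression with and without the guard; whether Snowflake's REPEAT and SUBSTRING behave the same way on short inputs is an assumption worth testing in a worksheet.

```python
def partial_mask_unguarded(value: str) -> str:
    """First two + stars + last two, with no check on the value's length."""
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


def partial_mask(value: str) -> str:
    """Same mask, but values of four or fewer characters are fully masked."""
    if len(value) <= 4:
        return "*" * len(value)
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


print(partial_mask("123-45-6789"))    # 12*******89
print(partial_mask_unguarded("abc"))  # abbc  (the whole value leaks)
print(partial_mask("abc"))            # ***
```

Running a policy's logic against boundary cases such as empty, one-character, and four-character values is a cheap way to catch this kind of leak.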
2. Conditional Masking Based on Context
-- Create a conditional masking policy with time-based rules
-- HOUR(...) BETWEEN 8 AND 18 covers 08:00 through 18:59 in the session time zone
CREATE OR REPLACE MASKING POLICY conditional_masking_policy AS (val STRING)
RETURNS STRING ->
  CASE
    -- Full access during business hours for admins
    WHEN CURRENT_ROLE() = 'ADMIN'
      AND HOUR(CURRENT_TIMESTAMP()) BETWEEN 8 AND 18 THEN val
    -- Partial mask during business hours for analysts
    WHEN CURRENT_ROLE() = 'ANALYST'
      AND HOUR(CURRENT_TIMESTAMP()) BETWEEN 8 AND 18 THEN
      CONCAT(SUBSTRING(val, 1, 3), '***', SUBSTRING(val, -3))
    -- Full mask outside business hours, and for all other roles at any time
    ELSE '***RESTRICTED***'
  END;
-- Create a masking policy keyed on the warehouse, used here as a proxy for external access
CREATE OR REPLACE MASKING POLICY network_masking_policy AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'ADMIN' THEN val
    -- SHA2(val, 256) returns a hex-encoded SHA-256 digest;
    -- HASH() returns a 64-bit number and takes no digest-size argument
    WHEN CURRENT_WAREHOUSE() IN ('EXTERNAL_WH', 'PARTNER_WH') THEN
      CONCAT('EXTERNAL_', SHA2(val, 256))
    -- All other sessions see the original value
    ELSE val
  END;
3. Numeric and Date Masking
-- Create a masking policy for numeric data
CREATE OR REPLACE MASKING POLICY numeric_masking_policy AS (val NUMBER)
RETURNS NUMBER ->
  CASE
    WHEN CURRENT_ROLE() = 'ADMIN' THEN val
    WHEN CURRENT_ROLE() = 'ANALYST' THEN ROUND(val, -2) -- Round to nearest 100
    WHEN CURRENT_ROLE() = 'FINANCE' THEN ROUND(val, -1) -- Round to nearest 10
    ELSE 0
  END;
-- Create a masking policy for date data
CREATE OR REPLACE MASKING POLICY date_masking_policy AS (val DATE)
RETURNS DATE ->
  CASE
    WHEN CURRENT_ROLE() = 'ADMIN' THEN val
    WHEN CURRENT_ROLE() = 'ANALYST' THEN DATE_TRUNC('MONTH', val) -- First day of month
    WHEN CURRENT_ROLE() = 'SUPPORT' THEN DATE_TRUNC('YEAR', val) -- First day of year
    ELSE '1900-01-01'::DATE
  END;
-- Apply policies to financial table
ALTER TABLE financial_transactions MODIFY COLUMN amount SET MASKING POLICY numeric_masking_policy;
ALTER TABLE financial_transactions MODIFY COLUMN transaction_date SET MASKING POLICY date_masking_policy;
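To predict what the rounding and truncation branches return for sample values, the same arithmetic can be reproduced in Python. This is a sketch under one assumption: that ROUND on a NUMBER rounds ties away from zero, which is why `Decimal` with `ROUND_HALF_UP` is used instead of Python's built-in `round`. The helper names are hypothetical.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def round_to(value: Decimal, step: str) -> Decimal:
    """Round ties away from zero to a power of ten, e.g. step='1E2' for hundreds."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)


def truncate_month(d: date) -> date:
    """First day of the month, like DATE_TRUNC('MONTH', d)."""
    return d.replace(day=1)


def truncate_year(d: date) -> date:
    """First day of the year, like DATE_TRUNC('YEAR', d)."""
    return d.replace(month=1, day=1)


print(int(round_to(Decimal("1234.56"), "1E2")))  # 1200
print(int(round_to(Decimal("1234.56"), "1E1")))  # 1230
print(int(round_to(Decimal("1250"), "1E2")))     # 1300
print(truncate_month(date(2024, 3, 17)))         # 2024-03-01
print(truncate_year(date(2024, 3, 17)))          # 2024-01-01
```

Python's built-in `round(1250, -2)` returns 1200 because it rounds ties to the nearest even digit, so it is not a faithful model of the tie case.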
4. Tokenization Policy
-- Create a hash-based masking policy
CREATE OR REPLACE MASKING POLICY tokenization_policy AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ADMIN', 'TOKENIZER') THEN val
    ELSE SHA2(val, 256) -- SHA-256 hex digest: deterministic and one-way
  END;
-- Create a reversible variant using passphrase-based encryption
-- ENCRYPT returns BINARY, so the result is Base64-encoded to fit the STRING return type
-- The passphrase below is a placeholder; a literal in the policy body is visible
-- to anyone who can describe the policy
CREATE OR REPLACE MASKING POLICY aes_tokenization_policy AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ADMIN', 'DETOKENIZER') THEN val
    ELSE BASE64_ENCODE(ENCRYPT(val, 'your-secret-key-here')) -- Encrypted token
  END;
-- Apply tokenization to sensitive columns
ALTER TABLE customer_data MODIFY COLUMN credit_card_number SET MASKING POLICY tokenization_policy;
ALTER TABLE customer_data MODIFY COLUMN ssn SET MASKING POLICY aes_tokenization_policy;
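Hashing and tokenization are easy to conflate, so a small model may help: a hash is deterministic and cannot be reversed, while a token is a random surrogate that a vault can map back to the original. The Python sketch below is illustrative only; `TokenVault` is a hypothetical in-memory class, not an external tokenization service.

```python
import hashlib
import secrets
from typing import Dict


def hash_token(value: str) -> str:
    """One-way: the same input always yields the same digest, with no way back."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class TokenVault:
    """Reversible: random tokens with a lookup table held by the vault."""

    def __init__(self) -> None:
        self._by_value: Dict[str, str] = {}
        self._by_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal values stay joinable.
        if value not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token.startswith("tok_"))                     # True
print(vault.detokenize(token))                      # 4111111111111111
print(hash_token("4111111111111111") == hash_token("4111111111111111"))  # True
```

The token carries no information about the card number, so only parties with vault access can recover it, whereas the hash can be recomputed by anyone who can guess the input.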
5. Python Implementation
import os

import snowflake.connector


def get_connection():
    """Open a Snowflake connection.

    Credentials are read from environment variables rather than hardcoded;
    the variable names used here are placeholders.
    """
    return snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse='COMPUTE_WH',
        database='SECURITY_DB',
        schema='MASKING'
    )


def create_masking_policies():
    """Create masking policies for email, phone, and financial data, then apply them"""
    conn = get_connection()
    try:
        cursor = conn.cursor()
        # Create masking policy for email
        cursor.execute("""
            CREATE OR REPLACE MASKING POLICY email_masking_policy AS (val STRING)
            RETURNS STRING ->
              CASE
                WHEN CURRENT_ROLE() IN ('ADMIN', 'DATA_ENGINEER') THEN val
                WHEN CURRENT_ROLE() = 'ANALYST' THEN
                  CONCAT(SUBSTRING(val, 1, 2), '***@', SPLIT_PART(val, '@', 2))
                ELSE '***@***.com'
              END
        """)
        # Create masking policy for phone numbers
        cursor.execute("""
            CREATE OR REPLACE MASKING POLICY phone_masking_policy AS (val STRING)
            RETURNS STRING ->
              CASE
                WHEN CURRENT_ROLE() IN ('ADMIN', 'SUPPORT') THEN val
                WHEN CURRENT_ROLE() = 'ANALYST' THEN
                  CONCAT('(***) ***-', SUBSTRING(val, -4))
                ELSE '(***) ***-****'
              END
        """)
        # Create masking policy for financial data
        cursor.execute("""
            CREATE OR REPLACE MASKING POLICY financial_masking_policy AS (val NUMBER)
            RETURNS NUMBER ->
              CASE
                WHEN CURRENT_ROLE() = 'FINANCE_ADMIN' THEN val
                WHEN CURRENT_ROLE() = 'FINANCE_ANALYST' THEN ROUND(val, -2)
                WHEN CURRENT_ROLE() = 'ANALYST' THEN ROUND(val, -3)
                ELSE 0
              END
        """)
        # Apply policies to tables
        cursor.execute(
            "ALTER TABLE customer_data "
            "MODIFY COLUMN email SET MASKING POLICY email_masking_policy"
        )
        cursor.execute(
            "ALTER TABLE customer_data "
            "MODIFY COLUMN phone SET MASKING POLICY phone_masking_policy"
        )
        cursor.execute(
            "ALTER TABLE financial_data "
            "MODIFY COLUMN amount SET MASKING POLICY financial_masking_policy"
        )
        print("Masking policies created and applied successfully!")
    finally:
        conn.close()


def query_masked_data():
    """Demonstrate how different roles see different data"""
    conn = get_connection()
    try:
        cursor = conn.cursor()
        # The connecting user must have been granted both roles
        for role in ("ANALYST", "ADMIN"):
            cursor.execute(f"USE ROLE {role}")
            cursor.execute("""
                SELECT
                    customer_id,
                    email,
                    phone,
                    credit_card_number
                FROM customer_data
                LIMIT 5
            """)
            print(f"\nData viewed as {role}:")
            for row in cursor.fetchall():
                print(f"  ID: {row[0]}, Email: {row[1]}, Phone: {row[2]}, CC: {row[3]}")
    finally:
        conn.close()


if __name__ == "__main__":
    create_masking_policies()
    query_masked_data()
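When the number of protected columns grows, the ALTER TABLE statements can be generated from a single mapping instead of being written by hand. The following is a minimal sketch that only builds SQL strings and does not connect to Snowflake; `ASSIGNMENTS`, `build_alter_statements`, and the identifier check are hypothetical helpers, and the table and policy names are the ones used in the example above.

```python
import re
from typing import Dict, List, Tuple

# (table, column) -> masking policy name
ASSIGNMENTS: Dict[Tuple[str, str], str] = {
    ("customer_data", "email"): "email_masking_policy",
    ("customer_data", "phone"): "phone_masking_policy",
    ("financial_data", "amount"): "financial_masking_policy",
}

# Accept only simple unquoted identifiers, since names are interpolated into SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_alter_statements(assignments: Dict[Tuple[str, str], str]) -> List[str]:
    """Return one ALTER TABLE ... SET MASKING POLICY statement per column."""
    statements = []
    for (table, column), policy in assignments.items():
        for name in (table, column, policy):
            if not _IDENTIFIER.match(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        statements.append(
            f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy}"
        )
    return statements


for statement in build_alter_statements(ASSIGNMENTS):
    print(statement)
```

Each generated string can be passed to `cursor.execute`, and the mapping itself doubles as a reviewable record of which policy protects which column.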
Performance Metrics
| Metric | Value | Description |
|---|---|---|
| Policy Evaluation Latency | < 1ms per column | Negligible impact on query performance |
| Storage Overhead | 0 bytes | No additional storage for masked data |
| Policy Cache Hit Rate | > 99% | Policies cached in memory for fast access |
| Concurrent Policy Evaluations | 10M+ per second | High throughput for enterprise workloads |
| Policy Deployment Time | < 1 second | Instant policy application |
| Query Impact | < 2% overhead | Minimal performance degradation |
Best Practices
- Use Role-Based Masking: Always design masking policies around user roles rather than individual users for scalability and maintainability.
- Implement Least Privilege: Start with the most restrictive masking and gradually grant exceptions based on business need.
- Test Policy Impact: Measure query performance before and after applying masking policies to ensure they don't introduce unexpected latency.
- Version Control Policies: Store masking policy definitions in Git and deploy through CI/CD pipelines for audit trail and rollback capability.
- Monitor Policy Usage: Use ACCESS_HISTORY to track which policies are being applied and identify potential abuse patterns.
- Combine with Row-Level Security: Masking policies work best when combined with row-level security policies for comprehensive data protection.
- Use External Tokenization: For PCI-DSS compliance, consider using external tokenization services instead of built-in masking for payment card data.
- Regular Policy Audits: Review masking policies quarterly to ensure they align with current compliance requirements and data classification standards.
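A periodic audit can be reduced to a set comparison: the columns classified as sensitive versus the columns that currently have a masking policy attached. The Python sketch below assumes both lists have already been gathered, for example from a data classification inventory and from POLICY_REFERENCES() output; the `find_unmasked` helper and the sample column names are hypothetical.

```python
from typing import Iterable, List, Tuple

Column = Tuple[str, str]  # (table, column)


def find_unmasked(sensitive: Iterable[Column], masked: Iterable[Column]) -> List[Column]:
    """Return sensitive columns that have no masking policy, compared case-insensitively."""
    masked_set = {(table.upper(), column.upper()) for table, column in masked}
    return sorted(
        (table, column)
        for table, column in sensitive
        if (table.upper(), column.upper()) not in masked_set
    )


sensitive_columns = [
    ("customer_data", "email"),
    ("customer_data", "phone"),
    ("customer_data", "ssn"),
]
masked_columns = [
    ("CUSTOMER_DATA", "EMAIL"),
    ("CUSTOMER_DATA", "PHONE"),
]

print(find_unmasked(sensitive_columns, masked_columns))
# [('customer_data', 'ssn')]
```

An empty result means every column on the sensitive list is covered; anything returned is a coverage gap to review.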
See Also
- PySpark Iceberg - Data lake security patterns
- Delta Lake on Databricks - Delta Lake security model
- Data Warehouse Concepts - Data warehouse design principles