Snowflake Advanced UDFs and UDTFs
Advanced UDFs and UDTFs in Snowflake enable custom data transformations, complex calculations, and reusable business logic that extends Snowflake's built-in capabilities.
Architecture Overview
<svg width="800" height="450" viewBox="0 0 800 450" xmlns="http://www.w3.org/2000/svg">
<defs>
<linearGradient id="udfGrad" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" style="stop-color:#E74C3C;stop-opacity:1" />
<stop offset="100%" style="stop-color:#EC7063;stop-opacity:1" />
</linearGradient>
<linearGradient id="udtfGrad" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" style="stop-color:#3498DB;stop-opacity:1" />
<stop offset="100%" style="stop-color:#5DADE2;stop-opacity:1" />
</linearGradient>
</defs>
<text x="400" y="30" text-anchor="middle" font-size="18" font-weight="bold" fill="#333">Advanced UDFs and UDTFs Architecture</text>
<rect x="30" y="60" width="350" height="180" rx="10" fill="url(#udfGrad)" opacity="0.9"/>
<text x="205" y="85" text-anchor="middle" font-size="14" fill="white" font-weight="bold">User-Defined Functions (UDFs)</text>
<rect x="50" y="100" width="150" height="60" rx="8" fill="white" opacity="0.9"/>
<text x="125" y="125" text-anchor="middle" font-size="10" fill="#333" font-weight="bold">Scalar UDFs</text>
<text x="125" y="145" text-anchor="middle" font-size="9" fill="#666">Single value in, single value out</text>
<rect x="210" y="100" width="150" height="60" rx="8" fill="white" opacity="0.9"/>
<text x="285" y="125" text-anchor="middle" font-size="10" fill="#333" font-weight="bold">Stored UDFs</text>
<text x="285" y="145" text-anchor="middle" font-size="9" fill="#666">Persisted results caching</text>
<text x="205" y="185" text-anchor="middle" font-size="11" fill="white">SQL | JavaScript | Python | Java | Scala</text>
<text x="205" y="205" text-anchor="middle" font-size="10" fill="white" font-style="italic">Can be used in SELECT, WHERE, GROUP BY</text>
<text x="205" y="225" text-anchor="middle" font-size="10" fill="white" font-style="italic">Deterministic or non-deterministic</text>
<rect x="420" y="60" width="350" height="180" rx="10" fill="url(#udtfGrad)" opacity="0.9"/>
<text x="595" y="85" text-anchor="middle" font-size="14" fill="white" font-weight="bold">UDTFs</text>
<rect x="440" y="100" width="150" height="60" rx="8" fill="white" opacity="0.9"/>
<text x="515" y="125" text-anchor="middle" font-size="10" fill="#333" font-weight="bold">Table Functions</text>
<text x="515" y="145" text-anchor="middle" font-size="9" fill="#666">Multi-row output</text>
<rect x="600" y="100" width="150" height="60" rx="8" fill="white" opacity="0.9"/>
<text x="675" y="125" text-anchor="middle" font-size="10" fill="#333" font-weight="bold">Generator Functions</text>
<text x="675" y="145" text-anchor="middle" font-size="9" fill="#666">Row generation</text>
<text x="595" y="185" text-anchor="middle" font-size="11" fill="white">Used with LATERAL FLATTEN, TABLE()</text>
<text x="595" y="205" text-anchor="middle" font-size="10" fill="white" font-style="italic">Returns result set, not single value</text>
<text x="595" y="225" text-anchor="middle" font-size="10" fill="white" font-style="italic">Can process arrays and objects</text>
<rect x="30" y="260" width="740" height="170" rx="10" fill="#27AE60" opacity="0.85"/>
<text x="400" y="285" text-anchor="middle" font-size="14" fill="white" font-weight="bold">UDF vs UDTF Comparison</text>
<rect x="50" y="300" width="170" height="110" rx="8" fill="white"/>
<text x="135" y="320" text-anchor="middle" font-size="11" fill="#333" font-weight="bold">UDF</text>
<text x="135" y="340" text-anchor="middle" font-size="9" fill="#666">Returns single value</text>
<text x="135" y="355" text-anchor="middle" font-size="9" fill="#666">Used in expressions</text>
<text x="135" y="370" text-anchor="middle" font-size="9" fill="#666">Inline or stored</text>
<text x="135" y="385" text-anchor="middle" font-size="9" fill="#666">Deterministic option</text>
<text x="135" y="400" text-anchor="middle" font-size="9" fill="#666">Performance: Fast</text>
<rect x="240" y="300" width="170" height="110" rx="8" fill="white"/>
<text x="325" y="320" text-anchor="middle" font-size="11" fill="#333" font-weight="bold">UDTF</text>
<text x="325" y="340" text-anchor="middle" font-size="9" fill="#666">Returns multiple rows</text>
<text x="325" y="355" text-anchor="middle" font-size="9" fill="#666">Used with TABLE()</text>
<text x="325" y="370" text-anchor="middle" font-size="9" fill="#666">Can be pipelined</text>
<text x="325" y="385" text-anchor="middle" font-size="9" fill="#666">Row-by-row processing</text>
<text x="325" y="400" text-anchor="middle" font-size="9" fill="#666">Performance: Medium</text>
<rect x="430" y="300" width="170" height="110" rx="8" fill="white"/>
<text x="515" y="320" text-anchor="middle" font-size="11" fill="#333" font-weight="bold">Stored Procedure</text>
<text x="515" y="340" text-anchor="middle" font-size="9" fill="#666">Can do DML/DDL</text>
<text x="515" y="355" text-anchor="middle" font-size="9" fill="#666">Side effects allowed</text>
<text x="515" y="370" text-anchor="middle" font-size="9" fill="#666">Transaction control</text>
<text x="515" y="385" text-anchor="middle" font-size="9" fill="#666">Complex logic</text>
<text x="515" y="400" text-anchor="middle" font-size="9" fill="#666">Performance: Varies</text>
<rect x="620" y="300" width="140" height="110" rx="8" fill="white"/>
<text x="690" y="320" text-anchor="middle" font-size="11" fill="#333" font-weight="bold">External Fn</text>
<text x="690" y="340" text-anchor="middle" font-size="9" fill="#666">Calls cloud APIs</text>
<text x="690" y="355" text-anchor="middle" font-size="9" fill="#666">Network latency</text>
<text x="690" y="370" text-anchor="middle" font-size="9" fill="#666">Batch processing</text>
<text x="690" y="385" text-anchor="middle" font-size="9" fill="#666">Custom logic</text>
<text x="690" y="400" text-anchor="middle" font-size="9" fill="#666">Performance: Variable</text>
</svg>
Scalar UDFs
SQL Scalar UDF
CREATE OR REPLACE FUNCTION calculate_discount(price DECIMAL(10,2), discount_pct DECIMAL(5,2))
RETURNS DECIMAL(10,2)
LANGUAGE SQL
IMMUTABLE
AS
$$
SELECT price * (1 - discount_pct / 100)
$$;
-- Usage
SELECT
product_name,
price,
calculate_discount(price, 15) as discounted_price
FROM products;
JavaScript Scalar UDF
CREATE OR REPLACE FUNCTION parse_email(email VARCHAR)
RETURNS OBJECT
LANGUAGE JAVASCRIPT
IMMUTABLE
AS
$$
const parts = email.split('@');
return {
username: parts[0],
domain: parts[1],
isValid: parts.length === 2 && parts[1].includes('.')
};
$$;
-- Usage
SELECT
email,
parse_email(email):username::STRING as username,
parse_email(email):domain::STRING as domain
FROM users;
Python Scalar UDF
CREATE OR REPLACE FUNCTION sentiment_score(text VARCHAR)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('textblob')
HANDLER = 'get_sentiment'
AS
$$
from textblob import TextBlob
def get_sentiment(text):
blob = TextBlob(text)
return blob.sentiment.polarity
$$;
User-Defined Table Functions (UDTFs)
SQL UDTF
CREATE OR REPLACE FUNCTION generate_date_range(start_date DATE, end_date DATE)
RETURNS TABLE (date DATE, day_name VARCHAR, week_number INTEGER)
LANGUAGE SQL
AS
$$
WITH date_series AS (
SELECT DATEADD(day, SEQ4(), :start_date) as date
FROM TABLE(GENERATOR(ROWCOUNT => DATEDIFF('day', :start_date, :end_date) + 1))
)
SELECT
date,
DAYNAME(date) as day_name,
WEEKOFYEAR(date) as week_number
FROM date_series
$$;
-- Usage
SELECT * FROM TABLE(generate_date_range('2024-01-01', '2024-12-31'));
JavaScript UDTF
CREATE OR REPLACE FUNCTION split_string_to_rows(input VARCHAR, delimiter VARCHAR)
RETURNS TABLE (value VARCHAR, position INTEGER)
LANGUAGE JAVASCRIPT
AS
$$
const values = INPUT_DELIMITER.split(DELIMITER);
for (let i = 0; i < values.length; i++) {
yield {VALUE: values[i], POSITION: i + 1};
}
$$
INPUT_DELIMITER => (delimiter VARCHAR, delimiter VARCHAR);
-- Usage
SELECT * FROM TABLE(split_string_to_rows('a,b,c,d', ','));
Python UDTF for Time Series
CREATE OR REPLACE FUNCTION moving_average(values ARRAY, window_size INTEGER)
RETURNS TABLE (index INTEGER, value FLOAT, moving_avg FLOAT)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'compute_moving_avg'
AS
$$
class compute_moving_avg:
def __init__(self):
self.values = []
self.window_size = 0
def process(self, values, window_size):
self.values = values
self.window_size = window_size
for i in range(len(values)):
start = max(0, i - window_size + 1)
window_vals = values[start:i+1]
moving_avg = sum(window_vals) / len(window_vals)
yield (i, values[i], moving_avg)
$$;
Stored UDFs for Caching
CREATE OR REPLACE FUNCTION expensive_calculation(input VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
CALLED ON NULL INPUT
STABLE
AS
$$
SELECT MD5(input || CURRENT_DATE())
$$;
-- Stored UDF with caching
CREATE OR REPLACE FUNCTION cached_lookup(key VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
STABLE
AS
$$
SELECT value FROM lookup_table WHERE lookup_key = key
$$;
Use IMMUTABLE for functions with no side effects and consistent outputs. Use STABLE for functions that depend on session state. Use VOLATILE for functions with random or time-dependent outputs.
UDF Best Practices
| Aspect | Best Practice | Impact |
|---|---|---|
| Determinism | Mark as IMMUTABLE/STABLE | Enables caching |
| NULL handling | Use COALESCE or CALLED ON NULL | Prevents errors |
| Language choice | Match to complexity | Performance |
| Batch size | Process in batches | Memory efficiency |
| Error handling | Validate inputs | Reliability |
- UDFs return single values, UDTFs return result sets
- SQL, JavaScript, Python, Java, and Scala are supported
- IMMUTABLE/STABLE/VOLATILE affect caching behavior
- UDTFs are essential for row-generating transformations
- Stored UDFs cache results for repeated calls