In the modern data ecosystem, automation is key to building reliable, efficient, and scalable data pipelines. As data volumes grow and updates arrive more frequently, managing ETL (Extract, Transform, Load) jobs by hand becomes inefficient. That’s where Snowflake Streams and Tasks come in — offering a simple yet powerful way to handle Change Data Capture (CDC) and automated ETL scheduling.
A Stream in Snowflake is a mechanism that tracks changes (inserts, updates, deletes) made to a table. Instead of reloading entire datasets, Streams allow you to capture only the delta — the data that has changed since the last run.
This makes them ideal for incremental data processing.
Example: Creating a Stream
CREATE OR REPLACE STREAM orders_stream
ON TABLE raw.orders
SHOW_INITIAL_ROWS = TRUE;
The SHOW_INITIAL_ROWS = TRUE option ensures you get all existing rows as part of the first change capture, which is useful for initial loads.
Once a Stream is created, Snowflake automatically manages its offset — so each query fetches only new changes since the previous one.
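A quick way to see what a Stream has captured, assuming the orders_stream defined above, is to select from it directly; each row carries metadata columns describing the change. Note that a plain SELECT does not advance the offset — only a DML statement that consumes the Stream (such as a MERGE) does.

-- Inspect pending changes without advancing the Stream's offset
SELECT id,
       amount,
       METADATA$ACTION,    -- 'INSERT' or 'DELETE'
       METADATA$ISUPDATE   -- TRUE when the row is part of an UPDATE
FROM raw.orders_stream;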
A Task in Snowflake is a scheduled unit of work that runs SQL statements (like transformations or merges) automatically. You can schedule Tasks using simple time intervals or CRON expressions for fine-grained control.
Example: Creating a Task
CREATE OR REPLACE TASK orders_etl_task
WAREHOUSE = compute_wh
SCHEDULE = 'USING CRON 0 5 * * * UTC' -- Runs daily at 5:00 AM UTC
AS
MERGE INTO analytics.orders a
USING (SELECT * FROM raw.orders_stream) s
ON a.id = s.id
WHEN MATCHED THEN UPDATE SET a.amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount, order_date)
VALUES (s.id, s.amount, s.order_date);
This Task merges incremental changes from the Stream into the analytics table automatically at the defined schedule.
When combined, Streams and Tasks form a fully automated ETL pipeline. Here’s how it works:
1. Source table (raw.orders) — Ingests raw transactional data.
2. Stream (orders_stream) — Tracks any new inserts, updates, or deletes.
3. Task (orders_etl_task) — Automatically applies these changes to the target analytics table.
This setup eliminates the need for manual triggers or complex orchestration tools.
You can even chain multiple tasks to create a dependency-based pipeline:
ALTER TASK orders_cleanup ADD AFTER orders_etl_task;
This ensures the cleanup task executes only after ETL completion.
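One caveat worth noting: Snowflake creates Tasks in a suspended state, so each Task in the chain must be resumed before the schedule takes effect. A short sketch, reusing the Task names from above (child Tasks are resumed before the root):

-- Resume the child task first, then the root task
ALTER TASK orders_cleanup RESUME;
ALTER TASK orders_etl_task RESUME;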
✅ Conditional Execution:
Use this built-in function to skip unnecessary runs:
SELECT SYSTEM$STREAM_HAS_DATA('orders_stream');
If no changes are detected, the Task can be configured to skip the run entirely — saving compute credits.
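In practice, this check is attached directly to the Task definition via a WHEN clause, so the warehouse is never even resumed when the Stream is empty. A sketch reusing the earlier Task definition:

CREATE OR REPLACE TASK orders_etl_task
WAREHOUSE = compute_wh
SCHEDULE = 'USING CRON 0 5 * * * UTC'
WHEN SYSTEM$STREAM_HAS_DATA('raw.orders_stream')
AS
MERGE INTO analytics.orders a
USING (SELECT * FROM raw.orders_stream) s
ON a.id = s.id
WHEN MATCHED THEN UPDATE SET a.amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount, order_date)
VALUES (s.id, s.amount, s.order_date);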
✅ Error Handling:
Combine Tasks with Snowflake’s notification integration or monitoring tools to capture failures automatically.
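For example, recent runs and their error messages can be inspected with the TASK_HISTORY table function, shown here for the Task defined above:

-- Check the last runs of the ETL task, including any failures
SELECT name, state, error_message, scheduled_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'ORDERS_ETL_TASK'))
ORDER BY scheduled_time DESC
LIMIT 10;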
✅ Performance Optimization:
Use clustering keys on large target tables and smaller virtual warehouses for frequent incremental jobs.
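Defining a clustering key on the large target table, for instance, helps the nightly MERGE prune micro-partitions efficiently. A sketch assuming order_date is the natural clustering column for the analytics table used above:

-- Cluster the target table so incremental merges scan fewer micro-partitions
ALTER TABLE analytics.orders CLUSTER BY (order_date);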
✅ Version Control:
Store Stream & Task definitions in your CI/CD setup or Terraform for consistent deployment.
Imagine you have an e-commerce application generating millions of order events daily. Instead of reprocessing all records each night, you can:
1. Create a Stream on the orders table to capture only that day's inserts, updates, and deletes.
2. Schedule a Task that merges those deltas into the analytics table.
3. Chain downstream Tasks (aggregations, cleanup) to run only after the merge completes.
This approach provides low-latency data availability, making Snowflake an ideal platform for modern, event-driven data architectures.
Snowflake Streams & Tasks simplify incremental data ingestion, CDC, and ETL scheduling — all within the Snowflake ecosystem.
No need for Airflow, CronJobs, or external tools — just pure SQL automation.