Top 20 Snowflake Interview Questions & Answers (2025)

Below are the 20 most commonly asked Snowflake interview topics — explained clearly and practically.


1. What is Snowflake’s architecture and why is it unique?

Snowflake uses a multi-cluster shared data architecture, separating compute, storage, and cloud services.

  • Storage sits in cloud object storage (S3/GCS/Azure Blob).
  • Compute uses virtual warehouses.
  • Cloud services handle security, metadata, optimization, etc.

This separation allows independent scaling, which is Snowflake's biggest advantage.

2. Explain Virtual Warehouses and how scaling works.

A Virtual Warehouse is a cluster of compute resources used for query processing.
It supports:

  • Scaling Up → increase size (S → M → L)
  • Scaling Out → multi-cluster (auto-scale for concurrency)

Warehouses can auto-suspend and auto-resume to save credits.
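
Both scaling modes can be sketched in SQL (warehouse name and settings are illustrative; multi-cluster warehouses require Enterprise edition):

```sql
-- Scaling out: a warehouse that adds clusters under concurrency pressure
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60            -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

-- Scaling up: resize the same warehouse for heavier individual queries
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
```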

3. Difference between Cloning and Zero-Copy Cloning.

Snowflake always uses zero-copy cloning — meaning it creates a metadata pointer, not a physical copy of data.
You can clone tables, schemas, databases instantly without extra storage.
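
A quick sketch (object names are illustrative; clones share micro-partitions with the source until either side is modified):

```sql
-- Instant clones at table and schema level
CREATE TABLE orders_dev CLONE orders;
CREATE SCHEMA analytics_dev CLONE analytics;

-- Combine with Time Travel: clone the table as it was 24 hours ago
CREATE TABLE orders_yesterday CLONE orders AT (OFFSET => -86400);
```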


4. What is Time Travel?

Time Travel allows accessing historical data (deleted/updated/overwritten rows) within a retention window (1 day by default; up to 90 days on Enterprise edition).
Syntax:

SELECT * FROM my_table AT (OFFSET => -60*60);  -- offset in seconds (one hour ago)
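
Other common Time Travel forms (table name is illustrative; replace the query-ID placeholder with a real ID from QUERY_HISTORY):

```sql
-- Query as of a specific timestamp
SELECT * FROM my_table AT (TIMESTAMP => '2025-01-15 08:00:00'::TIMESTAMP_LTZ);

-- Query the state just before a given statement ran
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Restore a dropped table within the retention window
UNDROP TABLE my_table;
```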

5. How does Fail-safe work?

Fail-safe provides a 7-day recovery window for permanent tables after Time Travel expires.
It’s managed by Snowflake internally and cannot be queried by users.


6. What is Micro-Partitioning?

Snowflake stores data in immutable micro-partitions (50–500 MB of uncompressed data), automatically compressed and stored in columnar form, with min/max metadata tracked per partition.
This enables efficient pruning → better performance and lower cost.


7. Explain Caching Layers in Snowflake.

Snowflake has 3 major caches:

  • Result Cache → returns results instantly when the identical query is repeated and the underlying data is unchanged (valid for 24 hours)
  • Metadata Cache → micro-partition boundary metadata (min/max values, row counts)
  • Local Disk Cache → data cached on warehouse nodes while the warehouse is running

The result and metadata caches serve answers without consuming warehouse credits.
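
One caching detail worth knowing for interviews: the result cache can be turned off per session, which is useful when benchmarking real query performance:

```sql
-- Disable the result cache so repeated runs hit the warehouse again
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```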

8. External vs Internal Stages.

  • Internal Stage → inside Snowflake (named stage, table stage, user stage)
  • External Stage → cloud storage (S3, GCS, Azure)

Stages are used for ingesting or unloading files.
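
A hedged sketch of both stage types (bucket, stage, and integration names are placeholders; the storage integration must be set up by an admin beforehand):

```sql
-- Named internal stage; files are uploaded with PUT from a local client
CREATE STAGE my_int_stage;
-- PUT file:///tmp/orders.csv @my_int_stage;   (run from SnowSQL)

-- External stage over S3, authenticated via a storage integration
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/raw/'
  STORAGE_INTEGRATION = my_s3_int;

-- Load staged files into a table
COPY INTO orders FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```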

9. What is a Stream?

Streams track changes (CDC) on tables — inserts, updates, deletes.
Used with Tasks for incremental ingestion.
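
A minimal sketch (table and column names are illustrative); reading a stream exposes CDC metadata columns, and consuming it in a DML statement advances its offset:

```sql
-- Track changes on the orders table
CREATE STREAM orders_stream ON TABLE orders;

-- Changed rows plus CDC metadata (action type, update flag)
SELECT id, amount, METADATA$ACTION, METADATA$ISUPDATE
FROM orders_stream;
```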


10. What is a Task?

Tasks schedule SQL execution (like cron jobs).
Used to automate pipelines, e.g. incremental loads with Streams.
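
A sketch of the Stream + Task pattern (warehouse, table, and stream names are illustrative):

```sql
-- Runs every 5 minutes, but only when the stream has new rows
CREATE TASK load_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_history (id, amount)
  SELECT id, amount FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended; start them explicitly
ALTER TASK load_orders RESUME;
```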


11. Explain Snowpipe and how it works.

Snowpipe enables continuous data ingestion using:

  • Event notifications (S3, GCS, Azure)
  • COPY statements triggered automatically

It loads data in small, frequent micro-batches using Snowflake-managed serverless compute.
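
A minimal pipe definition (stage and table names are illustrative; AUTO_INGEST relies on cloud event notifications configured on the stage's bucket):

```sql
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'JSON');
```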

12. Role-Based Access Control (RBAC)

Snowflake follows hierarchical RBAC:

  • Users
  • Roles
  • Privileges
  • Objects

Best practice: grant privileges to roles, roles to other roles, and roles to users; never grant privileges directly to users.
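
The grant chain looks like this in SQL (role, database, and user names are illustrative):

```sql
CREATE ROLE analyst;

-- Privileges go to the role, not the user
GRANT USAGE ON DATABASE sales TO ROLE analyst;
GRANT USAGE ON SCHEMA sales.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales.public TO ROLE analyst;

-- The user only ever receives roles
GRANT ROLE analyst TO USER alice;
```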

13. Transient Tables vs Temporary Tables.

  • Temporary Table → session-scoped, dropped automatically when the session ends
  • Transient Table → persists across sessions, but has no Fail-safe (and at most 1 day of Time Travel)

Both are commonly used for staging tables to reduce storage cost.
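
Both table types in SQL (names and columns are illustrative):

```sql
-- Session-scoped; disappears when the session ends
CREATE TEMPORARY TABLE stg_session (id INT, payload VARIANT);

-- Persists, but no Fail-safe; Time Travel can also be set to 0 days
CREATE TRANSIENT TABLE stg_daily (id INT, payload VARIANT)
  DATA_RETENTION_TIME_IN_DAYS = 0;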

14. Secure View vs Normal View.

  • Normal View → its SQL definition is visible to any role with access, and optimizer internals can expose underlying data
  • Secure View → hides the definition from non-owner roles and blocks optimizations that could leak underlying data

Secure views are used for compliance and data sharing.
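
The only syntactic difference is the SECURE keyword (names are illustrative):

```sql
CREATE SECURE VIEW v_orders_eu AS
  SELECT id, amount
  FROM orders
  WHERE region = 'EU';
```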

15. What is Query Profile?

A visualization tool that shows:

  • data scanned
  • partitions pruned
  • execution steps
  • bottlenecks

It is used for performance tuning.

16. What are Clustering Keys? When should you use them?

By default Snowflake relies on natural clustering (the order in which data was loaded); explicit clustering keys help when you have:

  • Large tables (>1B rows)
  • Poor pruning
  • Uneven access patterns

Don't overuse them: automatic reclustering consumes credits.
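
Defining and inspecting clustering keys (table and column names are illustrative):

```sql
-- Define clustering keys on a large table
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Inspect how well the table is clustered on those keys
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```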

17. How does Snowflake charge customers?

Snowflake pricing includes:

  • Storage (per TB per month)
  • Compute (credits, billed per second while warehouses run)
  • Cloud Services (small overhead beyond a daily threshold)
  • Data Transfer (cross-region / cross-cloud egress)

Compute is usually the major cost.

18. External Tables — where to use them?

Used when querying data directly from cloud storage without loading into Snowflake.
Useful for:

  • Data lakes
  • Raw logs
  • Large unstructured datasets
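
A sketch of an external table over JSON files (stage, path, and field names are illustrative; VALUE is the built-in VARIANT column holding each external row):

```sql
CREATE EXTERNAL TABLE raw_logs (
  event_time TIMESTAMP AS (VALUE:ts::TIMESTAMP),
  message    STRING    AS (VALUE:msg::STRING)
)
WITH LOCATION = @my_ext_stage/logs/
FILE_FORMAT = (TYPE = 'JSON')
AUTO_REFRESH = TRUE;
```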

19. What is a Materialized View?

A Materialized View stores precomputed results.
Best for frequently-run queries.
Limitations:

  • Costly to maintain (background refresh consumes credits)
  • Does not support all SQL (single table only; no joins or window functions)
  • Not ideal for high-churn tables
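
A typical single-table aggregation (names are illustrative; materialized views require Enterprise edition, and Snowflake keeps them in sync automatically):

```sql
CREATE MATERIALIZED VIEW daily_totals AS
  SELECT order_date, SUM(amount) AS total
  FROM orders
  GROUP BY order_date;
```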

20. Best practices for Snowflake performance tuning

  • Use proper warehouse size
  • Avoid SELECT *
  • Load semi-structured data using VARIANT
  • Use result cache wisely
  • Apply clustering only where needed
  • Write filters that enable partition pruning (avoid wrapping filter columns in functions)
  • Prefer streams + tasks over manual ETL logic
