Top 20 Snowflake Interview Questions & Answers (2025)

Below are the 20 most commonly asked Snowflake interview topics — explained clearly and practically.


1. What is Snowflake’s architecture and why is it unique?

Snowflake uses a multi-cluster shared data architecture, separating compute, storage, and cloud services.

  • Storage sits in cloud object storage (S3/GCS/Azure Blob).
  • Compute uses virtual warehouses.
  • Cloud services handle security, metadata, optimization, etc.

This separation allows independent scaling, which is Snowflake's biggest advantage.

2. Explain Virtual Warehouses and how scaling works.

A Virtual Warehouse is a cluster of compute resources used for query processing.
It supports:

  • Scaling Up → increase size (S → M → L)
  • Scaling Out → multi-cluster (auto-scale for concurrency)

Warehouses can auto-suspend and auto-resume to save credits.
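
Both scaling modes can be sketched in SQL (warehouse name and settings are illustrative; multi-cluster warehouses require Enterprise edition):

```sql
-- Scaling out: a warehouse that adds clusters under concurrency pressure
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60            -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

-- Scaling up: resize the same warehouse for heavier individual queries
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
```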

3. Difference between Cloning and Zero-Copy Cloning.

Snowflake always uses zero-copy cloning — meaning it creates a metadata pointer, not a physical copy of data.
You can clone tables, schemas, databases instantly without extra storage.
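
A quick sketch (object names are illustrative; clones share micro-partitions with the source until either side is modified):

```sql
-- Instant clones at table and schema level
CREATE TABLE orders_dev CLONE orders;
CREATE SCHEMA analytics_dev CLONE analytics;

-- Combine with Time Travel: clone the table as it was 24 hours ago
CREATE TABLE orders_yesterday CLONE orders AT (OFFSET => -86400);
```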


4. What is Time Travel?

Time Travel allows accessing historical data (deleted/updated/overwritten rows) within a retention window (1 day by default; up to 90 days on Enterprise edition).
Syntax:

SELECT * FROM my_table AT (OFFSET => -60*60);  -- offset in seconds (one hour ago)
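
Other common Time Travel forms (table name is illustrative; replace the query-ID placeholder with a real ID from QUERY_HISTORY):

```sql
-- Query as of a specific timestamp
SELECT * FROM my_table AT (TIMESTAMP => '2025-01-15 08:00:00'::TIMESTAMP_LTZ);

-- Query the state just before a given statement ran
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Restore a dropped table within the retention window
UNDROP TABLE my_table;
```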

5. How does Fail-safe work?

Fail-safe provides a 7-day recovery window for permanent tables after Time Travel expires.
It’s managed by Snowflake internally and cannot be queried by users.


6. What is Micro-Partitioning?

Snowflake stores data in immutable micro-partitions (50–500 MB of uncompressed data), automatically compressed and stored in columnar form, with min/max metadata tracked per partition.
This enables efficient pruning → better performance and lower cost.


7. Explain Caching Layers in Snowflake.

Snowflake has 3 major caches:

  • Result Cache → returns results instantly when the identical query is repeated and the underlying data is unchanged (valid for 24 hours)
  • Metadata Cache → micro-partition boundary metadata (min/max values, row counts)
  • Local Disk Cache → data cached on warehouse nodes while the warehouse is running

The result and metadata caches serve answers without consuming warehouse credits.
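
One caching detail worth knowing for interviews: the result cache can be turned off per session, which is useful when benchmarking real query performance:

```sql
-- Disable the result cache so repeated runs hit the warehouse again
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```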

8. External vs Internal Stages.

  • Internal Stage → inside Snowflake (named stage, table stage, user stage)
  • External Stage → cloud storage (S3, GCS, Azure)

Stages are used for ingesting or unloading files.
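
A hedged sketch of both stage types (bucket, stage, and integration names are placeholders; the storage integration must be set up by an admin beforehand):

```sql
-- Named internal stage; files are uploaded with PUT from a local client
CREATE STAGE my_int_stage;
-- PUT file:///tmp/orders.csv @my_int_stage;   (run from SnowSQL)

-- External stage over S3, authenticated via a storage integration
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/raw/'
  STORAGE_INTEGRATION = my_s3_int;

-- Load staged files into a table
COPY INTO orders FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```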

9. What is a Stream?

Streams track changes (CDC) on tables — inserts, updates, deletes.
Used with Tasks for incremental ingestion.
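
A minimal sketch (table and column names are illustrative); reading a stream exposes CDC metadata columns, and consuming it in a DML statement advances its offset:

```sql
-- Track changes on the orders table
CREATE STREAM orders_stream ON TABLE orders;

-- Changed rows plus CDC metadata (action type, update flag)
SELECT id, amount, METADATA$ACTION, METADATA$ISUPDATE
FROM orders_stream;
```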


10. What is a Task?

Tasks schedule SQL execution (like cron jobs).
Used to automate pipelines, e.g. incremental loads with Streams.
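
A sketch of the Stream + Task pattern (warehouse, table, and stream names are illustrative):

```sql
-- Runs every 5 minutes, but only when the stream has new rows
CREATE TASK load_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_history (id, amount)
  SELECT id, amount FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended; start them explicitly
ALTER TASK load_orders RESUME;
```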


11. Explain Snowpipe and how it works.

Snowpipe enables continuous data ingestion using:

  • Event notifications (S3, GCS, Azure)
  • COPY statements triggered automatically

It loads data in small, frequent micro-batches using Snowflake-managed serverless compute.
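
A minimal pipe definition (stage and table names are illustrative; AUTO_INGEST relies on cloud event notifications configured on the stage's bucket):

```sql
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'JSON');
```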

12. Role-Based Access Control (RBAC)

Snowflake follows hierarchical RBAC:

  • Users
  • Roles
  • Privileges
  • Objects

Best practice: grant privileges to roles, roles to other roles, and roles to users; never grant privileges directly to users.
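
The grant chain looks like this in SQL (role, database, and user names are illustrative):

```sql
CREATE ROLE analyst;

-- Privileges go to the role, not the user
GRANT USAGE ON DATABASE sales TO ROLE analyst;
GRANT USAGE ON SCHEMA sales.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales.public TO ROLE analyst;

-- The user only ever receives roles
GRANT ROLE analyst TO USER alice;
```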

13. Transient Tables vs Temporary Tables.

  • Temporary Table → session-scoped, dropped automatically when the session ends
  • Transient Table → persists across sessions, but has no Fail-safe (and at most 1 day of Time Travel)

Both are commonly used for staging tables to reduce storage cost.
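
Both table types in SQL (names and columns are illustrative):

```sql
-- Session-scoped; disappears when the session ends
CREATE TEMPORARY TABLE stg_session (id INT, payload VARIANT);

-- Persists, but no Fail-safe; Time Travel can also be set to 0 days
CREATE TRANSIENT TABLE stg_daily (id INT, payload VARIANT)
  DATA_RETENTION_TIME_IN_DAYS = 0;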

14. Secure View vs Normal View.

  • Normal View → its SQL definition is visible to any role with access, and optimizer internals can expose underlying data
  • Secure View → hides the definition from non-owner roles and blocks optimizations that could leak underlying data

Secure views are used for compliance and data sharing.
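
The only syntactic difference is the SECURE keyword (names are illustrative):

```sql
CREATE SECURE VIEW v_orders_eu AS
  SELECT id, amount
  FROM orders
  WHERE region = 'EU';
```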

15. What is Query Profile?

A visualization tool that shows:

  • data scanned
  • partitions pruned
  • execution steps
  • bottlenecks

It is used for performance tuning.

16. What are Clustering Keys? When should you use them?

By default Snowflake relies on natural clustering (the order in which data was loaded); explicit clustering keys help when you have:

  • Large tables (>1B rows)
  • Poor pruning
  • Uneven access patterns

Don't overuse them: automatic reclustering consumes credits.
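
Defining and inspecting clustering keys (table and column names are illustrative):

```sql
-- Define clustering keys on a large table
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Inspect how well the table is clustered on those keys
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```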

17. How does Snowflake charge customers?

Snowflake pricing includes:

  • Storage (per TB per month)
  • Compute (credits, billed per second while warehouses run)
  • Cloud Services (small overhead beyond a daily threshold)
  • Data Transfer (cross-region / cross-cloud egress)

Compute is usually the major cost.

18. External Tables — where to use them?

Used when querying data directly from cloud storage without loading into Snowflake.
Useful for:

  • Data lakes
  • Raw logs
  • Large unstructured datasets
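
A sketch of an external table over JSON files (stage, path, and field names are illustrative; VALUE is the built-in VARIANT column holding each external row):

```sql
CREATE EXTERNAL TABLE raw_logs (
  event_time TIMESTAMP AS (VALUE:ts::TIMESTAMP),
  message    STRING    AS (VALUE:msg::STRING)
)
WITH LOCATION = @my_ext_stage/logs/
FILE_FORMAT = (TYPE = 'JSON')
AUTO_REFRESH = TRUE;
```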

19. What is a Materialized View?

A Materialized View stores precomputed results.
Best for frequently-run queries.
Limitations:

  • Costly to maintain (background refresh consumes credits)
  • Does not support all SQL (single table only; no joins or window functions)
  • Not ideal for high-churn tables
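
A typical single-table aggregation (names are illustrative; materialized views require Enterprise edition, and Snowflake keeps them in sync automatically):

```sql
CREATE MATERIALIZED VIEW daily_totals AS
  SELECT order_date, SUM(amount) AS total
  FROM orders
  GROUP BY order_date;
```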

20. Best practices for Snowflake performance tuning

  • Use proper warehouse size
  • Avoid SELECT *
  • Load semi-structured data using VARIANT
  • Use result cache wisely
  • Apply clustering only where needed
  • Write filters that enable partition pruning (avoid wrapping filter columns in functions)
  • Prefer streams + tasks over manual ETL logic
