Top 10 Snowflake interview questions

Q. What are the essential features of Snowflake?
Snowflake has the following key features:

With Snowflake, you can interact with the data cloud through a web interface. Users can navigate the web GUI to control their accounts, monitor resources, and monitor resources and system usage queries data, etc.
Users can connect to Snowflake’s data cloud using a wide range of client connectors and drivers. Among these connectors are Python Connector (an interface for writing Python applications to connect to Snowflake), Spark connector, NodeJS driver, .NET driver, JBDC driver for Java development, ODBC driver for C or C++ programming, etc.
The core architecture of Snowflake enables it to operate on the public cloud, where it uses virtualized computing instances and efficient storage buckets for processing huge amounts of big data cost-effectively and scalable.
Snowflake integrates with a number of big data tools, including business intelligence, machine learning, data integration, security, and governance tools.
With advanced features such as simplicity, increased performance, high concurrency, and profitability, Snowflake is incomparable to other traditional data warehouse solutions.
Snowflake supports the storage of both structured and semi-structured data (such as JSON, Avro, ORC, Parquet, and XML data).
Snowflake automates cloud data management, security, governance, availability, and data resilience, resulting in reduced costs, no downtime, and better operational efficiency.
With it, users can rapidly query data from a database, without having an impact on the underlying dataset. This allows them to receive data closer to real-time.
Most DDL (Data Definition Language) and DML (Data Manipulation Language) commands in SQL are supported by the Snowflake data warehouse. Additionally, advanced DML, lateral views, transactions, stored procedures, etc., are also supported.

Q. Explain Snowflake Architecture?.

The Snowflake architecture is a hybrid of shared-disk (A common disk or storage device is shared by all computing nodes) and shared-nothing (Each computing node has a private memory and storage space) database architecture in order to combine the best of both. Snowflake utilizes a central data repository for persistent data, which is available to all compute nodes similar to a shared-disk architecture. But, equally, as with shared-nothing architectures, Snowflake uses massively parallel computing (MPP) clusters for query processing, in which each node stores part of the whole data set locally.
The Snowflake architecture is divided into three key layers as shown below:

Database Storage Layer: Once data has been loaded into Snowflake, this layer reorganizes that data into a specific format like columnar, compressed, and optimized format. The optimized data is stored in cloud storage.
Query Processing Layer: In the processing layer, queries are executed using virtual warehouses. Virtual warehouses are independent MPP (Massively Parallel Processing) compute clusters comprised of multiple compute nodes that Snowflake allocates from cloud providers. Due to the fact that virtual warehouses do not share their compute resources with each other, their performance is independent of each other.
Cloud Services Layer: It provides services to administer and manage a Snowflake data cloud, such as access control, authentication, metadata management, infrastructure management, query parsing, optimization, and many more.

Q. What do you mean by virtual warehouse?
A virtual warehouse is basically a collection of computing resources (like CPU, memory, Solid state drive, etc.) that customers can access to run queries, load data, and perform other DML (Data Manipulation Language) and SQL (Structured Query Language) operations.

For example, it provides memory, temporary storage, and CPU resources that can be used for DML operations and SQL execution. You can use this independent compute cluster at any time and then turn it off when not needed. You are charged (paid) for each virtual warehouse you run, their size, and how long they run. Virtual warehouses do not share their compute resources with each other, and therefore, their performance is independent of each other.

Q. Is snowflake OLTP (Online Transactional Processing) or OLAP (Online Analytical Processing)?
Snowflake is developed as an OLAP (Online Analytical Processing) database system, not as an OLTP (Online Transaction Processing) database system. In OLTP (Online Transaction Processing), data is collected, stored, and processed from real-time transactions, but in OLAP (Online Analytical Processing), complex queries are used to evaluate aggregated historical data from OLTP systems. Snowflake is not designed to handle much updating and inserting of small amounts of data like a transactional database would. Snowflake, for instance, cannot handle referential integrity because, even though it supports integrity and other constraints, they are not enforced except the NOT NULL constraint that is always enforced. Other constraints than NOT NULL are created as disabled constraints. However, depending on the use, we may also use it for online transaction processing (OLTP).

Q. Snowflake is what kind of database?
All Snowflake’s features are built on top of SQL (Structured Query Language) databases. The data in this relational database system is stored in columns and it is compatible with other tools, including Excel and Tableau. As a SQL database, Snowflake contains a query tool, supports multi-statement transactions, provides role-based security, etc.

Q. Explain in short about Snowflake Clustering?.
In Snowflake, clustering is a type of data partitioning, where unique cluster keys are specified for each table. Cluster keys are subsets of a table’s columns that are used to co-locate data within the table. These keys are appropriate for comprehensive tables. The process of managing clustered data in a table is known as re-clustering.

Q. Explain what is Snowflake Time travel and Data Retention Period?.
Time travel is a Snowflake feature that gives you access to historical data present in the Snowflake data warehouse. For example, suppose you accidentally delete a table named Employee. Using time travel, it is possible to go back five minutes in time to retrieve the data you lost. Data that has been altered or deleted can be accessed via Snowflake Time Travel at any point within a defined period. It is capable of performing the following tasks within a specific/defined period of time:

Analyzing data manipulations and usage over a specified period of time.
Restoring data-related objects (tables, schemas, and databases) that are accidentally lost (dropped)/
Backup and duplication of data (clones) at or before specific points in the past.
As soon as the defined/specific period of time (data retention period) expires, the data moves into Snowflake Fail-safe and these actions/tasks cannot be performed.

Q. What is Data Retention Period in Snowflake?
The data retention period is a critical component of Snowflake Time Travel. When data in a table is modified, such as when data is deleted or objects containing data are removed, Snowflake preserves the state of that data before it was updated. Data retention specifies how many days historical data will be preserved, enabling Time Travel operations (SELECT, CREATE, CLONE, UNDROP, etc.) to be performed on it.

All Snowflake accounts have a default retention period of 1 day (24 hours). By default, the data retention period for standard objectives is 1 day, while for enterprise editions and higher accounts, it is 0 to 90 days.

Q. Explain what is fail-safe?.
Snowflake offers a default 7-day period during which historical data can be retrieved as a fail-safe feature. Following the expiration of the Time Travel data retention period, the fail-safe default period begins. Data recovery through fail-safe is performed under best-effort conditions, and only after all other recovery options have been exhausted. Snowflake may use it to recover data that has been lost or damaged due to extreme operational failures. It may take several hours to several days for Fail-safe to complete data recovery.

Q. Can you explain how Snowflake differs from AWS (Amazon Web Service)?
Cloud-based data warehouse platforms like Snowflake and Amazon Redshift provide excellent performance, scalability, and business intelligence tools. In terms of core functionality, both platforms provide similar capabilities, such as relational management, security, scalability, cost efficiency, etc. There are, however, several differences between them, such as pricing, user experience and deployment options.

There is no maintenance required with Snowflake as it is a complete SaaS (Software as a Service) offering. In contrast, AWS Redshift clusters require manual maintenance.
The Snowflake security model uses always-on encryption to enforce strict security checks, while Redshift uses a flexible, customizable approach.
Storage and computation in Snowflake are completely independent, meaning the storage costs are approximately the same as those in S3. In contrast, AWS bypasses this problem with a Red Shift spectrum and lets you query data that is directly available in S3. Despite this, it is not flawless like Snowflake.

Q. Could AWS glue connect to Snowflake?
Yes, you can connect the Snowflake to AWS glue. AWS glue fits seamlessly into Snowflake as a data warehouse service and presents a comprehensive managed environment. Combining these two solutions makes data ingestion and transformation easier and more flexible.

Q. Explain how data compression works in Snowflake and write its advantages?.
An important aspect of data compression is the encoding, restructuring, or other modifications necessary to minimize its size. As soon as we input data into Snowflake, it is systematically compacted (compressed). Compressing and storing the data in Snowflake is achieved through modern data compression algorithms. What makes snowflake so great is that it charges customers by the size of their data after compression, not by the exact data.

Snowflake Compression has the following advantages:

Compression lowers storage costs compared with original cloud storage.
On-disk caches do not incur storage costs.
In general, data sharing and cloning involve no storage expenses.

Q. What ETL tools are you using in Snowflake?
Some of the best ETL tools of Snowflake’s are:

Etleap
Blendo
Matillion
Hevo Data
StreamSets
Apache Airflow

Q. What programming languages are supported in Snowflake?
Snowflake supports various programming languages such as Go, C, . NET, Java, Python, Node.js, etc.