Introduction to Snowflake Cloud Data Warehouse
Snowflake is a cloud-native analytical data warehouse whose distinctive features set it apart from
other cloud data warehouse solutions provided by the major providers. Its multi-cluster, shared-data
architecture allows it to deliver a unified and seamless data experience. Snowflake runs on the AWS,
Azure, and Google cloud platforms and combines the features of conventional databases with several
new and innovative capabilities.
Snowflake is delivered as SaaS and comes with a plethora of built-in features. Some of the major ones
are:
- Cloud agnostic: supported on all three dominant cloud providers – Azure, AWS, and GCP.
- It supports the auto-scaling of resources.
- Auto-resume/suspend clusters.
- Zero maintenance – users do not need to worry about automated updates, patches, and load balancing.
- Secure and encrypted data: supports RBAC, MFA, and SSO, and is HIPAA compliant and SOC 2 certified.
In this article, we will explore some of the unique Snowflake features that make it an excellent choice for
cloud data warehousing.
Performance Management
Snowflake delivers great performance without any manual performance optimization. This is an excellent
offering because traditional data warehouses require considerable effort to fine-tune their databases
and objects. Snowflake has almost eliminated that work through the following features:
Workload Distribution
In traditional data warehouses, ETL/ELT processes are usually conducted during non-business hours when BI
reports or cubes are not refreshed to avoid concurrent workload.
In Snowflake, concurrency is no longer an issue thanks to its multi-cluster design. One of the
primary advantages of this design is the ability to separate workloads so that each is processed
against its own compute cluster, referred to as a virtual warehouse.
Queries from one virtual warehouse do not affect those from another. Having separate virtual warehouses for
users and apps allows ETL/ELT processing, BI, and reporting to execute without competing for resources.
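The separation described above can be sketched as two dedicated warehouses, one per workload. The warehouse names, sizes, and suspend timeouts below are illustrative assumptions, not from the original:

```sql
-- Dedicated compute per workload; names and sizes are illustrative
create warehouse etl_wh with warehouse_size = 'MEDIUM' auto_suspend = 300 auto_resume = true;
create warehouse bi_wh  with warehouse_size = 'SMALL'  auto_suspend = 60  auto_resume = true;

-- Each session selects its own warehouse, so ETL and BI never compete for compute
use warehouse etl_wh;  -- ETL/ELT session
use warehouse bi_wh;   -- BI/reporting session
```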
Snowflake Micro-partitions
Optimizing tables with indexes, partitions, statistics, etc., requires substantial expertise and high
effort and is also time-consuming. Snowflake makes this work almost non-existent by auto-optimizing
tables through a feature called micro-partitioning. Consider the following example:
The table comprises 24 rows distributed across four micro-partitions, with the rows distributed evenly
across each micro-partition. The data is sorted and saved by column within each micro-partition, allowing
Snowflake to conduct the following actions for table queries:
- Remove any micro-partitions that are not required for the query.
- Inside the remaining micro-partitions, prune by column.
Snowflake automatically divides tables by grouping rows into individual micro-partitions of 50–500 MB of
data. Snowflake micro-partitions create more uniformly sized partitions that limit data skew and make it
easy to prune large tables at an extremely granular level. In other words, queries can use micro-partition
metadata to determine which partitions are relevant for a query so that only those are scanned. Snowflake
goes one step further by limiting partition scanning to only the columns filtered in a query.
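A minimal sketch of how a query benefits from this pruning, assuming an orders table with a date column (table and column names are illustrative):

```sql
-- Micro-partition metadata (min/max values per column) lets Snowflake skip
-- partitions that cannot contain June 2023 rows, and within the remaining
-- partitions read only the referenced columns
select order_id, amount
from orders
where order_date between '2023-06-01' and '2023-06-30';
```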
Clustering in Snowflake
Query performance suffers when a table's data is not well sorted. A Snowflake clustering key is a
subset of columns that can be used to co-locate and sort the data contained in a table. Snowflake
collects clustering metadata for each micro-partition created during data loading, and then uses that
metadata to avoid needless micro-partition scanning. If queries on very large tables take longer than
expected, clustering keys can be defined manually.
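Defining and inspecting a clustering key can be sketched as follows (the table and column names are illustrative assumptions):

```sql
-- Cluster a large table on columns commonly used in query filters
alter table orders cluster by (order_date, region);

-- Report how well the table's micro-partitions are clustered on those columns
select system$clustering_information('orders', '(order_date, region)');
```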
Operational features
Snowflake offers a few features that reduce many development efforts needed for key operations that are
often required. These Snowflake features come in handy, especially in cases of urgency. Below are some of
the unique features in that regard:
Snowflake Zero Copy Clones
A handy and unique feature of Snowflake is zero-copy cloning. In traditional data warehousing
systems, cloning data is very tedious: duplicating an existing database requires an entirely new,
distinct environment and loading the data into it, which also incurs additional storage cost. That
makes it impractical for frequent ad hoc analysis, spinning up different environments, or testing
modifications. With Snowflake's zero-copy cloning feature, you can duplicate any database or table
almost instantly without creating a new physical copy. Snowflake accomplishes this by reusing the
existing data files in the backend while keeping track of changes made to the clone in its metadata
store.
Sample code for cloning a table:
create table Orders_Clone clone Orders;
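Cloning is not limited to tables; a database can be cloned the same way, and a clone can be combined with Time Travel to capture a past state. The object names and the offset below are illustrative:

```sql
-- Zero-copy clone of an entire database
create database dev_db clone prod_db;

-- Clone a table as it existed 24 hours ago (86400 seconds), via Time Travel
create table orders_yesterday clone orders at (offset => -86400);
```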
Snowflake SWAP Table Command
Swapping tables is a capability similar to cloning. It is very helpful for promoting development
databases or tables to production. There are several ways to accomplish this, such as simply making a
copy of the development table and renaming it to the production table. However, Snowflake offers a
much simpler method: swapping the tables. In essence, all it does is switch the metadata.
Sample code for swapping a table:
alter table if exists TAB1 swap with TAB2;
Snowflake UNDROP Command
Dropping the wrong table is a nightmare for many developers. A great feature in Snowflake is the
UNDROP command, which makes recovering from such mistakes easy; otherwise, you would spend
considerable time restoring backups and data. With the UNDROP command, Snowflake can restore objects
instantly as long as you are still within the recovery window.
Sample code for undropping a table:
undrop table TABLE_NAME;
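The recovery window mentioned above is the object's Time Travel retention period, which is configurable. A hedged sketch of setting and using it (the table name and the 7-day value are illustrative):

```sql
-- Keep dropped or changed data recoverable for 7 days
-- (Enterprise edition supports retention of up to 90 days)
alter table orders set data_retention_time_in_days = 7;

drop table orders;
undrop table orders;  -- succeeds while still inside the retention window
```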
Share and Collaborate
Snowflake offers a unique feature that allows data owners to share their data with partners or other
consumers without creating new copies of it. Because no data movement is involved and no additional
storage is consumed, data consumers pay only for the compute used to process the shared data. It also
removes the dependency on email or FTP services.
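The provider-side setup for such a share can be sketched as follows (the account, database, and share names are illustrative assumptions):

```sql
-- Create a share and grant access to the objects being shared
create share sales_share;
grant usage  on database sales_db               to share sales_share;
grant usage  on schema   sales_db.public        to share sales_share;
grant select on table    sales_db.public.orders to share sales_share;

-- Make the share visible to a consumer account
alter share sales_share add accounts = consumer_account;
```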
Snowflake Data Ingestion Features
Snowflake does not just support data ingestion; it provides powerful capabilities for quickly
building processes that are otherwise very complex. Below are some of these features:
Continuous Data Ingestion by Snowflake Snowpipe
Snowpipe, Snowflake's continuous data pipeline service, is very useful for loading new data in
micro-batches as soon as it becomes available in supported external storage. It removes the hurdle of
loading data in large batches; instead, changes are reflected within a few minutes. Snowpipe works
via event-based triggers from external stages that notify it when new files are available for import.
These files are then placed in a queue from which they are loaded into Snowflake. Snowpipe can also
be called through its REST endpoints, making it easy to use from custom applications.
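A minimal Snowpipe definition, assuming an external stage and target table already exist (all names and the file format are illustrative):

```sql
-- Auto-ingest pipe: cloud storage events notify Snowpipe when new files land,
-- and each file is queued and loaded via the COPY statement below
create pipe orders_pipe auto_ingest = true as
  copy into orders
  from @orders_stage
  file_format = (type = 'CSV' skip_header = 1);
```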
Handling Semi-structured/Hierarchical Data
Using a schema-on-read data type called VARIANT, Snowflake's data architecture enables storing
structured and semi-structured data in the same location. Snowflake automatically parses the imported
data, extracts the attributes, and saves the data in a columnar format. Consequently, separate
pipelines for data extraction are no longer required.
VARIANT supports importing data in JSON, Avro, ORC, or Parquet format; Snowflake can build the
hierarchy for you and store it in a VARIANT column, or you can construct the hierarchy manually.
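A small sketch of storing and querying JSON in a VARIANT column (the table, column, and path names are illustrative):

```sql
-- Load raw JSON into a VARIANT column
create table events (payload variant);
insert into events
  select parse_json('{"user": {"id": 42, "name": "ada"}, "tags": ["a", "b"]}');

-- Path notation reads nested attributes; FLATTEN expands the array into rows
select payload:user.id::int as user_id,
       f.value::string      as tag
from events,
     lateral flatten(input => payload:tags) f;
```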
Snowflake CDC Streams
Snowflake offers a stream object that captures DML changes (inserts, updates, and deletes) made to a
table so that downstream processing can act on them. This makes it easy to query and consume only the
changed data, a capability also known as Snowflake change data capture (CDC).
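Creating and consuming a stream can be sketched as follows (the table names are illustrative assumptions):

```sql
-- Record inserts, updates, and deletes made to the orders table
create stream orders_stream on table orders;

-- Querying the stream returns the changed rows plus metadata columns
-- such as METADATA$ACTION (INSERT/DELETE)
select * from orders_stream;

-- Consuming the stream in a DML statement advances its offset,
-- so the same changes are not processed twice
insert into orders_audit select * from orders_stream;
```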
Conclusion
Snowflake data warehouse’s cloud-native architecture, automatic scalability, and comprehensive
security measures make it a top solution for data warehousing. Its support for multiple data
integration tools and languages, including SQL, Python, and R, enables easy data analysis. With
Snowflake’s powerful analytical capabilities, organizations can derive valuable insights from their
data and make informed decisions. Overall, Snowflake’s unique features make it a standout choice for
businesses of all sizes. Explore Snowflake today to unleash the full potential of your data and stay
ahead of the competition.