Snowflake is an insanely cool next generation SaaS data warehousing solution that operates in the cloud!
Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.
Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.
From an end user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte data scaled solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and even more importantly, the analytics you derive from
In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.
For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Learn about Snowflake and how it can provision cloud-hosted data warehouses
- Learn how to administrate a Snowflake data warehouse
- Learn how to scale Snowflake data warehouses instantly and on-demand
- Learn how to use Snowflake to perform analytics on petabyte scale and beyond datasets
Intended Audience
- Anyone interested
in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud
Prerequisites
To get the most from this course, it would help to have a basic understanding of:
Basic Cloud and SaaS knowledge
- Basic DBA knowledge
- Basic SQL knowledge
Welcome back. In this lesson, I'll review each of the key database objects that are used to store and organize your data once ingested into Snowflake. Having a good understanding of each of these helps to ensure that your data is secured the way you need it to be and optimally structured for fast and performant queries or data retrieval. Let's begin. From the top-down, Snowflake uses the following concepts to organize your data. Databases. A database is the top level container. Within a single Snowflake account, multiple databases can be created. Each of the following objects are then created within a specific database.
Schemas. A database schema is used to petition of various tables and views within a single database. Read and write permissions can be set on various schemas, and these can then be inherited down across all contained tables and all views.
Tables. Tables are used to store actual data. Within Snowflake, there are different types of tables such as permanent, transient, temporary, and external. I'll go into these in more detail as we continue along.
And views. Views are SQL select queries that return a subset of table data. There are several different view types such as non-materialized views or otherwise known as regular views, materialized views, and secure views. I'll dive deeper into these different view types later on in this lesson.
For now, consider that once your data has been correctly ingested and organized within Snowflake, various SQL queries can then be created using dot notation to specify either the database schema and/or table or view name, allowing you to focus in on and target specific subsets of your data hosted within Snowflake. As mentioned previously, when it comes to working with tables, Snowflake provides various table types for different use cases. Tables are database objects which hold data, and understanding which Snowflake table type to use and when is important to balance out performance, running cost, and data availability, etc. The matrix displayed here, documents, the various table type characteristics. Let's now review each of the different table types and compare their features.
A permanent table is the default type of table created within Snowflake when using the create table SQL statement and without other table modifier keywords. The example provided here on this slide creates a permanent table to store course-related data. Permanent tables are characterized by having the following features. They either have 0-1 or 0-90 days of time travel depending on the version of Snowflake you are using, standard or enterprise respectively. I'll review time travel in a separate lesson later on. They also come with seven days of failsafe access. And in terms of cloning support, you can clone a permanent table into another permanent or transient or temporary table.
Transient tables are similar to permanent tables, with the key difference being that they have a shorter time travel period 0-1 days, and that they do not come with support for failsafe. Transient tables persist until explicitly dropped and can be accessed by all users who have been assigned the necessary privileges. Transient tables do not incur the same failsafe expense that permanent tables incur, therefore they are moderately cheaper to operate and run.
Cloning operations are permitted on transient tables such that you can create another temporary or transient table as a clone. However, you cannot create a permanent table as a clone. Temporary tables are purposely designed and used to store non-permanent data and for which only persist during the current session. Once the user's current session has ended, all data held within the temporary table is removed and unrecoverable. Temporary tables have 0-1 days worth of time travel and no failsafe.
A temporary table can be cloned into another temporary or transient table, but not into a permanent table, as per the same transient table restriction just mentioned. The last table type to review is the external table type. External tables are used to pull in and store data that originates externally to the Snowflake system. To populate an external table, you must first set up and establish a Snowflake stage. I'll document stages later in the course within the data loading lesson. But, for now, just consider staging as a component of the data loading process, helping you to get your external data into Snowflake.
External tables store file level metadata about the data files themselves, enabling querying of data stored in files in an external stage as if it were inside a database. External tables permit only read operations on the data held within them. Deletes and updates are not permitted. However, external table joins two other tables for querying purposes are permitted. Finally, external tables do not support time travel, failsafe, and cloning.
Moving onto views, Snowflake supports several different types of views; regular, materialized, and secure. Each different type of view has its own feature set, which again, should be understood so that the right type of view is chosen for the right type of data requirement. A regular view or non-materialized view is the default type of view used often within Snowflake. When querying a regular view, the results derived are not cached for future use. Therefore, from a performance point of view, they are slower when compared with materialized views, which do consequently cache data, as you'll hear of more in the next slide. In the example provided here, the top 10 courses view performs a table join between the courses table and the authors table to pull up the top 10 by votes, courses. Each time this view is queried upon, the underlying tables will be rescanned.
Materialized views are only available in the enterprise edition of Snowflake. They are not available in the standard edition. The main distinguishing feature of a materialized view is that it is a pre-computed view. What this means is that, periodically, the underlying data from the real tables is queried and computed and then cached, so that the materialized view itself gains a performance boost for reads. This type of view is highly desirable when performing complex and expensive queries on large tables in your Snowflake data warehouse, often a requirement when running enterprise, business intelligence reports etc.
Materialized views require additional storage space and active maintenance, and therefore incur additional running costs; something to consider. The example shown here shows how to create a materialized view for the top 10 courses. The final view is the so-called secure view. Secure views are used to address and implement security and privacy needs, upping the level of control over possible exposure of either the schema or data that the secure view sits across.
Creating a secured view helps you to safeguard and lock down the internal data structures from those that shouldn't be able to access them. In the example provided here, a secure view named author is created. This view can only return data about the currently logged in user as per the where clause, which filters on the current user. As earlier mentioned, Snowflake can ingest both structured and semi-structured data. This is ideal for data sets that are formatted in not only tabular form or CSV form, but also for those that are stored in formats, such as JSON, Avro, ORC being Optimized Row Columnar, Parquet, and/or XML. Formats for which are often used for delivery of data generated by potentially IoT devices, mobile apps, and/or APIs.
Regardless of format type, Snowflake has the capabilities to ingest and make the collective data available for deriving business insights and intel. Now, when it comes to storing semi-structured data, Snowflake provides the variant data type together with the object and array data types. The variant data type is a kind of catchall universal data type, meaning that it can hold and store values of just about anything, albeit with an upper limit of 16 megabytes uncompressed in terms of sizing. In the example provided here, the courses table contains a course column of type variant. The following insert statement is used to populate the course column with JSON-based data which is first parsed using the parse_json function, which takes a JSON formatted string and then returns a JSON document, which is stored internally as a variant.
Executing a select statement on the variant stored data returns it accordingly as shown here. The object data type can be used to store collections of key value pairs, where the provided key must be a non-empty string and its mapped value is a value of variant type. In the example provided here, the courses table contains a lessons column of type object. The following insert statement is used to populate the lessons column with two lessons L1 and L2. Data representing the lessons is then provided.
All together, the entire data structure is provided as a JSON string, which is then passed into a JSON document using the parse_json function. Again, when reading the data back out of the courses table, the lessons column is returned in JSON format, displaying the collection of key value pairs held within. The final data type that helps with storing semi-structured data is the array data type. The array data type can be used to store list-like indexed data, consisting of variant values. In the example provided here, the courses table contains a lessons column of type array. The following three insert statements demonstrate how to populate the lessons column with various data types; integers, strings, and JSON respectively. Performing a select across the courses table with the array-based data previously populated returns in the following format as displayed here.
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).