Snowflake is a next-generation SaaS data warehousing solution that operates in the cloud!
Engineered from the ground up for the cloud, Snowflake takes full advantage of the elasticity that the cloud provides.
Harnessing that elasticity, Snowflake offers near-instant, on-demand scalability of compute and storage, making it a compelling data warehouse solution. Cloud elasticity is very much at the heart of Snowflake, making its architecture and value proposition difficult to compete with in the market.
From an end-user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it.
In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.
For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Learn about Snowflake and how it can provision cloud-hosted data warehouses
- Learn how to administer a Snowflake data warehouse
- Learn how to scale Snowflake data warehouses instantly and on-demand
- Learn how to use Snowflake to perform analytics on datasets at petabyte scale and beyond
Intended Audience
- Anyone interested in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud
Prerequisites
To get the most from this course, it would help to have:
- Basic cloud and SaaS knowledge
- Basic DBA knowledge
- Basic SQL knowledge
Welcome back. In this lesson, I'll review and focus on the concept of a Snowflake stage and how stages are used to facilitate the process of data loading. Snowflake provides several different stage types, and knowing which one to use for which use case will help you correctly and efficiently load data into your Snowflake environment. Let's begin.
When it comes to moving data into and out of Snowflake, you'll need to become familiar with the concept of staging. Staging is the process of preparing data to be moved into and out of Snowflake. To help with this process, Snowflake utilizes the concept of a stage. A stage in Snowflake is an intermediate space where data files are stored. Later on, you use the Snowflake SQL copy command to reference the stage by name, such that when the copy command is executed, data stored within the stage-hosted files is pulled in, parsed, and processed into a named Snowflake table.
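For example, a copy command referencing a stage might look like the following sketch, where the stage and table names are hypothetical:

    -- Load all files held in a stage into a target table, parsing them as CSV
    COPY INTO my_table
      FROM @my_stage
      FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);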
It's also worth noting that stages can be used for the inverse data flow, that is, when you need to export data out of Snowflake. Snowflake provides several different stage types. Generally speaking, if the data to be loaded is stored in cloud storage, then an external stage is used. Alternatively, if your data is located within Snowflake itself, then an internal stage is used. Within Snowflake, stages are objects and, as such, are created within a schema inside a database and are associated with a role.
External stages are storage locations external to Snowflake, typically in a cloud storage service, for example, Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage. Establishing an external stage requires you to reference the external cloud storage location, with the stage acting kind of like a symlink. As you might expect, internal stages are exactly that, internal; they are located within Snowflake itself. Snowflake provides several different internal stage types: user stages, table stages, and internal named stages. Let's now review each of these individually.
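As a quick orientation before we do, each stage type is referenced in SQL with its own prefix; the table and stage names below are placeholders:

    LIST @~;              -- user stage (one per user)
    LIST @%my_table;      -- table stage (one per table)
    LIST @my_named_stage; -- internal named stage, created explicitly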
User stages are allocated by default to each and every Snowflake user. A user stage is unique to the user. Users can only access their own stage and not anyone else's. User stage objects cannot be altered or dropped. This stage is a convenient option if your files will only be accessed by a single user but need to be copied into multiple tables. For this staging option to be useful when performing table inserts, the Snowflake user must have insert rights on the table itself.
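To sketch how that might look in practice (the file paths and table names here are illustrative), a user could upload a file into their own stage and copy it into multiple tables:

    -- Upload a local file into the current user's stage (run from a client such as SnowSQL)
    PUT file:///tmp/sales.csv @~/staged/;
    -- Copy the same staged file into two different tables
    COPY INTO sales_raw FROM @~/staged/ FILE_FORMAT = (TYPE = 'CSV');
    COPY INTO sales_archive FROM @~/staged/ FILE_FORMAT = (TYPE = 'CSV');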
Table stages are allocated by default to each and every created table object. This type of stage is particularly useful for loading data into the implicitly connected table. Table stages share the same name as the table, and again, cannot be altered or dropped. This stage is a convenient option if your files need to be accessible to multiple users and only need to be copied into a single table. To be able to stage files in a table stage, you must be the table owner.
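A hedged sketch of a table stage in action, using a hypothetical orders table:

    -- Upload a local file into the stage implicitly tied to the orders table
    PUT file:///tmp/orders.csv @%orders;
    -- A table stage can only feed its own table
    COPY INTO orders FROM @%orders FILE_FORMAT = (TYPE = 'CSV');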
Internal named stages are named database objects that provide the greatest degree of flexibility for data loading, but require more effort to set up and manage. If your data loading requirement cannot be fulfilled by either user or table stages, then consider using an internal named stage. Unlike user and table stages, internal named stages are not created automatically; instead, you must create them manually. In doing so, they become database objects that can have security permissions applied to them. In the example shown here, the put command is used to first load two data files hosted on the local file system into a user stage. The copy command is next used to move the staged data into respective tables. Later on, data processing of sorts is performed across both tables, resulting in the population of a third table.
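Since internal named stages are securable objects, creating one and granting another role access might look like this sketch; the stage and role names are assumptions:

    -- Create an internal named stage and grant another role access to it
    CREATE STAGE my_named_stage;
    GRANT READ, WRITE ON STAGE my_named_stage TO ROLE analyst_role;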
Another copy command is used to export the table data out into a table stage. And finally, a get command is used to export the table stage data out onto the local file system. This type of setup is possible because the user performing it has insert permissions on the first two tables, and the data being unloaded out of the final table only needs to go to one location. The create stage command displayed here demonstrates how to create an external stage for an AWS S3 bucket. Within this example, AWS credentials are used to provide authenticated access to the S3 bucket. The create stage command displayed next demonstrates how to create an internal named stage using the default CSV file format and server-side encryption.
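To make those two create stage statements concrete, here is a rough sketch; the bucket URL, credentials, and stage names are placeholders, not the exact commands from the slides:

    -- External stage pointing at an S3 bucket, authenticated with AWS keys
    CREATE STAGE my_s3_stage
      URL = 's3://my-bucket/data/'
      CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');

    -- Internal named stage using the default CSV file format and server-side encryption
    CREATE STAGE my_internal_stage
      FILE_FORMAT = (TYPE = 'CSV')
      ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');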
When working with internal stages, data is loaded into an internal stage using the put command, after which the data can then be loaded into a table using the copy into command. Similarly, if we need to unload data from a table, it first has to be copied into a stage using the copy into command, and then from the stage, it can be exported to the local file system using the get command. Note: both the put and get commands are used with internal stages only. They are not used or required when working with external stages.
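Putting the load and unload flows side by side as a sketch (the file paths, stage, and table names are illustrative):

    -- Load: local file -> internal stage -> table
    PUT file:///tmp/events.csv @my_internal_stage;
    COPY INTO events FROM @my_internal_stage FILE_FORMAT = (TYPE = 'CSV');

    -- Unload: table -> internal stage -> local file system
    COPY INTO @my_internal_stage/export/ FROM events;
    GET @my_internal_stage/export/ file:///tmp/export/;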
Snowflake supports multiple file formats for loading data, including: CSV, JSON, Avro, ORC, Parquet, and XML. With your data now staged, the various copy into commands displayed on both this slide and the next slide demonstrate the various options available to write the data into a Snowflake database table. On this slide, copy into commands are provided for the CSV, JSON, and XML file formats. Likewise, on this next slide, copy into commands are provided for the Avro, ORC, and Parquet file formats. To summarize, when it comes to data loading and dealing with various file formats and semi-structured and structured data, Snowflake has done an impressive job of providing flexible, easy-to-use, yet powerful features for moving data into and out of Snowflake.
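To give a flavor of those slides, the following sketch shows one copy into command per format family; the table and stage names are hypothetical, and the semi-structured examples assume target tables with a single VARIANT column:

    -- Structured: CSV
    COPY INTO csv_table FROM @my_stage
      FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);

    -- Semi-structured: JSON, loaded into a single VARIANT column
    COPY INTO json_table FROM @my_stage FILE_FORMAT = (TYPE = 'JSON');

    -- Semi-structured: Parquet, likewise loaded as VARIANT
    COPY INTO parquet_table FROM @my_stage FILE_FORMAT = (TYPE = 'PARQUET');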
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).