Snowflake Architecture
Difficulty: Beginner
Duration: 1h 33m
Students: 923
Ratings: 4.5/5
Description

Snowflake is an insanely cool, next-generation SaaS data warehousing solution that operates in the cloud!

Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.

Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.

From an end user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale data solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it.

In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.

For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.

Learning Objectives

  • Learn about Snowflake and how it can provision cloud-hosted data warehouses
  • Learn how to administer a Snowflake data warehouse
  • Learn how to scale Snowflake data warehouses instantly and on-demand
  • Learn how to use Snowflake to perform analytics on datasets at petabyte scale and beyond

Intended Audience

  • Anyone interested in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud

Prerequisites

To get the most from this course, it would help to have a basic understanding of:

  • Cloud and SaaS concepts
  • Database administration
  • SQL
Transcript

Welcome back. In this lesson, I'll dive deeper into each of the three key architectural layers of Snowflake. After this lesson, you'll be able to clearly articulate what each layer is responsible for and how they work in combination with each other. Let's begin. As previously mentioned, the Snowflake architecture consists of three core layers: cloud services, query processing, and database storage. When you create a new Snowflake account, Snowflake automatically sets up and deploys the cloud services layer. This acts as an orchestrator for the remainder of the Snowflake system that gets deployed into your chosen cloud provider platform.

In terms of management and access, all components deployed into this layer are managed solely by Snowflake. You are not permitted or granted access to any of the components within this layer, such as the infrastructure manager, optimizer, metadata manager, etc. The cloud services layer is responsible for providing authentication and authorization controls. Beyond this, other core services are provided that collectively act as a central nervous system for the other parts of Snowflake. For example, this layer is responsible for generating and managing metadata, coordinating transactions, performing downstream infrastructure management, query planning and optimization, security management, and orchestrating client sessions.
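As a small, hedged illustration of this layer at work (my_db and the object names below are hypothetical, not from the course): metadata commands like these are answered by the cloud services layer's metadata store and don't even require a running virtual warehouse.

    -- Served entirely by the cloud services layer's metadata store;
    -- no virtual warehouse needs to be running for these commands
    SHOW WAREHOUSES;
    SHOW TABLES IN DATABASE my_db;
    DESCRIBE TABLE my_db.public.sales;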

Physically, this layer is deployed across one or several instances, spread across multiple regions within the chosen cloud provider, thereby providing availability and protection from cloud or regional events. Again, unfortunately, access to this layer is not given; this is for the protection and integrity of the Snowflake system. Moving down the stack, the query processing layer provides the raw compute power for all data management operations and is exposed to the end user as configurable virtual warehouses. This layer provides the end user with the option of spinning up on-demand compute clusters as and when required.
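As a minimal sketch (the warehouse name reporting_wh and all settings below are hypothetical), spinning up such a compute cluster is a single SQL statement:

    -- Create an on-demand virtual warehouse; names and settings are illustrative
    CREATE WAREHOUSE reporting_wh
      WAREHOUSE_SIZE = 'XSMALL'      -- smallest of the predetermined sizes
      AUTO_SUSPEND = 300             -- suspend after 300 seconds of inactivity
      AUTO_RESUME = TRUE             -- wake up automatically when a query arrives
      INITIALLY_SUSPENDED = TRUE;    -- don't consume credits until first use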

Equally important, virtual warehouses can be suspended either manually or automatically after a configurable period of inactivity, thereby ensuring that you're not incurring expense for idle compute resources. Virtual warehouses come in a set of predetermined sizes, with each consecutive size roughly doubling the resources and processing capability of the previous one. Launching a virtual warehouse typically happens in sub-second time; it is very quick. Once up and running, billing begins on a per-second basis, with a minimum of 60 seconds of billed time per launch or resume.
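Continuing the same hypothetical sketch, suspending, resizing, and resuming that warehouse are one-line statements:

    -- Suspend manually to stop per-second billing right away
    ALTER WAREHOUSE reporting_wh SUSPEND;

    -- Step the size up; each consecutive size roughly doubles the compute
    ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'MEDIUM';

    -- Resume explicitly (or simply rely on AUTO_RESUME)
    ALTER WAREHOUSE reporting_wh RESUME;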

Each virtual warehouse runs independently of any other, which is useful for preventing query contention; that is, different queries can be assigned to different virtual warehouses based on their execution requirements. Altogether, this type of system flexibility within Snowflake can accommodate, for example, Joe from finance executing his long-running end-of-month reports without impacting Bob from the DevOps team, who needs to run various quick-fire ad hoc queries frequently.
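That scenario might look like the following hedged sketch, where finance_wh and devops_wh are hypothetical warehouse names:

    -- Two independent warehouses, so the workloads never contend for compute
    CREATE WAREHOUSE finance_wh WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;
    CREATE WAREHOUSE devops_wh  WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;

    -- In Joe's session: month-end reporting on the large warehouse
    USE WAREHOUSE finance_wh;

    -- In Bob's session: quick-fire ad hoc queries on his own small warehouse
    USE WAREHOUSE devops_wh;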

In terms of data persistence, this is achieved within the database storage layer, which has been optimized for read operations. All data imported into your Snowflake account is stored within this layer, where it is persisted in an internal, optimized, compressed columnar format. This micro-partition format is proprietary to Snowflake and therefore can't be read directly; it can only be accessed through a virtual warehouse. Depending on the cloud platform chosen at account provisioning time, these so-called micro-partitions are stored on Amazon S3 if AWS was chosen, Google Cloud Storage if GCP was chosen, or Microsoft Azure Blob Storage if Azure was chosen. The availability and durability built directly into these cloud platform storage services is passed through to Snowflake, meaning your data is always highly durable and available.
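To round out the picture, here's a minimal sketch of loading and querying data; the table, stage, and warehouse names are hypothetical, and the running virtual warehouse is what actually reads the underlying micro-partitions on your behalf:

    USE WAREHOUSE reporting_wh;    -- all table reads go through a warehouse

    CREATE TABLE sales (
      sale_id  NUMBER,
      amount   NUMBER(10,2),
      sold_at  TIMESTAMP_NTZ
    );

    -- Load staged files; Snowflake persists them as compressed,
    -- columnar micro-partitions on S3, GCS, or Azure Blob Storage
    COPY INTO sales FROM @my_stage FILE_FORMAT = (TYPE = 'CSV');

    SELECT DATE_TRUNC('month', sold_at) AS month,
           SUM(amount)                  AS total
    FROM sales
    GROUP BY 1;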

About the Author

Students: 142,970
Labs: 69
Courses: 109
Learning Paths: 209

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy, where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).