Snowflake is an insanely cool next generation SaaS data warehousing solution that operates in the cloud!
Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.
Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.
From an end user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale data solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it.
In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.
For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Learn about Snowflake and how it can provision cloud-hosted data warehouses
- Learn how to administer a Snowflake data warehouse
- Learn how to scale Snowflake data warehouses instantly and on demand
- Learn how to use Snowflake to perform analytics on datasets at petabyte scale and beyond
Intended Audience
- Anyone interested in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud
Prerequisites
To get the most from this course, it would help to have a basic understanding of:
- Basic Cloud and SaaS knowledge
- Basic DBA knowledge
- Basic SQL knowledge
Welcome back. In this lesson, I'll provide you with a high-level overview of Snowflake, highlighting the most important features that differentiate it from its competitors. After this lesson, you'll be able to answer questions like: What is Snowflake? What is it useful for, and where does it excel? And, in simplified form, how does it work? Let's begin.
Snowflake is an insanely cool next generation SaaS, or Software as a Service, data warehousing solution that operates in the cloud. Engineered from the ground up, it takes advantage of the elasticity that the cloud provides and is truly revolutionary in every aspect. Harnessing the power of the cloud, it has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake, making its unique architecture and value proposition difficult to compete with in the wider market.
From an end user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale data solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it. As you'll soon see, Snowflake's architecture decouples compute from storage, both of which run elastically in the cloud and both of which can be scaled up independently. This type of architecture will more than likely accommodate the scaling needs of any business venturing into the data warehouse space.
New Snowflake accounts can be configured to run on any of the top three public cloud providers: AWS, Microsoft Azure, and Google Cloud Platform. Let's now explore some of the reasons that might make you consider Snowflake for your business. Snowflake is super easy to learn and use, with an almost zero admin footprint. From an infrastructure perspective, there is no requirement for you to worry about issues such as installing storage disks, configuring them in a RAID pattern, etc. Instead, since Snowflake is built on the cloud, it automatically scales out storage as and when required. In fact, all compute and storage requirements are taken care of behind the scenes by Snowflake, meaning you don't need to be involved, nor should you be involved, in maintaining the underlying hardware.
When it comes to speed, Snowflake has high octane performance. Regardless of how much data is fed into Snowflake, query time performance remains at the top end. Query times are unnaturally fast and when required can be made even faster by dialing up the speed. This is accomplished by using virtual warehouses. A virtual warehouse is basically a cluster of compute resources which have access to the same data held within the Snowflake data storage layer. Multiple virtual warehouses of differing performance can be provisioned within Snowflake.
Each virtual warehouse can then be assigned to a particular user or team within the business, each running independently of the others. This style of configuration and setup ensures that the right level of processing performance is always available as and when required. Virtual warehouses also enable simple and complex SQL queries to be processed concurrently, executing independently of each other but on the same data, enabling different users and use cases to query the data. Administration of Snowflake itself is typically performed through its simple and intuitive web-based admin console.
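To make the idea of per-team virtual warehouses a little more concrete, here is a minimal sketch that only builds the SQL text for creating and resizing a warehouse; it does not connect to Snowflake. The helper functions and the warehouse name BI_WH are hypothetical, and the size list is a deliberately small subset of what Snowflake accepts.

```python
# Sketch only: builds Snowflake SQL text, it does not connect to Snowflake.
# The helper names and the warehouse name BI_WH are hypothetical.

# A subset of the warehouse sizes Snowflake accepts (larger sizes exist).
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def _check_size(size: str) -> str:
    """Normalize a size to upper case and reject anything outside the subset."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size}")
    return size


def create_warehouse_sql(name: str, size: str = "XSMALL", auto_suspend_secs: int = 60) -> str:
    """Return a CREATE WAREHOUSE statement for one team's compute cluster."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{_check_size(size)}' "
        f"AUTO_SUSPEND = {auto_suspend_secs} "
        f"AUTO_RESUME = TRUE"
    )


def resize_warehouse_sql(name: str, size: str) -> str:
    """Return an ALTER WAREHOUSE statement that dials performance up or down."""
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{_check_size(size)}'"


print(create_warehouse_sql("BI_WH", "small"))
print(resize_warehouse_sql("BI_WH", "large"))
```

Note that the warehouse name is interpolated as-is, so in real use it should come from trusted configuration rather than user input.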
Having established a Snowflake database, you can then easily share parts of it with other Snowflake users, even those external to your own Snowflake account. Likewise, this also means that you too can import data from externally shared Snowflake data sources. This type of data sharing is more than useful for collaboration purposes. Finally, Snowflake has been purposely designed to ingest, process, and host data of varying formats in both structured and semi-structured form. For example, it's not uncommon to have tabular data stored alongside JSON data or alongside Parquet data, all within the same SQL table. This then enables you as a data analyst to run SQL queries across all of the different data formats at the same time.
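As a rough illustration of querying tabular and JSON data together, the sketch below builds a single Snowflake SQL statement that reads a regular column and two fields from a semi-structured column using Snowflake's colon path and double-colon cast notation. The table orders, the columns order_id and payload, and the helper names are all hypothetical; the code only produces SQL text.

```python
# Sketch only: produces Snowflake SQL text for a hypothetical table "orders"
# with a regular column "order_id" and a semi-structured column "payload".


def variant_field(column: str, path: str, cast: str) -> str:
    """Build a path expression such as payload:customer.name::string."""
    return f"{column}:{path}::{cast}"


def build_orders_query(min_amount: int) -> str:
    """Combine a tabular column with fields extracted from JSON in one query."""
    customer = variant_field("payload", "customer.name", "string")
    amount = variant_field("payload", "amount", "number")
    return (
        f"SELECT order_id, {customer} AS customer_name, {amount} AS amount "
        f"FROM orders WHERE {amount} > {int(min_amount)}"
    )


print(build_orders_query(100))
```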
Snowflake, being provided as a SaaS solution, provides differing methods of connectivity. The default method of connection, if it can be called that, is via the Snowflake web admin console. When you provision a Snowflake account, you will be assigned your own unique, account-specific Snowflake URL. Browsing to and authenticating against this URL will provide you with access to your Snowflake account. Snowflake authentication and authorization are handled within Snowflake by the cloud services layer. The cloud services layer also provides other important runtime features, being the entry point into the entire Snowflake system.
Connectivity to Snowflake can also be accomplished using the SnowSQL command line client. This Python-based client allows you to execute SQL queries, including DDL and DML statements, directly from within your terminal. It can also be used to bulk upload datasets. The SnowSQL command line client provides autocompletion and navigation, making it super easy to quickly navigate the data structures and schemas that you have built and hosted within your Snowflake databases. When it comes to connectivity, Snowflake also provides several ODBC, JDBC, and native drivers, allowing you to integrate against the data sources that you host within Snowflake from external applications, whether they be custom-built applications or third-party systems such as Apache Spark, Databricks, Splunk, Tableau, and QuickSight.
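For a custom-built application, one option is the Snowflake Connector for Python. The following is a minimal sketch, not a definitive setup: the environment variable names and helper functions are assumptions made for illustration, and the connector is imported lazily so the helpers can be tried without it installed.

```python
import os
from typing import Mapping, Optional


def account_url(account_identifier: str) -> str:
    """Return the web URL for an account identifier (placeholder values only)."""
    return f"https://{account_identifier}.snowflakecomputing.com"


def connection_params(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings from environment variables.

    The SNOWFLAKE_* variable names are a convention assumed for this sketch.
    """
    env = os.environ if env is None else env
    params = {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
    }
    if "SNOWFLAKE_WAREHOUSE" in env:
        params["warehouse"] = env["SNOWFLAKE_WAREHOUSE"]
    return params


def run_query(sql: str, params: dict) -> list:
    """Open a connection, run one statement, and return all rows."""
    # Imported here so the rest of the sketch works without the package;
    # requires: pip install snowflake-connector-python
    import snowflake.connector

    conn = snowflake.connector.connect(**params)
    try:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()
```

With real credentials exported in the environment, a call such as run_query("SELECT CURRENT_VERSION()", connection_params()) would be a simple way to check connectivity.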
As hinted earlier, Snowflake uses a layered architecture to provide its services to you. The layering is made up of three key layers. Starting at the top, we have the cloud services layer. Next down the stack is the query processing layer, and at the bottom of the stack we have the database storage layer. In the next lesson, I'll explore each of these three layers in greater detail, but for now, the key points behind Snowflake's unique cloud-based architecture are: it has independently scalable compute and storage layers, the cloud services layer acts as the orchestrator, the query processing layer provides on-demand query processing compute in the form of virtual warehouses, and the database storage layer is a centralized shared storage layer available to all virtual warehouses simultaneously.
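The separation of compute from shared storage can be pictured with a toy model. This is purely an illustration in plain Python and says nothing about how Snowflake is implemented internally; the class names, warehouse names, and node counts are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stands in for the database storage layer: a single copy of the data."""
    tables: dict = field(default_factory=dict)


@dataclass
class VirtualWarehouse:
    """Stands in for one compute cluster; it holds no data of its own."""
    name: str
    nodes: int
    storage: SharedStorage

    def row_count(self, table: str) -> int:
        # Every warehouse reads from the same shared storage object.
        return len(self.storage.tables[table])

    def resize(self, nodes: int) -> None:
        # Changing compute capacity does not touch the stored data.
        self.nodes = nodes


storage = SharedStorage()
storage.tables["sales"] = [{"id": 1}, {"id": 2}, {"id": 3}]

bi = VirtualWarehouse("BI_WH", nodes=1, storage=storage)
etl = VirtualWarehouse("ETL_WH", nodes=4, storage=storage)

etl.resize(8)                            # scale one warehouse only
storage.tables["sales"].append({"id": 4})  # grow storage independently

print(bi.nodes, etl.nodes)
print(bi.row_count("sales"), etl.row_count("sales"))
```

Both toy warehouses see the newly added row immediately, while resizing one leaves the other untouched, which mirrors the two key points above: shared storage and independently scalable compute.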
Now it should be emphasized that Snowflake is a data warehouse solution; that is, it is designed and architected for read queries on data that doesn't change often. Understanding this is central to using its capabilities optimally. Attempting to use Snowflake as a transactional or OLTP-type database would be unwise and is strongly discouraged, since performance and cost would work against you rather than for you. On the other hand, using it as a data warehouse solution hits the sweet spot. This is what it is designed for.
Snowflake provides best-in-breed features for data warehousing, some of which I've already mentioned, others of which are shown here. There are almost too many good reasons to consider Snowflake as the first-choice tool for your data warehousing needs. Alright, this completes this quickfire introductory Snowflake lesson. Hopefully, by now, you've started to pick up on the buzz surrounding Snowflake and are ready to dive deeper.
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).