Snowpipe

Contents

Introduction
  • Course Intro (1m 46s)
  • Snowflake Intro (8m 43s)
Architecture
Security (9m 43s)
Pricing (6m 39s)
Snowpipe (4m 23s)
Summary

Difficulty: Beginner
Duration: 1h 33m
Students: 681
Rating: 4.4/5
Description

Snowflake is an insanely cool, next-generation SaaS data warehousing solution that operates in the cloud!

Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.

Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.

From an end-user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale data solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it.

In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.

For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.

Learning Objectives

  • Learn about Snowflake and how it can provision cloud-hosted data warehouses
  • Learn how to administer a Snowflake data warehouse
  • Learn how to scale Snowflake data warehouses instantly and on-demand
  • Learn how to use Snowflake to perform analytics on petabyte-scale datasets and beyond

Intended Audience

  • Anyone interested in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud

Prerequisites

To get the most from this course, it would help to have a basic understanding of:

  • Cloud computing and SaaS
  • Database administration (DBA)
  • SQL

Transcript

Welcome back. In this lesson, I'll provide a quick review of Snowpipe and its intended use as a tool for continuous data ingestion. Let's begin. Snowpipe is a cloud-hosted service that operates within your Snowflake account. Once running, it continuously fetches data from a pre-configured data source. If the data source is cloud-hosted, for example, Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage, then Snowpipe ties into the event notification capabilities of that particular storage service. With this type of eventing in place, it is just a matter of minutes before that data becomes available as table-hosted data within Snowflake, ready to be queried.

Under the hood, Snowpipe is a serverless component operated by Snowflake, but it has a running cost that you must pay for if you use it. Charging is based on the amount of actual compute resource utilization involved in any data loading pipes that you have configured. Billing is tracked at a per-second, per-core granularity and is very economical for continuous data loading jobs. Snowpipe provides two types of integration, the first of which is seen here. This type of integration simply relies on the event notifications that cloud storage services such as Amazon S3 provide.
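Although the lesson doesn't show it, you can inspect this billing for your own account; a minimal sketch, assuming Snowflake's INFORMATION_SCHEMA.PIPE_USAGE_HISTORY table function:

    -- Snowpipe credit consumption over the past day, grouped per pipe.
    SELECT pipe_name,
           SUM(credits_used)   AS credits,
           SUM(files_inserted) AS files_loaded
    FROM TABLE(information_schema.pipe_usage_history(
           date_range_start => DATEADD('day', -1, CURRENT_TIMESTAMP())))
    GROUP BY pipe_name;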

In this particular case, Snowpipe is simply configured to trigger on Amazon SQS notifications. Configuring this type of integration is quick and simple, and can typically be completed within 5-10 minutes. In the background, Snowflake operates and maintains a fleet of servers which asynchronously gather the data and distribute it into Snowflake target tables. A key part of this design is that the operational server fleet is completely abstracted away from you, meaning your involvement with Snowpipe is purely configuration-based.
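The on-screen configuration isn't reproduced in the transcript, but a minimal sketch of this SQS-based setup, using hypothetical database, table, and stage names, might look like:

    -- Auto-ingest pipe: Snowflake provisions an internal SQS queue for it.
    CREATE OR REPLACE PIPE demo_db.public.demo_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO demo_db.public.demo_table
      FROM @demo_db.public.demo_stage
      FILE_FORMAT = (TYPE = 'JSON');

    -- The notification_channel column returned by SHOW PIPES holds the SQS
    -- queue ARN to target from the S3 bucket's event notification settings.
    SHOW PIPES;

Once the bucket's event notifications point at that queue, any new file landing in the staged location is picked up and loaded automatically.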

The second type of integration facilitates more advanced requirements, often those that involve a custom on-prem hosted data source. This Snowpipe integration option, shown here, provides a REST API endpoint that allows you to push notification messages to it, declaring that you have dropped some new files into a cloud storage location, such as an Amazon S3 bucket. On the receiving side, the serverless loader internal to the Snowpipe service reacts, goes out to the cloud storage location, and retrieves the new data files. This type of integration will typically require a little bit of custom code or scripting to coordinate the REST API calls with the placement of the data files. The good thing about this type of integration is that it covers you for any obscure and/or edge-case data sources from which you need to continuously publish data into Snowflake.
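A pipe intended for this REST-driven pattern is simply created without auto-ingest; file arrivals are then announced by your own code calling the pipe's insertFiles REST endpoint, typically via Snowflake's ingest SDKs. A minimal sketch, with hypothetical names:

    -- No AUTO_INGEST here: loading is triggered only when your custom code
    -- calls the Snowpipe REST API to report newly staged files.
    CREATE OR REPLACE PIPE demo_db.public.onprem_feed_pipe
    AS
      COPY INTO demo_db.public.events
      FROM @demo_db.public.events_stage
      FILE_FORMAT = (TYPE = 'CSV');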

Central to Snowpipe, and how it is applied within a worksheet within Snowflake, is the concept of a pipe. Creating a pipe is as simple as running a Snowflake CREATE PIPE SQL statement. A pipe definition simply wraps around a COPY statement. In the example shown here, a Snowpipe is established by specifying the ARN of an AWS SNS topic, which has in turn, behind the scenes, been configured to receive notifications when data files are placed in a similarly named S3 bucket. Following on, the CREATE PIPE statement includes a COPY command to specify the destination of where the data should end up. In this case, the data is expected to be saved in the Cloud Academy skills table via a stage named Skills Stage. Behind the scenes, Snowpipe uses a combination of the file name and a checksum performed over the file to determine and track new files, and therefore ensure that only new data gets loaded in.
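The on-screen statement isn't included in the transcript; a hedged reconstruction of the kind of statement being described, with a hypothetical SNS topic ARN and object names, might look like:

    -- Auto-ingest pipe wired to an SNS topic (hypothetical ARN) that
    -- publishes notifications for new files in the skills S3 bucket.
    CREATE OR REPLACE PIPE cloudacademy.public.skills_pipe
      AUTO_INGEST = TRUE
      AWS_SNS_TOPIC = 'arn:aws:sns:us-east-1:000000000000:skills-bucket-topic'
    AS
      COPY INTO cloudacademy.public.skills
      FROM @cloudacademy.public.skills_stage
      FILE_FORMAT = (TYPE = 'CSV');

Because Snowpipe tracks the file name plus a checksum, replaying the same notification or re-staging an identical file won't load duplicate rows.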

About the Author

Students: 132607
Labs: 68
Courses: 112
Learning Paths: 183

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).