Snowflake is an insanely cool next generation SaaS data warehousing solution that operates in the cloud!
Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.
Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.
From an end-user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte-scale data solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and, even more importantly, the analytics you derive from it.
In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.
For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Learn about Snowflake and how it can provision cloud-hosted data warehouses
- Learn how to administer a Snowflake data warehouse
- Learn how to scale Snowflake data warehouses instantly and on-demand
- Learn how to use Snowflake to perform analytics on datasets at petabyte scale and beyond
Intended Audience
- Anyone interested in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud
Prerequisites
To get the most from this course, it would help to have a basic understanding of:
- Basic Cloud and SaaS knowledge
- Basic DBA knowledge
- Basic SQL knowledge
Welcome back. In this lesson, I'll review some of the more important security-related features that Snowflake provides. Understanding these features and how they can be configured is important when it comes to ensuring your data remains protected and secured at all times. Let's begin. Security within Snowflake, and across all of the data hosted within it, is taken very seriously, as is to be expected. It is implemented within and across the different layers that make up Snowflake, all of which, when carefully used, help to ensure that access to your cloud-hosted data is always secure and granted only to those permitted.
Security starts at the network layer. Network connections can be managed and controlled by establishing one or several Snowflake network policies. A network policy consists of one or many allow listings, together with one or many deny listings. These listings can themselves contain either a single IP address or a range of IP addresses, all of which represent the sender end. Once the network policy is activated, these network policy rules will then be used to decide whether or not to accept incoming traffic to Snowflake. Addresses must be supplied using IPv4 notation and can use CIDR notation when specifying blocks of IP addresses.
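To make the allow and deny listings concrete, here is a minimal Python sketch of how such a policy could be evaluated, using only the standard `ipaddress` module. It is an illustration, not Snowflake's implementation. It assumes that deny entries are checked before allow entries, and the addresses and the policy name `office_policy` are hypothetical. The `policy_sql` helper only builds a `CREATE NETWORK POLICY` statement as text, using the `ALLOWED_IP_LIST` and `BLOCKED_IP_LIST` parameters; it does not connect to Snowflake.

```python
import ipaddress


def is_allowed(sender_ip, allowed, blocked):
    """Return True if sender_ip passes the policy.

    Assumption: deny (blocked) entries are evaluated first, then allow entries.
    Entries may be single IPv4 addresses or CIDR blocks.
    """
    addr = ipaddress.ip_address(sender_ip)

    def matches(entries):
        # A single address such as "203.0.113.7" is treated as a /32 network.
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(blocked):
        return False
    return matches(allowed)


def policy_sql(name, allowed, blocked):
    """Build the SQL text for a network policy (the name is hypothetical)."""
    def quoted(entries):
        return ", ".join(f"'{e}'" for e in entries)

    return (
        f"CREATE NETWORK POLICY {name} "
        f"ALLOWED_IP_LIST = ({quoted(allowed)}) "
        f"BLOCKED_IP_LIST = ({quoted(blocked)});"
    )


# Hypothetical example values.
ALLOWED = ["192.168.1.0/24", "203.0.113.7"]
BLOCKED = ["192.168.1.99"]

if __name__ == "__main__":
    for ip in ["192.168.1.10", "192.168.1.99", "10.0.0.1"]:
        print(ip, is_allowed(ip, ALLOWED, BLOCKED))
    print(policy_sql("office_policy", ALLOWED, BLOCKED))
```

In this sketch, `192.168.1.99` is rejected even though it falls inside the allowed `/24` block, because the deny listing is consulted first.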
Network policies can be applied at the account level, at the user level, or both. To help improve security at the network layer, Snowflake provides integration options that remove the need to have Snowflake network traffic traverse the Internet. When running your Snowflake account on AWS or Azure, you can leverage private links (AWS PrivateLink or Azure Private Link) to establish secure private connections between your on-prem networks and the Snowflake network. The security benefit of doing so is that network communications to and from Snowflake are now performed in private using the cloud service provider's backbone network, meaning Snowflake network communications no longer go out over the Internet.
Communications can then be routed over the likes of a Direct Connect connection when using AWS to your on-prem network, as seen here. Note, this option is only available in the Business Critical edition of Snowflake. When it comes to authentication, Snowflake provides various options. Depending on your integration requirements, you can select from any of the following. Snowflake-managed username and password credentials: this option stores the user's password natively within the Snowflake user object. It can be used in conjunction with MFA, or Multi-Factor Authentication, implemented within Snowflake using Duo, a popular third-party provider of MFA.
Federated authentication against a SAML 2.0 compliant identity provider of your liking, for example, ADFS or Okta. OAuth 2.0 for delegated authorization. RSA-based key pairs, using at minimum a 2048-bit RSA key pair: the key pair is created locally, with the public key being stored within the Snowflake user object. Snowflake supports key rotation in a manner that provides seamless, uninterrupted access to Snowflake. And external browser: on a desktop system, the Snowflake driver in this case is designed to open the system's default browser when user authentication is required. This is often used for SAML 2.0-based SSO authentication, but can be used with the built-in username and password option if needed. This option expects end-user participation and therefore only works with desktop client applications.
Once authenticated, access to various databases, schemas, tables, views, and other objects is granted through the use of both discretionary access controls and role-based access controls. Discretionary access controls are used to establish object ownership, where the owner can then grant access to the owned object to other users. Role-based access controls, on the other hand, are used to establish role-based permissions, which are then in turn assigned to users. The ability to use and configure both discretionary access controls and role-based access controls, or RBAC, within your Snowflake account ensures that you can model very granular access controls when required, enabling you to provide least-privilege access grants to just those who should have them.
When it comes to Snowflake security best practices, RBAC-based permissions should be used, and therefore must be well understood to ensure that they are being used in the most effective way possible. As seen here, RBAC involves the following key concepts: Users: Users are established within your Snowflake account. When using RBAC, permissions are not assigned directly to users; instead, they are assigned indirectly through the use of roles.
Roles: A role is an entity to which one or several allow or deny privileges are allocated. Privileges are specified against a named securable object, for example, a database, table, view, stored procedure, etc. Understanding this basic relationship between users, roles, privileges, and database objects will help you to establish the most effective and secure posture within your Snowflake account. Additionally, it's useful to know the following facts regarding RBAC. Users can be granted multiple roles. A default role can be defined for each user. At any time, users can choose and change the active role within their Snowflake session. Roles can be made hierarchical; that is, roles can be granted to other roles.
Privileges associated with a child role are inherited upwards by any role inheriting from it directly or indirectly. A securable object is owned by a role, typically the role that was first used to create the object in question. Users who are granted the role that created an object take on the same permissions and ability to control it. By default, the owning role gets all privileges on an object, including granting or revoking privileges on the same object to other roles. And finally, ownership can be transferred from one role to another. Each and every Snowflake account, once provisioned, starts out with a set of predefined roles established out of the box, each with differing capabilities. All but the public role are to be considered powerful, and therefore should be restricted and assigned to only a minimal set of trusted users.
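The upward inheritance of privileges through a role hierarchy can be sketched with a small Python model. This is a simplified illustration rather than how Snowflake stores grants internally. The class `RoleGraph`, the roles `reader`, `writer`, and `admin`, and the table `sales.orders` are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class RoleGraph:
    """Toy model of roles, privileges, and role-to-role grants."""

    def __init__(self):
        self.privileges = defaultdict(set)  # role -> {(privilege, object)}
        self.granted = defaultdict(set)     # parent role -> roles granted to it

    def grant_privilege(self, privilege, obj, role):
        self.privileges[role].add((privilege, obj))

    def grant_role(self, child, parent):
        # Granting `child` to `parent` lets `parent` inherit child's privileges.
        self.granted[parent].add(child)

    def effective_privileges(self, role):
        """Collect privileges held directly or inherited from granted roles."""
        seen, stack, result = set(), [role], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guard against cycles and repeated visits
            seen.add(current)
            result |= self.privileges[current]
            stack.extend(self.granted[current])
        return result


# Hypothetical three-level hierarchy: reader -> writer -> admin.
graph = RoleGraph()
graph.grant_privilege("SELECT", "sales.orders", "reader")
graph.grant_privilege("INSERT", "sales.orders", "writer")
graph.grant_role("reader", "writer")  # writer inherits from reader
graph.grant_role("writer", "admin")   # admin inherits from writer, and so from reader

if __name__ == "__main__":
    for role in ("reader", "writer", "admin"):
        print(role, sorted(graph.effective_privileges(role)))
```

Here `admin` holds no direct privileges, yet it ends up with both `SELECT` and `INSERT` because it inherits from `writer` directly and from `reader` indirectly.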
Starting out with the public role, this has minimal privileges, and by default gets automatically granted to all users and all other roles within your Snowflake account. Next up is the sysadmin role. This role is used to grant permissions for creating and managing other objects. Next is the securityadmin role. This role is focused on managing users, custom roles, and granting access to objects. Finally, at the top of the role hierarchy is the accountadmin role. This is the most powerful role and can be used to manage all objects within your Snowflake account, as well as view and manage billing and credit information. This role, when required, can be used to kill off rogue or long-running SQL queries that are accumulating costs.
Now, to round out our understanding of RBAC, which we now know is of central importance to establishing best practice access controls within Snowflake, consider the following made-up scenario. As seen here, we have a single database named Cloud Academy partitioned into two schemas named Reporting and Prod. Two functional roles are established, named reporting and analytics. The reporting role allows read-only operations against the set of tables contained within the reporting schema, whereas the analytics role allows all CRUD operations, for example SQL inserts, selects, updates, and deletes, against all tables in the prod schema, and additionally just reads in the reporting schema.
Also, both roles allow usage of an appropriately sized virtual warehouse required to actually execute the SQL queries. Finally, the reporting and analytics roles are then assigned out to various users such that each user ends up with a least-privilege set of permissions, which allows them to operate within the Snowflake account, accomplishing what their data job role requires them to do. In terms of Snowflake's security layers, the data itself is always encrypted end-to-end by default, both at rest and when in transit. This is done at no additional cost. TLS 1.2 connections are used to encrypt data in transit, while at rest, data encryption is performed using AES 256-bit encryption, with options to perform client-side encryption if required.
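The reporting and analytics scenario described above can be sketched as a set of GRANT statements. The Python below only generates the SQL text; it does not connect to Snowflake. The identifier `cloudacademy` for the database, the warehouse name `query_wh`, and the users `alice` and `bob` are assumptions made for illustration, and the roles are assumed to exist already.

```python
# Assumed identifiers; the lesson names the database, schemas, and roles
# but not the warehouse or the users.
DATABASE = "cloudacademy"
WAREHOUSE = "query_wh"  # hypothetical warehouse name

# role -> schema -> table privileges, as described in the scenario
ROLE_TABLE_PRIVILEGES = {
    "reporting": {"reporting": ["SELECT"]},
    "analytics": {
        "prod": ["SELECT", "INSERT", "UPDATE", "DELETE"],
        "reporting": ["SELECT"],
    },
}

# hypothetical users and the single role each one receives
USER_ROLES = {"alice": "reporting", "bob": "analytics"}


def grant_statements():
    """Return the GRANT statements for the scenario as a list of strings."""
    stmts = []
    for role, schemas in ROLE_TABLE_PRIVILEGES.items():
        # Each role needs to reach the database and run queries on a warehouse.
        stmts.append(f"GRANT USAGE ON DATABASE {DATABASE} TO ROLE {role};")
        stmts.append(f"GRANT USAGE ON WAREHOUSE {WAREHOUSE} TO ROLE {role};")
        for schema, privs in schemas.items():
            target = f"{DATABASE}.{schema}"
            stmts.append(f"GRANT USAGE ON SCHEMA {target} TO ROLE {role};")
            stmts.append(
                f"GRANT {', '.join(privs)} ON ALL TABLES IN SCHEMA {target} "
                f"TO ROLE {role};"
            )
    # Permissions reach users only indirectly, through role grants.
    for user, role in USER_ROLES.items():
        stmts.append(f"GRANT ROLE {role} TO USER {user};")
    return stmts


if __name__ == "__main__":
    print("\n".join(grant_statements()))
```

Keeping the mapping in one table makes the least-privilege intent easy to review: no user receives a privilege directly, and each role lists only the schemas it needs.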
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).