Snowflake is an insanely cool next generation SaaS data warehousing solution that operates in the cloud!
Engineered from the ground up, Snowflake takes advantage of the elasticity that the cloud provides – and is truly revolutionary in every aspect.
Harnessing the power of the cloud, Snowflake has unique capabilities in the form of unlimited and instant scalability, making it perhaps the ultimate data warehouse solution. Cloud elasticity is very much at the heart of Snowflake – making its unique architecture and value proposition difficult to compete with in the market.
From an end user perspective, Snowflake is incredibly appealing. Building data warehouses and petabyte data scaled solutions without having to worry about on-prem compute and storage issues means your focus remains solely on the data itself and even more importantly, the analytics you derive from
In this course, you'll learn about the many distinguishing features that set Snowflake apart from its competitors.
For any feedback, queries, or suggestions relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Learn about Snowflake and how it can provision cloud-hosted data warehouses
- Learn how to administrate a Snowflake data warehouse
- Learn how to scale Snowflake data warehouses instantly and on-demand
- Learn how to use Snowflake to perform analytics on petabyte scale and beyond datasets
Intended Audience
- Anyone interested
in learning about Snowflake, and the benefits of using it to build a data warehouse in the cloud
Prerequisites
To get the most from this course, it would help to have a basic understanding of:
Basic Cloud and SaaS knowledge
- Basic DBA knowledge
- Basic SQL knowledge
Welcome back. In this lesson, I'll introduce you to the Snowflake online web administration console, the application in which you'll do most of your data analysis and other related Snowflake administration activity. In doing so, I'll point out each of the key features that you should familiarize yourself with. Reviewing and understanding these will help you to expertly navigate within the console. As you'll soon see, the Snowflake UI is super intuitive and makes performing various administration tasks associated with running an enterprise Snowflake hosted data warehouse easy and quick.
To begin with, the Snowflake is divided up into several key functional areas, each of which can be navigated to via the options in the top menu bar. This menu bar contains the following six options going from left to right: Databases, shares, data marketplace, warehouses, worksheets, history. And when operating under the account admin role, an account option is also available. Now, whenever you successfully authenticate into the Snowflake application, the worksheets view will be displayed first. Within this view, you'll have access to all past and present worksheets. Worksheets are literally that; they contain SQL statements which are to be executed. The current user and the role will be displayed always in the top right hand corner. Keep this in mind when and if a query or action performed within the Snowflake app doesn't go to plan.
It is quite possible that the current user currently lacks of the required permissions to perform that particular query or action. Queries can be executed by clicking on the run button at the top of the worksheet or by using the shortcut key sequence command plus return. Each worksheet can be renamed by double clicking on the name within the tab, followed by typing in the new name. Worksheets are divided up into several sub-panes. The main SQL editing pane is an editing area that allows you to author your SQL statements. This area auto saves in the background, ensuring that all of your work is safely restorable. All worksheet modifications are persisted across Snowflake sessions, meaning that you can log in and out and have all of your past queries available and ready to go at a moment's notice.
SQL results are returned in a bottom pane. Results presented within the results pane can be further filtered upon, allowing you to quickly pinpoint and determine that the results are indeed valid for the SQL query that produced them. Another important piece of configuration presented in the worksheet view is the worksheets drop-down menu, located at the top right within the worksheet. Clicking on this reveals the context in which the worksheet's SQL statements will be executed in. Specifically, this allows you to configure each of the following. The virtual warehouse used to provide the compute resource and the database and schema that the SQL query or queries should execute against.
Whenever the currently selected virtual warehouse is live, a green dot is shown next to it, as can be seen here in the screenshot. This is useful to know as it can be used quickly indicate whether or not your virtual warehouse is up and running and whether or not you are currently incurring cost. New worksheets can be added by clicking on the plus icon. Doing this will introduce a new blank canvas into the worksheet collection. You can also open up the worksheet management menu located directly next to the plus icon. From here, you can open other existing worksheets. For example, clicking on the open worksheet menu item, opens the open worksheet model pop up box, providing you with a list of your personal worksheets. Selecting one and then clicking open will recall it into the worksheets area.
All SQL queries executed within a worksheet are recorded and available for viewing within the history pane. Bottom right, available by clicking on the open history link. Hovering over the duration bar of a query within the history reveals interesting insights as to the execution cost of that particular query, including queuing time, compilation time, and execution time. Clicking on the query ID for any recorded query within the history view provides you with a details or metadata view of that particular query. This view will also show you the produced data set, the query execution created at that point in time. Clicking on the profile view will render a graphical view of the execution plan for that particular query. This is more than useful when performing query tuning and optimization.
Most of the time, Snowflake will create an optimal query plan, but knowing that this is available for each cases can be helpful when required. The left hand side tree view displays all objects available within the current Snowflake account for which the current user has permissions to operate on. This hierarchy is database, followed by schema, followed by table or view. Clicking on the database top menu item presents the databases view. This view provides a list of existing databases, and additionally, provides you with options to create, clone-drop, or transfer ownership of a database. You can click on any individual database to see and manage its database objects. When you drill down into a specific database, you are then able to navigate all of its objects, such as tables, views, schemas, stages, file formats, sequences, and pipes.
By default, the tables view is shown with a listing of all existing tables. The table view provides information for each table, including the table name, schema it belongs to, creation time, its owner, the number of rows within it and its overall size. Clicking on the shares option opens up the secure shares pane. Within this pane, inbound and outbound shares are listed, additionally with the option to create additional shares. Shares allow you to easily import and export database objects across Snowflake accounts. The data marketplace option provides you with the ability to source and operate on third party curated data sets. This type of capability provides you with easy access to a diverse and wide ranging set of industry data sets covering categories, such as healthcare, retail, weather, public sector, etc.
Clicking on the warehouses option will open up the virtual warehouses management area. Within this pane, you are presented with a list of existing named virtual warehouses. Each warehouse will be in either of one of two states, suspended or started. Clicking on an existing warehouse displays the editable grant privileges pane on the right hand side. Additional warehouses can be added by clicking on the create link. When done, the create warehouse model pop up box is presented, allowing you to set the new virtual warehouse's name and size. It's also recommended to set and adjust the auto suspend option appropriately to minimize spend on unused compute time.
Enabling the auto resume feature helps to quickly bring the virtual warehouse back online when a query is executed against it. Existing warehouses can be reconfigured by selecting them and then clicking on the configure link. The configure warehouse model pop-up box is presented. Here, you can change its size, the auto suspend maximum idle time, or enable/disable the auto resume option. Each warehouse provides a load-over time chart displaying its compute usage. This view can be accessed by clicking on the virtual warehouses name within the warehouse list view.
Opening the history option displays a list of historic queries that have been executed, ordered with the most recent on top. Every executed query is assigned an internal query ID. The history view can be filtered upon by setting up multiple filters. In the example shown here, a single filter is established. That being where the user is equal to Cloud Academy demo. Each query presented in the list shows its status, query ID, SQL text, warehouse, session ID, start and end times, total duration, bytes scanned, and a few more. Drilling down into a query via its clickable query ID reveals its metadata alongside the actual data that was produced by the execution of the query. It's sometimes useful to know the duration of the execution, the number of bytes scanned, and the number of rows returned, as this may reveal areas where optimization can be applied, such as defining a clustering key to improve table scanning efficiency.
To access the account option, you need to switch into the account admin role. If your current role allows you to do this, then when you have successfully switched over into the account admin role, the account option will be added to the top menu bar next to the history option. Clicking on the account option provides you with access to the following account related areas, usage, billing, users, roles, policies, sessions, resource monitors, and reader accounts. The usage feature, as seen here, provides numerous monthly stats regarding the credit usage within your Snowflake account, including for the current month, a daily breakdown of the Snowflake credits used for all warehouses. Warehouse credit usage, average storage used, and total amount of data transfer.
The following report shows storage usage broken down across days for the selected month. The account users area is used to manage user accounts, which in turn are used to authenticate your users into your Snowflake account. Within the users area, you can create new user accounts by clicking on the create link. The create user model pop-up box is then shown, which will guide you through the process of creating a new user. The roles area is used to manage existing roles, including those that are created by default by Snowflake. This area also allows you to create new roles by clicking on the create link. The create role model pop-up box is displayed when the create link is clicked. Here you can name the new role and establish its parent role.
Selecting policies will take you into the area where network security policies can be established. A network policy consists of one or several allowed and/or blocked IP cider blocks. Establishing this type of configuration will control and filter inbound TCP-based network connections to your Snowflake account. Establishing this configuration is highly recommended as it adds to your overall Snowflake security posture. Creating a new network policy is performed by clicking on the create link and then filling out the create network policy model pop-up box with the required network policy name and allowed and/or blocked IP cider ranges.
Selecting sessions displays a list of all authenticated sessions established against your Snowflake account. This information is highly useful to know since it shows the sessions being established, when they were established, the authentication method being used, and the client network address from which the session traffic originated from. Note, in this screenshot, I've removed this for privacy purposes. The resource monitors area is used to set up limits on credit consumption within your Snowflake account. If you want to be proactive about cost management, then you can set up one or several resource monitors to notify you about credit consumption, so that you can avoid any potential future bill shock. And finally, clicking on the create resource monitor link will display the create resource monitor model pop-up box. Within this, you can configure each of the various options to establish a new resource monitor.
Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.
Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).