The course is part of these learning paths
This course is a quick introduction to Microsoft’s Azure Synapse Analytics. It covers serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines. It does not cover more advanced topics, such as optimization and security, because these topics will be covered in another course.
- Create and use serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines
- Anyone who would like to learn the basics of using Azure Synapse Analytics
- Experience with databases
- Experience with SQL (not mandatory)
- A Microsoft Azure account is recommended if you want to do the demos yourself (sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
Okay, the easiest way to get to Synapse Analytics in the Azure Portal is to go to the search bar and just type synapse. There it is, and then we click Add because we need to add a workspace.
All right, now it wants a resource group, and we'll create a new one, I'm going to call it synapse1, and now we need a workspace name, and this does need to be unique across all customers. So I'm going to add CA in front of mine for Cloud Academy, you'll have to put something else in of course. And I'll put it in a region that's close to me, so let's say West US.
Now we need to create a Data Lake Storage Gen2 account, and it's a layer on top of Azure Storage that allows you to have a file system, a hierarchical file system. So we'll create a new one, I'll just call this synapse1 again. Oh, right. I'll call it casynapse1, and for the file system name, here it doesn't need to be unique, we can call it whatever we'd like. I'll just call this one synapse1.
All right, and so this part is for security. So it says that I'll assign myself the Storage Blob Data Contributor role on the Data Lake Storage account because that will allow me to interactively query it in the workspace, and if you want other people to be able to do this, then you need to give them a few permissions like the contributor role on the workspace and the Storage Blob Data Contributor role on the storage account.
Okay, there are some other things that we could set here, like security, but this is enough for us to get started, so we'll just do a review and create, and right now it's validating that all the choices I made are okay, and it gives us an estimated cost here as well, per terabyte. I'll just do create. It can take quite a while so I'm going to fast forward.
Okay, it's finished. So we'll go to the resource group, and you see we have a storage account and a synapse workspace. So we'll click on the workspace, and then we can open Synapse Studio, which is a really nice tool for working with Synapse Analytics.
Now we can immediately start doing some analytics by using the built-in serverless SQL pool. The easiest way to try it is to click on Learn and then click, use samples immediately. The second one uses a serverless SQL pool, so let's click that one.
Okay, this is a SQL script, and it's the way that you interact with a SQL pool, and this one contains SQL commands that look at three open data sets. So there's this one that has data on New York taxi trips, and this one has data on public holidays, and this one has data on weather. And so each one of these statements is very simple, it's just selecting the first 100 rows and displaying those below. They're using the PARQUET format which is a very common format for tables. And down here we have a SQL query that's actually doing some analytics.
So it is taking a look at the taxi trip data, and it's getting the number of rides per year, and it's looking at the years between 2014 and 2019.
Okay, so to run all of these queries, we just click Run up here and the results are going to come down here. It does take a little while, but not too long.
Okay, so it shows the results of the queries as they come in. So it showed the first two, or first three actually. I can go back to query zero, and so these are the first 100 rows of the taxi trips dataset. So it has things like passenger count and trip distance. And then this is the public holidays in different countries.
All right, and then the next one is weather in various places by latitude and longitude, so it has wind speed and temperature and things like that, and now this last one is done. This is the one that was doing some simple analytics work.
So it was calculating the total number of rides per year for each of these years, and we can display that more nicely by clicking on Chart up here, and you can see that the number of rides per year has steadily been going down, and unfortunately down here, it's showing just zero through five for the years, so if we want to fix that, we can go to the category column dropdown here and pick current year, and it'll fix that, so now you can see what the years are.
All right, so in summary, we created a workspace which you need to do before you can do anything else in Synapse Analytics, and then we use the built-in serverless SQL pool to run some queries on some open data sets, and then we displayed the results in a chart, and that's it for this demo.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).