This course is a quick introduction to Microsoft’s Azure Synapse Analytics. It covers serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines. It does not cover more advanced topics, such as optimization and security, because these topics will be covered in another course.
Learning Objectives
- Create and use serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines
Intended Audience
- Anyone who would like to learn the basics of using Azure Synapse Analytics
Prerequisites
- Experience with databases
- Experience with SQL (not mandatory)
- A Microsoft Azure account is recommended if you want to do the demos yourself (sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
Resources
Now I'm going to show you how to use a dedicated SQL pool. Unlike the built-in serverless SQL pool, we have to create this one first. Call it sqlpool1. And for the performance level here, you can see that if we leave it where it is, it's going to cost about $12 an hour, which is pretty expensive. We can move this slider down to here and it'll be one 10th of the cost. All right, that's all that we need to set.
There are some additional settings we could set but we don't need to. So we'll just go down and do a review and create. And then create. And as you can see, it can take five to eight minutes to create this. So I'll fast forward. Okay. It's done.
Now let's load data from the New York City taxi trip data set again. So we'll go over to develop and then we need to create a SQL script. Alright, I'll paste in the code. If you want to try this yourself, you can get the code from the bottom of the transcript below.
So this code creates a table and then it loads the taxi data from the open dataset. And then finally it prints off the first 100 rows so that we can see it worked properly. And this is really important. It always defaults to the built-in serverless pool. And if you run it like that then it'll get an error because serverless pools don't work with dedicated SQL pools. So we'll change that. Okay. And then we'll run it. It takes about 40 seconds to run. So I'll fast forward again.
Okay. Other than creating the table, the script didn't do much. So I'm going to create another SQL script that will do some basic analytics. And I'm going to change this immediately, so I don't forget. It's a common mistake.
All right, so this will show us the average trip distance for one passenger, two passengers, etc. And I didn't have to fast forward this time because it queried 2 million rows of data in just a few seconds. Now we can look at this in chart form and if we change the category to passenger count then it'll show the correct labels at the bottom here. And you'll notice something very odd about this data, is that the average trip distance for three passenger trips is very high compared to the other ones. So, if we were really doing analytics on this data we'd probably want to investigate that since that seems a little bit odd.
If you're following along on your own account, then I highly recommend that you delete this dedicated SQL pool so you don't have a lot of charges on your bill. So we'll go here and now you can pause it if you want. And it won't incur any charges for the compute cluster while it's paused. But you'll still be charged for the data you've stored in the data warehouse. But let's just delete it. We have to put in the name of the SQL pool. All right. That's it for this demo.
SQL script 1
CREATE TABLE [dbo].[Trip]
(
[DateID] int NOT NULL,
[MedallionID] int NOT NULL,
[HackneyLicenseID] int NOT NULL,
[PickupTimeID] int NOT NULL,
[DropoffTimeID] int NOT NULL,
[PickupGeographyID] int NULL,
[DropoffGeographyID] int NULL,
[PickupLatitude] float NULL,
[PickupLongitude] float NULL,
[PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DropoffLatitude] float NULL,
[DropoffLongitude] float NULL,
[DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PassengerCount] int NULL,
[TripDurationSeconds] int NULL,
[TripDistanceMiles] float NULL,
[PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FareAmount] money NULL,
[SurchargeAmount] money NULL,
[TaxAmount] money NULL,
[TipAmount] money NULL,
[TollsAmount] money NULL,
[TotalAmount] money NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
);
COPY INTO [dbo].[Trip]
FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/QID6392_20171107_05910_0.txt.gz'
WITH
(
FILE_TYPE = 'CSV',
FIELDTERMINATOR = '|',
FIELDQUOTE = '',
ROWTERMINATOR='0X0A',
COMPRESSION = 'GZIP'
)
OPTION (LABEL = 'COPY : Load [dbo].[Trip] - Taxi dataset');
SELECT TOP 100 * FROM dbo.Trip
SQL script 2
SELECT PassengerCount,
AVG(TripDistanceMiles) as AvgTripDistance
FROM dbo.Trip
WHERE TripDistanceMiles > 0 AND PassengerCount > 0
GROUP BY PassengerCount
ORDER BY PassengerCount;
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).