This course is a quick introduction to Microsoft’s Azure Synapse Analytics. It covers serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines. It doesn’t cover more advanced topics, such as optimization and security, as those are covered in another course.
Learning Objectives
- Create and use serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines
Intended Audience
- Anyone who would like to learn the basics of using Azure Synapse Analytics
Prerequisites
- Experience with databases
- Experience with SQL (not mandatory)
- A Microsoft Azure account is recommended if you want to do the demos yourself (sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
All right. In this demo, I'm going to show how to use Synapse Pipelines to create a data processing pipeline that runs at a scheduled time.
Go to Integrate and add a pipeline. Then we have a list of activities over here, and under Synapse, we have Notebook, so we can bring that over onto the canvas. Down here in the Settings, we can tell it which notebook we want to run in this pipeline.
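As a side note, everything you assemble on the canvas is saved as a JSON artifact behind the scenes. Here's a minimal sketch of that structure as a Python dict; the pipeline, activity, and notebook names are placeholders, not the ones from the demo.

```python
# A rough sketch of the JSON artifact Synapse Studio generates when you
# drop a Notebook activity onto the pipeline canvas. "DemoPipeline",
# "RunNotebook", and "SparkDemoNotebook" are placeholder names.
pipeline_definition = {
    "name": "DemoPipeline",
    "properties": {
        "activities": [
            {
                "name": "RunNotebook",
                "type": "SynapseNotebook",  # the Notebook activity type
                "typeProperties": {
                    "notebook": {
                        # The notebook chosen under Settings in the demo
                        "referenceName": "SparkDemoNotebook",
                        "type": "NotebookReference"
                    }
                }
            }
        ]
    }
}
```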
The only notebook we have right now is the one we used in the Spark Pool demo. It's not really the sort of notebook you'd want to run on a regular basis because it analyzes the same data every time. Normally, you'd want to run a notebook that analyzes the latest data, which would be different every time. But this notebook will be fine for the purposes of this demo.
There are lots of activities you can add to a pipeline, such as Copy Data. You can also run analytics jobs in other services, such as Databricks and HDInsight. You can even run machine learning jobs.
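To give you a rough idea, a Copy Data activity is just another entry in the pipeline's activities list, pointing at a source dataset and a sink dataset. This sketch uses hypothetical dataset names and a delimited-text-to-SQL copy, purely for illustration.

```python
# A sketch of a Copy Data activity in the same activities list.
# "SourceBlobData" and "SinkSqlTable" are hypothetical dataset names.
copy_activity = {
    "name": "CopyRawData",
    "type": "Copy",
    "inputs": [{"referenceName": "SourceBlobData", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkSqlTable", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # read CSV-style files
        "sink": {"type": "SqlDWSink"}               # write to a dedicated SQL pool
    }
}
```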
I'll leave this as a simple pipeline with only one activity in it, though, because I just want to show you how to create and run a pipeline. If you click Add Trigger, it'll give you a choice of running the pipeline now or scheduling it for later. We'll click New/Edit to schedule it for later.
In the drop-down menu, click New. The default type of trigger is Schedule, which is what we want, although there are a few other options, too. For example, you can configure the pipeline to run whenever new data arrives in a particular storage location. It sets the start date and time to the current time, although it might not look like it because the time zone is set to UTC. You can change the time zone to your own, but that doesn't automatically adjust the start time. I'm going to set the date to tomorrow and the time to 10 P.M. Then I'll set the recurrence to once a day. You can also specify an end date if you want. Click OK.
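Like the pipeline, the trigger is stored as a JSON artifact. Here's a sketch of a daily 10 P.M. schedule trigger as a Python dict; the trigger name, pipeline name, and start date are placeholders.

```python
# A sketch of the schedule trigger configured above: once a day at
# 10 P.M., starting tomorrow. Names and the start date are placeholders.
trigger_definition = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",                   # recur daily
                "interval": 1,                        # every 1 day
                "startTime": "2024-01-02T22:00:00Z",  # placeholder for "tomorrow at 10 P.M."
                "timeZone": "UTC"                     # changing this doesn't shift startTime
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "DemoPipeline",  # the pipeline sketched earlier
                    "type": "PipelineReference"
                }
            }
        ]
    }
}
```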
It says we need to publish for the trigger to be activated. That's up here. If you click Publish All, it'll show you that this will publish the pipeline, the trigger, and the notebook. When it's finished publishing, this pipeline will be scheduled to run every day, starting tomorrow. And that's it for this demo.
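If you ever want to script this step instead of clicking through the Studio, creating the artifacts and starting the trigger can also be done through the Synapse Artifacts REST API (there's an azure-synapse-artifacts Python SDK and Azure CLI support as well). This is only a sketch: it reuses the pipeline_definition and trigger_definition dicts from above, the workspace name and token are placeholders, and the api-version should be checked against the current documentation.

```python
import requests

# Sketch: publishing programmatically via the Synapse Artifacts REST API.
# Assumes the pipeline_definition and trigger_definition dicts from the
# earlier sketches. Workspace name and bearer token are placeholders.
workspace = "my-workspace"
base = f"https://{workspace}.dev.azuresynapse.net"
headers = {"Authorization": "Bearer <AAD_TOKEN>"}  # token scoped to dev.azuresynapse.net
params = {"api-version": "2020-12-01"}             # verify against current docs

# Create or update the pipeline and trigger artifacts.
requests.put(f"{base}/pipelines/DemoPipeline", headers=headers, params=params,
             json={"properties": pipeline_definition["properties"]})
requests.put(f"{base}/triggers/DailyTrigger", headers=headers, params=params,
             json={"properties": trigger_definition["properties"]})

# Unlike Publish All in the Studio, a trigger created this way still
# has to be started before it fires.
requests.post(f"{base}/triggers/DailyTrigger/start", headers=headers, params=params)
```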
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).