Handling Slowly Changing Dimensions With Azure Synapse Analytics Pipelines
The course is part of this learning path
In this course, we’ll explain the concept of slowly changing dimensions and the different approaches to dealing with them. Then we’ll show you how to create a data flow in Azure Synapse Pipelines to update slowly changing dimensions. Even if you’re not particularly interested in how to handle slowly changing dimensions, the demos in this course will give you a good idea of how to create data flows in either Azure Synapse Pipelines or Azure Data Factory since they both use the same interface.
- Understand various methods for handling slowly changing dimensions in databases
- Create a data flow in Azure Synapse Pipelines that updates slowly changing dimensions in a database table
- Anyone who would like to learn about slowly changing dimensions in databases
- Anyone who needs to know how to create data flows in Azure Synapse Analytics Pipelines and Azure Data Factory
- Some knowledge of Azure Synapse Analytics, specifically dedicated SQL pools and Synapse Pipelines (take Introduction to Azure Synapse Analytics if you’re not familiar with this Microsoft service)
- Experience with SQL (not mandatory)
The GitHub repo for this course can be found here.
Now that we’ve created our data flow, we can run it and see if it works. But you might have noticed that there’s no run button here. So how do we run it? Well, we need to create a pipeline and run that.
Go to the Integrate section. Then click the plus sign and select “Pipeline”. It brings up a canvas where we can build our pipeline.
Okay, back to creating the pipeline. Thankfully, we only need to put one activity in our pipeline, and that’s our data flow. In the “Move and transform” section, there’s an activity called “Data flow”. Drag that over to the canvas. Then go to the Settings tab, and in the dropdown, you’ll see the data flow we created called “UpdateCustomer”. These settings specify the size of the cluster to run this pipeline on. We can leave it with the defaults. Bear in mind that this is a separate cluster from the debug cluster we started earlier. With debug mode enabled, we’ll actually be running this pipeline on the debug cluster. When we disable debug mode, then it’ll run the pipeline on this cluster.
Okay, so let’s run it. Click “Debug”. I’ll fast-forward. Great. It succeeded!
Now we can run it for real. First, we need to publish it by clicking “Publish all”. It shows us everything it’s going to publish, which includes the datasets, the data flow, and the script.
Disable debug mode. All right, it’s not obvious how to run the pipeline because there’s no Run button. Instead, we need to create a trigger. And select “Trigger now”. Click OK.
To see what’s happening, we need to go to “Monitor”. It says the pipeline run is in progress. It takes over 4 minutes to run, so I’ll fast-forward.
Okay, it succeeded. Now we should check the destination table and make sure the data’s there. Go back to Develop. Then click the plus sign and select “SQL script”. Paste this line. This’ll just retrieve one row from the table. Remember to change “Connect to” to the SQL pool you created. Then click Run. There’s the record. Notice that her name is “Janet Gates”.
Now let’s try changing the last name to Lopez. Add a new SQL script. And paste this short script.
This’ll change her last name in the source table only, so it won’t affect the destination table. Run it. Whoops, I need to change “Connect to” again. Even though there aren’t any results to show, it says the query executed successfully.
Now we’ll run the pipeline again and see if it updates the destination table with the new data. Then go back to Monitor. I’ll fast-forward again.
Okay, it’s done. Now, I’ll go back to Develop and SQL script 2. And run it again. It says “Lopez” now, so it worked.
If you’re following along on your own account, make sure you delete your dedicated SQL pool, so it doesn’t incur more charges. Go to Manage. You could pause it instead if you want, but bear in mind that you’d still incur charges for the storage it’s using. And that’s it for this demo.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).