This course is a quick introduction to Microsoft’s Azure Synapse Analytics. It covers serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines. It does not cover more advanced topics, such as optimization and security, because these topics will be covered in another course.
- Create and use serverless SQL pools, dedicated SQL pools, Spark pools, and Synapse Pipelines
- Anyone who would like to learn the basics of using Azure Synapse Analytics
- Experience with databases
- Experience with SQL (not mandatory)
- A Microsoft Azure account is recommended if you want to do the demos yourself (sign up for a free trial at https://azure.microsoft.com/free if you don’t have an account)
Let’s do a quick review of what you’ve learned.
If you need a data warehouse, you can create a dedicated SQL pool, which lets you run SQL queries on structured, relational tables. If you want to work with a data lake, you can create a Spark pool, which lets you use Spark to query both structured and unstructured data. Both types of pools are clusters of virtual machines.
When you create a dedicated SQL pool, you specify how many DWUs (Data Warehousing Units) to allocate. The number of DWUs determines the amount of CPU, memory, and I/O in the compute cluster, and you can manually increase or decrease it later. Storage is provided by Azure Storage, so it scales independently of the compute cluster. You can also pause a dedicated SQL pool when it’s not in use. While it’s paused, you won’t pay for the compute cluster, but you’ll still pay for the storage used by the data warehouse.
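To make the billing model concrete, here is a minimal Python sketch of a monthly bill for a dedicated SQL pool. It assumes that compute cost scales linearly with the number of DWUs and is charged only for the hours the pool is running, while storage is charged for the whole month. The rates, the `DedicatedPoolRates` class, and the `monthly_cost` function are hypothetical and for illustration only; check the Azure pricing page for real numbers.

```python
from dataclasses import dataclass


@dataclass
class DedicatedPoolRates:
    # Hypothetical prices for illustration only (not real Azure pricing).
    compute_per_100_dwu_hour: float  # price of 100 DWUs running for one hour
    storage_per_tb_month: float      # price of storing 1 TB for one month


def monthly_cost(rates: DedicatedPoolRates, dwus: int, hours_running: float, storage_tb: float) -> float:
    """Estimate one month's bill for a dedicated SQL pool (simplified model)."""
    # Assumption: DWUs are allocated in multiples of 100 (service levels such as DW100c).
    if dwus <= 0 or dwus % 100 != 0:
        raise ValueError("this sketch assumes DWUs come in multiples of 100")
    # Compute is billed only while the pool is running (not paused),
    # and it scales with the number of DWUs allocated.
    compute = (dwus / 100) * rates.compute_per_100_dwu_hour * hours_running
    # Storage lives in Azure Storage, so it is billed even while the pool is paused.
    storage = storage_tb * rates.storage_per_tb_month
    return compute + storage


rates = DedicatedPoolRates(compute_per_100_dwu_hour=1.0, storage_per_tb_month=20.0)
always_on = monthly_cost(rates, dwus=500, hours_running=730, storage_tb=2)   # never paused
work_hours = monthly_cost(rates, dwus=500, hours_running=220, storage_tb=2)  # paused nights and weekends
paused = monthly_cost(rates, dwus=500, hours_running=0, storage_tb=2)        # paused all month
print(always_on, work_hours, paused)  # 3690.0 1140.0 40.0
```

Even when the pool is paused for the entire month, the storage charge remains, which is exactly the trade-off described above. Doubling the DWUs doubles only the compute portion of the bill.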
Serverless SQL pools don’t have their own storage, and they can’t access data in dedicated SQL pools either. Instead, they query files in Azure Storage (such as CSV and Parquet files), Azure Open Datasets, and external tables in Azure Storage that were created by a Spark pool. With serverless SQL pools, you pay only for the amount of data processed by your queries. When you create an Azure Synapse workspace, a serverless SQL pool (named Built-in) is created automatically.
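A serverless SQL pool reads files in Azure Storage directly with the T-SQL `OPENROWSET` function. The following Python sketch builds such a query string, which you could paste into a SQL script in Synapse Studio. The `openrowset_query` helper is hypothetical, it handles only Parquet and CSV files, and the storage account and path in the usage example are made up.

```python
def openrowset_query(file_url: str, file_format: str = "PARQUET", top: int = 10) -> str:
    """Build a T-SQL query that a serverless SQL pool can run directly against files."""
    fmt = file_format.upper()
    if fmt not in ("PARQUET", "CSV"):
        raise ValueError(f"this sketch only handles PARQUET and CSV, not {file_format!r}")
    # Escape single quotes so the URL stays a valid T-SQL string literal.
    safe_url = file_url.replace("'", "''")
    options = [f"BULK '{safe_url}'", f"FORMAT = '{fmt}'"]
    if fmt == "CSV":
        # HEADER_ROW is supported only with PARSER_VERSION '2.0'.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]
    option_text = ",\n    ".join(options)
    return f"SELECT TOP {top} *\nFROM OPENROWSET(\n    {option_text}\n) AS [result]"


# Hypothetical storage account, container, and folder.
print(openrowset_query("https://contosolake.dfs.core.windows.net/sales/2023/*.parquet"))
```

The wildcard in the path lets a single query read every matching file in the folder. Because you pay for the data processed, selecting only the columns you need (instead of `*`) is usually cheaper on large Parquet files.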
Synapse Studio provides a web-based user interface for working with all types of pools. You can interact with a SQL pool using a SQL script, and with a Spark pool using a notebook.
Azure Synapse Pipelines is a stripped-down version of another Azure service, Data Factory, and it lets you create data processing pipelines. You can schedule these pipelines to run on a regular basis using triggers.
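In Synapse Studio you would normally create a schedule through the UI, but the definition behind it is JSON. The Python sketch below builds a daily schedule trigger modeled on the JSON format that Data Factory uses for schedule triggers; treat the exact structure as an assumption to verify against the current documentation. The `daily_schedule_trigger` function and the trigger and pipeline names are hypothetical.

```python
import json


def daily_schedule_trigger(name: str, pipeline_name: str, start_time: str, interval_days: int = 1) -> dict:
    """Return a schedule-trigger definition modeled on the Data Factory trigger JSON."""
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",         # other values include Minute, Hour, Week, Month
                    "interval": interval_days,  # run every N days
                    "startTime": start_time,    # ISO 8601 timestamp
                    "timeZone": "UTC",
                }
            },
            # The pipeline(s) this trigger starts, referenced by name.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    },
                    "parameters": {},
                }
            ],
        },
    }


# Hypothetical trigger that runs a pipeline named CopySalesData every night at 02:00 UTC.
trigger = daily_schedule_trigger("NightlyLoad", "CopySalesData", "2024-01-01T02:00:00Z")
print(json.dumps(trigger, indent=2))
```

Keeping the recurrence separate from the pipeline reference means one trigger can start several pipelines, and the same pipeline can be attached to different schedules.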
Please give this course a rating, and if you have any questions or comments, please let us know. Thanks!
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).