This introduction for the DP-200 Exam Preparation: Implementing an Azure Data Solution learning path gives an overview of the requirements for the Microsoft DP-200 Exam and how they will be covered.
The three main subject areas are:
- Implementing data storage solutions
- Managing and developing data processing
- Monitoring and optimizing data solutions
Hello and welcome to Implementing an Azure Data Solution. The focus of this learning path is to prepare you for Microsoft's DP-200 exam. If you pass the DP-200 and DP-201 exams, then you'll earn the Microsoft Certified Azure Data Engineer Associate certification. It's quite a mouthful. My name's Guy Hummel and I'm a Microsoft Certified Azure Solutions Architect and Data Engineer.
The DP-200 exam tests your knowledge of three subject areas: implementing data storage solutions, managing and developing data processing, and monitoring and optimizing data solutions. I'm not going to talk about every item in the exam guide, but I'll go over some of the highlights of what you'll need to know. First, I should mention that the DP-200 exam is all about implementation and configuration, so you need to know how to actually configure data services in the Azure portal. If you're still tempted to skip the hands-on practice because you figure the exam is just going to have multiple-choice questions, I've got a warning for you. This exam includes tasks that you have to perform in a live lab! Luckily, we have a number of hands-on lab exercises in this learning path to give you some practice, so don't worry.
Okay, the first and biggest section of the exam guide is about implementing data storage solutions. These solutions are divided into non-relational and relational data stores. For many years, Microsoft's primary relational data solution was SQL Server. If you wanted to migrate from an on-premises SQL Server to Azure, you could just run SQL Server in a virtual machine on Azure, but in most cases, you'd be better off using Azure SQL Database instead. The advantage is that it's a managed service with lots of built-in features that make it easy to scale and provide high availability, disaster recovery, and global distribution. And you need to know how to configure all of those features.
SQL Database is not exactly the same as SQL Server, but it's close enough that it shouldn't be too much trouble migrating to it. If you really need full SQL Server compatibility, then you can use SQL Database Managed Instance.
Another relational data storage service is Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse. As you can tell from its name, it's meant for analytics rather than transaction processing. It allows you to store and analyze huge amounts of data. The fastest way to get data into Synapse Analytics is by using PolyBase, so it's important to learn the details of how to use it. To make queries as fast and efficient as possible, you need to partition the data store into multiple shards and also use the right distribution method.
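To make the idea of a distribution method a little more concrete, here is a minimal Python sketch that contrasts hash distribution with round-robin distribution. It is only an illustration: the function names and sample rows are hypothetical, and `zlib.crc32` is a stand-in hash, not the function Synapse Analytics actually uses. The one detail taken from the service is that a dedicated SQL pool spreads a table's rows over 60 distributions.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribute(rows, key):
    """Map row index -> distribution, chosen by hashing the key column.

    Rows with the same key value always land in the same distribution.
    crc32 is a stand-in here, not the hash Synapse uses internally.
    """
    placement = {}
    for index, row in enumerate(rows):
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        placement[index] = digest % NUM_DISTRIBUTIONS
    return placement


def round_robin_distribute(rows):
    """Map row index -> distribution by dealing rows out evenly in turn."""
    return {index: index % NUM_DISTRIBUTIONS for index in range(len(rows))}


# Hypothetical sample data: 120 rows that share only 5 distinct customer IDs.
sample_rows = [{"customer_id": i % 5, "amount": i} for i in range(120)]
hashed = hash_distribute(sample_rows, "customer_id")
dealt = round_robin_distribute(sample_rows)
```

With only five distinct keys, the hashed rows can occupy at most five distributions, while round-robin fills all 60 evenly. That is the trade-off in a nutshell: hashing keeps related rows together, which helps joins and aggregations on that column, but a key with few distinct values leaves the data skewed.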
Naturally, security is important for both SQL Database and Synapse Analytics, not just for restricting access to data but also for things like applying data masking to credit card numbers or encrypting an entire database.
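As a rough picture of what masking a credit card number means, here is a small Python sketch. It is purely illustrative: in SQL Database the masking is applied by the database itself when a query returns data, not by application code, and the function name below is hypothetical.

```python
def mask_card_number(card_number: str) -> str:
    """Hide all but the last four digits of a card number.

    Illustrative only: this mimics the shape of a masked value and is not
    how the database implements masking.
    """
    digits = [ch for ch in card_number if ch.isdigit()]
    if len(digits) < 4:
        # Too short to reveal anything safely, so mask every digit.
        return "X" * len(digits)
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])


masked = mask_card_number("4111 1111 1111 1234")
```

The stored value is left unchanged; only what the reader of the data sees is altered.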
All right, on to non-relational data stores. These are services that can store unstructured data, such as documents or videos. The most mature Azure service in this category is Blob storage, which is a highly available, highly durable place to put digital objects of any type. Unlike a filesystem, Blob storage has a flat structure. That is, the objects aren't stored in a hierarchy of folders. You can make it look that way through clever naming conventions, but that's really just faking a tree structure. For a true hierarchical structure, you can use Azure Data Lake Storage Gen2, which is actually built on top of Blob storage. It's especially useful for big data processing systems like Azure Databricks.
The final non-relational data store you need to know for the exam is Cosmos DB. This is a pretty amazing database system because it can scale globally without sacrificing performance or flexibility. It can even support multiple types of data models, including document, key-value, graph, and wide column. Another surprising feature is the ability to support five different consistency levels ranging from strong to eventual consistency. Don't worry if you don't know what that means yet. You'll find out in this learning path. As with SQL Database and Synapse Analytics, you need to know how to configure partitioning, security, high availability, disaster recovery, and global distribution for Cosmos DB.
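Going back to the point about Blob storage having a flat structure, here is a minimal Python sketch of how a folder view can be faked purely from object names. It doesn't call any Azure SDK. The blob names and the `list_virtual_folder` helper are hypothetical, and the sketch only mimics the general idea of listing by prefix and delimiter.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (virtual_folders, blobs) directly under a prefix.

    Every blob lives in one flat list; "folders" are just shared name
    prefixes that end at the next delimiter.
    """
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        if delimiter in remainder:
            # Something deeper exists, so report only the next "folder" level.
            folders.add(prefix + remainder.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


# Hypothetical flat list of blob names in one container.
container = [
    "logs/2020/01/app.log",
    "logs/2020/02/app.log",
    "logs/readme.txt",
    "video.mp4",
]
top_level = list_virtual_folder(container)
inside_logs = list_virtual_folder(container, prefix="logs/")
```

Notice that "renaming a folder" in this model would mean renaming every object that shares the prefix, which is one reason a true hierarchical namespace is attractive for big data workloads.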
The next section of the exam guide is about managing and developing data processing solutions. It's divided into two subsections: batch processing and stream processing. The two most important batch processing services are Azure Data Factory and Azure Databricks. Data Factory makes it easy to copy data from one data store to another, such as from Blob storage to SQL Database. It also makes it easy to transform data, which it accomplishes by using services like Databricks behind the scenes. You can even create complex automated processing pipelines by linking together a series of transformation activities that are kicked off by a trigger that responds to an event.
Azure Databricks is a managed data analytics service. It's based on Apache Spark, which is a very popular open-source analytics and machine learning framework. You can also run Spark jobs on Azure HDInsight, but Databricks is the preferred solution, so it's the one you'll need to be most familiar with for the exam. Some of the Databricks topics covered are data ingestion, clusters, notebooks, jobs, and autoscaling.
The most important stream processing service is Azure Stream Analytics. You need to know how to get data into it from other services, how to process data streams using different windowing functions, and how to output the results to another service.
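If windowing functions are new to you, the simplest kind is a tumbling window, which chops a stream into fixed-size, non-overlapping time slices and aggregates each slice separately. Here is a minimal Python sketch of that idea. It is not Stream Analytics query syntax, the event data and function name are hypothetical, and the way it treats events that fall exactly on a window boundary is a simplification.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    events: iterable of (timestamp_in_seconds, payload) pairs.
    Returns {window_start: count}, ordered by window start.
    Each event belongs to exactly one window.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical stream: (seconds since start, payload)
stream = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (12, "e"), (25, "f")]
per_window = tumbling_window_counts(stream, window_seconds=10)
```

Other window types differ mainly in how the slices are laid out, for example by letting them overlap or by sizing them according to gaps in activity.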
The final section of the exam guide is about monitoring and optimizing data solutions. The most important service for this section is Azure Monitor, which you can use to monitor and configure alerts for almost every other Azure service. One of the key components of Azure Monitor is Log Analytics, which you can use to implement auditing. The optimization subsection doesn't include new services. Instead, you need to know how to optimize the performance of services like Stream Analytics, SQL Database, and Synapse Analytics. Using the right partitioning method is one of the most important optimization techniques.
This learning path assumes that you already have some basic experience using Microsoft Azure. If you don't have any experience yet, then please take one of our introductory Azure courses, such as Overview of Azure Services, first. Now, are you ready to learn about Azure data engineering? Then let's get started! To get to the next course in this learning path, click on the Learning Path pullout menu on the left side of the page. But please remember to rate this introduction before you go on to the next course. Thanks!
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).