How to Become a Microsoft Certified Azure Data Engineer

Data engineering is one of the most sought-after skills in the job market. According to a 2019 Dice.com report, there was an 88% year-over-year growth in job postings for data engineers, which was the highest growth rate among all technology jobs.

If you want to become a data engineer, then you’ll have to decide which technologies to learn because it’s impossible to be an expert in everything in such a broad field. Microsoft has been a data technology leader for many years, but is it still a top contender? Absolutely. Microsoft has moved very aggressively into the cloud with its Azure services. It has the second-highest market share among cloud providers, and it is growing at nearly twice the rate of Amazon Web Services.

Furthermore, Microsoft is so focused on Azure and its other cloud offerings that it is discontinuing all of its certification exams for Windows Server and SQL Server on June 30, 2020. This is a clear sign that the importance of on-premises technology is rapidly declining.

So what does an Azure data engineer do? Here’s what Microsoft says:

“Azure data engineers are responsible for data-related implementation tasks that include provisioning data storage services, ingesting streaming and batch data, transforming data, implementing security requirements, implementing data retention policies, identifying performance bottlenecks, and accessing external data sources.”

Are you convinced that Azure data engineering is a hot field worth pursuing? Then you can jump right into one of Cloud Academy’s two learning paths: Implementing an Azure Data Solution and Designing an Azure Data Solution. These learning paths combine the theory, technical knowledge, and hands-on practice that you’ll need to earn the certification below and feel confident working in a live production environment:

Microsoft Certified: Azure Data Engineer Associate


If you still need some additional convincing, then let’s dive right into the specifics of how to become a Microsoft Certified Azure Data Engineer.

The Exams

To obtain this certification, you need to pass two exams, DP-200 and DP-201. The DP-200 exam focuses on implementation and configuration, while the DP-201 exam focuses on design.

DP-200 Exam

Here are the topics covered in the DP-200 exam and the relative weight of each section:

  • Implement data storage solutions (40-45%)
  • Manage and develop data processing (25-30%)
  • Monitor and optimize data solutions (30-35%)

I’m not going to talk about every item in the exam guide, but I’ll go over some of the highlights of what you’ll need to know. 

The first, and biggest, section of the exam guide is about implementing data storage solutions. These solutions are divided into non-relational and relational datastores. For many years, Microsoft’s primary relational data solution was SQL Server. If you wanted to migrate from an on-premises SQL Server to Azure, you could just run SQL Server in a virtual machine on Azure, but in most cases, you’d be better off using Azure SQL Database instead. 

The advantage is that it’s a managed service with lots of built-in features that make it easy to scale and provide high availability, disaster recovery, and global distribution. And you need to know how to configure all of those features. SQL Database is not exactly the same as SQL Server, but it’s close enough that it shouldn’t be too much trouble migrating to it. If you really need full SQL Server compatibility, then you can use SQL Database Managed Instance.

Another relational data storage service is Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse. As you can tell from its name, it’s meant for analytics rather than transaction processing. It allows you to store and analyze huge amounts of data. The fastest way to get data into Synapse Analytics is by using PolyBase, so it’s important to learn the details of how to use it. To make queries as fast and efficient as possible, you need to partition the datastore into multiple shards and choose the right distribution method for each table.
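To make the idea of a distribution method more concrete, here is a minimal Python sketch that contrasts hash distribution with round-robin distribution. It is a toy model, not Synapse Analytics code: the function names, the bucket count of 4, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 4  # hypothetical toy value, chosen only for illustration


def hash_distribution(key: str, n: int = NUM_DISTRIBUTIONS) -> int:
    """Map a key to a bucket deterministically, so equal keys always share a bucket."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def distribute_hash(rows, key_column, n: int = NUM_DISTRIBUTIONS):
    """Place each row in the bucket chosen by hashing its key column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash_distribution(str(row[key_column]), n)].append(row)
    return buckets


def distribute_round_robin(rows, n: int = NUM_DISTRIBUTIONS):
    """Deal rows out to buckets in turn, ignoring their contents."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        buckets[i % n].append(row)
    return buckets
```

In this sketch, hashing keeps all rows for the same key together, which helps joins and aggregations on that key, while round-robin spreads rows evenly but scatters related rows across buckets.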

Naturally, security is important for both SQL Database and Synapse Analytics, not just for restricting access to data but also for things like applying data masking to credit card numbers or encrypting an entire database.
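As a rough illustration of what masking a credit card number means, the following Python sketch hides every digit except the last four. It is an application-level example only; it does not use or reproduce the masking feature built into SQL Database, and the function name and parameters are hypothetical.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace all but the last `visible` digits with `mask_char`.

    Non-digit characters such as dashes and spaces are left in place.
    """
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)

    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("4111-1111-1111-1234"))  # prints XXXX-XXXX-XXXX-1234
```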

That covers relational database services, but how about non-relational datastores? These are services that can store unstructured data, such as documents or videos. The most mature Azure service in this category is Blob storage, which is a highly available, highly durable place to put digital objects of any type. Unlike a filesystem, Blob storage has a flat structure. That is, the objects aren’t stored in a hierarchy of folders. You can make it look that way through clever naming conventions, but that’s really just faking a tree structure.
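The “faked” tree structure can be sketched in a few lines of Python. The example below assumes a flat list of blob names and a `/` delimiter, and shows how a folder-like listing falls out of simple prefix matching. The function name and sample names are hypothetical, and no Azure SDK is involved.

```python
def list_virtual_directory(blob_names, prefix="", delimiter="/"):
    """Return (virtual_folders, blobs) directly under `prefix` in a flat namespace."""
    folders = set()
    blobs = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        head, sep, _ = remainder.partition(delimiter)
        if sep:
            # More path segments follow, so this looks like a subfolder.
            folders.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


names = [
    "logs/2020/01/app.log",
    "logs/2020/02/app.log",
    "logs/readme.txt",
    "images/cat.png",
]
print(list_virtual_directory(names, "logs/"))
# prints (['logs/2020/'], ['logs/readme.txt'])
```

Nothing in the store itself knows about folders here; the hierarchy exists only in how the names are interpreted.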

For a true hierarchical structure, you can use Azure Data Lake Storage Gen2, which is actually built on top of Blob storage. It’s especially useful for big data processing systems like Azure Databricks.

The final non-relational datastore you need to know for the exam is Cosmos DB. This is a pretty amazing database system because it can scale globally without sacrificing performance or flexibility. It can even support multiple types of data models, including document, key-value, graph, and wide-column. Another surprising feature is its support for five consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.

As with SQL Database and Synapse Analytics, you need to know how to configure partitioning, security, high availability, disaster recovery, and global distribution for Cosmos DB.

The next section of the exam guide is about managing and developing data processing solutions. It’s divided into two subsections: batch processing and stream processing. The two most important batch processing services are Azure Data Factory and Azure Databricks.

Data Factory makes it easy to copy data from one datastore to another, such as from Blob storage to SQL Database. It also makes it easy to transform data, which it accomplishes by using services like Databricks behind the scenes. You can even create complex automated processing pipelines by linking together a series of transformation activities that are kicked off by a trigger that responds to an event.

Azure Databricks is a managed data analytics service. It’s based on Apache Spark, which is a very popular open-source analytics and machine learning framework. You can also run Spark jobs on Azure HDInsight, but Databricks is the preferred solution, so it’s the one you’ll need to be most familiar with for the exam. Some of the Databricks topics covered are data ingestion, clusters, notebooks, jobs, and autoscaling.

The most important stream processing service is Azure Stream Analytics. You need to know how to get data into it from other services, how to process data streams using different windowing functions, and how to output the results to another service.
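If windowing functions are new to you, this small Python sketch may help. It counts events per tumbling window (fixed size, no overlap) and per hopping window (fixed size, overlapping). It is a simplification: timestamps are plain integers, windows are keyed by start time and treated as half-open intervals `[start, start + size)`, and the boundary convention may differ from the one Stream Analytics uses. The function names are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(timestamps, size):
    """Count events in non-overlapping windows of length `size`."""
    counts = Counter()
    for ts in timestamps:
        window_start = (ts // size) * size
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def hopping_window_counts(timestamps, size, hop):
    """Count events in windows of length `size` that start every `hop` units."""
    counts = Counter()
    for ts in timestamps:
        start = (ts // hop) * hop  # latest window start at or before ts
        # Walk back through every earlier window that still covers ts.
        while start > ts - size and start >= 0:
            counts[start] += 1
            start -= hop
    return dict(sorted(counts.items()))


print(tumbling_window_counts([1, 3, 7, 12, 14, 15, 21], size=10))
# prints {0: 3, 10: 3, 20: 1}
print(hopping_window_counts([1, 7, 12], size=10, hop=5))
# prints {0: 2, 5: 2, 10: 1}
```

With a tumbling window each event is counted exactly once, while with a hopping window an event can land in several overlapping windows.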

The final section of the exam guide is about monitoring and optimizing data solutions. The most important service for this section is Azure Monitor, which you can use to monitor and configure alerts for almost every other Azure service. One of the key components of Azure Monitor is Log Analytics, which you can use to implement auditing. 

The optimization subsection doesn’t include new services. Instead, you need to know how to optimize the performance of services like Stream Analytics, SQL Database, and Synapse Analytics. Using the right partitioning method is one of the most important optimization techniques.

Finally, I should mention that because the DP-200 exam is all about implementation and configuration, you need to know how to actually configure data services in the Azure portal. The exam even includes tasks that you have to perform in a live lab! If you’re worried about how you’ll get the required level of hands-on practice, see the Preparing for the Exams section below.

DP-201 Exam

Here are the topics covered in the DP-201 exam and the relative weight of each section:

  • Design Azure data storage solutions (40-45%)
  • Design data processing solutions (25-30%)
  • Design for data security and compliance (25-30%)

While the DP-200 exam is all about implementation, the DP-201 exam is about design, so it focuses more on planning and concepts than on getting everything set up.

The first, and biggest, section of the exam guide is about designing data storage solutions. You need to know which Azure services to recommend to meet business requirements. As with DP-200, these solutions are divided into relational datastores, including Azure SQL Database and Azure Synapse Analytics, and non-relational datastores, including Cosmos DB, Data Lake Storage Gen2, and Blob storage.

For all of the above services, you need to know how to design:

  • Data distribution and partitions
  • High scalability, taking into account multiple regions, latency, and throughput
  • Disaster recovery, and
  • High availability

The next section of the exam guide is about designing data processing solutions. It’s divided into batch processing and stream processing. For batch processing, you need to know how to design solutions using Azure Data Factory and Azure Databricks. For stream processing, you need to know how to design solutions using Stream Analytics and Azure Databricks. As you can tell, Azure Databricks is a very important service for data processing since it’s used for both batch and stream processing. You also need to know how to ingest data from other Azure services and how to output the results to other services.

The final section of the exam guide is about data security and compliance. First, you need to know how to secure your datastores. The most important decision is what authentication method to use for various use cases. For example, it’s usually better to rely on Azure Active Directory authentication than to embed an access key in your application code. Role-based access control and access control lists (ACLs) are also important.

The second part of this section deals with designing security for data policies and standards. Some of the topics include:

  • Encryption, such as Transparent Data Encryption
  • Data auditing
  • Data masking, such as obscuring credit card numbers
  • Data privacy and data classification
  • Data retention
  • Archiving, and
  • Purging
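To show how retention, archiving, and purging fit together, here is a minimal Python sketch that decides what to do with a record based on its age. The 90-day and 365-day thresholds are hypothetical placeholders, not values taken from Azure or from the exam guide, and the function name is made up for this example.

```python
from datetime import datetime, timedelta


def retention_action(
    created: datetime,
    now: datetime,
    archive_after: timedelta = timedelta(days=90),   # hypothetical threshold
    purge_after: timedelta = timedelta(days=365),    # hypothetical threshold
) -> str:
    """Return 'keep', 'archive', or 'purge' depending on the record's age."""
    age = now - created
    if age >= purge_after:
        return "purge"
    if age >= archive_after:
        return "archive"
    return "keep"


now = datetime(2020, 3, 1)
print(retention_action(datetime(2020, 2, 1), now))   # prints keep
print(retention_action(datetime(2019, 11, 1), now))  # prints archive
print(retention_action(datetime(2018, 1, 1), now))   # prints purge
```

In a real design, the thresholds would come from your organization’s compliance requirements rather than from defaults in code.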

Preparing for the Exams

Even if you already have a lot of experience with Azure data services, I recommend you spend a significant amount of time studying for the exams because DP-200 and DP-201 will thoroughly test your knowledge and skills.

To fill in the gaps in your knowledge and to review all of the topics, I recommend taking self-paced courses, getting hands-on experience, and taking practice exams. The easiest way to do that is to go through Cloud Academy’s DP-200 and DP-201 Exam Preparation learning paths. Both of them include video-based courses, hands-on labs, and a practice exam.

Good luck on the exams!


Written by

Guy Hummel

Guy is a certified cloud architect on all three of the major public cloud platforms: AWS, Azure, and Google Cloud Platform. He launched his first training website in 1995 and he's been helping people learn IT technologies ever since. Guy’s passion is making complex technology easy to understand.

