Mitigating Infrastructure State Drift



This course provides you with the foundational knowledge required to design an infrastructure and configuration management strategy. It starts by looking at hosting infrastructure — IaaS, PaaS, FaaS, and some modern native app options — before moving on to look at Infrastructure-as-Code. You’ll learn what Infrastructure-as-Code means and the tools and technologies that are used to deploy and manage it.

Next, you'll learn about some of the more common Infrastructure-as-Code tools and technologies: Terraform, Azure Resource Manager, Chef, Puppet, and more. You'll also learn about technical debt and how to deal with it in templates. The course then covers transient infrastructure and its role in the delivery lifecycle, before finishing off by looking at the mitigation of infrastructure state drift.

If you have any feedback relating to this course, please contact us at

Learning Objectives

  • Understand the main compute options available to host your infrastructure
  • Have a solid grasp of Infrastructure-as-Code (IaC) and its related tools and technologies 
  • Learn about technical debt and how to deal with it in templates
  • Learn about transient infrastructure and how it can speed up projects and reduce costs

Intended Audience

This course is intended for those preparing for Microsoft's AZ-400 exam, or anybody who wants to learn about designing an infrastructure and configuration management strategy in Azure.


Prerequisites

To get the most from this course, you should have a basic understanding of Microsoft Azure and of DevOps concepts.


Hello. Welcome to Mitigating Infrastructure State Drift. In this lesson, we're going to discuss what infrastructure state drift is and ways to mitigate it.

Self-service is a common offering in the DevOps world. While it makes deployments easier for those who need them, it also raises the risk of infrastructure state drift, especially when servers fulfill multiple roles rather than a single role. An application server that also functions as a file server is one example. In addition, because cloud computing has made deploying and configuring infrastructure so easy and fast, it has increased the number of servers that get deployed, while making it possible to completely automate the provisioning process.

Now, as a result of all this ease of use, DevOps teams will sometimes commingle infrastructure deployments with their roles as developers. This often results in environments being spun up in an ad-hoc fashion, which makes it very easy to introduce infrastructure state drift.

Now, before we go any further, let me explain what infrastructure state drift is and what, specifically, causes it.

Drift refers to changes that occur to an environment over time. These changes can, and often do, result in an infrastructure that exists in a non-standard state. This kind of configuration drift can cause all kinds of problems, including security issues. For example, left unchecked, state drift can result in certain users having more access or privileges than they should have. Other problems associated with drift include scenarios where software and hardware changes that have been introduced get forced into production, despite their unique support requirements.
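One way to make drift concrete is to compare a server's recorded baseline with what is actually running on it. The following Python sketch does that with plain dictionaries. The function name `find_drift` and the setting names are hypothetical, chosen only to mirror the examples in this lesson (a user with extra privileges, an application server that has also become a file server).

```python
from typing import Any

ABSENT = "<absent>"  # placeholder shown when a setting exists on only one side


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical baseline and observed state for one server.
baseline = {"patch_level": "2024-05", "admin_users": ["ops"], "roles": ["app"]}
observed = {"patch_level": "2024-05", "admin_users": ["ops", "dev1"], "roles": ["app", "file"]}

for setting, (want, have) in find_drift(baseline, observed).items():
    print(f"{setting}: expected {want}, found {have}")
```

Here the report flags the extra administrator and the extra server role, while the unchanged patch level is not listed.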

Although the changes that produce infrastructure state drift are not intentionally disruptive, they often become disruptive nonetheless. Worse, such changes are often due to things that can be controlled. For example, poor communication between teams is a common cause of infrastructure state drift, because it forces developers or teams of developers to work in a vacuum. When this happens, each developer has their own vision of what configuration is needed, without knowing what the needs of other developers and team members are.

Since everyone is working on their own island, there is often a lack of available (and coherent) documentation. This lack of documentation results in some (or all) team members not understanding why certain changes were made. It also results in uncertainty surrounding ongoing support and testing.

Furthermore, tight (or unrealistic) deadlines will often force teams into a tough spot, where they need to make changes without running those changes through any sort of meaningful testing or approval process.

To mitigate these issues, and the infrastructure state drift that comes along with them, mitigation processes and tools need to be implemented.

Mitigating infrastructure state drift should start with a configuration management database. Using a configuration management database ensures that all changes are logged, along with a justification for them. 
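The core idea, that every change is logged together with a justification, can be sketched in a few lines of Python. This is an in-memory illustration only, not a real configuration management database; the class names `ChangeRecord` and `ChangeLog` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One logged configuration change (hypothetical fields)."""
    resource: str
    setting: str
    old_value: str
    new_value: str
    author: str
    justification: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ChangeLog:
    """Minimal in-memory stand-in for the change history a CMDB would keep."""

    def __init__(self) -> None:
        self._records: list[ChangeRecord] = []

    def record(self, change: ChangeRecord) -> None:
        # Refuse changes that arrive without an explanation.
        if not change.justification.strip():
            raise ValueError("a change must include a justification")
        self._records.append(change)

    def history(self, resource: str) -> list[ChangeRecord]:
        """Return every logged change for one resource, oldest first."""
        return [r for r in self._records if r.resource == resource]


log = ChangeLog()
log.record(ChangeRecord("web-01", "tls_min_version", "1.0", "1.2", "tom", "Disable legacy TLS"))
print(len(log.history("web-01")))
```

Rejecting a record that has no justification is the design choice that matters here: it means the reason for a change is captured at the moment the change is made, rather than reconstructed later.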

In addition to the implementation of a configuration management database, team members should be coached to maintain communication with all other team members. This ensures everyone is on the same page.

Limiting manual configuration management is also helpful because it hinders the creation of snowflake systems and infrastructures that require special support and non-standard maintenance.

Auditing and policy enforcement are also critical to mitigating drift because they force compliance with standard configurations. 
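As a rough illustration of auditing against policy, the Python sketch below checks a list of resources against a small set of rules and reports every violation. The rule names, resource fields, and the `audit` function are all assumptions made for the example; a real environment would rely on a policy service rather than hand-written checks.

```python
from typing import Callable

# Each policy is a named rule that returns True when a resource complies.
# Both rules are hypothetical examples of a "standard configuration".
POLICIES: dict[str, Callable[[dict], bool]] = {
    "https_only": lambda res: res.get("https_only") is True,
    "has_owner_tag": lambda res: "owner" in res.get("tags", {}),
}


def audit(resources: list[dict], policies: dict[str, Callable[[dict], bool]] = POLICIES) -> list[tuple[str, str]]:
    """Return (resource_name, policy_name) for every failed check."""
    violations = []
    for res in resources:
        for policy_name, complies in policies.items():
            if not complies(res):
                violations.append((res["name"], policy_name))
    return violations


resources = [
    {"name": "storage-a", "https_only": True, "tags": {"owner": "platform"}},
    {"name": "storage-b", "https_only": False, "tags": {}},
]

for resource_name, policy_name in audit(resources):
    print(f"{resource_name} violates {policy_name}")
```

Enforcement would go one step further than this report, for example by failing a pipeline run whenever the returned list is not empty.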

Another key to mitigating drift is automation, because it removes much of the human component of the deployment process and, with it, many of the configuration mistakes that get made. By automating the deployment process and building it into pipelines that use standard templates, drift can be further mitigated. Whenever a template is updated, the deployments created from it can be brought up to date by redeploying through the pipeline, which keeps them in line with the standard.
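The effect of deploying everything from one standard template can be sketched as follows. This is a simplified Python model, not a real pipeline: `render` and `deploy_all` are hypothetical helpers, and the template values are made up for the example.

```python
import copy


def render(template: dict, env_name: str) -> dict:
    """Produce the configuration for one environment from the shared template."""
    config = copy.deepcopy(template)  # never share mutable state between environments
    config["name"] = f"app-{env_name}"
    return config


def deploy_all(template: dict, environments: list[str]) -> dict[str, dict]:
    """Stand-in for a pipeline stage that deploys every environment."""
    return {env: render(template, env) for env in environments}


# Hypothetical standard template and environment list.
template = {"vm_size": "Standard_B2s", "disk_gb": 64, "tags": {"managed_by": "pipeline"}}
environments = ["dev", "test", "prod"]

deployed = deploy_all(template, environments)

# Updating the template and rerunning the pipeline stage moves every
# environment to the new standard in one step.
template["disk_gb"] = 128
deployed = deploy_all(template, environments)
print({env: cfg["disk_gb"] for env, cfg in deployed.items()})
```

Because no environment is configured by hand, there is no per-environment setting that can quietly diverge from the others.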

Microsoft Azure offers several tools and features that can help minimize infrastructure state drift.

For example, Azure Resource Manager helps ensure consistent deployments through the use of JSON templates. 
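To give a feel for what such a template contains, the Python sketch below builds a minimal one as a dictionary and prints it as JSON. The storage account resource, the parameter names, and the `apiVersion` value are assumptions chosen for illustration and should be checked against current Azure documentation before use.

```python
import json


def storage_account_template(default_sku: str = "Standard_LRS") -> dict:
    """Build a minimal ARM template describing one storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Parameter names are hypothetical; callers supply the account name.
            "storageAccountName": {"type": "string"},
            "skuName": {"type": "string", "defaultValue": default_sku},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",  # assumed version; verify before deploying
                "name": "[parameters('storageAccountName')]",
                "location": "[resourceGroup().location]",
                "sku": {"name": "[parameters('skuName')]"},
                "kind": "StorageV2",
                "properties": {"supportsHttpsTrafficOnly": True},
            }
        ],
    }


template_json = json.dumps(storage_account_template(), indent=2)
print(template_json)
```

Every deployment made from this file gets the same kind, SKU default, and HTTPS-only setting, which is where the consistency comes from.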

Desired State Configuration, or DSC, is another tool at your disposal. DSC provides an automated way to create baselines that deployments should adhere to.
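DSC itself is PowerShell-based, so the following is not DSC code. It is a small Python sketch of the test-then-set pattern that a desired state tool follows: check each setting against the baseline and correct only what has drifted. The `Setting` class, the `converge` function, and the setting names are hypothetical.

```python
class Setting:
    """One desired setting, checked and corrected in a test-then-set cycle."""

    def __init__(self, key: str, desired_value) -> None:
        self.key = key
        self.desired_value = desired_value

    def test(self, node: dict) -> bool:
        """Return True when the node already matches the desired value."""
        return node.get(self.key) == self.desired_value

    def set(self, node: dict) -> None:
        """Bring the node back to the desired value."""
        node[self.key] = self.desired_value


def converge(node: dict, baseline: list[Setting]) -> list[str]:
    """Apply the baseline and return the keys that had to be corrected."""
    corrected = []
    for setting in baseline:
        if not setting.test(node):
            setting.set(node)
            corrected.append(setting.key)
    return corrected


# Hypothetical baseline and a node where one setting has drifted.
baseline = [Setting("iis_installed", True), Setting("smb1_enabled", False)]
node = {"iis_installed": True, "smb1_enabled": True}

print(converge(node, baseline))  # first run prints ['smb1_enabled']
print(converge(node, baseline))  # second run prints []
```

The second run changes nothing because the node already matches the baseline. That property is what makes it safe to apply a baseline repeatedly on a schedule.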

Another Azure tool that can help mitigate drift is the Azure activity log. By leveraging its logs and query engine, you can search the logs for changes that roll out configurations that are outside the scope of the standard templates. 
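The kind of search described here can be sketched in Python over a handful of log entries. The entries below are simplified, hypothetical records rather than the real activity log schema, and the trusted caller is an assumed pipeline identity; the point is only to show the filter: keep write operations made by anyone other than the deployment pipeline.

```python
# Identities allowed to change infrastructure (assumed for this example).
TRUSTED_CALLERS = {"deploy-pipeline@example.com"}


def out_of_band_changes(entries: list[dict], trusted_callers: set[str] = TRUSTED_CALLERS) -> list[dict]:
    """Return write operations that were not made by a trusted caller."""
    return [
        entry
        for entry in entries
        if entry["operation"].lower().endswith("/write")
        and entry["caller"] not in trusted_callers
    ]


# Simplified, made-up log entries.
entries = [
    {"time": "2024-05-01T09:00:00Z", "caller": "deploy-pipeline@example.com",
     "operation": "Microsoft.Compute/virtualMachines/write", "resource": "vm-app-01"},
    {"time": "2024-05-02T14:30:00Z", "caller": "dev1@example.com",
     "operation": "Microsoft.Network/networkSecurityGroups/write", "resource": "nsg-app"},
    {"time": "2024-05-02T15:00:00Z", "caller": "dev1@example.com",
     "operation": "Microsoft.Compute/virtualMachines/restart/action", "resource": "vm-app-01"},
]

for entry in out_of_band_changes(entries):
    print(f"{entry['time']} {entry['caller']} changed {entry['resource']}")
```

In this sample only the manual change to the network security group is reported, since the pipeline's own write and the restart action are both filtered out.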

Practically speaking, drift can never be totally eliminated. However, it can be mitigated with the right combination of processes, tools, and communication.


About the Author

Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.

In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.

In his spare time, Tom enjoys camping, fishing, and playing poker.