Why Metrics are Important to DevOps

1h 1m

Modern software systems are becoming increasingly complex as they try to meet quality, availability, and security demands, and they change rapidly to keep up with the needs of end users. With all of this change, how do you ensure stability, quality, security, and innovation? In this course, we look at how the DevOps philosophy provides a holistic way to look at software development, deployment, and operations, and we offer some tenets to help improve quality and stability.

Course Objectives

You will gain the following skills by completing this course:

  • Learn why automation, culture, and metrics are essential to a successful DevOps project
  • Learn how DevOps can positively impact your business's bottom line
  • Learn which major companies are successfully utilizing DevOps in their own engineering processes

Intended Audience

You should take this course if you are:

  • A newcomer to the DevOps or cloud world
  • Looking to upgrade your skills from a conventional software development career


Prerequisites

None specified.

This Course Includes

  • Expert-guided lectures about DevOps
  • 1 hour of high-definition video
  • Solid foundational knowledge for your explorations into DevOps

What You'll Learn

Video lecture | What you'll learn
What Is DevOps? | In this lecture series, you'll gain a fundamental understanding of DevOps and why it matters.
The Business Value of DevOps | Need to justify the business case for DevOps? This is the lecture series for you.
Who's Using DevOps? | Find out who's using DevOps in the enterprise, and why their success matters for your own organization.


If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Welcome back to our Introduction to DevOps course. I'm Ben Lambert, and I'll be your instructor for this lecture. In this lecture, we're going to talk about a few metrics that are important to DevOps. By the end of the lecture, you should be able to identify some of the metrics that help you to quantify the success of your DevOps practices.

Because DevOps is the philosophy of the efficient development, deployment, and operation of the highest-quality software possible, we need to know what to measure to determine if our pipeline is becoming more efficient and if we're producing higher-quality software. There are a lot of metrics that support your team. Every role on the team will have different metrics to better help them do their job, and all these metrics should be measured and reported back to anyone that needs them. However, for this lecture what we're going to focus on are the metrics specific to determining if your DevOps practices are having any impact. Let's talk about a few of the metrics and why they're useful.

The first metric we'll talk about is the frequency of your deployments. Since DevOps is about producing an efficient development, deployment, and operations pipeline, this metric can help evaluate the health of that pipeline. Ideally, for a piece of software that's under active development, the number will trend upward until it reaches a kind of natural plateau and becomes relatively constant.

Next is mean time to recovery, abbreviated MTTR. It's the average amount of time it takes you to resolve a problem with your production environment. When I say "problem," what I mean is anything that negatively impacts your end users. This could be outages, high-severity bugs, security holes, etc.
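Going back to deployment frequency for a moment, here is a minimal sketch of how you might tally it, assuming you can export the date of each production deployment from your tooling. The function name `deployments_per_week` and the sample dates are hypothetical and exist only for illustration.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO (year, week), in chronological order."""
    counts = Counter(d.isocalendar()[:2] for d in deploy_dates)
    return dict(sorted(counts.items()))


# Hypothetical deployment dates, not real data.
deploys = [
    date(2024, 1, 2),
    date(2024, 1, 4),
    date(2024, 1, 9),
    date(2024, 1, 10),
    date(2024, 1, 12),
]

weekly = deployments_per_week(deploys)
for (year, week), count in weekly.items():
    print(f"{year}-W{week:02d}: {count} deployments")
```

Watching these weekly counts over several months is one way to see whether the number is still trending upward or has reached its plateau.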

Failure is going to happen, no matter how well we plan or how good our code and infrastructure are. If you're building your software and infrastructure to be as flexible as possible, then dealing with failure becomes easier. Your goal should be to have this number decrease over time. Again, ideally there won't be any failures. However, since we all know that won't happen, knowing how long it takes on average for a failure to be resolved helps to identify potential bottlenecks in the resolution process.

Next up is mean time to discovery, abbreviated MTTD. Again, failure is going to happen. But how long does it take for you to discover those failures? Are you discovering the problem via some sort of automated method, or is your customer finding it for you? Where MTTR is measured from the moment of discovery, MTTD is measured from the moment a failure is introduced to production up to that discovery. So this is useful because it tells us how quickly we're identifying problems.
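To make the two clocks concrete, here is a rough sketch that computes both averages from the same incident records. It follows the definitions used in this lecture: MTTD runs from introduction to discovery, and MTTR runs from discovery to resolution. The `Incident` record, its field names, and the timestamps are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    introduced: datetime  # when the failure reached production
    discovered: datetime  # when a person or a monitor noticed it
    resolved: datetime    # when end users stopped being affected


def mean_duration(durations):
    """Average a non-empty list of timedeltas."""
    return sum(durations, timedelta()) / len(durations)


def mttd(incidents):
    """Mean time to discovery: introduction -> discovery."""
    return mean_duration([i.discovered - i.introduced for i in incidents])


def mttr(incidents):
    """Mean time to recovery: discovery -> resolution."""
    return mean_duration([i.resolved - i.discovered for i in incidents])


# Hypothetical incidents, not real data.
incidents = [
    Incident(datetime(2024, 3, 1, 9, 0),
             datetime(2024, 3, 1, 9, 30),
             datetime(2024, 3, 1, 11, 30)),
    Incident(datetime(2024, 3, 8, 14, 0),
             datetime(2024, 3, 8, 15, 30),
             datetime(2024, 3, 8, 16, 30)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Keeping the three timestamps separate is the point of the design: a long MTTD points at monitoring and alerting, while a long MTTR points at the resolution process itself.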

Next, we have system availability. Now, even if you're not bound to some sort of uptime agreement with your customers, knowing the uptime of each system that makes up your software, as well as the overall uptime percentage, is valuable. Having an understanding of the availability of each of the components of your software (load balancers, web servers, CDNs) will help you identify areas that may need some attention from your engineers. For example, if your web servers have an uptime of roughly 90% for the month, this is probably a symptom of a larger problem.
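As a small illustration of the arithmetic, the sketch below turns downtime per component into an uptime percentage and flags anything under a target. The component names mirror the ones mentioned above, but the downtime figures, the 30-day period, and the 99.9% target are assumptions chosen for the example, not recommendations.

```python
def availability_percent(downtime_minutes, period_minutes):
    """Uptime as a percentage of the reporting period."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes

# Hypothetical downtime per component over a 30-day month, in minutes.
downtime = {
    "load balancer": 5,
    "web servers": 4320,  # 72 hours
    "CDN": 0,
}

TARGET_PERCENT = 99.9  # assumed target, pick your own

report = {
    name: availability_percent(minutes, MINUTES_IN_30_DAYS)
    for name, minutes in downtime.items()
}
needs_attention = [name for name, pct in report.items() if pct < TARGET_PERCENT]

for name, pct in report.items():
    print(f"{name}: {pct:.2f}%")
print("Needs attention:", needs_attention)
```

Seventy-two hours of downtime in a 30-day month is what 90% uptime actually means, which is why a number like that deserves a closer look.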

Next is service performance. This, like availability, will help you identify potential problems. You should know at a glance if your systems are performing within the desired thresholds that you've set. For example, how long does it take for a response to come back from your REST APIs? Or how long does it take for your web pages to load? Are you optimizing your website so that people on mobile devices using non-Wi-Fi connectivity aren't pulling down three-megabyte background images? If you track the average response time for your systems and a code deployment significantly impacts those times, then you'll be able to better identify which code change caused the latency.
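One simple way to connect response times to deployments is to compare the average just before a deploy with the average just after it. The sketch below does that, assuming you already collect response times in milliseconds. The function name `latency_regressed`, the 1.2 tolerance, and the sample numbers are hypothetical.

```python
from statistics import mean


def latency_regressed(before_ms, after_ms, tolerance=1.2):
    """Return True if mean latency after a deploy exceeds
    the mean before it by more than the given factor."""
    return mean(after_ms) > tolerance * mean(before_ms)


# Hypothetical response times in milliseconds, not real data.
before_deploy = [110, 120, 130, 120]
after_bad_deploy = [180, 200, 190, 210]
after_good_deploy = [115, 125, 118, 122]

print("Bad deploy regressed: ", latency_regressed(before_deploy, after_bad_deploy))
print("Good deploy regressed:", latency_regressed(before_deploy, after_good_deploy))
```

An average hides outliers, so in practice you may also want to look at percentiles, but the before-and-after comparison is the part that ties latency back to a specific code change.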

Next, we have customer complaints. Now, this is something you're probably already tracking. If you're seeing a large percentage of your user base complaining about problems week after week, then you'll need to evaluate what the issues are and how to incorporate preventative measures into your pipeline. As we mentioned in previous lectures, you want to be careful to avoid the blame culture, and instead determine what's going wrong and how to prevent it from going wrong in the future. Ideally, those preventative measures will be something that you can automate if it makes sense to do so.

Our final metric is lead time. Lead time is the time it takes you to go from a feature request to that feature being released. Getting the customers the features they want as quickly as possible without sacrificing quality ties into our goal of efficiency. Also, the faster you can take an idea and put it onto staging servers for review, the faster you can either approve or reject new ideas, allowing you to fail fast enough to make experimentation possible. If it takes you weeks or months to get an idea from concept to running on a staging server, then experimentation becomes unsustainable, and without that experimentation you risk your software becoming stagnant.
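Lead time is straightforward to compute once you record two dates per feature. Here is a minimal sketch, assuming each feature is a pair of a request date and a release date. The helper name `lead_times_in_days` and the sample dates are made up for illustration.

```python
from datetime import date
from statistics import median


def lead_times_in_days(features):
    """Days from feature request to release, one entry per feature."""
    return [(released - requested).days for requested, released in features]


# Hypothetical (requested, released) date pairs, not real data.
features = [
    (date(2024, 2, 1), date(2024, 2, 8)),
    (date(2024, 2, 5), date(2024, 2, 26)),
    (date(2024, 2, 12), date(2024, 2, 22)),
]

days = lead_times_in_days(features)
print("Lead times (days):", days)
print("Median lead time (days):", median(days))
```

The median is used here instead of the mean so that one unusually slow feature doesn't dominate the picture. The same calculation works if you swap the release date for the date the feature first reached a staging server.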

Okay. These have all been metrics that pertain to measuring the efficacy of your DevOps efforts. You need to look at these metrics within the context of the holistic system. No single metric should represent the complete picture, and all of these metrics are merely conversation starters. They should be used to enhance your software pipeline, not to beat up engineers for not meeting some sort of arbitrary goal, such as lines of code per day.

In our next group of lectures, we'll expand on some of these areas in more depth as we discuss the business value of DevOps. We'll cover better lead times, improved stability and quality, and reduced operational costs, and first up is improved lead time. All right. Let's get started.

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.
