Measurement, Metrics & Reporting
The DevOps Institute is a collaborative effort between recognized and experienced leaders in the DevOps, InfoSec and ITSM space and acts as a learning community for DevOps practices. This DevOps Foundations course has been developed in partnership with the DevOps Institute to provide you with a common understanding of DevOps goals, business value, vocabulary, concepts, and practices. By completing this course, you will gain an understanding of the core DevOps concepts, the essential vocabulary, and the core knowledge and principles that underpin DevOps practice.
This course is made up of 8 lectures and an assessment exam at the end. Upon completion of this course and the exam, students will be prepped and ready to sit the industry-recognized DevOps Institute Foundation certification exam.
- Recognize and explain the core DevOps concepts.
- Understand the principles and practices of infrastructure automation and infrastructure as code.
- Recognize and explain the core roles and responsibilities of a DevOps practice.
- Be prepared for sitting the DevOps Institute Foundation certification exam after completing the course and assessment exam.
- Individuals and teams looking to gain an understanding and shared knowledge of core DevOps principles.
- A basic understanding of IT roles and responsibilities. We recommend completing the "Considering a Career in Cloud Computing?" learning path prior to taking this course.
- [Instructor] Welcome to lecture seven. In this lecture, we will learn more about measurement, metrics, and reporting on DevOps practices. We will start by exploring the importance of DevOps metrics for speed, throughput, and tempo. Then we'll look at quality, stability, and finally culture. Now, as Peter Drucker said, "If you can't measure it, you can't improve it." So why does measurement matter to us? You may find it useful to think about measurement through the Three Ways framework: which kinds of metrics tell us more about flow, which tell us more about feedback, and which tell us more about experimentation and learning. In the first way, changes in lead time and cycle time are important; time to value and time to realization are our key metrics. In the second way, build and test results and change failure rates become more influential as evidence of success. And in the third way, the time spent on realizing hypotheses and the time spent on mastery and achievement become more relevant to how we measure success.
- At VersionOne, we put a lot of thought into what it takes to create a data-driven DevOps organization, and it really all starts with flow metrics. But we have a challenge in DevOps: the minute we convert our backlog items into source code, and then convert that source code into binary artifacts, we lose all visibility into the flow. It becomes very difficult to track a specific backlog item, in the form of a binary artifact, as it moves through every single stage of our value stream. So creating that capability is really the first step. We call it affiliation: the ability to affiliate a specific backlog item with specific code, then connect that code to specific artifacts, and track those artifacts as they move from phase to phase, so that the story moves from phase to phase along with them. Tracking the flow of backlog items across each step in your value stream map is important, but often what people really care about are combinations of backlog items, like features or epics. For example, a product owner might ask how this epic is distributed across the value stream map right now, and how much of it has already been delivered to end users. It would also be useful to know whether every single backlog item in a feature has made it to a specific stage in the value stream map. Having that kind of visibility is really important. I think the next step in creating a data-driven organization is to take this real-time visibility and start converting it into flow metrics, objective measurements of how your value stream is performing. The first flow metric we have to consider is lead time: how long it really takes the average work item to travel from development all the way to the end of the value stream.
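Affiliation and the lead-time metric can be illustrated with a small data model. The sketch below is a minimal illustration in Python, not VersionOne's implementation: the `BacklogItem` class, the `affiliate`, `promote`, `epic_distribution` and `average_lead_time_days` functions, and the stage names are all hypothetical. It assumes you can already export which backlog item each commit references and which commits went into each artifact, and it measures lead time from the start of development to delivery.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class BacklogItem:
    """A story or defect, plus the commits and artifacts affiliated with it (hypothetical model)."""
    item_id: str
    epic: str
    stage: str = "development"            # current stage in the value stream
    started: Optional[datetime] = None    # when development began
    delivered: Optional[datetime] = None  # when it reached the end of the value stream
    commits: List[str] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)


def affiliate(items: List[BacklogItem],
              commit_to_item: Dict[str, str],
              artifact_to_commits: Dict[str, List[str]]) -> None:
    """Link commits to backlog items, then link artifacts to items via those commits."""
    by_id = {item.item_id: item for item in items}
    for commit, item_id in commit_to_item.items():
        if item_id in by_id:
            by_id[item_id].commits.append(commit)
    for artifact, commits in artifact_to_commits.items():
        for item in items:
            if set(commits) & set(item.commits) and artifact not in item.artifacts:
                item.artifacts.append(artifact)


def promote(items: List[BacklogItem], artifact: str, stage: str) -> None:
    """Moving an artifact to a new stage moves every affiliated backlog item with it."""
    for item in items:
        if artifact in item.artifacts:
            item.stage = stage


def epic_distribution(items: List[BacklogItem], epic: str) -> Counter:
    """Count how many of an epic's items currently sit in each stage."""
    return Counter(item.stage for item in items if item.epic == epic)


def average_lead_time_days(items: List[BacklogItem]) -> Optional[float]:
    """Mean days from start of development to delivery, over delivered items only."""
    days = [(i.delivered - i.started).total_seconds() / 86400
            for i in items if i.started and i.delivered]
    return mean(days) if days else None
```

Because stage changes are recorded against artifacts and then propagated to every affiliated item, a product owner's question about where an epic sits reduces to a simple count over items.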
So the next set of metrics we have to track is work in progress, or WIP. I think we do a really good job of understanding work in progress through development, but the minute we start converting backlog items into code, and then converting that code into artifacts or binary objects, it becomes really difficult to track work in progress as those binary artifacts move through each phase of delivery. So being able to understand and visualize work in progress even after stories are coded into binary artifacts, and to see exactly how much value is stacking up at every phase of delivery (for example, how much work in progress do we have in staging right now?), is really important and really helpful. Some other concepts we've borrowed from Lean are touch time versus wait time. Touch time is the amount of time we actually spend adding value to a user story or defect, and wait time is the amount of time it spends stationary, waiting for the next step or activity. If you can calculate touch time and wait time at the work-item level, that starts to provide some really powerful information. If you know the cycle time through a particular phase of delivery, for example the quality assurance phase, and you can express it as touch time plus wait time, you can start to ask how efficient that phase is. The ratio of touch time to the overall cycle time of the phase really starts to show us where the waste is, where the friction is, and where the opportunities are to streamline flow. It's also good to understand, over the course of a release, what percentage of the work introduced into the release can't be tied back to specific business value, or what percentage of commits are rogue commits.
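Work in progress per stage, the touch-to-cycle-time ratio (often called flow efficiency in Lean), and the share of rogue commits can each be computed from simple records. This is a hedged sketch: the `PhaseRecord` class and the function names are hypothetical, hours are assumed as the unit, and a rogue commit is taken to mean a commit with no affiliated backlog item, which is our reading of the term rather than a formal definition.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhaseRecord:
    """Time one work item spent in one phase, split into touch and wait time (hours)."""
    item_id: str
    phase: str
    touch_hours: float  # time spent actively adding value
    wait_hours: float   # time spent stationary, waiting for the next step

    @property
    def cycle_hours(self) -> float:
        """Cycle time through the phase is touch time plus wait time."""
        return self.touch_hours + self.wait_hours


def wip_by_stage(current_stage: Dict[str, str]) -> Counter:
    """Work in progress: how many items sit in each stage right now (item id -> stage)."""
    return Counter(current_stage.values())


def flow_efficiency(records: List[PhaseRecord], phase: str) -> Optional[float]:
    """Total touch time divided by total cycle time for one phase, across all items."""
    in_phase = [r for r in records if r.phase == phase]
    cycle = sum(r.cycle_hours for r in in_phase)
    if cycle == 0:
        return None  # no measured time in this phase
    return sum(r.touch_hours for r in in_phase) / cycle


def rogue_commit_pct(commit_to_item: Dict[str, Optional[str]]) -> float:
    """Percentage of a release's commits that have no affiliated backlog item (None)."""
    if not commit_to_item:
        return 0.0
    rogue = sum(1 for item in commit_to_item.values() if item is None)
    return 100.0 * rogue / len(commit_to_item)
```

A phase with a low ratio is spending most of its cycle time waiting, which is exactly where the speaker suggests looking for waste and friction.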
So in addition to simple risk measures like the percentage of rogue commits in a code base, we're starting to imagine some really innovative and exciting ways to think about risk as backlog items move through our value stream maps. One of the ones I find most exciting is cyclomatic complexity, or fragility. We've been able to measure the cyclomatic complexity and fragility of our code bases for years, and we've got great tools that help us do that, but what we haven't been able to do until now is measure the cyclomatic complexity or fragility of specific backlog items, user stories, or features, or to answer questions like: what's the relative fragility of release A compared to release B or release C? One last class of risk metrics that I think is interesting is change visibility. We can now track how dynamic our code base is through the various stages of our value stream. We would expect the code base to be very dynamic in the early stages of the DevOps value stream, but as we get closer to quality assurance, and certainly beyond the definition of done or code complete, we would expect it to be very stable. So understanding the rate of change, and how dynamic the code base is as we approach each stage of delivery, is really important, because the more change we have later in the value stream, the more risk we have. I think there's a lot of study and research around that, but providing that visibility to all stakeholders and being able to measure it objectively is really important. So we've talked about some flow metrics and some risk metrics. These are just a couple of examples of how you can begin to build a data-driven DevOps organization and, ultimately, get good at getting better.
- [Instructor] So how do we go about measuring success? Here are the four key metrics organizations can use to show the effectiveness of their DevOps practices. Showing proof that DevOps practices benefit the organization requires examining the factors that influence overall IT performance; in other words, you can't measure Dev and Ops separately. It also requires baselining key metrics before starting any transformation, and then measuring them again after making improvements. Change lead time is one of the most important metrics, as it represents what the customer sees. Cycle time is a more mechanical measure of process capability. Lead time depends on cycle time, but it also depends on your willingness to keep a backlog, your customers' patience, and their readiness for delivery. Change failure rate is an important measure of reliability and stability. People sometimes think that MTTR and MTRS can be used interchangeably, but they measure different things. MTTR (mean time to repair) is measured from when a component fails until it is repaired. It does not include the time required to recover or restore the service; for example, data may need to be recovered before the service is fully restored to normal functionality. MTRS (mean time to restore service) is measured from when the service fails until it is fully restored. Now, while it may be hard to assign a dollar value to it, there are always measurements we can use for culture. Representative metrics include stress levels, collaboration levels, the ability to attract and retain talent, turnover, employee morale, and responsiveness to change; we can use these to dipstick, or gauge, how our culture is doing. Many metrics are measured differently from organization to organization, and even from team to team within an organization, so it's important to agree on a common language and set of definitions so that metrics can be shared across teams. A key metric for technology teams to provide to the business is change lead time, or cycle time.
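These definitions are easier to agree on when they are written down as calculations. This minimal Python sketch assumes each change record carries a request time, a work-start time, a deployment time and a failure flag, and each incident records when the component failed, when it was repaired and when the service was fully restored; the field and function names are hypothetical. Lead time is measured from the request and cycle time from the start of work, which is one possible convention, not the only one. Note how the same incident produces two different numbers for MTTR and MTRS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Change:
    requested: datetime     # customer asked for it: start of lead time
    work_started: datetime  # team began work: start of cycle time
    deployed: datetime      # reached production: end of both
    failed: bool            # caused a failure in production


@dataclass
class Incident:
    failed_at: datetime    # component failed
    repaired_at: datetime  # component repaired: end of MTTR
    restored_at: datetime  # service fully restored (e.g. after data recovery): end of MTRS


def _mean_hours(deltas: List[timedelta]) -> Optional[float]:
    """Mean of a list of durations in hours, or None if there is nothing to measure."""
    return mean(d.total_seconds() / 3600 for d in deltas) if deltas else None


def change_lead_time(changes: List[Change]) -> Optional[float]:
    """Mean hours from request to production."""
    return _mean_hours([c.deployed - c.requested for c in changes])


def cycle_time(changes: List[Change]) -> Optional[float]:
    """Mean hours from start of work to production."""
    return _mean_hours([c.deployed - c.work_started for c in changes])


def change_failure_rate(changes: List[Change]) -> Optional[float]:
    """Fraction of changes that caused a failure in production."""
    return sum(c.failed for c in changes) / len(changes) if changes else None


def mttr(incidents: List[Incident]) -> Optional[float]:
    """Mean time to repair, in hours: failure until the component is repaired."""
    return _mean_hours([i.repaired_at - i.failed_at for i in incidents])


def mtrs(incidents: List[Incident]) -> Optional[float]:
    """Mean time to restore service, in hours: failure until the service is restored."""
    return _mean_hours([i.restored_at - i.failed_at for i in incidents])
```

Running the same functions over a period before a transformation and again afterwards gives the baseline-and-compare evidence described above.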
Change lead time answers a simple question: how long does it take the business to get the thing it asked for? People need to agree on whether there is a difference between lead time and cycle time, and on when to start measuring. This metric can highlight the differences between a waterfall-like approach and an agile approach. Most organizations have either virtually no metrics or so many that they have become meaningless, and they will need to refocus on identifying the ones that are key, acknowledging that those may need to change. So here are a few tips on which measurements to focus on. Focus on outcomes and value. Measure capability rather than maturity, because we are never done. Lines of code are dangerous to measure, because the simplest solution is often the most elegant and the least likely to result in technical debt. Measure velocity within an agile team to help with estimation and cadence, but don't compare velocity across teams: they will all be doing different things in slightly different ways, so the comparison is misleading and likely to drive the wrong sort of behaviors. Measuring purely on utilization can crowd out experimentation and learning in your culture; it may also leave technical debt untackled and encourage burnout. Team and global metrics help drive visibility and transparency, and consequently collaboration, shared accountability, and shared goals and vision. How to measure unplanned work is a question you should always be asking. Let's look at the Societe Generale case study. They identified that it's important to establish two sets of indicators. The first is about the transformation itself: you need to measure how fast you're moving toward the transformation goal. The second is about business value: what is the time to market from idea to production, including sprint velocity and quality? This pyramid from Gartner isn't a maturity pyramid. You don't aim for the top.
It's a hierarchy of importance for the metrics placed on it, and it elevates the business metrics. We want to read the Gartner DevOps Metrics Pyramid from the bottom up. Operational efficiency is one of the key metrics you want to have. Service quality and service velocity are the other areas we would like to put a lot of focus on. Organizational effectiveness is the area that can drive greater business effectiveness. Customer value is always a key metric. And finally, there is business performance: ultimately, our DevOps pyramid should inform and drive business performance. Okay, that brings us to the end of lecture seven. I'll see you at the next lecture.
Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.