Measuring DevOps Success: What, Where, and How

The DevOps methodology relates technical and organization practices so it’s difficult to simply ascribe a number and say “our organization is a B+ on DevOps!” Things don’t work that way. A better approach identifies intended outcomes and measurable characteristics for each outcome. Let’s consider a common scenario: a team adopts DevOps to increase their velocity. High velocity is a common trait among DevOps teams, plus it may be measured, tracked, and used to prioritize decisions. In this post, we’ll look at the traits and measurements of DevOps success.

Before we dive into it, check out Cloud Academy’s Recipe for DevOps Success webinar in collaboration with Capital One and don’t forget to have a look at Cloud Roster, the job role matrix that shows you what kind of technologies that DevOps professionals should be familiar with. If you’re responsible for implementing DevOps practices at your company, we also suggest reading The Four Tactics for Cultural Change in DevOps Adoption.

DevOps in Four Metrics

The DevOps Handbook defines three primary principles in a DevOps value stream. First, the principle of flow calls for fast feedback from development through production. This manifests as continuous delivery (or deployment). Second, the principle of feedback calls for fast feedback from production back to development. The idea is to understand production operations, such as performance, end-user activity, or outages, and react accordingly. This manifests itself as tracking and monitoring real-time telemetries like time series data, metrics, and logs. The third principle calls for experimentation and continuous learning. DevOps does not have an end state. It’s similar to a game where the goal is to just keep playing but such that you get better each time. Practically speaking this manifests itself more of an organizational zeitgeist that appears in the way team members approach their work, handle outages, and seek out improvements which may directly or indirectly impact other areas.

Accelerate correlates these ideas and technical practices with four metrics:

  1. Lead Time: The time from code written to entering production
  2. Deployment Frequency: How often deploys happen
  3. Mean-Time-To-Recover (MTTR): How quickly can teams restore service after production outages
  4. Change Fail Rate: What percentage of deploys result in service impairment or an outage

The first two metrics capture velocity and the second two measure quality. Applying the third principle (continuous experimentation and learning) should drive positive long term trends in all four metrics.

Measuring Lead Time

Lead time measures the time required to implement, test and deliver a single bit of functionality. Measuring lead time requires starting a clock when development starts and stopping it when said code enters production. Gathering this data requires integrating a feature/issue tracker and/or source control data. The final implementation depends on how your team works. Here’s one possible implementation.

Consider a Kanban board with three columns: Todo, Work In Progress and Done. Work-in-Progress is defined as any kind of ongoing work. Done is defined as delivered in production. Each item in the board has an ID. Team members move cards across the board as work completes. A developer takes a card from Todo into WIP when they begin work. Eventually, it’s deployed and a product owner verifies the functionality in production and moves the card to Done. The team may integrate with the Kanban board’s API to retrieve the timestamp when the card entered WIP and entered done, subtract the timestamps, then record the value. Values should also be tagged with “feature” or “bug” for grouping and analysis.

A second implementation could use pull-requests. A developer may open a PR when they begin work and close it when it’s verified in production. The lead time could be calculated from the time of the first commit (or when the PR was opened) to when it was merged (signifying that code has been accepted in production). This solution depends on how the team works with branches.

The technical process for measuring lead time in your team is really an implementation detail. It’s more important to pick the start and stop points and automate the measurement from there. If it’s not automated, then it won’t happen.

Measuring Deployment Frequency

Deployment frequency is the easiest to measure out of the bunch. It’s a simple counter increment each time a deploy happens. Technically speaking this could be pulled from your pipeline provider’s API or tracked via dedicated step at the end of the pipeline. Values should be tagged with the project and team member. Tracking this value yields a “deploys per day” metric. You may take it even deeper by going to into “deploys per day per developer” when things really start to pick up.

You should also consider tracking a correlated metric. Batch size strongly correlates with velocity. Smaller batches are easier to implement, test and deploy. This is intuitive. Put a 10,000 line PR to code review compared to a 1,000 line PR. We know which one will be reviewed and merged faster. This metric could be measured as the total diff in each PR.

Measuring Mean Time to Recover

MTTR is measured as the time from when impairment occurs until the time it’s resolved, then the averaging of all those values.

Any outage or pager software, such as PagerDuty, VictorOps, or Opsgenie, will provide this value for you. If you don’t already have software in place, then you’ll need to track the values somehow. Manual entry can suffice here (assuming the production issues are seldom enough) until an automated system is available.

Measuring Change Failure Rate

Change failure rate is measured as the total failed deploys divided by the total deploys. The second metric (Deployment Frequency) provides that value. That leaves the first value. A simple implementation may just count the number of outages or production issues coming from monitoring software (again, such as Pagerduty or VictorOps). That may be fine to start but ignores a subtle distinction. This metric should reveal the flaws in your deployment pipeline and not the outside world. Code and changes the team introduces and the outside world may both cause failures, but we’re only interested in one. Consider the case where your infrastructure provider goes down (yes, even AWS goes down and it takes large portions of the internet down with it). That shouldn’t count against you. If you can, it may be worth it to flag production issues as such and tag them with the related system. This data can be used to calculate the global failure and also the failure rate per system.

Conclusion: Setting High-Performance Targets

Now your team is equipped with the 4 metrics of IT performance. Let’s see how those stack up against industry leaders. Accelerate’s 2017 research shows high value for each metric:

  • Deployment Frequency: On-demand (multiple deploys per day)
  • Lead Time: Less than an hour (Small batches required!)
  • MTTR: Less than one hour
  • Change Failure Rate: 0-15%

Your team may not be hitting targets right away. However, the team can accomplish anything with a direction, measured progress, and strong leadership. These topics also dovetail into Site Reliability Engineering, the discipline of improving reliability and removing toil from everyday work. You’ll find that good SREs care deeply about these numbers.

I’ll leave with a quote from Accelerate that emphasizes the importance of these metrics and false premise that going fast means sacrificing stability or speed1:

Astonishingly, these results demonstrate that there is no tradeoff between improving performance and achieving higher levels of stability and quality. Rather, high performers do better at all of these measures. This is precisely what the Agile and Lean movements predict, but much dogma in our industry still rests on the false assumption that moving faster means trading off against other performance goals, rather than enabling and reinforcing them.

Now you’re equipped with context and data. It’s time to start. CloudAcademy has courses and learning paths to guide you towards DevOps success. The DevOps Fundamentals learning path will guide you through implementing the first principle (principle of flow) and the second principle (principle of feedback). After that remember that DevOps has no end goal. Focus on iterating, experimenting, learning and improving. Any team can achieve DevOps success.

  1. Forsgren Ph.D., Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press. Kindle Edition. ↩︎

 

Avatar

Written by

Adam Hawkins

Passionate traveler (currently in Bangalore, India), Trance addict, Devops, Continuous Deployment advocate. I lead the SRE team at Saltside where we manage ~400 containers in production. I also manage Slashdeploy.


Related Posts

Avatar
Adam Hawkins
— September 13, 2019

How Google, HP, and Etsy Succeed with DevOps

DevOps is currently well developed, and there are many examples of companies adopting it to improve their existing practices and explore new frontiers. In this article, we'll take a look at case studies and use cases from Google, HP, and Etsy. These companies are having success with Dev...

Read more
  • Continuous Learning
  • DevOps
  • Velocity
Chris Gambino
Chris Gambino
— August 28, 2019

How to Accelerate Development in the Cloud

Understanding how to accelerate development in the cloud can prevent typical challenges that developers face in a traditional enterprise. While there are many benefits to switching to a cloud-first model, the most immediate one is accelerated development and testing. The road blocks tha...

Read more
  • deploy
  • deployment acceleration
  • development
  • DevOps
Avatar
Adam Hawkins
— August 9, 2019

DevSecOps: How to Secure DevOps Environments

Security has been a friction point when discussing DevOps. This stems from the assumption that DevOps teams move too fast to handle security concerns. This makes sense if Information Security (InfoSec) is separate from the DevOps value stream, or if development velocity exceeds the band...

Read more
  • AWS
  • cloud security
  • DevOps
  • DevSecOps
  • Security
Valery Calderón Briz
Valery Calderón Briz
— August 8, 2019

Understanding Python Datetime Handling

Communicating dates and times with another person is pretty simple... right? “See you at 6 o’clock on Monday” sounds understandable. But was it a.m. or p.m.? And was your friend in the same time zone as you when you said that? When we need to use and store dates and times on Pytho...

Read more
  • DevOps
  • Python
  • Python datetime
  • Unix timestamp
Alisha Reyes
Alisha Reyes
— July 22, 2019

Cloud Academy’s Blog Digest: July 2019

July has been a very exciting month for us at Cloud Academy. On July 10, we officially joined forces with QA, the UK’s largest B2B skills provider (read the announcement). Over the coming weeks, you will see additions from QA’s massive catalog of 500+ certification courses and 1500+ ins...

Read more
  • AWS
  • Azure
  • Cloud Academy
  • Cybersecurity
  • DevOps
  • Kubernetes
Avatar
Adam Hawkins
— July 17, 2019

How to Become a DevOps Engineer

The DevOps Handbook introduces DevOps as a framework for improving the process for converting a business hypothesis into a technology-enabled service that delivers value to the customer. This process is called the value stream. Accelerate finds that applying DevOps principles of flow, f...

Read more
  • AWS
  • AWS Certifications
  • DevOps
  • DevOps Foundation Certification
  • Engineer
  • Kubernetes
Avatar
Adam Hawkins
— July 9, 2019

Top 20 Open Source Tools for DevOps Success

Open source tools perform a very specific task, and the source code is openly published for use or modification free of charge. I've written about DevOps multiple times on this blog. I reiterate the point that DevOps is not about specific tools. It's a philosophy for building and improv...

Read more
  • Ansible
  • Chef
  • configuration management
  • DevOps
  • devops tools
  • Docker
  • infrastructure-as-code
  • Kubernetes
  • telemetry
Avatar
Adam Hawkins
— July 2, 2019

DevOps: Scaling Velocity and Increasing Quality

All software teams strive to build better software and ship it faster. That's a competitive edge required to survive in the Age of Software. DevOps is the best methodology to leverage that competitive advantage, ultimately allowing practitioners to accelerate software delivery and raise...

Read more
  • continuous delivery
  • DevOps
  • software
Avatar
Adam Hawkins
— June 13, 2019

Continuous Deployment: What’s the Point?

Continuous Deployment is the pinnacle of high-performance software development. Continuous deployment teams deploy every commit that passes tests to production, and there's nothing faster than that. Even though you'll see the "CD" term thrown around the internet, continuous deployment a...

Read more
  • Development & Deploy
  • DevOps
Avatar
Adam Hawkins
— May 31, 2019

DevOps Telemetry: Open Source vs Cloud vs Third Party

The DevOps principle of feedback calls for business, application, and infrastructure telemetry. While telemetry is important for engineers when debugging production issues or setting base operational conditions, it is also important to product owners and business stakeholders because it...

Read more
  • Analytics
  • DevOps
Avatar
Adam Hawkins
— April 16, 2019

The Convergence of DevOps

IT has changed over the past 10 years with the adoption of cloud computing, continuous delivery, and significantly better telemetry tools. These technologies have spawned an entirely new container ecosystem, demonstrated the importance of strong security practices, and have been a catal...

Read more
  • DevOps
  • Security
Avatar
Adam Hawkins
— March 21, 2019

How DevOps Increases System Security

The perception of DevOps and its role in the IT industry has changed over the last five years due to research, adoption, and experimentation. Accelerate: The Science of Lean Software and DevOps by Gene Kim, Jez Humble, and Nicole Forsgren makes data-backed predictions about how DevOps p...

Read more
  • DevOps
  • Security