What DevOps Means for Risk Management

DevOps makes newcomers uneasy in two areas. First, they see an inherently risky choice between speed and quality. Second, they worry that the quick iterations of DevOps may break compliance rules or introduce security vulnerabilities. Both concerns stem from a core principle of DevOps, which transforms organizational culture and advocates a faster cadence and shorter cycle times. It’s easy to assume that moving faster produces less stable systems. This post explores the ways DevOps mitigates risk and yields safer, more stable systems. If you’re generally interested in DevOps, you can also check out my other post on How DevOps Transforms Software Testing.

Choosing Between Speed and Quality is Wrong

In business, the saying goes that outcomes can be cheap, fast, and good, but you’re only allowed to pick two. This framework assesses three common dimensions of project management. It’s intuitive if you assume the three are mutually exclusive, and that’s where DevOps differs. You don’t have to take our word for it.

Accelerate: The Science of Lean Software and DevOps – Building and Scaling High Performing Technology Organizations quantifies IT performance and identifies the best practices. Martin Fowler, Chief Scientist at Thoughtworks, refutes the framing of speed over quality directly in the foreword. He says:

This huge increase in responsiveness does not come at a cost in stability, since these organizations find their updates cause failures at a fraction of the rate of their less-performing peers, and these failures are usually fixed within the hour. Their evidence refutes the bimodal IT notion that you have to choose between speed and stability—instead, speed depends on stability, so good IT practices give you both.1

Fowler identifies the dependencies between speed and stability (a dimension of quality). Fowler refers to two important findings in Accelerate. High performing teams have:

  • 170 times faster MTTR (mean time to recover) from downtime.2
  • 5 times lower change failure rate (a change is 1/5 as likely to fail).3
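
As a rough illustration of what these two metrics measure, here is a minimal sketch that computes them from a deploy history. The `Deploy` record and its field names are hypothetical, and measuring recovery from the moment of the failed deploy is a simplifying assumption; Accelerate does not prescribe this code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deploy:
    """One production deploy; all field names are hypothetical."""
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is fixed


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Fraction of deploys that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_recover(deploys: list[Deploy]) -> timedelta:
    """Average time from a failed deploy until service was restored."""
    recoveries = [
        d.restored_at - d.deployed_at
        for d in deploys
        if d.failed and d.restored_at is not None
    ]
    if not recoveries:
        return timedelta(0)
    return sum(recoveries, timedelta(0)) / len(recoveries)


# Made-up history: five deploys, two of which failed and were later fixed.
start = datetime(2024, 1, 1, 9, 0)
history = [
    Deploy(start),
    Deploy(start + timedelta(days=1), failed=True,
           restored_at=start + timedelta(days=1, minutes=30)),
    Deploy(start + timedelta(days=2)),
    Deploy(start + timedelta(days=3), failed=True,
           restored_at=start + timedelta(days=3, minutes=50)),
    Deploy(start + timedelta(days=4)),
]

print(change_failure_rate(history))   # 0.4
print(mean_time_to_recover(history))  # 0:40:00
```

Tracking both numbers over time shows whether a team is actually getting safer as it ships more often.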

These two findings demonstrate that DevOps teams are less risky because deploys fail less often and, when they do, issues are resolved quickly. Contrast this with systems that are deployed a few times a year, where failed changes are more likely and outages are far more costly. How does DevOps do it?

Reduce Batch Sizes with Trunk-Based Development

The first goal of DevOps is to reduce the time needed to go from commit to production. This is called “lead time”. Reducing the batch size is the simplest way to cut lead times.

Imagine batch size as the size of a commit. Smaller commits are easier to understand, develop, and test. They are also more easily verified in production. If the only modification is a 5-line method change that correlates with increased load or other negative operational conditions, then it’s easy to locate the problem. On the flip side, if the change was 5,000 lines across several different areas, then the problem becomes much more difficult to isolate and identify.

Reducing batch sizes is the first step. Aligned to this, DevOps encourages changing the relationship with source control. The DevOps Handbook encourages trunk-based development, which means developers check in their code to “trunk” (or master, or mainline) at least once a day. Trunk-based development keeps commits smaller since larger ones will be harder to integrate each day. Most importantly, trunk-based development combined with continuous integration ensures that each commit keeps the entire system in a releasable state.

This mitigates risk in two ways. First, since every commit is kept in a deployable state, there’s no need for separate test and stabilization phases at the end of the project. These late-stage phases are the riskiest and tend to negatively impact delivery. Second, trunk-based development lays the foundation for automated deployment pipelines which expand over time to add increasingly rigorous tests.

By the way, Cloud Academy has also created a DevOps Playbook, divided into Part 1 and Part 2: check them out if you’re interested in learning more about DevOps.

Automating InfoSec

The fast-paced world of DevOps appears at odds with the slower-moving world of Information Security (InfoSec). This originates from common processes that push the concerns of InfoSec to the tail end of projects, making security resolution more difficult and costly. This is true of any part of the SDLC, but it is often harder with InfoSec compliance, where releases must be verified before going into production. A small number of InfoSec engineers can also exacerbate the problem. James Wickett, one of the creators of the Gauntlt security tool and an organizer of DevOps Days Austin, says:

One interpretation of DevOps is that it came from the need to enable developers productivity, because as the number of developers grew, there weren’t enough Ops people to handle all the resulting deployment work. This shortage is even worse in InfoSec—the ratio of engineers in Development, Operations, and InfoSec in a typical technology organization is 100:10:1.4

Operations and development have faced similar issues. The DevOps solution is to automate as much as possible, from environment provisioning to software deployment. Automation makes processes robust and repeatable, and it frees up engineering time for other work. DevOps offers the same solution for InfoSec. First, “shift left” by engaging feature teams with InfoSec goals as early as possible in the process. Second, automate compliance testing in the deployment pipeline as much as possible. This frees up InfoSec staff for more exploratory work, exposes concerns to the whole team, and most importantly, ensures each change is compliant.

The DevOps Handbook recommends some ways to start:

  1. Add static analysis tools to the deployment pipeline. Static analysis can catch coding style errors and also identify security vulnerabilities, such as calls to blacklisted system methods like exec.
  2. Add vulnerability scanning to the deployment pipeline. Vulnerability scanning vets application dependencies and system packages for known security vulnerabilities. This can catch Docker images with unpatched OpenSSL packages or unpatched frameworks like Ruby on Rails.
  3. Add dynamic analysis tools such as OWASP ZAP or Arachni that test running applications for known vulnerabilities.
  4. Integrate InfoSec into production telemetry. Examples include counters on password resets, logins, or second-factor challenges. Other examples may be core dumps or malformed database queries (indicating an attack). Integrating this information into telemetry reinforces the “shift left”: because the entire team has access to the telemetry, more people can understand and diagnose security issues in real time.
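
To make the first item concrete, here is a minimal sketch of a static analysis check that flags calls to blacklisted functions, using Python’s standard `ast` module. The blacklist contents and the function names `call_name` and `find_blacklisted_calls` are assumptions for illustration; a real pipeline would more likely run an established static analysis tool than a hand-rolled script.

```python
import ast

# Hypothetical blacklist; a real team would maintain its own list.
BLACKLISTED_CALLS = {"exec", "eval", "os.system"}


def call_name(node: ast.Call) -> str:
    """Return a name such as 'exec' or 'os.system' for a call, or '' if unknown."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_blacklisted_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for each blacklisted call in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in BLACKLISTED_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Made-up code under inspection; it is parsed, never executed.
sample = '''
import os

def run(user_input):
    os.system("ls " + user_input)
    exec(user_input)
'''

for lineno, name in find_blacklisted_calls(sample):
    print(f"line {lineno}: call to blacklisted function {name}")
```

In a deployment pipeline, a step like this would fail the build whenever the list of findings is non-empty, so the problem surfaces at commit time instead of in a late security review.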

These are starting points towards the goal of integrating InfoSec objectives into the team’s daily work. Done well, this increases developer and operations efficiency while increasing security.

Adopting these practices also requires committing to continuous improvement. Teams starting out can adopt these practices and rule out an entire class of InfoSec regressions. As the team learns over time, the tests increase in rigor, continually raising the quality floor and ultimately reducing risk across the SDLC.

This post has covered two areas: speed versus stability and InfoSec. However, risk isn’t limited to these two areas. Technical debt is arguably the riskiest part of a long-term project, and applying the DevOps principle of automation may help teams in a new way.

Mitigating Risk in Dependency Upgrades

GitHub recently announced they completed their Rails upgrade from 3.2 to 5.2. Rails 3.2 was released in January 2012 and 5.2 was released in April 2018. GitHub built a system to run the application in different Rails versions allowing an incremental upgrade from 3.x, to 4.x, and finally to 5.x.

The Rails upgrade took a year and a half, in part because Rails upgrades weren’t always smooth and some versions had major breaking changes. Rails improved the upgrade process for the 5 series, so while 3.2 to 4.2 took 1 year, 4.2 to 5.2 took only 5 months.5

Software upgrades are a necessary evil, and they can be downright sinister when put off. GitHub made it more difficult for themselves by delaying upgrades, which created a situation where a major upgrade could not be completed in a single go. This problem is also discussed in “Building Evolutionary Architectures” by Thoughtworks employees Neal Ford, Rebecca Parsons, and Patrick Kua.

The authors apply DevOps automation to dependency updates by proposing “fluid dependencies”. The idea is that the deployment pipeline can detect a fluid dependency and attempt a build with the latest version of that dependency. If the build passes, then the application may be upgraded. The deployment pipeline can also automate the commit process to make dependency upgrades seamless. This approach removes a chore from the backlog and mitigates the unchecked technical debt that accumulates when dependencies are not upgraded. Unfortunately, there are no such tools available right now, but the idea is worth exploring.
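
The decision a pipeline would make for a fluid dependency can be sketched as follows. This is an illustration under stated assumptions, not the book’s implementation: the `Dependency` record, the `try_upgrade` function, the package names, and the `fake_build` stand-in for a real build-and-test run are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Dependency:
    name: str
    pinned: str          # version currently in use
    fluid: bool = False  # True if the pipeline may try newer versions


def try_upgrade(
    dep: Dependency,
    latest: str,
    build_passes: Callable[[str, str], bool],
) -> Dependency:
    """Return dep pinned to `latest` if it is fluid and the build passes."""
    if not dep.fluid or latest == dep.pinned:
        return dep
    if build_passes(dep.name, latest):
        return Dependency(dep.name, latest, dep.fluid)
    return dep  # keep the known-good version; a human can investigate


def fake_build(name: str, version: str) -> bool:
    """Stand-in for a real build: pretend only webframework 5.2 passes."""
    return (name, version) == ("webframework", "5.2")


deps = [
    Dependency("webframework", "4.2", fluid=True),
    Dependency("imagelib", "1.0", fluid=True),
    Dependency("legacy-driver", "0.9"),  # not fluid: never auto-upgraded
]
latest_versions = {"webframework": "5.2", "imagelib": "2.0", "legacy-driver": "1.1"}

upgraded = [try_upgrade(d, latest_versions[d.name], fake_build) for d in deps]
for d in upgraded:
    print(d.name, d.pinned)
```

Here only `webframework` moves forward: `imagelib` stays pinned because its trial build fails, and `legacy-driver` is never attempted because it is not marked fluid.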

Conclusion

This post explored the ways in which DevOps can be used to mitigate risk across the SDLC. First, it addressed the misconception that IT teams must choose between speed and quality, along with the risk associated with moving fast (and breaking things). DevOps done well provides both speed and quality. Second, the post covered how to apply the DevOps mindset of automation and a “shift left” approach to InfoSec. Shifting left with automation brings InfoSec concerns to the forefront for everyone on the team, while test automation ensures everyone’s changes are always in compliance. Lastly, the post touched on the idea of mitigating and possibly eliminating risks around critical dependency upgrades with “fluid dependencies”.

Adopting all these practices may not eliminate risk completely, but they have been shown to reduce risk, speed up cycle time, and improve quality. So DevOps isn’t the riskiest thing to try. Right now, it’s just the way modern IT business is done! Are you ready to implement DevOps now? Get inspired by the 10 Ingredients for DevOps Transformation with Mark Andersen.

  1. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Locations 146-149). IT Revolution Press. Kindle Edition.
  2. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Location 434). IT Revolution Press. Kindle Edition.
  3. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Location 434). IT Revolution Press. Kindle Edition.
  4. Kim, Gene; Humble, Jez; Debois, Patrick; Willis, John. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Kindle Locations 5570-5573). IT Revolution Press. Kindle Edition.
  5. https://githubengineering.com/upgrading-github-from-rails-3-2-to-5-2/
