What DevOps Means for Risk Management


Adopting DevOps makes newcomers uneasy in two areas. First, they see an inherently risky trade-off between speed and quality. Second, they worry that the quick iterations of DevOps may break compliance rules or introduce security vulnerabilities. Both concerns stem from the core principle of DevOps: transforming organizational culture to support a faster cadence and shorter cycle times. It’s easy to assume that moving faster produces less stable systems. This post explores the ways DevOps mitigates risk and yields safer, more stable systems. If you’re generally interested in DevOps, you can also check out my other post on How DevOps Transforms Software Testing.

Choosing Between Speed and Quality Is a False Choice

The old project-management adage says outcomes can be cheap, fast, and good, but you’re only allowed to pick two. The framework is intuitive if you assume the three dimensions are mutually exclusive, and that’s exactly where DevOps differs. You don’t have to take our word for it.

Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations quantifies IT performance and identifies best practices. Martin Fowler, Chief Scientist at Thoughtworks, refutes the speed-versus-quality framing directly in the foreword. He says:

This huge increase in responsiveness does not come at a cost in stability, since these organizations find their updates cause failures at a fraction of the rate of their less-performing peers, and these failures are usually fixed within the hour. Their evidence refutes the bimodal IT notion that you have to choose between speed and stability—instead, speed depends on stability, so good IT practices give you both.1

Fowler identifies the dependency between speed and stability (a dimension of quality) and points to two important findings in Accelerate. High-performing teams have:

  • 170 times faster mean time to recover (MTTR) from downtime.2
  • 5 times lower change failure rate (a change is one-fifth as likely to fail).3

These two data points show that DevOps teams carry less risk: deploys fail less often, and when they do fail, the issues are resolved quickly. Contrast this with systems deployed a few times a year, where failed changes are more likely and outages are far more costly. How does DevOps do it?
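Both metrics are simple to compute from deployment records. Here’s a minimal sketch, assuming hypothetical records of when each deploy shipped, whether it failed, and when service recovered:

```python
from datetime import datetime, timedelta

# Hypothetical deploy records: (deployed_at, failed, recovered_at)
deploys = [
    (datetime(2020, 1, 6, 10, 0), False, None),
    (datetime(2020, 1, 6, 14, 0), True, datetime(2020, 1, 6, 14, 45)),
    (datetime(2020, 1, 7, 9, 30), False, None),
    (datetime(2020, 1, 8, 11, 0), False, None),
    (datetime(2020, 1, 9, 16, 0), True, datetime(2020, 1, 9, 16, 20)),
]

failures = [(d, r) for d, failed, r in deploys if failed]

# Change failure rate: failed deploys as a fraction of all deploys
change_failure_rate = len(failures) / len(deploys)

# MTTR: average time from a failed deploy to recovery
mttr = sum((r - d for d, r in failures), timedelta()) / len(failures)

print(f"change failure rate: {change_failure_rate:.0%}")  # 40%
print(f"MTTR: {mttr}")  # 0:32:30
```

Tracking these numbers over time is how Accelerate-style research distinguishes high performers from the rest.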

Reduce Batch Sizes with Trunk-Based Development

The first goal of DevOps is to reduce the time from commit to production, known as “lead time”. Reducing batch size is the simplest way to shorten lead times.
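Lead time is just the elapsed time between those two events, measured per change. A minimal sketch with hypothetical timestamps:

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical (committed_at, deployed_at) pairs for recent changes
changes = [
    (datetime(2020, 3, 2, 9, 15), datetime(2020, 3, 2, 11, 40)),
    (datetime(2020, 3, 2, 13, 0), datetime(2020, 3, 2, 13, 50)),
    (datetime(2020, 3, 3, 10, 5), datetime(2020, 3, 3, 12, 5)),
]

# Lead time: commit-to-production duration for each change
lead_times = [deployed - committed for committed, deployed in changes]
median_lead_time = median(lead_times)

print(median_lead_time)  # 2:00:00
```

The median is a better summary than the mean here, since one slow change would otherwise dominate the number.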

Think of batch size as the size of a commit. Smaller commits are easier to understand, develop, and test, and they are more easily verified in production. If the only modification is a five-line method change that correlates with increased load or other negative operational conditions, then it’s easy to locate the problem. On the flip side, if the change was 5,000 lines across multiple areas, the problem becomes much harder to isolate and identify.

Reducing batch sizes is the first step. Aligned to this, DevOps encourages changing the relationship with source control. The DevOps Handbook encourages trunk-based development, which means developers check their code into “trunk” (or master, or mainline) at least once a day. Trunk-based development keeps commits small, since large ones are harder to integrate every day. Most importantly, trunk-based development combined with continuous integration ensures that each commit keeps the entire system in a releasable state.

This mitigates risk in two ways. First, since every commit is kept in a deployable state, there’s no need for separate test and stabilization phases at the end of the project. These late-stage phases are the riskiest and tend to negatively impact delivery. Second, trunk-based development lays the foundation for automated deployment pipelines which expand over time to add increasingly rigorous tests.
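A deployment pipeline like the one described above is essentially a series of gates that every commit must pass. The sketch below models that idea; all stage names and the commit structure are illustrative, not a real CI system’s API:

```python
# Minimal sketch of a staged deployment pipeline. Each stage is a check
# that must pass before the next runs; a commit is releasable only if
# every stage passes. Stages can grow more rigorous over time.

def lint(commit):           # stand-in for static analysis
    return "TODO" not in commit["diff"]

def unit_tests(commit):     # stand-in for the automated test suite
    return commit["tests_pass"]

def security_scan(commit):  # stand-in for dependency/vulnerability checks
    return not commit["known_vulns"]

PIPELINE = [lint, unit_tests, security_scan]

def releasable(commit):
    """Run stages in order; fail fast on the first broken gate."""
    return all(stage(commit) for stage in PIPELINE)

commit = {"diff": "fix: handle nil user", "tests_pass": True, "known_vulns": []}
print(releasable(commit))  # True
```

Adding rigor then means appending stages to the list, not redesigning the process.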

By the way, Cloud Academy has also created a DevOps Playbook, divided into Part 1 and Part 2: check them out if you’re interested in learning more about DevOps.

Automating InfoSec

The fast-paced world of DevOps appears at odds with the slower-moving world of Information Security (InfoSec). This perception originates from common processes that push InfoSec concerns to the tail end of projects, making security fixes more difficult and costly. This is true of any part of the SDLC, but it’s often harder with InfoSec compliance, where releases must be verified before going to production. Small numbers of InfoSec engineers exacerbate the problem. James Wickett, one of the creators of the Gauntlt security tool and organizer of DevOpsDays Austin, says:

One interpretation of DevOps is that it came from the need to enable developers productivity, because as the number of developers grew, there weren’t enough Ops people to handle all the resulting deployment work. This shortage is even worse in InfoSec—the ratio of engineers in Development, Operations, and InfoSec in a typical technology organization is 100:10:1.4

Operations and development have faced similar issues. The DevOps solution is to automate as much as possible, from environment provisioning to software deployment. Automation makes processes robust and correct and frees up engineering time for other work. DevOps offers the same solution for InfoSec: first, “shift left” by engaging InfoSec with feature teams as early as possible in the process. Second, automate compliance testing in the deployment pipeline as much as possible. This frees InfoSec staff for more exploratory work, exposes concerns to the whole team, and, most importantly, ensures each change is compliant.

The DevOps Handbook recommends some ways to start:

  1. Add static analysis tools to the deployment pipeline. Static analysis can catch coding style errors and also identify security vulnerabilities, such as calls to blacklisted system functions like exec.
  2. Add vulnerability scanning to the deployment pipeline. Vulnerability scanning vets application dependencies and system packages for known security vulnerabilities. This can catch Docker images with unpatched OpenSSL packages or unpatched frameworks like Ruby on Rails.
  3. Add dynamic analysis tools such as OWASP ZAP or Arachni that test running applications for known vulnerabilities.
  4. Integrate InfoSec into production telemetry. Examples include counters on password resets, logins, or second-factor challenges. Other examples are core dumps or malformed database queries (which may indicate an attack). Integrating this information into telemetry reinforces the “shift left”: since the entire team has access to the telemetry, more people can understand and diagnose security issues in real time.
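The first recommendation, a static-analysis gate for blacklisted calls, can be sketched in a few lines with Python’s ast module. The blacklist here is illustrative; real tools ship curated rule sets:

```python
import ast

# Illustrative blacklist of calls a static-analysis gate would flag
BLACKLIST = {"exec", "eval", "os.system"}

def qualified_name(node):
    """Best-effort dotted name for a call target (e.g. 'os.system')."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = qualified_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return None

def find_blacklisted_calls(source):
    """Return (line, name) for each call to a blacklisted function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = qualified_name(node.func)
            if name in BLACKLIST:
                hits.append((node.lineno, name))
    return hits

sample = "import os\nos.system('rm -rf /tmp/scratch')\nprint('done')\n"
print(find_blacklisted_calls(sample))  # [(2, 'os.system')]
```

Wired into the pipeline, a non-empty result fails the build, so a risky call never reaches production unnoticed.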

These are starting points towards the goal of integrating InfoSec objectives into the team’s daily work. Done well, this increases developer and operations efficiency while increasing security.

Adopting these practices also requires committing to continuous improvement. Teams starting out can adopt them and rule out an entire class of InfoSec regressions. As the team learns over time, the tests increase in rigor, continually raising the quality floor and ultimately reducing risk across the SDLC.

This post has covered two areas so far: speed versus stability and InfoSec. However, risk isn’t exclusive to these two areas. Technical debt is arguably the riskiest part of a long-term project, and applying the DevOps principle of automation may help teams in a new way.

Mitigating Risk in Dependency Upgrades

GitHub recently announced they completed their Rails upgrade from 3.2 to 5.2. Rails 3.2 was released in January 2012 and 5.2 was released in April 2018. GitHub built a system to run the application in different Rails versions allowing an incremental upgrade from 3.x, to 4.x, and finally to 5.x.

The Rails upgrade took a year and a half, for a few reasons. First, Rails upgrades weren’t always smooth, and some versions included major breaking changes. Rails improved the upgrade process in the 5.x series, so while 3.2 to 4.2 took a year, 4.2 to 5.2 took only five months.5

Software upgrades are a necessary evil, and they can be downright sinister when put off. GitHub made things harder on themselves by delaying upgrades until a major upgrade could no longer be completed in a single go. This problem is also discussed in “Building Evolutionary Architectures” by Thoughtworks employees Neal Ford, Rebecca Parsons, and Patrick Kua.

The authors apply DevOps automation to dependency updates, proposing “fluid dependencies”. The idea is that the deployment pipeline detects a fluid dependency and attempts a build with the latest version of that dependency. If the build passes, the application may be upgraded. The pipeline can also automate the commit process to make dependency upgrades seamless. This approach removes a chore from the backlog and mitigates the unchecked technical debt that accrues from not upgrading dependencies. Off-the-shelf tooling in this area is still maturing, but the idea is worth exploring.
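The fluid-dependency decision can be sketched as a single pipeline step. Everything here is illustrative; a real implementation would shell out to the package manager and CI system rather than take a callback:

```python
# Sketch of the "fluid dependency" idea: try the build with the newest
# version of a dependency; keep the upgrade only if the build passes.

def attempt_upgrade(dep, current, latest, build_passes):
    """Return the version to pin: latest if the build passed, else current."""
    if latest == current:
        return current          # already up to date, nothing to do
    if build_passes(dep, latest):
        return latest           # pipeline can auto-commit the version bump
    return current              # stay put and report the failing build

# Pretend the build passes on anything below version 6 (illustrative)
passes = lambda dep, version: version < (6, 0)

print(attempt_upgrade("rails", (5, 2), (6, 0), passes))  # stays at (5, 2)
print(attempt_upgrade("rack", (2, 0), (2, 2), passes))   # upgrades to (2, 2)
```

Run on every pipeline execution, this turns dependency upgrades from a dreaded project into a continuous background process.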

Conclusion

This post explored ways DevOps can mitigate risk across the SDLC. First, it addressed the misconception that IT teams must choose between speed and quality, along with the risk associated with moving fast (and breaking things). DevOps done well provides both speed and quality. Second, it covered applying DevOps automation and a “shift left” mindset to InfoSec. Shifting left with automation puts InfoSec concerns in front of everyone on the team, while test automation ensures every change remains compliant. Lastly, it touched on mitigating, and possibly eliminating, the risk of critical dependency upgrades with “fluid dependencies”.

Adopting all these practices may not eliminate risk completely, but they are proven to reduce risk, speed up cycle time, and improve quality. DevOps isn’t the riskiest thing to try; right now it’s simply how modern IT business is done. Are you ready to implement DevOps now? Get inspired by the 10 Ingredients for DevOps Transformation with Mark Andersen.

  1. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Locations 146-149). IT Revolution Press. Kindle Edition. ↩︎
  2. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Location 434). IT Revolution Press. Kindle Edition. ↩︎
  3. Forsgren PhD, Nicole. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Kindle Location 434). IT Revolution Press. Kindle Edition. ↩︎
  4. Kim, Gene; Humble, Jez; Debois, Patrick; Willis, John. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Kindle Locations 5570-5573). IT Revolution Press. Kindle Edition. ↩︎
  5. https://githubengineering.com/upgrading-github-from-rails-3-2-to-5-2/ ↩︎



Written by

Adam Hawkins

Passionate traveler (currently in Bangalore, India), Trance addict, Devops, Continuous Deployment advocate. I lead the SRE team at Saltside where we manage ~400 containers in production. I also manage Slashdeploy.

