How DevOps Increases System Security

The perception of DevOps and its role in the IT industry has changed over the last five years due to research, adoption, and experimentation. Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, Jez Humble, and Gene Kim makes data-backed predictions about how DevOps principles and practices yield better software by almost every measure, along with more successful businesses. Their research, together with work by others such as James Wickett and Josh Corman (former CTO of Sonatype and a respected information security researcher), has centered on incorporating information security objectives into DevOps, a set of practices and principles they termed “Rugged DevOps”. Dr. Tapabrata Pal, Director and Platform Engineering Technical Fellow at Capital One, arrived at similar ideas and described their processes as DevOpsSec. Together, this work dispels the myth that DevOps and system security are at odds.

In fact, it’s the opposite: DevOps practices done right increase system security in the same way that continuous delivery increases stability.

The Three Ways of DevOps describe continuous delivery, feedback from production back to development, and continual learning. Continuous delivery requires developing software in small, incremental changes and verifying each change with automated tests across a deployment pipeline. That automated pipeline offers teams multiple ways to improve security compared to software development without one.

Security issues are like any other software regression: they can be tested for so they never reach production. There are multiple ways to apply automated testing to InfoSec (a sketch of one such check follows the list below):

  • Scan container or VM images for known software vulnerabilities and fail builds that contain known problematic packages
  • Run static analysis tools to flag potentially dangerous system calls and fail builds accordingly
  • Lint code for plaintext secrets like API tokens or SSH keys and fail builds that contain them
  • Run end-to-end security tests, such as OWASP ZAP scans, against build artifacts
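As a concrete illustration of the secret-linting item above, here is a minimal sketch of a check the pipeline could run on every change and use to fail the build. The regex patterns and file walk are simplifying assumptions; a real pipeline would use a dedicated scanner with a far broader rule set.

```python
"""Minimal sketch of a pipeline step that lints for plaintext secrets.

The patterns below are illustrative assumptions; a real pipeline would
use a dedicated scanner with many more rules.
"""
import re
import sys
from pathlib import Path

# Naive patterns for a few common secret formats (illustrative only).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                             # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA|OPENSSH) PRIVATE KEY-----"),  # private keys
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{20,}['\"]"),  # inline API keys
]


def scan(repo_root: str) -> list[str]:
    """Return a human-readable finding for every suspected secret."""
    findings = []
    for path in Path(repo_root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in SECRET_PATTERNS:
                if pattern.search(line):
                    findings.append(f"{path}:{lineno}: possible secret ({pattern.pattern})")
    return findings


if __name__ == "__main__":
    problems = scan(sys.argv[1] if len(sys.argv) > 1 else ".")
    for problem in problems:
        print(problem)
    # A non-zero exit code fails the pipeline stage, blocking the build.
    sys.exit(1 if problems else 0)
```

A CI job would run this script against the repository on every change; the non-zero exit code is what turns a finding into a failed build, which keeps the check automatic rather than dependent on InfoSec capacity.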

Adding these tests to the deployment pipeline dramatically increases security because the checks run automatically on every change and move security verification earlier in the process. This is known as a “shift left”: software is checked for security from the start, automatically, and throughout the pipeline.

Organizations often do not have enough InfoSec engineers to go around. That has negative consequences: InfoSec checks get pushed to the end of the process and may only happen when there is spare capacity. Consider for a moment running your existing automated test suite only when there happened to be an extra engineer on the team. Accepting that proposition for automated functional testing would be ludicrous in modern IT, so why accept it for InfoSec testing? Adding InfoSec tests to the pipeline verifies each change and scales out with the organization. The deployment pipeline is a bigger force for change than a few engineers. More importantly, adding tests exposes issues to everyone and shifts responsibility to the code author to fix the regression.

Automated tests ensure known regressions do not enter production. However, they do not guard against attacks and other malicious activity in production. Teams need to track and alert on telemetry data that indicates malicious activity or other red flags in production. This is the Second Way of DevOps: establishing feedback from production back to development. Teams already have production telemetry for latency, request counts, active users, and so on, so InfoSec telemetry should be integrated alongside it (a minimal sketch follows the list below). Examples include:

  • SSH connections
  • User logins
  • Password resets
  • Malicious SQL queries
  • Malformed requests that may indicate probing or other malicious activity
  • Email address (or additional login information) changes
  • Billing or payment information changes
  • Infrastructure security group or firewall changes
  • XSS attacks
  • Infrastructure changes such as network configuration, new system users, or modified file checksums
  • Privilege escalation (e.g., sudo calls)
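Here is a minimal sketch of what integrating this kind of telemetry might look like, assuming a StatsD-style collector listening on UDP. The metric names, the collector address, and the event hooks are illustrative assumptions, not prescriptions from the book.

```python
"""Minimal sketch of emitting InfoSec telemetry alongside existing metrics.

Assumes a StatsD-style collector listening on UDP; the metric names and
collector address are illustrative.
"""
import socket

STATSD_HOST, STATSD_PORT = "127.0.0.1", 8125  # assumed collector address


def emit_counter(metric: str, value: int = 1) -> None:
    """Send a StatsD counter increment over UDP (fire-and-forget)."""
    payload = f"{metric}:{value}|c".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (STATSD_HOST, STATSD_PORT))


# Examples of wiring InfoSec events into the same telemetry stream the
# team already uses for latency and request counts.
def on_failed_login(username: str) -> None:
    emit_counter("security.login.failed")


def on_password_reset(username: str) -> None:
    emit_counter("security.password.reset")


def on_sudo_invocation(username: str, command: str) -> None:
    emit_counter("security.privilege.sudo")
```

Once these counters land in the same dashboards as latency and request counts, a spike in failed logins or sudo calls becomes as visible as an outage, which is exactly the production-to-development feedback loop the Second Way describes.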

This kind of telemetry data is critical to understanding how the system is being used in production. Based on this insight, teams can act by adding regression tests to the pipeline for the potential problems they identify, improving production’s security posture. More importantly, it increases visibility. Security changes are far more likely to happen when a team realizes it is under attack.

Nick Galbreath from Etsy echoes this sentiment1 after graphing security telemetry:

“One of the results of showing this graph was that developers realized that they were being attacked all the time! And that was awesome because it changed how developers thought about the security of their code as they were writing the code.”

This practice also helps in scenarios where pre-production testing and compliance checks are not enough. Accelerate includes a troubling case study of an ATM vendor that demonstrates the value of production InfoSec telemetry. The company noticed that its ATMs were being put into maintenance mode at unscheduled times, which allowed an attacker to physically extract cash from the machines. A developer had installed the backdoor years earlier, and backdoors of this type are apparently difficult or nearly impossible to detect beforehand. Production telemetry, however, detected the anomaly and alerted the team, who proactively found the fraud and resolved the issue before the scheduled cash audit.

These examples demonstrate how DevOps practices improve system security. First, like any other aspect of software, add automated InfoSec tests to the deployment pipeline. Second, add InfoSec telemetry to production to direct development changes. The Third Way calls for learning and experimentation to further improve the software development process, an aspect teams sometimes miss. DevOps establishes feedback loops, and the Third Way continuously improves them to reduce toil and bugs and to adapt to changing conditions.

Compliance and auditing are a common pain point. They slow things down because documentation has to be produced and manual reviews are required, but that does not have to halt delivery. Automation can drastically improve the compliance and auditing process by removing toil. The Google SRE Book defines toil as “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” Accelerate includes a case study of 18F and Cloud.gov.

The case study describes a government organization automating the process of writing system security plans (SSPs) and obtaining an authority to operate (ATO) from the designated official. SSPs must be reviewed, are often hundreds of pages long, and are highly detailed; creating and maintaining them manually is impossible in a dynamic cloud environment. 18F created a tool that generates the SSP as YAML, which can then be transformed into PDFs or published as GitBooks for internal and external review, saving immeasurable amounts of effort (and increasing happiness in the process). Private sector IT companies tend to face less stringent regulation; regardless, the same compliance and auditing techniques can and should be leveraged to reduce ongoing effort and toil.
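To make the idea concrete, here is a minimal sketch of rendering a machine-readable SSP into reviewable documentation. This is not 18F’s actual tool; the YAML schema, control IDs, and narratives are illustrative assumptions, and it requires the third-party PyYAML package.

```python
"""Minimal sketch of rendering machine-readable control descriptions into
reviewable documentation. Illustration only, not 18F's actual tool; the
YAML schema below is an assumption.

Requires PyYAML: pip install pyyaml
"""
import yaml

EXAMPLE_SSP_YAML = """
system: payments-api
controls:
  - id: AC-2
    name: Account Management
    narrative: Accounts are provisioned through the SSO provider and reviewed quarterly.
  - id: AU-2
    name: Audit Events
    narrative: Authentication, privilege escalation, and data changes are logged centrally.
"""


def render_markdown(ssp_yaml: str) -> str:
    """Turn the YAML SSP description into Markdown for PDF/GitBook publishing."""
    ssp = yaml.safe_load(ssp_yaml)
    lines = [f"# System Security Plan: {ssp['system']}", ""]
    for control in ssp["controls"]:
        lines.append(f"## {control['id']}: {control['name']}")
        lines.append("")
        lines.append(control["narrative"])
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(EXAMPLE_SSP_YAML))
```

Because the source of truth is plain text in version control, the documentation can be regenerated on every change instead of being rewritten by hand for each review.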

Similar approaches may be used in downstream auditing and compliance processes. Given that the production telemetry systems contain InfoSec data, they can be exposed to auditors in a self-service way during reviews, so auditors can verify controls such as appropriate logging or specific event handling themselves. The deployment pipeline also provides a complete change history for the application in production, so compliance reports can be generated from the code, the deployment pipeline, and other automation. This approach again reduces toil for everyone involved, increases accuracy, and ideally leads to more completed audits.
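As one hedged example of the change-history point, the sketch below generates an auditor-facing report straight from version control. It assumes git is the pipeline’s source of record and that a CSV is an acceptable report format; both are illustrative choices, not requirements.

```python
"""Minimal sketch of a self-service compliance report built from the
deployment pipeline's change history. Assumes git is the source of
record; the CSV report format is illustrative.
"""
import csv
import subprocess
import sys


def change_history(repo_path: str) -> list[dict]:
    """Read the change history from git, one record per change."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "--date=iso",
         "--pretty=format:%H%x09%an%x09%ad%x09%s"],
        check=True, capture_output=True, text=True,
    ).stdout
    records = []
    for line in output.splitlines():
        commit, author, date, subject = line.split("\t", 3)
        records.append({"commit": commit, "author": author,
                        "date": date, "subject": subject})
    return records


def write_report(records: list[dict], path: str) -> None:
    """Write a CSV that an auditor can pull without asking the team."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["commit", "author", "date", "subject"])
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    write_report(change_history(repo), "change-history.csv")
```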

DevOps is the best way for modern IT to build, test, and ship software. The Three Ways provide a framework for understanding how and why to approach software development problems. Changing and improving InfoSec is not so different from what the cloud and continuous delivery did for software. Everything stems from the idea that increasing frequency decreases difficulty: it took teams from deploying quarterly to measuring deploys per day per developer, an astonishing improvement in velocity. The industry can effect the same change by applying the Three Ways to InfoSec outcomes: automated testing, production telemetry, and continuous learning and improvement. Applying all three builds a culture of continuous verification that ultimately raises the security floor across the industry. That sounds like a textbook case of increasing security today and in the future.

  1. Kim, Gene; Humble, Jez; Debois, Patrick; Willis, John. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Kindle Locations 5850-5852). IT Revolution Press. Kindle Edition. ↩︎
