The perception of DevOps and its role in the IT industry has changed over the last five years due to research, adoption, and experimentation. Accelerate: The Science of Lean Software and DevOps by Gene Kim, Jez Humble, and Nicole Forsgren makes data-backed predictions about how DevOps principles and practices yield better software in almost any measurable way and more successful businesses. Their research, along with others such as James Wickett and Josh Corman, former CTO of Sonatype and respected information security researcher, has centered around the concept of incorporating information security objectives into DevOps (a set of practices and principles they termed “Rugged DevOps”). Dr. Tapabrata Pal, Director and Platform Engineering Technical Fellow at Capital One, came up with similar ideas and described their processes as DevOpsSec, having dispelled the myth that DevOps and system security are orthogonal.
In fact, it’s the opposite. DevOps practices done right increases system security in the same way that continuous delivery increases stability.
The Three Ways of DevOps describe continuous delivery, production to development feedback, and constant learning. Continuous delivery requires developing software in incrementally small changes and verifying each change with automated tests across a deployment pipeline. The computerized pipeline offers teams multiple ways to improve security when compared to software development without an automated deployment pipeline.
Security issues are like any other software regression. They may be tested for so that they don’t occur in production. There are multiple ways to apply automated testing to InfoSec:
- Scan container or VM images for known software vulnerabilities and fail builds that contain known problematic packages
- Run static analysis tools for calls to potentially dangerous system calls and fail builds accordingly
- Lint code for plain text secrets like API tokens or SSH keys and fail builds consequently
- Run end-to-end tests, like those from OSWAP, against build artifacts
Adding these tests to the deployment pipeline dramatically increases security since it’s automated: this is known as a “shift left“. It ensures software is secure from the start, automatically, and throughout the pipeline.
Organizations often do not have enough InfoSec engineers to go around. That creates negative consequences since InfoSec checks are pushed to the end of the process and may only happen when there’s enough capacity. Consider for a moment just running your existing automated test suite when there was an extra engineer in the team. Accepting that proposition for automated functional testing is ludicrous in modern IT, why allow the same for InfoSec testing? Adding InfoSec tests to the pipeline verifies each change and scales out with the organization. The deployment pipeline is a bigger force for change than a few engineers. More importantly, adding tests exposes issues to everyone and shifts responsibility to the code author to patch the regression.
Automated tests ensure known regressions do not enter production. However, they do not guard against attacks and other malicious activity in production. Teams need to track and alert on telemetry data that indicates malicious activity or other red flags in production. This is the second way of DevOps that establishes feedback from production to development. Teams already have production telemetry for latency, request count, and active users, and so on, so InfoSec telemetry should be integrated as well. Examples include:
- SSH connections
- User logins
- Password resets
- Malicious SQL queries
- Malformed requests that may indicate probing or other malicious activity
- Email address (or additional login information) changes
- Billing or payment information changes
- Infrastructure security group or firewall changes
- XSS attacks
- Infrastructure changes such as network, new system users, or modified file checksums
- Privilege escalation (e.g., sudo calls)
This kind of telemetry data is critical to understanding how the system is being used in production. Based on this insight, teams can action by adding regression tests to the pipeline having identified potential problems, resulting in an increased security posture for production. More importantly, it increases visibility. Security changes are more likely to occur when a team realizes they’re under attack.
Nick Galberth from Etsy echoes this sentiment1 after graphing security telemetry:
“One of the results of showing this graph was that developers realized that they were being attacked all the time! And that was awesome because it changed how developers thought about the security of their code as they were writing the code.”
This practice also aids scenarios where pre-production testing and compliance checks are not enough. Accelerate includes a troubling case study of an ATM vendor that demonstrates production InfoSec telemetry’s value. The company noticed their ATMs were put into maintenance mode at unscheduled times. This allowed the attacker to physically extract cash from the machine. A developer installed the backdoor years ago. Apparently, backdoors of this type are difficult or near impossible to detect beforehand. However, the production telemetry detected the anomaly and alerted the team. The team proactively found the fraud and resolved the issue before the scheduled cash audit process.
These examples demonstrate how DevOps practices improve system security. First, like any other aspect of software, add automated tests to the deployment pipeline. Second, add production telemetry to production to direct development changes. The third way calls for learning and experimentation to further improve the software development process. Unfortunately, sometimes teams will miss this aspect. DevOps establishes feedback loops, and the third way continuously improves them to reduce toil, reduce bugs, and/or adapt to changing conditions.
Compliance and auditing is a common pain point. It slows down the process since documentation has to be produced and manual reviews are required. This doesn’t have to halt the process. Automation can drastically improve the compliance and auditing process by removing toil. The Google SRE Book defines toil as “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” Accelerate includes a case study of 18F and Cloud.gov.
The case study demonstrates a government organization implementing an automated process for writing system security plans (SSP) and obtaining a right to operate from the designated authority. The SSP plans must be reviewed. They’re often a hundred of pages and highly detailed. Creating and maintaining them manually is impossible in a dynamic cloud environment. 18F created a tool that automatically generates an SSP into YML which can be transformed into PDFs or published as GitBooks for internal and external review saving immeasurable amounts of man-hours (and increasing happiness in the process). Private sector IT companies tend to have a more relaxed level of regulation. Regardless, the same compliance and auditing techniques can be and should be leveraged to reduce ongoing effort and toil.
Similar approaches may be used in downstream auditing and compliance processes. Given the production telemetry systems contain InfoSec data, they may be exposed to auditors in a self-service way during reviews. Auditors can check control like appropriate logging or specific event handling. The deployment pipeline also provides a complete change history for the application in production. It’s possible to generate compliance reports using the code, the deployment pipeline, and other automation. This approach again reduces toil for all involved, increases accuracy, and ideally leads to more completed audits.
DevOps is the best way for modern IT to build, test, and ship software. The Three Ways provide a framework for understanding how and why to approach software development problems. Changing and improving InfoSec is not so different than what the cloud and continuous delivery did to software. Everything stems from the idea that increasing frequency decreases difficulty. It saw teams go from deploying quarterly to measuring deploys-per-day per developer. That’s an astonishing velocity improvement. It can affect the same change by applying the three ways to InfoSec outcomes: automated testing, production telemetry, and continuous learning and improvement. Applying all three builds a culture of continuous verification that ultimately raises the security floor across the industry. That sounds like a textbook case of increasing security today and in the future.
- Kim, Gene; Humble, Jez; Debois, Patrick; Willis, John. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Kindle Locations 5850-5852). IT Revolution Press. Kindle Edition. ↩︎
If you liked this post, you’ll enjoy:
How Google, HP, and Etsy Succeed with DevOps
DevOps is currently well developed, and there are many examples of companies adopting it to improve their existing practices and explore new frontiers. In this article, we'll take a look at case studies and use cases from Google, HP, and Etsy. These companies are having success with Dev...
How to Accelerate Development in the Cloud
Understanding how to accelerate development in the cloud can prevent typical challenges that developers face in a traditional enterprise. While there are many benefits to switching to a cloud-first model, the most immediate one is accelerated development and testing. The road blocks tha...
DevSecOps: How to Secure DevOps Environments
Security has been a friction point when discussing DevOps. This stems from the assumption that DevOps teams move too fast to handle security concerns. This makes sense if Information Security (InfoSec) is separate from the DevOps value stream, or if development velocity exceeds the band...
Understanding Python Datetime Handling
Communicating dates and times with another person is pretty simple... right? “See you at 6 o’clock on Monday” sounds understandable. But was it a.m. or p.m.? And was your friend in the same time zone as you when you said that? When we need to use and store dates and times on Pytho...
Cloud Academy’s Blog Digest: July 2019
July has been a very exciting month for us at Cloud Academy. On July 10, we officially joined forces with QA, the UK’s largest B2B skills provider (read the announcement). Over the coming weeks, you will see additions from QA’s massive catalog of 500+ certification courses and 1500+ ins...
How to Become a DevOps Engineer
The DevOps Handbook introduces DevOps as a framework for improving the process for converting a business hypothesis into a technology-enabled service that delivers value to the customer. This process is called the value stream. Accelerate finds that applying DevOps principles of flow, f...
Top 20 Open Source Tools for DevOps Success
Open source tools perform a very specific task, and the source code is openly published for use or modification free of charge. I've written about DevOps multiple times on this blog. I reiterate the point that DevOps is not about specific tools. It's a philosophy for building and improv...
DevOps: Scaling Velocity and Increasing Quality
All software teams strive to build better software and ship it faster. That's a competitive edge required to survive in the Age of Software. DevOps is the best methodology to leverage that competitive advantage, ultimately allowing practitioners to accelerate software delivery and raise...
Continuous Deployment: What’s the Point?
Continuous Deployment is the pinnacle of high-performance software development. Continuous deployment teams deploy every commit that passes tests to production, and there's nothing faster than that. Even though you'll see the "CD" term thrown around the internet, continuous deployment a...
DevOps Telemetry: Open Source vs Cloud vs Third Party
The DevOps principle of feedback calls for business, application, and infrastructure telemetry. While telemetry is important for engineers when debugging production issues or setting base operational conditions, it is also important to product owners and business stakeholders because it...
The Convergence of DevOps
IT has changed over the past 10 years with the adoption of cloud computing, continuous delivery, and significantly better telemetry tools. These technologies have spawned an entirely new container ecosystem, demonstrated the importance of strong security practices, and have been a catal...
Measuring DevOps Success: What, Where, and How
The DevOps methodology relates technical and organization practices so it's difficult to simply ascribe a number and say "our organization is a B+ on DevOps!" Things don't work that way. A better approach identifies intended outcomes and measurable characteristics for each outcome. Let'...