The perception of DevOps and its role in the IT industry has changed over the last five years due to research, adoption, and experimentation. Accelerate: The Science of Lean Software and DevOps by Gene Kim, Jez Humble, and Nicole Forsgren makes data-backed predictions about how DevOps principles and practices yield better software in almost any measurable way and more successful businesses. Their research, along with others such as James Wickett and Josh Corman, former CTO of Sonatype and respected information security researcher, has centered around the concept of incorporating information security objectives into DevOps (a set of practices and principles they termed “Rugged DevOps”). Dr. Tapabrata Pal, Director and Platform Engineering Technical Fellow at Capital One, came up with similar ideas and described their processes as DevOpsSec, having dispelled the myth that DevOps and system security are orthogonal.
In fact, it’s the opposite. DevOps practices done right increases system security in the same way that continuous delivery increases stability.
The Three Ways of DevOps describe continuous delivery, production to development feedback, and constant learning. Continuous delivery requires developing software in incrementally small changes and verifying each change with automated tests across a deployment pipeline. The computerized pipeline offers teams multiple ways to improve security when compared to software development without an automated deployment pipeline.
Security issues are like any other software regression. They may be tested for so that they don’t occur in production. There are multiple ways to apply automated testing to InfoSec:
- Scan container or VM images for known software vulnerabilities and fail builds that contain known problematic packages
- Run static analysis tools for calls to potentially dangerous system calls and fail builds accordingly
- Lint code for plain text secrets like API tokens or SSH keys and fail builds consequently
- Run end-to-end tests, like those from OSWAP, against build artifacts
Adding these tests to the deployment pipeline dramatically increases security since it’s automated: this is known as a “shift left“. It ensures software is secure from the start, automatically, and throughout the pipeline.
Organizations often do not have enough InfoSec engineers to go around. That creates negative consequences since InfoSec checks are pushed to the end of the process and may only happen when there’s enough capacity. Consider for a moment just running your existing automated test suite when there was an extra engineer in the team. Accepting that proposition for automated functional testing is ludicrous in modern IT, why allow the same for InfoSec testing? Adding InfoSec tests to the pipeline verifies each change and scales out with the organization. The deployment pipeline is a bigger force for change than a few engineers. More importantly, adding tests exposes issues to everyone and shifts responsibility to the code author to patch the regression.
Automated tests ensure known regressions do not enter production. However, they do not guard against attacks and other malicious activity in production. Teams need to track and alert on telemetry data that indicates malicious activity or other red flags in production. This is the second way of DevOps that establishes feedback from production to development. Teams already have production telemetry for latency, request count, and active users, and so on, so InfoSec telemetry should be integrated as well. Examples include:
- SSH connections
- User logins
- Password resets
- Malicious SQL queries
- Malformed requests that may indicate probing or other malicious activity
- Email address (or additional login information) changes
- Billing or payment information changes
- Infrastructure security group or firewall changes
- XSS attacks
- Infrastructure changes such as network, new system users, or modified file checksums
- Privilege escalation (e.g., sudo calls)
This kind of telemetry data is critical to understanding how the system is being used in production. Based on this insight, teams can action by adding regression tests to the pipeline having identified potential problems, resulting in an increased security posture for production. More importantly, it increases visibility. Security changes are more likely to occur when a team realizes they’re under attack.
Nick Galberth from Etsy echoes this sentiment1 after graphing security telemetry:
“One of the results of showing this graph was that developers realized that they were being attacked all the time! And that was awesome because it changed how developers thought about the security of their code as they were writing the code.”
This practice also aids scenarios where pre-production testing and compliance checks are not enough. Accelerate includes a troubling case study of an ATM vendor that demonstrates production InfoSec telemetry’s value. The company noticed their ATMs were put into maintenance mode at unscheduled times. This allowed the attacker to physically extract cash from the machine. A developer installed the backdoor years ago. Apparently, backdoors of this type are difficult or near impossible to detect beforehand. However, the production telemetry detected the anomaly and alerted the team. The team proactively found the fraud and resolved the issue before the scheduled cash audit process.
These examples demonstrate how DevOps practices improve system security. First, like any other aspect of software, add automated tests to the deployment pipeline. Second, add production telemetry to production to direct development changes. The third way calls for learning and experimentation to further improve the software development process. Unfortunately, sometimes teams will miss this aspect. DevOps establishes feedback loops, and the third way continuously improves them to reduce toil, reduce bugs, and/or adapt to changing conditions.
Compliance and auditing is a common pain point. It slows down the process since documentation has to be produced and manual reviews are required. This doesn’t have to halt the process. Automation can drastically improve the compliance and auditing process by removing toil. The Google SRE Book defines toil as “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” Accelerate includes a case study of 18F and Cloud.gov.
The case study demonstrates a government organization implementing an automated process for writing system security plans (SSP) and obtaining a right to operate from the designated authority. The SSP plans must be reviewed. They’re often a hundred of pages and highly detailed. Creating and maintaining them manually is impossible in a dynamic cloud environment. 18F created a tool that automatically generates an SSP into YML which can be transformed into PDFs or published as GitBooks for internal and external review saving immeasurable amounts of man-hours (and increasing happiness in the process). Private sector IT companies tend to have a more relaxed level of regulation. Regardless, the same compliance and auditing techniques can be and should be leveraged to reduce ongoing effort and toil.
Similar approaches may be used in downstream auditing and compliance processes. Given the production telemetry systems contain InfoSec data, they may be exposed to auditors in a self-service way during reviews. Auditors can check control like appropriate logging or specific event handling. The deployment pipeline also provides a complete change history for the application in production. It’s possible to generate compliance reports using the code, the deployment pipeline, and other automation. This approach again reduces toil for all involved, increases accuracy, and ideally leads to more completed audits.
DevOps is the best way for modern IT to build, test, and ship software. The Three Ways provide a framework for understanding how and why to approach software development problems. Changing and improving InfoSec is not so different than what the cloud and continuous delivery did to software. Everything stems from the idea that increasing frequency decreases difficulty. It saw teams go from deploying quarterly to measuring deploys-per-day per developer. That’s an astonishing velocity improvement. It can affect the same change by applying the three ways to InfoSec outcomes: automated testing, production telemetry, and continuous learning and improvement. Applying all three builds a culture of continuous verification that ultimately raises the security floor across the industry. That sounds like a textbook case of increasing security today and in the future.
- Kim, Gene; Humble, Jez; Debois, Patrick; Willis, John. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Kindle Locations 5850-5852). IT Revolution Press. Kindle Edition. ↩︎
If you liked this post, you’ll enjoy:
New Content: Platforms, Programming, and DevOps – Something for Everyone
This month our team of expert certification specialists released three new or updated learning paths, 16 courses, 13 hands-on labs, and four lab challenges! New content on Cloud Academy You can always visit our Content Roadmap to see what’s just released as well as what’s coming soon....
New Content: Focus on DevOps and Programming Content this Month
This month our team of expert certification specialists released 12 new or updated learning paths, 15 courses, 25 hands-on labs, and four lab challenges! New content on Cloud Academy You can always visit our Content Roadmap to see what’s just released as well as what’s coming soon. Ja...
New Content: Get Ready for the CISM Cert Exam & Learn About Alibaba, Plus All the AWS, GCP, and Azure Courses You Know You Can Count On
This month our team of intrepid certification specialists released five learning paths, seven courses, 19 hands-on labs, and three lab challenges! One particularly interesting new learning path is Certified Information Security Manager (CISM) Foundations. After completing this learn...
New Content: AWS Terraform, Java Programming Lab Challenges, Azure DP-900 & DP-300 Certification Exam Prep, Plus Plenty More Amazon, Google, Microsoft, and Big Data Courses
This month our Content Team continues building the catalog of courses for everyone learning about AWS, GCP, and Microsoft Azure. In addition, this month’s updates include several Java programming lab challenges and a couple of courses on big data. In total, we released five new learning...
Using Docker to Deploy and Optimize WordPress at Scale
Here at Cloud Academy, we use WordPress to serve our blog and product/public pages, such as the home page, the pricing page, etc. Why WordPress? With WordPress, the marketing and content teams can quickly and easily change the look & feel and the content of the pages, without rein...
New Content: AWS Data Analytics – Specialty Certification, Azure AI-900 Certification, Plus New Learning Paths, Courses, Labs, and More
This month our Content Team released two big certification Learning Paths: the AWS Certified Data Analytics - Speciality, and the Azure AI Fundamentals AI-900. In total, we released four new Learning Paths, 16 courses, 24 assessments, and 11 labs. New content on Cloud Academy At any ...
New Content: Azure DP-100 Certification, Alibaba Cloud Certified Associate Prep, 13 Security Labs, and Much More
This past month our Content Team served up a heaping spoonful of new and updated content. Not only did our experts release the brand new Azure DP-100 Certification Learning Path, but they also created 18 new hands-on labs — and so much more! New content on Cloud Academy At any time, y...
Docker Image Security: Get it in Your Sights
For organizations and individuals alike, the adoption of Docker is increasing exponentially with no signs of slowing down. Why is this? Because Docker provides a whole host of features that make it easy to create, deploy, and manage your applications. This useful technology is especiall...
Constant Content: Cloud Academy’s Q3 2020 Roadmap
Hello — Andy Larkin here, VP of Content at Cloud Academy. I am pleased to release our roadmap for the next three months of 2020 — August through October. Let me walk you through the content we have planned for you and how this content can help you gain skills, get certified, and...
New Content: Alibaba, Azure AZ-303 and AZ-304, Site Reliability Engineering (SRE) Foundation, Python 3 Programming, 16 Hands-on Labs, and Much More
This month our Content Team did an amazing job at publishing and updating a ton of new content. Not only did our experts release the brand new AZ-303 and AZ-304 Certification Learning Paths, but they also created 16 new hands-on labs — and so much more! New content on Cloud Academy At...
New Content: AWS, Azure, Typescript, Java, Docker, 13 New Labs, and Much More
This month, our Content Team released a whopping 13 new labs in real cloud environments! If you haven't tried out our labs, you might not understand why we think that number is so impressive. Our labs are not “simulated” experiences — they are real cloud environments using accounts on A...
New Content: AZ-500 and AZ-400 Updates, 3 Google Professional Exam Preps, Practical ML Learning Path, C# Programming, and More
This month, our Content Team released tons of new content and labs in real cloud environments. Not only that, but we introduced our very first highly interactive "Office Hours" webinar. This webinar, Acing the AWS Solutions Architect Associate Certification, started with a quick overvie...