10 Ingredients for DevOps Transformation with Mark Andersen
At Capital One, DevOps is about delivering high quality, working software, faster. This means software that is reliable, secure, usable, and perfor...Learn More
Testing is arguably the most important aspect of software development. Whether manual or automated, testing ensures the software works as expected. Broken software causes production outages, unsatisfied customers, refunds, decreased trust, or even complete financial collapse. Testing minimizes these types of negative consequences and when done well, enables teams to reach increasingly higher quality thresholds.
DevOps transforms testing by promoting it to a critical concern across all phases of the SDLC and by shifting the responsibilities onto all engineers. DevOps also encourages engineers to answer questions like where and how to test more aspects of their software. This impacts workflows across teams, the deployment pipeline, and encourages exploratory testing. This post covers how DevOps transforms the perspective on software quality and what it means in practice.
The DevOps Handbook defines DevOps with three principles and their associated feedback loops. The Principle of Flow builds a fast feedback loop from development to production by establishing an automated deployment pipeline with tests that check production fitness using trunk-based development.
Trunk-based development coupled with automated testing is the best way to achieve fast feedback from development to production since it drives down batches sizes and ensures all changes are in working order. Adopting this workflow assumes that branches are shorted lived and each commit is tested.
Trunk-based development transforms testing workflows since work happens in a single shared space. There’s not much to understand since commits are simple, but any commit can bring the deployment pipeline to a screeching halt. This workflow avoids merge hell and potential development conflicts.
Here’s an example: performance testing can only happen against an integrated environment, so without it, the tests would happen far later in the process with more negative impact if things go wrong. Trunk-based development enables a “shift left” for any kind of testing, thus providing faster feedback on build quality, enabling faster iterations and ultimately increasing the frequency of production deploys.
This workflow forces teams to adopt a Definition of Done similar to the one found in the DevOps Handbook:
“At the end of each development interval, we must have integrated, tested, working, and potentially shippable code, demonstrated in a production-like environment, created from trunk using a one-click process, and validated with automated tests.”
The Definition of Done removes the need for separate test and stabilization phases towards the end of projects since testing happens continuously. Once testing is automated, teams can turn their attention to identifying and improving other quality indicators earlier in the deployment pipeline.
Security and compliance checks have traditionally taken place at the end of development and have been done manually. Adopting DevOps integrates infosec into everyone’s daily work as part of the automated deployment pipeline. The shift left also causes teams to engage with infosec concerns as early as possible.
Today, it’s possible to test and mitigate a host of infosec issues by adding the following tests to the deployment pipeline:
exec. Examples of tools to perform static analysis include CodeClimate and Brakeman.
Applying these kinds of tests provides immediate and fast feedback on a variety of possible infosec issues. The practice also frees up engineers to focus on different software quality practices. Here’s a story from Etsy on how they took steps to proactively identify security issues in their production environment:
The engineering team added metrics for abnormal production operational events like core dumps or segmentation faults, database syntax errors to indicate potential SQL injection attacks, suspicious SQL queries, and password resets. They graphed the results in real time and found they were being attacked far more often than they thought. Here’s the project lead discussing the impact on their team:
“One of the results of showing this graph was that developers realized that they were being attacked all the time! And that was awesome, because it changed how developers thought about the security of their code as they were writing the code.”
Changing the organization’s relationship to code affects how the code is tested and a careful eye for software quality can confirm or deny a team’s assumptions. This example is not something typically associated with software testing and that’s the point. DevOps changes the way the entire team approaches verifying and testing their software. Modern teams are using fault injection techniques like chaos engineering to build more resilient systems.
Netflix pioneered chaos engineering. The Principles of Chaos describes chaos engineering as:
“…the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.”
The practice involves random (or targeted) destructive actions in a production environment to stress test the environment’s reliability. The simplest chaos is randomly killing production instances and seeing how the system behaves. Other forms of chaos are increasing network latency or shutting off access to external systems.
This exercise not only builds more reliability into systems, but it teaches the team how to repair their system. Michael Nygaard refers to the “Volkswagen Microbus” paradox in Release It! (2nd Edition):
“You learn how to fix the things that often break. You don’t learn how to fix the things that rarely break. But that means when they do break, the situation is likely to be more dire. We want a continuous low level of breakage to make sure our system can handle the big things.”
Attempting to bucket chaos engineering with a specific engineering skill set is challenging because it doesn’t fit a specific set of skills. The engineer must understand the system, infrastructure, and have the engineering chops to back it all up. Also, resolving faults found through chaos engineering is not a single person’s responsibility, but rather that of the team. Software testing is no longer purely focused on functionality requirements. It is increasingly moving towards identifying unknowns and adherence to non-functional requirements. It may be obvious that engineers should know how to repair their systems, but they can’t learn to do it without practice. Chaos engineering is an interesting approach to creating a regression test for those sort of non-functional requirements.
The adoption of chaos engineering indicates how DevOps is transforming software testing and the team’s approach to ensuring high-quality software.
DevOps shifts responsibility away from specific individuals to a shared responsibility model backed by automation. That’s news for those working in traditional QA teams, especially if they’re doing manual testing or don’t have much software engineering experience. DevOps obviates the need for dedicated manual QA staff. It also forces every team member to become a software engineer. All forms of automation require writing code, so if traditional QA staff don’t learn to code then they’ll be out in the cold. DevOps replaces that face of QA with a more useful, analytical and exploratory one.DevOps shifts responsibility away from specific individuals to a shared responsibility model backed by automation. Click To Tweet
Teams will always need engineers to explore ways to break their systems since that’s a fundamentally creative and experimental process. Experimenting and learning is a key component of DevOps. The DevOps Handbook defines it as the “Third Way”:
“practices that create opportunities for learning, as quickly, frequently, cheaply, and as soon as possible. This includes creating learnings from accidents and failures, which are inevitable when we work within complex systems, as well as organizing and designing our systems of work so that we are constantly experimenting and learning, continually making our systems safer.”
Chaos engineering is an example of constant experimentation and learning from real-world operations to make systems safer. DevOps is orientating software testing in a way that facilitates this. There’s something powerful there. Testing isn’t an activity confined to a specific team, feature, or part of an application. It goes wherever the deployment pipeline goes, be it infosec compliance, functional testing, or fault injection. It’s the test’s job to ensure that the deployment pipeline keeps moving and is regression-free. No matter where you are in your DevOps journey — beginner or advanced, just make sure you can write the code and tests to keep up.
The DevOps methodology relates technical and organization practices so it's difficult to simply ascribe a number and say "our organization is a B+ on DevOps!" Things don't work that way. A better approach identifies intended outcomes and measurable characteristics for each outcome. Let'...
2019 DevOps and Automation PredictionsWe recently released our 2019 predictions for cloud computing and are doing the same here for DevOps and automation predictions.2018 was a great year for software, and DevOps falls somewhere on the slope of enlightenment on the Gartner Hype Cy...
Automated deployment pipelines empower teams to ship better software faster. The best pipelines do more than deploy software; they also ensure the entire system is regression-free. Our deployment pipelines must keep up with the shifting realities in software architecture. Applications a...
Agile development used to be front and center in the conversation about software development. Now, DevOps has taken over the conversation. How do agile and DevOps relate? Both ideas began as ways to improve different aspects of software development. Agile embraced the changing nature of...
Much has been written and discussed about SRE (Site Reliability Engineering) from what it is, how to do it, and how it's the same (or different) as DevOps. Google coined the term, defined the profession, and wrote the book on it. Their "Site Reliability Engineering" book covers the idea...
What Does DevOps Mean for Risk Management?Adopting DevOps makes the unfamiliar uneasy in two areas. One, they see an inherently risky choice between speed and quality and second, they are concerned that the quick iterations of DevOps may break compliance rules or introduce security vu...
Containers can help fragment monoliths into logical, easier to use workloads. The AWS Summit New York was held on July 17 and Cloud Academy sponsored my trip to the event. As someone who covers enterprise cloud technologies and services, the recent Amazon Web Services event was an insig...
Many organizations approach digital transformation and DevOps adoption with the belief that simply by selecting and using the right tools, they will achieve higher levels of automation and gain massive efficiencies as a result. While DevOps adoption does require new tools and processes,...
Ongoing threats of data breaches and cyber attacks remain top of mind for every team responsible for securing cloud workloads and applications, especially with the challenge of managing secrets including passwords, tokens, API keys, certificates, and more. Complexity is especially notab...
Enterprises are leveraging a variety of open source products including operating systems, code libraries, software, and applications for a range of business use cases. While using open source comes with cost, flexibility, and speed advantages, it can also pose some unique security chall...
Thanks to DevOps practices, enterprise IT is faster and more agile. Automation in the form of automated builds, tests, and releases plays a significant role in achieving those benefits and creates the foundation for Continuous Integration/Continuous Deployment (CI/CD) pipelines. However...
In the IT world, failure is inevitable. A server might go down, an app may fail, etc. Does your team know what to do during a major outage? Do you know what instances may cause a larger systems failure? Chaos engineering, or chaos as a service, will help you fail responsibly.It almo...