When it comes to delivering software projects, there must be an easier way. At least that’s the idea behind DevOps. But what is DevOps? And why is it important? These are the questions I’ll try to answer in this post, which is based on our webinar Getting Started with DevOps.
Before I answer the question “what is DevOps?” I’d like to start with another question: What problem is DevOps trying to solve?
In large companies, it’s common to structure teams according to their role. You may have teams for development, QA, database management, operations, and security, among others. Often, each team works in near-complete isolation from the others. Unfortunately, these workplace silos are widespread. Developers write code and perhaps some documentation on how to install it. They pass it to the QA team, who does some testing and goes back and forth with the developers to fix bugs. Eventually, the code is thrown over the proverbial wall and handed off to operations to run in production.
Based on my experience, most companies allocate a week or so for QA but leave little time for developers to fix bugs. This silo pattern is extremely inefficient. As developers, our only connection to code running in production is a queue of tickets for production bug fixes. We are so disconnected from our code that bug reports are how we find out what’s happening with it in production. At the same time, QA spends a lot of time performing tests that should be automated. Meanwhile, operations has to support code written by developers who may not even know what the production environment looks like, which makes it tough for operations folks to do their jobs well.
So what is DevOps?
The good news is that there’s a better way: DevOps. If you search for “what is DevOps?” you’re going to see a lot of opinions. What you won’t find, however, is a formal, generally agreed-upon definition. Why? I suspect that the lack of a formal definition gives companies the flexibility to apply the principles of DevOps without having to adhere to a strict definition.
While strict and clear definitions have their advantages, a downside is that they can create a rigid environment. As a result, if you need to deviate from a definition, it can cause some friction among team members who follow it to the letter. Now, I can’t be sure if DevOps was created with this flexibility by design or if it was just lucky. However, either way, it’s better for all of us.
So, since there’s no agreed-upon definition for what DevOps is, I’ll share mine.
I define DevOps as a philosophy of efficiently developing, deploying, and operating the highest-quality software possible. I call it a philosophy because it’s really a system of thinking whose primary concern is developing, deploying, and operating high-quality software.
If you consider development, deployment, and operations as a pipeline for your code to flow through, then DevOps is about looking at that pipeline from a holistic perspective. The goal of looking at the pipeline holistically is to find ways to make it more efficient and produce higher quality products.
What is DevOps CAMS?
The logical question is then, how does DevOps help increase the efficiency of the pipeline and increase the quality of your software? It encompasses both of these through some generally agreed-upon principles. By generally agreed-upon, what I mean is that there is at least some level of consensus in the technical community that these are a good thing. They are often abbreviated as CAMS: Culture, Automation, Measurement, and Sharing. Let’s dig into each of these terms.
What is DevOps culture?
DevOps as a name is the combination of the abbreviations “dev” for development and “ops” for operations. Even the name suggests that DevOps is about bringing teams together. DevOps culture is about breaking down the silos that prohibit teams from collaborating. As the level of collaboration increases, so does the potential for improved quality and efficiency.
Breaking down silos means more collaboration between teams. It can also mean that company values may need to change. This sort of change tends to happen from the top down. Some important values are collaboration, quality, efficiency, security, empathy, and transparency. If your company doesn’t value these things, then it’s likely that no amount of technology is going to help.
For example, if quality isn’t a company value, then as an engineer, you likely won’t get the time you need to create unit tests, integration tests, and automated acceptance tests. The same goes for any of the values we mentioned. If it isn’t a company value, then it’s not important to the company. And, it will typically go ignored until it’s time to place blame when something goes wrong.
What is DevOps automation?
Automation removes many of the obstacles that keep us, as engineers, from delivering awesome new features. Once you start automating, however, it’s easy to go overboard and try to automate absolutely everything. That would be a mistake: automating something like user acceptance testing is usually more effort than it’s worth. If you’re new to automation, a good place to start is a continuous integration process. It should build your software and run the unit tests on each commit, then notify the team of success or failure.
A failed build should result in holding off on any new commits that aren’t intended to fix the build. The goal is to prioritize keeping your software in a working state over doing new work, even if it can be a difficult thing to do.
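To make the continuous integration loop and the “hold off on new commits” rule concrete, here is a minimal Python sketch. The `Commit` and `CIResult` types, the `fixes_build` marker, and the `build_and_test` and `notify` callbacks are all hypothetical stand-ins; a real CI server would run your actual build and unit tests and send notifications through email or chat.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Commit:
    sha: str
    message: str
    fixes_build: bool = False  # hypothetical marker: commit is meant to fix a red build


@dataclass
class CIResult:
    sha: str
    accepted: bool  # False if held off because the build was already broken
    passed: bool


def run_ci(
    commits: List[Commit],
    build_and_test: Callable[[Commit], bool],
    notify: Callable[[str], None],
) -> List[CIResult]:
    """Build and unit-test each commit in order and notify the team.

    While the build is red, commits not marked as build fixes are held off
    instead of being piled on top of broken code.
    """
    build_is_green = True
    results: List[CIResult] = []
    for commit in commits:
        if not build_is_green and not commit.fixes_build:
            notify(f"{commit.sha}: held off, build is broken")
            results.append(CIResult(commit.sha, accepted=False, passed=False))
            continue
        passed = build_and_test(commit)
        build_is_green = passed
        notify(f"{commit.sha}: {'success' if passed else 'FAILURE'}")
        results.append(CIResult(commit.sha, accepted=True, passed=passed))
    return results


if __name__ == "__main__":
    history = [
        Commit("a1", "add search"),
        Commit("b2", "refactor cart"),  # pretend this one breaks a unit test
        Commit("c3", "add wishlist"),   # held off while the build is red
        Commit("d4", "fix cart", fixes_build=True),
    ]
    outcomes = {"a1": True, "b2": False, "d4": True}  # stand-in for a real build
    run_ci(history, lambda c: outcomes[c.sha], print)
```

The design choice worth noting is that a red build changes what the pipeline accepts, which is how the “keep the software working first” priority gets enforced rather than just encouraged.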
Once you have automated the continuous integration process, you can start to automate a continuous delivery process. Each successful build artifact produced by the continuous integration server should be deployed to an environment that mirrors production. This is where automated system acceptance tests, as well as any automated non-functional tests, should be run. Examples of non-functional tests include load testing and security audits, among others.
At this point, if all of the automated tests were successful, then any manual tests can be run against the staging environment.
This is different from the old school way of testing, because you’re only having people test versions of your software that have already passed all of the automated tests. Deploying code into production containing bugs that are easy to test for is costly and embarrassing. Having an automated pipeline helps to catch such things early, long before they make it into production. Once manual testing is complete, the build is considered ready to be released. At this point, it’s a business decision to deploy, plan a scheduled release, etc.
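The ordering described above (deploy to a production-like environment, run automated gates, and only then hand the build to people) can be sketched in a few lines of Python. This is an illustrative sketch, not a real CD tool: `run_delivery_pipeline`, the gate names, and the lambda checks are hypothetical placeholders for your own deployment step and test suites.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]  # takes an artifact id, returns pass/fail


def run_delivery_pipeline(
    artifact: str,
    deploy_to_staging: Callable[[str], None],
    automated_gates: List[Tuple[str, Check]],
    manual_gates: List[Tuple[str, Check]],
) -> str:
    """Deploy a CI artifact to staging, run automated gates first, then manual
    ones, stopping at the first failure."""
    deploy_to_staging(artifact)
    # Automated gates come first, so people only test builds that already pass them.
    for name, check in automated_gates + manual_gates:
        if not check(artifact):
            return f"{artifact}: rejected at '{name}'"
    # From here, whether and when to deploy is a business decision.
    return f"{artifact}: ready to release"


if __name__ == "__main__":
    automated = [
        ("system acceptance tests", lambda a: True),
        ("load test", lambda a: True),
        ("security audit", lambda a: a != "build-42"),  # pretend build-42 fails the audit
    ]
    manual = [("exploratory QA", lambda a: True)]
    print(run_delivery_pipeline("build-41", print, automated, manual))
    print(run_delivery_pipeline("build-42", print, automated, manual))
```

Because the loop stops at the first failing gate, a build that fails the security audit never reaches the manual testers, which is exactly the time saving over the old way of testing.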
Automation is an important part of getting the code into production quickly and efficiently. It’s also important for managing infrastructure. Being able to manage infrastructure in code is really one of my favorite parts of DevOps because you can codify how your infrastructure should be laid out, and I find that incredibly valuable.
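To illustrate what “codifying how your infrastructure should be laid out” means, here is a toy Python sketch of the declarative idea behind configuration-management and provisioning tools: you describe the desired state in code, compare it with what is actually running, and compute the changes needed to converge. The server names, attributes, and the `plan` function are hypothetical; real tools such as Chef have their own formats and far more capabilities.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, object]]

# Hypothetical desired layout, kept in version control alongside the app code.
DESIRED: Resources = {
    "web-1": {"role": "web", "size": "medium"},
    "web-2": {"role": "web", "size": "medium"},
    "db-1": {"role": "database", "size": "large"},
}


def plan(desired: Resources, actual: Resources) -> List[str]:
    """List the create/update/destroy steps needed to make `actual` match `desired`."""
    actions: List[str] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(f"create {name} {desired[name]}")
        elif actual[name] != desired[name]:
            actions.append(f"update {name} {actual[name]} -> {desired[name]}")
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"destroy {name}")
    return actions


if __name__ == "__main__":
    running: Resources = {
        "web-1": {"role": "web", "size": "small"},  # drifted from the codified size
        "db-1": {"role": "database", "size": "large"},
        "old-1": {"role": "web", "size": "small"},  # no longer in the desired layout
    }
    for step in plan(DESIRED, running):
        print(step)
```

Since the desired layout is just code, it can be reviewed, versioned, and rolled back like any other change.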
Measurement in DevOps
You can’t make informed decisions if you don’t know what’s really going on with your software and infrastructure. That’s why measurement is so important. Earlier on, I said that DevOps is about improving the efficiency of the development, deployment, and operations pipeline. Therefore, you need to know how your pipeline is performing and how your infrastructure is behaving. There are a lot of monitoring solutions on the market for different layers of the technology stack. Taking the time to research the option that’s best for you will pay off in the form of recovering from problems faster. With the right level of monitoring, you can move from reactive to proactive.
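As a small example of turning monitoring data into numbers you can act on, the Python sketch below computes two commonly tracked pipeline metrics: deployment frequency and mean time to recovery. The choice of these two metrics, the function names, and the sample timestamps are my assumptions for illustration; in practice the raw data would come from your CI/CD server and your monitoring or incident tooling.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def deployments_per_day(deploy_times: List[datetime], window_days: int) -> float:
    """Deployment frequency over a reporting window."""
    return len(deploy_times) / window_days


def mean_time_to_recovery(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average time from detecting a problem to recovering from it."""
    if not incidents:
        return timedelta(0)
    total = sum((recovered - detected for detected, recovered in incidents), timedelta(0))
    return total / len(incidents)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)  # made-up sample data
    deploys = [t0 + timedelta(hours=6 * i) for i in range(28)]  # 28 deploys within 7 days
    incidents = [(t0, t0 + timedelta(minutes=30)), (t0, t0 + timedelta(minutes=90))]
    print(deployments_per_day(deploys, 7))   # 4.0
    print(mean_time_to_recovery(incidents))  # 1:00:00
```

Tracking how these numbers change over time is what lets you tell whether a process change actually made the pipeline faster or recovery quicker.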
What is DevOps Sharing?
The concept of sharing in DevOps means that you should share the problems and solutions you’re working on throughout your company. When you run into challenges, talk to the different departments and people in your company. Then share your solutions with those same groups, so that everyone agrees on a common solution and other teams don’t have to reinvent the wheel. Think of sharing as a facilitator for collaboration and transparency.
What is DevOps and why is it important?
The old school ways for developing software just didn’t scale well. It took too long to deliver too little. This hurts experimentation because it takes too long to do anything.
In contrast, DevOps scales better because you should be able to push a button and release to production, or any other environment, with little or no downtime. Because you’re running automated tests from the most granular unit tests all the way up to acceptance tests, those tests act as gates that keep easily testable issues from making it through to production.
Because you’re able to get code into production so quickly, you have time to experiment. This could be in the form of A/B testing or in creating a proof of concept to deploy to a staging environment for some exploratory testing. This is what the term “fail fast” refers to.
If you can get a concept to the people who want to actually use it, you’ll have a faster turnaround for implementing their eventual feedback into your application. Even if the responses are negative, at least you will not have invested so much time. Failing fast is one of the things that DevOps will allow you to do.
DevOps at work
I’d like to close this post with a real-world example: Etsy.
Most of the companies that are able to deliver software quickly and to scale well are practicing some form of DevOps. They may not call it DevOps or anything at all because some of the companies have grown organically into what we would now refer to as DevOps. Etsy falls into this category. I really like this as an example because I think most of us as engineers have worked on a platform like the one that Etsy started out with – check out their engineering blog https://codeascraft.com/.
In 2008, Etsy was structured in that siloed way that I talked about earlier. Dev teams wrote code, DBAs handled database stuff, and ops teams deployed. They were deploying twice a week, which is pretty good even by today’s standards. And, they were experiencing a lot of the deployment problems that I think are pretty common. Their early architectural choices were hindering their ability to scale. One example is that all of their business logic was stored in SQL procedures on one large database server. This was becoming a problem because they were generating a lot of site traffic.
Etsy recognized that they could do better, and they identified silos as a problem. They chose to solve it with designated operations engineers: each development team had a designated ops person who took part in all of its meetings and planning. In this way, developers came to understand operational concerns, and the designated ops engineer could serve as an advocate for that project to the rest of the ops team. This gave developers some insight into production, and operations some insight into development.
To improve efficiency, they adopted Chef for configuration management, which allowed them to provision servers and probably some other infrastructure. They switched from Postgres to MySQL with master-master replication; the replication was important because it allowed them to scale their database horizontally. To get business logic out of the database and into code, they started using an ORM, which has a lot of value, including the fact that your database changes are now versioned, because your code is already under version control. They also started using feature flags, so that incomplete features could still be deployed without breaking anything.
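Here is a generic Python sketch of how a feature flag lets unfinished code ship without being exposed. This is not Etsy’s implementation; the `Flag` type, the flag names, `is_enabled`, and `checkout_page` are hypothetical. It also shows a percentage rollout that hashes the user id into a stable bucket, the same trick that can split users for the A/B tests mentioned earlier.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Flag:
    enabled: bool
    rollout_percent: int = 100  # share of users who see the feature when enabled


# Hypothetical flags: the unfinished checkout ships in the code but stays dark.
FLAGS: Dict[str, Flag] = {
    "new_checkout": Flag(enabled=False),
    "search_v2": Flag(enabled=True, rollout_percent=10),
}


def is_enabled(flag_name: str, user_id: str) -> bool:
    flag = FLAGS.get(flag_name)
    if flag is None or not flag.enabled:
        return False  # unknown or disabled flags default to "off"
    # Hash flag name + user id so each user lands in a stable bucket from 0 to 99.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag.rollout_percent


def checkout_page(user_id: str) -> str:
    if is_enabled("new_checkout", user_id):
        return "new checkout"  # incomplete code path, unreachable while the flag is off
    return "classic checkout"


if __name__ == "__main__":
    print(checkout_page("alice"))  # classic checkout, because new_checkout is off
    print(sum(is_enabled("search_v2", f"user-{i}") for i in range(1000)))  # roughly 100
```

Defaulting unknown or disabled flags to “off” is the choice that makes it safe to deploy half-finished work: turning a feature on becomes a configuration change rather than a deployment.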
They also removed the roadblocks that tend to slow down new developers. They created a self-service tool that let developers quickly get a dev VM up and running that mirrored the tech stack they were using. All of this resulted in a continuous integration and continuous delivery process that allowed them to deploy to production over 50 times a day, and they substantially increased their uptime. Five of the last six months had 100% uptime, and one month (I think it was April 2016) had 99.95% uptime. That’s according to pingdom.com. With the right people, processes, and tools, creating something like this is possible.
Etsy isn’t alone in this. Disney, Netflix, Amazon, Adobe, Facebook, IBM, and many others are using these DevOps practices to efficiently deliver high-quality software and their stories are all pretty similar. Moving your company towards DevOps will take effort. However, it pays off in the form of higher quality software, increased deployment frequency, fewer production bugs, etc.
If you’re not sure where to start, I recommend that you start by implementing a monitoring solution. Once you have quantifiable data, you can then start using that data to identify the bottlenecks in your development, deployment, and operations pipeline.
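Once you have that data, even a simple analysis can point at the bottleneck. The Python sketch below assumes (hypothetically) that you can export per-run stage durations in minutes from your CI/CD server or monitoring tool; the stage names and numbers are made up. It reports the stage with the highest median duration, using the median so a single unusually slow run doesn’t dominate.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_bottleneck(runs: List[Dict[str, float]]) -> Tuple[str, float]:
    """Return the pipeline stage with the highest median duration (minutes)."""
    stages = {stage for run in runs for stage in run}
    medians = {s: median(run[s] for run in runs if s in run) for s in stages}
    slowest = max(medians, key=medians.get)
    return slowest, medians[slowest]


if __name__ == "__main__":
    runs = [  # made-up timings for three pipeline runs
        {"build": 4, "unit_tests": 6, "deploy_staging": 12, "manual_qa": 240},
        {"build": 5, "unit_tests": 7, "deploy_staging": 10, "manual_qa": 180},
        {"build": 3, "unit_tests": 6, "deploy_staging": 15, "manual_qa": 300},
    ]
    print(find_bottleneck(runs))  # ('manual_qa', 240)
```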
I’ll leave you with one final, albeit unoriginal, thought: Companies such as Etsy, Netflix, Amazon, and others that are referred to as “tech unicorns” are really just horses with good PR.
This post is based on an excerpt of our webinar, Getting Started with DevOps.
A full introduction to DevOps is also available for you to explore in our Cloud Computing Course Section.