When it comes to delivering software projects, there must be an easier way. At least that’s the idea behind DevOps. But what is DevOps? And why is it important? These are the questions I’ll try to answer in this post, which is based on our webinar Getting Started with DevOps.
Before I answer the question “what is DevOps?” I’d like to start with another question: What problem is DevOps trying to solve?
In large companies, it’s common to structure teams according to their role. You may have teams for development, QA, database management, operations, and security, among others. It’s also common for each of these teams to work in complete isolation from the others, creating workplace silos. Developers write code and perhaps some documentation for how to install it. They send it across to the QA team, who does a bit of testing and goes back and forth with the developers to fix bugs. At some point, the code is sent over the proverbial wall and handed off to operations to run in production.
Based on my experience, most companies allocate a week or so for QA, but leave little time for developers to fix bugs. This silo pattern is extremely inefficient. As developers, the only connection we have to code running in production is a queue of tickets for production bug fixes. We’re so disconnected from our code that fixing bugs is the only way we learn what’s happening with it in production. At the same time, QA spends a lot of time performing tests that should be automated. Meanwhile, operations has to support code written by developers who may not even know what the production environment looks like, which makes it tough for operations folks to do their jobs well.
So what is DevOps?
The good news is that there’s a better way: DevOps. If you search for “what is DevOps?” you’re going to see a lot of opinions. What you won’t find, however, is any formal generally agreed-upon definition. Why? I suspect that the lack of a formal definition offers the flexibility required for companies to leverage the principles of DevOps without having to adhere to a strict definition.
While strict and clear definitions have their advantages, a downside is that they can create a rigid environment. As a result, if you need to deviate from a definition, it can cause some friction among team members who follow it to the letter. Now, I can’t be sure if DevOps was created with this flexibility by design or if it was just lucky. However, either way, it’s better for all of us.
So, since there’s no agreed-upon definition for what DevOps is, I’ll share mine.
I call DevOps a philosophy of the efficient development, deployment, and operation of the highest quality software possible. I call it a philosophy because it’s really a system of thinking whose primary concern is developing, deploying, and operating high-quality software.
If you consider development, deployment, and operations as a pipeline for your code to flow through, then DevOps is about looking at that pipeline from a holistic perspective. The goal of looking at the pipeline holistically is to find ways to make it more efficient and produce higher quality products.
What is DevOps CAMS?
The logical next question is: how does DevOps increase the efficiency of the pipeline and the quality of your software? It does both through some generally agreed-upon principles. By generally agreed-upon, I mean that there is at least some level of consensus in the technical community that these are a good thing. They are often abbreviated as CAMS: Culture, Automation, Measurement, and Sharing. Let’s dig into each of these terms.
What is DevOps culture?
DevOps as a name is the combination of the abbreviations “dev” for development and “ops” for operations. Even the name suggests that DevOps is about bringing teams together. DevOps culture is about breaking down the silos that prohibit teams from collaborating. As the level of collaboration increases, so does the potential for improved quality and efficiency.
Breaking down silos means more collaboration between teams. It can also mean that company values may need to change. This sort of change tends to happen from the top down. Some important values are collaboration, quality, efficiency, security, empathy, and transparency. If your company doesn’t value these things, then it’s likely that no amount of technology is going to help.
For example, if quality isn’t a company value, then as an engineer, you likely won’t get the time you need to create unit tests, integration tests, and automated acceptance tests. The same goes for any of the values we mentioned. If it isn’t a company value, then it’s not important to the company. And, it will typically go ignored until it’s time to place blame when something goes wrong.
What is DevOps automation?
Automation removes many of the obstacles that prevent us, as engineers, from delivering awesome new features. Once you start automating, however, it can be really easy to get carried away and try to automate absolutely everything. That would be a mistake. Automating things such as user acceptance testing is usually more effort than it’s worth. If you’re new to automation, a good place to start is with a continuous integration process. It should build your software and run the unit tests on each commit, then notify the team of either success or failure.
A failed build should result in holding off on any new commits that aren’t intended to fix the build. The goal is to prioritize keeping your software in a working state over doing new work, even if it can be a difficult thing to do.
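As an illustration, here’s a minimal Python sketch of that commit-triggered flow. The `run_pipeline` and `notify` functions are hypothetical stand-ins for this post, not any particular CI tool’s API:

```python
def run_pipeline(checks):
    """Run each named check in order; stop at the first failure.

    `checks` is a list of (name, callable) pairs; a check returns True
    on success. Returns a (status, failed_check_name) tuple.
    """
    for name, check in checks:
        if not check():
            return ("failure", name)
    return ("success", None)

def notify(team, result):
    """Stand-in for a real notification hook (chat webhook, email, etc.)."""
    status, failed = result
    if status == "success":
        return f"{team}: build passed"
    return f"{team}: build broken by '{failed}' -- hold new commits"

# A commit whose unit tests fail stops the pipeline and alerts the team.
result = run_pipeline([
    ("compile", lambda: True),
    ("unit tests", lambda: False),
])
message = notify("dev-team", result)
```

A real setup would wire these steps into a CI server, but the principle is the same: every commit triggers the checks, and the whole team hears about the outcome.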
Once you have automated the continuous integration process, you can start to automate a continuous delivery process. Each successful build artifact produced by the continuous integration server should be deployed to some environment that mirrors production. This is where automated system acceptance tests, as well as any automated non-functional tests, should be run. Examples of non-functional tests include things such as load testing and security audits, among others.
At this point, if all of the automated tests were successful, then any manual tests can be run against the staged environment.
This is different from the old school way of testing, because you’re only having people test versions of your software that have already passed all of the automated tests. Deploying code into production containing bugs that are easy to test for is costly and embarrassing. Having an automated pipeline helps to catch such things early, long before they make it into production. Once manual testing is complete, the build is considered ready to be released. At this point, it’s a business decision to deploy, plan a scheduled release, etc.
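To make the gating idea concrete, here’s a hedged Python sketch. The `promote` function and the artifact fields are hypothetical illustrations of the gate order described above, not a real delivery tool:

```python
def promote(artifact):
    """Decide how far a build artifact may advance through the pipeline.

    Gate order mirrors the flow above: automated acceptance tests and
    non-functional tests must pass before the build is offered for
    manual testing. Actually releasing remains a business decision.
    """
    if not artifact["acceptance_tests_passed"]:
        return "rejected: failed acceptance tests"
    if not artifact["nonfunctional_tests_passed"]:
        return "rejected: failed load/security checks"
    return "ready for manual testing on staging"

# Only a build that cleared every automated gate reaches human testers.
good_build = {"acceptance_tests_passed": True, "nonfunctional_tests_passed": True}
verdict = promote(good_build)
```

The point of structuring it this way is that manual testers never waste time on a build that automation could already have rejected.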
Automation is an important part of getting the code into production quickly and efficiently. It’s also important for managing infrastructure. Being able to manage infrastructure in code is really one of my favorite parts of DevOps because you can codify how your infrastructure should be laid out, and I find that incredibly valuable.
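As a toy sketch of what “infrastructure in code” buys you, the example below treats desired infrastructure as plain data and converges toward it idempotently, which is conceptually how tools like Chef or Terraform behave. The server names and the `apply_state` function are invented for illustration:

```python
# Desired state is data: what servers should exist and what they look like.
DESIRED = {
    "web-1": {"size": "small", "role": "web"},
    "db-1": {"size": "large", "role": "database"},
}

def apply_state(current, desired):
    """Return (new_state, actions) needed to converge current -> desired.

    Idempotent: applying the same desired state twice produces no actions
    the second time.
    """
    actions = []
    new_state = dict(current)
    for name, spec in desired.items():
        if current.get(name) != spec:
            verb = "update" if name in current else "create"
            actions.append(f"{verb} {name}")
            new_state[name] = spec
    for name in current:
        if name not in desired:
            actions.append(f"destroy {name}")
            del new_state[name]
    return new_state, actions
```

Because the layout lives in code, it can be reviewed, versioned, and replayed, which is exactly what makes this part of DevOps so valuable.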
Measurement in DevOps
You can’t make informed decisions if you don’t know what’s really going on with your software and infrastructure. That’s why measurement is so important. Earlier on, I said that DevOps is about improving the efficiency of the development, deployment, and operations pipeline. Therefore, you need to know what is going on, how your pipeline is performing, and how your infrastructure is behaving. There are a lot of monitoring solutions on the market for different layers of the technology stack. Taking the time to research the option that’s best for you will pay off in the form of recovering from problems faster. With the right level of monitoring, you can go from reactive to proactive.
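As a tiny illustration of acting on measurement rather than gut feeling, here’s a hypothetical Python monitoring rule. A real system would use a proper monitoring stack, but the shape is the same: collect data first, then let thresholds drive decisions:

```python
def check_error_rate(samples, threshold=0.05):
    """Given recent request outcomes (True = error), decide whether to alert.

    A stand-in for a real monitoring rule: returns ("alert", rate) when
    the observed error rate exceeds the threshold, ("ok", rate) otherwise.
    """
    if not samples:
        return ("ok", 0.0)
    rate = sum(samples) / len(samples)
    return ("alert", rate) if rate > threshold else ("ok", rate)
```

The useful part isn’t the arithmetic; it’s that the decision to page someone is based on measured data instead of a hunch.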
What is DevOps Sharing?
The concept of sharing in DevOps means that you should be sharing the problems and solutions that you’re working on throughout your company. When you run into challenges, talk to the different departments and people in your company. And share your solutions with these same groups so that everyone agrees on a shared solution and other teams are saved from having to reinvent the wheel. Consider sharing a facilitator for collaboration and transparency.
What is DevOps and why is it important?
The old school ways of developing software just didn’t scale well. It took too long to deliver too little, and that discourages experimentation.
In contrast, DevOps scales better: you should be able to push a button and release to production, or any other environment, with little or no downtime. Because you’re running automated tests, from the most granular unit tests all the way up to acceptance tests, these tests serve as gates that prevent easily testable issues from making it all the way through to production.
Because you’re able to get code into production so quickly, you have time to experiment. This could be in the form of A/B testing or in creating a proof of concept to deploy to a staging environment for some exploratory testing. This is what the term “fail fast” refers to.
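As a sketch of how an A/B test might assign users, here’s a small hypothetical Python example. The hashing scheme is an illustrative convention (deterministic, so a user always sees the same variant), not a prescribed implementation:

```python
import hashlib

def ab_bucket(user_id, experiment, split=0.5):
    """Deterministically assign a user to variant 'A' or 'B'.

    Hash the (experiment, user) pair, map it to [0, 1], and compare
    against the split. The same user always lands in the same bucket
    for a given experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "B" if fraction < split else "A"
```

Stable assignment matters: if users flipped between variants on every request, the measurements behind the experiment would be meaningless.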
If you can get a concept to the people who want to actually use it, you’ll have a faster turnaround for implementing their feedback into your application. Even if the responses are negative, at least you won’t have invested much time. Failing fast is one of the things that DevOps allows you to do.
DevOps at work
I’d like to close this post with a real-world example: Etsy.
Most of the companies that are able to deliver software quickly and scale well are practicing some form of DevOps. They may not call it DevOps, or anything at all, because some of them grew organically into what we would now call DevOps. Etsy falls into this category. I really like this example because I think most of us as engineers have worked on a platform like the one Etsy started out with – check out their engineering blog at https://codeascraft.com/.
In 2008, Etsy was structured in that siloed way that I talked about earlier. Dev teams wrote code, DBAs handled database stuff, and ops teams deployed. They were deploying twice a week, which is pretty good even by today’s standards. And, they were experiencing a lot of the deployment problems that I think are pretty common. Their early architectural choices were hindering their ability to scale. One example is that all of their business logic was stored in SQL procedures on one large database server. This was becoming a problem because they were generating a lot of site traffic.
Etsy recognized that they could do better, and they identified silos as a problem. The way they chose to solve it was with designated operations engineers: each development team had a designated ops person who took part in all meetings and planning. In this way, developers came to understand operational concerns, and the designated ops engineer could serve as an advocate for that project to the rest of the ops team. This gave developers insight into production, and operations insight into development.
One of the things they did to improve efficiency was to adopt Chef for configuration management, which allowed them to provision servers and other infrastructure in code. They also switched from Postgres to MySQL with master-master replication, which was important because it allowed them to scale their database horizontally. To get business logic out of the database and into code, they started using an ORM, which has a lot of value, including the fact that database changes become versioned because the code is already under version control. And they started using feature flags so that features that aren’t complete can still be deployed without breaking anything.
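To illustrate the feature-flag idea, here’s a hypothetical Python sketch. The flag names and the `is_enabled` helper are invented for this post, not Etsy’s actual implementation:

```python
# Hypothetical flag table: incomplete features ship "dark" -- the code is
# deployed to production, but the new path stays switched off.
FLAGS = {"new_checkout": False, "search_v2": True}

def is_enabled(flag, flags=FLAGS):
    """Return whether a feature flag is on; unknown flags default to off."""
    return flags.get(flag, False)

def checkout(cart_total):
    """Route to the new checkout flow only when its flag is enabled."""
    if is_enabled("new_checkout"):
        return f"new checkout flow: {cart_total}"
    return f"classic checkout flow: {cart_total}"
```

Because the unfinished code path is guarded by a flag, it can be merged and deployed continuously, then turned on (or back off) without a redeploy.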
They also removed the roadblocks that tend to slow down new developers by creating a self-service feature that let a developer quickly get a dev VM up and running that mirrored their production tech stack. All of this resulted in a continuous integration and continuous delivery process that allowed them to deploy to production over 50 times a day, and they substantially increased their uptime. Five of the previous six months had 100% uptime, and one month, I believe it was April 2016, had 99.95% uptime. That’s according to pingdom.com. With the right people, processes, and tools, creating something like this is possible.
Etsy isn’t alone in this. Disney, Netflix, Amazon, Adobe, Facebook, IBM, and many others are using these DevOps practices to efficiently deliver high-quality software and their stories are all pretty similar. Moving your company towards DevOps will take effort. However, it pays off in the form of higher quality software, increased deployment frequency, fewer production bugs, etc.
If you’re not sure where to start, I recommend that you start by implementing a monitoring solution. Once you have quantifiable data, you can then start using that data to identify the bottlenecks in your development, deployment, and operations pipeline.
I’ll leave you with one final, albeit unoriginal, thought: companies such as Etsy, Netflix, Amazon, and others that are referred to as “tech unicorns” are really just horses with good PR.
This post is based on an excerpt of our webinar, Getting Started with DevOps. You can view the full list of webinars here.
A full introduction to DevOps is also available for you to explore in our Cloud Computing Course Section.