When it comes to delivering software projects, there must be an easier way. At least that’s the idea behind DevOps. But what is DevOps? And why is it important? These are the questions I’ll try to answer in this post, which is based on our webinar Getting Started with DevOps.
Before I answer the question “what is DevOps?” I’d like to start with another question: What problem is DevOps trying to solve?
In large companies, it’s common to structure teams according to their role. You may have teams for development, QA, database management, operations, and security, among others. It’s also common that each team may be working in complete isolation from the others. Unfortunately, these types of workplace silos are common. Developers write code and perhaps some documentation for how to install it. They send it across to the QA team who does a bit of testing and goes back and forth with the developers to fix bugs. At some point, the code will be sent over the proverbial wall where it will be handed off to operations to run it in production.
Based on my experience, most companies allocate a week or so for QA, but leave little time for developers to fix bugs. This silo pattern is extremely inefficient. As developers, the only connection that we have to code running in production is that we have a queue of tickets for production bug fixes. We are so disconnected from our code that fixing bugs is how we know what’s going on with it in production. At the same time, QA spends a lot of time performing tests that that should be automated. Meanwhile, operations has to support code that is written by developers who may not even know what the production environment looks like. And this makes it tough for operations folks to do their jobs well.
So what is DevOps?
The good news is that there’s a better way: DevOps. If you search for “what is DevOps?” you’re going to see a lot of opinions. What you won’t find, however, is any formal generally agreed-upon definition. Why? I suspect that the lack of a formal definition offers the flexibility required for companies to leverage the principles of DevOps without having to adhere to a strict definition.
While strict and clear definitions have their advantages, a downside is that they can create a rigid environment. As a result, if you need to deviate from a definition, it can cause some friction among team members who follow it to the letter. Now, I can’t be sure if DevOps was created with this flexibility by design or if it was just lucky. However, either way, it’s better for all of us.
So, since there’s no agreed-upon definition for what DevOps is, I’ll share mine.
I call DevOps a philosophy of the efficient development, deployment, and operation of the highest quality software possible. I call it a philosophy because it’s really a system of thinking with a primary concern for developing, deploying, and operating high-quality software.
If you consider development, deployment, and operations as a pipeline for your code to flow through, then DevOps is about looking at that pipeline from a holistic perspective. The goal of looking at the pipeline holistically is to find ways to make it more efficient and produce higher quality products.
What is DevOps CAMS?
The logical question is then, how does DevOps help increase the efficiency of the pipeline and increase the quality of your software? It encompasses both of these through some generally agreed-upon principles. By generally agreed-upon, what I mean is that there is at least some level of consensus in the technical community that these are a good thing. They are often abbreviated as CAMS: Culture, Automation, Measurement, and Sharing. Let’s dig into each of these terms.
What is DevOps culture?
DevOps as a name is the combination of the abbreviations “dev” for development and “ops” for operations. Even the name suggests that DevOps is about bringing teams together. DevOps culture is about breaking down the silos that prohibit teams from collaborating. As the level of collaboration increases, so does the potential for improved quality and efficiency.
Breaking down silos means more collaboration between teams. It can also mean that company values may need to change. This sort of change tends to happen from the top down. Some important values are collaboration, quality, efficiency, security, empathy, and transparency. If your company doesn’t value these things, then it’s likely that no amount of technology is going to help.
For example, if quality isn’t a company value, then as an engineer, you likely won’t get the time you need to create unit tests, integration tests, and automated acceptance tests. The same goes for any of the values we mentioned. If it isn’t a company value, then it’s not important to the company. And, it will typically go ignored until it’s time to place blame when something goes wrong.
Automation removes all of the obstacles that prevent us, as engineers, from delivering awesome new features. Once you start automating, however, it can be really easy to just go crazy and try to automate absolutely everything. This would be a mistake. Trying to automate things such as user acceptance testing is usually more effort than it’s worth. If you’re new to automation, a good place to start is with automating a continuous integration process. It should build your software and run the unit test on each commit, then notify the team of either success or failure.
A failed build should result in holding off on any new commits that aren’t intended to fix the build. The goal is to prioritize keeping your software in a working state over doing new work, even if it can be a difficult thing to do.
Once you have automated the continuous integration process, you can start to automate a continuous delivery process. Each successful build artifact produced by the continuous integration server should be deployed to some environment that mirrors production. This is where automated system acceptance tests, as well as any automated non-functional tests should be run. Examples of non-functional tests includes things such as load testing and security audits, among others.
At this point, if all of the automated tests were successful, then any manual tests can be run against the staged environment.
This is different from the old school way of testing, because you’re only having people test versions of your software that have already passed all of the automated tests. Deploying code into production containing bugs that are easy to test for is costly and embarrassing. Having an automated pipeline helps to catch such things early, long before they make it into production. Once manual testing is complete, the build is considered ready to be released. At this point, it’s a business decision to deploy, plan a scheduled release, etc.
Automation is an important part of getting the code into production quickly and efficiently. It’s also important for managing infrastructure. Being able to manage infrastructure in code is really one of my favorite parts of DevOps because you can codify how your infrastructure should be laid out, and I find that incredibly valuable.
Measurement in DevOps
You can’t make informed decisions if you don’t know what’s really going on with your software and infrastructure. That’s why measurement is so important. Earlier on, I said that DevOps is about improving the efficiency of the dev, deployment, and operations pipeline. Therefore, you need to know what is going on, how your pipeline is performing, and how your infrastructure is behaving. There are a lot of monitoring solutions on the market for different layers of the technology stack. Taking the time to research the option that’s best for you will pay off in the form recovering from problems faster. This can help you go from reactive to proactive, with the right level of monitoring.
What is DevOps Sharing?
The concept of sharing in DevOps means that you should be sharing the problems and solutions that you’re working on throughout your company. When you run into challenges, you should talk to the different departments and people in your company. And you should share your solutions with these same groups so that everyone agrees on a shared solution and it prevents other teams for having to re-engineer the wheel. Consider sharing as a facilitator for collaboration and transparency.
What is DevOps and why is it important?
The old school ways for developing software just didn’t scale well. It took too long to deliver too little. This hurts experimentation because it takes too long to do anything.
In contrast, DevOps scales better because you should be able to push a button and release to a production environment or any other environment with little or no downtime. Because you’re performing automated tests starting with the most granular unit test and all the way up to acceptance tests, they serve as gateways that will allow you to prevent easily testable issues from making it all the way through to production.
Because you’re able to get code into production so quickly, you have time to experiment. This could be in the form of A/B testing or in creating a proof of concept to deploy to a staging environment for some exploratory testing. This is what the term “fail fast” refers to.
If you can get a concept to the people who want to actually use it, you’ll have a faster turnaround for implementing their eventual feedback into your application. Even if the responses are negative, at least you will not have invested so much time. Failing fast is one of the things that DevOps will allow you to do.
DevOps at work
I’d like to close this post with a real-world example: Etsy.
Most of the companies that are able to deliver software quickly and to scale well are practicing some form of DevOps. They may not call it DevOps or anything at all because some of the companies have grown organically into what we would now refer to as DevOps. Etsy falls into this category. I really like this as an example because I think most of us as engineers have worked on a platform like the one that Etsy started out with – check out their engineering blog https://codeascraft.com/.
In 2008, Etsy was structured in that siloed way that I talked about earlier. Dev teams wrote code, DBAs handled database stuff, and ops teams deployed. They were deploying twice a week, which is pretty good even by today’s standards. And, they were experiencing a lot of the deployment problems that I think are pretty common. Their early architectural choices were hindering their ability to scale. One example is that all of their business logic was stored in SQL procedures on one large database server. This was becoming a problem because they were generating a lot of site traffic.
Etsy recognized that they could do better and they identified silos as a problem. The way that they chose to solve it was using a designated operations engineer. What that meant was that each development team would have a designated ops person who would take part in all meetings and planning. In this way, developers understand operational concerns, and the designated ops engineer can serve as an advocate to the rest of the ops team for that project. This allowed developers to gain some insight into production, and operation to have insight into development.
Some of the things that they started using to help improve efficiency were to adopt Chef for configuration management. This allowed them to provision servers and probably some infrastructure. They switched using MySQL with master-master replication from Postgres, and the replication was really important because it allowed them to scale their database horizontally. To get business logic out of the database and into code they started using an ORM, which has a lot of value including the fact that now your database changes are versioned because your code is already under version control. They started using Feature Flags to allow features that aren’t complete to still be deployed without breaking anything.
They also removed the roadblocks that tend to slow down new developers. They created a self-service feature that allowed developers to get a dev VM up and running quickly that mirrors the tech stack that they’re using. All of this resulted in continuous integration, continuous delivery process that allowed them to deploy to production over 50 times a day, and they substantially increased their uptime. Five out of the last six months had 100% uptime and one month, I think it was April 2016, had 99.95% uptime. That’s according to pingdom.com. With the right people, processes, and tools, creating something like this is possible.
Etsy isn’t alone in this. Disney, Netflix, Amazon, Adobe, Facebook, IBM, and many others are using these DevOps practices to efficiently deliver high-quality software and their stories are all pretty similar. Moving your company towards DevOps will take effort. However, it pays off in the form of higher quality software, increased deployment frequency, fewer production bugs, etc.
If you’re not sure where to start, I recommend that you start by implementing a monitoring solution. Once you have quantifiable data, you can then start using that data to identify the bottlenecks in your development, deployment, and operations pipeline.
I’ll leave you with one final, albeit, unoriginal thought: Companies such as Etsy, Netflix, Amazon, and others, that are referred to as “tech unicorns” are really just horses with good PR.
This post is based on an excerpt of our webinar, Getting Started with DevOps. You can view the full list of webinars here.
A full introduction on DevOps is also available for you to explore in our Cloud Computing Course Section.
Real-Time Application Monitoring with Amazon Kinesis
Amazon Kinesis is a real-time data streaming service that makes it easy to collect, process, and analyze data so you can get quick insights and react as fast as possible to new information. With Amazon Kinesis you can ingest real-time data such as application logs, website clickstre...
Google Vision vs. Amazon Rekognition: A Vendor-Neutral Comparison
Google Cloud Vision and Amazon Rekognition offer a broad spectrum of solutions, some of which are comparable in terms of functional details, quality, performance, and costs. This post is a fact-based comparative analysis on Google Vision vs. Amazon Rekognition and will focus on the tech...
New on Cloud Academy: CISSP, AWS, Azure, & DevOps Labs, Python for Beginners, and more…
As Hurricane Dorian intensifies, it looks like Floridians across the entire state might have to hunker down for another big one. If you've gone through a hurricane, you know that preparing for one is no joke. You'll need a survival kit with plenty of water, flashlights, batteries, and n...
Amazon Route 53: Why You Should Consider DNS Migration
What Amazon Route 53 brings to the DNS table Amazon Route 53 is a highly available and scalable Domain Name System (DNS) service offered by AWS. It is named by the TCP or UDP port 53, which is where DNS server requests are addressed. Like any DNS service, Route 53 handles domain regist...
How to Unlock Complimentary Access to Cloud Academy
Are you looking to get trained or certified on AWS, Azure, Google Cloud Platform, DevOps, Cloud Security, Python, Java, or another technical skill? Then you'll want to mark your calendars for August 23, 2019. Starting Friday at 12:00 a.m. PDT (3:00 a.m. EDT), Cloud Academy is offering c...
What Exactly Is a Cloud Architect and How Do You Become One?
One of the buzzwords surrounding the cloud that I'm sure you've heard is "Cloud Architect." In this article, I will outline my understanding of what a cloud architect does and I'll analyze the skills and certifications necessary to become one. I will also list some of the types of jobs ...
Boto: Using Python to Automate AWS Services
Boto allows you to write scripts to automate things like starting AWS EC2 instances Boto is a Python package that provides programmatic connectivity to Amazon Web Services (AWS). AWS offers a range of services for dynamically scaling servers including the core compute service, Elastic...
Content Roadmap: AZ-500, ITIL 4, MS-100, Google Cloud Associate Engineer, and More
Last month, Cloud Academy joined forces with QA, the UK’s largest B2B skills provider, and it put us in an excellent position to solve a massive skills gap problem. As a result of this collaboration, you will see our training library grow with additions from QA’s massive catalog of 500+...
DevSecOps: How to Secure DevOps Environments
Security has been a friction point when discussing DevOps. This stems from the assumption that DevOps teams move too fast to handle security concerns. This makes sense if Information Security (InfoSec) is separate from the DevOps value stream, or if development velocity exceeds the band...
Test Your Cloud Knowledge on AWS, Azure, or Google Cloud Platform
Cloud skills are in demand | In today's digital era, employers are constantly seeking skilled professionals with working knowledge of AWS, Azure, and Google Cloud Platform. According to the 2019 Trends in Cloud Transformation report by 451 Research: Business and IT transformations re...
Disadvantages of Cloud Computing
If you want to deliver digital services of any kind, you’ll need to estimate all types of resources, not the least of which are CPU, memory, storage, and network connectivity. Which resources you choose for your delivery — cloud-based or local — is up to you. But you’ll definitely want...
Google Cloud vs AWS: A Comparison (or can they be compared?)
The "Google Cloud vs AWS" argument used to be a common discussion among our members, but is this still really a thing? You may already know that there are three major players in the public cloud platforms arena: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)...