5 Best DevOps Tools You Need to Know
Here Are 5 Best DevOps Tools Designed to Streamline Your Engineering PipelineAs the DevOps industry continues to grow, there are a variety of too...Learn More
When it comes to delivering software projects, there must be an easier way. At least that’s the idea behind DevOps. But what is DevOps? And why is it important? These are the questions I’ll try to answer in this post, which is based on our webinar Getting Started with DevOps.
Before I answer the question “what is DevOps?” I’d like to start with another question: What problem is DevOps trying to solve?
In large companies, it’s common to structure teams according to their role. You may have teams for development, QA, database management, operations, and security, among others. It’s also common that each team may be working in complete isolation from the others. Unfortunately, these types of workplace silos are common. Developers write code and perhaps some documentation for how to install it. They send it across to the QA team who does a bit of testing and goes back and forth with the developers to fix bugs. At some point, the code will be sent over the proverbial wall where it will be handed off to operations to run it in production.
Based on my experience, most companies allocate a week or so for QA, but leave little time for developers to fix bugs. This silo pattern is extremely inefficient. As developers, the only connection that we have to code running in production is that we have a queue of tickets for production bug fixes. We are so disconnected from our code that fixing bugs is how we know what’s going on with it in production. At the same time, QA spends a lot of time performing tests that that should be automated. Meanwhile, operations has to support code that is written by developers who may not even know what the production environment looks like. And this makes it tough for operations folks to do their jobs well.
The good news is that there’s a better way: DevOps. If you search for “what is DevOps?” you’re going to see a lot of opinions. What you won’t find, however, is any formal generally agreed-upon definition. Why? I suspect that the lack of a formal definition offers the flexibility required for companies to leverage the principles of DevOps without having to adhere to a strict definition.
While strict and clear definitions have their advantages, a downside is that they can create a rigid environment. As a result, if you need to deviate from a definition, it can cause some friction among team members who follow it to the letter. Now, I can’t be sure if DevOps was created with this flexibility by design or if it was just lucky. However, either way, it’s better for all of us.
So, since there’s no agreed-upon definition for what DevOps is, I’ll share mine.
I call DevOps a philosophy of the efficient development, deployment, and operation of the highest quality software possible. I call it a philosophy because it’s really a system of thinking with a primary concern for developing, deploying, and operating high-quality software.
If you consider development, deployment, and operations as a pipeline for your code to flow through, then DevOps is about looking at that pipeline from a holistic perspective. The goal of looking at the pipeline holistically is to find ways to make it more efficient and produce higher quality products.
The logical question is then, how does DevOps help increase the efficiency of the pipeline and increase the quality of your software? It encompasses both of these through some generally agreed-upon principles. By generally agreed-upon, what I mean is that there is at least some level of consensus in the technical community that these are a good thing. They are often abbreviated as CAMS: Culture, Automation, Measurement, and Sharing. Let’s dig into each of these terms.
DevOps as a name is the combination of the abbreviations “dev” for development and “ops” for operations. Even the name suggests that DevOps is about bringing teams together. DevOps culture is about breaking down the silos that prohibit teams from collaborating. As the level of collaboration increases, so does the potential for improved quality and efficiency.
Breaking down silos means more collaboration between teams. It can also mean that company values may need to change. This sort of change tends to happen from the top down. Some important values are collaboration, quality, efficiency, security, empathy, and transparency. If your company doesn’t value these things, then it’s likely that no amount of technology is going to help.
For example, if quality isn’t a company value, then as an engineer, you likely won’t get the time you need to create unit tests, integration tests, and automated acceptance tests. The same goes for any of the values we mentioned. If it isn’t a company value, then it’s not important to the company. And, it will typically go ignored until it’s time to place blame when something goes wrong.
Automation removes all of the obstacles that prevent us, as engineers, from delivering awesome new features. Once you start automating, however, it can be really easy to just go crazy and try to automate absolutely everything. This would be a mistake. Trying to automate things such as user acceptance testing is usually more effort than it’s worth. If you’re new to automation, a good place to start is with automating a continuous integration process. It should build your software and run the unit test on each commit, then notify the team of either success or failure.
A failed build should result in holding off on any new commits that aren’t intended to fix the build. The goal is to prioritize keeping your software in a working state over doing new work, even if it can be a difficult thing to do.
Once you have automated the continuous integration process, you can start to automate a continuous delivery process. Each successful build artifact produced by the continuous integration server should be deployed to some environment that mirrors production. This is where automated system acceptance tests, as well as any automated non-functional tests should be run. Examples of non-functional tests includes things such as load testing and security audits, among others.
At this point, if all of the automated tests were successful, then any manual tests can be run against the staged environment.
This is different from the old school way of testing, because you’re only having people test versions of your software that have already passed all of the automated tests. Deploying code into production containing bugs that are easy to test for is costly and embarrassing. Having an automated pipeline helps to catch such things early, long before they make it into production. Once manual testing is complete, the build is considered ready to be released. At this point, it’s a business decision to deploy, plan a scheduled release, etc.
Automation is an important part of getting the code into production quickly and efficiently. It’s also important for managing infrastructure. Being able to manage infrastructure in code is really one of my favorite parts of DevOps because you can codify how your infrastructure should be laid out, and I find that incredibly valuable.
You can’t make informed decisions if you don’t know what’s really going on with your software and infrastructure. That’s why measurement is so important. Earlier on, I said that DevOps is about improving the efficiency of the dev, deployment, and operations pipeline. Therefore, you need to know what is going on, how your pipeline is performing, and how your infrastructure is behaving. There are a lot of monitoring solutions on the market for different layers of the technology stack. Taking the time to research the option that’s best for you will pay off in the form recovering from problems faster. This can help you go from reactive to proactive, with the right level of monitoring.
The concept of sharing in DevOps means that you should be sharing the problems and solutions that you’re working on throughout your company. When you run into challenges, you should talk to the different departments and people in your company. And you should share your solutions with these same groups so that everyone agrees on a shared solution and it prevents other teams for having to re-engineer the wheel. Consider sharing as a facilitator for collaboration and transparency.
The old school ways for developing software just didn’t scale well. It took too long to deliver too little. This hurts experimentation because it takes too long to do anything.
In contrast, DevOps scales better because you should be able to push a button and release to a production environment or any other environment with little or no downtime. Because you’re performing automated tests starting with the most granular unit test and all the way up to acceptance tests, they serve as gateways that will allow you to prevent easily testable issues from making it all the way through to production.
Because you’re able to get code into production so quickly, you have time to experiment. This could be in the form of A/B testing or in creating a proof of concept to deploy to a staging environment for some exploratory testing. This is what the term “fail fast” refers to.
If you can get a concept to the people who want to actually use it, you’ll have a faster turnaround for implementing their eventual feedback into your application. Even if the responses are negative, at least you will not have invested so much time. Failing fast is one of the things that DevOps will allow you to do.
I’d like to close this post with a real-world example: Etsy.
Most of the companies that are able to deliver software quickly and to scale well are practicing some form of DevOps. They may not call it DevOps or anything at all because some of the companies have grown organically into what we would now refer to as DevOps. Etsy falls into this category. I really like this as an example because I think most of us as engineers have worked on a platform like the one that Etsy started out with – check out their engineering blog https://codeascraft.com/.
In 2008, Etsy was structured in that siloed way that I talked about earlier. Dev teams wrote code, DBAs handled database stuff, and ops teams deployed. They were deploying twice a week, which is pretty good even by today’s standards. And, they were experiencing a lot of the deployment problems that I think are pretty common. Their early architectural choices were hindering their ability to scale. One example is that all of their business logic was stored in SQL procedures on one large database server. This was becoming a problem because they were generating a lot of site traffic.
Etsy recognized that they could do better and they identified silos as a problem. The way that they chose to solve it was using a designated operations engineer. What that meant was that each development team would have a designated ops person who would take part in all meetings and planning. In this way, developers understand operational concerns, and the designated ops engineer can serve as an advocate to the rest of the ops team for that project. This allowed developers to gain some insight into production, and operation to have insight into development.
Some of the things that they started using to help improve efficiency were to adopt Chef for configuration management. This allowed them to provision servers and probably some infrastructure. They switched using MySQL with master-master replication from Postgres, and the replication was really important because it allowed them to scale their database horizontally. To get business logic out of the database and into code they started using an ORM, which has a lot of value including the fact that now your database changes are versioned because your code is already under version control. They started using Feature Flags to allow features that aren’t complete to still be deployed without breaking anything.
They also removed the roadblocks that tend to slow down new developers. They created a self-service feature that allowed developers to get a dev VM up and running quickly that mirrors the tech stack that they’re using. All of this resulted in continuous integration, continuous delivery process that allowed them to deploy to production over 50 times a day, and they substantially increased their uptime. Five out of the last six months had 100% uptime and one month, I think it was April 2016, had 99.95% uptime. That’s according to pingdom.com. With the right people, processes, and tools, creating something like this is possible.
Etsy isn’t alone in this. Disney, Netflix, Amazon, Adobe, Facebook, IBM, and many others are using these DevOps practices to efficiently deliver high-quality software and their stories are all pretty similar. Moving your company towards DevOps will take effort. However, it pays off in the form of higher quality software, increased deployment frequency, fewer production bugs, etc.
If you’re not sure where to start, I recommend that you start by implementing a monitoring solution. Once you have quantifiable data, you can then start using that data to identify the bottlenecks in your development, deployment, and operations pipeline.
I’ll leave you with one final, albeit, unoriginal thought: Companies such as Etsy, Netflix, Amazon, and others, that are referred to as “tech unicorns” are really just horses with good PR.
This post is based on an excerpt of our webinar, Getting Started with DevOps. You can view the full list of webinars here.
A full introduction on DevOps is also available for you to explore in our Cloud Computing Course Section.
AWS's WaitCondition can be used with CloudFormation templates to ensure required resources are running.As you may already be aware, AWS CloudFormation is used for infrastructure automation by allowing you to write JSON templates to automatically install, configure, and bootstrap your ...
As companies increasingly shift workloads to the public cloud, cloud computing has moved from a nice-to-have to a core competency in the enterprise. This shift requires a new set of skills to design, deploy, and manage applications in the cloud.As the market leader and most mature p...
The announcements at re:Invent just keep on coming! Let’s look at what benefits these two new EC2 instance types offer and how these two new instances could be of benefit to you. If you're not too familiar with Amazon EC2, you might want to familiarize yourself by creating your first Am...
Google Cloud Platform (GCP) has evolved from being a niche player to a serious competitor to Amazon Web Services and Microsoft Azure. In 2018, research firm Gartner placed Google in the Leaders quadrant in its Magic Quadrant for Cloud Infrastructure as a Service for the first time. In t...
In order to understand AWS VPC egress filtering methods, you first need to understand that security on AWS is governed by a shared responsibility model where both vendor and subscriber have various operational responsibilities. AWS assumes responsibility for the underlying infrastructur...
Is it possible to create an S3 FTP file backup/transfer solution, minimizing associated file storage and capacity planning administration headache?FTP (File Transfer Protocol) is a fast and convenient way to transfer large files over the Internet. You might, at some point, have conf...
Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs).Microservices have become increasingly popular over the past few years. The modular architectural style,...
There are many use cases for tags, but what are the best practices for tagging AWS resources? In order for your organization to effectively manage resources (and your monthly AWS bill), you need to implement and adopt a thoughtful tagging strategy that makes sense for your business. The...
Amazon S3 is the most common storage options for many organizations, being object storage it is used for a wide variety of data types, from the smallest objects to huge datasets. All in all, Amazon S3 is a great service to store a wide scope of data types in a highly available and resil...
One of the main promises of cloud computing is access to nearly endless capacity. However, it doesn’t come cheap. With the introduction of Spot Instances for Amazon Web Services’ Elastic Compute Cloud (AWS EC2) in 2009, spot instances have been a way for major cloud providers to sell sp...
A Comparison of Machine Learning Services on AWS, Azure, and Google CloudArtificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to beli...
The AWS Command Line Interface (CLI) is for managing your AWS services from a terminal session on your own client, allowing you to control and configure multiple AWS services.So you’ve been using AWS for awhile and finally feel comfortable clicking your way through all the services....