When it comes to delivering software projects, there must be an easier way. At least that’s the idea behind DevOps. But what is DevOps? And why is it important? These are the questions I’ll try to answer in this post, which is based on our webinar Getting Started with DevOps.
Before I answer the question “what is DevOps?” I’d like to start with another question: What problem is DevOps trying to solve?
In large companies, it’s common to structure teams according to their role. You may have teams for development, QA, database management, operations, and security, among others. It’s also common that each team may be working in complete isolation from the others. Unfortunately, these types of workplace silos are common. Developers write code and perhaps some documentation for how to install it. They send it across to the QA team who does a bit of testing and goes back and forth with the developers to fix bugs. At some point, the code will be sent over the proverbial wall where it will be handed off to operations to run it in production.
Based on my experience, most companies allocate a week or so for QA but leave little time for developers to fix bugs. This silo pattern is extremely inefficient. As developers, the only connection we have to code running in production is a queue of tickets for production bug fixes. We are so disconnected from our code that fixing bugs is how we find out what’s going on with it in production. At the same time, QA spends a lot of time performing tests that should be automated. Meanwhile, operations has to support code written by developers who may not even know what the production environment looks like. And that makes it tough for operations folks to do their jobs well.
The good news is that there’s a better way: DevOps. If you search for “what is DevOps?” you’re going to see a lot of opinions. What you won’t find, however, is any formal generally agreed-upon definition. Why? I suspect that the lack of a formal definition offers the flexibility required for companies to leverage the principles of DevOps without having to adhere to a strict definition.
While strict and clear definitions have their advantages, a downside is that they can create a rigid environment. As a result, if you need to deviate from a definition, it can cause some friction among team members who follow it to the letter. Now, I can’t be sure if DevOps was created with this flexibility by design or if it was just lucky. However, either way, it’s better for all of us.
So, since there’s no agreed-upon definition for what DevOps is, I’ll share mine.
I call DevOps a philosophy of the efficient development, deployment, and operation of the highest quality software possible. I call it a philosophy because it’s really a system of thinking with a primary concern for developing, deploying, and operating high-quality software.
If you consider development, deployment, and operations as a pipeline for your code to flow through, then DevOps is about looking at that pipeline from a holistic perspective. The goal of looking at the pipeline holistically is to find ways to make it more efficient and produce higher quality products.
The logical question is then, how does DevOps help increase the efficiency of the pipeline and increase the quality of your software? It does both through some generally agreed-upon principles. By generally agreed-upon, I mean that there is at least some level of consensus in the technical community that these are good things. They are often abbreviated as CAMS: Culture, Automation, Measurement, and Sharing. Let’s dig into each of these terms.
DevOps as a name is the combination of the abbreviations “dev” for development and “ops” for operations. Even the name suggests that DevOps is about bringing teams together. DevOps culture is about breaking down the silos that prohibit teams from collaborating. As the level of collaboration increases, so does the potential for improved quality and efficiency.
Breaking down silos means more collaboration between teams. It can also mean that company values may need to change. This sort of change tends to happen from the top down. Some important values are collaboration, quality, efficiency, security, empathy, and transparency. If your company doesn’t value these things, then it’s likely that no amount of technology is going to help.
For example, if quality isn’t a company value, then as an engineer, you likely won’t get the time you need to create unit tests, integration tests, and automated acceptance tests. The same goes for any of the values we mentioned. If it isn’t a company value, then it’s not important to the company. And, it will typically go ignored until it’s time to place blame when something goes wrong.
Automation removes the obstacles that prevent us, as engineers, from delivering awesome new features. Once you start automating, however, it can be really easy to go overboard and try to automate absolutely everything. This would be a mistake. Trying to automate things such as user acceptance testing is usually more effort than it’s worth. If you’re new to automation, a good place to start is with automating a continuous integration process. It should build your software and run the unit tests on each commit, then notify the team of either success or failure.
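That build-test-notify cycle can be sketched in a few lines. This is a minimal illustration, not a real CI tool: the build, test, and notify steps are stubbed out as plain callables, where a real pipeline would shell out to your build tool and test runner.

```python
# Minimal continuous-integration cycle: build, run unit tests, notify.
# The steps are stubbed as callables for illustration; a real pipeline
# would invoke your actual build tool and test runner.

def run_ci(build, run_unit_tests, notify):
    """Run one CI cycle for a commit; return True on a green build."""
    if not build():
        notify("build FAILED")
        return False
    if not run_unit_tests():
        notify("unit tests FAILED")
        return False
    notify("build and tests passed")
    return True

messages = []
ok = run_ci(
    build=lambda: True,            # pretend compilation succeeded
    run_unit_tests=lambda: True,   # pretend the test suite is green
    notify=messages.append,        # stand-in for an email/chat notification
)
print(ok, messages)  # → True ['build and tests passed']
```

The key property is that every commit flows through the same gate, so the team learns about a broken build in minutes rather than at release time.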
A failed build should result in holding off on any new commits that aren’t intended to fix the build. The goal is to prioritize keeping your software in a working state over doing new work, even if it can be a difficult thing to do.
Once you have automated the continuous integration process, you can start to automate a continuous delivery process. Each successful build artifact produced by the continuous integration server should be deployed to some environment that mirrors production. This is where automated system acceptance tests, as well as any automated non-functional tests, should be run. Examples of non-functional tests include things such as load testing and security audits, among others.
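The delivery pipeline described above is essentially a sequence of gates that an artifact must pass in order. A minimal sketch, with illustrative gate names and trivially-passing checks standing in for real test suites:

```python
# Continuous-delivery gate: an artifact is promoted through a sequence
# of checks and held at the first one that fails. Gate names and the
# artifact shape here are illustrative, not from any specific tool.

def promote(artifact, checks):
    """Run each staged check in order; stop at the first failure."""
    for name, check in checks:
        if not check(artifact):
            return f"held at {name}"
    return "ready for release"

artifact = {"version": "1.4.2", "ci_green": True}
checks = [
    ("ci",         lambda a: a["ci_green"]),
    ("acceptance", lambda a: True),   # automated system acceptance tests
    ("load",       lambda a: True),   # non-functional: load testing
    ("security",   lambda a: True),   # non-functional: security audit
]
print(promote(artifact, checks))  # → ready for release
```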
At this point, if all of the automated tests were successful, then any manual tests can be run against the staged environment.
This is different from the old school way of testing, because you’re only having people test versions of your software that have already passed all of the automated tests. Deploying code into production containing bugs that are easy to test for is costly and embarrassing. Having an automated pipeline helps to catch such things early, long before they make it into production. Once manual testing is complete, the build is considered ready to be released. At this point, it’s a business decision to deploy, plan a scheduled release, etc.
Automation is an important part of getting the code into production quickly and efficiently. It’s also important for managing infrastructure. Being able to manage infrastructure in code is really one of my favorite parts of DevOps because you can codify how your infrastructure should be laid out, and I find that incredibly valuable.
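The core idea behind managing infrastructure in code is declare-then-converge: the desired layout lives in version-controlled data, and tooling computes what must change to match it. Here is that idea in miniature, with made-up server names and sizes; real tools like Chef, Terraform, or CloudFormation work on the same principle with far richer resource models.

```python
# Infrastructure-as-code in miniature: the desired layout is declared
# as data, and a reconciler computes what needs to be created. Server
# names and instance sizes are illustrative.

desired = {"web-1": "t3.small", "web-2": "t3.small", "db-1": "m5.large"}
actual  = {"web-1": "t3.small"}

def plan(desired, actual):
    """Return the servers that must be created to match the declaration."""
    return {name: size for name, size in desired.items() if name not in actual}

print(plan(desired, actual))  # → {'web-2': 't3.small', 'db-1': 'm5.large'}
```

Because the declaration is just code, it gets reviewed, versioned, and rolled back exactly like application code.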
You can’t make informed decisions if you don’t know what’s really going on with your software and infrastructure. That’s why measurement is so important. Earlier on, I said that DevOps is about improving the efficiency of the dev, deployment, and operations pipeline. Therefore, you need to know what is going on, how your pipeline is performing, and how your infrastructure is behaving. There are a lot of monitoring solutions on the market for different layers of the technology stack. Taking the time to research the option that’s best for you will pay off in the form of recovering from problems faster. With the right level of monitoring, you can go from reactive to proactive.
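Whatever monitoring product you pick, the underlying mechanic is the same: sample a metric over time and flag when it crosses a threshold, so you react before users notice. A tiny sketch with an illustrative latency metric and threshold:

```python
# Measurement in miniature: scan metric samples and flag every value
# that crosses a threshold. The metric and threshold are illustrative.

def alerts(samples, threshold):
    """Return (index, value) for every sample above the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold]

latency_ms = [120, 135, 980, 140, 1200]   # request latency samples
print(alerts(latency_ms, threshold=500))  # → [(2, 980), (4, 1200)]
```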
The concept of sharing in DevOps means that you should be sharing the problems and solutions that you’re working on throughout your company. When you run into challenges, you should talk to the different departments and people in your company. And you should share your solutions with these same groups so that everyone agrees on a shared solution, which prevents other teams from having to reinvent the wheel. Consider sharing a facilitator for collaboration and transparency.
The old school ways for developing software just didn’t scale well. It took too long to deliver too little. This hurts experimentation because it takes too long to do anything.
In contrast, DevOps scales better because you should be able to push a button and release to a production environment or any other environment with little or no downtime. Because you’re performing automated tests starting with the most granular unit test and all the way up to acceptance tests, they serve as gateways that will allow you to prevent easily testable issues from making it all the way through to production.
Because you’re able to get code into production so quickly, you have time to experiment. This could be in the form of A/B testing or in creating a proof of concept to deploy to a staging environment for some exploratory testing. This is what the term “fail fast” refers to.
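One common way to run the A/B tests mentioned above is to assign each user to a variant deterministically, by hashing their id, so the same user always sees the same experience. A sketch with hypothetical user and experiment names:

```python
# Deterministic A/B bucketing: hash the user id together with the
# experiment name and map it to a variant. Names and the 50/50 split
# are illustrative.
import hashlib

def variant(user_id, experiment, b_share=0.5):
    """Deterministically assign a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # map the hash to [0, 1)
    return "B" if bucket < b_share else "A"

# Same user, same experiment -> same variant every time.
v = variant("user-42", "new-checkout")
print(v, variant("user-42", "new-checkout") == v)  # second value is always True
```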
If you can get a concept to the people who want to actually use it, you’ll have a faster turnaround for implementing their eventual feedback into your application. Even if the responses are negative, at least you will not have invested so much time. Failing fast is one of the things that DevOps will allow you to do.
I’d like to close this post with a real-world example: Etsy.
Most of the companies that are able to deliver software quickly and to scale well are practicing some form of DevOps. They may not call it DevOps or anything at all because some of the companies have grown organically into what we would now refer to as DevOps. Etsy falls into this category. I really like this as an example because I think most of us as engineers have worked on a platform like the one that Etsy started out with – check out their engineering blog https://codeascraft.com/.
In 2008, Etsy was structured in that siloed way that I talked about earlier. Dev teams wrote code, DBAs handled database stuff, and ops teams deployed. They were deploying twice a week, which is pretty good even by today’s standards. And, they were experiencing a lot of the deployment problems that I think are pretty common. Their early architectural choices were hindering their ability to scale. One example is that all of their business logic was stored in SQL procedures on one large database server. This was becoming a problem because they were generating a lot of site traffic.
Etsy recognized that they could do better, and they identified silos as a problem. The way they chose to solve it was with designated operations engineers. Each development team had a designated ops person who took part in all meetings and planning. In this way, developers came to understand operational concerns, and the designated ops engineer could serve as an advocate for that project to the rest of the ops team. This allowed developers to gain some insight into production, and operations to gain insight into development.
To help improve efficiency, they adopted Chef for configuration management, which allowed them to provision servers and other infrastructure from code. They switched from Postgres to MySQL with master-master replication, and the replication was really important because it allowed them to scale their database horizontally. To get business logic out of the database and into code, they started using an ORM, which has a lot of value, including the fact that your database changes become versioned because your code is already under version control. They also started using feature flags to allow features that aren’t complete to be deployed without breaking anything.
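A feature flag, in its simplest form, lets incomplete code ship to production while staying dark until the flag is turned on. The flag names below are illustrative; Etsy’s actual implementation is more sophisticated.

```python
# A minimal feature-flag check: incomplete code paths ship disabled
# and are switched on without a redeploy. Flag names are illustrative.

FLAGS = {"new_search": False, "gift_cards": True}

def is_enabled(flag):
    return FLAGS.get(flag, False)   # unknown flags default to off

def search(query):
    if is_enabled("new_search"):
        return f"new engine results for {query!r}"   # still in development
    return f"legacy results for {query!r}"           # what users see today

print(search("handmade mug"))  # → legacy results for 'handmade mug'
```

Flipping `new_search` to `True` in one place activates the new code path for everyone, and flipping it back is an instant rollback.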
They also removed the roadblocks that tend to slow down new developers. They created a self-service feature that allowed developers to quickly get a dev VM up and running that mirrors their tech stack. All of this resulted in a continuous integration and continuous delivery process that allowed them to deploy to production over 50 times a day, and they substantially increased their uptime. Five of the last six months had 100% uptime, and one month (I think it was April 2016) had 99.95% uptime, according to pingdom.com. With the right people, processes, and tools, creating something like this is possible.
Etsy isn’t alone in this. Disney, Netflix, Amazon, Adobe, Facebook, IBM, and many others are using these DevOps practices to efficiently deliver high-quality software and their stories are all pretty similar. Moving your company towards DevOps will take effort. However, it pays off in the form of higher quality software, increased deployment frequency, fewer production bugs, etc.
If you’re not sure where to start, I recommend that you start by implementing a monitoring solution. Once you have quantifiable data, you can then start using that data to identify the bottlenecks in your development, deployment, and operations pipeline.
I’ll leave you with one final, albeit, unoriginal thought: Companies such as Etsy, Netflix, Amazon, and others, that are referred to as “tech unicorns” are really just horses with good PR.
This post is based on an excerpt of our webinar, Getting Started with DevOps. You can view the full list of webinars here.
A full introduction to DevOps is also available for you to explore in our Cloud Computing Course Section.