Modern software systems are becoming increasingly complex in order to meet quality, availability, and security demands, and they are changing rapidly to keep up with the needs of end users. With all of these changes, how do you ensure stability, quality, security, and innovation? In this course, we look at how the DevOps philosophy provides a holistic way to look at software development, deployment, and operations, along with some tenets that help improve quality and stability.
By completing this course, you will learn:
- Why automation, culture, and metrics are essential to a successful DevOps project.
- How DevOps can positively impact your business's bottom line.
- Which major companies are successfully using DevOps in their own engineering processes.
You should take this course if you are:
- A newcomer to the DevOps or cloud world.
- Looking to upgrade your skills from a conventional software development career.
This Course Includes
- Expert-guided lectures about DevOps.
- 1 hour of high-definition video.
- Solid foundational knowledge for your explorations into DevOps.
What You'll Learn
|Video Lecture||What You'll Learn|
|What Is DevOps?||In this lecture series, you'll gain a fundamental understanding of DevOps and why it matters.|
|The Business Value of DevOps||Need to justify the business case for DevOps? This is the lecture series for you.|
|Who's Using DevOps?||Find out who's using DevOps in the enterprise - and why their success matters for your own organization.|
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back to our Introduction to DevOps course. I'm Ben Lambert and I'll be your instructor for this lecture. In this lecture, we're going to talk further about how DevOps can improve lead time. By the end of this lecture, you should know what lead time is and how a solid DevOps strategy can improve it.
Over the next few lectures, we're going to look at the same scenario through a few different lenses to get a better understanding of how DevOps solves a business problem. The lens for this lecture is lead time. The problem DevOps addresses, as we've talked about in previous lectures, is the potential instability caused by constant change.
Developers create change. New features, bug fixes, code refactoring, it all introduces potential for instability, which is something that your operations team wants to prevent. So how do you balance the two? By setting up an automated environment that allows you to detect potential issues as fast as possible. Mistakes are going to happen. However, if you catch them early, you can prevent them from causing problems in production. Catching mistakes in production is expensive. Catching mistakes minutes after a developer commits their code or even before they commit their code is not.
In the context of software development and deployment, lead time is the total time that it takes for an idea to go from request to being released.
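To make the definition concrete, here is a minimal sketch that measures lead time as the gap between two timestamps. The function name and the dates are made up for illustration; they aren't part of the APU scenario.

```python
from datetime import datetime, timedelta


def lead_time(requested_at: datetime, released_at: datetime) -> timedelta:
    """Return the elapsed time from request to release."""
    if released_at < requested_at:
        raise ValueError("a feature cannot be released before it is requested")
    return released_at - requested_at


# Hypothetical feature: requested on 1 March, released on 5 April.
example = lead_time(datetime(2024, 3, 1, 9, 0), datetime(2024, 4, 5, 17, 0))
print(f"Lead time: {example.days} days")
```

Tracking this number per feature over time is one simple way to see whether process changes are actually shortening the path from idea to release.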
So let's define our scenario and talk about how DevOps can improve lead time. We're going to review a fictitious company called Acme Products Unlimited, or APU, and how they went from monthly code deployments to deploying dozens of times per day. APU is the leading manufacturer of pest management products. They had some quality issues a few years back, but those have since been resolved. They've created an online marketplace, and it turns out it's a huge hit. Traffic continues to increase and they have problems keeping their site up and running. They have a backlog of features that they want to get developed and released to the users, including a forum for users to chat about what works for them. They also want to release their latest annoying bird detection and capture devices.
However, they're concerned about pushing any changes because historically deployments have brought down the site for hours at a time. Their current process is to develop for a couple of weeks, and after resolving all of the merge conflicts, they hand over the code to QA for testing. The QA team sends back a list of defects after a few days of testing for developers to resolve. The developers stop what they're doing, resolve the bugs, and send the code back to QA. This back and forth happens a couple of times, and when QA is happy, they sign off on the changes. Once QA has signed off, the code is sent over to the operations team. Operations reviews the change document, and if they don't have any questions, they schedule a time, usually on a Saturday, to have the developers join them to deploy.
Deployment Saturday comes around and the developers and operations show up early, coffee in hand, to start in on the deployment. The operations team pulls one of the web servers out of the load balancer, connects to it, and issues a git pull command in the application directory. The latest code begins downloading, and in a few seconds it's complete. They open up a browser, enter the IP address of the newly updated web server, and press Enter. The browser thinks about it for a second and then usually returns a 500 error with no additional info because debug mode is turned off. So depending on the engineer, they either check the logs on that server or they enable debug mode so they can see the error remotely. The problem at APU often comes down to some new dependency being introduced or some database change that a developer forgot to mention. The operations team and the development team gather around the screen for a little while and they inevitably resolve the issue and get the rest of the servers up and running. Or, they roll back until they can spend more time reviewing what went wrong so they can try again another day.
Typically while development and QA are doing their back and forth, operations is working on keeping the site up and running as best they can. Lately, this has meant configuring new virtual machines to act as web servers. So hopefully, for the sake of your sanity, none of this sounds familiar to you. We can see right away that APU lacks the culture required for a successful DevOps plan. They have silos working on phases of a project and then they hand it off to another silo without much collaboration. They also lack any sort of automation. They're performing manual source-based deployments and tying up a lot of time and resources.
Infrequent deployments are usually a symptom of an inefficient deployment process. Oftentimes companies feel that deployments are intrinsically difficult, so they try and do them as infrequently as possible. Now, this may seem logical, but it's actually the problem. The way to improve deployments is to perform them as often as possible. If you do deployments once per month, then it may not seem worth the investment in time and resources to automate the process. If there's an essential action that is problematic, then the answer is to figure out how to perform that action so often that it is no longer a problem.
Let's look back at our scenario company one year later, after the new CTO has adopted a DevOps philosophy. APU had a big culture change starting a year ago when the new CTO came in and pushed his DevOps philosophy. His first item was to start measuring everything. He established a baseline for lead time, uptime, how often deployments were being done, MTTD, and MTTR, just to name a few.
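As a rough sketch of what a baseline like this could look like in code, the example below computes a few of these metrics from incident and deployment records. It reads MTTD as mean time to detect and MTTR as mean time to resolve (measured here from detection), which is an assumption, since the lecture doesn't expand the acronyms. The `Incident` record, the function names, and all of the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class Incident:
    started_at: datetime   # when the fault began
    detected_at: datetime  # when someone (or something) noticed it
    resolved_at: datetime  # when service was restored


def _mean_delta(deltas: List[timedelta]) -> timedelta:
    """Average a list of durations."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mean_time_to_detect(incidents: List[Incident]) -> timedelta:
    return _mean_delta([i.detected_at - i.started_at for i in incidents])


def mean_time_to_resolve(incidents: List[Incident]) -> timedelta:
    # Assumption: the clock starts at detection, not at the start of the fault.
    return _mean_delta([i.resolved_at - i.detected_at for i in incidents])


def deployments_per_day(deploy_count: int, period_days: int) -> float:
    return deploy_count / period_days


# Hypothetical records for a company that deploys about once a month.
incidents = [
    Incident(datetime(2024, 1, 6, 10, 0), datetime(2024, 1, 6, 10, 30), datetime(2024, 1, 6, 12, 30)),
    Incident(datetime(2024, 2, 3, 14, 0), datetime(2024, 2, 3, 14, 10), datetime(2024, 2, 3, 14, 40)),
]
baseline = {
    "mttd": mean_time_to_detect(incidents),
    "mttr": mean_time_to_resolve(incidents),
    "deploys_per_day": deployments_per_day(deploy_count=1, period_days=30),
}
print(baseline)
```

The point of capturing a baseline first is that later measurements have something to be compared against.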
Once he had his metrics, he started forming autonomous cross-functional teams composed of developers, QA, security, and operations engineers. No longer would there be a hand-off model. The teams each worked on their own product from start to finish. If you created it, you ran it in production. He had his engineers implement a fully automated continuous integration and continuous delivery pipeline.
The continuous integration was implemented with Jenkins. The developers would check their code into a Git repository and Jenkins would grab those changes. It would build the project and create an artifact, then it would run all of the tests to make sure that the build was successful, and mark it as successful only if everything passed. The QA and security members worked with the entire team to devise a complete set of tests, including things like load testing and security audits.
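The rule that a build is marked successful only if everything passes can be sketched in a few lines. This is plain Python for illustration, not Jenkins configuration or any Jenkins API; the stage names and the `run_pipeline` function are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    The build counts as successful only if every stage passes.
    """
    log: List[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            return False, log  # later stages never run
    return True, log


# Hypothetical stages; real ones would invoke a build tool and test runners.
stages: List[Stage] = [
    ("build artifact", lambda: True),
    ("unit tests", lambda: True),
    ("load tests", lambda: True),
    ("security audit", lambda: True),
]
success, log = run_pipeline(stages)
print("build successful" if success else "build failed", log)
```

Stopping at the first failing stage keeps the feedback loop short, so a developer hears about a broken build minutes after the commit rather than days later from QA.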
The developers began using feature toggles to ensure that features that weren't ready for release wouldn't impact the running code in production but could still be deployed, thus allowing the main branch of the repo to serve as the canonical source. And deployments were happening several times a day, usually with no impact on stability. The closer collaboration between developers and operations promoted a more efficient code base, reducing the total number of servers required. And the culture of automation inspired a complete push to the cloud and the adoption of an elastic infrastructure.
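A feature toggle can be as simple as a lookup that guards the unfinished code path. The sketch below is one minimal way to do it; the flag names and functions are hypothetical, and a real system would more likely load its flags from configuration or a flag service than from a hard-coded dictionary.

```python
from typing import List, Mapping

# Hypothetical flags: the forum code is deployed but switched off.
FEATURE_FLAGS = {
    "user_forum": False,
    "bird_detector_listing": True,
}


def is_enabled(flag: str, flags: Mapping[str, bool]) -> bool:
    """Unknown flags default to off, so unfinished work stays hidden."""
    return flags.get(flag, False)


def navigation_links(flags: Mapping[str, bool]) -> List[str]:
    links = ["Home", "Products"]
    if is_enabled("user_forum", flags):
        links.append("Forum")  # ships in the code base, invisible until toggled on
    return links


print(navigation_links(FEATURE_FLAGS))
```

Because the forum code is present but dormant, it can be merged into the main branch and deployed with everything else, then switched on later without another deployment.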
When the load on the web servers gets too high, auto scaling adds a new server. The new server is an immutable, pre-baked image made with Spinnaker. And when the server load dies down, the newly created server is terminated. The team also switched to a blue-green deployment model that has allowed them to deploy multiple times per day and roll back if they need to. Overall, APU is now able to deploy multiple times per day with less downtime and better lead time.
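The blue-green idea is easier to see in a small model: two environments, only one of which receives traffic, with the new version always going to the idle one. The class below is an illustrative sketch, not how Spinnaker or any particular load balancer implements it; the class name, methods, and version strings are all hypothetical.

```python
from typing import Callable, Dict, Optional


class BlueGreenRouter:
    """Toy model of two environments where only one receives live traffic."""

    def __init__(self, live_version: str) -> None:
        self.environments: Dict[str, Optional[str]] = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    @property
    def serving(self) -> Optional[str]:
        return self.environments[self.live]

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Install on the idle environment and switch traffic only if it is healthy."""
        target = self.idle
        self.environments[target] = version
        if not health_check(version):
            return False  # traffic never moved, so users saw nothing
        self.live = target
        return True

    def rollback(self) -> None:
        """Point traffic back at the other environment, which still holds the previous version."""
        self.live = self.idle


router = BlueGreenRouter("v1")
router.deploy("v2", health_check=lambda version: True)
print(router.live, router.serving)
router.rollback()
print(router.live, router.serving)
```

Since the previous version keeps running on the other environment, rolling back is just a traffic switch rather than a second deployment.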
So, our scenario may be imaginary, but it was inspired by real companies. In this scenario, how did APU improve their lead time? Well, they implemented a series of changes that allowed code to be deployed throughout the day, thus removing their largest constraint. Now, their developers can focus on creation rather than unplanned work. What were their steps? Well, first, they established a clear baseline, allowing them to determine the success of their efforts. Then they made top-down cultural changes.
Finally, they implemented a completely automated continuous delivery process. What this means is that as soon as developers check in their code, the process takes over and ensures the code meets all the requirements to be production quality. Then it allows for a person to deploy to any environment with a push of a button. In order to improve your lead time, you need to review your entire development, deployment, and operations pipeline. You need to look at it carefully and identify where the constraints are located, then you can start planning on ways to remove or reduce those constraints. This process should be continuous until you've hit your goals.
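One simple way to start looking for constraints is to record how long work spends in each stage and see which stage dominates. The sketch below does that; the stage names and hour counts are invented for illustration and are not measurements from the scenario.

```python
from typing import Dict, Tuple


def largest_constraint(stage_hours: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its share of the total lead time."""
    total = sum(stage_hours.values())
    stage = max(stage_hours, key=stage_hours.get)
    return stage, stage_hours[stage] / total


# Hypothetical breakdown of one feature's journey, in hours.
stage_hours = {
    "development": 80,
    "qa_round_trips": 120,
    "waiting_for_deploy_window": 200,
    "deployment": 8,
}
stage, share = largest_constraint(stage_hours)
print(f"{stage} accounts for {share:.0%} of lead time")
```

Once the largest constraint has been reduced, running the same measurement again shows where the next one is, which fits the idea of treating this as a continuous process.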
Remember that automation will help with many of the constraints and also make for a more consistent process. So, what have we learned in this lecture? We learned that lead time is the time it takes to get a feature from request to released. We also learned that a solid DevOps plan will help to identify and remove the constraints that are preventing you from moving quickly.
In our next lecture, we'll use the same scenario to further discuss how DevOps can improve stability. So, let's get started.
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.