Modern software systems are becoming increasingly complex to meet quality, availability, and security demands, and they change rapidly to keep up with the needs of end users. With all of that change, how do you ensure stability, quality, security, and innovation? In this course we look at how the DevOps philosophy provides a holistic way to approach software development, deployment, and operations, and we offer some tenets to help improve quality and stability.
You will gain the following skills by completing this course:
- Why automation, culture, and metrics are essential to a successful DevOps project.
- How DevOps can positively impact your business's bottom line.
- Which major companies are successfully using DevOps in their own engineering processes.
You should take this course if you are:
- A newcomer to the DevOps or cloud world.
- Looking to upgrade your skills from a conventional software development career.
This Course Includes
- Expert-guided lectures about DevOps.
- 1 hour of high-definition video.
- Solid foundational knowledge for your explorations into DevOps.
What You'll Learn
|Video Lecture|What You'll Learn|
|---|---|
|What Is DevOps?|In this lecture series, you'll gain a fundamental understanding of DevOps and why it matters.|
|The Business Value of DevOps|Need to justify the business case for DevOps? This is the lecture series for you.|
|Who's Using DevOps?|Find out who's using DevOps in the enterprise - and why their success matters for your own organization.|
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back to our Introduction to
DevOps course. I'm Ben Lambert and I'll be
your instructor for this lecture. In this
lecture, we're going to talk further about
how DevOps can improve lead time. By the
end of this lecture, you should know what
lead time is and how a solid DevOps
strategy can improve it. Over the next few
lectures, we're going to look at the same
scenario through a few different lenses,
the lens for this lecture being lead time
to get a better understanding of how
DevOps solves a business problem.
The problem DevOps solves, as we've talked
about in previous lectures, is that it
mitigates the potential instabilities
caused by constant change. Developers
create change. New features, bug fixes,
code refactoring, it all introduces
potential for instability, which is
something that your operations team wants
to prevent. So how do you balance the two?
By setting up an automated environment
that allows you to detect potential issues
as fast as possible. Mistakes are going to
happen. However, if you catch them early,
you can prevent them from causing problems
in production. Catching mistakes in
production is expensive. Catching mistakes
minutes after a developer commits their
code or even before they commit their code
is not. In the context of software
development and deployment, lead time is
the total time that it takes for an idea
to go from request to being released. So
let's define our scenario and talk about
how DevOps can improve lead time.
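Lead time, as defined above, is something you can measure directly once request and release dates are recorded. Here's a minimal sketch; the dates and the day-level granularity are illustrative assumptions, not figures from the course:

```python
from datetime import datetime

def lead_time(requested_at: str, released_at: str) -> int:
    """Lead time in days: from a feature being requested to it
    being released to users."""
    fmt = "%Y-%m-%d"
    delta = datetime.strptime(released_at, fmt) - datetime.strptime(requested_at, fmt)
    return delta.days

# A feature requested March 1st and released April 12th took 42 days.
lead_time("2023-03-01", "2023-04-12")  # 42
```

Tracking this number per feature is what lets a team later show that their DevOps changes actually shortened it.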
We're going to review a fictitious company
called Acme Products Unlimited, or APU,
and how they went from monthly code
deployments to deploying dozens of times
per day. APU is the leading manufacturer
in pest management products. They had some
quality issues a few years back, but those
have since been resolved. They've created
an online marketplace, and it turns out
it's a huge hit. Traffic continues to
increase and they have problems keeping
their site up and running. They have a
backlog of features that they want to get
developed and released to the users,
including a forum for users to chat about
what works for them. They also want to
release their latest annoying bird
detection and capture devices. However,
they're concerned about pushing any
changes because historically deployments
have brought down the site for hours at a
time.
Their current process is to develop for a
couple of weeks, and after resolving all
of the merge conflicts, they hand over
the code to QA for testing. The QA team
sends back a list of defects after a few
days of testing for developers to resolve.
The developers stop what they're doing and
resolve the bugs and send it back to QA.
This back and forth happens a couple of
times, and when QA is happy they sign off
on the changes. Once QA has signed off,
the code is sent over to the operations
team. Operations reviews the change
document, and if they don't have any
questions, they schedule a time, usually
on a Saturday, to have the developers join
them to deploy.
Deployment Saturday comes around and the
developers and operations show up early,
coffee in hand, to start in on the
deployment. The operations team pulls one
of the web servers out of the load
balancer, connects into it, and issues a
Git pull command in the application
directory. The latest code begins
downloading, and in a few seconds it's
complete. They open up a browser and enter
the IP address of the newly updated web
server and press Enter. The browser thinks
about it for a second and then usually
returns a 500 Error with no additional
info because debug mode is turned off. So
depending on the engineer, they either hit
the logs in that server or they enable
debug mode so they can see the error
message.
The problem at APU often comes down to
some new dependency being introduced or
some database change that a developer
forgot to mention. The operations team and
the development team gather around the
screen for a little while and they
inevitably resolve the issue and get the
rest of the servers up and running. Or,
they roll back until they can spend more
time reviewing what went wrong so they can
try again another day. Typically while
development and QA are doing their back
and forth, operations is working on
keeping the site up and running as best
they can. Lately, this has meant
configuring new virtual machines to act as
additional web servers.
So hopefully, for the sake of your sanity,
none of this sounds familiar to you. We
can see right away that APU lacks the
culture required for a successful DevOps
plan. They have silos working on phases of
a project and then they hand it off to
another silo without much collaboration.
They also lack any sort of automation.
They're performing manual source-based
deployments and tying up a lot of time and
resources. Infrequent deployments are
usually a symptom of an inefficient
deployment process. Oftentimes companies
feel that deployments are intrinsically
difficult, so they try to do them as
infrequently as possible.
Now, this may seem logical, but it's
actually the problem. The way to improve
deployments is to perform them as often as
possible. If you do deployments once per
month, then it may not seem worth the
investment in time and resources to
automate the process. If there's an
essential action that is problematic, then
the answer is to figure out how to perform
that action so often that it is no longer
a problem. Let's look back at our scenario
company one year later, after the new CTO
has adopted a DevOps philosophy.
APU had a big culture change starting a
year ago when the new CTO came in, and he
pushed his DevOps philosophy. His first
item was to start measuring everything. He
established a baseline for lead time,
uptime, deployment frequency, MTTD (mean
time to detect), and MTTR (mean time to
recovery), just to name a few.
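Of these metrics, MTTR is a good example of one that's trivial to compute once incidents are actually being recorded. A minimal sketch; the incident timings below are made up for illustration:

```python
def mttr(incidents):
    """Mean time to recovery, in minutes: the average time from an
    incident being detected to it being resolved. Each incident is
    a (detected_minute, resolved_minute) pair."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations) / len(durations)

# Three outages that took 30, 90, and 60 minutes to resolve:
mttr([(0, 30), (100, 190), (500, 560)])  # 60.0
```

The point of the baseline is exactly this: without a number like 60 minutes written down, you can't later claim the new process improved it.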
Once he had his metrics, he started
forming autonomous cross-functional teams,
teams composed of developers, QA,
security, and operations engineers. No
longer would there be a hand-off model.
The teams each worked on their own product
from start to finish. If you created it,
you ran it in production. He had his
engineers implement a fully automated
continuous integration and continuous
delivery pipeline. The continuous
integration was implemented with Jenkins.
The developers would check their code into
a Git repository and Jenkins would grab
those changes. It would build the project
and create an artifact, then it would run
all of the tests to make sure that the
build was successful, and mark it as
successful only if everything passed.
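The gating logic at the heart of that pipeline can be sketched in a few lines. The step commands below are hypothetical stand-ins (the transcript doesn't specify APU's actual build tooling), but the rule is the one described: the build is marked successful only if every step exits cleanly.

```python
import subprocess

# Hypothetical pipeline steps; a real project substitutes its own
# build and test commands.
STEPS = [
    ["make", "build"],  # build the project and create an artifact
    ["make", "test"],   # run all of the tests
]

def run_steps(steps):
    """Run each step in order and collect its exit code."""
    return [subprocess.run(step).returncode for step in steps]

def gate(exit_codes):
    """Mark the build successful only if every step exited cleanly."""
    return all(code == 0 for code in exit_codes)
```

A CI server like Jenkins is essentially this loop plus triggering on every commit, which is what turns "catch mistakes minutes after a commit" from a slogan into a mechanism.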
The QA and security members worked with
the entire team to devise a complete set
of tests, including things like load
testing and security audits. The
developers began using feature toggles to
ensure that features that weren't ready
for release wouldn't impact the running
code in production but could still be
deployed, thus allowing the main branch of
the repo to serve as the canonical source.
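A feature toggle can be as simple as a flag check around the unfinished code path. Here's a sketch; the flag name is a hypothetical one drawn from APU's backlog, not anything the course specifies:

```python
# Hypothetical feature flags: unfinished features ship to production
# toggled off, so the main branch stays deployable at all times.
FLAGS = {"user_forum": False}

def navigation(flags=FLAGS):
    """Only features that are toggled on appear in the site navigation."""
    items = ["Home", "Products"]
    if flags.get("user_forum"):
        items.append("Forum")  # the forum ships dark until it's ready
    return items
```

Flipping the flag releases the feature without a deployment, and flipping it back is an instant rollback for that feature alone.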
And deployments were happening several
times a day, usually with no impact on
stability. The closer collaboration
between developers and operations promoted
a more efficient code base, reducing the
total number of servers required. And the
culture of automation inspired a complete
push to the cloud and the adoption of an
auto-scaling architecture.
When the server load on the web servers
gets too high, the auto scaling adds a new
server. The new server is an immutable,
pre-baked image made with Spinnaker. And
when the server load dies down, the newly
created server is terminated. The team
also switched to a blue-green deployment
model that has allowed them to deploy
multiple times per day and roll back if
they need to. Overall, APU is now able to
deploy multiple times per day with less
downtime and better lead time.
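The blue-green model boils down to keeping two identical environments and flipping traffic between them. A minimal sketch of the routing logic; the environment names are the model's convention, while the versions are illustrative:

```python
class BlueGreenRouter:
    """Two identical environments: traffic points at one ("live")
    while the other ("idle") receives the new release, so a rollback
    is just pointing traffic back."""

    def __init__(self, live_version="v1"):
        self.live, self.idle = "blue", "green"
        self.versions = {"blue": live_version, "green": None}

    def deploy(self, version):
        # Release to the idle environment, then switch traffic to it.
        self.versions[self.idle] = version
        self.live, self.idle = self.idle, self.live

    def rollback(self):
        # Point traffic back at the previous environment.
        self.live, self.idle = self.idle, self.live
```

Deploying "v2" flips traffic to green; a rollback points it back at blue, which is still running "v1" untouched.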
So, our scenario may be imaginary, but it
was inspired by real companies. In this
scenario, how did APU improve their lead
time? Well, they implemented a series of
changes that allowed code to be deployed
throughout the day, thus removing their
largest constraint. Now, their developers
can focus on creation rather than
unplanned work. What were their steps?
Well, first, they established a clear
baseline, allowing them to determine the
success of their efforts. Then they made
top-down cultural changes. Finally, they
implemented a completely automated
continuous delivery process.
What this means is that as soon as
developers check in their code, the
process takes over and ensures the code
meets all the requirements to be
production quality. Then it allows for a
person to deploy to any environment with a
push of a button. In order to improve your
lead time, you need to review your entire
development, deployment, and operations
pipeline. You need to look at it carefully
and identify where the constraints are
located, then you can start planning on
ways to remove or reduce those
constraints. This process should be
continuous until you've hit your goals.
Remember that automation will help with
many of the constraints and also make for
a more consistent process.
So, what have we learned in this lecture?
We learned that lead time is the time it
takes to get a feature from request to
release. We also learned that a solid
DevOps plan will help to identify and
remove the constraints that are preventing
you from moving quickly. In our next
lecture, we'll use the same scenario to
further discuss how DevOps can improve
stability. So, let's get started.
About the Author
Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.
When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.