Business and Technology Frameworks
3h 26m

The DevOps Institute is a collaborative effort between recognized and experienced leaders in the DevOps, InfoSec and ITSM space and acts as a learning community for DevOps practices. This DevOps Foundations course has been developed in partnership with the DevOps Institute to provide you with a common understanding of DevOps goals, business value, vocabulary, concepts, and practices. By completing this course you will gain an understanding of the core DevOps concepts, the essential vocabulary, and the core knowledge and principles that underpin DevOps practices.

This course is made up of 8 lectures and an assessment exam at the end. Upon completion of this course and the exam, students will be prepped and ready to sit the industry-recognized DevOps Institute Foundation certification exam.

Learning Objectives 

  • Recognize and explain the core DevOps concepts.
  • Understand the principles and practices of infrastructure automation and infrastructure as code.
  • Recognize and explain the core roles and responsibilities of a DevOps practice.
  • Be prepared to sit the DevOps Institute Foundation certification exam after completing the course and assessment exam.

Intended Audience

  • Individuals and teams looking to gain an understanding and shared knowledge of core DevOps principles.  



- [Instructor] Hi and welcome back. In Lecture four, we'll learn to recognize and explain the key DevOps practices of Agile, ITSM, Lean, Safety Culture, Learning Organizations and Continuous Funding. I think the key thing to grasp with DevOps is that modern IT systems are a system of systems, and DevOps does not stand alone as yet another system. DevOps isn't a framework or methodology in itself. DevOps adopts and leverages multiple frameworks and methodologies such as Agile, Lean and IT Service Management, so think of DevOps as the convergence of these principles. DevOps applies Lean principles, such as increasing flow and reducing waste, to the IT value stream. And DevOps has benefited tremendously from the work the Agile community has done, showing how small teams operating with high trust, small batch sizes and smaller, more frequent software releases can dramatically increase the productivity of development organizations. Manufacturing was transformed in the 1980s by moving to Lean, which enabled organizations that adopted Lean practices to achieve faster lead times and better quality, and to win in the marketplace. So by using these same principles, DevOps enables us to transform how we work in development and IT operations, and by doing so we not only break the downward spiral but also generate more productivity and economic value for the business.

So let's look at Agile. In the late 1990s, several methodologies began to get increasing public attention. Each had a different combination of old ideas, new ideas and transmuted old ideas, but they all emphasized close collaboration between the programmer team and the business experts. They encouraged face-to-face communication as being more efficient than written documentation, and frequent delivery of new deployable business value. They encouraged tight, self-organizing teams and ways to craft the code and the team so that the inevitable requirements churn was not such a crisis.
In February 2001, at a summit of 17 independent-minded practitioners of several programming methodologies, the participants didn't agree on that much, but they found consensus around four main values: individuals and interactions over processes and tools; working software over comprehensive documentation; customer collaboration over contract negotiation; and responding to change over following a plan. Supplements to the Agile Manifesto can be found on the Agile Manifesto site, and there are 12 principles that further explain what it is to be Agile.

Scrum is a simple framework for effective team collaboration on complex projects. Scrum provides a small set of rules that create just enough structure for teams to be able to focus on their innovation and on solving what might otherwise be insurmountable challenges. Scrum is not a process or a technique for building products; rather, it's a framework within which you can employ various processes and techniques. Scrum makes clear the relative efficacy of product management and development practices so that you can improve. Scrum increases the ability to release more frequently. Scrum, in a nutshell, is basically three roles.

The first is the product owner and that's an individual who manages the product backlog and ensures the value of the work that the team performs. They're also responsible for ensuring that the product backlog is visible, transparent and clearly shows what the team will work on next, and this ensures the team understands items in the product backlog. The product owner needs to be one person, not a committee, and they maintain the product backlog and ensure that it's visible to everyone. For the product owner to succeed, everyone in the organization has to respect his or her decisions, that's a priority. 

The Scrum Master is an individual who ensures that the team adheres to Scrum practices, values and rules. The team performs the work and delivers a potentially shippable product. No one is allowed to tell the team to work on a different set of priorities, and the team isn't allowed to listen to anyone who says otherwise.

How about the Scaled Agile Framework, or SAFe? SAFe is another Agile framework that's gaining popularity. It's in this course to demonstrate that there are multiple approaches to achieving the principles of the Agile Manifesto. In SAFe version 4.5, there are four configurations: Essential, Portfolio, Large Solution and Full. Essential SAFe is the most basic configuration. It describes the most critical elements needed and realizes the majority of the framework's benefits. Portfolio SAFe adds the Portfolio level and allows for such concerns as strategic direction, investment funding and Lean governance. Large Solution SAFe adds the Large Solution level to Essential SAFe and allows for coordination and synchronization across multiple platforms, but without the Portfolio considerations. Full SAFe includes all four levels, building on the team and program levels in Essential SAFe. The primary measure in SAFe is the objective measurement of working solutions. SAFe also defines some additional intermediate and long-term measurements: metrics that teams, programs and portfolios can use to measure progress.

Shared Services represents the specialty roles that are necessary for the success of an Agile Release Train (ART) or value stream, but that cannot be dedicated full-time to any specific train. A Community of Practice, or CoP, is an informal group of team members and other experts acting within the context of the program or enterprise that has a mission of sharing practical knowledge in one or more relevant domains. Milestones are used to track progress toward a specific goal or event. These include fixed-date, Program Increment and learning milestones. The roadmap communicates planned ART and value stream deliverables and milestones over a timeline. The vision describes the future view of the solution to be developed, reflecting customer and stakeholder needs, as well as the features and capabilities proposed to address those needs. The System Team is a special Agile team that provides assistance in building and using the Agile development environment, including continuous integration, test automation and automating the delivery pipeline. Lean User Experience (Lean UX) is the application of Lean principles to user experience design. It uses an iterative, hypothesis-driven approach to product development through constant measurement and learning loops. So in SAFe, Lean UX is applied at scale, with the right combination of centralized and decentralized user experience design and implementation.


- [Speaker] One of the big success factors here at Spotify is our Agile engineering culture. Culture tends to be invisible, we don't notice it because it's there all the time, kind of like the air we breathe. But if everyone understands the culture, we're more likely to be able to keep it and even strengthen it as we grow. So that's the purpose of this video. When our first music player was launched in 2008, we were pretty much a Scrum company. Scrum is a well-established Agile development approach and it gave us a nice team-based culture. However, a few years later, we had grown into a bunch of teams and found that some of the standard Scrum practices were actually getting in the way. So we decided to make all this optional. Rules are a good start, but then break them when needed. We decided that Agile matters more than Scrum and Agile principles matter more than any specific practices, so we renamed the Scrum Master role to Agile Coach because we wanted servant leaders more than process masters. We also started using the term Squad instead of Scrum Team, and our key driving force became autonomy. So, what is an autonomous squad? A squad is a small, cross-functional, self-organizing team, usually less than eight people. They sit together, and they have end-to-end responsibility for the stuff they build: design, commit, deploy, maintenance, operations, the whole thing. Each squad has a long-term mission, such as make Spotify the best place to discover music, or internal stuff like infrastructure for A/B testing. Autonomy basically means that the squad decides what to build, how to build it and how to work together while doing it. There are of course some boundaries to this, such as the squad mission, the overall product strategy for whatever area they are working on and short-term goals that are renegotiated every quarter. Our office is optimized for collaboration.

Here's a typical squad area. The squad members work closely together here, with adjustable desks and easy access to each other's screens. They gather over here in the lounge for things like planning sessions and retrospectives. And back there is a huddle room for smaller meetings or just to get some quiet time. Almost all walls are whiteboards. So, why is autonomy so important? Well, because it's motivating, and motivated people build better stuff. Also, autonomy makes us fast by letting decisions happen locally in the squad instead of via a bunch of managers and committees and stuff. It helps us minimize hand-offs and waiting so we can scale without getting bogged down with dependencies and coordination. Although each squad has its own mission, they need to be aligned with product strategy, company priorities and other squads. Basically, be a good citizen in the Spotify ecosystem. Spotify's overall mission is more important than any individual squad, so the key principle is really: be autonomous, but don't suboptimize. It's kind of like a jazz band. Although each musician is autonomous and plays his own instrument, they listen to each other and focus on the whole song together. That's how great music is created. So our goal is loosely coupled but tightly aligned squads. We're not all there yet, but we experiment a lot with different ways of getting closer. In fact, that applies to most things in this video. This culture description is really a mix of what we are today and what we are trying to become in the future. Alignment and autonomy may seem like different ends of a scale, as in more autonomy equals less alignment. However, we think of it more like two different dimensions. Down here is low alignment and low autonomy, a micromanagement culture. No high-level purpose, just shut up and follow orders. Up here is high alignment, but still low autonomy.
So leaders are good at communicating what problem needs to be solved, but they're also telling people how to solve it. High alignment and high autonomy means leaders focus on what problem to solve, but let the teams figure out how to solve it.

What about down here then? Low alignment and high autonomy means teams do whatever they want and basically they all run in different directions. Leaders are helpless and our product becomes a Frankenstein. We're trying hard to be up here, aligned autonomy, and we keep experimenting with different ways of doing that. So alignment enables autonomy. The stronger alignment we have, the more autonomy we can afford to grant. That means the leader's job is to communicate what problem needs to be solved and why. And the squads collaborate with each other to find the best solution. One consequence of autonomy is that we have very little standardization. When people ask things like, "Which code editor do you use?" or, "How do you plan?", the answer is mostly, "Depends on which squad." Some do Scrum sprints, others do Kanban, some estimate stories and measure velocity, others don't. It's really up to each squad. Instead of formal standards, we have a strong culture of cross-pollination. When enough squads use a specific practice or tool, such as Git, that becomes the path of least resistance, and other squads tend to pick the same tool. Squads start supporting that tool and helping each other, and it becomes like a de facto standard. This informal approach gives us a healthy balance between consistency and flexibility.

Our architecture is based on over 100 separate systems, coded and deployed independently. There's plenty of interaction, but each system focuses on one specific need, such as playlist management, search or monitoring. We try to keep them small and decoupled, with clear interfaces and protocols. Technically, each system is owned by one squad. In fact, most squads own several. But we have an internal open-source model, and our culture is more about sharing than owning. Suppose Squad one here needs something done in system B, and Squad two knows that code best. They'll typically ask Squad two to do it. However, if Squad two doesn't have time, or they have other priorities, then Squad one doesn't necessarily need to wait. We hate waiting. Instead, they're welcome to go ahead and edit the code themselves and then ask Squad two to review the changes. So anyone can edit any code, but we have a culture of peer code review. This improves quality and, more importantly, spreads knowledge. Over time, we've evolved design guidelines, code standards and other things to reduce engineering friction, but only when badly needed. So on a scale from authoritative to liberal, we're definitely more on the liberal side. Now, none of this would work if it wasn't for the people. We have a really strong culture of mutual respect. I keep hearing comments like, "My colleagues are awesome." People often give credit to each other for great work and seldom take credit for themselves.

Considering how much talent we have here, there is surprisingly little ego. One big aha for new hires is that autonomy is kinda scary at first. You and your squad mates are expected to find your own solution, no one will tell you what to do. But it turns out if you ask for help, you get lots of it, and fast. There's genuine respect for the fact that we're all in this boat together and need to help each other succeed. We focus a lot on motivation. Here's an example, an actual email from the head of People Operations. "Hi everyone, our employee satisfaction survey says 91% enjoy working here and 4% don't." Now that may seem like a pretty high satisfaction rate, especially considering our growth pain: from 2006 to 2013, we have doubled every year and now have over 1,200 people. But then he continues, "This is of course not satisfactory, and we want to fix it. If you're one of those unhappy 4%, please contact us, we're here for your sake, and nothing else." So, good enough isn't good enough. Half a year later things had improved and the satisfaction rate was up to 94%. This strong focus on motivation has helped us build up a pretty good reputation as a workplace. But we still have plenty of problems to deal with. So yeah, we need to keep improving. Okay, so we have over 50 squads spread across four cities. Some kind of structure is needed. Currently, squads are grouped into tribes. A tribe is a lightweight matrix. Each person is a member of a squad as well as a chapter. The squad is the primary dimension, focusing on product delivery and quality, while the chapter is a competency area such as quality assistance, Agile coaching or web development. As a squad member, my chapter lead is my formal line manager, a servant leader focusing on coaching and mentoring me as an engineer. So I can switch squads without getting a new manager. It's a pretty picture, huh? Except that it's not really true. In reality, the lines aren't nice and straight, and things keep changing.
Here's a real-life example from one moment in time from one tribe, and of course, it's all different by now. And that's okay. The most valuable communication happens in informal and unpredictable ways. To support this, we also have guilds. A guild is a lightweight community of interest, where people across the whole company gather and share knowledge within a specific area. For example, leadership, web development or continuous delivery. Anyone can join or leave a guild at any time. Guilds typically have a mailing list, biannual unconferences and other informal communication methods.

Most organizational charts are an illusion. So our main focus is community, rather than hierarchical structures. We've found that a strong enough community can get away with an informal, volatile structure. If you always need to know exactly who is making decisions, you're in the wrong place. One thing that matters a lot for autonomy is how easily can we get our stuff into production? If releasing is hard, we'll be tempted to release seldom to avoid the pain. That means each release is bigger and therefore even harder. It's a vicious cycle. But if releasing is easy, we can release often. That means each release is smaller and therefore easier. To stay in this loop and avoid that one, we encourage small, frequent releases and invest heavily in test automation and continuous delivery infrastructure. Releasing should be routine, not drama. Sometimes we make big investments to make releasing easier. For example, the original Spotify desktop client was a single, monolithic application. In the early days, with just a handful of developers, that was fine. But as we grew, this became a huge problem. Dozens of squads had to synchronize with each other for each release, and it could take months to get a stable version. Instead of creating lots of process and rules and stuff to manage this, we changed the architecture to enable decoupled releases. Using Chromium Embedded Framework, the client is now basically a web browser in disguise. Each section is like a frame on a website, and squads can release their own stuff directly. As part of this architectural change, we started seeing each client platform as a client app and evolved three different flavors of squads: Client App Squads, Feature Squads and Infrastructure Squads. A Feature Squad focuses on one feature area, such as search. This squad will build, ship and maintain search-related features on all platforms.

A Client App Squad focuses on making release easy on one specific client platform, such as desktop, iOS or Android. Infrastructure Squads focus on making other squads more effective. They provide tools and routines for things like continuous delivery, A/B testing, monitoring and operations. Regardless of the current structure, we always strive for a self-service model. Kind of like a buffet: the restaurant staff don't serve you directly, they enable you to serve yourself. So, we avoid hand-offs like the plague. For example, an Operations Squad or Client App Squad does not put code into production for people. Instead, their job is to make it easy for Feature Squads to put their own code into production. Despite the self-service model, we sometimes need a bit of sync between squads when doing releases. We manage this using release trains and feature toggles. Each client app has a release train that departs on a regular schedule, typically every week or every three weeks, depending on which client. Just like in the physical world, if trains depart frequently and reliably, you don't need much up-front planning. Just show up and take the next train. Suppose these three squads are building stuff, and when the next release train arrives, features A, B and C are done, while D is still in progress. The release train will include all four features, but the unfinished one is hidden, using a feature toggle. It may sound weird to release unfinished features and hide them, but it's nice because it exposes integration problems early and minimizes the need for code branches. Unmerged code hides problems and is a form of technical debt. Feature toggles let us dynamically show and hide stuff in tests as well as production. In addition to hiding unfinished work, we use this to A/B test and gradually roll out finished features. All in all, our release process is better than it used to be, but we still see plenty of improvement areas, so we'll keep experimenting.
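The release train and feature toggle idea described above can be sketched in a few lines of Python. This is only an illustration, not Spotify's implementation: the `FeatureToggle` class, the `rollout_percent` field, the hashing scheme and the feature names are all assumptions made up for the example. It shows four features shipping together, with the unfinished one hidden and one finished feature rolled out gradually to a share of users.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureToggle:
    """Hypothetical toggle: hidden, fully visible, or visible to a share of users."""
    name: str
    enabled: bool = False
    rollout_percent: int = 100  # only consulted when enabled is True

    def is_visible_to(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        # Hash the toggle name and user id so each user lands in a stable
        # bucket from 0 to 99; the same user always gets the same answer.
        key = f"{self.name}:{user_id}".encode("utf-8")
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
        return bucket < self.rollout_percent


def visible_features(toggles, user_id):
    """Names of the shipped features this user can actually see."""
    return [t.name for t in toggles if t.is_visible_to(user_id)]


# Everything below ships on the same (made-up) release train.
release_train = [
    FeatureToggle("feature_a", enabled=True),
    FeatureToggle("feature_b", enabled=True),
    FeatureToggle("feature_c", enabled=True, rollout_percent=50),  # gradual rollout
    FeatureToggle("feature_d", enabled=False),  # unfinished, shipped but hidden
]

if __name__ == "__main__":
    print(visible_features(release_train, "user-42"))
```

Because the code for feature D is merged and deployed but switched off, integration problems would surface on the shared train rather than on a long-lived branch, which is the benefit the speaker describes.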
This may seem like a scary model letting each squad put their own stuff into production without any form of centralized control, and we do screw up sometimes. But we've learned that trust is more important than control. Why would we hire someone who we don't trust? Agile at scale requires trust at scale and that means no politics. It also means no fear. Fear doesn't just kill trust, it kills innovation because if failure gets punished, people won't dare try new things.


- [Instructor] So how do we go about increasing agility? DevOps increases agility by breaking down silos, first of all. It improves or removes some of the constraints that we have around how we do things, and it takes a unified approach to systems engineering. It allows us to apply Agile principles to both Dev and Ops, so we get flow back and forth. And we have a sharing of knowledge, skills, experience and data. DevOps also allows us to increase agility by recognizing the importance of automation, allowing us to deploy faster and with fewer errors. So DevOps extends Agile principles beyond the boundaries of the software to the entire delivered service across the organization.

So let's look at IT Service Management, or ITSM. IT Service Management is the implementation and management of quality IT services that ultimately meet the needs of the business. It provides guidance and structure to processes such as change, configuration, release, problem and incident management. ITSM processes underpin the entire service life cycle: strategy, design, transition, operations, continual improvement and value creation. Now DevOps needs ITSM practices to meet the goal of deploying faster. They allow changes without causing interruption and disruption. And repeatable service management processes adapted to an organization's current business needs can lead the way to stable, continuous delivery and increased flow. ITIL, or the IT Infrastructure Library, is the most widely accepted approach to IT Service Management. ITIL and ITSM remain the basic codifications of the processes that underpin IT operations. And they actually describe many of the capabilities needed in order for IT operations to support a DevOps-style work stream.

Organizations that have adopted ITIL are finding that they can improve flow by applying Agile and Lean principles to ITIL process improvement. Organizations that have adopted DevOps practices but aren't familiar with ITIL find they turn to it for guidance in an effort to improve and streamline their processes. Now once processes have been streamlined, automation can be applied to accommodate the faster lead times and higher deployment frequencies associated with DevOps. This is particularly relevant to service transition processes such as change management, service asset and configuration management, and release and deployment management. So start with your key ITIL processes. ITSM process models support DevOps and continuous delivery. Predefined procedures, such as the steps to be taken, the chronological order and dependencies, responsibilities, timescales and thresholds, and escalation procedures, are crucial to ensuring a successful DevOps implementation. Define steps for handling specific types of transactions, for example, and ensure a defined path or timeline is followed. Identify what can be automated. Some examples that really influence the success of a DevOps implementation are change models, release models, test models, incident models, problem models and request models.

So Agile Service Management ensures that ITSM processes reflect Agile values and are designed with just enough control and structure to effectively and efficiently deliver services that facilitate customer outcomes, when and how they are needed. The development side of many organizations is already becoming comfortable with Agile practices. Agile Service Management brings the vocabulary and the discipline of these practices to the service management side of a business, particularly to functions such as Operations Management, the Service Desk and so forth. So, learning occurs through constant feedback and by ensuring that the values of the Agile Manifesto are embedded in ITSM processes throughout the service life cycle.

Agile Service Management does not reinvent ITSM, it modernizes the approach.

DevOps has its roots in the Lean manufacturing world, which addresses the problem of engineers designing products that factories can't afford to build. Simply put, Lean manufacturing is the application of Lean production principles to the manufacturing unit of an organization, in the same way that Lean IT is the application of Lean production principles to the IT organization. Lean enterprise would be a strategic initiative, typically driven from the top down. And a successful Lean enterprise initiative would require empowering everyone in the organization to participate in Lean. So let's look at some of the sources of waste. The goal of Lean thinking is to create more value for customers with fewer resources and less waste. Waste is any activity that does not add value to the process. A Toyota executive, Taiichi Ohno, first identified seven sources of waste as part of the Toyota Production System. More recently, others have added to and modernized the sources of waste into an acronym, DOWNTIME. There are many other sources of waste, but it's a good start.

  • Defects: deviations from requirements.
  • Overproduction: producing more or faster than required.
  • Waiting: delays while waiting on a previous step.
  • Non-use: unused knowledge or creativity.
  • Transportation: moving products from one location to another.
  • Inventory: carrying more materials than are needed.
  • Motion: moving people or assets more often than is required.
  • Excessive processing: doing more than is required.

A fantastic acronym, DOWNTIME, love it.

Value stream mapping is a Lean tool that depicts the flow of information, materials and work across functional silos, with an emphasis on quantifying waste, including time and quality. So value stream mapping helps organizations analyze the current state and design a future state for the activities that take a product or service from its beginning through to the customer. A value stream map is used to understand and streamline work processes using Lean tools and technologies. A key is to see things from the customer's perspective and strive to reduce waste relative to value-adding activities. Effective value stream mapping involves observing the process directly, versus visualizing the value stream in an office or conference room. So a value stream map makes work visible to everyone. It's a dynamic document that gets refined as processes are improved. Here is an example of a value stream map. In this visualization, the value stream is built with sticky notes, as is often the case, and shows how each process or unit contributes value or represents waste.

The improvement Kata is a structured and focused approach to creating a continuous learning and improvement culture, a Kaizen culture, where Kaizen stands for good change. To make the scientific improvement Kata pattern a habit in an organization, its managers teach and coach the improvement Kata routines a little bit every day. This is in contrast to continuous improvement approaches that attempt to predict the path and focus on implementation. The improvement Kata builds on discovery that occurs along the way. Teams using the improvement Kata learn as they strive to reach a target condition, and they adapt based on what they are learning.
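To make the "quantifying waste, including time" part of value stream mapping concrete, here is a minimal Python sketch. The step names and hour figures are invented for illustration, and the `Step` class and helper functions are hypothetical, not part of any Lean tool. It adds up active work time and waiting time per step, then reports the total lead time, the share of it spent on actual work, and the step with the longest wait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    process_hours: float  # time spent actively working on the item
    wait_hours: float     # time the item sits idle before this step starts


def lead_time(steps):
    """Total elapsed time from the first step to delivery."""
    return sum(s.process_hours + s.wait_hours for s in steps)


def process_time(steps):
    """Time in which work is actually being done on the item."""
    return sum(s.process_hours for s in steps)


def flow_efficiency(steps):
    """Share of the lead time spent on active work (0.0 to 1.0)."""
    total = lead_time(steps)
    return process_time(steps) / total if total else 0.0


def longest_wait(steps):
    """The step with the most waiting in front of it: a candidate for improvement."""
    return max(steps, key=lambda s: s.wait_hours)


# Made-up numbers for a hypothetical change moving from code to production.
value_stream = [
    Step("code", process_hours=8, wait_hours=0),
    Step("review", process_hours=1, wait_hours=16),
    Step("test", process_hours=4, wait_hours=24),
    Step("change approval", process_hours=1, wait_hours=40),
    Step("deploy", process_hours=2, wait_hours=4),
]

if __name__ == "__main__":
    print("lead time (h):", lead_time(value_stream))
    print("process time (h):", process_time(value_stream))
    print("flow efficiency:", flow_efficiency(value_stream))
    print("longest wait before:", longest_wait(value_stream).name)
```

With these invented figures, 16 of the 100 hours are active work, so most of the lead time is the Waiting form of waste. In a real mapping exercise the numbers would come from observing the process directly, as the lecture recommends.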

The five questions are used by a coach to guide a learner through the PDCA (plan-do-check-act) cycles that are required to overcome obstacles standing in the way of a target condition. Martial arts practitioners may be familiar with the term Kata, as it's the repetition of a move that makes it a natural and instinctive habit.

A safety culture in DevOps is about both psychological safety and rewarding the right behaviors. It's about being fearless about reporting problems. It's also about systemic safety: using techniques and tooling to prevent, preempt, predict and remediate failure. Safety Culture as a term emerged after the Chernobyl nuclear disaster and has been documented extensively by Sidney Dekker. The airlines that report the most incidents are statistically proven to be the safest. The Andon Cord is a cord in Toyota factories that causes the line to stop when anyone spots a defect. It drives a number of behaviors thanks to the learning opportunity.

Learning organizations understand that not embedding learning into the culture of an organization creates cultural debt. Organizations that don't learn are less likely to be able to compete long term. They may experience high and expensive staff turnover. At the end of Beyond the Phoenix Project, John Willis and Gene Kim conclude that they would refer to high-performing organizations that practice DevOps principles as dynamic learning organizations. Management commitment is essential to becoming a learning organization.

Continuous funding is a concept that's inherent to DevOps. Budgeting cycles tend to be annual, and so are not agile enough to work alongside Agile projects. At DOES 2016, Jon Smart from Barclays Bank said that they had gone from 4% to 50% of strategic change projects being Agile in the last 12 months. Helen Beal asked if he was continually funding them, and the Twitter screenshot on the slide shows his response. "I stepped towards a continuous funding model. We're starting to pilot agile investment with quarterly rolling wave instead of annual budgeting." That brings us to the end of Lecture four, I will see you in Lecture five.

About the Author

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.