SRE, Other Frameworks, and Trends
This course reviews how site reliability engineering (SRE) can work with and complement other frameworks, methodologies and/or delivery approaches. It also looks at SRE trends and evolution. By the end of this course, you'll have a clear understanding of how to use SRE together with other frameworks and where SRE is heading.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand how SRE interacts with other frameworks such as Agile, DevOps, and ITSM
- Learn the five new trends forming in SRE, and how it is evolving
- Explore the specialized job roles found in site reliability engineering
Anyone interested in learning about SRE and its fundamentals
Software Engineers interested in learning about how to use and apply SRE within an operations environment
DevOps practitioners interested in understanding the role of SRE and how to consider using it within their own organization
To get the most out of this learning path, you should have a basic understanding of DevOps, software development, and the software development lifecycle.
Welcome back. In this course, I'm going to review how SRE can work with and complement other frameworks, methodologies and/or delivery approaches. I'll also discuss SRE trends and evolution. By the end of this course, you'll have a clear understanding of how to use SRE together with other frameworks and where SRE is heading. Right, let's begin.
SRE applies to anyone running services in production. SRE mostly complements existing frameworks, methodologies and/or delivery approaches that have already been adopted. If you're already using other frameworks, then the good news is that SRE will complement them. Again, the sweet spot for SRE is organizations that run large services in production and want to do so reliably and at scale.
When considering the position and role that SRE plays, it's useful to consider it alongside existing frameworks, if only to acknowledge how it can be complementary. Consider the following. Agile unifies the business with delivery. DevOps advocates mechanisms like continuous delivery and continuous deployment for increasing velocity and flow. SRE provides a business-wide focus on stability and reliability. And ITSM builds organizational learning across the value stream.
Choosing SRE does not require you to give up or swap out other frameworks. In fact, SRE can and often does interact with other frameworks and can be highly complementary. Pause here and reflect briefly on the following question. How does Agile inform what we do in SRE? The intention here is to consider what Agile means to your organization and whether it overlaps with or impacts SRE, and vice versa. There are no right or wrong answers here.
Moving along, let's now consider how SRE plays with these other frameworks, such as Agile, DevOps, and ITSM. Here we can see that SRE teams can behave in an agile way by using frameworks like Scrum and Kanban. Backlogs of toil make work visible, so that automation can be prioritized. Ceremonies ensure coordination, visibility, and prioritization. The definition of done is more production focused, and value is delivered through working software.
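To make the idea of a prioritized toil backlog a little more concrete, here is a minimal Python sketch. It is not taken from the course material: the `ToilItem` class, the task names, the hour estimates, and the "payback months" ranking rule are all assumptions chosen for illustration, and a real team would likely weigh risk and business value as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToilItem:
    name: str                 # hypothetical task name
    hours_per_month: float    # manual time currently spent (assumed > 0)
    automation_hours: float   # estimated one-off effort to automate

    @property
    def payback_months(self) -> float:
        # Months until the automation effort is repaid by the toil it removes.
        return self.automation_hours / self.hours_per_month


def prioritize(backlog: List[ToilItem]) -> List[ToilItem]:
    # Shortest payback first, so the cheapest wins surface at the top.
    return sorted(backlog, key=lambda item: item.payback_months)


# Illustrative backlog; every name and number here is made up.
backlog = [
    ToilItem("rotate TLS certificates by hand", 6.0, 24.0),
    ToilItem("restart stuck batch job", 10.0, 8.0),
    ToilItem("manually provision test databases", 4.0, 40.0),
]

for item in prioritize(backlog):
    print(f"{item.name}: pays back in {item.payback_months:.1f} months")
```

Ranking by payback time is only one possible rule. The point is that once toil is written down with rough numbers, the choice of what to automate next becomes visible and open to discussion in the usual planning ceremonies.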
SRE and DevOps go hand in hand. SRE is, in a sense, a specialization of DevOps where organizational silos are broken down further, pipelines of delivery go further, DevOps metrics and measures are further improved, and automation is more widespread and consistent. The following case story presented by VictorOps states that, "Like DevOps, there is no one-size fits all approach to Site Reliability Engineering. What works for a company like Google or Facebook doesn't make sense for us. Effective SRE isn't simply responding to incidents quickly when they happen, but in building infrastructure proper testing and improving the availability of your systems."
Here, the key takeaway is that SRE really is a key component of a DevOps culture and discipline. Now consider SRE and ITSM together. SRE can help with ITSM compliance activities through automation and engineering. Like SRE, ITIL processes are underpinned by automation, particularly during transition and operation processes as part of continuous testing and delivery. In SRE, failure is a learning opportunity, and continuous learning is embedded in ITSM.
ITIL provides guidance and structure to processes such as change, configuration, release, incident and problem management, all areas that SRE is involved in as well. ITSM process models support SRE. Models are used throughout ITIL, for example in incident management, problem management, and/or request fulfillment. Creating models for different pipelines can also help you to understand the level of governance, risk and compliance that is needed for each. In fact, the engineering heartbeat of SRE automates away a lot of manual service management processes, in particular service transition, service operations and service improvement.
Engineering approaches also make compliance evidence available. SRE doesn't need to stand alone. No single framework needs to be chosen to the exclusion of the others, and this is certainly also the case with SRE. In fact, all frameworks have their own strengths, weaknesses, and purpose.
Overall, what we are trying to achieve is a seamless and productive service delivery flow across the organization, which delivers stable and reliable services that give value to our stakeholders. Agile, DevOps, SRE and ITIL are all contributors to this. SRE is part of a system of systems for delivery. This slide illustrates where the main delivery frameworks fit across the value stream. The exact alignment might differ slightly when applied within different environments.
Different implementations affect the value stream in different ways. Regardless, all approaches flow from left to right. They are about delivering more to users. SRE is unique in that it flows not from left to right, but from right to left, taking the wisdom of production and giving it back to improve ideas, plans, designs, builds, deployments, testing, releases, and operation or toil reduction. The following YouTube-hosted video is worth a watch as it weaves together the different frameworks that have been discussed in this course.
Let's now move on and discuss how SRE is evolving. But before we do so, consider the following question. What future challenges do you see coming that affect SRE? Consider what currently impacts your business, the onward impact on production and, ultimately, what role Site Reliability Engineering will play. Some ideas to consider might be, one, we're increasing our digital channels, two, we're going all in on cloud and cloud hybrids, three, all of our future applications will be mobile first. And four, we're expecting significant user growth across our digital channels.
The following opinion, provided by a senior reliability engineer at LinkedIn, highlights where they think SRE is heading. "The five new trends that I see emerging are: failure is the new normal, automation as a service, cloud is king, observe and learn, and the evolution of the network engineer."
Let's now drill down into these five new expected trends. One, failure is the new normal. This is the idea that we should learn from failure and even start introducing failure. Two, automation as a service. As a service means we no longer need to install and manage automation tools ourselves. Instead, service providers will make them available to us. For example, consider CI/CD as a service from Amazon. Three, cloud is king. Using managed services for reliability offered by cloud platforms, e.g. auto scaling and self-healing, will become the norm. Four, observe and learn, i.e., observability. This is the idea of learning more about our services and learning what makes them reliable. And five, the evolution of the network engineer.
I will discuss the concept of an NRE, network reliability engineer, as well as other roles in the coming slides. Picking up on point five presented in the previous slide, expect to see SRE role specialization. For example, the job role of a network reliability engineer or NRE. Acting in this capacity, you would apply an engineering approach to measure and automate the reliability of networks. You would codify software-defined networks, SDNs, and apply SDLC principles to build, test and deploy network changes. You would use chaos engineering to test the reliability of your networks, and you would monitor network service level indicators and create automated and/or manual responses.
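As a rough illustration of monitoring a network service level indicator and mapping it to an automated or manual response, consider the Python sketch below. Everything in it is an assumption made for the example: the probe data, the function names `availability_sli` and `choose_response`, and the threshold values are hypothetical, and a real NRE would read measurements from their own monitoring system.

```python
from typing import List


def availability_sli(probe_results: List[bool]) -> float:
    """Fraction of successful probes in the window (1.0 if no probes ran)."""
    if not probe_results:
        return 1.0
    return sum(probe_results) / len(probe_results)


def choose_response(sli: float, slo_target: float, auto_fix_floor: float) -> str:
    """Map an SLI reading to a response.

    At or above the SLO target: nothing to do.
    Below the target but at or above auto_fix_floor: automated remediation.
    Below auto_fix_floor: escalate to a human (manual response).
    """
    if sli >= slo_target:
        return "ok"
    if sli >= auto_fix_floor:
        return "automated-remediation"
    return "page-on-call"


# Hypothetical window of 20 link probes, 2 of which failed.
probes = [True] * 18 + [False] * 2
sli = availability_sli(probes)
print(f"SLI={sli:.3f} -> {choose_response(sli, slo_target=0.99, auto_fix_floor=0.80)}")
```

The two thresholds separate what can safely be handled by automation from what still needs a person. Where those lines sit is a judgment call for each network and is not something the sketch can decide.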
Next up, the database reliability engineer or DBRE. Acting in the job role of a database reliability engineer, you would use software and tooling to automate manual database tasks. You would apply chaos engineering to the database to confirm failover and restore responses. You would move towards using managed database services, for example, Amazon RDS. And you would provide innovative solutions to organizational data challenges.
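The idea of using chaos engineering to confirm failover can be sketched in a few lines of Python. This is a toy under stated assumptions: `FakeCluster` is a hypothetical in-memory stand-in, not a real database client, and it pretends that data is always fully replicated. A real experiment would inject the failure into a test or staging cluster and observe the actual promotion behavior.

```python
from typing import Dict, List, Optional


class FakeCluster:
    """In-memory stand-in for a primary/replica database cluster (hypothetical)."""

    def __init__(self, nodes: List[str]) -> None:
        self.healthy = list(nodes)       # first healthy node acts as primary
        self.data: Dict[str, str] = {}   # pretend this is fully replicated

    @property
    def primary(self) -> Optional[str]:
        return self.healthy[0] if self.healthy else None

    def write(self, key: str, value: str) -> None:
        if self.primary is None:
            raise RuntimeError("no primary available")
        self.data[key] = value

    def read(self, key: str) -> str:
        if self.primary is None:
            raise RuntimeError("no primary available")
        return self.data[key]

    def kill(self, node: str) -> None:
        # Injected failure: removing the node promotes the next healthy one.
        self.healthy.remove(node)


def failover_experiment(cluster: FakeCluster) -> bool:
    """Kill the primary, then check that a new primary serves reads and writes."""
    cluster.write("order-1", "paid")
    old_primary = cluster.primary
    cluster.kill(old_primary)
    if cluster.primary is None or cluster.primary == old_primary:
        return False  # nothing was promoted, so failover did not happen
    cluster.write("order-2", "pending")
    return cluster.read("order-1") == "paid" and cluster.read("order-2") == "pending"


print(failover_experiment(FakeCluster(["db-a", "db-b", "db-c"])))
```

The shape of the experiment is what matters here: state a hypothesis (the service survives losing its primary), inject the failure, and check the hypothesis with reads and writes rather than assuming the failover configuration works.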
Next up, the customer reliability engineer or CRE. Acting in the job role of a customer reliability engineer, you would apply an engineering mindset to customer support, create a shared responsibility across supplier and customers, use SRE practices to improve customer applications, and leverage automation to reduce toil. For example, introducing automated customer incident management tools and/or chat-bots. And finally, the heritage reliability engineer or HRE. Acting in the job role of a heritage reliability engineer, you would establish SLOs focused on legacy processing. You would re-platform legacy onto modern architectures, perhaps going from mainframe to Intel, or even onto cloud, eliminate toil by investing time in engineering, expand telemetry to improve observability, and run failure tests to ensure critical services remain available.
Now, before I finish, consider the following prediction given by Honeycomb's CEO regarding where she thinks SRE tooling is heading. "I believe that in the next three years, all three of those categories, APM, monitoring/metrics, logs, and possibly others are likely to cease to exist. There will only be one category: observability. And it will contain all the insights you need to understand any state your system can get itself into."
Okay, that completes this course. In this course, you learned about how SRE can work with and complement other frameworks, methodologies and/or delivery approaches. You also learned about SRE trends, evolution, and new SRE job specializations. Okay, close this course and I'll see you shortly in the next one.
About the Author
Jeremy is the DevOps Content Lead at Cloud Academy where he specializes in developing technical training documentation for DevOps.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 20+ years. In recent times, Jeremy has been focused on DevOps, Cloud, Security, and Machine Learning.
Jeremy holds professional certifications for both the AWS and GCP cloud platforms.