Principles of SRE
This course provides an introduction to Site Reliability Engineering (SRE), including background, general principles, and practices. It also describes the relationship between SRE and DevOps. The content in this course will help prepare you for the Google “Professional Cloud DevOps Engineer” certification exam.
If you have any comments or feedback, feel free to reach out to us at firstname.lastname@example.org.
Learning Objectives
- Learn about Site Reliability Engineering (SRE)
- Understand its core vocabulary, principles, and practices
- Discover how to use SRE to implement DevOps principles
Intended Audience
- Anyone interested in learning about Site Reliability Engineering and its fundamentals
- DevOps practitioners who want to understand the role of a Site Reliability Engineer
- Engineers interested in obtaining the Google “Professional Cloud DevOps Engineer” certification
Prerequisites
- A basic understanding of DevOps
- A basic understanding of the software development life cycle
As we have seen, SRE and DevOps complement each other quite nicely. Although they are different, they share the same underlying goals. DevOps, which is the broader of the two, seeks to: one, reduce organizational silos; two, accept failure as normal; three, implement gradual changes; four, leverage tooling and automation; and five, measure everything.
Site reliability engineering provides a specific implementation for achieving these same goals: one, the SRE role, which shares responsibility for production with developers; two, blameless postmortems, to learn from mistakes and avoid a culture of fear and blame; three, error budgets, to balance growth with stability; four, identifying and reducing toil via automation; and five, tracking SLIs against defined SLOs and SLAs.
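To make the error budget idea a little more concrete, here is a minimal sketch of how a budget might be derived from an SLO. The function names, the request-based window, and the 99.9% target are all assumptions chosen for illustration; they are not taken from the course or from any particular tool.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Failed requests the SLO tolerates in the window (hypothetical helper)."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    return round((1 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    allowed = error_budget(slo_target, total_requests)
    if allowed == 0:
        # A 100% target leaves no budget to spend at all.
        return 0.0
    return (allowed - failed_requests) / allowed


# Assumed example: a 99.9% SLO over one million requests, with 250 failures so far.
allowed = error_budget(0.999, 1_000_000)
remaining = budget_remaining(0.999, 1_000_000, 250)
print(f"allowed failures: {allowed}, budget remaining: {remaining:.0%}")
```

While budget remains, a team can keep shipping changes; once it is spent, the usual response is to slow releases and focus on stability.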
Remember, an SLI is a measurement of how your system is performing, an SLO is an internal goal, and an SLA is a guarantee to customers. At this point, you should now have a basic understanding of site reliability engineering principles. If you are interested in learning more, I encourage you to check out the following resources.
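As a rough illustration of how the three terms relate, the sketch below computes an availability SLI and compares it with an SLO and an SLA. The thresholds and function names are hypothetical, and it assumes the common arrangement in which the internal SLO is stricter than the customer-facing SLA.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the measured fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def evaluate(sli: float, slo: float, sla: float) -> str:
    """Compare a measured SLI with the internal goal (SLO) and the customer guarantee (SLA)."""
    if sli < sla:
        return "SLA breached"  # the guarantee to customers was broken
    if sli < slo:
        return "SLO missed"    # internal goal missed, guarantee still intact
    return "healthy"


# Assumed thresholds: a 99.9% SLO and a looser 99.5% SLA.
sli = availability_sli(good_requests=9_970, total_requests=10_000)
print(f"SLI = {sli:.2%}: {evaluate(sli, slo=0.999, sla=0.995)}")
```

Keeping the SLO tighter than the SLA means a missed SLO acts as an early warning before any customer guarantee is at risk.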
First, we offer a whole learning path called the Site Reliability Engineering Foundation Learning Path. It provides more in-depth knowledge about SRE and covers a wider range of topics. This is a great resource for deeper understanding. Also, there are a number of excellent SRE resources available from Google at sre.google. And if you're planning to take the Professional Cloud DevOps Engineer exam, I highly recommend three resources in particular. First is the "Site Reliability Engineering" book. Second is "The Site Reliability Workbook". And third is the playlist of short videos that helps explain core SRE concepts.
Well, that's all I have for you today. Remember to give this course a rating, and if you have any questions or comments, please let us know. Thanks for watching, and make sure to check out our many other courses on Cloud Academy.
Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.
Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.
When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.