Incident Management - Level 1

Difficulty: Beginner
AVG Duration: 0h
Students: 5
Content: 1

Description

Site Reliability Engineering (SRE) is a discipline that combines software engineering and systems engineering to build and run large-scale, highly available systems. Incident management is an important part of the SRE process, as it involves identifying and responding to problems or outages in the system.

Learning how to conduct Incident Management is key to limiting future disruption to your IT systems and business. Practising incident management helps your organisation strengthen its IT operations over time, so that when an incident occurs, the path to restoring business operations is clear and can be executed in minimal time.

This learning path teaches Incident Management to a Level 1 standard.

Learning Objectives

  • Learn how to conduct Blameless Postmortems and why they help reduce future incidents

Training Content

1
Course - Intermediate - 5m
Blameless Postmortems
This course provides you with an introduction to Blameless Postmortems. In Site Reliability Engineering (SRE), a Blameless Postmortem is a retrospective meeting whose goal is to recap and analyze a significant service failure. It provides an open forum where everyone can ask questions, share thei...
About the Author
Students: 126297
Labs: 66
Courses: 113
Learning paths: 180

Jeremy is a Content Lead Architect and DevOps SME at Cloud Academy, where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering and has been coding with various languages, frameworks, and systems for 25+ years. More recently, Jeremy has focused on DevOps, Cloud (AWS, Azure, GCP), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, Azure, GCP, Terraform, and Kubernetes (CKA, CKAD, CKS).