Defining Groups

Intermediate
5m

Managing and investigating service incidents is an important part of the maintenance process. The work is necessary and can be laborious, but with the right organization, an understanding of the systems, knowledge of the processes, and the discipline to adhere to best practices, it can be optimized. This lesson focuses on the main aspects of managing service incidents and on using Google Cloud Platform to support that work.

Perhaps the most important aspect of managing service incidents is managing the personnel involved, which includes managing their roles and responsibilities. This lesson discusses a strategy for defining those roles and managing the team effectively. Managing the team includes having a process for team member turnover, managing the team's workload, developing and scaling a reporting structure, and maintaining team productivity.

Perhaps the second most important aspect of managing service incidents is establishing effective communication. Constant, effective communication both within the team and with people outside it is paramount, especially for keeping stakeholders informed.

The lesson also discusses tooling that aids monitoring and incident resolution, specifically Google Cloud Platform’s Stackdriver service (since rebranded as Google Cloud's operations suite). The service makes investigating service incidents easier by giving the response team the monitoring and logging data it needs.

If you have any feedback relating to this lesson, please contact us at support@cloudacademy.com.

Learning Objectives

  • Understand how to handle personnel to aid incident response
  • Learn how to manage roles within a team
  • Learn how to investigate incidents effectively

Intended Audience

This lesson is suited to anyone wanting to learn about incident handling using Google Cloud Platform.

Prerequisites

  • An active Google Cloud Platform account with admin permissions to administer roles, create test infrastructure, and configure operational tooling
  • A good understanding of managing service issues
  • Knowledge of issue mitigation practices
  • An understanding of logging and monitoring concepts
  • High-level knowledge of how roles should interact

About the Author

Students: 2,008
Courses: 2

Cory W. Cordell is an accomplished DevOps Architect, Software Engineer, and author. He started his DevOps career as a DevOps Engineer for a large bank where he helped implement DevOps practices and tooling and establish a DevOps culture. 

Cory then accepted a position with a global firm to build a DevOps department. He led a team of DevOps Engineers that established best practices and trained development teams on those practices and the supporting tooling. He helped development teams migrate their applications to Azure Kubernetes Service and set up pipelines to build, test, and deploy code. Realizing that a substantial gap existed in the toolchain, he developed an application that tracks infrastructure and gives teams a UI for viewing the status of their applications.

Cory is now enjoying working as a contractor and author.

Covered Topics