Managing & Investigating Service Incidents on GCP
Managing and investigating service incidents is an important part of the maintenance process. It is a necessity that can be laborious, but with the right organization, an understanding of the systems, knowledge of the processes, and the discipline to adhere to best practices, it can be optimized. This course focuses on the main aspects of managing service incidents and on using Google Cloud Platform to aid in the endeavor.
Perhaps the most important aspect of managing service incidents is managing the personnel involved. With that comes the need to manage their roles and responsibilities. This course will discuss the strategy for managing such roles and effectively managing the team. Part of managing the team is having a process for turnover of team members, managing the workload of the team, developing and scaling a reporting structure, and maintaining team productivity.
Perhaps the second most important aspect of managing service incidents is establishing effective communication. Constant and effective communication within the team and external to the team is paramount. This is especially true for keeping stakeholders informed.
The course will also discuss tooling to aid in monitoring and incident resolution, specifically Google Cloud Platform’s Stackdriver service. The service makes investigating service incidents easier by giving the response team the information needed.
If you have any feedback relating to this course, please contact us at email@example.com.
- Understand how to handle personnel to aid incident response
- Learn how to manage roles within a team
- Learn how to investigate incidents effectively
This course is suited to anyone wanting to learn about incident handling using Google Cloud Platform.
- An active Google Cloud Platform account with admin permissions in order to administer roles, create test infrastructure, and configure operational tooling
- A good understanding of managing service issues
- Knowledge of issue mitigation practices
- An understanding of logging and monitoring concepts
- High-level knowledge of how roles should interact
Roles are great for granting and limiting permissions, but what if there are 20 people on the team doing a few different jobs, with people constantly rolling on and off the team? This could be a nightmare to manage. Each team member would need to be assigned roles when joining the team and removed from those roles when leaving. Enter groups to save our sanity.
Roles can be assigned to groups as well as to people, and people are added to groups. So, instead of assigning multiple roles to each person, those roles can be assigned once to a group and the people then placed in that group. This means fewer assignments across the range of personnel.
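To make the saving concrete, here is a minimal sketch that counts assignments with and without a group. The user addresses, the group address, and the choice of three roles are all illustrative assumptions, not part of the course demo.

```python
# Hypothetical team of 20 people who all need the same three roles.
people = [f"user{i}@example.com" for i in range(1, 21)]
roles = ["roles/viewer", "roles/logging.viewer", "roles/monitoring.viewer"]

# Without a group: every person is assigned every role directly.
direct_assignments = [(person, role) for person in people for role in roles]

# With a group: each role is assigned once to the group,
# and each person is added once to the group.
group = "starteam@example.com"  # hypothetical group address
group_role_assignments = [(group, role) for role in roles]
group_memberships = [(person, group) for person in people]

# Steps needed when one new person joins the team.
onboarding_steps_direct = len(roles)  # one assignment per role
onboarding_steps_group = 1            # a single group membership

print(len(direct_assignments))
print(len(group_role_assignments) + len(group_memberships))
```

The totals matter less than the per-person cost: with a group, onboarding or offboarding someone is a single membership change, no matter how many roles the group holds.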
One thing to keep in mind is that groups operate at the organization level and not at the project level. Let's walk through creating a group and then assigning a person to the group.
I've opened my browser to the IAM page in Google Cloud Platform. I'll select the Groups menu button located at the bottom of the side menu to go to the Groups page. Since groups operate at the organization level, I'll need to select an organization to continue. I'll create a group named starTeam. Clicking on the Create Group button opens the create group page.
The group must be given a name and an email address. The email address is used to identify the group and the group name is a friendly name. I'll enter starTeam for the group name. And I'll use starTeam again for the group email address, although this could differ from the group name.
The group description is optional, but it can help to provide some details about the group. I'll just enter the description "StarTeam is an advocate for the five points of success." Below the group description, we can add members to the group and assign each a group role. At least one member must be added for the group to be created. I'll add myself to the group.
Group roles are member, manager, and owner and range from least privilege to most privilege. I'll add myself as the owner. I'll click Submit to create the group. We are returned to the groups page and can see our new group listed. This group can now be added to IAM and assigned roles. I'll click IAM from the side menu to open the IAM page.
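The form rules described above (a required name and email address, an optional description, at least one member, and three group roles ordered from least to most privilege) can be summarized in a small model. This is only a sketch of those rules: `GroupRole`, `GroupRequest`, and `validate` are hypothetical names, not part of any Google API, and the email addresses are placeholders.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class GroupRole(IntEnum):
    """Group roles ordered from least to most privilege."""
    MEMBER = 1
    MANAGER = 2
    OWNER = 3


@dataclass
class GroupRequest:
    """Hypothetical stand-in for the fields on the create group page."""
    name: str
    email: str
    description: str = ""  # optional
    members: dict = field(default_factory=dict)  # member email -> GroupRole

    def validate(self):
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        if not self.name:
            problems.append("group name is required")
        if not self.email:
            problems.append("group email address is required")
        if not self.members:
            problems.append("at least one member is required")
        return problems


request = GroupRequest(
    name="starTeam",
    email="starteam@example.com",  # placeholder address
    description="StarTeam is an advocate for the five points of success.",
    members={"me@example.com": GroupRole.OWNER},
)
print(request.validate())
```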
Make sure that the organization is selected in the dropdown at the top. Clicking the Add button brings up the add members panel. I'll enter the email address of the group into the new members field. It's important to note that the group name will not work here. If the group's email address differs from its display name, the value entered into the group email field when the group was created is the one that must be used.
Clicking the select a role dropdown displays all of the role options available at the organization level. I'll add a role of project viewer. Conditions based on time or resource could also be added but I'll leave this as is. Clicking Save will add the group to the organization's members list and close the add member panel.
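Behind the console, an IAM policy is a list of bindings, each pairing a role with a list of members, and a group appears as a member with the `group:` prefix followed by its email address. The sketch below adds such a binding to a policy held as a plain dictionary. It makes no API calls, `add_group_binding` is a hypothetical helper, the addresses are placeholders, and it assumes the viewer role chosen in the demo corresponds to `roles/viewer`.

```python
import copy


def add_group_binding(policy, group_email, role):
    """Return a copy of the policy with the group bound to the role."""
    updated = copy.deepcopy(policy)
    member = f"group:{group_email}"  # groups are identified by email, not display name
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder organization policy with a single existing binding.
policy = {
    "bindings": [
        {"role": "roles/owner", "members": ["user:admin@example.com"]},
    ]
}

updated = add_group_binding(policy, "starteam@example.com", "roles/viewer")
print(updated["bindings"])
```

Returning a modified copy leaves the original policy untouched, which makes it easy to compare the two before anything is applied.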
Role changes for the group can be made once the group has been added by clicking on the Edit button. The group can also be removed by selecting the checkbox beside the group and then clicking the Remove button at the top. Clicking Confirm will remove the group from the organization.
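Removing a group amounts to dropping its `group:` member entry from every binding in the policy. A minimal sketch on a plain dictionary, with no API calls, is shown here; `remove_group` is a hypothetical helper and the addresses are placeholders.

```python
def remove_group(policy, group_email):
    """Return a new policy with the group removed from every binding."""
    member = f"group:{group_email}"
    remaining = []
    for binding in policy.get("bindings", []):
        members = [m for m in binding["members"] if m != member]
        if members:  # drop bindings that would be left with no members
            remaining.append({**binding, "members": members})
    return {**policy, "bindings": remaining}


# Placeholder policy in which the group holds one role alone and shares another.
policy = {
    "bindings": [
        {"role": "roles/viewer", "members": ["group:starteam@example.com"]},
        {
            "role": "roles/logging.viewer",
            "members": ["group:starteam@example.com", "user:admin@example.com"],
        },
    ]
}

cleaned = remove_group(policy, "starteam@example.com")
print(cleaned["bindings"])
```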
Cory W. Cordell is an accomplished DevOps Architect, Software Engineer, and author. He started his DevOps career as a DevOps Engineer for a large bank where he helped implement DevOps practices and tooling and establish a DevOps culture.
Cory then accepted a position with a global firm to build a DevOps department. He led a team of DevOps Engineers to establish best practices and train development teams on tooling and those practices. He worked to help development teams migrate their applications to Azure Kubernetes Service and establish pipelines to build, test, and deploy code. Realizing that a substantial gap existed in the toolchain, he developed an application to aid in infrastructure tracking and to provide UI abilities for teams to view application status for their software.
Cory is now enjoying working as a contractor and author.