Monitoring & Alerting
This course shows you how to monitor your operations on GCP. It starts with monitoring dashboards. You'll learn what they are and how to create, list, view, and filter them. You'll also see how to create a custom dashboard right in the GCP console.
The course then moves on to monitoring and alerting, where you'll learn about SLI-based alerting policies and third-party integrations. You'll also learn about SLO monitoring and alerting, along with integrating GCP monitoring with products like Grafana. We’ll wrap things up by touching on SIEM tools that are used to analyze audit and flow logs.
This course contains a handful of demos that give you a practical look at how to apply these monitoring techniques on GCP. If you have any feedback relating to this course, feel free to reach out to us at firstname.lastname@example.org.
- Create, list, view, and filter dashboards
- Configure notifications, including through third-party channels
- Learn about SLI- and SLO-based alerting and monitoring
- Integrate GCP operations monitoring with Grafana
- Analyze logs with SIEM tools
This course is intended for anyone who wishes to learn how to manage GCP Operations monitoring.
To get the most out of this course, you should already have some experience with Google Cloud Platform.
Hello, and welcome to SLI-Based Alerting Policies.
Organizations use GCP Cloud Monitoring to collect metrics on the performance of their service infrastructure. For example, an organization might use “Request Count” to track the number of HTTP requests per minute for a website or web app that result in 2xx or 5xx responses. If an organization is interested only in the latency of HTTP 2xx responses, it might monitor “Response latencies”.
Such performance metrics are automatically identified based on a set of known service types. These service types include App Engine, Istio on Google Kubernetes Engine, and Cloud Endpoints. It’s also possible to define custom service types and to select specific performance metrics for them.
Whether you are interested in the known service types or custom service types, the performance metrics that you track are the basis of the SLIs for your service. An SLI, or service level indicator, refers to the performance of a specific part of your service – and because the performance metrics are automatically identified for services on App Engine, Istio on Google Kubernetes Engine, and Cloud Endpoints, useful SLIs are already known for those services. For example, let’s assume you support a service that requires request-count or response-latencies metrics. In such a case, standard SLIs can be derived from those metrics. You could derive an availability SLI by comparing the number of successful responses to the total number of all responses. A latency SLI could be derived by comparing the number of calls below a certain latency threshold to the total number of all calls.
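As a rough illustration of the two ratios just described, here is a minimal Python sketch. The `Response` record, the function names, and the sample values are all hypothetical; this is not how Cloud Monitoring computes SLIs internally, only the arithmetic behind "good responses divided by all responses".

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one served request (not a Cloud Monitoring type)."""
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request, in milliseconds


def availability_sli(responses):
    """Fraction of responses that were successful (2xx) out of all responses."""
    if not responses:
        return None  # no traffic, so the ratio is undefined
    good = sum(1 for r in responses if 200 <= r.status < 300)
    return good / len(responses)


def latency_sli(responses, threshold_ms):
    """Fraction of calls served faster than threshold_ms out of all calls."""
    if not responses:
        return None
    fast = sum(1 for r in responses if r.latency_ms < threshold_ms)
    return fast / len(responses)


# Made-up sample data for illustration only.
sample = [
    Response(200, 120.0),
    Response(200, 480.0),
    Response(500, 90.0),
    Response(204, 210.0),
]

print(availability_sli(sample))    # 0.75 (3 of 4 responses are 2xx)
print(latency_sli(sample, 300.0))  # 0.75 (3 of 4 calls are under 300 ms)
```

Both functions return a value between 0 and 1, which is the form an SLI usually takes before it is compared against a target.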
Service-specific SLIs can also be established when you need to track some other measure of what “good performance” means. Such service-specific SLIs will usually fall into one of two categories: request-based SLIs and windows-based SLIs.
Request-based SLIs are used when “good service” is measured by “units of service”. For example, a request-based SLI might measure the number of successful HTTP requests for a website.
Windows-based SLIs, however, are used when “good service” is measured by counting the number of time periods in which performance meets a specific criterion. An example of a windows-based SLI would be the proportion of time windows in which response latency stays below a specified threshold.
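To make the difference between the two categories concrete, the following Python sketch computes one SLI of each kind. The function names, the idea of passing one latency value per time window, and the numbers are assumptions made for illustration; Cloud Monitoring's own evaluation may differ.

```python
def request_based_sli(good_requests, total_requests):
    """Ratio of good units of service (requests) to all units of service."""
    if total_requests == 0:
        return None  # undefined when nothing was served
    return good_requests / total_requests


def windows_based_sli(window_latencies_ms, threshold_ms):
    """Ratio of good time windows to all time windows.

    window_latencies_ms holds one representative latency per window
    (for example a per-minute mean); that choice is an assumption here.
    A window counts as good when its latency is below threshold_ms.
    """
    if not window_latencies_ms:
        return None
    good_windows = sum(1 for v in window_latencies_ms if v < threshold_ms)
    return good_windows / len(window_latencies_ms)


# Request-based: 990 successful HTTP requests out of 1000.
print(request_based_sli(990, 1000))  # 0.99

# Windows-based: 4 of 5 windows stay under a 300 ms threshold.
print(windows_based_sli([120, 180, 450, 200, 150], 300))  # 0.8
```

The request-based figure weights every request equally, while the windows-based figure weights every time window equally, so a short burst of failures affects the two differently.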
Before you create an alerting policy, you need to decide what you want to monitor, when the alerting policy should be triggered, and how you want to be notified.
The table on your screen shows what settings need to be defined when creating an alerting policy.
- Title: The name and a brief description of the alerting policy
- Summary: A brief description of the alerting policy
- Target pane fields: Specify what is being monitored and how the data is aggregated
- Configuration fields: Specify when the alerting policy triggers
Notice we have a title, a summary, the target pane fields, and the configuration fields. The title is pretty self-explanatory. It is the name and a brief description of the alerting policy being created. The summary is a brief description of the alerting policy. The target pane fields are used to specify what you want to monitor and how the data is aggregated, while the configuration fields are used to specify when the alerting policy triggers.
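One way to picture how these four settings fit together is the small Python sketch below. Every name in it (`AlertingPolicySketch`, `should_trigger`, the field names, and the metric label) is hypothetical and does not correspond to Cloud Monitoring API fields. The trigger rule shown, "the value stays above a threshold for a number of consecutive aggregated points", is an assumption about one common kind of condition, not the only one.

```python
from dataclasses import dataclass


@dataclass
class AlertingPolicySketch:
    """Hypothetical container mirroring the settings described above."""
    title: str            # name of the alerting policy
    summary: str          # brief description of the policy
    target_metric: str    # target pane: what is being monitored
    aggregation: str      # target pane: how the data is aggregated
    threshold: float      # configuration: value that must be exceeded
    duration_points: int  # configuration: consecutive points above threshold


def should_trigger(policy, aggregated_values):
    """Return True when the latest duration_points values all exceed the threshold."""
    n = policy.duration_points
    if len(aggregated_values) < n:
        return False  # not enough data yet to evaluate the condition
    return all(v > policy.threshold for v in aggregated_values[-n:])


policy = AlertingPolicySketch(
    title="High response latency",
    summary="Latency above 500 ms for three consecutive points",
    target_metric="response_latencies",  # illustrative label only
    aggregation="mean per minute",
    threshold=500.0,
    duration_points=3,
)

print(should_trigger(policy, [300.0, 520.0, 610.0, 700.0]))  # True
print(should_trigger(policy, [300.0, 520.0, 610.0, 400.0]))  # False
```

Requiring several consecutive points above the threshold is a simple way to avoid alerting on a single short spike.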
Alerting policies can also be managed programmatically via the API. However, this feature is in beta at the time of this recording, so I don’t want to get into too much detail. To ensure you have the latest information on this topic, visit the URL that you see on your screen.
Join me in the next lesson, where I’ll show you how to create an alerting policy.
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.