Monitoring & Alerting
This course shows you how to monitor your operations on GCP. It starts with monitoring dashboards. You'll learn what they are and how to create, list, view, and filter them. You'll also see how to create a custom dashboard right in the GCP console.
The course then moves on to monitoring and alerting, where you'll learn about SLI-based alerting policies and third-party integrations. You'll also learn about SLO monitoring and alerting, along with integrating GCP monitoring with products like Grafana. We’ll wrap things up by touching on SIEM tools that are used to analyze audit and flow logs.
This course contains a handful of demos that give you a practical look at how to apply these monitoring techniques on GCP. If you have any feedback relating to this course, feel free to reach out to us at email@example.com.
- Create, list, view, and filter dashboards
- Configure notifications, including through third-party channels
- Learn about SLI- and SLO-based alerting and monitoring
- Integrate GCP operations monitoring with Grafana
- Analyze logs with SIEM tools
This course is intended for anyone who wishes to learn how to manage GCP Operations monitoring.
To get the most out of this course, you should already have some experience with Google Cloud Platform.
Hello and welcome back. In this demonstration, we are going to walk through the process of creating a basic alerting policy through the Google Cloud Console. On the screen here, we can see I'm logged into my monitoring dashboard with my admin account.
To create an alerting policy, we need to browse down to Alerting in the left-hand pane. Then, from the alerting summary page, I simply click Create Policy. In this Create alerting policy screen, we need to give our policy a name and specify the conditions for the policy. Optionally, we can also configure our notification channels and any documentation.
So we'll call this alerting policy "my alerting policy", something really original, I'm sure. Then we're going to specify the conditions for our policy. You can see here that we can specify metrics or an uptime check, and we can also monitor and alert on SLO burn rate or process health.
We're going to create an alerting policy for metrics. We could give our condition a name, but instead we'll specify the target information and accept the default name it generates; you'll see what happens in a second. So we'll create an alerting policy for CPU usage.
To do that, we need to specify both a resource type and the metric that we're going to alert on in this Target field. If we click in the box, we see two sections: a resource type section and a metric section. We'll select VM instance as the resource type, and then browse down to CPU utilization.
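Behind the console picker, a metric condition selects its data with a monitoring filter. Here is a minimal sketch, assuming the Cloud Monitoring filter syntax. In that syntax, the "VM instance" resource type corresponds to the monitored-resource type `gce_instance`, and "CPU utilization" corresponds to the metric type `compute.googleapis.com/instance/cpu/utilization`. The helper name `build_metric_filter` is hypothetical; it only assembles the filter string.

```python
def build_metric_filter(metric_type: str, resource_type: str) -> str:
    """Return a Cloud Monitoring filter selecting one metric on one resource type.

    build_metric_filter is a hypothetical helper; it only formats the string.
    """
    return f'metric.type="{metric_type}" AND resource.type="{resource_type}"'


# "VM instance" maps to gce_instance; "CPU utilization" maps to the Compute Engine CPU metric.
CPU_FILTER = build_metric_filter(
    "compute.googleapis.com/instance/cpu/utilization",
    "gce_instance",
)

if __name__ == "__main__":
    print(CPU_FILTER)
```

This metric reports utilization as a fraction, typically between 0.0 and 1.0. The console displays it as a percentage, so a 1% threshold corresponds to a value of 0.01.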
Now we can see that we already have some data in a chart for this metric, and that the name was filled in automatically. We'll leave this default name. We could filter our policy to remove noise, and we could group our data, but we're not interested in doing anything special here. I'm just walking you through the basic process of creating a pretty vanilla policy.
Now, if we hover over Aggregator, we can see that the aggregator describes how we want to aggregate the data points across the multiple time series within our data. If we open the dropdown, we can see all the different aggregation options: standard deviation; the 99th, 95th, and 50th percentiles; min, max, and sum; and even the mean. We're not going to do any aggregation here, so we'll leave this set to none.
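To make the aggregator concrete, here is a simplified model of cross-series aggregation. Several time series that already share the same timestamps are collapsed, point by point, into a single series. This sketch is for intuition only and is not Cloud Monitoring's implementation. The function name and reducer labels are hypothetical, and only a few of the console's options (none, mean, min, max, sum) are modelled.

```python
from typing import Callable, Dict, List

Series = Dict[str, List[float]]

# Hypothetical reducer table covering a subset of the console's options.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
}


def aggregate_across_series(series: Series, reducer: str) -> Series:
    """Collapse aligned time series into one series using the named reducer.

    With reducer "none", the series are returned unchanged, which matches the
    demo's choice to leave the aggregator set to none.
    """
    if reducer == "none":
        return dict(series)
    reduce_fn = REDUCERS[reducer]
    # zip(*...) walks the series column by column, i.e. timestamp by timestamp.
    columns = zip(*series.values())
    return {reducer: [reduce_fn(list(column)) for column in columns]}


cpu = {"vm-a": [0.25, 0.5], "vm-b": [0.75, 1.0]}
print(aggregate_across_series(cpu, "mean"))  # {'mean': [0.5, 0.75]}
```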
If we select Advanced options, we get a few more settings: the aligner, the alignment period, the secondary aggregator, and a legend template. If we hover over the little icons next to each one, they tell us a bit more about each option.
For example, the tooltip tells us that the aligner describes how to bring the data points we're collecting or alerting on in each individual time series into equal periods of time. If we hover over the alignment period, we can see that it determines the time interval over which the aggregation takes place.
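The aligner and alignment period work on each time series separately. Here is a rough sketch, assuming a mean aligner (Cloud Monitoring calls its mean aligner `ALIGN_MEAN`). Raw points are grouped into fixed windows of `period_s` seconds, and each window is replaced by the average of its points. Keying windows by their start time is a simplification made here, and `align_mean` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def align_mean(points: List[Point], period_s: int) -> Dict[int, float]:
    """Average raw points into equal windows of period_s seconds.

    Keys are window start times; this is a simplified model of a mean aligner.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        # Every point falls into the window that starts at the nearest lower multiple of period_s.
        window_start = (timestamp // period_s) * period_s
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


raw = [(0, 0.25), (30, 0.75), (65, 0.5), (110, 1.0)]
print(align_mean(raw, period_s=60))  # {0: 0.5, 60: 0.75}
```

Lengthening the alignment period smooths the series further. With `period_s=120`, all four points above fall into a single window averaging 0.625.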
You can hover over each one of these to get more of an explanation of what it does. We don't need any of the advanced options for this demonstration, so we'll leave them alone. Down here, we have the trigger configuration for the alerting policy.
With this dropdown, we can configure which conditions will trigger this alerting policy. The default here is Any time series violates. With this trigger, any time a single time series meets the condition as it's configured, say CPU above 1% for more than one minute, it will generate an alert.
If we open the dropdown, we can also see options for a percent of time series violating, a specific number of time series violating, and, of course, all time series violating. We'll leave this at Any time series violates, so any violation will throw the alert. At this point, we can add our condition. Now, any time CPU utilization is above 1% for more than one minute, we're going to get an alert.
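The trigger options can be modelled as a count over the time series that individually breach the condition. This sketch is a simplification for illustration, not how Cloud Monitoring actually evaluates policies. It treats a series as violating when its last `points_required` aligned values are all above the threshold, which is a hypothetical stand-in for the condition's duration. The function names and trigger labels are made up for this example. The 0.01 threshold assumes the fractional CPU utilization metric, where 1% is 0.01.

```python
from typing import Dict, List, Optional


def series_violates(values: List[float], threshold: float, points_required: int) -> bool:
    """True if the most recent points_required values are all above threshold."""
    recent = values[-points_required:]
    return len(recent) == points_required and all(v > threshold for v in recent)


def policy_fires(
    series: Dict[str, List[float]],
    threshold: float,
    points_required: int,
    trigger: str = "any",
    trigger_value: Optional[float] = None,
) -> bool:
    """Decide whether the policy fires under a simplified trigger model.

    trigger: "any", "all", "count" (trigger_value = number of series),
    or "percent" (trigger_value = percent of series).
    """
    total = len(series)
    violating = sum(
        series_violates(values, threshold, points_required) for values in series.values()
    )
    if trigger == "any":
        return violating >= 1
    if trigger == "all":
        return total > 0 and violating == total
    if trigger == "count" and trigger_value is not None:
        return violating >= trigger_value
    if trigger == "percent" and trigger_value is not None:
        return total > 0 and 100 * violating / total >= trigger_value
    raise ValueError(f"unsupported trigger: {trigger!r}")


cpu = {
    "vm-a": [0.005, 0.02, 0.03],  # last two points above 1%
    "vm-b": [0.02, 0.005, 0.004],  # dropped back below 1%
}
print(policy_fires(cpu, threshold=0.01, points_required=2))  # True
print(policy_fires(cpu, 0.01, 2, trigger="all"))  # False
```

With one of two VMs violating, the demo's default trigger fires, but an "all time series" trigger would stay quiet.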
Next is the notifications panel, where we can optionally add a notification channel. If we open the dropdown, we can see that we can add a Campfire channel, Google Cloud Console mobile, or email. We can even use third-party integrations such as PagerDuty and Slack, as well as SMS and webhooks. We don't have most of these set up right now because, as I mentioned previously, you need to set up these notification channels before you create your alerting policy.
I don't have most of these set up, but email is, so we'll select email. Then I would add my email address to this notification channel so that I receive the notifications. We'll add it in, and since I'm not worried about documentation here, we'll go ahead and save it. And there you have it. We now have an alerting policy called "my alerting policy" that will look for CPU utilization over 1% for more than one minute.
When it sees that occur, it's going to send a notification through my email channel, since that's the channel type I've configured, and the notification will arrive at my Cloud Academy email address. So that is how you create a basic alerting policy using the Google Cloud Console.
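The same policy can also be expressed as data. Here is a sketch, assuming the JSON shape of the Cloud Monitoring API's `AlertPolicy` and `NotificationChannel` resources. The dictionaries below describe roughly what the demo builds in the console: a threshold condition on CPU utilization, a trigger count of one series (the console's Any time series violates), and an email channel. `PROJECT_ID`, `CHANNEL_ID`, the channel and condition display names, and the email address are placeholders. The 60-second mean alignment is an assumption, since the demo left the advanced options at their defaults. The code only builds and serializes the payloads; it does not call the API.

```python
import json

PROJECT_ID = "my-project"  # placeholder project ID
CHANNEL_ID = "1234567890"  # placeholder; the API assigns the real ID when the channel is created

# Email notification channel; as the demo notes, channels are set up before the policy.
email_channel = {
    "type": "email",
    "displayName": "My email",  # placeholder name
    "labels": {"email_address": "you@example.com"},  # placeholder address
}

alert_policy = {
    "displayName": "my alerting policy",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "VM Instance - CPU utilization",  # example condition name
            "conditionThreshold": {
                "filter": (
                    'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                    'AND resource.type="gce_instance"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.01,  # 1%, because the metric is a fraction
                "duration": "60s",  # the condition must hold for one minute
                "aggregations": [
                    # Assumed defaults; the demo did not change the advanced options.
                    {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                ],
                "trigger": {"count": 1},  # "Any time series violates"
            },
        }
    ],
    "notificationChannels": [
        f"projects/{PROJECT_ID}/notificationChannels/{CHANNEL_ID}"
    ],
}

if __name__ == "__main__":
    print(json.dumps(email_channel, indent=2))
    print(json.dumps(alert_policy, indent=2))
```

A payload shaped like this is what the API's `projects.alertPolicies.create` method takes as the policy body. Keeping it as data makes the console steps above repeatable across projects.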
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.