In this lesson, we will discuss how to set up actionable and relevant alerts. You will learn how setting a large number of irrelevant alerts will train your team to ignore all alerts. We will teach you about Azure Monitor, which handles system-level alerts such as low disk space or high CPU load.
We will configure notifications to fire when specific metric thresholds are met from the monitoring dashboard. You will see the importance of creating truly useful and meaningful alerts. You will need to think about how your system behaves and whether the alert settings suit it. We will set up Azure Monitor to be sure the alerts are being received by the right people. Finally, we will also discuss how to establish alerts based on queries run by Azure Log Analytics.
A key principle with alerting is ensuring that alerts are actionable and relevant. If you set up a large number of irrelevant alerts that fire constantly, you will train your team to ignore all alerts. Maintaining a good signal-to-noise ratio is critical with alerts that may require human intervention. It is similarly important to make sure that as many alerts as possible are actionable. This is not always possible; sometimes you may have to configure an alert for a situation that is not easily resolved. To the extent possible, you want documented responses to alerts so that the majority of your alerts are actionable.
For system-level alerts (things like low disk space or high CPU load) we can use Azure Monitor. In the Azure Monitor portal we can configure notifications to fire when specific metric thresholds are met. From the monitoring dashboard, click on the ‘Add metric alert’ button and create a name for your alert. You will then pick a relevant metric and set a condition such as ‘greater than’ or ‘less than’ some specific value.
You will also have to set a time period for the condition, such as ‘5 minutes’. This would mean that the alert condition must be sustained for five minutes in order for the alert to trigger. It is crucial to think very carefully about the time period parameter. Configuring it incorrectly can lead to false positives or false negatives. For example, let’s say we set an alert for CPU load. We set it to alert whenever load is above 2.0 for more than 3 minutes. This may be too sensitive, as perhaps your application regularly has short periods of high load that are expected. In such a case the alert would just end up being noise, training your response team not to take alerts seriously.
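To make the effect of the time period concrete, here is a minimal Python sketch of a "sustained threshold" check. It is not how Azure Monitor evaluates alerts internally; the function names, the one-sample-per-minute assumption, and the load figures are all made up for illustration.

```python
from typing import Sequence


def should_fire(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True if the most recent `window` samples all exceed `threshold`.

    Assumes one sample per minute, so `window` is effectively in minutes.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False  # not enough history yet to call the condition sustained
    return all(value > threshold for value in samples[-window:])


def fires_at_any_point(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Replay the samples minute by minute and report whether the alert ever fires."""
    return any(
        should_fire(samples[:end], threshold, window)
        for end in range(1, len(samples) + 1)
    )


# Hypothetical CPU load readings, one per minute.
short_spike = [0.8, 2.4, 2.6, 2.5, 0.9, 0.7]  # three minutes of expected high load
sustained = [0.8, 2.4, 2.6, 2.5, 2.7, 2.9]    # load stays high

spike_with_3_min = fires_at_any_point(short_spike, threshold=2.0, window=3)
spike_with_5_min = fires_at_any_point(short_spike, threshold=2.0, window=5)
sustained_with_5_min = fires_at_any_point(sustained, threshold=2.0, window=5)
```

With these sample readings, the 3-minute window fires on the short spike, while the 5-minute window ignores it and still catches the sustained load. Which window is right depends on how your own application normally behaves.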
For Azure metric alerts, be sure to add the right email address under the ‘Additional administrator email(s)’ section. This way you can ensure that the right people are notified when an alert is triggered.
Application Insights alerts are similarly easy to set up for your application performance metrics. From the Insights dashboard just click on ‘Alerts’ and then ‘Add alert’ to get the setup menu. From there you will define the alert rules in much the same way as we did with system metrics. You will pick a metric, a time period, a threshold, and a condition. You can set contact emails or a webhook address if you wish to integrate your Insights alerts with another system.
Finally, it is possible to set up alerts based on queries run by Azure Log Analytics. From the Analytics dashboard you can create the alert rules by defining a time window and a query. If the query returns the expected result within the time window, an alert can be generated. This can be used, for example, to catch serious ERROR messages that were output to an event log within some time frame.
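The logic of a query-based alert can be sketched in plain Python. This is only an illustration of the "query plus time window" idea, not the Log Analytics query language or API; the `LogEntry` type, the `errors_in_window` function, and the sample log lines are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class LogEntry:
    timestamp: datetime
    level: str
    message: str


def errors_in_window(
    entries: List[LogEntry],
    now: datetime,
    window: timedelta,
    min_count: int = 1,
) -> Tuple[bool, List[LogEntry]]:
    """Return (fire, matches): fire is True when at least `min_count`
    ERROR entries fall inside the window ending at `now`."""
    cutoff = now - window
    matches = [
        entry
        for entry in entries
        if entry.level == "ERROR" and cutoff <= entry.timestamp <= now
    ]
    return len(matches) >= min_count, matches


# Made-up event log for illustration.
now = datetime(2024, 1, 1, 12, 0)
log = [
    LogEntry(datetime(2024, 1, 1, 11, 30), "ERROR", "disk write failed"),
    LogEntry(datetime(2024, 1, 1, 11, 58), "ERROR", "database connection lost"),
    LogEntry(datetime(2024, 1, 1, 11, 59), "INFO", "retrying connection"),
]

fire_5_min, recent_errors = errors_in_window(log, now, timedelta(minutes=5))
fire_strict, _ = errors_in_window(log, now, timedelta(minutes=5), min_count=2)
fire_1_hour, hour_errors = errors_in_window(log, now, timedelta(hours=1), min_count=2)
```

As with metric alerts, the window and the minimum count decide how noisy the alert is: a single ERROR in five minutes fires the first rule here, while requiring two errors only fires once the window is widened to an hour.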
And that about wraps it up for alerting using Azure Monitor and Application Insights. As we have seen, we can create a variety of alerts based on system metrics and application performance stats. Keep in mind the importance of minimizing noise with your alerts: make sure that when an alert is triggered, it really means something. With this in mind, let’s move on to the next major section of the course.
About the Author
Jonathan Bethune is a senior technical consultant working with several companies including TopTal, BCG, and Instaclustr. He is an experienced devops specialist, data engineer, and software developer. Jonathan has spent years mastering the art of system automation with a variety of different cloud providers and tools. Before he became an engineer, Jonathan was a musician and teacher in New York City. Jonathan is based in Tokyo where he continues to work in technology and write for various publications in his free time.