This course covers the "Design a management, monitoring, and business continuity strategy" part of the 70-534 exam, which is worth 20–25% of the exam. The intent of the course is to help fill in any knowledge gaps that you might have, and to help prepare you for the exam.
Welcome back. In this lesson, we're gonna be talking about monitoring strategies.
This is an important topic. For engineers, monitoring is crucial because it's not enough to create and deploy systems; we need to know how those systems are actually performing. Knowing this will allow us to make sure that our infrastructure and applications are performing correctly.
If we identify that we have a greater than average number of HTTP 500 errors, then we can help the developers to find the problem and get it quickly resolved. Or, if we can identify that the disk space on some of our servers is running low, we'll be able to resolve that before it becomes an issue.
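To make those two examples concrete, here is a minimal Python sketch of the kind of checks a monitoring system runs for us. The function names, the 1.5x tolerance, and the 10% free-space floor are all assumptions chosen for illustration; they don't come from any Azure tool.

```python
from statistics import mean


def error_rate_is_elevated(history, current, tolerance=1.5):
    """Reactive check: is the current HTTP 500 count well above average?

    `history` is a list of past counts per interval (hypothetical data);
    `tolerance` is an assumed multiplier over the historical average.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return current > mean(history) * tolerance


def disk_is_running_low(free_bytes, total_bytes, min_free_ratio=0.10):
    """Proactive check: has free space dropped below an assumed floor?"""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_ratio


if __name__ == "__main__":
    # Hypothetical sample values, not real measurements.
    print(error_rate_is_elevated([10, 12, 8], current=30))
    print(disk_is_running_low(free_bytes=5, total_bytes=100))
```

The first check only tells us something after errors have happened, while the second warns us before the disk actually fills up.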
So monitoring can be both reactive and proactive, and the more in-depth the monitoring, then ideally the more proactive we can be. We've talked about SCOM previously. It's a great way to get all of our monitoring into a single location. If you're already using SCOM for some of your on-prem stuff, then extending it to use with Azure makes a lot of sense.
It'll allow you to have a single source of truth for your monitoring data. However, SCOM isn't the only way to monitor your Azure VMs. Azure also has built-in monitoring that lets us monitor things such as memory usage, processes, and CPU, among others.
Okay, let's start by creating a virtual machine. I'm gonna select the Windows 2012 datacenter version, and you can see that we have several here on this list to select from, and a lot of other options that we could select from back in the marketplace.
We're gonna give ours a name of demo. I know, it's very original. And then we're gonna set up the username and password. And then I'm gonna give it a resource group, unimaginatively named resource group. Okay, and then I'm gonna submit this.
And now we can see that we can select the VM size. So I'll select that, and we're gonna submit it, and on the next blade, we're gonna have some more settings. We're gonna leave these as the default. Notice at the bottom, monitoring is enabled by default. So we'll click OK, and then once more. Great, so now the VM is gonna start up.
Let's check out the monitoring. So we're gonna select monitoring, and we'll pick our VM from the resource list. And now you can see that we have several metrics that are enabled by default. If we select a few, let's say some of the TCP info, then we get this chart here.
Now, if we need to monitor specific things, like ASP.NET apps, or SQL servers, then we can add in these metrics under the diagnostic settings. So I'm just gonna check these boxes. Notice you can also change the logging level for the Windows services, which is very useful. And okay, I'm gonna save this. Now, this is gonna take a few moments to take effect, so I'm gonna jump ahead.
Okay, we're back. I've reloaded the page. This took about five minutes, and now if we filter, we're gonna set the resources to show only our demo VM, and notice that we have a few more options for metrics in the list here. We have some of these ASP.NET metrics, which don't have any data because this is a new VM, and we also have SQL Server metrics.
Again, we don't have any data here, but you can tell that we have quite a few metrics here to choose from. So built directly into Azure is a pretty robust monitoring solution. Alright, the next tool I want to talk about is the Global Service Monitor.
Now, this is a cool product conceptually, though I'm not really sure anyone uses it. There are much easier-to-use uptime monitoring solutions out there, but it's part of the objectives for the exam, so we're gonna talk about it just a bit. Global Service Monitor requires SCOM 2012 or greater.
So that may be a blocker if you're not already using SCOM. Global Service Monitor, often abbreviated GSM, is a tool that will allow you to configure web requests to be run from locations around the world against your web apps. This is going to enable testing availability and latency for requests. Now, there are some limitations to GSM. Namely, the total number of requests is equal to the number of tests multiplied by the number of locations.
You can't exceed 25 tests per subscription. The minimum interval between tests needs to be greater than or equal to five minutes, and the global test timeout is 30 seconds. So if you can work within these constraints, you'll have access to a tool that will help you to monitor your web apps and help you to gain insights into the latency of the requests.
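Those limits are easy to sanity-check before you plan out your tests. Here is a small Python sketch that encodes only the numbers mentioned above; the function and constant names are hypothetical, and nothing here talks to GSM or SCOM.

```python
# Limits as stated in this lesson; the names are illustrative assumptions.
MAX_TESTS_PER_SUBSCRIPTION = 25
MIN_INTERVAL_MINUTES = 5
GLOBAL_TIMEOUT_SECONDS = 30


def total_requests(num_tests, num_locations):
    """Total requests = number of tests multiplied by number of locations."""
    return num_tests * num_locations


def validate_plan(num_tests, interval_minutes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if num_tests > MAX_TESTS_PER_SUBSCRIPTION:
        problems.append(
            f"{num_tests} tests exceeds the limit of {MAX_TESTS_PER_SUBSCRIPTION}"
        )
    if interval_minutes < MIN_INTERVAL_MINUTES:
        problems.append(
            f"interval of {interval_minutes} min is below {MIN_INTERVAL_MINUTES} min"
        )
    return problems


def timed_out(elapsed_seconds):
    """A test counts as timed out once it passes the global timeout."""
    return elapsed_seconds > GLOBAL_TIMEOUT_SECONDS


if __name__ == "__main__":
    print(total_requests(num_tests=10, num_locations=3))
    print(validate_plan(num_tests=26, interval_minutes=4))
```

So ten tests run from three locations works out to thirty requests, which is worth keeping in mind as you add locations.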
The next monitoring option that we're gonna talk about is OMS, which stands for Operations Management Suite. This is a product that offers things like log analytics for both on-prem and cloud-based VMs. It allows IT automation, backup and recovery tools, and security and compliance tools. In a lot of ways, this is SCOM online.
Let's check out how to use the log analytics. Remember that demo VM that we set up earlier? Let's start looking into the logs.
Let's start by searching for log analytics, and we're gonna select it from the list. Now we have a few different options to choose from; however, the one at the top is the one we're looking for. Once we select it, we can see that we get a new blade with some info about log analytics.
It even offers some additional information to help learn more about it. We're gonna select Create at the bottom, and it's gonna kick off the setup form. We need to set up an OMS workspace, or optionally we can select an existing one.
I'm gonna give it a name. I'm gonna pick one of my existing resource groups here. And once we're happy, we're gonna click OK. So it's gonna start the deployment. Now this can take a little while, so again, I'm gonna skip ahead and we'll come back once it's complete.
Okay, welcome back. Now, we need to link a VM to OMS so it can start consuming the logs. So we're gonna scroll down to the data sources, and click the VMs, and we have just the one VM. We're gonna select it. And then in the next blade we can select the Connect option.
This is gonna take a little while, too. Now, while it's doing that, let's look at some of the other options that we have for data sources. Notice we have the option to pull data from Azure storage.
So we first need to add a storage option. We'll select one of the options here that exists already. And then it wants to know what type of data this is. It's either IIS logs, events, syslogs, et cetera.
Now, I'm not gonna complete the process; however, it's worth knowing that this is an option. Okay, let's check back in on our VM and see if it's completed the connection process. Okay, perfect, what timing. So now if we open up the OMS portal, we can see that we have some data.
We'll start by selecting Log Search. And once that loads, we'll select all collected data. We can see that we have some events here. Most of them are heartbeats, based on the info on the left hand side of the screen where it shows the different types.
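The breakdown by type on the left is really just a count of records grouped by their type. Here is a rough Python sketch of that idea over a handful of made-up records. The record shape, with a "Type" key, is an assumption for illustration, and the sample data is hypothetical rather than real OMS output.

```python
from collections import Counter


def count_by_type(records):
    """Group records by their type and count each group.

    Records without a "Type" key are counted under "Unknown".
    """
    return Counter(record.get("Type", "Unknown") for record in records)


if __name__ == "__main__":
    # Hypothetical records standing in for collected log data.
    sample = [
        {"Type": "Heartbeat", "Computer": "demo"},
        {"Type": "Heartbeat", "Computer": "demo"},
        {"Type": "Event", "Computer": "demo"},
    ]
    for record_type, count in count_by_type(sample).most_common():
        print(record_type, count)
```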
We can change the view to a table or list. We can filter on type or time frame, in addition to the search tools, and we can also create our own dashboards if we want to see certain metrics at a glance. And if we want to extend this even more, we can add solutions from the solutions gallery. Let's check out a security and audit option.
The first time we launch this solution, it'll ask us to enable some alerts. I'm gonna say yes to these. And while I don't have any useful data at the moment, the takeaway should be that we can add new solutions very easily to help us monitor and report in very useful ways.
Okay, the last monitoring tool I want to talk about is Application Insights. If you're running a web app or service, I'm not sure that you can ignore this product.
Application Insights gives us a very easy way to understand how our apps are performing. It gives us a query syntax, so we can analyze the data and find problems in real time. It also uses machine learning to proactively identify potential issues, which is fantastic.
Because any time we can find issues before users do, we're doing our jobs well. Let's take a look at how we can query data. I'm gonna be using the demo data so that we have a good data set to look through.
Let's start by looking at the requests data set, and we'll take just 10 results. Alright, so you can see it's already easy to get this started. Now, let's take the results for the last day, and to do that we're gonna use the built-in ago method, and when we run this, we can see that we get data back very quickly.
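If you're following along without the demo data, this Python sketch mimics what those two steps do, using an in-memory list instead of the real service. The queries in the comments are my best guess at what was typed on screen, and the helper names and the "timestamp" key are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from itertools import islice


def take(rows, n):
    """Roughly what `requests | take 10` does: return up to n rows."""
    return list(islice(rows, n))


def ago(delta, now=None):
    """Roughly what `ago(1d)` does: a point in time `delta` before now."""
    now = now or datetime.now(timezone.utc)
    return now - delta


def since(rows, cutoff):
    """Roughly `requests | where timestamp > ago(1d)`: keep recent rows."""
    return [row for row in rows if row["timestamp"] > cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical request rows: one recent, one older than a day.
    requests = [
        {"name": "GET /", "timestamp": now - timedelta(hours=2)},
        {"name": "GET /old", "timestamp": now - timedelta(hours=30)},
    ]
    print(take(requests, 10))
    print(since(requests, ago(timedelta(days=1), now=now)))
```

The real query language runs this filtering on the service side, which is a big part of why the results come back so quickly.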
The query syntax is pretty easy to learn, and it also gives us the ability to generate charts very easily. I'm gonna paste in this query that I already have copied, and let's check this out. I'm not gonna dive into the query syntax, but check out the last part where it says render timechart.
So these two words are all that's needed to turn our data into a chart. With almost no code, we were able to fetch useful data and even visualize it. This is a very useful tool and one that will make your job much easier. So this is a small portion of what Application Insights offers. I recommend that you check it out further.
Alright, let's wrap up. We all know monitoring is important, and Microsoft provides us with quite a few tools to use. In our next lesson, we're gonna take a look at patching strategies. So if you're ready, let's get started.
About the Author
Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.
When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.