CloudWatch is a monitoring service for cloud resources and the applications you run on Amazon Web Services. CloudWatch can collect metrics, set and manage alarms, and automatically react to changes in your AWS resources. Amazon CloudWatch can monitor AWS resources such as Amazon EC2 instances, DynamoDB tables, and Amazon RDS DB instances. You can also monitor custom metrics generated by your applications and services, as well as any log files your applications generate. You'll see how we can use Amazon CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health, and you'll use these insights to keep applications running smoothly. This course includes a high-level overview of how to monitor EC2, monitor other Amazon resources, monitor custom metrics, monitor and store logs, set alarms, graph and view statistics, and monitor and react to resource changes.
Intended Audience:
- Systems Admins
- Operational Support
- Solution Architects working on AWS Certification
- Anyone concerned about monitoring data or AWS recurring billing
Prerequisites:
- AWS Console Login
- General knowledge of how to launch an Elastic Compute Cloud (EC2) instance on either Linux or Windows
- View CloudWatch Documentation at https://aws.amazon.com/cloudwatch/
- An operational EC2 (Windows/Linux)
Learning Objectives:
- Monitor EC2 and other AWS resources
- Build custom metrics
- Monitor and store log information from Linux instances
- Set alarms for metrics to take action on an instance or auto-scaling group
- Create a dashboard to monitor EC2 instances
- React to load by triggering horizontal auto scaling within AWS
This Course Includes:
- Over 90 minutes of high-definition video
- Console demos
What You'll Learn:
- Course Intro: What to expect from this course
- Getting Started: How to launch an EC2 instance
- Building a Dashboard: How to take the metrics from the instance and create a dashboard
- Monitoring EC2 Instances: How and why you should be monitoring the environment in Amazon Web Services
- Sending Log Files to CloudWatch: A lesson on the importance of sending log files to CloudWatch
- Alarms: How to specify alarms
- Course Conclusion: Course summary
Before we get started on our next lesson, I thought it would be a good idea to review our progress and summarize what we've done in the prior lessons. We launched an EC2 instance and we enabled enhanced monitoring. We then discovered that we couldn't monitor our disk or memory information about this instance and so we added the capabilities to do that. We also configured credentials in an IAM policy and discussed why that might be more beneficial than creating a user and trying to administer secret access keys. After setting that up, we configured CloudWatch to push metrics from an EC2 instance.
We built a dashboard and we monitored our new server. In addition, we looked at ways to make our life more convenient including pushing the log files to CloudWatch so that we eliminated the need to log into the server locally to conduct those activities. So now, we have our EC2 instance set up. We're pretty comfortable with the metrics that we're collecting and the log files that we're collecting, and we're enjoying the convenience of being able to go into CloudWatch and fully monitor that server. Perhaps in your own time, you took a minute to create some automation and even created a launch configuration that will enable CloudWatch and CloudWatch log monitoring.
All those are good things but the challenge that we've created for ourselves is that, while that information is available in the CloudWatch console, we have to go look at it. All of our metrics have to be watched, and there are no actions occurring as a result of the data collected. Let's talk about alarms in lesson six.
My name is Michael Bryant and I'll be your instructor. Alarms are essentially thresholds that we specify. They set the upper or lower limit. I like to call this the tolerance. Alarms can be associated with a simple action or a complex series of actions. For example, if your CPU utilization were to exceed a threshold or an upper limit of 85% it would be convenient if an auto-scaling group took that alarm and launched an additional server, perhaps into your web tier to cover any additional load at this time. Or in a less complex case, perhaps you wanna set an alarm so that if your web server becomes unavailable for any reason you get a notification. One of the things that we can do is set up a simple health check on our web server that we've created that's now running Apache.
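The idea of an alarm as a threshold, or tolerance, can be sketched in a few lines of Python. This is only an illustration, not how CloudWatch is implemented: the function name `alarm_state` and its parameters are hypothetical, and it assumes the average of the data points in a single period is compared against the limit. The state names match the ones the CloudWatch console shows.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, upper: bool = True) -> str:
    """Return a CloudWatch-style state name for one evaluation period.

    Hypothetical helper for illustration only. `upper=True` treats the
    threshold as an upper limit, `upper=False` as a lower limit.
    """
    if not datapoints:
        # Nothing was reported in this period, so no decision can be made.
        return "INSUFFICIENT_DATA"
    average = sum(datapoints) / len(datapoints)
    breached = average >= threshold if upper else average <= threshold
    return "ALARM" if breached else "OK"
```

With the 85% CPU example from the lesson, `alarm_state([90.0, 88.0], 85.0)` reports `ALARM`, while a quiet server reporting `[10.0]` stays in `OK`.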
To establish that health check, we'll create a small file on the web server that we have running and we'll check that file at a regular interval to make sure that it's accessible via port 80, or HTTP. I opened up the Amazon console and navigated to the EC2 instances console screen. Before I get started with establishing a health check, I need to ensure that the security group will allow an inbound connection on port 80, otherwise our health check will fail. When I click on this, I'll find that I only have port 22 open. In my case, I'll need to add port 80 for this health check to work.
To do that you can click on the security group and then, in the tabs below, click edit. We'll add a rule and we'll just add HTTP. For this example, you can leave the source set to anywhere. And click save. This will automatically apply and you do not need to restart the server or perform any other action for this to work. You can click instances now. I'm gonna open up a terminal window where I've already connected to the server over SSH. In this case, I'm gonna switch directories and create an HTML file at the web root. I'm going to call it healthcheck.html.
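The security group change made in the console above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and credentials are configured; the helper names `http_ingress_rule` and `open_port_80` are made up for this example, and the group ID you pass in is your own.

```python
def http_ingress_rule(cidr: str = "0.0.0.0/0") -> dict:
    """Build one inbound rule allowing HTTP (TCP port 80).

    The default CIDR corresponds to the console's "anywhere" source.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": cidr, "Description": "HTTP health check"}],
    }


def open_port_80(group_id: str) -> None:
    """Add the HTTP rule to an existing security group (needs AWS access)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[http_ingress_rule()],
    )
```

As in the console, the rule takes effect immediately. For anything beyond a demo you would probably narrow the CIDR rather than leave it open to anywhere.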
In this file I'm gonna write some simple HTML. All right. Now we have probably the simplest health check one could perform. What you'll want to do now is copy the public DNS information and then append the name of the file we just created. You should see that we have health check success. This healthcheck.html file that we've created is going to be the basis of how our alarm is going to check to make sure that our Apache web server is serving traffic. What's going to happen is, as often as we dictate, CloudWatch is going to check this file and ensure that it is being served to the public over port 80. If at any point it's not, the alarm will enter an alarm state.
This health check alarm is actually set up via Route 53 and not via CloudWatch. I'm now back at the EC2 console and I'll click on the three-sided box. Select Route 53. Select health checks. Select create health check. And we'll title this CloudWatch. We'll need the IP address of the server, which in our case is 220.127.116.11. For our host name, we'll use the DNS entry. We do want to be on port 80, and the path will be healthcheck.html. So what we've done is we've configured Route 53 to have a health check and it's monitoring an endpoint. It'll be monitoring this endpoint by IP address, over HTTP, on our IP address of 18.104.22.168.
The host name I'm using is provided by Amazon by default. The port will be 80 and the path is, as you can see, /healthcheck.html at the root. We previously verified that this is the correct location of the file that we created in the file system in the terminal session. We'll go back to Route 53 now and we'll select next. At this point, we'll just create the health check and take a look at it. You'll see that our CloudWatch status says unknown, but this actually represents a history.
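For reference, the same health check settings (IP address, host name, port 80, path) can be expressed as a Route 53 API request. This is a sketch under assumptions: the function names are invented for this example, boto3 is assumed to be installed, and the 30-second interval and failure threshold of 3 are chosen here for illustration rather than taken from the lesson.

```python
import uuid


def health_check_request(ip_address: str, host_name: str) -> dict:
    """Build the arguments for Route 53's create_health_check call."""
    return {
        # Any unique string; lets Route 53 recognise a retried request.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "IPAddress": ip_address,
            "Port": 80,
            "Type": "HTTP",
            "ResourcePath": "/healthcheck.html",
            "FullyQualifiedDomainName": host_name,
            "RequestInterval": 30,   # seconds between checks (assumed)
            "FailureThreshold": 3,   # failures before unhealthy (assumed)
        },
    }


def create_health_check(ip_address: str, host_name: str) -> str:
    """Create the health check and return its ID (needs AWS access)."""
    import boto3  # imported here so the request builder works without boto3

    route53 = boto3.client("route53")
    response = route53.create_health_check(**health_check_request(ip_address, host_name))
    return response["HealthCheck"]["Id"]
```

The returned ID is what identifies the health check when an alarm is attached to it later. The display name entered in the console is stored separately as a tag, which this sketch does not set.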
We'll take a look at our health checkers, and we'll see that in each of these regions it's sending traffic to our healthcheck.html file and we're receiving a status code of 200 which shows that the file is okay and our health check is proceeding normally. We have now established that this server is definitely serving traffic including the healthcheck.html file. It turns out, in this use case, this server is very important to our production needs. I need to know anytime this server isn't sending traffic.
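What each health checker does can be approximated from your own machine: request the file and see whether the server answers with status code 200. The sketch below uses only the Python standard library; `is_healthy` is a hypothetical name, and the URL you pass would be your instance's public DNS name followed by /healthcheck.html.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 4.0) -> bool:
    """Return True only if the URL answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures and
        # HTTP error responses such as 404 or 500.
        return False
```

If port 80 is blocked, the file is deleted, or the server is stopped, this returns `False`, which is the same set of conditions the lesson uses to trigger the alarm.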
Let's create an alarm to notify us, 24 hours a day, when that event occurs. You click on the alarms tab and then create an alarm. Let's call this alarm: server down. The alarm description is not necessary. We definitely wanna send a notification and we'll create a new SNS topic. Simple Notification Service is already integrated into CloudWatch and it provides you the ability to send emails or other notifications to a select group of people. We'll call the topic: server down, and we'll send it to an email address. Okay. The alarm target is health check status and the fulfilled condition is set to minimum less than one. This means that if the health check status drops below 1.0 for at least one minute, then an alarm will be sent. It's important to know that when you have a new SNS topic created, Amazon is going to send you an email and ask you to subscribe to these notifications. If you do not accept the subscription you will not receive future notices. We'll click confirm. You'll now notice that our CloudWatch alarm says: insufficient. This is an expected result, and that's because in our CloudWatch alarm we specified that for a period of one minute it must have a perfect health check or it's going to send an alarm.
In this case, perfect is 1.0 out of 1.0. It's just the scale that's being used. After one minute, this state will change to healthy, and then in the future if we interrupt the healthcheck.html file, for example if we changed the security group and blocked port 80, or we deleted the health check file from the server, or even if we shut the server down, it will go into an alarm state and send us an alert that lets us know that this server is not serving traffic in our production environment. What's interesting about this is that Route 53 health check alarms are not configured via the CloudWatch console.
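Although this alarm was created from the Route 53 console, it ends up as an ordinary CloudWatch alarm on the health check status metric. A rough equivalent through the API might look like the sketch below. The function names are hypothetical, boto3 is assumed to be installed, and the health check ID and topic ARN are values you would supply yourself.

```python
def server_down_alarm(health_check_id: str, topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for 'minimum < 1 for one minute'."""
    return {
        "AlarmName": "server down",
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,               # one minute
        "EvaluationPeriods": 1,
        "Threshold": 1.0,           # 1.0 means every check passed
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],
    }


def create_server_down_alarm(health_check_id: str, topic_arn: str) -> None:
    """Create the alarm (needs AWS access)."""
    import boto3  # imported here so the builder works without boto3

    # Route 53 health check metrics are published in US East (N. Virginia),
    # so the alarm and its SNS topic are assumed to live in that region.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**server_down_alarm(health_check_id, topic_arn))
```

Using the minimum statistic means a single failed check within the minute is enough to pull the value below 1.0 and raise the alarm.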
But let's go back over there now that we've created an alarm. We'll click on the three-sided box and then we'll click on CloudWatch. Select create alarm. I want to know anytime this server is at risk of running out of space. As you can see, we have Linux system metrics, which are some of the custom metrics we created in an earlier lesson. Select file system, instance ID, path. As stated, I wanna know anytime the root volume is about to run out of space. So we'll select disk space used. As you can see, we're using a very small amount of the actual disk. Let's go ahead and click next. I'll title this alarm: disk space running out. And I want to know whenever the disk space used is greater than or equal to 80% of the disk for one consecutive period. You'll need to look to the right here to find what the period is. It could be that you want to change this.
The available options include: one minute, five minutes, 15 minutes, one hour, six hours or one day. You can also select whether you want an average, a minimum, a maximum, a sum, or a count of data samples. Be careful with sum because it adds up the data points within the period and may actually throw unnecessary alarms. If, for example, your disk space was at 25% and it's sampled several times in a period, sum adds those samples together. So with this type of alarm, you can reach the threshold even though the disk isn't out of space. I would recommend using average for this type of alarm.
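To see why sum can mislead for a percentage metric, it helps to compute the statistics by hand for one period. The sketch below is plain Python with a hypothetical function name, assuming five one-minute samples fall inside a five-minute period.

```python
from typing import Dict, Sequence


def period_statistics(samples: Sequence[float]) -> Dict[str, float]:
    """Compute the five basic CloudWatch statistics for one period."""
    if not samples:
        raise ValueError("a period with no samples has no statistics")
    return {
        "SampleCount": float(len(samples)),
        "Sum": float(sum(samples)),
        "Average": sum(samples) / len(samples),
        "Minimum": float(min(samples)),
        "Maximum": float(max(samples)),
    }


# Disk usage steady at 25%, reported once a minute for five minutes.
stats = period_statistics([25.0, 25.0, 25.0, 25.0, 25.0])
print(stats["Average"])  # 25.0, well under an 80% threshold
print(stats["Sum"])      # 125.0, which would breach an 80% threshold
```

The disk never changed, yet the sum crosses the threshold simply because more than three samples landed in the period, which is why average is the safer choice here.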
We now want to decide what happens with missing data. In this case, we'll just treat missing data as missing. You may have other use cases where you would want missing data to count as breaching the threshold, so that it goes ahead and sends you an alarm. In that case, you would select bad. There's more information about this in the info box, and you can click learn more to find out how these different missing data conditions would be treated.
The next thing we want to do is, whenever this blue line goes over this red line, we want to know about it. And that is known as a state of alarm. So what we're going to do is we're going to send a notification to a list. We'll create a new list. We're gonna start by giving it a name: server managers. And we'll add an email address to that. And with that established, we'll create the alarm. Remember that you're going to get a confirmation link sent to the address that you specify. You'll have 72 hours to confirm it, and if you don't, you will receive neither the updates nor the alarms. In this case, I'll click "I'll do it later." As you can see, the alarm state now is showing that it's okay. You can dismiss the message that your alarm, disk space running out, has been saved.
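The disk space alarm, including the missing data choice, could be described for the API roughly as follows. Several things here are assumptions rather than facts from the lesson: the namespace `System/Linux`, the metric name `DiskSpaceUtilization` and the three dimensions are what the CloudWatch monitoring scripts for Linux typically publish, the five-minute period and the default device name are placeholders, and the console labels other than "missing" and "bad" are from memory of the console of that time.

```python
# Console wording for missing data mapped to the API's TreatMissingData values.
CONSOLE_TO_API = {
    "missing": "missing",
    "good": "notBreaching",
    "bad": "breaching",
    "ignore": "ignore",
}


def disk_space_alarm(
    instance_id: str,
    topic_arn: str,
    missing_data: str = "missing",
    filesystem: str = "/dev/xvda1",  # placeholder device name
    mount_path: str = "/",
) -> dict:
    """Build put_metric_alarm arguments for 'disk >= 80% for one period'."""
    return {
        "AlarmName": "disk space running out",
        "Namespace": "System/Linux",            # assumed custom namespace
        "MetricName": "DiskSpaceUtilization",   # assumed metric name
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MountPath", "Value": mount_path},
            {"Name": "Filesystem", "Value": filesystem},
        ],
        "Statistic": "Average",
        "Period": 300,                          # assumed five minutes
        "EvaluationPeriods": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": CONSOLE_TO_API[missing_data],
        "AlarmActions": [topic_arn],
    }
```

The resulting dictionary can be passed to a boto3 CloudWatch client as `put_metric_alarm(**params)`. Passing `missing_data="bad"` gives the stricter behaviour described above, where silence from the instance is itself treated as a breach.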
While that's collecting data, let's create another alarm. I want to know anytime this instance has a CPU utilization greater than 80%. We'll follow the exact same process. We'll name our alarm: CPU utilization too high. I wanna know anytime it's greater than or equal to 75% this time. And you can see on the graph at the right that our current CPU utilization is very low. In fact, it's about zero. Okay? So we'll now set this alarm for one consecutive period of five minutes and then I want to send that notification to server managers. This is the same list we just created but now it's available in a drop-down. Let's click create alarm. As you can see, the alarm state is insufficient data. It's gonna check that for five minutes, and then it'll report in as okay.
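The CPU alarm differs from the disk alarm mainly in that it uses a metric EC2 publishes on its own, so only the instance ID is needed as a dimension. A minimal sketch, with a hypothetical function name and the 75% threshold and five-minute period from the walkthrough:

```python
def cpu_alarm(instance_id: str, topic_arn: str, threshold: float = 75.0) -> dict:
    """Build put_metric_alarm arguments for 'CPU >= threshold for 5 minutes'."""
    return {
        "AlarmName": "CPU utilization too high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # five minutes
        "EvaluationPeriods": 1,     # one consecutive period
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }
```

Reusing the same topic ARN for `AlarmActions` is what makes the existing server managers list appear as the notification target for both alarms.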
One thing we just did is we created an SNS topic called server managers. If you wanted to add additional people to be notified at three in the morning that CPU utilization is too high or that disk space is running out, the way to do that is to click on the three-sided box and then you'll find SNS. If you're having difficulty finding SNS in the menu structure, you can always type SNS in the search box and click on Simple Notification Service. Let's look at the topics that are available. Currently, in our Northern California region, I have the topic we created called server managers. We can click on the server managers ARN and then create a subscription. If there are additional people you wanna wake up in the middle of the night, you can simply add their email, and we can use the example here.
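Adding several people to the topic is a natural candidate for a small script. The sketch below assumes boto3 is installed; the function names, the topic ARN and the email addresses are all placeholders for illustration. Each new address still receives a confirmation email and gets nothing further until it is confirmed.

```python
from typing import Dict, List, Sequence


def email_subscriptions(topic_arn: str, emails: Sequence[str]) -> List[Dict[str, str]]:
    """Build one SNS subscribe request per email address."""
    return [
        {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}
        for email in emails
    ]


def subscribe_all(topic_arn: str, emails: Sequence[str]) -> None:
    """Send the subscribe requests (needs AWS access)."""
    import boto3  # imported here so the builder works without boto3

    # us-west-1 is the Northern California region used in the lesson.
    sns = boto3.client("sns", region_name="us-west-1")
    for request in email_subscriptions(topic_arn, emails):
        sns.subscribe(**request)
```

For example, `email_subscriptions("arn:aws:sns:us-west-1:123456789012:server-managers", ["oncall@example.com"])` yields a single request with the `email` protocol.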
Now, anytime a CloudWatch alarm triggers with a notification to server managers, all of the subscribed email addresses in this list will be notified simultaneously. We'll now head back over to CloudWatch. And we'll look at our alarms. If you need more alarms, you'll simply create an alarm for the metrics that you require and you'll repeat this process until you have all the metrics and all the alarms covered that you need for your production environment. At this point, I'm pretty happy with how we have our server set up.
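Looking over the alarms can also be done from a script instead of the console. This sketch assumes boto3 and reads the `MetricAlarms` list that CloudWatch's `describe_alarms` call returns; the helper names are hypothetical, and it only reads the first page of results, which is enough for a handful of alarms.

```python
from typing import Dict


def alarm_states(response: dict) -> Dict[str, str]:
    """Map each alarm name to its current state from a describe_alarms response."""
    return {
        alarm["AlarmName"]: alarm["StateValue"]
        for alarm in response.get("MetricAlarms", [])
    }


def fetch_alarm_states() -> Dict[str, str]:
    """Ask CloudWatch for the current alarms (needs AWS access)."""
    import boto3  # imported here so the parser works without boto3

    cloudwatch = boto3.client("cloudwatch")
    return alarm_states(cloudwatch.describe_alarms())
```

Each state is one of `OK`, `ALARM` or `INSUFFICIENT_DATA`, the same values shown in the console's alarm list.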
We launched an EC2 instance, a t2.micro, way back in lesson 1 and then we enabled enhanced monitoring. In a later lesson we then added disk and memory information by establishing an IAM role, configuring credentials for it, and then installing the helper scripts necessary for our instance to push information through the API to CloudWatch. Along the way, we built a dashboard, we monitored our new server, and we created alarms for DNS availability in Route 53 and also for some of the metrics we wanted to monitor in CloudWatch. In addition, we used Simple Notification Service, or SNS, to push messages to wake people up or bring to their attention problems in our environment.
About the Author
Network engineer and program analyst.