Monitoring and compliance
The AWS Certified SysOps Administrator (associate) certification requires its candidates to be comfortable deploying and managing full production operations on AWS. The certification demands familiarity with the whole range of Amazon cloud services, and the ability to choose from among them the most appropriate and cost-effective combination that best fits a given project.
In this exclusive Cloud Academy course, IT Solutions Specialist Eric Magalhães will guide you through an imaginary but realistic scenario that closely reflects many real-world pressures and demands. You'll learn to leverage Amazon's elasticity to effectively and reliably respond to a quickly changing business environment.
The SysOps certification is built on solid competence in pretty much all AWS services. Therefore, before attempting the SysOps exam, you should make sure that, besides this course, you have also worked through the material covered by our three AWS Solutions Architect Associate level courses.
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Hello and welcome to lecture number six. In this lecture, we will continue configuring our elastic infrastructure. But before that, we will troubleshoot our current setup. Then we will configure alarms and notifications. Do you know Werner Vogels? He is Amazon's CTO, and he once said that everything fails all the time. And it's true. In this lecture, I will start with a simple situation, where a problem was not felt by the users but it was there all along, and we noticed it too late.
Imagine you are a Cloud Motors systems engineer and you are here at the portal. Everything looks fine. The orders are fine. No performance issues. Everything seems to be working and you are happy. When you go to the EC2 dashboard and look at the instances that you now have running, you feel great because the ASG is working. But hey, something is very wrong. There are too many terminated instances, and the state transition reason doesn't help you understand why. You go to the ELB and notice that there are only two instances currently in service, the two instances that you created without auto scaling. It says that the others are unhealthy, so you go to auto scaling to try to find more clues about what's going on.
All the numbers seem fine and the instances are described as healthy. So from an EC2 point of view, they are okay.
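The mismatch described here, where auto scaling reports instances as healthy while the load balancer keeps them out of service, can also be checked from a script. What follows is a minimal sketch, assuming a Classic Load Balancer and the boto3 library. The function names and the load balancer and group names you would pass in are hypothetical and not taken from the lecture.

```python
def find_health_mismatches(elb_states, asg_instances):
    """Return the IDs of instances that the auto scaling group reports as
    Healthy while the load balancer reports them as OutOfService."""
    elb_by_id = {s["InstanceId"]: s["State"] for s in elb_states}
    return sorted(
        inst["InstanceId"]
        for inst in asg_instances
        if inst["HealthStatus"] == "Healthy"
        and elb_by_id.get(inst["InstanceId"]) == "OutOfService"
    )


def fetch_and_compare(load_balancer_name, asg_name):
    """Fetch both views from AWS and compare them (needs AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper above works without it

    elb = boto3.client("elb")
    autoscaling = boto3.client("autoscaling")
    elb_states = elb.describe_instance_health(
        LoadBalancerName=load_balancer_name
    )["InstanceStates"]
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    asg_instances = groups[0]["Instances"] if groups else []
    return find_health_mismatches(elb_states, asg_instances)
```

Any ID that this returns is an instance the two services disagree about, which is the situation the console shows at this point in the lecture.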
You could detach them, but that wouldn't solve the problem. So you go to the instances themselves to see if there is a problem with the user data, because we now know that the problem is somehow related to the web servers: the ELB can't access the /welcome/index page. So you check the user data. Maybe there is none or it's wrong, but it, too, looks just fine, identical to that of the two working instances. So you have an idea of what's going on. You don't know for sure how much time the instances are taking to become operational. So you go to CloudFormation and notice that the first Rails server took around four minutes to become operational, and you remember something about the health check grace period that we specified in our last lecture.
It is set to 300 seconds. That should be enough. But since we changed the size of the machine and the size of the disk in our launch configuration, you decide to increase it, because with this setup, maybe the instance is taking more time to become operational. Now that we have changed the grace period, we need to watch the results. It happens that auto scaling is just launching a new instance. Let's name it to keep an eye on it. Since we need time to see the results of the change we've made to the grace period, I will stop the recording and start again when I see the results.
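The same change can be made from a script instead of the console. This is a minimal sketch, assuming boto3. The lecture does not say which new value was chosen, so the 50% safety margin on top of the observed boot time is an assumption, and the function names are hypothetical.

```python
import math


def grace_period_request(asg_name, observed_boot_seconds, margin_percent=50):
    """Build the arguments for update_auto_scaling_group, padding the observed
    boot time with a safety margin (the 50% default is an assumption)."""
    if observed_boot_seconds <= 0:
        raise ValueError("observed_boot_seconds must be positive")
    grace = math.ceil(observed_boot_seconds * (100 + margin_percent) / 100)
    return {"AutoScalingGroupName": asg_name, "HealthCheckGracePeriod": grace}


def apply_grace_period(asg_name, observed_boot_seconds):
    """Send the change to AWS (needs AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper above works without it

    boto3.client("autoscaling").update_auto_scaling_group(
        **grace_period_request(asg_name, observed_boot_seconds)
    )
```

With the four minutes observed for the first server, this would ask for a grace period of 360 seconds. Treat that number as a starting point to test, not a recommendation.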
Right. Let me refresh the page. And everything is fine now. Our instances are now registered on the ELB, and the ASG has stopped terminating instances. Now, to avoid problems like this, where we only notice a problem when it's too late and we have already spent a few bucks for no purpose, I will set up some notifications to keep me updated on what's going on. SNS, the Simple Notification Service, is the notification system that AWS uses.
To get started, we need to create a topic. I will create one with only me as the subscriber, but we can change it later. I will deselect the option to receive notifications when an instance launches, because I don't think it's necessary. Now I can save, and the notifications are set. Let me show you how it looks.
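The console steps above map onto three API calls. This is a minimal sketch, assuming boto3 clients for SNS and auto scaling are passed in. The function names, topic name, email address and group name are hypothetical. The launch notification is left out, as in the lecture.

```python
ALL_NOTIFICATION_TYPES = (
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
)


def notification_types(skip_launch=True):
    """Return the event types to subscribe to, optionally without the
    successful-launch event."""
    return [
        t
        for t in ALL_NOTIFICATION_TYPES
        if not (skip_launch and t == "autoscaling:EC2_INSTANCE_LAUNCH")
    ]


def configure_notifications(sns, autoscaling, topic_name, email, asg_name):
    """Create the topic, subscribe one email address, and attach the topic to
    the auto scaling group. `sns` and `autoscaling` are boto3 clients, for
    example boto3.client("sns") and boto3.client("autoscaling")."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    autoscaling.put_notification_configuration(
        AutoScalingGroupName=asg_name,
        TopicARN=topic_arn,
        NotificationTypes=notification_types(),
    )
    return topic_arn
```

An email subscription created this way stays pending until the recipient confirms it, so no messages arrive before that.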
On SNS, we can see the topic that I created here, and we can manage it. A thing to consider is that subscribers have to confirm their subscription when we set an email on a topic. They will receive an email to confirm the subscription, and only after that will notifications start to come. Besides notifications, we need to define alarms that will grow and shrink our auto scaling group as the traffic grows or shrinks. I will define simple ones, just to show you how it's done. I will first set an alarm for when the average CPU utilization is below 10%.
I will also send notifications to the topic that we created. Notice that we can also see a graph with the average utilization of the instances in the auto scaling group. Now that we have created an alarm, we need to define an action to be performed when the state of the alarm changes to ALARM. We can also define a time to wait before executing another action of the same policy. I will repeat the same process, but change the alarm to define an increase group policy. There is no recipe for defining these alarms. You need to know your application to define them well. The best way to proceed is to set the alarms, set the policies and test them as you go, because you can modify them later. If we go to CloudWatch, we can take a look at how an alarm looks. Alarms are very simple. After you define them, there are three possible states: OK, ALARM and INSUFFICIENT_DATA. You could change the state via the AWS command line or force the instance to reach the state you want.
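The decrease policy and its alarm can be sketched as follows, assuming boto3 clients for auto scaling and CloudWatch are passed in. Only the 10% average CPU threshold comes from the lecture. The names, the 300 second period and cooldown, the two evaluation periods and the one-instance adjustment are assumptions chosen for illustration.

```python
def scale_in_policy(asg_name, cooldown=300):
    """Arguments for put_scaling_policy: remove one instance, then wait
    `cooldown` seconds before this policy can act again."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-in",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": -1,
        "Cooldown": cooldown,
    }


def low_cpu_alarm(asg_name, policy_arn, topic_arn, threshold=10.0):
    """Arguments for put_metric_alarm: average CPU of the group below the
    threshold triggers the policy and notifies the topic."""
    return {
        "AlarmName": f"{asg_name}-cpu-low",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,           # assumption: 5-minute datapoints
        "EvaluationPeriods": 2,  # assumption: two datapoints in a row
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [policy_arn, topic_arn],
    }


def create_scale_in(autoscaling, cloudwatch, asg_name, topic_arn):
    """Create the policy first, because the alarm needs the policy's ARN."""
    policy_arn = autoscaling.put_scaling_policy(**scale_in_policy(asg_name))["PolicyARN"]
    cloudwatch.put_metric_alarm(**low_cpu_alarm(asg_name, policy_arn, topic_arn))
    return policy_arn


def force_alarm(cloudwatch, alarm_name):
    """Temporarily set the alarm to ALARM to test the policy by hand."""
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason="Manual test of the scaling policy",
    )
```

An increase policy would follow the same shape with a positive adjustment and a GreaterThanThreshold comparison. A state forced with force_alarm only lasts until CloudWatch evaluates the metric again.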
Let me connect to one of the instances that we are using, and run a process to raise the CPU usage. It's a simple tool called stress, but you need to use the EPEL repo to get it on an Amazon Linux instance. Now that we're set, you could just run the tool and force the instance to trigger the high CPU utilization alarm that we described earlier. This can be helpful when you want to test your alarms and policies. See you at the next lecture.
About the Author
Eric Magalhães has a strong background as a Systems Engineer for both Windows and Linux systems and currently works as a DevOps Consultant for Embratel. Lazy by nature, he is passionate about automation and anything that can make his job painless, so his interest in topics like coding, configuration management, containers, CI/CD and cloud computing went from a hobby to an obsession. He holds multiple AWS certifications and, as a DevOps Consultant, helps clients understand and implement the DevOps culture in their environments. He also plays a key role in the company, developing pieces of automation using tools such as Ansible, Chef, Packer, Jenkins and Docker.