AWS Auto Scaling Policies
This course explores the various auto scaling policies that exist within AWS. We'll cover what each of the policies does, its strengths and weaknesses, and when best to use it. Understanding the ins and outs of these policies will help you save a lot of money and keep your customers happy by reducing latency and downtime.
By the end of this course, you will understand how each of the AWS auto scaling policies works and in what situations they perform best.
This course is recommended for solutions architects and developers who are working on creating highly available systems within AWS.
To get the most out of this course, you should already have a basic working knowledge of AWS.
Dynamic scaling is the bread and butter of autoscaling within AWS. It removes the burden of having to manually launch instances yourself and remove them when they are no longer needed. A good dynamic scaling policy should be able to handle your day-to-day needs as long as you can get it set up correctly.
In this course we will look at two types of dynamic scaling policies. The first is step scaling, which is the older of the two ways to scale the instances within an auto scaling group. The second is target tracking, which you can also use to dynamically scale your fleet.
Let's start by talking about step scaling first as it really helps in demonstrating how all scaling works behind the scenes.
Step scaling is a method of adding or removing instances from your auto scaling group based on a metric that you track. The most commonly tracked metric is the average CPU utilization across the entire auto scaling group. When the metric you are tracking goes over a specific threshold (the upper bound), you can have the auto scaling group add more instances to bring that metric back below the upper bound.
For example, you could set an upper bound of 80% CPU usage, and have the step scaling policy add an instance into the auto scaling group when we are above that limit. When that instance comes online, the same load is spread across more instances, so the average CPU usage drops (to 60%, say), which puts it back below the trigger point.
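To see where a number like 60% could come from, here is a minimal sketch of the arithmetic. It assumes a hypothetical group of three instances (the group size is not stated above) and assumes the total load stays the same and is spread evenly once the new instance is online.

```python
def average_cpu_after_scaling(instances: int, avg_cpu: float, added: int) -> float:
    """Estimate the new average CPU, assuming total load is unchanged
    and is spread evenly across all instances (a simplification)."""
    total_load = instances * avg_cpu       # e.g. 3 instances * 80% = 240 "CPU points"
    return total_load / (instances + added)


# Hypothetical group: 3 instances averaging 80% CPU, then 1 instance is added.
print(average_cpu_after_scaling(3, 80.0, 1))  # 60.0
```

Real workloads rarely balance this cleanly, so treat the result as a rough estimate rather than a prediction.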
We can also have the same happen on the bottom end of the spectrum. You can set a lower bound of 20% and have the auto scaling group remove an instance if the CPU load is ever too low. This helps to save costs when demand goes down.
Having both of these systems working keeps your auto scaling groups, and the instances they contain, in a state of equilibrium.
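The upper and lower bounds described above can be sketched as a small decision function. This is only an illustration of the idea, not how AWS implements it; the function name and the one-instance-at-a-time adjustment are assumptions made for the example.

```python
def scaling_decision(avg_cpu: float, upper: float = 80.0, lower: float = 20.0) -> int:
    """Return the change in instance count for a CPU reading.

    +1 means add an instance, -1 means remove one, 0 means do nothing.
    The 80% and 20% defaults mirror the example bounds in the text.
    """
    if avg_cpu > upper:
        return 1    # too busy: scale out
    if avg_cpu < lower:
        return -1   # mostly idle: scale in to save cost
    return 0        # inside the band: stay in equilibrium


for reading in (85.0, 50.0, 10.0):
    print(reading, scaling_decision(reading))
```

Anything between the two bounds leaves the group alone, which is what keeps it in equilibrium.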
When you add new instances to your group, it takes a few minutes for them to come fully online and start handling load. It's important that we recognize this and do not keep adding more and more instances while the metric is still above the threshold. This is why we need to set a cooldown period to mitigate the problem.
The cooldown tells the system to wait for a period of time before evaluating the need to add more instances. When set appropriately, this lets the newly added instances come online and take some of the load off the system. Without it, your auto scaling group would rapidly overscale and end up costing you a lot of money.
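A toy simulation can show why the cooldown matters. The readings below are made-up data, one per minute, for a hypothetical case where the new instance needs about five minutes before it starts taking load. The function is a sketch of the idea only and does not reflect AWS internals.

```python
def count_scale_outs(cpu_readings, threshold=80.0, cooldown_minutes=0):
    """Count how many instances get added for a series of per-minute CPU readings.

    A new scale-out is skipped while we are still inside the cooldown
    window that started at the previous scale-out.
    """
    added = 0
    last_scale_minute = None
    for minute, cpu in enumerate(cpu_readings):
        in_cooldown = (
            last_scale_minute is not None
            and minute - last_scale_minute < cooldown_minutes
        )
        if cpu > threshold and not in_cooldown:
            added += 1
            last_scale_minute = minute
    return added


# CPU stays high for five minutes while the first new instance boots.
readings = [85, 85, 85, 85, 85, 60]
print(count_scale_outs(readings, cooldown_minutes=0))  # 5 (overscaled)
print(count_scale_outs(readings, cooldown_minutes=5))  # 1
```

With no cooldown, every high reading adds another instance even though the first one has not had a chance to help yet.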
The threshold trigger points that autoscaling makes its decisions on (for adding and removing instances) are created by setting up CloudWatch alarms that autoscaling can listen to. You can even have multiple alarm levels that trigger different amounts of scaling. You could have one alarm set to go off at 60% CPU utilization, another at 80%, and even a third at 95%.
Each one of these alarms can trigger on its own and will try to add its specific number of instances into the group. For example, the 60% alarm might add just one instance, the 80% alarm could add 3, while the 95% alarm could add 5 instances.
It is important to note that these alarms could be hit in sequence as load increases over time. Now, this is where things can get a little complicated.
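As a rough sketch, the three alarm levels can be written as a lookup from CPU utilization to the number of instances that level asks for. The table name and the choice to treat a reading exactly at a threshold as a breach are assumptions made for this example.

```python
# (threshold %, instances to add), checked from the highest level down.
STEPS = [(95.0, 5), (80.0, 3), (60.0, 1)]


def instances_requested(avg_cpu: float) -> int:
    """Return how many instances the highest breached alarm level asks for."""
    for threshold, instances_to_add in STEPS:
        if avg_cpu >= threshold:
            return instances_to_add
    return 0  # no alarm level breached


for cpu in (50.0, 65.0, 85.0, 97.0):
    print(cpu, instances_requested(cpu))
```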
When calculating how many instances a specific threshold should add into the autoscaling group, the system first checks to see if there are any scaling events already taking place.
For example, let’s say that the 60% alarm that adds one instance has already triggered. It has a cooldown period of two minutes. If the 80% CPU threshold alarm is triggered, it will want to add three instances. However, it will first check to see if there is already a scaling event occurring, and will take that into account.
Instead of adding three instances on top of the one instance from the previous alarm, it will add two instances instead. This will bring the grand total of new instances up to three, instead of four.
To continue with the example, if we now trigger the 95% CPU utilization alarm, which wants to add 5 instances, it too will check to see if there is already a scaling event happening.
When it checks how many new instances are already being added (currently three), it provisions only two more, bringing the total to five (the number it would add if it were the only scaling event occurring).
Autoscaling functions this way to prevent a massive overscaling event from occurring. This helps prevent wild swings up and down.
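The walkthrough above comes down to simple subtraction: each alarm adds only the difference between what it wants and what is already on the way. Here is a minimal sketch of that bookkeeping, with a hypothetical helper name, that replays the 60%, 80%, and 95% alarms in sequence.

```python
def net_instances_to_add(requested: int, already_in_progress: int) -> int:
    """Instances still needed once in-progress additions are counted."""
    return max(0, requested - already_in_progress)


in_progress = 0
for alarm, requested in [("60%", 1), ("80%", 3), ("95%", 5)]:
    extra = net_instances_to_add(requested, in_progress)
    in_progress += extra
    print(f"{alarm} alarm: add {extra}, total new instances {in_progress}")
# 60% alarm: add 1, total new instances 1
# 80% alarm: add 2, total new instances 3
# 95% alarm: add 2, total new instances 5
```

Without this accounting the three alarms would stack to nine new instances instead of five.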
Some general advice: you should try to scale up more aggressively to help deal with load issues, while scaling back down more slowly. This is because it takes a while for instances to come online, while they can be terminated almost immediately. You don't want to get yourself into trouble by thrashing up and down, killing instances only to respawn them soon after.
The way you stop this from happening is by having a long cooldown on your auto scaling group's instance removal policy. This will help keep your instances alive longer and remove the thrashing issue. This was a bigger issue when instances were charged by the hour: you would not want to keep adding and then quickly killing instances, just to be billed for a full hour when they might have only been alive for ten minutes.
It’s not as big of an issue these days, now that many EC2 instances are billed by the second, so you might not have to worry about it.
Ok, so now that we understand how step scaling works, we can move on to looking at target tracking.
Target tracking works in a similar way to step scaling in that you set up a metric that you want the auto scaling group to monitor. For this example, we are going to still use the CPU Utilization as our metric of choice.
When using step scaling, we are in charge of determining when to add or remove instances, based on alarms, thresholds, and whatnot. This gives us a range in which the CPU utilization can sit, and we are in charge of how quickly that scales up and down.
Now, target tracking is a simplified version of step scaling that takes you out of the equation. For target tracking, you just need to tell the system where you want the CPU utilization to sit (let's say we want the auto scaling group to track at 40% CPU utilization), and it will do the rest.
Target tracking will automatically set up the alarms and scaling mechanisms just like step scaling - but without you having to do any of the leg work. It will do its best to adjust and manage everything to keep that tracked metric at or very near to what you set.
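To give a feel for how little you have to specify, here is a sketch of the request parameters for a 40% CPU target tracking policy, in the shape that boto3's `put_scaling_policy` call on the Auto Scaling client accepts. The group name `web-asg` and the policy naming scheme are hypothetical, and the helper only builds the parameters; it does not call AWS.

```python
def target_tracking_policy_request(group_name: str, target_cpu: float) -> dict:
    """Build the keyword arguments for a CPU target tracking policy."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Average CPU utilization across the auto scaling group.
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


request = target_tracking_policy_request("web-asg", 40.0)
print(request["PolicyType"], request["TargetTrackingConfiguration"]["TargetValue"])
```

With credentials configured, you would pass this along with something like `boto3.client("autoscaling").put_scaling_policy(**request)`. Notice there are no alarms or thresholds in the request, since target tracking creates those for you.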
Something to keep in mind, however, is that when you have smaller auto scaling groups (ones in which there are not many instances), it becomes much harder for the system to keep the metric at the tracked level. This is because each addition or removal of an instance causes a large swing in capacity. For example, if you only have 4 instances and you add another one, that is a 25% increase in capacity. With 100 instances, on the other hand, adding one is just 1% more capacity.
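The capacity swing from the example above is easy to compute yourself. This small sketch, with a hypothetical function name, reproduces the 25% and 1% figures.

```python
def capacity_increase_percent(current_instances: int, added: int = 1) -> float:
    """Percentage of extra capacity gained by adding instances to a group."""
    return 100.0 * added / current_instances


print(capacity_increase_percent(4))    # 25.0
print(capacity_increase_percent(100))  # 1.0
```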
When it comes to scaling up, the system will do so aggressively. It becomes even more aggressive as the metric you are monitoring gets farther and farther from your desired target. However, just like we recommended for step scaling, the system will scale back down slowly in order to ensure the best availability and experience for users.
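One way to picture "more aggressive the farther you are from the target" is a simple proportional model. To be clear, this is an illustrative assumption and not AWS's actual algorithm: it supposes that average CPU falls in proportion to the number of instances sharing the load.

```python
import math


def estimated_desired_capacity(current_instances: int, avg_cpu: float, target_cpu: float) -> int:
    """Rough estimate of how many instances would bring average CPU to the target.

    Illustrative proportional model only; rounds up so we never undershoot.
    """
    return math.ceil(current_instances * avg_cpu / target_cpu)


# Hypothetical group of 4 instances tracking a 40% CPU target.
print(estimated_desired_capacity(4, 44.0, 40.0))  # 5 (slightly over target: add 1)
print(estimated_desired_capacity(4, 80.0, 40.0))  # 8 (far over target: add 4)
```

Under this model a small overshoot adds a single instance, while a reading at double the target doubles the group.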
As a final note, it is important to mention that you should not delete any of the CloudWatch alarms that a target tracking policy creates, otherwise it will not function properly. The alarms will be removed automatically if you ever delete the scaling policy.
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.