
Amazon Auto Scaling


How to design highly available and fault-tolerant architectures
2h 21m

Designing Resilient Architectures. In this module, we explore the concepts of business continuity and disaster recovery, the AWS Well-Architected Framework, and the AWS services that, when used together, help us design resilient, fault-tolerant architectures.

We will first introduce the concepts of high availability and fault tolerance, and show how we go about designing highly available, fault-tolerant solutions on AWS. We will learn about the AWS Well-Architected Framework, and how that framework can help us make design decisions that deliver the best outcome for end users. Next, we will introduce and explain the concept of business continuity and how AWS services can be used to plan and implement a disaster recovery plan.

We will then learn to recognize and explain the core AWS services that, when used together, can reduce single points of failure and improve scalability in a multi-tier solution. Auto Scaling is a proven way to enable resilience by allowing an application to scale up and down to meet demand. In a hands-on lab, we create and work with Auto Scaling groups to add elasticity and durability. Amazon Simple Queue Service (SQS) increases resilience by acting as a messaging service between other services and applications, thereby decoupling layers and reducing dependency on state. Amazon CloudWatch is a core component of maintaining a resilient architecture - essentially it is the eyes and ears of your environment - so we next learn to apply the Amazon CloudWatch service in a hands-on environment.

We then learn to apply the Amazon CloudFront CDN service to add resilience to a static website served out of Amazon S3. Amazon CloudFront is tightly integrated with other AWS services such as Amazon S3, AWS WAF, and Amazon GuardDuty, making it an important component in increasing the resilience of your solution.


- [Instructor] Auto Scaling is a core component of increasing durability and availability on AWS. Auto Scaling enables you to provision based on actual demand rather than on estimated demand. Auto Scaling has health checking built in, and it can drop EC2 instances that are not responding and replace them with newly spun-up, healthy versions. All of this functionality works across multiple Availability Zones within a region, helping you achieve high availability with minimal manual intervention. Auto Scaling should be used when you want to ensure that you have at least, say, one EC2 instance running at all times, or perhaps one or two EC2 instances always running in one or two of your Availability Zones. Another important use case is when you need to scale based on increased demand, or when you know of an upcoming event where your EC2 instances may be under a heavy workload. You can also ensure you scale back down to a normal EC2 presence and not waste money on idle resources. Auto Scaling makes horizontal scaling easy, based on demand or predetermined schedules. You can increase or decrease the number of EC2 instances running based on CloudWatch metrics. In the example on my screen, the Auto Scaling policy will add an instance if the average CPU utilization equals or exceeds 70% for more than 300 seconds, which is five minutes. And the other part of the policy will remove instances from the Auto Scaling group if the average CPU for the group equals 20% or less for five minutes.
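The on-screen policy can be sketched as a small decision function. This is a minimal illustration of the logic only - in practice CloudWatch alarms evaluate the metric and trigger the scaling policy for you - and the function name and return values are hypothetical:

```python
def scaling_decision(avg_cpu_percent: float, sustained_seconds: int) -> str:
    """Mirror the example policy: add an instance when average CPU
    is at or above 70% for at least 300 seconds (5 minutes); remove
    one when it is at or below 20% for that period; else do nothing."""
    if sustained_seconds < 300:
        return "no-change"   # threshold not sustained for the full window
    if avg_cpu_percent >= 70:
        return "scale-out"   # add one EC2 instance to the group
    if avg_cpu_percent <= 20:
        return "scale-in"    # remove one EC2 instance from the group
    return "no-change"       # CPU is in the healthy 20-70% band
```

For example, `scaling_decision(75, 300)` returns `"scale-out"`, while a 90% spike lasting only 60 seconds returns `"no-change"` because the five-minute window has not elapsed.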

Core components of Auto Scaling. First, the launch configuration: your group uses a launch configuration as a template for its EC2 instances. When you create a launch configuration, you specify information such as the AMI ID, instance type, key pair, security groups, and block device mapping for your instances. An Auto Scaling group enables you to specify the minimum, maximum, and desired number of EC2 instances in that group. A scaling plan tells Auto Scaling when and how to scale. You can base a scaling plan on the occurrence of specified conditions - dynamic scaling, for example. Consider an Auto Scaling group that has two Availability Zones, a desired capacity of two instances, and scaling policies that increase and decrease the number of instances by one when certain thresholds are met. When the threshold for the scale-out policy is met, the policy takes effect and Auto Scaling launches a new instance. The Auto Scaling group now has three instances. When the threshold of the scale-in policy is met, the policy takes effect and Auto Scaling terminates one of the instances. If the group does not have a specific termination policy assigned to it, Auto Scaling uses the default termination policy. Auto Scaling selects the Availability Zone with two instances and terminates the instance launched from the oldest launch configuration. If the instances were launched from the same launch configuration, then Auto Scaling selects the instance that is closest to the next billing hour and terminates that.
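The default termination policy steps described above can be sketched in plain Python. The `Instance` fields and helper names here are illustrative assumptions - the real selection happens inside the Auto Scaling service - but the ordering of the tie-breaks follows the description:

```python
import random
from dataclasses import dataclass
from collections import Counter

@dataclass
class Instance:
    instance_id: str
    az: str                      # Availability Zone, e.g. "us-east-1a"
    launch_config_created: int   # lower value = older launch configuration
    secs_to_billing_hour: int    # time remaining until the next billing hour

def default_termination_choice(instances: list[Instance]) -> Instance:
    """Sketch of the default policy: pick from the AZ with the most
    instances, then the oldest launch configuration, then the instance
    closest to the next billing hour, then at random among any ties."""
    # 1. Narrow to the Availability Zone with the most instances.
    az_counts = Counter(i.az for i in instances)
    busiest_az = max(az_counts, key=az_counts.get)
    candidates = [i for i in instances if i.az == busiest_az]
    # 2. Keep only instances from the oldest launch configuration.
    oldest = min(i.launch_config_created for i in candidates)
    candidates = [i for i in candidates if i.launch_config_created == oldest]
    # 3. Prefer the instance closest to its next billing hour.
    nearest = min(i.secs_to_billing_hour for i in candidates)
    candidates = [i for i in candidates if i.secs_to_billing_hour == nearest]
    # 4. Break any remaining tie at random.
    return random.choice(candidates)
```

With two instances in us-east-1a and one in us-east-1b, the choice stays in us-east-1a, and within that zone the instance from the older launch configuration is selected.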

This helps you maximize the use of your EC2 instances while minimizing the number of hours you are billed for Amazon EC2 usage. If there is more than one unprotected instance closest to the next billing hour, Auto Scaling selects one of these instances at random. All right, so how about creating your own termination policy? You have the option of replacing the default policy with a customized one. When you customize the termination policy, Auto Scaling first assesses the Availability Zones for any imbalance. If an Availability Zone has more instances than the other Availability Zones used by the group, then Auto Scaling applies your specified termination policy to the instances from the imbalanced Availability Zone. If the Availability Zones used by the group are balanced, then Auto Scaling applies the termination policy that you specified across the whole group. Note that by default, instance protection is disabled, and instance protection does not protect Auto Scaling instances from manual termination through the Amazon EC2 console, the terminate-instances CLI command, or the TerminateInstances API.

Let's test our Auto Scaling knowledge with a Cloud Academy quiz question. The question reads: a user has defined an Auto Scaling termination policy to first delete the instance nearest the billing hour - so this is a custom policy. Auto Scaling has launched three instances in the us-east-1a Availability Zone and two instances in us-east-1b. One of the instances in us-east-1b is running nearest to the billing hour. Which instance will Auto Scaling terminate first when executing the termination action? A, a random instance from us-east-1b? Nope, it only selects at random when none of the other logic can be applied. B, a random instance from us-east-1a? Nope. C, the instance with the nearest billing hour in us-east-1b? Hmm, no - that's what the policy says, but remember, it first chooses based on which Availability Zone has the most instances. So answer D, the instance with the nearest billing hour in us-east-1a, is correct. So even though the user has configured the termination policy as a custom policy, before Auto Scaling selects an instance to terminate, it first identifies the Availability Zone that has more instances than the other Availability Zones used by the group. Within the selected Availability Zone, it identifies the instance that matches the specified termination policy.
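The quiz reasoning can be checked with a short, self-contained sketch. The instance IDs and timings below are hypothetical; the point is the ordering - Auto Scaling first narrows to the Availability Zone with the most instances, and only then applies the custom "closest to billing hour" policy:

```python
def pick_for_termination(instances):
    """instances: list of (instance_id, az, secs_to_billing_hour) tuples.
    Narrow to the AZ with the most instances, then apply the custom
    policy: terminate the instance nearest its billing hour."""
    az_counts = {}
    for _, az, _ in instances:
        az_counts[az] = az_counts.get(az, 0) + 1
    busiest = max(az_counts, key=az_counts.get)          # imbalanced AZ first
    in_az = [i for i in instances if i[1] == busiest]
    return min(in_az, key=lambda i: i[2])                # then the custom policy

# Quiz scenario: three instances in us-east-1a, two in us-east-1b;
# the instance nearest its billing hour overall is in us-east-1b.
fleet = [
    ("i-a1", "us-east-1a", 300),
    ("i-a2", "us-east-1a", 500),
    ("i-a3", "us-east-1a", 700),
    ("i-b1", "us-east-1b", 60),   # nearest billing hour, but wrong AZ
    ("i-b2", "us-east-1b", 900),
]
```

Running `pick_for_termination(fleet)` selects the instance nearest its billing hour within us-east-1a, matching answer D.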

About the Author
Andrew Larkin
Head of Content

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.