
Auto-Scaling our application



The gold standard for high availability is five 9s, meaning guaranteed uptime 99.999% of the time. That means just over five minutes of downtime throughout an entire year. Achieving this kind of reliability requires some advanced knowledge of the many tools AWS provides to build a robust infrastructure.
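The downtime budget behind a "number of 9s" target is simple arithmetic. The sketch below works it out for a non-leap year; the function name is illustrative and not part of the course material.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

For 99.999% the budget comes to roughly 5.26 minutes per year.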

In this course, expert Cloud Architect Kevin Felichko will show one of the many possible alternatives for creating a high availability application, designing the whole infrastructure with a Design for Failure approach. You'll learn how to use AutoScaling, load balancing, and VPC to run a standard Ruby on Rails application on an EC2 instance, with data stored on an RDS-backed MySQL database, and assets stored on S3. Kevin will also touch on some advanced topics like using CloudFront for content delivery and how to distribute an application across multiple AWS regions.

Who should take this course

As an intermediate/advanced course, you will need to have some experience with EC2, S3 and RDS, and at least a basic knowledge of AutoScaling, ELB, VPC, Route 53 and CloudFront.

Test your knowledge of the material covered in this course: take a quiz.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Welcome to lesson four in our series on How to Architect with a Design for Failure Approach. This lesson is all about auto scaling our application. Auto scaling is one of the most important services for building highly available applications.

When properly used, it can eliminate a single point of failure by distributing instances across availability zones. It can self-heal when instances stop responding by launching new instances.

It is simply an amazing service. There's a lot to cover, so let's get started. Before we can make a launch configuration and set up auto scaling, we need to create our elastic load balancer.

The elastic load balancer is a service provided by AWS to distribute incoming traffic evenly across healthy EC2 instances that are under its control.

Healthy is the keyword here. The elastic load balancer performs periodic, configurable health checks and makes decisions on where to send traffic.

Let's head to the EC2 dashboard. Under network and security, click on the load balancers link. From here, click the create load balancer button. Like most resources in AWS, we have to give it a name. We will create the ELB in our one and only VPC. We will leave the create an internal load balancer option unchecked.

This will direct the DNS name to a public IP address. If checked, the DNS name would be pointed to a private IP instead. Let's check the enable advanced VPC configuration option which will let us assign subnets to the ELB in a later step.

The listener configuration allows us to map incoming ELB traffic to EC2 instance ports. The default port 80 mapping will suffice for our application. Next up, we create our health check. Our options include standard HTTP, TCP, HTTPS and SSL. We will stick with standard HTTP and direct it to our robots.txt file. If our web server cannot serve up this static file, then we can safely assume something is wrong with the instance and no further traffic should be sent to it until it becomes healthy.

With the current settings under advanced details, an EC2 instance will be checked every 30 seconds. It has five seconds to respond to the request. Failure to respond in the allotted time means the instance could be unhealthy. Two consecutive unhealthy checks will put the EC2 instance into out-of-service status. To become healthy again, it must pass 10 consecutive health checks before it will begin receiving traffic. These thresholds are acceptable for our application.

Since this ELB will be limited to our web tier, we will add each subnet we created for our web servers. It is important to note that we can only add one subnet per availability zone. Now we need to set a security group for our ELB. We are going to select our pre-configured ELB security group and continue. The add EC2 instances step allows us to add running instances.

Since we do not have any currently running, our list is empty. Note, we are not required to add any during the creation process. EC2 instances that are created by auto scaling will be attached to this ELB automatically. The settings we are most concerned about on this step are under the availability zone distribution section. We need to ensure that enable cross-zone load balancing is checked.

Without it, our high availability design is useless. The other option, enable connection draining, determines how traffic is handled when an instance is being de-registered or has been declared unhealthy. We're going to leave connection draining enabled and continue. Everything on this review step appears as it should, so we can go ahead and click create.
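For readers who prefer scripting to the console, the load balancer settings chosen above can be collected into request parameters. This is a minimal sketch, assuming the parameter names used by the Classic Load Balancer API as exposed through boto3; the function name, the subnet and security group IDs, and the 300-second draining timeout are placeholders that do not come from the lesson.

```python
def build_elb_requests(name, subnet_ids, security_group_id):
    """Return keyword arguments for three Classic Load Balancer API calls."""
    create = {
        "LoadBalancerName": name,
        # Map port 80 on the load balancer to port 80 on the instances.
        "Listeners": [{
            "Protocol": "HTTP",
            "LoadBalancerPort": 80,
            "InstanceProtocol": "HTTP",
            "InstancePort": 80,
        }],
        "Subnets": list(subnet_ids),          # one web-tier subnet per zone
        "SecurityGroups": [security_group_id],
    }
    health_check = {
        "LoadBalancerName": name,
        "HealthCheck": {
            "Target": "HTTP:80/robots.txt",   # static file served by the web server
            "Interval": 30,                   # seconds between checks
            "Timeout": 5,                     # seconds allowed for a response
            "UnhealthyThreshold": 2,          # failures before out of service
            "HealthyThreshold": 10,           # successes before in service
        },
    }
    attributes = {
        "LoadBalancerName": name,
        "LoadBalancerAttributes": {
            "CrossZoneLoadBalancing": {"Enabled": True},
            # The timeout value is an assumption; the lesson only enables draining.
            "ConnectionDraining": {"Enabled": True, "Timeout": 300},
        },
    }
    return create, health_check, attributes


create, health_check, attributes = build_elb_requests(
    "web-elb", ["subnet-aaa", "subnet-bbb", "subnet-ccc"], "sg-elb"
)
print(health_check["HealthCheck"])
```

With boto3 installed and credentials configured, these dictionaries could be passed to an `elb` client as `create_load_balancer(**create)`, `configure_health_check(**health_check)` and `modify_load_balancer_attributes(**attributes)`. The lesson itself uses the console throughout.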

It takes a few seconds for the ELB to be created. Once it is finished, we are ready to create our launch configuration and auto scaling policy. Before we can set up our auto scaling policy, we have to configure what to launch.

First, we select the Amazon Machine Image, or AMI, to launch. There are many options available to us. We can choose a basic system image, which is an AWS-optimized operating system. When auto scaling, we might select this option if we are going to run scripts to pull our latest source code and configuration. We can select an AMI that we've created from an EC2 instance. This is exactly what we're going to do for our application, using an AMI that was created prior to this lesson. The other options include using a marketplace AMI or a community AMI. Both are AMIs created by a third party, some of which include an hourly cost on top of the standard EC2 instance rates. We can see those additional fees before we select the AMI. We're not going to go deeper into this as we're not going this route. Once the AMI is selected, we move on to selecting an instance type.

Selecting an instance type is a critical step in ensuring performance and cost. Our application can and will run under a T1 micro instance, albeit slowly with even the most minimal user loads. Much like the RDS instance we created earlier, a T1 micro instance is not the best choice for many situations. Be sure to select an instance type that works best for you.

Next, we name the launch configuration. The purchasing option allows us to specify if we want to use spot instances.

In short, spot instances let you bid on unused server capacity to fire up your EC2 instances. You are bidding against other AWS users. If your bid does not exceed the current spot price, no instance is launched. If your bid exceeds the current spot price, your instance gets launched and will run until either you terminate it or the spot price exceeds your bid. The latter means that your instance could be shut down in the middle of processing requests. That is not ideal for our simple web application. Therefore, we're not going to use spot instances.
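The bidding rule just described can be sketched as a small simulation. This is a deliberate simplification of the behavior as stated in the lesson, not a model of the real spot market; the function name and the price series are made up for illustration.

```python
def simulate_spot_instance(bid, spot_prices):
    """Return one status per price tick: 'pending', 'running' or 'terminated'."""
    statuses = []
    state = "pending"
    for price in spot_prices:
        if state == "pending" and bid > price:
            # The bid exceeds the current spot price, so the instance launches.
            state = "running"
        elif state == "running" and price > bid:
            # The spot price has risen above the bid, so the instance is shut down.
            state = "terminated"
        statuses.append(state)
    return statuses


# Hypothetical hourly prices in dollars, with a bid of $0.05.
print(simulate_spot_instance(0.05, [0.06, 0.04, 0.045, 0.07, 0.03]))
# prints ['pending', 'running', 'running', 'terminated', 'terminated']
```

The fourth tick is the case the lesson warns about: the instance disappears while it may still be serving requests.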

Our instances do not need an IAM role since they are not accessing any secured resources. CloudWatch detailed monitoring will take metrics every minute and aggregate data across the instances in the launch configuration. Standard CloudWatch monitoring at five-minute intervals is acceptable for our application. We're not making any changes under advanced details since they are not relevant to our high availability design.

The eight-gigabyte root volume is enough to launch our snapshot. For the volume type, we will accept standard, which is great for light IO with or without a lot of bursting. Other options include provisioned IOPS and general purpose. Provisioned IOPS volumes are capable of consistently high IOPS performance. Provisioned and general purpose storage use solid state drives while standard uses magnetic drives.

Each instance will be configured using our pre-created web app security group, which allows HTTP traffic from our load balancer only. Direct HTTP traffic to our launched instances is not permitted in this configuration by design. We also do not allow SSH connections to our launched instances, which generates a warning when we attempt to move on. For our application we are limiting access as a standard security practice. You may do things differently in your environment. We will acknowledge the SSH warning and review our launch configuration. Everything looks good, so we can click the create launch configuration button. We are presented with a prompt regarding a key pair to use.

Our options are to use an existing key pair, create a new key pair or proceed without one. We will use our existing key pair and acknowledge the disclaimer about our private key. Once our launch configuration has been created, we begin the setup of our auto scaling group. As you can see, the launch configuration we created has already been selected.
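The launch configuration choices made so far can be summarized as request parameters. This is a rough sketch, assuming the parameter names of the Auto Scaling `CreateLaunchConfiguration` API as exposed through boto3. The function name, the AMI ID, the security group ID, the key pair name and the root device name are placeholders, since the lesson does not give them.

```python
def build_launch_configuration(name, image_id, security_group_id, key_name):
    """Return keyword arguments describing the launch configuration."""
    return {
        "LaunchConfigurationName": name,
        "ImageId": image_id,                     # AMI created before the lesson
        "InstanceType": "t1.micro",
        "KeyName": key_name,                     # existing key pair
        "SecurityGroups": [security_group_id],   # web app group, HTTP from the ELB only
        # Disabled means standard five-minute CloudWatch monitoring.
        "InstanceMonitoring": {"Enabled": False},
        "BlockDeviceMappings": [{
            # The device name depends on the AMI; this value is an assumption.
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 8, "VolumeType": "standard"},
        }],
        # No SpotPrice key: the instances are launched on demand.
        # No IamInstanceProfile key: the instances need no IAM role.
    }


params = build_launch_configuration("web-lc", "ami-12345678", "sg-webapp", "my-key")
print(params["InstanceType"], params["BlockDeviceMappings"][0]["Ebs"])
```

With boto3, a dictionary like this could be passed to an `autoscaling` client as `create_launch_configuration(**params)`.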

Next, we will name the auto scaling group. Group size will tell AWS how many instances to start with. We want to start with three instances. To have our instances evenly distributed among our availability zones, we need to select our VPC and add the subnets to use. In our example, we will add each of the web east availability zones. Under advanced details, we need to check the receive traffic from elastic load balancers option. This will give us the option to select which ELBs to use. We add our one and only load balancer to the list and continue to the next step.

There are two options for scaling policies: keep this group at its initial size, or use scaling policies to adjust the capacity of this group. If we choose to keep the group at its initial size, we will always have our three instances running regardless of load. Choosing to use scaling policies allows us to set minimum and maximum instance counts as well as CloudWatch alarms that increase or decrease the size of our group. This is useful when your application needs to handle a steady and sustained increase in load. It also helps save money by decreasing the number of running instances as the increased load begins to decline.
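To make the second option more concrete, here is a toy version of the decision a scaling policy encodes. The CPU metric and both thresholds are hypothetical; in AWS the trigger would be a CloudWatch alarm rather than a function call.

```python
def next_capacity(current, cpu_percent, min_size, max_size,
                  scale_out_at=70.0, scale_in_at=30.0):
    """Return the group size after one evaluation of a simple scaling rule."""
    if cpu_percent >= scale_out_at:
        desired = current + 1      # sustained load: add an instance
    elif cpu_percent <= scale_in_at:
        desired = current - 1      # load has declined: remove an instance
    else:
        desired = current
    # Never leave the configured minimum and maximum bounds.
    return max(min_size, min(max_size, desired))


print(next_capacity(3, 85.0, min_size=3, max_size=6))  # prints 4
print(next_capacity(3, 10.0, min_size=3, max_size=6))  # prints 3
```

Keeping the group at its initial size corresponds to setting the minimum and maximum to the same value, so the result never changes.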

We do not need a scaling policy for our application. We can change these settings later if we need to. Auto scaling groups can send out notifications when a new instance is launched, when an instance is terminated, and when a launch or termination fails. We don't have a need to do this for our application, though it is recommended that you consider adding notifications for your setup.

Next, we will tag our instances with a name. This will identify which EC2 instances are part of the auto scaling group, which is very useful if we have many EC2 instances running in one region. We need to make sure that the tag new instances checkbox is checked to pass this tag through to created EC2 instances.

This is the only tag we will use. We could add up to 10 tags if needed. Finally, we review and create our auto scaling group. It takes a few seconds to create.
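The auto scaling group settings from this walkthrough can also be written down as request parameters. This is a minimal sketch, assuming the parameter names of the Auto Scaling `CreateAutoScalingGroup` API as exposed through boto3; the function name, group name, subnet IDs, load balancer name and tag value are placeholders.

```python
def build_auto_scaling_group(name, launch_configuration_name, subnet_ids,
                             load_balancer_name, size=3, name_tag="web-app"):
    """Return keyword arguments for a fixed-size auto scaling group."""
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration_name,
        # Keeping the group at its initial size: min, max and desired all match.
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
        # The API expects the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "LoadBalancerNames": [load_balancer_name],
        "Tags": [{
            "Key": "Name",
            "Value": name_tag,
            "PropagateAtLaunch": True,   # same effect as the tag new instances checkbox
        }],
    }


params = build_auto_scaling_group(
    "web-asg", "web-lc", ["subnet-aaa", "subnet-bbb", "subnet-ccc"], "web-elb"
)
print(params["VPCZoneIdentifier"])
```

With boto3, this dictionary could be passed to an `autoscaling` client as `create_auto_scaling_group(**params)`.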

Once created, we can navigate to our EC2 instance list and see three instances have been launched, one in each availability zone per our requirements. Before these instances begin receiving traffic from the ELB, they have to pass the health checks we configured earlier when setting up the elastic load balancer. Our ELB checks our EC2 instances every 30 seconds and each instance must pass 10 consecutive checks, meaning it will take at least 300 seconds or five minutes before traffic flows through. After waiting for at least five minutes, we can see our EC2 instances are in service. A quick check of our ELB DNS name shows our website is operational.
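The five-minute wait follows directly from the health check thresholds. The sketch below models the consecutive-check rule with a small class and counts how long a freshly launched instance takes to come into service. The class and function names are illustrative, and the model ignores the response timeout and any delay before the first check.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 2    # consecutive failures before out of service
    healthy_threshold: int = 10     # consecutive successes before in service
    in_service: bool = False        # a new instance starts out of service
    streak: int = field(default=0, init=False)  # results opposing the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the in-service state."""
        if check_passed == self.in_service:
            # The result agrees with the current state, so any opposing streak ends.
            self.streak = 0
            return self.in_service
        self.streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.in_service = check_passed
            self.streak = 0
        return self.in_service


def seconds_until_in_service(interval_seconds=30, healthy_threshold=10):
    """Count passing checks until a new instance is in service, as seconds."""
    tracker = HealthTracker(healthy_threshold=healthy_threshold)
    checks = 0
    while not tracker.in_service:
        tracker.record(True)
        checks += 1
    return checks * interval_seconds


print(seconds_until_in_service())  # prints 300
```

A single failed check during warm-up resets the count, which is why 300 seconds is a minimum rather than a guarantee.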

With our application running, we can move to our next lesson on using S3 and CloudFront in order to deliver content faster as well as meet our service degradation goals.

About the Author
Kevin Felichko
Solutions Architect

Kevin is a seasoned technologist with 15+ years of experience, mostly in software development. Recently, he has led several migrations from traditional data centers to AWS, resulting in over $100K a year in savings. His new projects take advantage of cloud computing from the start, which enables a faster time to market.

He enjoys sharing his experience and knowledge with others while constantly learning new things. He has been building elegant, high-performing software across many industries since high school. He currently writes apps in Node.js and iOS apps in Objective-C and designs complex architectures for AWS deployments.

Kevin currently serves as Chief Technology Officer for PropertyRoom.com, where he leads a small, agile team.