AWS Cloud Practitioner & Elastic Load Balancing/Auto Scaling

This portion of the lecture will introduce you to Elastic Load Balancing (ELB) and auto scaling.

We will discuss the main functions of ELB and the three types supported by AWS:
- Network Load Balancer: ultra-high performance while maintaining very low latencies; operates at the connection level; capable of handling millions of requests per second
- Application Load Balancer: flexible feature set for web applications running HTTP or HTTPS protocols; operates at the request level; provides advanced routing, TLS termination, and visibility features
- Classic Load Balancer: primarily used for applications built in EC2-Classic environment; operates at both the connection and request level

The lecture explains how to set up an ELB quickly and easily by following a few setup steps:
- define the load balancer
- assign security groups
- configure security settings
- configure health checks
- add EC2 instances
- add tags

Next, we will discuss what auto scaling is and the benefits of using it:
- automatically increases or decreases your resources to meet demand based on metrics and thresholds
- scale in and scale out your EC2 fleet
- reduction in cost

We will define the steps required to set up auto scaling by creating a launch configuration and then creating the auto scaling group. After, we will walk through the complete configuration process.

By the end of the lecture, you will see the benefit of combining ELB and auto scaling. They are both very effective and useful compute features of AWS.


Resources referenced within this lecture

Lab: Creating your first Classic Load Balancer

Lab: Creating your first Auto-Scaling Group

Lab: Launching Auto Scaling Groups behind a classic load balancer


Hello and welcome to this lecture where we are going to discuss two very useful and very helpful features that enable you to manage the consumption of your fleet of EC2 Compute resources. These being elastic load balancing, ELB, and auto scaling.

Let's start with ELB, elastic load balancing. The main function of ELB is to direct and route traffic destined to your fleet of EC2 instances across an even distribution, which helps to maintain high availability and resilience of your environment. 

AWS supports three types of ELB. Firstly, the network load balancer. This is used for ultra-high performance for your application while at the same time maintaining very low latencies. It operates at the connection level, routing traffic to targets within your VPC. It's also capable of handling millions of requests per second. Next is the application load balancer. This provides a flexible feature set for your web applications running the HTTP or the HTTPS protocols. Whereas the network load balancer operates at the connection level, the application load balancer operates at the request level. It also provides advanced routing, TLS termination, and visibility features targeted to application architectures, allowing you to route traffic to different ports on the same EC2 instance. Finally, the classic load balancer. This is primarily used for applications that were built in the existing EC2-Classic environment, and this operates at both the connection and request level. More detailed information on the different types of ELBs can be found in our existing blog post on this topic.

ELB can help you to maintain high availability of your applications running on your EC2 resources. So let's see how an ELB can do this.

Imagine you have a single web server receiving traffic from your customers visiting your website. Over time, this traffic increases and you decide to add another web server to manage the load. You now need a mechanism to direct traffic between the two servers equally to handle the incoming load of requests. This is where your classic load balancer comes in.

Any traffic directed at your website is sent to an ELB, which will then distribute the requests across both web servers. Should one of your web servers or instances fail, the ELB will detect this and stop sending traffic to that particular instance. However, traffic will continue to be sent to your remaining fleet. This ensures that your web service remains available.
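To make that behaviour concrete, here is a minimal sketch of a load balancer that distributes requests and skips a failed instance. It assumes plain round-robin distribution, and the class name, method names, and instance names (web-1, web-2) are all hypothetical. It is a toy model for illustration, not how ELB works internally.

```python
from itertools import cycle


class SimpleLoadBalancer:
    """Toy round-robin balancer (illustrative only, not the ELB implementation)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._ring = cycle(self.instances)

    def mark_unhealthy(self, instance):
        # Stop routing to an instance that has failed.
        self.healthy.discard(instance)

    def route(self):
        """Return the next healthy instance, skipping any failed ones."""
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = SimpleLoadBalancer(["web-1", "web-2"])
before_failure = [lb.route() for _ in range(4)]   # alternates between both servers
lb.mark_unhealthy("web-2")
after_failure = [lb.route() for _ in range(3)]    # only the remaining server is used
print(before_failure, after_failure)
```

Once web-2 is marked unhealthy, every request goes to web-1, which is the "remaining fleet" in this two-server example.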

As traffic increases, you can deploy more and more web servers that can all communicate with the same ELB. Again, this increases availability of your web service. To enhance resiliency even more, you can configure your ELB to load balance your traffic across multiple availability zones, so that if one AZ goes down within your AWS region, your web infrastructure will continue to function.

Setting up an ELB is quick and easy, as there are only a few components to configure. So let's take a look at the setup steps.

  • Define the load balancer
  • assign security groups
  • configure the security settings
  • configure some health checks
  • add your EC2 instances
  • and finally, add tags

Let's take a look at each step in more detail. The first thing you'll be required to decide is if you want the ELB to be an internal load balancer or an external load balancer. So what's the difference here?

When creating an ELB within your virtual private cloud, you have two options. Either an internal or external load balancer. Let's consider the following network diagram to help explain the difference between the two. An external load balancer will have a public IP address and will typically sit in front of your web servers, receiving traffic from the internet, and then distribute those requests to your fleet of web servers.

An internal load balancer will have an internal IP address and can only be accessed from within your private network. In this diagram, we have an internal ELB between the web servers and the two databases. The web server will send data to the databases via the internal ELB. This will then manage the traffic to the database layer. Once you have selected either an internal or external load balancer, you can then select which availability zones you would like the ELB to distribute traffic to. For best practice and high availability reasons, it's a good choice to select at least two different availability zones just in case AWS experiences an AZ outage. However, you must ensure that the instances that you want to load balance exist in these availability zones.
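That last point can be sanity-checked with a few lines of code. The sketch below is only an illustration: the function name, the instance names, and the availability zone names are assumptions made for the example, and no AWS API is called.

```python
def zones_without_instances(selected_zones, instance_zones):
    """Return the selected AZs that contain none of the instances to balance.

    instance_zones maps a (hypothetical) instance name to the AZ it runs in.
    """
    populated = set(instance_zones.values())
    return sorted(zone for zone in selected_zones if zone not in populated)


# Hypothetical fleet: two web servers in two different AZs.
instance_zones = {"web-1": "eu-west-1a", "web-2": "eu-west-1b"}
missing = zones_without_instances(
    ["eu-west-1a", "eu-west-1b", "eu-west-1c"], instance_zones
)
print(missing)  # AZs selected on the load balancer that have no instances
```

An empty result means every selected availability zone has at least one instance behind the load balancer.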

One component that's required by an ELB is a listener configuration. A listener configuration enables your ELB to check for connection requests both from end clients to your ELB and also from your ELB to your backend instances over a specified protocol and port. The protocols that are supported for this configuration are HTTP, HTTPS, TCP, and SSL. For a typical external ELB, your listener configuration would likely be HTTPS as the protocol for additional security. To manage instance level security to and from your ELB, security groups are used in much the same way as in normal EC2 instances, which we discussed in the previous lecture.

If you selected either HTTPS or SSL in the listener configuration for an external load balancer, you'll be prompted to configure an additional security policy. This allows you to deploy an SSL certificate on your load balancer to help decrypt requests before passing them on to your web server fleet of instances. This configuration is outside of the scope of this fundamentals course, but if you would like additional information, please see the following link.
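As a rough picture of the listener configuration and the security settings step, the sketch below models a listener as a pair of protocol and port settings, one for the client side and one for the instance side. The Listener class and its field names are hypothetical and chosen for this example only. The supported protocol list is the one given above.

```python
from dataclasses import dataclass

# Protocols named in this lecture as supported by the listener configuration.
SUPPORTED_PROTOCOLS = {"HTTP", "HTTPS", "TCP", "SSL"}


@dataclass(frozen=True)
class Listener:
    """Hypothetical model of a listener: client side and instance side."""

    lb_protocol: str        # protocol between end clients and the ELB
    lb_port: int
    instance_protocol: str  # protocol between the ELB and backend instances
    instance_port: int

    def __post_init__(self):
        for protocol in (self.lb_protocol, self.instance_protocol):
            if protocol not in SUPPORTED_PROTOCOLS:
                raise ValueError(f"unsupported protocol: {protocol}")
        for port in (self.lb_port, self.instance_port):
            if not 1 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")

    def needs_certificate(self):
        # HTTPS or SSL on the client side is what triggers the extra
        # security settings step described above.
        return self.lb_protocol in {"HTTPS", "SSL"}


external = Listener("HTTPS", 443, "HTTP", 80)
print(external.needs_certificate())
```

In this example the load balancer accepts HTTPS from clients and forwards plain HTTP to the web servers, so a certificate is needed on the load balancer itself.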

If you recall from earlier, I mentioned that should one of your web server instances fail, the ELB will detect this and stop sending traffic to that particular instance. Well, this is managed by configuring a health check on your ELB. This health check monitors the health of any instance associated with the ELB to ensure that it is only routing requests to instances that are functioning correctly. If a health check fails for an instance, then that instance is automatically removed from the ELB. The health check is carried out by performing a ping using a specified protocol to the instance.

For example, performing a ping over port 80 to the /index.html file on the web servers. If a response is received, then the instance is classed as operational and traffic can be routed to that instance. If a response is not received over a specified timeframe and a number of attempts, then the instance will be marked as unhealthy and no further requests will be routed to that instance.
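The "number of attempts" logic can be sketched as a small state machine. This is an illustration under assumptions: the class name, the threshold parameter names, and the rule that an instance needs several consecutive successes to return to service are all chosen for this example and are not taken from the lecture.

```python
class HealthCheck:
    """Tracks consecutive ping results for one instance (illustrative sketch)."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold  # failures before removal
        self.healthy_threshold = healthy_threshold      # successes before return (assumption)
        self.in_service = True
        self._failures = 0
        self._successes = 0

    def record(self, responded):
        """Record one ping result and return whether the instance is in service."""
        if responded:
            self._successes += 1
            self._failures = 0
            if not self.in_service and self._successes >= self.healthy_threshold:
                self.in_service = True
        else:
            self._failures += 1
            self._successes = 0
            if self.in_service and self._failures >= self.unhealthy_threshold:
                self.in_service = False
        return self.in_service


check = HealthCheck(unhealthy_threshold=2, healthy_threshold=3)
pings = [True, False, True, False, False, True, True, True]
states = [check.record(result) for result in pings]
print(states)
```

A single missed ping does not remove the instance. It takes two failures in a row here, and the instance only receives traffic again after three successful pings in a row.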

Once your health checks are configured, you then need to select which instances you would like added and associated to the ELB. If you are configuring an external ELB, then you'd select your web servers here. If you do not have your EC2 instances deployed at this point it doesn't matter, as you can add these at a later stage. The final configuration element of your ELB is to configure any tags that you want.

Tags are key-value pairs that are used to add metadata to your resource for enhanced management as your environment grows. For example, you could add a key of Name with a value of My External Load Balancer. Or a key of Project, and a value of Web Application Infrastructure. Once your ELB is created, you can always go back and edit most of this configuration, such as the security groups, which instances are associated with it, and the health checks. Now we have an understanding of what an elastic load balancer is and the different elements of its configuration. Let's now take a look at auto scaling to see how these two AWS components can work together to provide an architectural advantage to your deployment solutions.

So what exactly is auto scaling? Auto scaling is a mechanism that allows you to automatically increase or decrease your resources to meet demand based on custom-defined metrics and thresholds. So for example, let's say you had one EC2 instance acting as a web server. When the average CPU reaches 75% utilization, you want to automatically deploy another EC2 instance to enable you to distribute traffic between the two servers, bringing the average CPU percentage down. This load balances the traffic evenly and reduces the chance of one web server failing or slowing down due to high CPU usage. Similarly, when the average CPU usage reduces to 20%, then you automatically want an EC2 instance to be terminated as the demand has dropped.
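The 75% and 20% example can be written as a simple decision function. This is a sketch only: the function name and the minimum and maximum group sizes are assumptions added for the example, while the two thresholds come from the scenario just described.

```python
def scaling_decision(avg_cpu, current, minimum=1, maximum=4,
                     scale_out_at=75.0, scale_in_at=20.0):
    """Return the new instance count for a given average CPU percentage.

    minimum and maximum are illustrative limits, not values from the lecture.
    """
    if avg_cpu >= scale_out_at and current < maximum:
        return current + 1   # scale out: add an instance
    if avg_cpu <= scale_in_at and current > minimum:
        return current - 1   # scale in: terminate an instance
    return current           # demand is within range, or a limit was reached


print(scaling_decision(80, 1))  # high CPU on one server
print(scaling_decision(15, 2))  # low CPU across two servers
print(scaling_decision(50, 2))  # normal load
```

Anything between the two thresholds leaves the fleet unchanged, which avoids adding and removing instances on every small fluctuation.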

Auto scaling can achieve this through defined metrics. You can increase, scale out, and decrease, scale in, the size of your EC2 fleet. This has many advantages, and here are some of the key points.

As this provides automatic provisioning based off of custom defined thresholds, your infrastructure will start to manage itself, preventing you from having to monitor and perform manual deployments. This will ultimately provide a better experience for your users. If there is always enough capacity within your environment, it's unlikely the end user will experience performance problems which may prevent them from using your services again.

Cost reduction. With the ability to automatically reduce the amount of resources you have when demand drops, you will stop paying for those resources. You only pay for an EC2 resource when it's up and running. When you couple auto scaling with elastic load balancing, you get a real sense of just how scalable and flexible an architecture for your EC2 Compute instances can be.

When I discussed the elastic load balancer, I mentioned during its creation that there was a section, which can also be configured afterwards, that asked you to add your EC2 instances. However, if we are using auto scaling, does this mean that we would need to add any additional instances to the ELB as they are created? Logically you would think so. However, auto scaling and ELB have been integrated to work together.

Once you have your auto scaling group configured, you can attach your ELB to the group. When done so, your ELB will automatically add any EC2 instances to the ELB as the group grows in size. Likewise, they would also be removed from the ELB when the group scales back in again. Let's take a look at how auto scaling is configured.
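Before moving on, the integration just described can be pictured with a small model. The class names, method names, and instance IDs below are hypothetical, and removing the oldest instance first on scale in is an assumption made to keep the example simple.

```python
class LoadBalancer:
    """Minimal stand-in for an ELB that only tracks registered instances."""

    def __init__(self):
        self.registered = []

    def register(self, instance):
        self.registered.append(instance)

    def deregister(self, instance):
        self.registered.remove(instance)


class AutoScalingGroup:
    """Toy group that keeps an attached load balancer in sync as it scales."""

    def __init__(self, load_balancer=None):
        self.instances = []
        self.load_balancer = load_balancer
        self._counter = 0

    def scale_out(self):
        self._counter += 1
        instance = f"i-{self._counter:04d}"   # made-up instance ID
        self.instances.append(instance)
        if self.load_balancer is not None:
            self.load_balancer.register(instance)
        return instance

    def scale_in(self):
        if not self.instances:
            return None
        instance = self.instances.pop(0)      # oldest first (assumption)
        if self.load_balancer is not None:
            self.load_balancer.deregister(instance)
        return instance


elb = LoadBalancer()
group = AutoScalingGroup(load_balancer=elb)
for _ in range(3):
    group.scale_out()
after_scale_out = list(elb.registered)
group.scale_in()
after_scale_in = list(elb.registered)
print(after_scale_out, after_scale_in)
```

Nothing in this example adds instances to the load balancer by hand. The group does it as a side effect of scaling, which is the point of attaching the ELB to the group.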

There are two steps to configuring auto scaling.

  1. The first is the creation of a launch configuration
  2. and the second part is the creation of the auto scaling group

The launch configuration is simply a template that the auto scaling group uses to launch new instances. This launch configuration answers a number of questions required when launching a new instance, such as which AMI to use, which instance type to use, if you'd like to use spot instances to help lower costs, if and when public IP addresses should be used for your instances, is there any user data required for automatic scripting on first boot, what storage volume configuration should be used, and which security groups should be used. Most of these steps you'll be familiar with from when you are creating an EC2 instance. It's much the same.
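One way to picture the launch configuration is as an immutable record of answers to those questions. The sketch below is an assumption-laden illustration: the class, the field names, the default values, and the placeholder IDs (ami-EXAMPLE, sg-EXAMPLE) are all invented for the example and do not come from any AWS API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LaunchConfiguration:
    """Hypothetical template holding the answers needed to launch an instance."""

    name: str
    ami_id: str
    instance_type: str
    security_groups: Tuple[str, ...] = ()
    associate_public_ip: bool = False
    spot_price: Optional[str] = None   # set only if spot instances are wanted
    user_data: str = ""                # script to run on first boot
    volume_size_gb: int = 8            # illustrative default


def launch_instances(config, count):
    """Create `count` instance descriptions that all follow the same template."""
    return [
        {"ami": config.ami_id, "type": config.instance_type,
         "launched_from": config.name}
        for _ in range(count)
    ]


ca_demo = LaunchConfiguration(
    name="CA Demo",
    ami_id="ami-EXAMPLE",              # placeholder, not a real AMI ID
    instance_type="t2.micro",
    security_groups=("sg-EXAMPLE",),   # placeholder security group
)
fleet = launch_instances(ca_demo, 2)
print(fleet)
```

Every instance in the fleet shares the same AMI and instance type because they all come from the one template.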

Without this launch configuration template, auto scaling would not know which instance it was launching and to which configuration. So once you have configured the launch configuration, what does the auto scaling group do?

Well, the auto scaling group defines the desired capacity and other limitations of the group using scaling policies, and where the group should scale resources to, such as which AZ. Let's look at these details a bit further in another demonstration. During this demonstration we will perform the following steps. Create an auto scaling group based on a previous launch configuration, set up an auto scaling policy defining when to increase and decrease the group size, and attach an existing ELB to the auto scaling group.

Okay, so I'm at the AWS management console and I've clicked on Services, and now I'll need to go down to the Compute section and select EC2. Now from here, it'll bring up the EC2 dashboard, and along the left hand side, we can see our load balancers and our auto scaling configuration.

Now if we look at our launch configurations, we can see that we already have one here called CA Demo, and we can look at the details down here. We can see which AMI it's built upon and we can also see the instance type as well, it's a t2.micro. So any instances launched from this launch configuration will launch as a t2.micro using this AMI here. So we're going to create an auto scaling group based on this launch configuration. And then we'll connect a load balancer to that. And we should have a load balancer set up. Just go into Load Balancers, and you can see I just have this here, load balancer demo. So that's all set up and running. You can see which availability zones it's in, etc. And so we'll attach that to our auto scaling group.

Okay, so if we go to our Auto Scaling Groups... We'll click on the blue button, Create Auto Scaling group. Now we can create a new launch configuration or create an auto scaling group from an existing launch configuration. Now this is the option we're going to select 'cause we already have an existing launch configuration like I just showed you.

So we're going to select the CA Demo. Click on Next Step. Now I need to give this auto scaling group a name. Let's call it auto scaling demo. And then we need to tell it how many instances we want to start with. Let's say two. And we select the VPC. We'll just select our default VPC for now. And then select any subnets that you want the instances to be deployed in. Let's say A and B. Now if we go down to Advanced Details, here we can actually select our load balancer to associate to the auto scaling group. Click on this box, receive traffic from one or more load balancers. We can then select our load balancer that I showed you earlier. So by selecting that load balancer we have now associated that ELB with this auto scaling group.

Now we can change the health check type to either the elastic load balancer or the EC2 instances. And there's a health check grace period here of 300 seconds, which is the length of time that auto scaling waits before checking the instance's health. This grace period begins when an instance comes into service. If you don't put any value in there, then the default value of 300 seconds will be used.
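The grace period amounts to a simple time comparison. The helper below is a hypothetical sketch, with the function name and the use of timestamps chosen for the example. The 300 second default is the value shown in the console.

```python
from datetime import datetime, timedelta


def should_check_health(in_service_since, now, grace_period_seconds=300):
    """Return True once the grace period since coming into service has elapsed."""
    return now - in_service_since >= timedelta(seconds=grace_period_seconds)


came_into_service = datetime(2024, 1, 1, 12, 0, 0)   # arbitrary example time
too_early = should_check_health(
    came_into_service, came_into_service + timedelta(seconds=120)
)
late_enough = should_check_health(
    came_into_service, came_into_service + timedelta(seconds=300)
)
print(too_early, late_enough)
```

During the first 300 seconds the instance is given time to boot and start its application before a failed check can count against it.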

Okay, so following on from there, if we go down to Configure Scaling Policies. And now we can set different policies depending on how we want the auto scaling group to scale. Now we can keep this group at its initial size and just leave it as that, or we can use scaling policies to adjust the capacity of the group. Which is what we want to do.

We want to increase the amount of instances if the CPU usage gets to a certain percentage, and then decrease the amount of instances when the CPU usage comes down again as well. So this is how we set up the scaling policies. So looking at the Increase Group Size. So we want to execute this policy when a specific alarm is triggered. So let's add a new alarm to be triggered.

We click on Create Topic. Send a notification to increase size topic. And we can send that to myself. I'll get an email notification to say that it's been triggered. Whenever the average CPU utilization is greater than or equal to 75%. For at least one consecutive period of five minutes. So what that's saying is if there's a period of five minutes where the average CPU utilization is greater than or equal to 75%, then trigger this alarm. So let's create that alarm.

So now I need to specify what action to take when this policy is triggered. So we want to add one instance when the CPU utilization reaches 75% or above. Now we need to do the same to decrease the group size. So let's add a new alarm. Create a new topic. And we'll call this decrease. Send that to myself. So we want this to trigger whenever the average CPU utilization is less than or equal to 40% for at least one consecutive period of five minutes. We want to start removing EC2 instances. So this is the alarm to decrease the group size. So let's create that alarm. And the action here is to remove one instance when the CPU utilization is less than or equal to 40%.

So that's our scaling policies defined. So when the average CPU utilization reaches 75% or above, we will add a new instance. And when the average CPU utilization gets to 40% or below, we'll then remove an instance. Now we'll go to Configure Notifications.
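The two alarms and their actions can be sketched as follows. This is an illustration rather than how the alarm service evaluates metrics: the function names are hypothetical, each inner list stands for the CPU readings collected during one five-minute period, and the floor of zero instances is an assumption added so the group size never goes negative.

```python
from statistics import mean


def alarm_fires(periods, threshold, comparison, consecutive=1):
    """Check whether the average CPU of the last `consecutive` periods breaches
    the threshold. Each period is a non-empty list of CPU percentages."""
    if len(periods) < consecutive:
        return False
    recent = periods[-consecutive:]
    if comparison == ">=":
        return all(mean(period) >= threshold for period in recent)
    if comparison == "<=":
        return all(mean(period) <= threshold for period in recent)
    raise ValueError(f"unknown comparison: {comparison}")


def apply_policies(periods, group_size):
    """Apply the increase and decrease policies from the demonstration."""
    if alarm_fires(periods, 75, ">="):
        return group_size + 1              # increase group size by one
    if alarm_fires(periods, 40, "<="):
        return max(group_size - 1, 0)      # decrease by one (floor is an assumption)
    return group_size


print(apply_policies([[70, 80, 90]], 2))   # average of 80 over the period
print(apply_policies([[30, 35, 40]], 2))   # average of 35 over the period
print(apply_policies([[50, 60]], 2))       # average of 55 over the period
```

Because the alarm looks at the average over the whole period, a brief spike above 75% does not add an instance on its own.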

So here we can look at the notifications for the topics that we just set up. So we've got our increase topic here, so I can get a notification whenever an instance is launched, terminated, or failed to launch or failed to terminate. And we can do the same for the decrease topic here as well. So I'm just going to leave that as default.

Next we can go to our tags. Just give this a name, just call this demo. Go to review. And we can just review our settings here. So we can see we've got the group name, the group size set as two EC2 instances initially, we specified the subnets, and we have our load balancer attached as well. So we have our scaling policy set up, when to increase and when to decrease. And we also have our notifications and tags configured. And that's it.

So if we create our auto scaling group... And that should now automatically create two EC2 instances for us. So if we go to our EC2 instances and take a look... We can see that it's fired up two EC2 instances from the auto scaling group that we just created, which has used the launch configuration to dictate which instance to run.

So if we look at this instance, we can see that it's a t2.micro. And if we look at this instance, we can also see that it's a t2.micro as well. If we look at our launch configuration, we can see that that was an instance type of a t2.micro. So let's go back to our instances. And we can see here that it's split instances between the two availability zones. Now if we go down to our auto scaling group that we've just configured... We can see a lot of details about the auto scaling group.

If we look at the activity history, we can see where it launched the two instances that we wanted as a minimum. We can see our scaling policies that we created earlier here. We can look at the instances, we can see that they're in service. And which availability zone that they're in and that they're healthy. Look at the monitoring, we can see a number of metrics for the instances there. Notifications, tags, and scheduled actions. And that's how you create an auto scaling group based off an existing launch configuration and how you attach an existing elastic load balancer to it as well.

Hopefully you can now see the benefit of combining ELB and auto scaling to help manage and automatically scale your EC2 Compute resources, both in and out. They are both very effective and useful compute features of AWS.

If you'd like some hands-on experience then we do offer free labs, which cover the topics discussed in this lecture. So feel free to take a look. Creating your first elastic load balancer, creating your first auto scaling group, and launching an auto scaling group behind a classic load balancer.

That brings us to the end of this lecture. Coming up next, I'm going to be discussing a different compute service, the Amazon EC2 Container Service.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.