Using Elastic Load Balancers and EC2 Auto Scaling to Support AWS Workloads

Combining Elastic Load Balancers with EC2 Auto Scaling helps to manage and control your AWS workloads. This combination supports the demands put upon your infrastructure, while minimizing performance degradation. With this in mind, engineers and solution architects should have a deep understanding of how to implement these features.

In this article, we’ll cover the basics of Elastic Load Balancers and EC2 Auto Scaling. To dive deeper and learn how to implement and configure load balancing and auto scaling to build a scalable, flexible architecture, check out my newest course: Using Elastic Load Balancing and EC2 Auto Scaling.


Elastic Load Balancers

The main function of an Elastic Load Balancer, commonly referred to as an ELB, is to manage the flow of inbound requests by distributing them evenly across a group of targets. These targets could be a fleet of EC2 instances, AWS Lambda functions, a range of IP addresses, or even containers. The targets defined within the ELB can be spread across different availability zones (AZs) for additional resilience, or all placed within a single AZ.

Let’s look at a typical scenario. Suppose you have just created a new application that resides on a single EC2 instance within your environment and is accessed by a number of users. At this stage, your architecture can be logically summarized as shown below.

Users -> EC2 Instance Diagram

If you are familiar with architectural design and best practices, you will realize that a single-instance approach isn’t ideal, although it would certainly work and provide a service to your users. This layout brings some challenges. For example, the one instance where your application is located can fail, perhaps from a hardware or software fault. If that happens, your application will be down and unavailable to your users.

Also, if you experience a sudden spike in traffic, your instance may not be able to handle the additional load because of its performance limitations. To strengthen your infrastructure and address both challenges, unpredictable traffic spikes and the lack of high availability, you should introduce an Elastic Load Balancer and additional instances running your application into the design as shown below.

Users -> Elastic Load Balancer -> EC2 Instances Diagram

As you can see in this design, the AWS Elastic Load Balancer acts as the entry point for incoming traffic from users and distributes that traffic evenly across a greater number of instances. By default, the ELB is highly available since it is an AWS managed service, which works to ensure resilience so we don’t have to. Although it might seem that the ELB is a single point of failure, it is in fact made up of multiple instances managed by AWS. Also, in this scenario we now have three instances running our application.

Now let me revisit the challenges we discussed previously. If any of these three instances fails, the ELB will automatically detect the failure through its health checks and divert traffic to the remaining two healthy instances. Also, if you experience a surge in traffic, the additional instances running your application will help absorb the extra load.
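To make the failover behavior concrete, here is a minimal sketch in Python of a round-robin balancer that skips unhealthy targets. It is a toy model, not how AWS implements the ELB: the SimpleLoadBalancer class, its methods, and the instance names (i-a, i-b, i-c) are all hypothetical and exist only for illustration.

```python
class SimpleLoadBalancer:
    """Toy round-robin balancer that skips targets marked unhealthy.

    Illustrative only: this is not the ELB's real algorithm or API.
    """

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0  # index of the next target to try

    def mark_unhealthy(self, target):
        self.healthy.discard(target)

    def mark_healthy(self, target):
        if target in self.targets:
            self.healthy.add(target)

    def route(self):
        """Return the next healthy target in round-robin order."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets available")


# Three instances share the traffic evenly.
lb = SimpleLoadBalancer(["i-a", "i-b", "i-c"])
before = [lb.route() for _ in range(6)]

# One instance fails; traffic is diverted to the remaining two.
lb.mark_unhealthy("i-b")
after = [lb.route() for _ in range(4)]

print(before)
print(after)
```

While all three instances are healthy, requests rotate across all of them. Once i-b is marked unhealthy, it simply drops out of the rotation and the other two instances receive its share.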

One of the many advantages of using an ELB is that it is managed by AWS and is, by definition, elastic. This means that it automatically scales up and down to match your incoming traffic. If you are a system administrator or a DevOps engineer running your own load balancer, you need to worry about scaling it and keeping it highly available yourself. With an AWS ELB, you can create your load balancer and enable dynamic scaling with just a few clicks.

Depending on your traffic distribution requirements, there are three AWS Elastic Load Balancers available: 

  1. First, the Application Load Balancer: This provides a flexible feature set for your web applications running the HTTP or HTTPS protocols. The application load balancer operates at the request level. It also provides advanced routing, TLS termination, and visibility features targeted at application architectures, allowing you to route traffic to different ports on the same EC2 instance.
  2. Next, there is the Network Load Balancer: This is used for ultra-high performance for your application, while maintaining very low latencies at the same time. It operates at the connection level, routing traffic to targets within your VPC. It’s also capable of handling millions of requests per second.
  3. Finally, the Classic Load Balancer: This is primarily used for applications that were built in the existing EC2-Classic environment, and it operates at both the connection and request level.

Now let me talk a little about the components of an AWS Elastic Load Balancer and some of the principles behind them.

Listeners: For every load balancer, regardless of the type used, you must configure at least one listener. The listener defines how your inbound connections are routed to your target groups based on ports and protocols set as conditions. The configurations of the listener itself differ slightly depending on which ELB you have selected.

Target Groups: A target group is simply a group of your resources that you want your ELB to route requests to, such as a fleet of EC2 instances. You can configure the ELB with a number of different target groups, each associated with a different listener configuration and associated rules. This enables you to route traffic to different resources based upon the type of request.

Rules: Rules are associated with each listener that you have configured within your ELB, and they define which target group an incoming request is routed to.

Elastic Load Balancer Components

As you can see, your ELB can contain one or more listeners, each listener can contain one or more rules, each rule can contain more than one condition, and all of the conditions in a rule together lead to a single action. An example rule could look as follows, where the IF statement represents the conditions and the THEN statement is the action taken if all the conditions are met.

If/then statement

Depending on which listener received the request, the ELB evaluates that listener’s rules in priority order, each rule containing its own conditions and action. In this example, if the request comes from within the network range (condition 1) and is an HTTP PUT request (condition 2), then the request is sent to the target group entitled ‘Group1’ (action).
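The IF/THEN logic can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a hypothetical network range of 10.0.0.0/16 and a hypothetical fallback group called DefaultGroup; the Request and Rule classes and the choose_target_group function are invented for this example and are not part of any AWS SDK.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, List


@dataclass
class Request:
    source_ip: str
    method: str


@dataclass
class Rule:
    priority: int                                 # lower number is evaluated first
    conditions: List[Callable[[Request], bool]]   # ALL must be true
    target_group: str                             # the single action: forward here


def choose_target_group(rules, request, default):
    """Return the target group of the first rule whose conditions all match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if all(condition(request) for condition in rule.conditions):
            return rule.target_group
    return default


# Hypothetical network range, chosen only for this example.
NETWORK_RANGE = ip_network("10.0.0.0/16")

rules = [
    Rule(
        priority=1,
        conditions=[
            lambda r: ip_address(r.source_ip) in NETWORK_RANGE,  # condition 1
            lambda r: r.method == "PUT",                         # condition 2
        ],
        target_group="Group1",
    ),
]

matched = choose_target_group(rules, Request("10.0.1.25", "PUT"), "DefaultGroup")
wrong_method = choose_target_group(rules, Request("10.0.1.25", "GET"), "DefaultGroup")
wrong_source = choose_target_group(rules, Request("192.168.1.5", "PUT"), "DefaultGroup")

print(matched, wrong_method, wrong_source)
```

Only the first request satisfies both conditions, so only that request is forwarded to Group1. The other two fall through to the default group.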

Health Checks: The ELB performs a health check against each of the resources defined within the target group. These health checks allow the ELB to contact each target using a specific protocol and wait for a response. If no response is received within the set thresholds, the ELB marks the target as unhealthy and stops sending traffic to it.
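One way to picture these thresholds is as a count of consecutive check results. The sketch below assumes a target becomes unhealthy after 2 consecutive failed checks and healthy again after 3 consecutive successful ones. Those numbers, the TargetHealth class, and its record method are assumptions made for illustration rather than AWS defaults or API calls.

```python
class TargetHealth:
    """Tracks one target's state from a stream of health check results."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.state = "healthy"
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one health check result and return the resulting state."""
        if self.state == "healthy":
            # Count consecutive failures; any success resets the count.
            self._streak = 0 if check_passed else self._streak + 1
            if self._streak >= self.unhealthy_threshold:
                self.state, self._streak = "unhealthy", 0
        else:
            # Count consecutive successes; any failure resets the count.
            self._streak = self._streak + 1 if check_passed else 0
            if self._streak >= self.healthy_threshold:
                self.state, self._streak = "healthy", 0
        return self.state


target = TargetHealth()
checks = [True, False, True, False, False, True, True, True]
states = [target.record(result) for result in checks]
print(states)
```

A single failed check is not enough to take the target out of service. It takes two failures in a row, and the target only returns to service after three successful checks in a row.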

Internal or Internet-Facing ELBs: There are two different schemes that can be used for your Elastic Load Balancers, either internal or internet-facing.  

  • Internet-Facing: As the name implies, the nodes of an ELB that is defined as internet-facing are accessible via the internet, so they have a public DNS name that resolves to a public IP address, in addition to an internal IP address. This allows the ELB to serve incoming requests from the internet before distributing and routing the traffic to your target groups, which in this instance could be a fleet of web servers receiving HTTP or HTTPS requests. When your internet-facing ELB communicates with its target group, it only uses the internal IP address, meaning that your targets do not need public IP addresses.
  • Internal ELB: An internal ELB only has an internal IP address. This means that it can only serve requests that originate from within your VPC itself. For example, you might have an internal load balancer sitting between your web servers in the public subnet and your application servers in a private subnet.

ELB Nodes: During the creation process of your ELB, you are required to define which availability zones you’d like it to operate within. For each AZ selected, an ELB node is placed within that AZ. As a result, you need to ensure that an ELB node is associated with every AZ that you want to route traffic to. Without the AZ associated, the ELB will not be able to route traffic to any targets within that AZ, even if they are defined within the target group. This is because the ELB uses its nodes to distribute traffic to your target groups.
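The effect of leaving an AZ out can be shown with a small Python sketch. The routable_targets function, the AZ names, and the instance names are hypothetical and are used purely to illustrate the point.

```python
def routable_targets(target_azs, enabled_azs):
    """Return the targets that sit in an AZ where the ELB has a node.

    target_azs maps each registered target to its AZ.
    enabled_azs is the set of AZs selected for the ELB.
    """
    return [target for target, az in target_azs.items() if az in enabled_azs]


# All three instances are registered in the target group...
registered = {"i-a": "az-1", "i-b": "az-2", "i-c": "az-3"}

# ...but the ELB only has nodes in two of the AZs.
reachable = routable_targets(registered, enabled_azs={"az-1", "az-2"})
print(reachable)
```

Here i-c is registered in the target group but never receives traffic, because no ELB node exists in its AZ.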

Cross-Zone Load Balancing: Depending on which ELB option you select, you may have the option of enabling and implementing cross-zone load balancing within your environment. 

Let’s presume you have two availability zones activated for your ELB, with each ELB node receiving an equal amount of traffic. One AZ has six targets and the other has four, as shown below. When cross-zone load balancing is disabled, each ELB node distributes its traffic to the targets within its own AZ only. As we can see from the image, this results in an uneven distribution of traffic for each target across the availability zones.

With cross-zone load balancing enabled, regardless of how many targets are in an associated AZ, the ELB nodes distribute all incoming traffic evenly between all targets, ensuring that each target across the AZs receives an equal share.

Cross-Zone Elastic Load Balancer
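The numbers behind this example are easy to work out. The short Python sketch below assumes, as in the scenario above, that each ELB node receives an equal share of the incoming traffic and that every AZ contains at least one target. The traffic_share_per_target function and the AZ names are hypothetical.

```python
def traffic_share_per_target(targets_per_az, cross_zone):
    """Return, per AZ, the fraction of total traffic each target receives.

    Assumes every ELB node receives an equal share of the incoming traffic
    and that every AZ has at least one target.
    """
    az_count = len(targets_per_az)
    total_targets = sum(targets_per_az.values())
    if cross_zone:
        # Every node spreads its traffic over all targets in all AZs.
        return {az: 1 / total_targets for az in targets_per_az}
    # Each node spreads its share over the targets in its own AZ only.
    return {az: (1 / az_count) / count for az, count in targets_per_az.items()}


azs = {"az-1": 6, "az-2": 4}
disabled = traffic_share_per_target(azs, cross_zone=False)
enabled = traffic_share_per_target(azs, cross_zone=True)

for az in azs:
    print(f"{az}: {disabled[az]:.2%} per target disabled, {enabled[az]:.2%} enabled")
```

With cross-zone load balancing disabled, each of the six targets handles about 8.33% of the total traffic (50% divided by 6), while each of the four targets handles 12.5% (50% divided by 4). With it enabled, all ten targets handle 10% each.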

EC2 Auto Scaling

So what exactly is EC2 Auto Scaling? Put simply, auto scaling is a mechanism that allows you to automatically increase or decrease your EC2 resources to meet demand, based on custom-defined metrics and thresholds.

Let’s look at an example of how EC2 Auto Scaling can be used in practice. Let’s say you have a single EC2 instance acting as a web server, receiving requests from public users across the internet. As the requests (the demand) increase, so does the load on the instance, and additional processing power is required to handle them; therefore, the CPU utilization also increases. To avoid running out of CPU resource on your instance, which would lead to poor performance for your end users, you would need to deploy another EC2 instance to share the load and process the increased requests.

With auto scaling, you can configure a threshold to automatically launch a second instance when the CPU utilization of the first instance reaches 75%. Because traffic is then load balanced evenly across both instances, the demand on each instance drops, which reduces the chance of the first web server failing or slowing due to high CPU usage. Similarly, when the demand on your web server falls, so does your CPU utilization, so you can also set a threshold to scale back. In this example, you could configure auto scaling to automatically terminate one of your EC2 instances when the CPU utilization drops to 20%, as it would no longer be required due to the decreased demand. Scaling your resources back helps you to optimize the cost of your EC2 fleet, as you only pay for resources when they are running.
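The decision logic in this example can be sketched as a simple function in Python. This is a simplified model rather than how EC2 Auto Scaling policies are actually defined: the desired_capacity function is hypothetical, the 75% and 20% thresholds come from the example above, and the minimum and maximum fleet sizes of 1 and 4 are assumptions added so the fleet cannot shrink to nothing or grow without limit.

```python
def desired_capacity(current, avg_cpu, min_size=1, max_size=4,
                     scale_out_at=75.0, scale_in_at=20.0):
    """Return the instance count after applying one scaling decision.

    current  -- number of instances running now
    avg_cpu  -- average CPU utilization across the fleet, in percent
    """
    if avg_cpu >= scale_out_at:
        return min(current + 1, max_size)   # scale out, but respect the maximum
    if avg_cpu <= scale_in_at:
        return max(current - 1, min_size)   # scale in, but respect the minimum
    return current                          # within thresholds: no change


# Walk through a series of CPU readings, starting from a single instance.
capacity = 1
history = []
for cpu in [40.0, 80.0, 60.0, 15.0]:
    capacity = desired_capacity(capacity, cpu)
    history.append(capacity)

print(history)
```

The fleet stays at one instance at 40% CPU, grows to two at 80%, holds at two at 60%, and shrinks back to one when utilization falls to 15%.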

Through these customizable and defined metrics, you can increase (scale out) and decrease (scale in) the size of your EC2 fleet automatically with ease. This has many advantages, and here are some of the key points:

  • Automation – Auto scaling provisions resources automatically based on custom-defined thresholds. Your infrastructure can elastically provision the required resources, so your operations team does not have to manually deploy and remove resources to meet the demands put upon your infrastructure.
  • Greater customer satisfaction – If you are always able to provision enough capacity within your environment when demand increases, it’s unlikely your end users will experience performance issues, which helps with user retention.
  • Cost reduction – With the ability to automatically reduce the amount of resources you have when demand drops, you stop paying for those resources. You only pay for an EC2 resource when it’s up and running, which is billed on a per-second basis.

When you couple EC2 Auto Scaling with an Elastic Load Balancer, you get a real sense of how beneficial a scalable and flexible architecture can be for your resources.

In my new course, Using Elastic Load Balancing and EC2 Auto Scaling to Support AWS Workloads, you’ll discover:

  • The differences between the Elastic Load Balancers available in AWS, these being ALBs, NLBs and Classic.
  • How ELBs handle different types of requests, including those that are encrypted.
  • The different components of ELBs and what they are used for.
  • How to configure each of the three ELBs.
  • When and why you might need to configure SSL/TLS certificates.
  • How to configure auto scaling launch configurations, launch templates, and auto scaling groups.
  • Why you should use ELBs and auto scaling together within your infrastructure.

If you have any experience using ELBs with EC2 Auto Scaling, feel free to drop a comment below. We’re always looking for great tips to add value to our community!

Cloud Academy