Combining Elastic Load Balancers with EC2 Auto Scaling helps to manage and control your AWS workloads. This combination supports the demands put upon your infrastructure, while minimizing performance degradation. With this in mind, engineers and solution architects should have a deep understanding of how to implement these features.
In this article, we’ll cover the basics about Elastic Load Balancers and EC2 Auto Scaling. To dive deeper and discover how to implement and configure load balancing and auto scaling to build a scalable, flexible architecture, check out my newest course: Using Elastic Load Balancing and EC2 Auto Scaling.
The main function of an Elastic Load Balancer, commonly referred to as an ELB, is to help manage and control the flow of inbound requests to a group of targets by distributing these requests evenly across the targeted resource group. These targets could be a fleet of EC2 instances, AWS Lambda functions, a range of IP addresses, or even containers. The targets defined within the ELB could be situated across different availability zones (AZs) for additional resilience or all placed within a single AZ.
Let’s look at this from a typical scenario. For example, let’s suppose you just created a new application currently residing on a single EC2 instance within your environment which is being accessed by a number of users. At this stage, your architecture can be logically summarized as shown below.
If you are familiar with architectural design and best practices, you will realize that a single-instance approach isn’t ideal, even though it would certainly work and provide a service to your users. This infrastructure layout brings some challenges. For example, the one instance where your application is located can fail, perhaps from a hardware or software fault. If that happens, your application will be down and unavailable to your users.
Also, if you experience a sudden spike in traffic, your instance may not be able to handle the additional load based on its performance limitations. To strengthen your infrastructure and help remediate these challenges, such as the unpredictable traffic spikes and high availability, you should introduce an Elastic Load Balancer and additional instances running your application into the design as shown below.
As you can see in this design, the AWS Elastic Load Balancer acts as the point for receiving incoming traffic from users and evenly distributes the traffic across a greater number of instances. By default, the ELB is highly available since it is an AWS managed service, which works to ensure resilience so we don’t have to. Although it might seem the ELB is a single point of failure, the ELB is in fact comprised of multiple instances managed by AWS. Also, in this scenario we now have three instances running our application.
Now let me revisit the challenges we discussed previously. If any of these three instances fails, the ELB will automatically detect the failure through its health checks and divert traffic to the remaining two healthy instances. Also, if you experience a surge in traffic, the additional instances running your application will help absorb the extra load.
One of the many advantages of using an ELB is the fact that it is managed by AWS and it is, by definition, elastic. This means that it will automatically scale up and down to match your incoming traffic. If you are a system administrator or a DevOps engineer running your own load balancer, you would need to worry about scaling it and keeping it highly available yourself. With an AWS ELB, you can create your load balancer and enable dynamic scaling with just a few clicks.
Depending on your traffic distribution requirements, there are three AWS Elastic Load Balancers available: the Application Load Balancer, the Network Load Balancer, and the Classic Load Balancer.
Now let me talk a little about the components of an AWS Elastic Load Balancer and some of the principles behind them.
Listeners: For every load balancer, regardless of the type used, you must configure at least one listener. The listener defines how your inbound connections are routed to your target groups based on ports and protocols set as conditions. The configurations of the listener itself differ slightly depending on which ELB you have selected.
Target Groups: A target group is simply a group of your resources that you want your ELB to route requests to, such as a fleet of EC2 instances. You can configure the ELB with a number of different target groups, each associated with a different listener configuration and associated rules. This enables you to route traffic to different resources based upon the type of request.
Rules: Rules are associated with each listener that you have configured within your ELB, and they define which target group an incoming request is routed to.
As you can see, your ELB can contain one or more listeners, each listener can contain one or more rules, each rule can contain more than one condition, and when all the conditions in a rule are met, they result in a single action. An example rule could look as follows, where the IF statement represents the conditions and the THEN statement is the action taken if all the conditions are met.
The listener that receives the request determines which rules apply, and those rules are evaluated in priority order, each containing its own conditions and action. In this example, if the request comes from within the 10.0.1.0/24 network range (condition 1) and is an HTTP PUT request (condition 2), then the request is sent to the target group named ‘Group1’ (action).
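To make the IF/THEN structure concrete, here is a minimal Python sketch of how a listener might evaluate prioritized rules. It is only an illustration of the logic described above, not the ELB API: the `Request`, `Rule`, and `route` names are hypothetical, and the only values taken from the article are the 10.0.1.0/24 range, the PUT method, and the ‘Group1’ target group.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for an inbound request seen by a listener."""
    source_ip: str
    method: str


@dataclass
class Rule:
    """Hypothetical rule: every condition must hold for the action to apply."""
    priority: int
    conditions: List[Callable[[Request], bool]]
    target_group: str  # the action: forward to this target group


def route(request: Request, rules: List[Rule],
          default_group: Optional[str] = None) -> Optional[str]:
    """Return the target group of the first matching rule, by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if all(condition(request) for condition in rule.conditions):
            return rule.target_group
    return default_group


rules = [
    Rule(
        priority=1,
        conditions=[
            # Condition 1: source address is inside 10.0.1.0/24
            lambda r: ip_address(r.source_ip) in ip_network("10.0.1.0/24"),
            # Condition 2: the request is an HTTP PUT
            lambda r: r.method == "PUT",
        ],
        target_group="Group1",
    ),
]

matched = route(Request("10.0.1.25", "PUT"), rules)
unmatched = route(Request("10.0.2.25", "PUT"), rules)
```

Here `matched` is `"Group1"` because both conditions hold, while `unmatched` falls through to the default (here `None`) because the source address is outside the range.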
Health Checks: The ELB associates a health check that is performed against the resources defined within the target group. These health checks allow the ELB to contact each target using a specific protocol to receive a response. If no response is received within set thresholds, then the ELB will mark the target as unhealthy and stop sending traffic to the target.
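The threshold behavior can be sketched in a few lines of Python. This is a simplified model rather than the ELB implementation: the `TargetHealth` class and its default threshold values are assumptions chosen for illustration, and the recovery path (marking a target healthy again after several consecutive successful checks) is included as an assumption since it is not described above.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Hypothetical tracker for one target's health check results."""
    unhealthy_threshold: int = 2   # assumed: consecutive failures before unhealthy
    healthy_threshold: int = 3     # assumed: consecutive successes before healthy
    healthy: bool = True
    failures: int = 0
    successes: int = 0

    def record(self, responded: bool) -> bool:
        """Record one health check result and return the current state."""
        if responded:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


target = TargetHealth()
# Two missed checks in a row, then three successful ones.
states = [target.record(ok) for ok in (False, False, True, True, True)]
```

With these assumed thresholds, the target stays in service after the first missed check, is marked unhealthy after the second, and only receives traffic again after three consecutive successful checks.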
Internal or Internet-Facing ELBs: There are two different schemes that can be used for your Elastic Load Balancers, either internal or internet-facing.
ELB Nodes: During the creation process of your ELBs, you are required to define which availability zones you’d like your ELB to operate within. For each AZ selected, an ELB node will be placed within that AZ. As a result, you need to ensure that you have an ELB node in every AZ to which you want to route traffic. Without the AZ associated, the ELB will not be able to route traffic to any targets within that AZ, even if they are defined within the target group. This is because the nodes are used by the ELB to distribute traffic to your target groups.
Cross-Zone Load Balancing: Depending on which ELB option you select, you may have the option of enabling and implementing cross-zone load balancing within your environment.
Let’s presume you have two availability zones activated for your ELB, with the ELB node in each AZ receiving an equal amount of traffic. One AZ has six targets and the other has four. When cross-zone load balancing is disabled, each ELB node distributes its traffic only to the targets within its own AZ. This results in an uneven distribution of traffic per target across the availability zones: each of the six targets receives roughly 8.3% of the total traffic, while each of the four targets receives 12.5%.
With cross-zone load balancing enabled, regardless of how many targets are in each AZ, the ELB nodes distribute all incoming traffic evenly between all targets, so each target across the AZs receives an equal share (10% each in this example).
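The arithmetic behind the two cases can be checked with a short Python sketch. It assumes, as in the scenario above, that every AZ’s node receives an equal share of the incoming traffic; the `per_target_share` function and the AZ names are hypothetical.

```python
from typing import Dict


def per_target_share(targets_per_az: Dict[str, int],
                     cross_zone: bool) -> Dict[str, float]:
    """Fraction of total traffic each target receives, keyed by AZ.

    Assumes each AZ's node receives an equal slice of incoming traffic
    and that every AZ has at least one target.
    """
    az_count = len(targets_per_az)
    total_targets = sum(targets_per_az.values())
    shares = {}
    for az, target_count in targets_per_az.items():
        if cross_zone:
            # Every node spreads its traffic over all targets in all AZs.
            shares[az] = 1 / total_targets
        else:
            # Each node only spreads its slice over its own AZ's targets.
            shares[az] = (1 / az_count) / target_count
    return shares


targets = {"az-a": 6, "az-b": 4}
disabled = per_target_share(targets, cross_zone=False)
enabled = per_target_share(targets, cross_zone=True)
```

With cross-zone load balancing disabled, targets in the six-target AZ each receive about 8.3% and those in the four-target AZ receive 12.5%; with it enabled, all ten targets receive 10%.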
So what exactly is EC2 Auto Scaling? Put simply, auto scaling is a mechanism that allows you to automatically increase or decrease your EC2 resources to meet demand, based on custom-defined metrics and thresholds.
Let’s look at an example of how EC2 Auto Scaling can be used in practice. Let’s say you have a single EC2 instance acting as a web server, receiving requests from public users across the internet. As the requests (the demand) increase, so does the load on the instance: additional processing power is required to handle the extra requests, so CPU utilization also increases. To avoid running out of CPU resources on your instance, which would lead to poor performance for your end users, you would need to deploy another EC2 instance to share the load and process the increased requests.
With auto scaling, you can configure a metric to automatically launch a second instance when the CPU utilization of the first instance reaches 75%. Load balancing traffic evenly across both instances reduces the demand on each one and reduces the chance of the first web server failing or slowing due to high CPU usage. Similarly, when the demand on your web server falls, so does your CPU utilization, so you can also set a metric to scale back. In this example, you could configure auto scaling to automatically terminate one of your EC2 instances when the CPU utilization drops to 20%, as it would no longer be required. Scaling your resources back helps you optimize the cost of your EC2 fleet, as you only pay for resources when they are running.
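The scale-out and scale-in decision in this example can be modeled as a simple function. This is a sketch of the decision logic only, not the EC2 Auto Scaling API: the `desired_capacity` function is hypothetical, the 75% and 20% thresholds come from the example above, and the minimum and maximum group sizes are assumed values added so the fleet cannot shrink to zero or grow without bound.

```python
def desired_capacity(current: int, cpu_percent: float,
                     scale_out_at: float = 75.0, scale_in_at: float = 20.0,
                     min_size: int = 1, max_size: int = 4) -> int:
    """Return the instance count after applying one scaling decision.

    min_size and max_size are assumed bounds for illustration.
    """
    if cpu_percent >= scale_out_at:
        # Demand is high: launch one more instance, up to the maximum.
        return min(current + 1, max_size)
    if cpu_percent <= scale_in_at:
        # Demand is low: terminate one instance, down to the minimum.
        return max(current - 1, min_size)
    # Between the thresholds: leave the fleet as it is.
    return current


after_spike = desired_capacity(current=1, cpu_percent=80.0)
after_quiet = desired_capacity(current=2, cpu_percent=15.0)
```

In this sketch, a single instance at 80% CPU leads to a fleet of two, and two instances at 15% CPU scale back to one. Keeping a gap between the two thresholds helps avoid launching and terminating instances in quick succession when utilization hovers near a single cutoff.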
Through these customizable and defined metrics, you can increase (scale out) and decrease (scale in) the size of your EC2 fleet automatically with ease. This has many advantages, and here are some of the key points:
When you couple EC2 Auto Scaling with an Elastic Load Balancer, you get a real sense of how beneficial building a scalable and flexible architecture for your resources can be.
In my new course, Using Elastic Load Balancing and EC2 Auto Scaling to Support AWS Workloads, you’ll discover:
If you have any experience using ELBs with EC2 Auto Scaling, feel free to drop a comment below. We’re always looking for great tips to add value to our community!