Scaling and Resilience
This course will show you how to design and create web applications using the Azure App Service.
By the end of this course, you'll have gained a firm understanding of the key components that comprise the Azure App Service. Ideally, you will achieve the following learning objectives:
- A solid understanding of the foundations of Azure web app development and deployment.
- The use cases of Azure Web Jobs and how to implement them.
- How to design and scale your applications and achieve consistent resiliency.
This course is intended for individuals who wish to pursue the Azure 70-532 certification.
You should have work experience with Azure and general cloud computing knowledge.
This Course Includes
- 1 hour and 35 minutes of high-definition video.
- Expert-led instruction and exploration of important concepts surrounding Azure Web Apps.
What You Will Learn
- How to deploy and configure Azure web apps.
- How to implement Azure Web Jobs.
- Scaling and resilience with Azure web apps.
Hello, and welcome to the session on configuring web app scaling and resilience. In this section, we will discuss the key topics associated with web app scaling and resilience.
Azure web apps provide a variety of ways to scale web applications. You can change the number of virtual machine instances handling requests, known as scaling in and out. Or you can adjust the instance size to increase or decrease the number of CPUs, available memory, storage, et cetera, which is known as scaling up or down.
Azure web apps can also be automatically scaled in or out, meaning that the number of instances hosting the web app can be decreased or increased automatically based on a schedule or a metric. This enables you, for example, to increase the number of instances hosting your web app during busy periods, such as business peak times. You can then decrease the instance count during off-peak periods to reduce costs.
However, what if your busy periods are not predictable? What if you want to be ready for a sudden burst of activity at any time? In these situations, you can scale by metric. You can, for example, say that if the average CPU load is at or above 80 percent for more than 15 minutes, you want to add another instance to the pool. At the end, we will also discuss the Traffic Manager, which can help you direct requests to your web app for failover or performance reasons.
As described earlier, the purpose of auto-scaling by schedule is to allow you to define the number of instances serving your web app during predefined periods. Firstly, your web app will need to be on the standard app service plan or above. Free, shared, and basic plans do not support auto-scaling, though the basic plan does support manual scaling. When setting up a schedule, you can use one of the predefined recurring schedules: separate recurring schedules for day and night, for weekdays and weekends, or both. You can also define your own custom schedule, for example, a set of non-overlapping date ranges.
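To make the schedule idea concrete, here is a minimal sketch that picks an instance count from a day/night and weekday/weekend schedule. The instance counts, the 07:30 to 20:30 day window, and the function name are assumptions chosen for illustration. This is not the Azure API, only the decision logic.

```python
from datetime import datetime, time

# Hypothetical schedule: instance count per (part of week, part of day).
SCHEDULE = {
    ("weekday", "day"): 4,
    ("weekday", "night"): 2,
    ("weekend", "day"): 2,
    ("weekend", "night"): 1,
}

DAY_START = time(7, 30)   # assumed start of the "day" window
DAY_END = time(20, 30)    # assumed end of the "day" window


def instances_for(moment: datetime) -> int:
    """Return the scheduled instance count for the given moment."""
    # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
    week_part = "weekend" if moment.weekday() >= 5 else "weekday"
    day_part = "day" if DAY_START <= moment.time() < DAY_END else "night"
    return SCHEDULE[(week_part, day_part)]
```

Because each moment falls into exactly one of the four slots, the schedule never has to resolve overlapping periods.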
Let's see this in practice now in the Azure portal. I'm here on the classic Azure portal, and I want to show you how you can set up auto-scaling. Setting up auto-scaling by schedule is fairly straightforward. All we need to do is click on set up schedule times, which is the green button that you can see here on the screen. We can scale differently for day and night, and for weekdays and weekends. Let's select both of those, and we can select the times to scale on, so let's say 7:30 and 8:30, prime time, when we need to scale up, and we can click OK. Now we can open the drop-down list and see the options that we've supplied here. So let's set the scale for the weekend. Scaling will happen on the CPU, and we'll say that there will be another CPU instance, and then we'll target a specific threshold, which will be when the CPU usage is between 70 and 80 percent. We click save, and now we've defined an auto-scale policy by schedule.
Auto-scale by metric allows us to scale based on the current environment metrics. These include CPU percentage, memory usage, disk or HTTP queue length, as well as data in and data out. Metric values are aggregates across all of the instances hosting your web app, and they're usually averages. This means that if one instance is at 80 percent CPU and another instance is at 20 percent, it is the average of the two, 50 percent, that will be compared to the threshold. However, this can be configured in the new portal. Each scaling rule you define requires that you choose a metric such as CPU usage, a condition such as greater than, and a threshold such as 80 percent. You can also specify multiple rules using different conditions, metrics, and actions.
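The averaging behaviour can be sketched in a few lines. The function name and the condition labels below are hypothetical; the point is only that the threshold is compared against the average across instances, not against any single instance.

```python
import operator
from statistics import mean

# Hypothetical condition names mapped to comparison functions.
OPERATORS = {"greater_than": operator.gt, "less_than": operator.lt}


def rule_matches(samples, condition, threshold):
    """Average the per-instance samples, then compare to the threshold."""
    return OPERATORS[condition](mean(samples), threshold)


# One instance at 80 percent and one at 20 percent average to 50 percent,
# so a "greater than 80" rule does not match.
print(rule_matches([80, 20], "greater_than", 80))
```

With both instances busy, for example at 90 and 85 percent, the same rule would match.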
The frequency with which scaling events happen is a significant consideration. You don't want to be constantly adding and removing instances. The recommended strategy is to create rules that scale up aggressively, to make sure that you do not hit the resource ceiling and impact the functionality of your web app, and scale down conservatively to ensure stability. During the creation of an auto-scale rule you can specify a cool-down period: a period of time after a scaling event during which further scaling events will not be triggered. This time lets the web app and associated metrics settle down and helps avoid frequent scaling events.
On a related point, it should be noted that scale-up and scale-down thresholds do not need to be contiguous. For example, you can set the scale-up threshold at CPU utilization above 80 percent and the scale-down threshold at 10 percent.
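A gap between the two thresholds leaves a band in which nothing happens, which is what keeps the instance count from flapping. A minimal sketch, using the 80 and 10 percent values from the example (the function name and return labels are made up for illustration):

```python
def scaling_decision(avg_cpu, scale_up_above=80.0, scale_down_below=10.0):
    """Return the action for the given average CPU percentage."""
    if avg_cpu > scale_up_above:
        return "add_instance"
    if avg_cpu < scale_down_below:
        return "remove_instance"
    # Anything between the two thresholds leaves the instance count alone.
    return "no_change"
```

Any average between 10 and 80 percent returns "no_change", so small fluctuations around a single value cannot trigger alternating scale events.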
An example of an auto-scale rule is as follows: based on the CPU percentage metric, when the average CPU utilization is greater than 50 percent over the past 15 minutes, create one more instance and wait one hour before reassessing. During that hour, even if the CPU utilization remains above 50 percent, no further scaling will be performed.
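The example rule, including its one-hour cool-down, can be simulated as follows. This is a sketch under assumptions: the class and field names are hypothetical, time is modelled as a plain minute counter, and the 15-minute average is passed in rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoscaleRule:
    threshold: float = 50.0       # percent CPU, from the example rule
    cooldown_minutes: int = 60    # wait one hour after a scaling event
    instances: int = 1
    last_scaled_at: Optional[int] = None  # minute of the last scaling event

    def evaluate(self, now_minute: int, avg_cpu_last_15_min: float) -> bool:
        """Add one instance if the rule fires and no cool-down is active."""
        in_cooldown = (
            self.last_scaled_at is not None
            and now_minute - self.last_scaled_at < self.cooldown_minutes
        )
        if in_cooldown or avg_cpu_last_15_min <= self.threshold:
            return False
        self.instances += 1
        self.last_scaled_at = now_minute
        return True
```

Evaluating at minute 0 with 70 percent CPU adds an instance. Evaluating again at minute 30 with 90 percent does nothing because the cool-down is still active, and a further evaluation at minute 60 is free to scale again.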
Let's see how we can configure this in the Azure portal now. I'm here in the Azure portal and I've got our application, and I'm going to click on the all settings tab and bring up the settings menu. If we scroll down to app service plan and bring that into view, we can click on the scale out selection under app service plan. Bring that into view a bit more. This drop-down list has three options: an instance count that I enter manually, CPU percentage, and schedule and performance rules. I'm gonna go ahead and take the default that's already selected for us. And we're gonna create a rule based on one of the other available metrics. So I'm gonna click add profile, just down here. It doesn't look like it's selectable, but you can actually click on that. It brings up another panel, and I'm gonna move that into view a bit more. We'll give this profile a name. So let's say "test profile." And we'll leave it on always and we'll just click OK.
Now we can add a rule under this profile. So we're gonna click add rule underneath the test profile. We can now configure the rule. Let's select our resource. So we'll just stick with the standard plan and then we'll select the metric and for this one, we're gonna take memory percentage. Let's input an operator and we'll say greater than so that's nicely selected for us already. And we can say the threshold of say 70. And that can be for a duration of say, five minutes for that one. And it would just take the standard average for this one. So we'll just leave the standard average aggregation. And when our scale rule triggers we want to increase the count just by one. So we take the value there of one and we can also specify the cool down in minutes but we'll leave it at that. We've specified a scaling rule based on the average memory usage on our instance. And we click OK. And we're done.
The instance size determines the processing power, storage, and memory available to your web app. It also defines the features available, such as deployment slots, the maximum number of instances, custom domains, and automated backups, among other things. The choice to scale up or down depends largely on your resource and feature requirements. Higher costs are associated with the more feature-rich and powerful instances, so it's best to analyze the resources and features required by your app.
It's important to note that when you change instance size you are changing the instance size for the app service plan rather than the individual web app. The change affects all web apps under that plan. Let's see in the web portal how we can change the instance size.
Changing the instance size, also known as scaling up or down, is as simple as selecting the pricing tier, as we did when we created our app for the first time. We navigate to the scale up blade and select the new tier that we're after. The change will take a moment to take effect, and that's all we need to do.
In a real-world scenario you may have multiple instances of your web app. You may choose to have these instances or endpoints in different data centers around the globe to provide a fast and reliable experience for your users. Users may be accessing your web app from a variety of locations. When a user sends a request to your web app, you may want that request to be handled by the closest endpoint to that user. You may also want them directed to another instance if that particular endpoint is down or having issues.
The Azure Traffic Manager provides domain name system, or DNS-level, traffic management, which we'll discuss in detail later. The job of the Traffic Manager is to direct users to the most appropriate instance based on a set of configurable rules. The Traffic Manager also performs active endpoint monitoring to detect which endpoints are operational and can handle requests, meaning that the Traffic Manager can avoid sending clients to endpoints which might be down or having issues.
It's important to note that traffic does not flow through the Traffic Manager. The Traffic Manager performs DNS-level redirection. In simple terms, this means that the redirection only happens when the user's client is looking up the IP address it should connect to in order to reach your web app. Once connected, traffic between the client and your web app does not go through the Traffic Manager.
Let's work through an example. The user wishes to access your web app at www.mywebapp.com. The user's browser sends a DNS request for www.mywebapp.com. Your DNS configuration has a CNAME record for the www subdomain of mywebapp.com that directs the user to mywebapp.trafficmanager.net. When the user's browser sends the DNS request for mywebapp.trafficmanager.net, the Traffic Manager handles this request and determines the best endpoint to send the user to by running through the configured rules. In this case, the Traffic Manager will respond with the domain name of the specific endpoint, which in our example is mywebapp-uswest.azurewebsites.net. The user's browser then sends a DNS request for mywebapp-uswest.azurewebsites.net, which will provide it with the IP address it should connect to.
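The lookup chain in this example can be modelled as a small table of DNS records that is followed until an address is reached. This is only a sketch: the record table is static, whereas the real Traffic Manager chooses the second CNAME target at query time, and the IP address is a made-up value from a documentation range.

```python
# Hypothetical DNS records mirroring the walkthrough; the IP address is made up.
RECORDS = {
    "www.mywebapp.com": ("CNAME", "mywebapp.trafficmanager.net"),
    "mywebapp.trafficmanager.net": ("CNAME", "mywebapp-uswest.azurewebsites.net"),
    "mywebapp-uswest.azurewebsites.net": ("A", "203.0.113.10"),
}


def resolve(name, max_hops=10):
    """Follow CNAME records until an A record is found; return (chain, ip)."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = RECORDS[name]
        if record_type == "A":
            return chain, value
        name = value
        chain.append(name)
    raise RuntimeError("too many CNAME hops")
```

Calling resolve("www.mywebapp.com") returns the three names visited, in order, together with the final address. Only DNS answers pass through the Traffic Manager step, not the web traffic itself.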
Let's go through this again with a diagram. This diagram helps us visualize the flow of events. Note that it is simplified for clarity and the domain name to IP address lookups have been removed. The user requests the address of www.mywebapp.com. The DNS server responds with the CNAME record pointing the user to mywebapp.trafficmanager.net. The user queries Traffic Manager for mywebapp.trafficmanager.net. The Traffic Manager evaluates the configuration rules to determine the best endpoint to redirect the user to. Traffic Manager replies with an endpoint domain, mywebapp-uswest.azurewebsites.net. Having resolved the IP address of mywebapp-uswest.azurewebsites.net, the user connects to the web app. A very important note is that once the client resolves the IP address that it should connect to, it will cache that value for some time. The amount of time that the client caches the IP address is based on the time to live, or TTL, which is configurable and provided by the DNS server along with the response. Until the TTL expires, the client will continue to send requests to the endpoint it initially connected to, regardless of changing network or endpoint conditions. This is a key limitation of the Traffic Manager DNS-level approach.
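The effect of TTL caching can be shown with a small client-side cache. This is a sketch under assumptions: the class name is hypothetical, time is passed in as a number of seconds, and the lookup function stands in for the whole DNS and Traffic Manager chain.

```python
class CachingResolver:
    """Caches each answer until its TTL expires, like a DNS client would."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup          # function: name -> answer
        self._ttl = ttl_seconds
        self._cache = {}               # name -> (answer, expires_at)
        self.upstream_queries = 0      # how often the real lookup ran

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]           # still fresh, no upstream query
        self.upstream_queries += 1
        answer = self._lookup(name)
        self._cache[name] = (answer, now + self._ttl)
        return answer
```

If the preferred endpoint changes one second after the first lookup, a client with a 300-second TTL keeps receiving the old answer until second 300, and only then sees the new one.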
The TTL is a balancing act between performance and cost. With a higher TTL, clients won't need to make as many DNS requests, reducing the delay when setting up a connection. The DNS servers, as well as the Traffic Manager, will also have to handle fewer requests, reducing costs. With a lower TTL, clients can be redirected more quickly if required, but the volume of Traffic Manager requests will increase, increasing cost as well as DNS server load.
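A back-of-the-envelope calculation shows the shape of this trade-off. The function below is a rough upper bound under simplifying assumptions: every client is active all day and queries exactly once per TTL, and shared resolver caches (which would lower the real query volume) are ignored. The client count used in the example is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds, active_clients):
    """Upper bound on daily DNS queries and on how long an answer can be stale."""
    queries_per_client = SECONDS_PER_DAY // ttl_seconds
    return {
        "max_queries_per_day": queries_per_client * active_clients,
        "max_stale_seconds": ttl_seconds,
    }
```

For 1,000 clients, a TTL of 300 seconds gives at most 288,000 queries a day with up to five minutes of staleness. Dropping the TTL to 30 seconds cuts the staleness to 30 seconds but raises the bound to 2,880,000 queries.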
Lastly, the Traffic Manager offers several load-balancing methods. Each method can define its own TTL, list of endpoints, and monitoring configuration. The key difference between those methods is how they affect the Traffic Manager's choice of endpoints when responding to the user. The first method is failover. Using this method, the Traffic Manager will return the first endpoint from an ordered list where that endpoint is deemed healthy based on the Traffic Manager's endpoint monitoring. The second method is round robin. As the name suggests, the Traffic Manager returns endpoints from the list, trying to balance the requests out evenly in a round robin fashion. The third method is performance. This is the method that's concerned with selecting the closest endpoint to the client, that is, the endpoint with the lowest latency. To achieve this, the Traffic Manager makes use of Azure data center latency and monitoring data that describes the latency to various parts of the world. The Traffic Manager will then use this data as a lookup to find the data center with the lowest latency to the client. It should be noted that all of these methods have a failover component, as the Traffic Manager will not select an endpoint if that endpoint is detected as being down, regardless of the load balancing method.
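The three methods can be compared side by side with a simplified model. This is a sketch, not the Traffic Manager's actual implementation: the function names, the health dictionary, and the latency figures are all assumptions, and every method skips endpoints marked unhealthy to reflect the shared failover component.

```python
from itertools import count


def failover(endpoints, healthy):
    """Return the first healthy endpoint from an ordered list."""
    for endpoint in endpoints:
        if healthy[endpoint]:
            return endpoint
    return None


def make_round_robin(endpoints):
    """Return a picker that rotates through the healthy endpoints."""
    counter = count()

    def pick(healthy):
        candidates = [e for e in endpoints if healthy[e]]
        if not candidates:
            return None
        return candidates[next(counter) % len(candidates)]

    return pick


def performance(latency_ms, healthy):
    """Return the healthy endpoint with the lowest latency to the client."""
    candidates = {e: ms for e, ms in latency_ms.items() if healthy[e]}
    return min(candidates, key=candidates.get) if candidates else None
```

With endpoints uswest, europe, and asia, and uswest marked as down, failover returns europe, round robin alternates between europe and asia, and performance picks whichever of the two has the lower latency in the lookup table.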
Let's go to the Azure portal and see how we can configure Traffic Manager now. I'm here on the classic portal. Let's have a look at how we can create and configure Traffic Manager for our web app. Okay, let's scroll down on this menu on the left until we can find Traffic Manager. There's Traffic Manager. Select that and we now need to create a Traffic Manager profile. So click on that, and we can create a name for it, so we can say "testcatm" for Traffic Manager. Click create, and we will create that based on the performance load balancing method. Now we can select and then configure our endpoints. Let's add an endpoint and we'll add a web app. For this one, we'll just select one of these. We'll select the new app ca and click OK. We can now see our endpoint. You can see that it's checking our endpoint, making sure that it's functioning. This is the Traffic Manager doing the endpoint monitoring that lets it detect when the endpoint has failed. We can now click on the configure tab. The configure tab lets us configure some of the options we've discussed, including the time to live and the load balancing method. We've selected performance, but it also has round robin and failover. That's it. We've now configured Traffic Manager for our web app.
In a real-world scenario, we'd repeat the steps and add multiple endpoints. For example, we might have instances of our web app deployed around the world. We'd add endpoints for each of these along with the load balancing method appropriate to our strategy.
We covered the topics of auto-scaling your web apps based on pre-configured thresholds or on a schedule. We discussed changing instance sizes to match your web app's resource and feature needs. Lastly, we looked at the Traffic Manager and DNS-level traffic management as well as the available traffic management strategies.
Stay tuned for the next section, where we will see how to design and implement applications for scale and resilience.
About the Author
Isaac has been using Microsoft Azure for several years now, working across the various aspects of the service for a variety of customers and systems. He’s a Microsoft MVP and a Microsoft Azure Insider, as well as a proponent of functional programming, in particular F#. As a software developer by trade, he’s a big fan of platform services that allow developers to focus on delivering business value.