In this short course, you’ll learn about three Azure features that can be used to improve availability and scalability for virtual machines: availability sets, availability zones, and scale sets. You’ll also learn about regional pairs.
Learning Objectives
- Describe the features of availability sets
- Describe the usage of availability zones
- Describe the features of scale sets
- Describe regional pairs
Intended Audience
- People who want to learn how to improve availability and scalability for Azure virtual machines
- People preparing to take the Azure Fundamentals exam
Prerequisites
- Basic knowledge of Azure (or take our Overview of Azure Services course)
Virtual machines have gone from being revolutionary to being a standard part of nearly every organization’s infrastructure. Now containers are the revolutionary technology, but VMs are still very important. Virtual machines give you full control over not only the software that you want to run but also the operating system. This is especially useful when you’re migrating existing servers to the cloud. When you use VMs to run mission-critical applications, you need to architect the solution so that it will keep running even if there’s a hardware failure, a system update, or a spike in demand.
First, let’s look at high availability, which means that an application will continue to run even if there’s a hardware failure or another event that would normally cause the application to go down. Microsoft offers a few different services to help with high availability. The first one is called an availability set. Microsoft doesn’t recommend using availability sets anymore because it now has better offerings, but I’ll still tell you about them because you’ll likely still see references to them.
An availability set is a group of virtual machines that’s designed to handle both planned and unplanned VM downtime. Planned downtime is when Azure updates the infrastructure underlying your VMs, and this update requires a reboot of the VMs. Unplanned downtime is when a VM goes down unexpectedly, such as when it has a critical hardware failure.
To handle planned downtime, an availability set groups its VMs into what are called update domains. Azure will only perform planned maintenance on one update domain at a time. That way, while the VMs in a particular update domain are being rebooted, the VMs in the other update domains will keep running. You can configure up to 20 update domains when you create an availability set.
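To make the update-domain idea concrete, here’s a small Python model of an availability set. It’s a sketch under assumptions: Azure decides the actual placement for you, so the round-robin assignment and the helper names `assign_update_domains` and `simulate_planned_maintenance` are hypothetical, not Azure APIs. What it shows is that rebooting one update domain at a time always leaves the VMs in the other update domains running.

```python
# Hypothetical model of update domains. Azure places VMs for you; the
# round-robin placement here is an illustrative assumption.

def assign_update_domains(vm_names, update_domain_count=5):
    """Spread VMs across update domains in round-robin order."""
    if not 1 <= update_domain_count <= 20:
        raise ValueError("availability sets allow 1 to 20 update domains")
    domains = {ud: [] for ud in range(update_domain_count)}
    for i, vm in enumerate(vm_names):
        domains[i % update_domain_count].append(vm)
    return domains

def simulate_planned_maintenance(domains):
    """Reboot one update domain at a time and report which VMs stay up."""
    all_vms = [vm for vms in domains.values() for vm in vms]
    for ud, rebooting in domains.items():
        running = [vm for vm in all_vms if vm not in rebooting]
        yield ud, rebooting, running

vms = [f"web-{n}" for n in range(6)]
domains = assign_update_domains(vms, update_domain_count=3)
for ud, rebooting, running in simulate_planned_maintenance(domains):
    print(f"UD {ud}: rebooting {rebooting}, {len(running)} VMs still running")
```

With six VMs in three update domains, each maintenance pass takes down two VMs and leaves four serving traffic.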
Unplanned downtime is handled in a similar way by using fault domains. Each fault domain has a separate power source and network switch. This limits the downtime caused by hardware failures. For example, if there’s a power failure in one fault domain, the VMs in the other fault domains should keep running because they don’t use the same power source. The maximum number of fault domains you can use depends on the region where your VMs reside. In many regions, the maximum is three, but in some, it’s only two.
Each VM is in both an update domain and a fault domain. Azure will distribute your VMs into these domains automatically. To qualify for Microsoft’s 99.95 percent uptime guarantee under its service level agreement, you need to have at least two VMs, at least two fault domains, and at least two update domains.
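Here’s a similar sketch that puts each VM in both a fault domain and an update domain, simulates a power failure in one fault domain, and checks the three SLA conditions listed above. The round-robin placement and all the function names are hypothetical. It also works out what 99.95 percent uptime means in practice, assuming a 30-day month: 43,200 minutes × 0.0005 = 21.6 minutes of allowed downtime.

```python
# Illustrative model only: placement scheme and names are assumptions,
# not Azure APIs.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes

def place_vms(vm_names, fault_domains=2, update_domains=5):
    """Give each VM one fault domain and one update domain (round-robin)."""
    return {
        vm: {"fd": i % fault_domains, "ud": i % update_domains}
        for i, vm in enumerate(vm_names)
    }

def survivors_after_fault(placement, failed_fd):
    """VMs still running after a power failure in one fault domain."""
    return [vm for vm, p in placement.items() if p["fd"] != failed_fd]

def meets_sla_conditions(placement):
    """At least two VMs, two fault domains, and two update domains in use."""
    fds = {p["fd"] for p in placement.values()}
    uds = {p["ud"] for p in placement.values()}
    return len(placement) >= 2 and len(fds) >= 2 and len(uds) >= 2

def max_monthly_downtime_minutes(uptime_percent):
    """Downtime allowed by an uptime percentage over a 30-day month."""
    return MINUTES_PER_30_DAY_MONTH * (1 - uptime_percent / 100)

placement = place_vms(["app-0", "app-1", "app-2"])
print(survivors_after_fault(placement, failed_fd=0))  # ['app-1']
print(meets_sla_conditions(placement))                # True
print(round(max_monthly_downtime_minutes(99.95), 1))  # 21.6
```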
Using availability sets will give your applications a high level of availability, but they won’t protect against a data center failure, because all of the VMs in an availability set are in the same data center. That’s where availability zones come in. They’re an alternative to availability sets. An availability zone is a physically separate location within an Azure region, with its own power, cooling, and networking. So, if one zone goes down, the other zones will likely stay up. Not every region offers availability zones, but the regions that do have at least three of them.
To take advantage of this capability, you should deploy multiple replicas of your application’s VMs in different availability zones. You can specify a particular availability zone when you create each VM.
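The following Python sketch shows why spreading replicas matters. It assigns replicas to zones round-robin and then checks how many survive when any single zone goes down. Azure does label zones "1", "2", and "3" within a region, but the helper names here are hypothetical and nothing in this code calls Azure.

```python
# Hypothetical planning helpers; zone labels follow Azure's "1", "2", "3".
ZONES = ["1", "2", "3"]

def plan_zonal_replicas(app_name, replica_count, zones=ZONES):
    """Pin each replica VM to a zone, cycling through the zones given."""
    return [(f"{app_name}-{i}", zones[i % len(zones)]) for i in range(replica_count)]

def replicas_surviving(plan, failed_zone):
    """Replica VMs that keep running when one zone fails."""
    return [vm for vm, zone in plan if zone != failed_zone]

plan = plan_zonal_replicas("api", 3)
for zone in ZONES:
    survivors = replicas_surviving(plan, zone)
    print(f"zone {zone} down -> {len(survivors)} of {len(plan)} replicas up")

# Contrast: putting every replica in the same zone offers no zone redundancy.
single_zone_plan = plan_zonal_replicas("api", 3, zones=["1"])
print(replicas_surviving(single_zone_plan, "1"))  # []
```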
However, even this level of redundancy won’t protect you against an outage that affects an entire region. Regional outages aren't common, but they can happen. So, if you need higher availability than a single region can provide, you'll have to use multiple regions. In most cases, it’d be sufficient to simply back up your VMs to another region. Then if the region where your VMs are deployed goes down, you can temporarily bring up replacement VMs in the second region by using your backups. If you can tolerate almost no downtime, then you could keep VMs running in the backup region all the time.
When choosing a backup region, you should take into account regional pairs. Nearly every one of Azure’s regions is paired with another region. Some Azure services replicate their data across regional pairs if you choose certain options. For example, if you choose the geo-redundant storage option for an Azure Storage account, then your data will be replicated to the paired region.
Virtual machines don’t replicate across regional pairs, but you should still consider storing your VM backups in the paired region. That’s a best practice because Microsoft tries to ensure that at least one region in each pair is available. In the event of a multi-regional outage, Microsoft will prioritize the recovery of one region in each pair, so your safest option for a backup region is the paired region.
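Choosing a backup region this way boils down to a simple lookup. The sketch below uses a few well-known Azure region pairs written as programmatic region names; it is deliberately partial, so check Microsoft’s current region-pair list before relying on it. The function names and the explicit `fallback` parameter are hypothetical.

```python
# Partial, illustrative list of Azure region pairs (programmatic names).
REGION_PAIRS = {
    "eastus": "westus",
    "eastus2": "centralus",
    "northeurope": "westeurope",
}

def paired_region(region):
    """Look up a region's pair in either direction; None if unknown."""
    pairs = {**REGION_PAIRS, **{b: a for a, b in REGION_PAIRS.items()}}
    return pairs.get(region)

def choose_backup_region(primary, fallback=None):
    """Prefer the paired region; otherwise require an explicit fallback."""
    pair = paired_region(primary)
    if pair is not None:
        return pair
    if fallback is None:
        raise ValueError(f"no known pair for {primary!r}; choose a backup region explicitly")
    return fallback

print(choose_backup_region("eastus2"))  # centralus
```

Preferring the pair first mirrors the advice above: if a multi-region outage happens, Microsoft prioritizes recovering one region in each pair.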
All right, now that we’ve covered availability, let’s move on to how you can configure your application to handle spikes in demand. This is referred to as scalability.
The simplest way to scale is to switch an individual VM to a larger size. This is known as vertical scaling. It’s easy to do, but there are limits to that approach: you can only go as large as the biggest VM size available, and resizing a VM usually requires restarting it. Horizontal scaling, on the other hand, is when you scale by adding more VMs. It’s more complicated to scale horizontally, though.
First, you need to architect your application so it can run across multiple identical machines. Ideally, you should make that tier of your application stateless. That is, the VMs should not store any data locally. Otherwise, the application wouldn’t scale well because client requests would be tied to particular VMs. So the application should save data in a shared external datastore.
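Here’s a toy Python illustration of why the stateless design matters. Requests are spread round-robin across two identical "VMs", roughly the way a load balancer might spread them. The stateful version keeps shopping carts in local memory, so a user’s cart ends up split across machines; the stateless version keeps nothing locally and reads and writes a shared store. `SharedStore` is a hypothetical in-memory stand-in for a real external datastore such as a database or cache.

```python
import itertools

class SharedStore:
    """Hypothetical stand-in for an external datastore."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def set(self, key, value):
        self._data[key] = value

class StatefulVM:
    """Anti-pattern: keeps each user's cart in this VM's local memory."""
    def __init__(self):
        self.carts = {}
    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)
        return list(self.carts[user])

class StatelessVM:
    """Keeps no local data; every request goes through the shared store."""
    def __init__(self, store):
        self.store = store
    def add_item(self, user, item):
        cart = self.store.get(user, []) + [item]
        self.store.set(user, cart)
        return cart

def route(vms, requests):
    """Send requests to the VMs in round-robin order."""
    return [vm.add_item(user, item)
            for vm, (user, item) in zip(itertools.cycle(vms), requests)]

requests = [("alice", "book"), ("alice", "pen"), ("alice", "lamp")]
stateful = route([StatefulVM(), StatefulVM()], requests)
store = SharedStore()
stateless = route([StatelessVM(store), StatelessVM(store)], requests)
print(stateful[-1])   # ['book', 'lamp'] (the pen went to the other VM)
print(stateless[-1])  # ['book', 'pen', 'lamp']
```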
Next, you need to create a scale set. This is similar to an availability set because the VMs are distributed across fault domains and update domains, but you can do a lot more with it.
You can configure a scale set to automatically increase or decrease the number of VMs in it according to rules you define. For example, you can create a rule that says if the average CPU usage goes above 80%, then add 3 more VMs. You can also use disk and network metrics in your rules. If you need to scale based on guest operating system metrics, such as available memory or number of processes, then you can enable the diagnostics extension. This will even let you use custom metrics based on something specific in your application logs.
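To see how a rule like "if average CPU goes above 80%, add 3 VMs" turns into a decision, here’s a simplified Python evaluator. It’s a sketch, not Azure’s autoscale engine: the metric key `cpu_percent`, the extra scale-in rule at 30%, and the first-match evaluation are all assumptions, and real autoscale settings also involve time windows, cooldown periods, and Azure’s own rules for combining multiple conditions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ScaleRule:
    metric: str       # hypothetical metric key, e.g. "cpu_percent"
    operator: str     # "gt" (greater than) or "lt" (less than)
    threshold: float
    change: int       # positive adds VMs, negative removes them

    def is_triggered(self, samples):
        average = mean(samples)
        if self.operator == "gt":
            return average > self.threshold
        if self.operator == "lt":
            return average < self.threshold
        raise ValueError(f"unknown operator {self.operator!r}")

def evaluate_rules(rules, metrics, current_count):
    """Simplified: apply the first triggered rule and return the new VM count."""
    for rule in rules:
        if rule.is_triggered(metrics[rule.metric]):
            return current_count + rule.change
    return current_count

rules = [
    ScaleRule("cpu_percent", "gt", 80, +3),  # the rule described in the text
    ScaleRule("cpu_percent", "lt", 30, -1),  # assumed scale-in rule
]
print(evaluate_rules(rules, {"cpu_percent": [85, 90, 78]}, current_count=5))  # 8
```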
You’ll also usually want to set limits on how far a scale set can scale up or down by configuring a maximum and minimum number of VMs. Considering that a scale set can have up to 1,000 VMs in it, setting a maximum is a good idea. Note that if you create the scale set from a custom managed image rather than one of Azure’s standard images, then the maximum is 600 VMs per scale set.
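A minimum and maximum simply clamp whatever count the autoscale rules ask for. This small sketch encodes the limits quoted in this course (1,000 VMs per scale set, or 600 with custom images); the function names and the `custom_image` flag are hypothetical, and Azure enforces these limits itself.

```python
def hard_limit(custom_image):
    """Per-scale-set VM limit as quoted in the text: 1,000, or 600 for custom images."""
    return 600 if custom_image else 1000

def clamp_capacity(desired, minimum, maximum, custom_image=False):
    """Keep an autoscale decision inside the configured minimum and maximum."""
    limit = hard_limit(custom_image)
    if not 0 <= minimum <= maximum <= limit:
        raise ValueError(f"expected 0 <= minimum <= maximum <= {limit}")
    return max(minimum, min(desired, maximum))

print(clamp_capacity(12, minimum=2, maximum=10))  # 10
print(clamp_capacity(1, minimum=2, maximum=10))   # 2
```

Validating the minimum and maximum up front means a misconfigured range fails loudly instead of silently producing an odd VM count.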
By default, a scale set is deployed in a single zone. This is called a zonal scale set. But for the ultimate in availability and scalability, you can deploy a scale set across availability zones. This is called a regional scale set. It evenly distributes VMs across the three availability zones in a region. This distribution happens both when you create the scale set and also when it automatically adds or removes VMs during scaling operations. As you can see, combining scale sets with availability zones gives you the best of both worlds.
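Here’s a simplified sketch of the even distribution a regional scale set maintains as it scales out and in. Azure’s real zone-balancing logic is its own, so treat the approach here (add to the emptiest zone, remove from the fullest, with the tie-breaking shown) and the function names as assumptions made for illustration.

```python
from collections import Counter

ZONES = ("1", "2", "3")

def pick_zone_to_grow(counts):
    """Scale out into the zone with the fewest VMs (ties go to the lowest zone)."""
    return min(ZONES, key=lambda zone: (counts[zone], zone))

def pick_zone_to_shrink(counts):
    """Scale in from the zone with the most VMs (ties go to the highest zone)."""
    return max(ZONES, key=lambda zone: (counts[zone], zone))

def rebalance_scale(counts, delta):
    """Add (delta > 0) or remove (delta < 0) VMs one at a time, keeping zones even."""
    counts = Counter(counts)
    if delta < 0 and -delta > sum(counts.values()):
        raise ValueError("cannot remove more VMs than the scale set has")
    for _ in range(abs(delta)):
        if delta > 0:
            counts[pick_zone_to_grow(counts)] += 1
        else:
            counts[pick_zone_to_shrink(counts)] -= 1
    return counts

after_create = rebalance_scale(Counter(), 7)
after_scale_in = rebalance_scale(after_create, -2)
print(dict(after_create))    # {'1': 3, '2': 2, '3': 2}
print(dict(after_scale_in))  # {'1': 2, '2': 2, '3': 1}
```

Because every step picks the least or most loaded zone, the zones never differ by more than one VM, which is what lets the scale set survive the loss of any single zone.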
And that’s it for virtual machine availability and scalability.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).