Azure Compute Infrastructure
Microsoft Azure offers services for a wide variety of compute-related needs, including traditional compute resources like virtual machines, as well as serverless and container-based services. In this course, you will learn how to design a compute infrastructure using the appropriate Azure services.
Some of the highlights include:
- Designing highly available implementations using fault domains, update domains, availability sets, scale sets, availability zones, and multi-region deployments
- Ensuring business continuity and disaster recovery using Azure Backup, System Center DPM, and Azure Recovery Services
- Creating event-driven functions in a serverless environment using Azure Functions and Azure Logic Apps
- Designing microservices-based applications using Azure Container Service, which supports Kubernetes, and Azure Service Fabric, which is Microsoft’s proprietary container orchestrator
- Deploying high-performance web applications with autoscaling using Azure App Service
- Managing and securing APIs using Azure API Management and Azure Active Directory
- Running compute-intensive jobs on clusters of servers using Azure Batch and Azure Batch AI
Learning Objectives
- Design Azure solutions using virtual machines, serverless computing, and microservices
- Design web solutions using Azure App Service
- Run compute-intensive applications using Azure Batch
Intended Audience
- People who want to become Azure cloud architects
- People preparing for a Microsoft Azure certification exam
Prerequisites
- General knowledge of IT architecture
Virtual machines have gone from being revolutionary to being a standard part of nearly every organization’s infrastructure. Now containers are the revolutionary technology, but VMs are still very important. Virtual machines give you full control over not only the software that you want to run but also the operating system. This is especially useful when you’re migrating existing servers to the cloud.
When you use VMs to run mission-critical applications, you need to architect the solution so that it will keep running even if there’s a hardware failure, a system update, or a spike in demand.
In order to understand how to implement high availability for VMs, it's important to consider the different reasons a system can become unavailable. There are two Azure-related causes for a VM to become unavailable: planned maintenance and unplanned maintenance. Planned maintenance is when Microsoft makes some sort of update to the Azure platform. This could be any sort of improvement, such as security patches or performance fixes.
A lot of the time you won't notice that anything has even happened. However, there will be times when your VM will need to be rebooted, such as when Microsoft patches the hypervisor that's running the VM. Now, unplanned maintenance, as the name suggests, happens due to some unexpected event, such as a power failure, a hardware failure, or a network failure.
Because of these two possible causes for outages, Azure has the concept of update domains and fault domains. These correspond to the two types of outages I just mentioned. To ensure that your solutions remain available during a planned maintenance event, you can use update domains. An update domain is a group of VMs that can be rebooted at the same time. During planned maintenance, only one update domain is rebooted at a time, so the VMs in the other update domains keep running.
To help with hardware failures, Azure offers fault domains. A fault domain represents a shared power source, storage, and network switch. The maximum number of fault domains you can use depends on the region where your VMs reside. In many regions, the maximum is 3, but in some, it’s only 2. Now because your VMs are provisioned in 2 or 3 separate groups, if there’s a failure in one fault domain, the VMs in the other fault domains will still be available.
The way that you use fault and update domains is through an availability set. The availability set allows you to change the number of fault and update domains. In order to ensure high availability and have an SLA of at least 99.95 percent, you need to have at least two VMs, at least two fault domains, and at least two update domains. The default number of update domains is 5, but you can set it to anything between 2 and 20. Azure will automatically alternate the distribution of the VMs into the fault and update domains.
Here's an example. If you have two fault domains and four update domains, then the first VM will be created in fault domain one and update domain one. The next VM will be in fault domain two and update domain two. The third VM is going to be in fault domain one and update domain three. The fourth VM will be in fault domain two and update domain four. And then if you created a fifth VM, it would belong to fault domain one and update domain one.
Notice how it loops through the fault and update domains to evenly distribute the virtual machines. When you plan out the design of highly available applications, it's common to create an availability set for each application tier (typically one for the web layer and one for the business layer because the database layer is often handled differently).
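The looping in the example above can be sketched in a few lines of Python. This is only a simplified model that reproduces the example: the function name `place_vms` is hypothetical, and Azure's real placement logic is internal to the platform and may differ.

```python
def place_vms(vm_count, fault_domains, update_domains):
    """Return a (fault_domain, update_domain) pair for each VM, numbered from 1.

    Assumption: a plain round-robin (modulo) assignment, which matches the
    example in the text but is not Azure's documented algorithm.
    """
    placements = []
    for i in range(vm_count):
        fd = i % fault_domains + 1   # loop through the fault domains
        ud = i % update_domains + 1  # loop through the update domains
        placements.append((fd, ud))
    return placements


for vm, (fd, ud) in enumerate(place_vms(5, 2, 4), start=1):
    print(f"VM {vm}: fault domain {fd}, update domain {ud}")
```

With two fault domains and four update domains, the fifth VM wraps around to fault domain one and update domain one, just as described.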
When it comes time to create an availability set, you'll need to determine how many fault and update domains to use for your application based on its expected usage. Using availability sets will help to ensure that your application has a high level of availability.
They won’t protect against a data center failure, though. That’s where availability zones come in. They’re an alternative to availability sets. An availability zone is a physically separate zone within an Azure region. So, if one zone goes down, then the other zones will likely still be up. Not every region offers availability zones, but in the ones that do, there are always three of them.
To take advantage of this capability, you should deploy multiple replicas of your application’s VMs in different availability zones. You can specify a particular availability zone when you create each VM.
However, even this level of redundancy won’t protect you against an outage that affects an entire region. Regional outages aren't common, but they can and do happen. So, if you need higher availability than a single region, you'll have to use multi-region deployment.
When it comes to multi-region deployments, there are different options for how you might configure things depending on your availability requirements and your budget.
If you need an extremely high level of availability, then you can use an active/passive model with hot standby. With this approach, you have another version of your solution running in a secondary region and it doesn't serve up any traffic unless there's a failure in the primary region.
A variation on that is the active/active model with geo-location based request routing. This is similar to the previous option, but the solution that's running in the secondary region is actively serving up requests to the users who are closer to that region than the primary.
Then there's the active/passive model with cold standby, which basically means that no solution is running in the secondary region. Instead, it's dynamically created when the primary region becomes unavailable. This is a great option if you want to balance cost against the SLA. The switchover is not going to be immediate, but with a well-defined automation plan, this is a viable option.
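To make the difference between the three models concrete, here is a rough Python sketch of which region would serve a request in each case. The function `choose_region`, the model names, and the assumption that the secondary region is always healthy are all illustrative. In practice, this decision is made by a traffic-routing service, not by application code.

```python
def choose_region(model, primary_healthy, user_nearest="primary"):
    """Return the region that serves a request under each deployment model.

    Assumption: the secondary region itself never fails in this sketch.
    """
    if model == "active/active":
        # Both regions serve traffic; users go to whichever region is closer.
        if user_nearest == "secondary" or not primary_healthy:
            return "secondary"
        return "primary"
    if model == "hot-standby":
        # The secondary is already running but idle until the primary fails.
        return "primary" if primary_healthy else "secondary"
    if model == "cold-standby":
        # The secondary has to be created first, so failover is delayed.
        return "primary" if primary_healthy else "secondary (after provisioning)"
    raise ValueError(f"unknown model: {model}")


print(choose_region("active/active", True, user_nearest="secondary"))
print(choose_region("hot-standby", False))
print(choose_region("cold-standby", False))
```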
Another reason to deploy multiple VMs is for scaling purposes. Of course, you can scale an individual VM quite easily by switching it to a larger size. This is known as vertical scaling, but there are limits to that approach. Horizontal scaling, on the other hand, is when you scale by adding more VMs. It’s more complicated to scale horizontally, though.
First, you need to architect your application so it can run across multiple identical machines. Ideally, you should make that tier of your application stateless. That is, the VMs should not store any data locally. Otherwise, the application wouldn’t scale well because client requests would be tied to particular VMs. So the application should save data in a shared external datastore.
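Here is a minimal sketch of the stateless idea in Python. The `SharedStore` class is just an in-memory stand-in for an external datastore, and all of the names are made up for illustration.

```python
class SharedStore:
    """Stand-in for an external datastore shared by every VM in the tier."""

    def __init__(self):
        self._carts = {}

    def add_item(self, session_id, item):
        self._carts.setdefault(session_id, []).append(item)
        return list(self._carts[session_id])


class Worker:
    """One VM in the tier. It keeps no request data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        return self.store.add_item(session_id, item)


store = SharedStore()
workers = [Worker("vm-1", store), Worker("vm-2", store)]

# Two requests from the same client land on different VMs, yet the cart
# stays consistent because it lives in the shared store, not on a VM.
workers[0].handle("session-42", "book")
cart = workers[1].handle("session-42", "pen")
print(cart)
```

Because neither worker holds the cart locally, a load balancer is free to send each request to any VM, and VMs can be added or removed without losing data.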
Next, you need to create a scale set. This is similar to an availability set because the VMs are distributed across fault and update domains, but you can do a lot more with it.
You can configure a scale set to automatically increase or decrease the number of VMs in it according to rules you define. For example, you can create a rule that says if the average CPU usage goes above 80%, then add 3 more VMs. You can also use disk and network metrics in your rules. If you need to scale based on guest OS metrics, such as available memory or number of processes, then you can enable the diagnostics extension. This will even let you use custom metrics based on something specific in your application logs.
You’ll also usually want to set limits on how far a scale set can scale out or in by configuring a maximum and minimum number of VMs. Considering that a scale set can have up to 1,000 VMs in it, setting a maximum is a good idea. By the way, if you’re using custom VM images rather than Azure’s standard images, then the maximum is 600 VMs per scale set.
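The way a rule and the limits work together can be sketched like this. The 80% threshold and the step of 3 VMs come from the example rule above. The scale-in threshold, the scale-in step, and the minimum and maximum values are assumptions chosen for illustration, and `next_vm_count` is a hypothetical function, not part of any Azure SDK.

```python
def next_vm_count(current, avg_cpu, minimum=2, maximum=10,
                  scale_out_above=80.0, scale_out_step=3,
                  scale_in_below=30.0, scale_in_step=1):
    """Apply one scale-out rule and one scale-in rule, then clamp to the limits."""
    if avg_cpu > scale_out_above:
        desired = current + scale_out_step   # add 3 VMs when CPU is above 80%
    elif avg_cpu < scale_in_below:
        desired = current - scale_in_step    # assumed rule: remove 1 VM when idle
    else:
        desired = current
    # The configured minimum and maximum always win over the rules.
    return max(minimum, min(maximum, desired))


print(next_vm_count(4, avg_cpu=85))  # rule fires: 4 + 3
print(next_vm_count(9, avg_cpu=95))  # rule fires but is capped at the maximum
print(next_vm_count(2, avg_cpu=10))  # already at the minimum, so no change
```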
If you have a web application that can run in containers, then there’s another autoscaling option. It’s called Web App for Containers. This service can pull an image from Docker Hub, Azure Container Registry, or a private registry, and deploy it for you. It will take care of OS patching as well. If you tell it to use multiple containers, then it’ll handle load balancing automatically.
You can configure autoscaling in a similar way to scale sets, except that it adds or removes containers instead of VMs. As with scale sets, you define rules based on metrics such as CPU or memory percentage, but since this service hosts web applications, you can also scale based on HTTP queue length.
I’ll go into more depth about containers and web apps later in the course.
Azure Cloud Services sounds like an umbrella term for all of Azure’s services, but it’s actually a specific offering. Cloud Services provides managed VM hosting. That is, you just tell it how many VMs you need and it will provision them and maintain their operating systems. It offers two types of VMs: web roles and worker roles.
It’s a bit confusing calling a VM a role, but this refers to what role the VM performs in your application. A web role is a VM running the IIS web server and a worker role is a VM that’s not running IIS.
Azure Cloud Services is essentially a legacy product. If you want to run a web application, then it’s usually easier to run it with Azure Web Apps. The only reason you would consider using Cloud Services instead is if you need remote access to your VMs or you need to install custom software on them, which is something you can’t do with Web Apps. However, if you need this level of control, then you’d probably be better off creating VMs in a scale set. A scale set would give you the same high availability and autoscaling as Cloud Services, and it could replace both web roles and worker roles. The only disadvantage to using VMs directly is that you would have to take care of OS updates, but even that won’t be a problem for long, because Microsoft has introduced a preview of the automatic OS image upgrade feature for scale sets.
If you have any VMs that will be running for at least a year, then you can save a lot of money by using reserved VM instances. You have to prepay for either one or three years, but it’s up to 72% cheaper than pay-as-you-go pricing.
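As a rough worked example, here is how the savings would add up in Python. The hourly rate is a made-up number, not a real Azure price, and 72% is the upper bound mentioned above. The actual discount depends on the VM type, region, and term.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def reserved_cost(hourly_rate, years, discount):
    """Return (pay-as-you-go cost, reserved cost) for a VM running nonstop."""
    pay_as_you_go = hourly_rate * HOURS_PER_YEAR * years
    reserved = pay_as_you_go * (1 - discount)
    return pay_as_you_go, reserved


# Hypothetical VM at $0.10 per hour, three-year term, best-case 72% discount.
payg, reserved = reserved_cost(hourly_rate=0.10, years=3, discount=0.72)
print(f"Pay-as-you-go: ${payg:,.2f}")
print(f"Reserved:      ${reserved:,.2f}")
print(f"Savings:       ${payg - reserved:,.2f}")
```

Keep in mind that the comparison only holds if the VM really does run for the whole term, since the reservation is prepaid.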
There may be times when you need to have faster networking between the VMs in a virtual network. The solution is to enable Accelerated Networking on the network interfaces of the VMs. This allows the VMs to bypass the virtual switch when they communicate with each other. All of the policy enforcement functions that would normally be performed by the virtual switch are instead performed in hardware.
To work properly, accelerated networking needs to be enabled on both VMs that are communicating with each other. These VMs also need to be in the same virtual network. Otherwise, their network performance won’t see any improvement.
Another use case is providing VMs for software development and testing. Azure DevTest Labs, which is a subset of Azure Lab Services, makes it easy to spin up non-production environments. You could do this with ARM templates, of course, but DevTest Labs gives you some extra capabilities, such as allowing administrators to control costs by setting limits on how many VMs can be deployed at once and ensuring that VMs are shut down when they’re not in use.
To provision a VM, you can select a base image that has all of the necessary tools pre-installed. You also have the option to select a formula, which is a list of settings to use when creating the VM, such as its size and virtual network. You can also specify artifacts in a formula. This is how you can deploy your application on the VM. Artifacts are JSON files that say how to install your application.
If an administrator spins up some VMs in a shared pool, then developers and testers can “claim” them. They would do this by looking in the list of "Claimable virtual machines" and choosing one with the right configuration. Once a lab user claims a VM, no one else can use it. When the user is done with the VM, they can unclaim it and put it back in the shared pool.
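The claim and unclaim behavior can be modeled with a small Python class. This is only a conceptual sketch of the shared pool. `LabPool` and its methods are hypothetical and are not part of the DevTest Labs API.

```python
class LabPool:
    """Toy model of a shared pool of claimable lab VMs."""

    def __init__(self, vm_names):
        self.claimable = set(vm_names)  # VMs anyone in the lab may claim
        self.claimed = {}               # VM name -> user who claimed it

    def claim(self, vm, user):
        if vm not in self.claimable:
            raise ValueError(f"{vm} is not available to claim")
        self.claimable.remove(vm)
        self.claimed[vm] = user

    def unclaim(self, vm, user):
        if self.claimed.get(vm) != user:
            raise ValueError(f"{vm} is not claimed by {user}")
        del self.claimed[vm]
        self.claimable.add(vm)          # back into the shared pool


pool = LabPool(["vm-a", "vm-b"])
pool.claim("vm-a", "dana")
print(sorted(pool.claimable))   # only vm-b is still claimable
pool.unclaim("vm-a", "dana")
print(sorted(pool.claimable))   # vm-a is back in the pool
```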
And that’s it for virtual machines.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).