Azure Backup and Azure Site Recovery

Microsoft Azure offers services for a wide variety of compute-related needs, including traditional compute resources like virtual machines, as well as serverless and container-based services. In this course, you will learn how to design a compute infrastructure using the appropriate Azure services.

Some of the highlights include:

  • Designing highly available implementations using fault domains, update domains, availability sets, scale sets, availability zones, and multi-region deployments
  • Ensuring business continuity and disaster recovery using Azure Backup, System Center DPM, and Azure Site Recovery
  • Creating event-driven functions in a serverless environment using Azure Functions and Azure Logic Apps
  • Designing microservices-based applications using Azure Container Service, which supports Kubernetes, and Azure Service Fabric, which is Microsoft’s proprietary container orchestrator
  • Deploying high-performance web applications with autoscaling using Azure App Service
  • Managing and securing APIs using Azure API Management and Azure Active Directory
  • Running compute-intensive jobs on clusters of servers using Azure Batch and Azure Batch AI

Learning Objectives

  • Design Azure solutions using virtual machines, serverless computing, and microservices
  • Design web solutions using Azure App Service
  • Run compute-intensive applications using Azure Batch

Intended Audience

  • People who want to become Azure cloud architects
  • People preparing for a Microsoft Azure certification exam

Prerequisites

  • General knowledge of IT architecture

Most Azure customers also have on-premises infrastructure, so Microsoft’s cloud services usually work with local resources as well. This is especially true for business continuity and disaster recovery. For example, Azure Backup doesn’t just back up Azure VMs—it also backs up on-premises VMs and servers. What I find surprising, though, is that in some cases, it even stores the backups on-premises. That’s a pretty unusual feature for a cloud service.

Azure Backup has a rather complicated set of components for different scenarios. I’ll go through the highlights.

Azure VM Backup is the most straightforward component. As the name implies, it only backs up Azure VMs. Once a day, it backs up each VM’s disks using application-aware snapshots. It stores the backups in a Recovery Services vault, which is also on Azure. All three of the other components provide some mix of Azure and on-premises options.

The Azure Backup Agent supports both cloud-based and local VMs, as well as physical servers. However, it only supports Windows, and you have to install the agent on every virtual and physical machine you want backed up. It will handle files, folders, and system state, but it’s not application aware. In fact, this is the only component that’s not application aware. It stores its backups in a Recovery Services vault. Confusingly, the Azure Backup Agent is usually referred to as the Microsoft Azure Recovery Services (or MARS) agent.

System Center DPM can basically back up anything except Oracle workloads. For example, it can back up Linux VMs on Hyper-V and VMware. You have three choices for where to store your backups: a Recovery Services vault, locally attached disk, or even tape (which is an on-premises-only option).

Azure Backup Server is almost the same, except that it doesn’t require a System Center license and it doesn’t support tape backup.

Note that both System Center DPM and Azure Backup Server use the Azure Backup Agent to send data to Azure.
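
To keep the four components straight, here’s a small Python sketch that encodes the comparison above as a lookup table and filters it by requirements. The class, field names, and storage labels ("vault", "disk", "tape") are my own shorthand, not an Azure API, and the table only captures the points covered in this lesson, not the full support matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupComponent:
    name: str
    protects_on_premises: bool           # can it protect local VMs and servers?
    application_aware: bool
    storage_targets: frozenset[str]      # hypothetical labels: "vault", "disk", "tape"
    needs_system_center_license: bool


# Only the points covered in this lesson; not the full Azure support matrix.
COMPONENTS = [
    BackupComponent("Azure VM Backup", False, True, frozenset({"vault"}), False),
    BackupComponent("Azure Backup Agent (MARS)", True, False, frozenset({"vault"}), False),
    BackupComponent("System Center DPM", True, True, frozenset({"vault", "disk", "tape"}), True),
    BackupComponent("Azure Backup Server", True, True, frozenset({"vault", "disk"}), False),
]


def candidates(on_premises: bool, need_app_aware: bool, target: str) -> list[str]:
    """Return the components that meet all three requirements, in table order."""
    return [
        c.name
        for c in COMPONENTS
        if (c.protects_on_premises or not on_premises)
        and (c.application_aware or not need_app_aware)
        and target in c.storage_targets
    ]


print(candidates(on_premises=True, need_app_aware=True, target="tape"))
# ['System Center DPM']
```

Asking for tape narrows things to DPM right away, while asking for local disk without application awareness returns both DPM and Azure Backup Server, which is where the licensing difference decides it.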

To enable backups for an individual Azure VM, you just select Backup from the Operations menu for that VM and click the Enable Backup button. This is really easy, but what if you want to enable backups on all of your VMs? It would be a pain to have to do this manually for each VM, and you might forget to enable backups on all of them, especially when you add new VMs.

A much easier way is to create a Recovery Services vault and then set a backup policy there. It will list all of the VMs you have in the same region as the vault, so you can enable backups for all of them at the same time. It only applies to VMs in the same region, though, so if you use multiple regions, you’ll have to create a Recovery Services vault in each one.
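
If you’re working out how many vaults you need, the rule is simply one per region that has VMs. Here’s a minimal Python sketch of that grouping. The VM names and regions are hypothetical, and in practice you’d pull the inventory from the portal or the Azure CLI rather than hard-coding it.

```python
from collections import defaultdict


def plan_vaults(vms: dict[str, str]) -> dict[str, list[str]]:
    """Map each region to the VMs that a vault in that region can protect.

    `vms` maps a VM name to its Azure region. Because a vault only lists
    VMs in its own region, you need one vault per key in the result.
    """
    by_region: dict[str, list[str]] = defaultdict(list)
    for vm, region in vms.items():
        by_region[region].append(vm)
    # Sort for a stable, readable plan.
    return {region: sorted(names) for region, names in sorted(by_region.items())}


# Hypothetical inventory spanning two regions.
inventory = {"web-01": "eastus", "web-02": "eastus", "db-01": "westeurope"}
plan = plan_vaults(inventory)
print(len(plan))  # 2
print(plan)       # {'eastus': ['web-01', 'web-02'], 'westeurope': ['db-01']}
```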

Speaking of regions, if you want your Recovery Services vault to survive a regional outage, then it needs to be configured to use geo-redundant storage. Luckily, that’s the default. If you want, you can change it to locally redundant storage to save money, but if there’s a regional outage where your vault resides, then you won’t be able to recover your data.

When you want to back up on-premises data to Azure, one potential problem is the huge quantity of data that would need to be transferred over your network connection to Azure for the initial backup. One of the best ways to deal with that problem is to use the Azure Import/Export service. This allows you to ship physical disks to Microsoft, so they can be uploaded to Azure directly.
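
To see why the initial backup is the bottleneck, it helps to do the arithmetic. This is a rough estimate, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and that backup traffic gets about 80% of the link. Both the function and the utilization figure are illustrative assumptions, not Azure guidance.

```python
def initial_upload_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate how many days an initial backup takes to send over the network.

    data_tb: size of the initial backup in terabytes (10**12 bytes).
    link_mbps: uplink speed in megabits per second (10**6 bits/s).
    utilization: assumed fraction of the link available to backup traffic.
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * utilization)
    return seconds / 86_400  # seconds per day


print(round(initial_upload_days(10, 100), 1))  # 11.6
```

With 10 TB over a 100 Mbps link, that works out to roughly 11.6 days of continuous transfer, before any retries or competing traffic, which is why shipping disks can be the faster option.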

Another potential issue is related to installing the Azure Backup Agent. During installation, you need to register the machine. This involves downloading the vault credentials into the agent and also setting an encryption passphrase. It is very important that you save this passphrase somewhere secure. When you’re restoring a backup, you need to provide the vault credentials and the passphrase. Fortunately, if you lose them, you can download the vault credentials again, and you can regenerate the passphrase.
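
Since you have to supply that passphrase yourself, it’s worth generating a random one rather than making one up. Here’s a sketch using Python’s standard secrets module. The 16-character minimum is my assumption about the agent’s requirement, so check the current documentation; the 32-character default is just a comfortable margin.

```python
import secrets
import string

# Assumption: the agent requires at least 16 characters; verify before relying on it.
MIN_LENGTH = 16
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def new_passphrase(length: int = 32) -> str:
    """Return a random passphrase drawn from a cryptographically secure source."""
    if length < MIN_LENGTH:
        raise ValueError(f"passphrase must be at least {MIN_LENGTH} characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(new_passphrase())
```

Whatever you generate, store it somewhere other than the machine you’re backing up, because if that machine is lost, a copy kept on it is lost too.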

OK, let’s move on to the Site Recovery service. Its purpose is to get you up and running again as quickly as possible in the event of an outage. It does this by failing over to another location. Once again, this service handles both Azure and on-premises servers. It supports three failover scenarios: Azure to Azure, on-premises to Azure, and on-premises to secondary site.

Replicating between Azure regions is pretty straightforward, but replicating between an on-premises site and either Azure or a secondary site is more complicated, so I’ll give you an overview of those.

These scenarios are further subdivided based on whether you’re replicating physical servers, VMware servers, or Hyper-V servers. To support physical or VMware servers, you need a Configuration server that manages replication, a Process server that sends the replication data to Azure, and a Master target server that handles replication data during failback. You also need to install the Mobility service on each server that needs to be replicated.

If you’re replicating to a secondary site, then you need the same components, but the Process server is in the primary site, and the Configuration and Master target servers are in the secondary site.

The architecture for Hyper-V replication is simpler. You install the Azure Site Recovery Provider and Recovery Services agent on each Hyper-V host or cluster node. If you’re using the System Center Virtual Machine Manager (or VMM), then that’s where you need to install the Site Recovery Provider. If you’re replicating to a secondary site, then you have to use VMM.
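
Here’s a simplified Python sketch that records where each component goes in the on-premises scenarios described above. It’s a study aid that follows this lesson’s overview, not an official support matrix: the function name and the scenario strings are hypothetical, and the placement for the Azure target assumes all three servers run on-premises, which is worth checking against the current documentation.

```python
def asr_components(source: str, target: str, uses_vmm: bool = False) -> dict[str, str]:
    """Map each Site Recovery component to where it's installed.

    source: "physical", "vmware", or "hyperv" (hypothetical labels)
    target: "azure" or "secondary"
    """
    if target not in ("azure", "secondary"):
        raise ValueError(f"unknown target: {target!r}")
    if source in ("physical", "vmware"):
        if target == "secondary":
            # The Process server stays with the protected machines; the rest move.
            return {
                "Configuration server": "secondary site",
                "Process server": "primary site",
                "Master target server": "secondary site",
                "Mobility service": "each replicated server",
            }
        # Assumption: all three servers run on-premises when replicating to Azure.
        return {
            "Configuration server": "on-premises",
            "Process server": "on-premises",
            "Master target server": "on-premises",
            "Mobility service": "each replicated server",
        }
    if source == "hyperv":
        if target == "secondary" and not uses_vmm:
            raise ValueError("replicating Hyper-V to a secondary site requires VMM")
        hosts = "each Hyper-V host or cluster node"
        return {
            # With VMM, the Provider is installed on the VMM server instead.
            "Site Recovery Provider": "VMM server" if uses_vmm else hosts,
            "Recovery Services agent": hosts,
        }
    raise ValueError(f"unknown source: {source!r}")


print(asr_components("vmware", "secondary")["Process server"])  # primary site
```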

There’s a matrix of supported operating systems for the replicated machines in each of the scenarios. For physical servers, the replicated machines must be running at least Windows Server 2008 R2 SP1. VMware servers must be running at least vSphere 6.5 or vCenter 6.5. Hyper-V servers must be running at least Windows Server 2012 R2, although guest VMs on Hyper-V only need to be running Windows Server 2008 R2 or higher.

In the event of an outage, you have six different options for which recovery point to fail over to. Latest is the default. It gives you the lowest recovery point objective because it uses the most recent data. Considering that, you might be wondering why you wouldn’t always use this option. Well, it has the disadvantage that it delays the failover, because it has to process all of the latest data it received and turn it into a recovery point.

An alternative is to choose Latest processed. This ignores all of the unprocessed data, so the failover happens very quickly. This gives you a much lower recovery time objective, but a higher recovery point objective, because it doesn’t use the latest recovery point.

The next three options are all variations of this one, since they all ignore unprocessed data. They are Latest application-consistent, Latest multi-VM processed, and Latest multi-VM application-consistent, which is a combination of the previous two. Finally, you can choose a custom recovery point.
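
The choice comes down to the RPO versus RTO trade-off plus your consistency requirements, so here’s a Python sketch that encodes the five built-in options and picks one from those preferences. The flags and the selection function are my own illustration, not something Site Recovery exposes, and Custom is left out because you pick that point by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPoint:
    name: str
    waits_for_pending_data: bool  # only "Latest" processes unprocessed data first
    app_consistent: bool
    multi_vm: bool


# The five built-in choices; "Custom" is picked by hand, so it's omitted.
OPTIONS = [
    RecoveryPoint("Latest", True, False, False),
    RecoveryPoint("Latest processed", False, False, False),
    RecoveryPoint("Latest application-consistent", False, True, False),
    RecoveryPoint("Latest multi-VM processed", False, False, True),
    RecoveryPoint("Latest multi-VM application-consistent", False, True, True),
]


def pick(minimize_rto: bool, app_consistent: bool = False, multi_vm: bool = False) -> str:
    """Return the name of the option that matches the given preferences."""
    if not (minimize_rto or app_consistent or multi_vm):
        return "Latest"  # lowest RPO, but the failover waits on processing
    for opt in OPTIONS:
        if (not opt.waits_for_pending_data
                and opt.app_consistent == app_consistent
                and opt.multi_vm == multi_vm):
            return opt.name
    raise RuntimeError("no option matches")  # unreachable with these flags


print(pick(minimize_rto=True))                 # Latest processed
print(pick(minimize_rto=True, multi_vm=True))  # Latest multi-VM processed
```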


And that’s it for this lesson.

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).