High Availability and Disaster Recovery Concepts


Course duration: 1h 6m

High availability and disaster recovery (HADR) is an integral part of any organization's data strategy. It is also an essential string to every DBA's bow. In an online world that operates 24 hours a day, going offline or losing customers' data cannot be tolerated. This course examines the features that Azure provides to help you make sure your SQL databases, whether they are managed in the cloud or on-premises, do not become the point of failure in your systems.

High availability and disaster recovery encompasses two fundamental concepts. The first is how to minimize the amount of time your databases will be offline in the event of unforeseen events like hardware failures, power outages, or any number of natural disasters; the maximum acceptable downtime is known as the recovery time objective (RTO). The second is how to minimize data loss when any of these events occur; the maximum acceptable data loss, measured as a window of time, is known as the recovery point objective (RPO). It is one thing to be offline, but it is another thing to lose data that is already in your custody.

In this course, we will cover the full range of Azure options from turnkey to custom and do-it-yourself solutions. If you have any feedback relating to this course, feel free to reach out to us at

Learning Objectives

  • Understand the concepts and elements of high availability and disaster recovery
  • Learn about Hyperscale databases and how they are used
  • Learn how to combine on-premises databases with the cloud to replicate and sync your data
  • Understand what Always On availability groups are and how they operate
  • Implement geo-replication and failover groups
  • Learn about the wide range of options you have for backing up your databases with Azure

Intended Audience

This course is intended for anyone who wants to put high availability and disaster recovery procedures in place for their Azure SQL databases.


This is an intermediate to advanced course, so to get the most out of it, you should be familiar with SQL Server Management Studio, database operations like backup and restore, and T-SQL, and also have some familiarity with basic Active Directory concepts like users and permissions.


As I said earlier, recovery time objective (RTO) and recovery point objective (RPO) may seem similar on the face of it, but they are independent of each other and can even pull in opposite directions. Take a scenario where you have synchronous database mirroring within a data center and the primary server fails. Unless you have implemented an automatic cutover to the secondary server, known as a failover, it could be some time before your system is up and running again. However, there is no data loss: synchronous mirroring means every committed transaction already exists on the secondary, and users cannot write new data while the system is down. On the other hand, you may have a hot swap, or automatic failover, to a second system, but the data is only replicated to that backup system once every 24 hours. In that case, end users may not even notice that the system has gone down, but you could lose up to a day's worth of data.
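The two scenarios can be modelled to show that RTO and RPO move independently of each other. This is a minimal sketch, assuming the worst-case data loss equals the replication interval (ignoring how long the copy itself takes) and using hypothetical failover times of 4 hours for a manual failover and 30 seconds for an automatic one; `HadrSetup` and its methods are illustrative names, not an Azure API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class HadrSetup:
    """Simplified, hypothetical model of one HADR configuration."""
    name: str
    replication_interval: timedelta  # timedelta(0) means synchronous replication
    time_to_failover: timedelta      # time until the secondary is serving users

    def worst_case_rpo(self) -> timedelta:
        # Writes made since the last replication are lost, so the worst case
        # is a failure just before the next copy would have run.
        return self.replication_interval

    def worst_case_rto(self) -> timedelta:
        # Users are offline until the failover to the secondary completes.
        return self.time_to_failover

    def meets(self, rto_target: timedelta, rpo_target: timedelta) -> bool:
        # A setup satisfies the business targets only if BOTH are met.
        return (self.worst_case_rto() <= rto_target
                and self.worst_case_rpo() <= rpo_target)


# Scenario 1: synchronous mirroring, manual failover (assumed 4 hours to recover).
mirrored = HadrSetup("sync mirroring, manual failover",
                     replication_interval=timedelta(0),
                     time_to_failover=timedelta(hours=4))

# Scenario 2: automatic failover (assumed 30 seconds), data copied once a day.
daily_copy = HadrSetup("daily copy, automatic failover",
                       replication_interval=timedelta(hours=24),
                       time_to_failover=timedelta(seconds=30))

for setup in (mirrored, daily_copy):
    print(f"{setup.name}: RTO {setup.worst_case_rto()}, RPO {setup.worst_case_rpo()}")
```

The first setup has the better RPO and the second the better RTO, which is why the two objectives have to be agreed separately with the business rather than treated as one number.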

To put this in terms that we can all understand, and perhaps have experienced ourselves, it's the difference between working in Google Docs or Microsoft Word 365 with autosave and working on a document saved to your local hard drive. In the first case, your hard drive can fail and your computer stop working, and it may be some days before it's operational again, but your document is exactly as you left it, so there is no data loss. In the second case, if you have been working on a document for several hours without saving it and there is a momentary power outage, your desktop may only be off for as long as it takes to reboot, but you've lost hours of work. Let's next look at how you might implement high availability and disaster recovery strategies for different database environments.

Here we have a table showing the Azure SQL Database service tiers with their high availability and disaster recovery features and relative costs. Obviously, there are many features apart from disaster recovery, such as auto-scaling, that differentiate these tiers, but we are only interested in the DR features. Just as a point of interest, an SLA of 99.99% does seem quite impressive, but over the course of a year it still allows about 52½ minutes of downtime, potentially all in one hit, which would definitely impact any service you provide. Also, when we look at geo-replication in the Business Critical tier, Microsoft's guarantee applies "for 100% of deployed hours". The assumption here is that there will be no downtime apart from the 30 seconds per failure allowed by the recovery time objective. We will next look at what replicas, zone redundancy, and geo-replication mean in the context of Azure SQL.
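The 52½-minute figure comes from multiplying the unavailable fraction (100% − 99.99% = 0.01%) by the number of minutes in a year: 0.0001 × 525,600 ≈ 52.56 minutes. A small helper, sketched below, makes it easy to compare other SLA levels or measurement periods; `max_downtime` is a hypothetical name, and the percentages other than 99.99% and the 30-day month are illustrative assumptions, not figures from the table.

```python
from datetime import timedelta


def max_downtime(sla_percent: float, period: timedelta = timedelta(days=365)) -> timedelta:
    """Downtime still permitted by an availability SLA over `period`."""
    unavailable_fraction = (100.0 - sla_percent) / 100.0
    return period * unavailable_fraction


for sla in (99.9, 99.99, 99.995):  # illustrative SLA percentages
    per_year = max_downtime(sla).total_seconds() / 60
    per_month = max_downtime(sla, timedelta(days=30)).total_seconds() / 60
    print(f"{sla}%: {per_year:.1f} min/year, {per_month:.1f} min per 30-day month")
```

Note that an SLA only caps total downtime over the period; nothing in the arithmetic prevents all of it from occurring in a single outage.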

In the previous service tier SLA slide, the Business Critical tier had both zone-redundant and non-zone-redundant options. Before we carry on, I want to define what that means. Azure has the concept of geographies, which are areas or markets where particular legislation and legal requirements apply. This allows customers to meet any data residency and compliance obligations they might have. A region is a group of data centers in relative physical proximity that share a low-latency network. An availability zone is a physically separate location within a region, made up of one or more data centers with independent power, cooling, and networking, which is why zones are typically used to provide redundancy. Not every region has availability zones.
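The geography, region, and availability zone hierarchy determines which failures a set of replicas can survive. The following is a minimal sketch, assuming each replica is tagged with its location; `Location` and `survivors` are hypothetical names for illustration, and the location labels are examples only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    geography: str       # data-residency boundary, e.g. "Europe"
    region: str          # group of data centers on a low-latency network
    zone: Optional[str]  # availability zone; None if the region has none


def survivors(replicas: List[Location], region: str,
              zone: Optional[str] = None) -> List[Location]:
    """Replicas left running after an outage.

    If `zone` is given, only that availability zone in `region` fails;
    if `zone` is None, the whole region fails.
    """
    return [r for r in replicas
            if r.region != region or (zone is not None and r.zone != zone)]


# Three replicas spread across the zones of a single region.
zone_redundant = [Location("Europe", "westeurope", z) for z in ("1", "2", "3")]
print(len(survivors(zone_redundant, "westeurope", zone="1")))  # 2: a zone outage is survivable
print(len(survivors(zone_redundant, "westeurope")))            # 0: a region outage is not
```

Zone redundancy protects against the loss of a data center, but surviving the loss of a whole region requires a copy in another region.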

A highly redundant, and therefore resilient, configuration would be to set up your databases with replicas spread across the availability zones of one region, and then a failover copy in another region that is itself zone redundant. In other words, you replicate a highly redundant architecture into a second region. A DNS-based load balancer called Azure Traffic Manager can be used to route traffic to the appropriate Azure region. At the time of recording, only the Business Critical tier supports zone redundancy, and only when Gen5 compute hardware is used. It should also be noted that zone redundancy is not available with Azure SQL Managed Instance.
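To illustrate how a DNS-based load balancer can redirect clients after a regional failure, here is a simplified model of priority routing, one of the routing methods Traffic Manager offers: clients are directed to the healthy endpoint with the highest priority (the lowest priority number). This is a sketch under stated assumptions, not the Traffic Manager API; `Endpoint`, `resolve`, and the endpoint names are hypothetical, and in practice clients only switch over once their cached DNS record's TTL expires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    name: str
    priority: int  # lower value = preferred, as in priority routing
    healthy: bool  # result of the latest health probe


def resolve(endpoints: List[Endpoint]) -> str:
    """Return the endpoint a priority-based DNS answer would point clients to."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy endpoint available")
    return min(candidates, key=lambda e: e.priority).name


primary = Endpoint("primary-region", priority=1, healthy=True)
secondary = Endpoint("secondary-region", priority=2, healthy=True)
print(resolve([primary, secondary]))  # primary-region

primary.healthy = False               # simulate a regional outage
print(resolve([primary, secondary]))  # secondary-region
```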


Lectures

  • Course Introduction
  • Overview
  • Hyperscale
  • Combining On-Premises with the Cloud
  • Always On Availability Groups
  • Failover in the Cloud
  • Database Backups
  • Course Summary


About the Author

Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. Although Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, for him good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.