Always On Availability Groups
High availability disaster recovery (HADR) is an integral part of any organization’s data strategy. It is also an essential string to every DBA's bow. In an online world that operates 24 hours a day, going offline or losing customers' data cannot be tolerated. This course examines the features that Azure provides to help you make sure your SQL databases, whether they are managed in the cloud or on-premise, are not the point of failure in your systems.
High availability disaster recovery encompasses two fundamental concepts. Firstly, how to minimize the amount of time your databases will be offline in the event of unforeseen events like hardware failures, power outages, or any number of natural disasters. Secondly, it looks at how to minimize data loss when any of these events occur. It is one thing to be offline, but it is another thing to lose data that is already in your custody.
In this course, we will cover the full range of Azure options from turnkey to custom and do-it-yourself solutions. If you have any feedback relating to this course, feel free to reach out to us at email@example.com.
- Understand the concepts and elements of high availability and disaster recovery
- Learn about hyperscale databases and how they are used
- Learn how to combine on-premise with the cloud to replicate and sync your data
- Understand what always on availability groups are and how they operate
- Implement geo-replication and failover groups
- Learn about the wide range of options you have for backing up your databases with Azure
This course is intended for anyone who wants to implement high availability and disaster recovery procedures in place for their Azure SQL databases.
This is an intermediate to advanced course and so to get the most out of it, you should be familiar with SQL Server management studio, database operations like backup and restore, and T-SQL, and also have some familiarity with basic active directory concepts like users and permissions.
There are many elements and offerings for implementing high availability disaster recovery with Azure SQL, but broadly speaking they fall into two categories. Firstly, there is the fully synchronized always-on availability groups functionality that was originally developed with SQL Server and has been migrated to the cloud with the premium and business-critical service tiers. Secondly, we have the Azure SQL and Managed Instance DR solutions such as Active Geo-replication and Failover groups.
If you want a solution that meets no compromise RTO and RPO goals, that will involve some form of the always-on availability groups, whether that’s on-premise, on an Azure virtual machine, or using premium service tiers. Having said that the reasonably new hyperscale architecture shows an exciting departure from the traditional cluster technology, by separating functionality out into storage engines that in turn access separately stored files where backups make use of file snapshots that backup the data incredibly quickly.
In any case, there is no getting away from distance and physics. Having fully synchronized replication across Azure regions or vast distances is always going to be a challenge. This is borne out by the fact that as we move down the service tiers what passes for automatic failover is not that helpful when the minimum grace period is 60 minutes.
As the number of data centers increase and regions that support availability zones also increase the cost of redundancy and replication will decrease. It is only a major issue if a complete Azure region goes off-line, and while not an impossible event would be a very rare one.
By default, database backups are automatically replicated to a paired region, so every service tier has geographic redundancy is built-in at some level. Both Azure SQL and SQL Managed Instances do allow you to replicate data across regions but it is asynchronous, and while the latency can be very low there is no guarantee that data will not be lost in the event of a catastrophic failure.
On the plus side, all of these high availability disaster recovery solutions give you the benefit of read-only replicas that are accessible during normal operation. This allows you to offload read operations like reporting and backups from the primary database, so there may actually be something as a free lunch.
If we were to put all the different technologies onto a high availability disaster recovery spectrum, at one end we would have the low-cost but time and labor expensive plain backup and restore. And at the other end always-on availability group and failover cluster type technology, which is orders of magnitude more expensive, but offering automatic and near-instantaneous recovery. In the middle we have the compromise, both active geo-replication, and failover groups rely on manual intervention in the case of a failure but only a limited amount of data could be potentially lost.
Even though automatic failover is an option, both would require manual intervention to failover in a timely fashion. Failover groups have the advantage of being supported by SQL Managed Instances and application endpoints will not have to be changed as they point to the failover group URL and will automatically connect to the new primary server.
If system downtime is not a major priority but you still want to minimize loss of data in the event of a database outage, you can achieve this with traditional backup and restore. We saw at the most minimal level of automation how you can create scheduled jobs to do full, differential, and transaction log backups with SQL agent jobs. The frequency of backups, especially transaction log backups can help meet a quite stringent recovery point objective.
Depending on database size and compute resources, that is the amount of effort it requires to do backups, full backups can be done weekly or daily. Depending on the full backup schedule differentials can range from once or twice a day to every three or four hours, and again dependent on differential backup frequency and the amount of data change, transaction log backup frequency can range from once every 60 to perhaps once a minute. It really depends on RPO requirements. SQL Server management studio has a useful wizard for conducting a point in time restore using a combination of the aforementioned backups.
In Azure SQL and SQL Managed Instance, database backups are automatically implemented and managed. There is a point in time restore function that hides the complexity of backup selection. The use of active geo-replication or failover groups will go a long way to minimize the amount of data loss in the event of a region-wide failure.
High availability disaster recovery is a bit like insurance if you want to minimize excess, that is reduce time to recovery and loss of data, you will have to pay higher premiums, so increase service level tier. We have looked at many options, some of which can be combined and others not, but all will meet your requirements once you have determined what they are.
Course Introduction - Overview - High Availability and Disaster Recovery Concepts - Hyperscale - Combining On-Premises with the Cloud - Always On Availability Groups - Failover in the Cloud - Database Backups
Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.