High availability disaster recovery (HADR) is an integral part of any organization’s data strategy. It is also an essential string to every DBA's bow. In an online world that operates 24 hours a day, going offline or losing customers' data cannot be tolerated. This course examines the features that Azure provides to help you make sure your SQL databases, whether they are managed in the cloud or on-premises, are not the point of failure in your systems.
High availability disaster recovery encompasses two fundamental concepts. The first is how to minimize the amount of time your databases are offline in the event of unforeseen events such as hardware failures, power outages, or any number of natural disasters. The second is how to minimize data loss when any of these events occur. Being offline is one thing; losing data that is already in your custody is quite another.
In this course, we will cover the full range of Azure options from turnkey to custom and do-it-yourself solutions. If you have any feedback relating to this course, feel free to reach out to us at support@cloudacademy.com.
Learning Objectives
- Understand the concepts and elements of high availability and disaster recovery
- Learn about Hyperscale databases and how they are used
- Learn how to combine on-premises databases with the cloud to replicate and sync your data
- Understand what Always On availability groups are and how they operate
- Implement geo-replication and failover groups
- Learn about the wide range of options you have for backing up your databases with Azure
Intended Audience
This course is intended for anyone who wants to put high availability and disaster recovery procedures in place for their Azure SQL databases.
Prerequisites
This is an intermediate-to-advanced course, so to get the most out of it, you should be familiar with SQL Server Management Studio, database operations such as backup and restore, and T-SQL. You should also have some familiarity with basic Active Directory concepts such as users and permissions.
Perhaps it’s easier to start by looking at the ideal situation for a production environment. Obviously, that is for nothing to go wrong and for our data to be available 24/7. For many reasons, from human error to natural disasters, this isn’t possible, or rather, it’s unlikely. So, what mechanisms need to be put in place so that we can come close to this ideal, or at least mimic it?
As with everything in life, it comes down to a trade-off between cost and outcome. If money is not the chief consideration, then near-100% uptime is feasible. Of course, money is always a consideration, and the business may not require 100% availability at all times.
There are a few concepts that will help us quantify our data availability needs. The first is the SLA, or service level agreement. In a cloud environment, this is what your provider commits to in terms of the availability of its service. When you are providing a service to customers, it is what you guarantee to them in terms of your service's availability. SLAs are great for understanding each party’s expectations and contractual obligations, and they help frame the disaster recovery plan. However, they are of little use when things go badly wrong, that is, when failures fall outside the plan; the lawyers can fight it out later.
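An SLA is usually expressed as an availability percentage, and it is easier to reason about once it is translated into a downtime budget. The following is a minimal Python sketch; the function name `allowed_downtime` is hypothetical, and it assumes availability is measured simply as uptime divided by the period, with no maintenance-window exclusions or service credits, which real provider SLAs may define differently.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Maximum downtime permitted in `period` under a simple availability SLA.

    Hypothetical helper: assumes availability = uptime / period, with no
    maintenance-window exclusions or service credits modelled.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    unavailable_fraction = (100 - sla_percent) / 100
    # Multiplying a timedelta by a float rounds to the nearest microsecond.
    return period * unavailable_fraction


if __name__ == "__main__":
    month = timedelta(days=30)
    for sla in (99.0, 99.9, 99.99, 99.995):
        print(f"{sla}% over 30 days -> {allowed_downtime(sla, month)}")
```

For a 30-day month, 99.9% allows about 43 minutes of downtime, while 99.99% allows only about 4.3 minutes, which shows how quickly each extra "nine" tightens the requirement.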
Two other concepts, recovery time objective (RTO) and recovery point objective (RPO), are far more useful in determining how to implement a disaster recovery plan. Recovery time objective is the amount of time the business can tolerate a system being down, that is, the time from when the system went offline to when it became fully functional again. This will vary from system to system. For example, e-commerce systems will typically tolerate only a few seconds of downtime, while a payroll system might not have such stringent requirements.
Recovery point objective refers to a business’s tolerance for data loss after a failure. RTO and RPO may seem like similar things. After all, the longer a system is down, the more data you might expect to lose. However, for databases, RPO relates to the amount of information that has been added to or changed in the database since the last backup and that could be lost when a failure occurs. In this case, "backup" refers to full, differential, and log backups.
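To show that RTO and RPO measure different things, the following minimal Python sketch computes both for a single incident. The function names `incident_metrics` and `meets_objectives` and the backup schedules are hypothetical. It assumes each backup completes instantly at its timestamp and that no tail-log backup can be taken after the failure, so any change made after the last completed backup is counted as potentially lost.

```python
from datetime import datetime, timedelta


def incident_metrics(backup_times, failure_time, restored_time):
    """Return (downtime, data_loss_window) for one incident.

    downtime: from the failure until the system is fully functional (compare to RTO).
    data_loss_window: changes made after the last backup completed before the
    failure are potentially lost (compare to RPO).
    Simplifying assumptions: backups complete instantly; no tail-log backup.
    """
    prior = [t for t in backup_times if t <= failure_time]
    if not prior:
        raise ValueError("no backup completed before the failure")
    if restored_time < failure_time:
        raise ValueError("restored_time must not precede failure_time")
    return restored_time - failure_time, failure_time - max(prior)


def meets_objectives(backup_times, failure_time, restored_time, rto, rpo):
    """True if this incident stayed within both the RTO and the RPO."""
    downtime, loss = incident_metrics(backup_times, failure_time, restored_time)
    return downtime <= rto and loss <= rpo


if __name__ == "__main__":
    midnight = datetime(2024, 1, 1)
    failure = midnight + timedelta(hours=10, minutes=7)
    restored = failure + timedelta(minutes=45)

    # Hypothetical schedules: a single full backup at midnight, versus the same
    # full backup followed by log backups every 15 minutes.
    full_only = [midnight]
    with_logs = [midnight + timedelta(minutes=15 * i) for i in range(96)]

    for name, schedule in (("full only", full_only), ("full + 15-min logs", with_logs)):
        downtime, loss = incident_metrics(schedule, failure, restored)
        print(f"{name}: downtime={downtime}, potential data loss={loss}")
```

With only the midnight full backup, a failure at 10:07 could lose just over ten hours of changes; adding 15-minute log backups shrinks that window to seven minutes. The 45-minute downtime is the same in both cases, which is why RTO and RPO are treated as separate objectives.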
So, RTO and RPO requirements span a full spectrum, from low availability, where a full backup that may be up to a week old is manually restored, through to synchronized database mirroring with automatic failover. The first scenario, manual backup and restoration, offers low availability and minimal disaster recovery and can't really be described as a strategy.
Lectures
- Course Introduction
- High Availability and Disaster Recovery Concepts
- Hyperscale
- Combining On-Premises with the Cloud
- Always On Availability Groups
- Failover in the Cloud
- Database Backups
- Course Summary
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile, and desktop technologies, he sees good-quality, reliable data as the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers that keeps Hallam coming back to the keyboard.