Introduction & Overview
Designing and Building an HADR for SAP Workloads
High availability and disaster recovery are key to ensuring reliable business continuity. While SAP workloads are mainly confined to Azure's infrastructure layer, it is still possible to utilize many Azure functions and features to enhance system reliability with relatively little effort. This course looks at when, where, and how to use Azure's built-in infrastructure redundancy to improve system resiliency and how various database high availability options are supported.
- Understand the key aspects of high availability and disaster recovery
- Learn about availability sets and availability zones
- Learn about Azure Site Recovery and how to implement it through the Azure portal
- Learn how to set up an internal load balancer in the context of SAP workloads
- Understand the Azure support options for Pacemaker and STONITH
- Learn how to implement Data Guard mirroring via the Azure CLI
- Set up Windows Failover Cluster and SQL Server Always On through the Azure portal
This course is intended for anyone who wants to use Azure's built-in infrastructure redundancy to enhance the reliability and resiliency of their SAP workloads.
To get the most out of this course, you should be familiar with Azure, Azure CLI, SAP, SQL Server, and STONITH.
Before we get into specifics, I want to go over some definitions so we're all on the same page, as some of the terms sound like they could mean the same thing but are subtly different. High availability relates to a functioning system as opposed to the system's data. Redundancy, usually in the form of servers, though it can be disks or pretty much any infrastructure component, is most often associated with high availability. When a server experiences a hardware failure, network traffic is re-routed to the mirror or backup server, often with users utterly unaware of the failure. The Recovery Time Objective, most commonly referred to by the acronym RTO, is the target time for switching from a primary server or system to its backup. In an ideal world, we would like this switchover time to be zero, or as it is usually expressed, 100% available. 99.99% availability sounds impressive but still translates into just under an hour's downtime annually, which could be very painful if it happened all at once during a system's maximum load. Implicit in high availability is load balancing and the ability for network traffic to be re-routed quickly and correctly in the event of a node failure.
There is often an assumption that switching from a primary server to a backup means no loss of data. However, this is not necessarily the case, for two reasons.
Data may be lost during the failover process, even if the failover takes only a fraction of a second.
If data from the primary system isn't replicated to the backup system in real time, the data lost is whatever was written between the last update to the backup system and the moment the backup system takes over.
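To make the availability arithmetic concrete, here is a minimal sketch that converts an availability percentage into the yearly downtime it allows. The function name and the 365-day year are assumptions made for illustration; they do not come from Azure or SAP tooling.

```python
# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Print the downtime budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = annual_downtime_minutes(target)
        print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

For 99.99% this works out to roughly 52.6 minutes a year, which is the "just under an hour" figure mentioned above.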
Recovery Point Objective, RPO, is how much data you are willing to sacrifice in the case of a system outage and roughly relates to the frequency of data replication to backups or a backup system. You could say RTO relates to high availability and RPO is analogous to disaster recovery. These two concepts are related but subtly and crucially different. As we'll see, guaranteeing 100% availability with no data loss is an expensive proposition, especially when large amounts of data are involved.
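The relationship between replication timing and data loss can be sketched in a few lines. This is only an illustration under simple assumptions: the function names are hypothetical, and data loss is measured purely as the time elapsed between the last successful replication and the failover.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replication: datetime, failover: datetime) -> timedelta:
    """Time span of writes at risk: everything after the last replication up to failover."""
    return max(failover - last_replication, timedelta(0))


def meets_rpo(last_replication: datetime, failover: datetime, rpo: timedelta) -> bool:
    """True if the data loss window stays within the agreed Recovery Point Objective."""
    return data_loss_window(last_replication, failover) <= rpo


if __name__ == "__main__":
    # Hypothetical timestamps: last replication at 12:00, failover at 12:07.
    last_sync = datetime(2024, 1, 1, 12, 0)
    outage = datetime(2024, 1, 1, 12, 7)
    print(data_loss_window(last_sync, outage))
    print(meets_rpo(last_sync, outage, rpo=timedelta(minutes=5)))
```

With a five-minute RPO and a seven-minute gap since the last replication, the objective is missed, which is why a tighter RPO generally means more frequent, and more expensive, replication.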
The system/data divide is also present when we think of the different elements within an SAP system. SAP utilizes Azure's infrastructure-as-a-service model, meaning that, from an application server perspective, virtual machine replication happens agnostically; that is, Azure doesn't care what's on the VM. The same cannot be said for data replication, where database-specific tools must replicate and back up data.
High availability and disaster recovery are implemented in the geographical contexts of a data center, a group of data centers, called a zone, and Azure regions. Within a data center, you are looking to protect yourself from hardware failure, and implementing high availability and data replication is relatively inexpensive and efficient due to close proximity. As you extend system resilience to include situations where a whole data center or an Azure region goes down, costs go up, and replication efficiency goes down.
Hallam is a software architect with over 20 years' experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, he holds that good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.