In this course, we discuss planning for data recovery, including disaster recovery of SAP workloads on AWS. We present and discuss design guidance and best practices gathered from AWS customers, AWS experts, and SAP specialists running SAP workloads on AWS.
Learning Objectives
We introduce best practices for business continuity and disaster recovery related to SAP workloads on AWS. The recommendations are aligned with the Reliability pillar of the AWS Well-Architected Framework and focus on planning for data protection and recovery of SAP solutions implemented using AWS services.
Intended Audience
This course is intended for SAP architects and SAP operators who deploy and maintain SAP workloads on AWS. It also aligns with the objectives of the AWS Certified: SAP on AWS - Specialty (PAS-C01) exam.
Prerequisites
To get the most from this course, you will need to meet the requirements for the AWS Certified Solutions Architect - Associate or AWS Certified SysOps Administrator - Associate certifications, or have the equivalent experience. This includes understanding the function, anatomy, and operation of core AWS services that are relevant to SAP implementations, such as:
- The AWS global infrastructure, Amazon VPC, Amazon EC2, Amazon EBS, Amazon EFS, Amazon S3, Amazon S3 Glacier, IAM, Amazon CloudWatch, AWS CloudTrail, the AWS CLI, and Amazon Route 53
- The AWS Well-Architected Framework
It is also assumed that you are familiar with SAP software workloads and their implementation. SAP is well known for its enterprise resource planning (ERP) applications, including SAP Business Suite, SAP NetWeaver, SAP S/4HANA, and supporting products.
General Backup and Recovery in AWS
A big part of reliability is backing up all of your data to durable storage, as well as implementing recovery procedures that meet your predefined Recovery Time Objective (RTO), Recovery Point Objective (RPO), and Mean Time To Recovery (MTTR) metrics. The main guiding principle is to assume that everything fails, and then design backwards with enough redundancy to eliminate single points of failure that could bring down the entire system.
Your applications should continue to deliver results even if the underlying infrastructure fails or is replaced. It is important to define requirements around your expected RTO, RPO, and MTTR metrics. RTO answers the question of how quickly your systems must recover. It can be measured in days, hours, minutes, seconds, or even fractions of a second. RPO answers the question of how much data you can afford to lose.
In other words, for how long can data collection stop on your systems without severely impacting your business? Note that the recovery point objective is necessarily part of the total recovery time objective. Mean Time To Recovery, as the name suggests, represents the average amount of time it takes to recover your systems after a major failure. These metric targets dictate how much you need to invest in your architecture to meet them.
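To make the relationship concrete, here is a minimal sketch, not from the course itself, of turning an RPO target into a backup schedule; the 15-minute RPO and the safety factor are hypothetical values chosen purely for illustration:

```python
# Illustrative sketch only: the RPO value and safety factor are hypothetical.

RPO_MINUTES = 15  # maximum tolerable data loss, set by the business

def max_backup_interval(rpo_minutes: float, safety_factor: float = 0.5) -> float:
    """Return how often backups or log shipping must run, in minutes.

    A safety factor below 1.0 leaves headroom for jobs that start late
    or run long; with no headroom, a single missed run would
    immediately violate the RPO.
    """
    return rpo_minutes * safety_factor

print(f"Schedule backups at least every {max_backup_interval(RPO_MINUTES):.0f} minutes")
```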
The architecture, backup schedules, backup frequency, and data retention periods are all dictated by your RTO, RPO, and MTTR metrics. Recoverability is often the area that requires the most improvement. In the event of a natural disaster, individual components, or even your primary data source, can become unavailable. In that case, you need to be able to restore service quickly and without losing data. The most important point is to test your recovery procedures before you need them in a real situation. The MTTR metric is obtained during testing or "game day" scenarios, so you will want to schedule and perform enough tests to produce a reliable measurement.
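Continuing the sketch above, MTTR is simply the mean of observed recovery times across drills; the drill timings and the 60-minute RTO below are hypothetical examples:

```python
from statistics import mean

# Hypothetical game-day results: minutes from simulated failure to full recovery.
drill_recovery_minutes = [42.0, 55.5, 38.0, 61.0]
RTO_MINUTES = 60  # hypothetical target for this example

mttr = mean(drill_recovery_minutes)
print(f"MTTR over {len(drill_recovery_minutes)} drills: {mttr:.1f} minutes")
print(f"Every drill met the {RTO_MINUTES}-minute RTO: "
      f"{all(t <= RTO_MINUTES for t in drill_recovery_minutes)}")
```

Note that in this example the MTTR (49.1 minutes) sits under the RTO, yet one individual drill still missed it, which is exactly the kind of finding game days are meant to surface.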
The guiding principle is: "Test beyond destruction to make sure recovery procedures are automatic, successful, and work as expected." Backup and recovery of SAP workloads on AWS assumes that you are familiar with implementing and operating SAP solutions, including the general SAP backup and restore recommendations explained in the SAP technical operations manual. The most significant change when backing up an SAP system on AWS, compared to a traditional implementation, is the backup destination: AWS uses Amazon S3 instead of tape as the final resting place for your datasets.
By leveraging S3 for storage, you get highly durable storage designed to provide 11 nines (99.999999999%) of durability and, by default, 4 nines (99.99%) of availability over a given year. The levels of availability are represented as shown on your screen. Notice that 1 nine of availability represents 90% uptime over a given year, which allows a maximum of about 36.5 days of downtime per year, or an equivalent downtime per day of about 2.4 hours. 4 nines of availability allows a maximum downtime of about 52.6 minutes per year, or roughly 8.6 seconds of equivalent downtime per day. The gold standard in terms of availability is 5 nines (99.999%), which equates to a maximum downtime of about 5.25 minutes per year and an equivalent downtime per day of less than a second: 0.86 seconds.
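The downtime figures quoted above follow directly from the arithmetic; this short illustrative snippet reproduces them:

```python
# Reproduces the downtime figures quoted above from the availability level.
MINUTES_PER_YEAR = 365 * 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

for nines in (1, 4, 5):
    unavailability = 10 ** -nines      # e.g. 4 nines -> 0.0001
    availability = 1 - unavailability  # e.g. 4 nines -> 99.99%
    per_year_minutes = unavailability * MINUTES_PER_YEAR
    per_day_seconds = unavailability * SECONDS_PER_DAY
    print(f"{nines} nine(s), {availability:.3%} available: "
          f"~{per_year_minutes:,.2f} min/year, ~{per_day_seconds:,.2f} s/day of downtime")
```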
All SAP on AWS backup and restore strategies rely on Amazon S3 as the storage service, and using S3 fulfills the offsite storage requirement automatically. There are a few ways to use S3 for backups. The first is to store data directly in S3. The second is to back up your data indirectly, using mechanisms built into some AWS services, such as EBS snapshots, using the AWS Backup service, or using third-party backup solutions that can read and write to Amazon S3 as the final resting place for your datasets (the first two approaches are sketched below). AWS allows for high availability deployments across multiple Availability Zones in the same Region, as well as across different Regions, so you can combine high availability and disaster recovery using multiple Availability Zones and Regions whenever you need to. Amazon S3 supports both types of topology.
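As a minimal sketch of the first two approaches, here is a hedged boto3 example; the bucket name, backup file path, and volume ID are hypothetical placeholders, not values from the course:

```python
import boto3

# Hypothetical resource names; replace with your own bucket and volume ID.
BACKUP_BUCKET = "my-sap-backup-bucket"
HANA_DATA_VOLUME = "vol-0123456789abcdef0"

# Option 1: store a backup file directly in Amazon S3.
s3 = boto3.client("s3")
s3.upload_file("/backups/PRD_20240101.bak", BACKUP_BUCKET,
               "sap/prd/PRD_20240101.bak")

# Option 2: back up indirectly with an EBS snapshot, which is
# persisted to S3-backed snapshot storage behind the scenes.
ec2 = boto3.client("ec2")
snapshot = ec2.create_snapshot(
    VolumeId=HANA_DATA_VOLUME,
    Description="Ad hoc snapshot of SAP data volume",
)
print(f"Snapshot started: {snapshot['SnapshotId']}")
```

Keep in mind that a snapshot of a running database volume on its own is only crash consistent; for SAP HANA and other databases, follow SAP's guidance on achieving a consistent backup, for example through database-native backup tools or application-consistent snapshot procedures.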
