Solution Architect Professional for AWS - Domain One - High Availability, Scalability and Business Continuity

Designing a back-up and recovery solution



Course Description

In this course, you'll gain a solid understanding of the key concepts for Domains One and Seven of the AWS Solutions Architect Professional certification: High Availability, Scalability and Business Continuity. 

Course Objectives

By the end of this course, you'll have the tools and knowledge you need to successfully accomplish the following requirements for this domain, including:

  • Demonstrate ability to architect the appropriate level of availability based on stakeholder requirements.
  • Demonstrate ability to implement DR for systems based on RPO and RTO.
  • Determine appropriate use of multi-Availability Zones vs. multi-Region architectures.
  • Demonstrate ability to implement self-healing capabilities.
  • Demonstrate ability to implement the most appropriate data storage scaling architecture.
  • High Availability vs. Fault Tolerance.
  • Scalability and Elasticity.

Intended Audience

This course is intended for students seeking to acquire the AWS Solutions Architect Professional certification. It is necessary to have acquired the Associate level of this certification. You should also have at least two years of real-world experience developing AWS architectures. 


As stated previously, you will need to have completed the AWS Solutions Architect Associate certification, and we recommend reviewing the relevant learning path in order to be well-prepared for the material in this one. 

This Course Includes

  • 1 hour and 13 minutes of high-definition video.
  • Expert-led instruction and exploration of important concepts. 
  • Coverage of critical concepts for Domain One and Domain Seven of the AWS Solutions Architect - Professional certification exam.

What You Will Learn

  1. Designing a back-up and recovery solution.
  2. Implementing DR based on RTO/RPO.
  3. RDS backup and restore, and self-healing capabilities.
  4. Points to remember for the exam.

When you develop a strategy for backing up and restoring data, you first need to identify the failure or disaster situations that can occur and their potential business impact. In some situations, you also need to consider requirements for data security, privacy and record retention. You should implement backup processes that offer the appropriate level of granularity to meet the RTO and RPO requirements of the business. The areas you need to consider include:

  • File-level recovery: how would you back up and restore versions of files?
  • Volume-level recovery: what type of mirroring or striping do you need to enable recoveries and archives?
  • Application-level recovery: you can include your database strategy here.
  • Image-level recovery.

When you perform a backup, it's best to have the system in a state where it's not performing any I/O. In the ideal case, the machine isn't accepting traffic. For this reason, you need to stop or quiesce the file system or database in order to make a clean backup. The best way to do this depends on your database or file system, of course. For databases, the process is generally: if possible, put the database into hot backup mode, run the Amazon EBS snapshot commands, then take the database out of hot backup mode or, if you're using a read replica, terminate the read replica instance.

The process for a file system is similar but depends on the capabilities of the operating system or file system itself. If your file system doesn't support the ability to freeze, you should unmount it, issue the snapshot command and then remount the file system. You could also use a logical volume manager that supports the freezing of I/O. Because the snapshot captures a point in time, is fast to create and then continues in the background, the volumes you are backing up generally only need to be unmounted for a matter of seconds.
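The pause, snapshot, resume sequence described above can be sketched as follows. This is a minimal illustration, not a production backup script: `ec2` is assumed to behave like a boto3 EC2 client (only its `create_snapshot` call is used), and `freeze` and `thaw` are hypothetical caller-supplied hooks, for example wrappers around `fsfreeze` on Linux or your database's hot backup commands.

```python
from contextlib import contextmanager


@contextmanager
def frozen(freeze, thaw):
    """Pause writes for the duration of the block and always resume them."""
    freeze()
    try:
        yield
    finally:
        # Resume I/O even if the snapshot request fails.
        thaw()


def consistent_snapshot(ec2, volume_id, freeze, thaw, description="scheduled backup"):
    """Start an EBS snapshot while writes are paused and return its ID.

    `ec2` is assumed to be a boto3 EC2 client; `freeze` and `thaw` are
    hypothetical hooks supplied by the caller.
    """
    with frozen(freeze, thaw):
        # The call returns once the point in time is captured; the copy
        # itself continues in the background while the state is "pending".
        response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    return response["SnapshotId"]
```

Wrapping the pause in a context manager keeps the window short and guarantees that I/O resumes whether or not the snapshot request succeeds.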
Because the backup window is as small as possible, the outage time is predictable and can be scheduled. Amazon EBS provides the ability to create snapshots of any Amazon EBS volume. It takes a copy of the volume and places it in Amazon S3, where it's stored redundantly in multiple Availability Zones. The first snapshot is a full copy of the volume; subsequent snapshots store only incremental block-level changes, so this is a fast and reliable way to restore full volume data. If you only need a partial restore, you can create a volume from the snapshot, attach it to the running instance under a different device name, mount it and then use operating system copy commands to copy the data from the backup volume to the production volume. Amazon EBS snapshots can also be copied between AWS Regions using the Amazon EBS snapshot copy command. You can use this feature to store your backup in another Region without having to manage the underlying replication technology.

A few points to keep in mind about EBS snapshots:

  • If you make periodic snapshots of a volume, the snapshots are incremental, so only the blocks on the device that have changed since your last snapshot are saved to the new snapshot.
  • Snapshots occur asynchronously. The point-in-time snapshot is created immediately, but the status of the snapshot is pending until the snapshot is complete.
  • There's a limit of five pending snapshots for a single volume. If you receive a ConcurrentSnapshotLimitExceeded error while trying to create multiple concurrent snapshots of the same volume (or get a question about it), wait for one or more of the pending snapshots to complete before creating another snapshot of that volume.
  • Although you can take a snapshot of a volume while a previous snapshot of that volume is in the pending status, having multiple pending snapshots of a volume may result in reduced performance until a snapshot is complete.
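One way to stay under the pending-snapshot limit is to check how many snapshots of the volume are still pending before requesting another. The sketch below assumes `ec2` behaves like a boto3 EC2 client and uses the `volume-id` and `status` filters of `describe_snapshots`; the helper names are illustrative, and the default of five is simply the figure quoted above.

```python
PENDING_LIMIT = 5  # per-volume figure quoted in this lecture


def pending_snapshot_count(ec2, volume_id):
    """Count snapshots of one volume that are still in the pending state."""
    response = ec2.describe_snapshots(
        Filters=[
            {"Name": "volume-id", "Values": [volume_id]},
            {"Name": "status", "Values": ["pending"]},
        ]
    )
    return len(response["Snapshots"])


def snapshot_if_room(ec2, volume_id, limit=PENDING_LIMIT):
    """Request a snapshot only when the volume is below the pending limit.

    Returns the new snapshot ID, or None when the caller should wait for
    a pending snapshot to complete and try again later.
    """
    if pending_snapshot_count(ec2, volume_id) >= limit:
        return None
    return ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
```

A scheduler could call `snapshot_if_room` and simply retry on its next run when it gets `None`, rather than handling the ConcurrentSnapshotLimitExceeded error after the fact.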
Snapshots taken from encrypted volumes are automatically encrypted, and volumes created from encrypted snapshots are also automatically encrypted. The data in your encrypted volumes and any associated snapshots is protected both at rest and in motion. By default, only you can create volumes from snapshots that you own. However, you can share your unencrypted snapshots with specific AWS accounts. For others to use a shared encrypted snapshot, you must also share the customer master key (CMK) that was used to encrypt the volume. Another option for sharing is to copy the contents to an unencrypted volume, take a snapshot of that unencrypted volume and share that.

You can take a snapshot of an attached volume that's in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. This might exclude data that has been cached by any applications or even by the operating system, so the best practice is to pause any file writes to the volume long enough to take the snapshot. If you can't pause all file writes to the volume, you should unmount the volume from within the instance, issue the snapshot command and then remount the volume to ensure a consistent and complete snapshot. You can remount and use your volume while the snapshot status is pending. To create a snapshot of an Amazon EBS volume that serves as a root device, you should stop the instance before taking the snapshot.

AWS stores system images as Amazon Machine Images (AMIs). These images consist of a template for the root volume required to launch an instance. You can use the AWS Management Console or the aws ec2 create-image command to back up the root image as an AMI. When you register an AMI, it is stored in your account using Amazon EBS snapshots. These snapshots reside in Amazon S3 and so are replicated across Availability Zones in the Region, which makes them highly durable.
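As a rough illustration of the image-level backup just described, the following sketch requests an AMI from a running instance. It assumes `ec2` behaves like a boto3 EC2 client and uses its `create_image` call; the function names and the naming scheme are hypothetical choices for this example.

```python
from datetime import datetime, timezone


def ami_backup_name(prefix, now=None):
    """Build a timestamped AMI name such as 'web-backup-2016-01-31-0930'."""
    now = now or datetime.now(timezone.utc)
    return "{}-{}".format(prefix, now.strftime("%Y-%m-%d-%H%M"))


def backup_root_as_ami(ec2, instance_id, prefix, reboot=True):
    """Ask EC2 to create an AMI from an instance and return the image ID.

    With reboot=True (NoReboot=False) EC2 restarts the instance so the
    volumes are captured at rest, which matches the advice to avoid
    snapshotting a root device while it is being written to.
    """
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=ami_backup_name(prefix),
        NoReboot=not reboot,
    )
    return response["ImageId"]
```

Setting `reboot=False` avoids the restart but gives up the consistency guarantee, so it is a trade-off rather than a default to reach for.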
You can then use the AMI to recreate the instance or launch more copies of that instance. You can also copy AMIs from one Region to another for application migration or disaster recovery plans.

When you need to preserve data for compliance or corporate reasons, you generally have to archive it. Unlike backups, which are usually performed to keep a copy of production data for a short duration to recover from data corruption or data loss, archiving maintains all copies of data until the retention policy expires. So a good archive needs to be durable for long-term integrity, it needs to be secure and it needs to be easy to recover from. Low-cost immutable data stores can be another regulatory or compliance requirement.

Amazon Glacier provides archives at low cost with native encryption of data at rest, 11 nines of durability and unlimited capacity. Amazon S3 Standard - Infrequent Access is a good choice for use cases that require quick retrieval of data, and Amazon Glacier is a good choice for use cases where data is infrequently accessed and retrieval times of, say, several hours are acceptable. Objects can be tiered into Amazon Glacier either through lifecycle rules in S3 or through the Amazon Glacier API. The Amazon Glacier Vault Lock feature allows you to easily deploy and enforce compliance controls for individual Amazon Glacier vaults with a Vault Lock policy. You can specify controls such as write once, read many (WORM) in a Vault Lock policy and lock the policy from future edits.

All right, some of the AWS tools that we can use. The first one is AWS Import/Export. AWS Import/Export accelerates moving large amounts of data in and out of AWS by using portable storage devices for transport. It bypasses the internet and transfers your data directly onto and off of storage devices by means of Amazon's high-speed internal network.
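To get a feel for when shipping a device can beat the network, a back-of-the-envelope estimate of the internet transfer time helps. The figures in the usage note (100 TB, a 100 Mbps link, 80% usable bandwidth) are hypothetical values chosen for illustration, not numbers from the course.

```python
def transfer_days(data_terabytes, link_mbps, utilization=0.8):
    """Estimate the days needed to push a data set over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s) and
    assumes only `utilization` of the link is available for the transfer.
    """
    bits = data_terabytes * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * utilization
    return bits / usable_bits_per_second / 86_400  # seconds per day
```

Under those assumptions, `transfer_days(100, 100)` comes out at roughly 116 days, so for a data set of that size a shipped device that turns around in a week or two would be far quicker.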
So for large data sets, AWS Import/Export is often faster than internet transfer and more cost-effective than upgrading your connectivity. You can use AWS Import/Export to migrate data in and out of Amazon S3 buckets and Amazon Glacier vaults, or into Amazon EBS snapshots. So for backup and recovery, it's a perfect way of moving data offsite and back onsite quickly when you need to. And AWS Import/Export Snowball is a fantastic device that literally gets shipped to you: you load your data onto it and then you ship it back.

Another tool is AWS Storage Gateway. AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and highly secure integration between your on-premises IT environment and the storage infrastructure of AWS. AWS Storage Gateway supports three different configurations.

First, Gateway-Cached Volumes, where you store your primary data in Amazon S3 and retain your frequently accessed data locally. Gateway-Cached Volumes provide substantial cost savings on primary storage, they minimize the need to scale your storage on-premises and they retain low-latency access to your frequently accessed data.

The second option is Gateway-Stored Volumes. That's good when you need low-latency access to your entire data set: you configure your gateway to store your primary data locally and asynchronously back up point-in-time snapshots of this data to Amazon S3. So Gateway-Stored Volumes provide durable and inexpensive offsite backups that you can recover locally or from Amazon EC2 if, for example, you need replacement capacity for disaster recovery.

The third option with Storage Gateway is the Virtual Tape Library, or Gateway-VTL. With Gateway-VTL, you can have an almost limitless collection of virtual tapes stored in the virtual tape library, so it looks and feels like a physical tape library to you and your users.
All three of these options can be mounted as iSCSI devices, so it's seamless to the end user, and they can be set up from the AWS Console. With Gateway-VTL, you can also archive virtual tapes to Amazon Glacier. So all three are very effective for backup and recovery and disaster recovery scenarios.

So let's assume that you are managing an environment where you're backing up Amazon EC2 instances, standalone servers, virtual machines and databases. The environment has, say, 800 servers, and you back up the operating system, file data, virtual machine images and databases. Let's say we've got 25 databases, which can be a mixture of MySQL, Microsoft SQL Server and Oracle. Your backup software has agents that back up operating systems, virtual machine images, data volumes, SQL Server databases and Oracle databases (using RMAN). For applications like MySQL that your backup software doesn't have an agent for, you can use the mysqldump client utility to create a database dump file, which the standard backup agents can then protect.

To protect this environment, your third-party backup software will most likely have a global catalog server or master server that controls the backup, archive and restore activities. Many backup vendors support Amazon S3 or Amazon Glacier directly. If yours doesn't, you can use an AWS Storage Gateway virtual appliance to bridge the gap, because it uses generic techniques such as iSCSI-based volumes and virtual tape libraries (VTLs).

AWS Storage Gateway can be deployed on VMware ESXi, Microsoft Hyper-V or an Amazon EC2 instance. AWS Storage Gateway provides an Amazon Machine Image that contains the gateway VM image, but only Gateway-Cached Volumes and Gateway-VTL can be deployed on an Amazon EC2 instance. The gateway VM will need local disks, which you need to allocate for two purposes.
The first is cache storage. The cache storage acts as a durable store for data that is waiting to upload to Amazon S3 from the upload buffer. If your application reads data from a virtual tape, the gateway saves the data to the cache storage. The gateway stores recently accessed data in the cache storage for low-latency access, so if your application requests tape data, the gateway first checks the cache storage for the data before downloading it from AWS.

The other use for the local disks is the upload buffer. The upload buffer provides a staging area for the gateway before it uploads the data to a virtual tape. The upload buffer is also critical for creating recovery points that you can use to recover tapes from unexpected failures. So as your backup application writes data to your gateway, the gateway copies data to both the cache storage and the upload buffer before acknowledging completion of the write operation to your backup application.

Just a couple of things that can catch you out there. You're not going to have to explain how to set up a Storage Gateway, but there are often questions about the differences between Gateway-Cached and Gateway-Stored Volumes.
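The read and write paths described above can be pictured with a small toy model. This is only a sketch of the behaviour as the lecture describes it, not how the gateway is implemented: the `ToyGateway` class, its method names and the dictionary standing in for AWS storage are all invented for illustration.

```python
class ToyGateway:
    """Toy model of a gateway's cache storage and upload buffer."""

    def __init__(self, remote):
        self.remote = remote      # dict standing in for data held in AWS
        self.cache = {}           # recently written or read data
        self.upload_buffer = []   # staged writes waiting to be uploaded

    def write(self, key, data):
        # Copy to both local stores before acknowledging the write.
        self.cache[key] = data
        self.upload_buffer.append((key, data))
        return "ack"

    def upload_pending(self):
        # Drain the staging area to remote storage, oldest write first.
        while self.upload_buffer:
            key, data = self.upload_buffer.pop(0)
            self.remote[key] = data

    def read(self, key):
        # Check the cache first and only fall back to a remote download.
        if key in self.cache:
            return self.cache[key], "cache"
        data = self.remote[key]
        self.cache[key] = data
        return data, "aws"
```

In this model a write is acknowledged before anything reaches remote storage, which is why the local disks behind both stores need to be durable.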

About the Author
Andrew Larkin
Head of Content

Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe.  His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.