“You can’t predict a disaster, but you can be prepared for one!” Disaster recovery is one of the biggest challenges for infrastructure. Amazon Web Services allows us to easily tackle this challenge and ensure business continuity. In this post, we’ll take a look at what disaster recovery means, compare traditional disaster recovery versus that in the cloud, and explore essential AWS services for your disaster recovery plan.
What is Disaster Recovery?
There are several disaster scenarios that can impact your infrastructure. These include natural disasters such as an earthquake or fire, as well as those caused by human error such as unauthorized access to data, or malicious attacks.
“Any event that has a negative impact on a company’s business continuity or finances could be termed a disaster.”
In any case, it is crucial to have a tested disaster recovery plan ready. A disaster recovery plan will ensure that our application stays online no matter the circumstances.Ideally, it ensures that users will experience zero, or at worst, minimal issues while using your application.
If we’re talking about on-premise centers, a disaster recovery plan is expensive to maintain and implement. Often, such plans are insufficiently tested or poorly documented. As such, it’s adequate for protecting resources. More often than not, companies with a good disaster recovery plan aren’t capable of conducting it because it was never tested in a real environment. As a result, users cannot access the application and the company suffers significant losses.
Let’s take a closer look at some of the important terminology associated with disaster recovery:
Business Continuity. All of our applications require Business Continuity. Business Continuity ensures that an organization’s critical business functions continue to operate or recover quickly despite serious incidents.
Disaster Recovery. Disaster Recovery (DR) enables recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.
RPO and RTO. Recover Point Objective (RPO) and Recovery Time Objective (RTO) are the two most important parts of a good DR plan for our workflow. Recover Point Objective (RPO) is the maximum targeted period in which data might be lost from an IT service due to a major incident. Recovery Time Objective (RTO) is a targeted time period after which a business process must be restored after a disaster or disruption to service.
Traditional Disaster Recovery plan (on-premise)
A traditional on-premise Disaster Recovery plan often includes a fully duplicated infrastructure that is physically separate from the infrastructure that contains our production. In this case, an additional financial investment is required to cover expenses related to hardware and for maintenance and testing. When it comes to on-premise data centers, physical access to the infrastructure is often overlooked.
These are the security requirements for an on-premise data center disaster recovery infrastructure:
- Facilities to house the infrastructure, including power and cooling.
- Security to ensure the physical protection of assets.
- Suitable capacity to scale the environment.
- Support for repairing, replacing, and refreshing the infrastructure.
- Contractual agreements with an internet service provider (ISP) to provide internet connectivity that can sustain bandwidth utilization for the environment under a full load.
- Network infrastructure such as firewalls, routers, switches, and load balancers.
- Enough server capacity to run all mission-critical services. This includes storage appliances for the supporting data, and servers to run applications and backend services such as user authentication, Domain Name System (DNS), Dynamic Host Configuration Protocol (DHCP), monitoring, and alerting.
Obviously, this kind of disaster recovery plan requires large investments in building disaster recovery sites or data centers (CAPEX). In addition, storage, backup, archival and retrieval tools, and processes (OPEX) are also expensive. And, all of these processes, especially installing new equipment, take time.
An on-premise disaster recovery plan can be challenging to document, test, and verify, especially if you have multiple clients on a single infrastructure. In this scenario, all clients on this infrastructure will experience problems with performance even if only one client’s data is corrupted.
To understand how cloud storage fits in with DR and the different considerations when preparing to design a solution to backup your on-premise data to AWS, a great course to start with is Using AWS Storage for On-Premise Backup & Disaster Recovery.
Disaster Recovery plan on AWS
There are many advantages to implementing a disaster recovery plan on AWS.
Financially, we will only need to invest a small amount in advance (CAPEX), and we won’t have to worry about the physical expenses for resources (for example, hardware delivery) that we would have in on an “on-premise” data center.
AWS enables high flexibility, as we don’t need to perform a failover of the entire site in case only one part of our application isn’t working properly. Scaling is fast and easy. Most importantly, AWS allows a “pay as you use” (OPEX) model, so we don’t have to spend a lot in advance.
Also, AWS services allow us to fully automate our disaster recovery plan. This results in much easier testing, maintenance, and documentation of the DR plan itself.
This table shows the AWS service equivalents to infrastructure inside an on-premise data center.
|On-premise data center infrastructure||AWS Infrastructure|
|Web/app servers||EC2/Auto Scaling|
|AD/authentication||AD failover nodes|
|Dana centers||Availability Zones|
Essential AWS Services for Disaster Recovery
While planning and preparing a DR plan, we’ll need to think about the AWS services we can use. Also, we need to understand our selected services support data migration and durable storage. These are some of the key features and services that you should consider when creating your Disaster Recovery plan:
AWS Regions and Availability Zones – The AWS Cloud infrastructure is built around Regions and Availability Zones (“AZs”). A Region is a physical location in the world that has multiple Availability Zones. Availability Zones consist of one or more discrete data centers, each with redundant power, networking, and connectivity housed in separate facilities. These AZs allow you to operate production applications and databases that are more highly available, fault-tolerant, and scalable than would be possible from a single data center.
Amazon S3 – Provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities within a region and are designed to provide a durability of 99.999999999% (11 9s).
Amazon Glacier – Provides extremely low-cost storage for data archiving and backup. Objects are optimized for infrequent access, for which retrieval times of several hours are adequate.
Amazon EBS – Provides the ability to create point-in-time snapshots of data volumes. You can use the snapshots as the starting point for new Amazon EBS volumes. And, you can protect your data for long-term durability because snapshots are stored within Amazon S3.
AWS Import/Export – Accelerates moving large amounts of data into and out of AWS by using portable storage devices for transport. The AWS Import/Export service bypasses the internet and transfers your data directly onto and off of storage devices using Amazon’s high-speed internal network.
AWS Storage Gateway is a service that connects an on-premise software appliance with cloud-based storage. This provides seamless, highly secure integration between your on-premise IT environment and the AWS storage infrastructure.
Amazon EC2 – Provides resizable compute capacity in the cloud. In the context of DR, the ability to rapidly create virtual machines that you can control is critical.
Amazon EC2 VM Import Connector enables you to import virtual machine images from your existing environment to Amazon EC2 instances.
Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service.
Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances.
Amazon VPC allows you to provision a private, isolated section of the AWS cloud. Here, you can launch AWS resources in a virtual network that you define.
Amazon Direct Connect makes it easy to set up a dedicated network connection from your premises to AWS.
Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud.
AWS CloudFormation gives developers and systems administrators an easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion. You can create templates for your environments and deploy associated collections of resources (called a stack) as needed.
Disaster Recovery Scenarios with AWS
There are several strategies that we can use for disaster recovery of our on-premise data center using AWS infrastructure:
- Backup and Restore
- Pilot Light
- Warm Standby
Backup and Restore
The Backup and Restore a scenario is an entry-level form of disaster recovery on AWS. This approach is the most suitable one in the event that you don’t have a DR plan.
In on-premise data centers, data backup would be stored on tape. Obviously, it will take time to recover data from tapes in the event of a disaster. For Backup and Restore scenarios using AWS services, we can store our data on Amazon S3 storage, making them immediately available if a disaster occurs. If we have a large amount of data that needs to be stored on Amazon S3, ideally we would use AWS Export/Import or even AWS Snowball to store our data on S3 as soon as possible.
AWS Storage Gateway enables snapshots of your on-premise data volumes to be transparently copied into Amazon S3 for backup. You can subsequently create local volumes or Amazon EBS volumes from these snapshots.
The Backup and Restore plan is suitable for lower level business-critical applications. This is also an extremely cost-effective scenario and one that is most often used when we need backup storage. If we use a compression and de-duplication tool, we can further decrease our expenses here. For this scenario, RTO will be as long as it takes to bring up infrastructure and restore the system from backups. RPO will be the time since the last backup.
The term “Pilot Light” is often used to describe a DR scenario where a minimal version of an environment is always running in the cloud. This scenario is similar to a Backup and Restore scenario. For example, with AWS you can maintain a Pilot Light by configuring and running the most critical core elements of your system in AWS. When the time comes for recovery, you can rapidly provision a full-scale production environment around the critical core.
A Pilot Light scenario is suitable for solutions that require a lower RTO and RPO. This scenario is a mid-range cost DR solution.
A Warm Standby scenario is an expansion of the Pilot Light scenario where some services are always up and running. As we plan a DR plan, we need to identify crucial points of our on-premise infrastructure and then duplicate it inside the AWS. In most cases, we’re talking about web and app servers running on a minimum-sized fleet. Once a disaster occurs, infrastructure located on AWS takes over the traffic and performs its scaling and converting to a fully functional production environment with minimal RPO and RTO.
The Warm Standby scenario is more expensive than Backup and Restore and Pilot Light because in this case, our infrastructure is up and running on AWS. This is a suitable solution for core business-critical functions and in cases where RTO and RPO need to be measured in minutes.
The Multi-Site scenario is a solution for an infrastructure that is up and running completely on AWS as well as on an “on-premise” data center. By using the weighted route policy on Amazon Route 53 DNS, part of the traffic is redirected to the AWS infrastructure, while the other part is redirected to the on-premise infrastructure.
Data is replicated or mirrored to the AWS infrastructure.
In a disaster event, all traffic will be redirected to the AWS infrastructure. This scenario is also the most expensive option, and it presents the last step toward full migration to an AWS infrastructure. Here, RTO and RPO are very low, and this scenario is intended for critical applications that demand minimal or no downtime.
There are many options and scenarios for Disaster Recovery planning on AWS.
The scope of possibilities has been expanded further with AWS’ announcement of its strategic partnership with VMware. Thanks to this partnership, users can expand their on-premise infrastructure (virtualized using VMware tools) to AWS, and create a DR plan via resources provided by AWS using VMware tools that they are already accustomed to using.
Don’t allow any kind of disaster to take you by surprise. Be proactive and create the DR plan that best suits your needs.
New on Cloud Academy: Red Hat, Agile, OWASP Labs, Amazon SageMaker Lab, Linux Command Line Lab, SQL, Git Labs, Scrum Master, Azure Architects Lab, and Much More
Happy New Year! We hope you're ready to kick your training in overdrive in 2020 because we have a ton of new content for you. Not only do we have a bunch of new courses, hands-on labs, and lab challenges on AWS, Azure, and Google Cloud, but we also have three new courses on Red Hat, th...
Cloud Academy’s Blog Digest: Azure Best Practices, 6 Reasons You Should Get AWS Certified, Google Cloud Certification Prep, and more
Happy Holidays from Cloud Academy We hope you have a wonderful holiday season filled with family, friends, and plenty of food. Here at Cloud Academy, we are thankful for our amazing customer like you. Since this time of year can be stressful, we’re sharing a few of our latest article...
Google Cloud Platform Certification: Preparation and Prerequisites
Google Cloud Platform (GCP) has evolved from being a niche player to a serious competitor to Amazon Web Services and Microsoft Azure. In 2019, research firm Gartner placed Google in the Leaders quadrant in its Magic Quadrant for Cloud Infrastructure as a Service for the second consecuti...
New Lab Challenges: Push Your Skills to the Next Level
Build hands-on experience using real accounts on AWS, Azure, Google Cloud Platform, and more Meaningful cloud skills require more than book knowledge. Hands-on experience is required to translate knowledge into real-world results. We see this time and time again in studies about how pe...
New on Cloud Academy: AWS Solution Architect Lab Challenge, Azure Hands-on Labs, Foundation Certificate in Cyber Security, and Much More
Now that Thanksgiving is over and the craziness of Black Friday has died down, it's now time for the busiest season of the year. Whether you're a last-minute shopper or you already have your shopping done, the holidays bring so much more excitement than any other time of year. Since our...
Understanding Enterprise Cloud Migration
What is enterprise cloud migration? Cloud migration is about moving your data, applications, and even infrastructure from your on-premises computers or infrastructure to a virtual pool of on-demand, shared resources that offer compute, storage, and network services at scale. Why d...
6 Reasons Why You Should Get an AWS Certification This Year
In the past decade, the rise of cloud computing has been undeniable. Businesses of all sizes are moving their infrastructure and applications to the cloud. This is partly because the cloud allows businesses and their employees to access important information from just about anywhere. ...
AWS Regions and Availability Zones: The Simplest Explanation You Will Ever Find Around
The basics of AWS Regions and Availability Zones We’re going to treat this article as a sort of AWS 101 — it’ll be a quick primer on AWS Regions and Availability Zones that will be useful for understanding the basics of how AWS infrastructure is organized. We’ll define each section,...
Application Load Balancer vs. Classic Load Balancer
What is an Elastic Load Balancer? This post covers basics of what an Elastic Load Balancer is, and two of its examples: Application Load Balancers and Classic Load Balancers. For additional information — including a comparison that explains Network Load Balancers — check out our post o...
Advantages and Disadvantages of Microservices Architecture
What are microservices? Let's start our discussion by setting a foundation of what microservices are. Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs). ...
Kubernetes Services: AWS vs. Azure vs. Google Cloud
Kubernetes is a popular open-source container orchestration platform that allows us to deploy and manage multi-container applications at scale. Businesses are rapidly adopting this revolutionary technology to modernize their applications. Cloud service providers — such as Amazon Web Ser...
AWS Internet of Things (IoT): The 3 Services You Need to Know
The Internet of Things (IoT) embeds technology into any physical thing to enable never-before-seen levels of connectivity. IoT is revolutionizing industries and creating many new market opportunities. Cloud services play an important role in enabling deployment of IoT solutions that min...