Designing for high availability, fault tolerance and cost efficiency
High Availability in RDS
High Availability in Amazon Aurora
High Availability in DynamoDB
SAA-C02- Exam Prep
This section of the Solutions Architect Associate learning path introduces you to the High Availability concepts and services relevant to the SAA-C02 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet specific availability scenarios relevant to the Solutions Architect Associate exam.
- Learn the fundamentals of high availability, fault tolerance, and backup and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for backup purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
- [Andy] Okay, let's review the key points on high availability to remember for the exam. Remember the different options available for getting your data in and out of AWS? We've got AWS Direct Connect, VPN, internet connection, AWS Snowball, AWS Snowmobile, and the AWS Storage Gateway. How much data you actually need to import into or export out of AWS affects your chosen solution. And as a general rule, if your data will take longer than a week to transport, you should consider using AWS Snowball.
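The one-week rule of thumb above can be checked with some quick arithmetic. The following is a minimal sketch, not an official AWS calculator: the function names, the use of decimal terabytes, and the 80% link utilization figure are all assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(data_terabytes: float, link_mbps: float,
                  utilization: float = 0.8) -> float:
    """Estimate how many days it takes to move data over a network link.

    utilization is an assumed fraction of the link you can actually use;
    0.8 is an illustrative default, not a figure from the lecture.
    """
    if data_terabytes < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("sizes and speeds must be positive, utilization in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilization   # usable bits per second
    return total_bits / effective_bps / SECONDS_PER_DAY


def suggest_transfer_method(data_terabytes: float, link_mbps: float,
                            utilization: float = 0.8) -> str:
    """Apply the rule of thumb: longer than a week over the wire, consider Snowball."""
    days = transfer_days(data_terabytes, link_mbps, utilization)
    return "snowball" if days > 7 else "network"


if __name__ == "__main__":
    for tb, mbps in [(10, 100), (1, 1000)]:
        print(tb, "TB over", mbps, "Mbps:",
              round(transfer_days(tb, mbps), 2), "days ->",
              suggest_transfer_method(tb, mbps))
```

Under these assumptions, 10 TB over a 100 Mbps link takes a little over eleven days, which is past the one-week mark, while 1 TB over a 1 Gbps link finishes in a few hours.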
Now if you need to shift more than 50 terabytes of data, using a Snowball device is usually the best option unless the scenario describes very fast networking. Recovery time objective is how quickly you need your system back up and running. The lower the recovery time objective target, the more failover you need to factor in. So for example, anything under four hours as a recovery time objective would require a warm standby or pilot light design, and a major factor in recovery time objective is restoring the data. For example, backup and restore as a strategy from a tape library or storage gateway is unlikely to be able to recover large volumes of data in under eight hours, so keep that in mind. The recovery point objective is the point in time you need to go back to in your data. Now if this is another low value, then you probably need to consider Multi-AZ databases with automatic failover as a solution.
Amazon S3 is an ideal backup solution for on-premise corporate data centers, and there are three different classes of Amazon S3: Standard, Infrequent Access, and Amazon Glacier. Standard has 11 nines of durability and it maintains data across multiple devices and multiple Availability Zones within a single region. It has four nines of availability and security features, including encryption, access controls, and data management functionality, such as lifecycle policies. Infrequent Access class is very similar, however there are two main differences. Firstly, it offers only three nines of availability as opposed to the four offered by Standard, and secondly, the cost. Infrequent Access is cheaper than Standard, making this a very good choice for backup data.
Amazon Glacier is the cheapest option of the three classes and it's used as cold storage for data archiving. It uses different security mechanisms, such as vault policies, and it can be used with S3 lifecycle rules, or with the SDKs, to move data from one of the other classes to Amazon Glacier. Glacier does not offer immediate access to data, and the speed of data retrieval will depend on which method you choose, those being expedited, standard, or bulk. So remember those three different ways you can get data back from Glacier. Generally, if you need something back in under an hour, then Glacier is not going to cut it.
Cross-region replication is used to help with disaster recovery, to reduce the latency of data retrievals, and to comply with governance and compliance controls. Multipart uploads improve performance by uploading object parts in parallel for multithreaded performance. So, if you're confronted with a scenario where you need to upload large files over a stable network, use multipart upload to maximize the use of your available bandwidth. The benefits of multipart uploads are speed and throughput, interruption recovery, and management of data.
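To make the multipart idea concrete, here is a minimal sketch of how a large object might be split into byte ranges that could then be uploaded in parallel. It does not call S3 at all. The function name `plan_parts` and the 8 MiB default part size are assumptions for illustration; the 5 MiB minimum part size and the 10,000 part limit are the published S3 multipart limits.

```python
import math
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum size for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts in one upload


def plan_parts(object_size: int,
               part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Return (part_number, first_byte, last_byte) tuples covering the object.

    part_size defaults to 8 MiB purely as an illustrative assumption.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")

    # Grow the part size when the object is so large that the requested
    # size would need more than MAX_PARTS parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))

    parts = []
    start, number = 0, 1
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # inclusive byte offset
        parts.append((number, start, end))
        start = end + 1
        number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(20 * 1024 * 1024):
        print(part)
```

Each tuple is an independent unit of work, which is where the interruption recovery benefit comes from: if one part fails, only that byte range needs to be sent again.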
The security options we have with storage are IAM policies, bucket policies, access control lists, lifecycle policies, multifactor authentication delete (MFA Delete), and versioning. Remember, AWS Snowball is a service used to securely transfer large amounts of data in and out of AWS. You can do that from your data center to Amazon S3, or from S3 back to your data center, using this physical appliance known as a Snowball. The appliance comes in either a 50 terabyte or 80 terabyte size, and the Snowball appliance is built for high speed, using RJ45, SFP+ Copper, and SFP+ Optical network interfaces. All data copied to a Snowball device is encrypted by default via KMS keys, and AWS Snowball is HIPAA compliant. Now the Storage Gateway allows you to provide a gateway between your own data center storage systems and Amazon S3 and Glacier.
The storage gateway itself is a software appliance that can be installed within your own data center. The appliance can be downloaded as a virtual machine and it runs on one of your own local hosts. The different configuration options available for storage gateway are file gateways, volume gateways, and tape gateways.
Looking at file gateways, these allow you to securely store your files as objects within Amazon S3, which is presented as an NFS share that clients can mount or map a drive to. So NFS, great for file gateway. All data is sent over HTTPS connections and all objects are automatically encrypted using SSE-S3. A local cache is provisioned in the creation of a file gateway, which uses on-premise storage to hold the most recently accessed files to optimize latency. So we've got the file gateway which we've just talked about, and we also have the volume gateway, and there are two types of volume gateway, and then there's also a tape gateway, but let's talk about the volume gateway now.
Volume gateway has stored volume gateways and cached volume gateways. Stored volume gateways are often used as a way to back up your local storage volumes to Amazon S3. Your entire data library is also kept on-premise for minimal latency, and during its creation, volumes are created and backed up by Amazon S3 and are mapped directly to on-premise storage. Stored volumes are presented as iSCSI devices, allowing communication from your application servers. And as data is written to these volumes, it's first stored using the on-premise mapped storage before the storage gateway then copies the same data asynchronously to S3. So if you need an asynchronous solution, think stored volume gateways. Snapshots of the volume can be taken, which are then stored as EBS snapshots on Amazon S3. Volume sizes can be between one gigabyte and 16 terabytes, and a gateway can hold up to 32 volumes, giving a total storage of 512 terabytes.
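The capacity totals quoted for volume gateways are simply volume size multiplied by volume count. As a small sketch of that arithmetic, the following encodes the stored volume figures above alongside the cached volume figures given later in this talk (32 terabytes per volume, 32 volumes). The class and function names are hypothetical, and the numbers are the ones stated in the lecture rather than values read from any AWS API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VolumeGatewayLimits:
    """Per-gateway limits as quoted in the lecture (hypothetical helper type)."""
    mode: str
    max_volume_tb: int
    max_volumes: int

    @property
    def total_tb(self) -> int:
        # Total capacity is the largest volume size times the volume count.
        return self.max_volume_tb * self.max_volumes


STORED = VolumeGatewayLimits("stored", max_volume_tb=16, max_volumes=32)
CACHED = VolumeGatewayLimits("cached", max_volume_tb=32, max_volumes=32)


def fits(limits: VolumeGatewayLimits, volume_sizes_tb: Sequence[float]) -> bool:
    """Check a proposed set of volume sizes (in TB) against one gateway's limits."""
    return (len(volume_sizes_tb) <= limits.max_volumes
            and all(0 < size <= limits.max_volume_tb for size in volume_sizes_tb))


if __name__ == "__main__":
    print(STORED.mode, STORED.total_tb, "TB")   # 16 TB x 32 volumes
    print(CACHED.mode, CACHED.total_tb, "TB")   # 32 TB x 32 volumes
    print(fits(STORED, [20]), fits(CACHED, [20]))
```

A single 20 terabyte volume, for example, is too large for a stored volume gateway but fits within the cached volume limits.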
Data is stored in a buffer using the on-premise storage before being written to S3 using an SSL connection.
And in a disaster, the EBS snapshots could be used to create new EBS volumes, which can then be attached to EC2 instances. Now, with the cached volume gateway, the primary data storage is actually Amazon S3 rather than your own on-premise storage solution, as is the case with stored volume gateways. So a cache is held locally, using on-premise storage for buffering and to serve recently accessed data. That minimizes latency. Volumes are presented as iSCSI devices, allowing connectivity from your application servers, and all data sent to S3 uses an SSL connection and is encrypted using SSE-S3. Volumes can be 32 terabytes in size, with a total of 32 volumes giving a total storage of 1024 terabytes. Again, snapshots of these volumes can also be taken, which are stored on S3 as EBS snapshots. And again, in a disaster, the EBS snapshots can be used to create new EBS volumes which can be attached to EC2 instances.
Tape gateways are known as virtual tape libraries, and they allow you to back up data to S3 from your own corporate data center. But they also leverage Amazon Glacier for data archiving. So virtual tape libraries are essentially a cloud-based tape backup solution. Applications and backup software can mount the tape drives along with a media changer as iSCSI devices to make the connection. When virtual tapes are archived, the data is simply moved from S3 to Glacier.
Okay, so let's look at some exam questions to put our knowledge into practice. First question: you have deployed resources for a web application in separate regions as part of a disaster recovery plan. You want resources in the primary region to be available the majority of the time and resources in the secondary region to be on standby. As long as a single resource is healthy to receive traffic in the primary region, you do not want to fail over to the secondary region. Which Route 53 failover type should you configure?
Okay, so in Amazon Route 53 you can use the active-passive failover configuration when you want a primary group of resources to be available the majority of the time, and you want a secondary group of resources to be on standby in case all of the primary resources become unavailable. When responding to queries, Amazon Route 53 includes only the healthy primary resources. If all of the primary resources are unhealthy, Amazon Route 53 begins to include only the healthy secondary resources in responses to DNS queries. So our best option here is option D, active-passive failover with multiple primary and secondary resources.
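The selection rule just described can be modeled in a few lines. This is only a sketch of the decision logic as stated in the answer, not the Route 53 API: the function name and the resource names are hypothetical, and health is represented as a plain boolean rather than a real health check.

```python
from typing import Dict, List


def resources_in_response(primary: Dict[str, bool],
                          secondary: Dict[str, bool]) -> List[str]:
    """Pick which resources would be returned for a DNS query.

    Each dict maps a hypothetical resource name to True (healthy) or
    False (unhealthy). Healthy primaries always win; the secondaries are
    used only when no primary resource is healthy.
    """
    healthy_primary = [name for name, healthy in primary.items() if healthy]
    if healthy_primary:
        return healthy_primary
    # No healthy primary left: fail over to the healthy secondaries.
    # The case where nothing at all is healthy is not modeled here and
    # simply yields an empty list.
    return [name for name, healthy in secondary.items() if healthy]


if __name__ == "__main__":
    secondary = {"dr-a": True, "dr-b": False}
    print(resources_in_response({"web-a": False, "web-b": True}, secondary))
    print(resources_in_response({"web-a": False, "web-b": False}, secondary))
```

With one primary resource still healthy, the first call stays on the primary group; only when both primaries are down does the second call return the healthy standby, which is the behavior the question asks for.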
Active-active is a very expensive solution to maintain. It's more suited to maintaining very high availability than to a disaster recovery design, and we mentioned disaster recovery in this question. We can discount active-passive failover with weighted records, as weighting is not going to help us determine the best failover target should there be an incident. Active-passive failover with one primary and one secondary resource is good, but it doesn't give us that ability to only select resources that are available at any particular time. With active-passive failover with multiple primary and secondary resources, you can also associate multiple resources with the primary record, the secondary record, or both. And in this configuration, Route 53 considers the primary failover record to be healthy as long as at least one of the associated resources is healthy. So the best option is option D, active-passive failover with multiple primary and secondary resources.
Okay, next question. Designing your highly available web application in AWS, you have a VPC that spans four availability zones and two Elastic Load Balancers in different availability zones, along with two EC2 instances in each availability zone. You want each ELB to be able to direct traffic to any EC2 instance in any of the four availability zones. Which ELB setting is critical to this being possible? So while each of these four settings deserves close consideration, the one setting that determines whether your ELB can communicate with EC2 instances within a different availability zone is the cross-zone load balancing setting. A load balancer with cross-zone load balancing enabled can communicate with any registered instance in any availability zone. This is especially useful when high availability is crucial, as it allows ELBs to monitor health and direct traffic evenly across multiple zones. Without that enabled, each ELB will only be able to direct traffic to, and check the status of, the instances in its own zone. So with the app design in question, this would leave two sets of instances without the benefits of load balancing and make your application more susceptible to service issues due to instance failures. So the first option, the cross-zone load balancing setting, is the most important one for us to choose.
Okay, so that's a good start into high availability, we're getting well-prepared for this exam. Don't be afraid to get into the library and try the lab challenges. These are the best way to prepare yourself for the certification. They provide you with scenarios that are very much in line with the exam questions and they're all practical. So give that a go, it's a great way to prepare. See you in the next one.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.