Designing for Failure
Designing for high availability, fault tolerance and cost efficiency
High Availability in RDS
High Availability in Amazon Aurora
High Availability in DynamoDB
This section of the Solution Architect Associate learning path introduces you to the High Availability concepts and services relevant to the SAA-C03 exam. By the end of this section, you will be familiar with the design options available and know how to select and apply AWS services to meet specific availability scenarios relevant to the Solution Architect Associate exam.
- Learn the fundamentals of high availability, fault tolerance, and backup and disaster recovery
- Understand how a variety of Amazon services such as S3, Snowball, and Storage Gateway can be used for backup purposes
- Learn how to implement high availability practices in Amazon RDS, Amazon Aurora, and DynamoDB
Okay, let's review the key points on high availability to remember for the exam. Let's talk design patterns. First, migration. If you're migrating large numbers of objects, say to S3, EFS, or Amazon FSx for Windows File Server, then consider AWS DataSync. It can be a way to reduce operational costs, and it's great for moving cold data into Glacier or the S3 storage classes. Think of use cases like machine learning and the life sciences, where you have a large volume of data, good connectivity, and you just need a really simple way of moving it into AWS. AWS DataSync has encryption built in, and it also gives you IAM support. You'd need really good network performance to use DataSync.
So you'd probably need to consider using it with AWS Direct Connect so you've got that throughput. And if you don't have the throughput and it's a really large volume, say over 50 terabytes, then you're better off using an AWS Snowball device. Now, if you're migrating data from an Oracle or a Microsoft SQL Server database, then the AWS Database Migration Service is a good option to consider. Frankly, it doesn't tend to crop up much as an exam scenario, but just remember that the Database Migration Service helps you migrate data from any of the common database platforms, e.g. Oracle, Microsoft SQL Server, or MySQL, to Amazon RDS.
So it's a really simple way to transfer schemas and to migrate the data itself; for example, it can migrate an Oracle database to Oracle on Amazon RDS in a Multi-AZ deployment. Also, always remember your storage options. Often the design requirements specify the need for maximum I/O performance. If a solution needs storage, I always say interpret that to mean persistent storage unless it specifically says the storage does not need to be kept, all right? If it doesn't need to be kept, then you can consider ephemeral storage, the instance store that is part of the EC2 instance, as that gives you a very high I/O option. Otherwise, think EBS on an instance with enhanced networking, or a compute-optimized or memory-optimized instance; that's going to be your best bet for maximum performance. And for object storage, think Amazon S3 as the best durable data store, with Glacier for your archives.
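The storage-selection heuristic above can be sketched as a small decision function. This is purely illustrative; the function name and category labels are mine, not AWS terminology.

```python
# Illustrative sketch of the exam heuristic: ephemeral instance store for
# non-persistent high-I/O needs, EBS for persistent block storage, S3 for
# objects, and Glacier for archives. Not an AWS API.

def choose_storage(persistent: bool, object_store: bool, archive: bool) -> str:
    if not persistent:
        return "EC2 instance store (ephemeral) for maximum I/O"
    if object_store:
        return "Amazon S3 Glacier" if archive else "Amazon S3"
    return "Amazon EBS on an enhanced-networking instance"

print(choose_storage(persistent=False, object_store=False, archive=False))
print(choose_storage(persistent=True, object_store=True, archive=True))
```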
Remember that Amazon FSx and Amazon EFS, the Elastic File System, provide fully managed network file systems that are well suited to providing shared storage for 99% of application designs. But there are a few use cases and exceptions to keep in mind. For example, migrating highly available database clusters sometimes requires an architectural pattern that allows you to access storage from multiple hosts simultaneously. If you get stuck having to lift and shift a cluster to AWS and you don't want to refactor a cluster-aware file system, like the Oracle Cluster File System or the Red Hat Global File System, then you can consider using the EBS Multi-Attach feature to coordinate storage access between instances and prevent data inconsistencies.
Multi-Attach can support up to 16 Linux instances that are built on the Nitro System and that are in the same Availability Zone. So if you need to migrate a persistent database from on-premises to AWS, where that database needs very high IOPS, say 60,000-plus, and it also needs to run on a single EBS volume, then you could consider a Nitro-based Amazon EC2 instance with an Amazon EBS Provisioned IOPS SSD (io1 or io2) volume provisioned to the required IOPS. The io1 and io2 volumes on Nitro instances support the EBS Multi-Attach feature. This design pattern can help you if you need to run a database cluster on one EBS volume and have more than one EC2 instance access it without there being data inconsistencies.
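A quick way to internalize those numbers is a sanity-check function. This is a hypothetical helper, not an AWS API; the 64,000 IOPS ceiling for io1/io2 on Nitro instances and the 16-instance Multi-Attach limit are the figures I'm assuming here.

```python
# Hypothetical helper (not an AWS API) to check whether a single
# Provisioned IOPS SSD volume with Multi-Attach can meet a workload.
# Assumed limits: io1/io2 up to 64,000 IOPS on Nitro instances, and
# Multi-Attach up to 16 Nitro instances in the same Availability Zone.

PIOPS_MAX_IOPS = 64_000
MULTI_ATTACH_MAX_INSTANCES = 16

def single_volume_fits(required_iops: int, instance_count: int) -> bool:
    """True if one io1/io2 Multi-Attach volume can serve this workload."""
    return (required_iops <= PIOPS_MAX_IOPS
            and instance_count <= MULTI_ATTACH_MAX_INSTANCES)

print(single_volume_fits(60_000, 4))   # the 60,000-IOPS scenario above -> True
print(single_volume_fits(60_000, 20))  # too many attached instances -> False
```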
Remember the different options available for getting your data in and out of AWS: AWS Direct Connect, VPN, an internet connection, AWS Snowball, AWS Snowmobile, and the AWS Storage Gateway. How much data you actually need to import or export affects your chosen solution. As a general rule, if your data will take longer than a week to transfer, you should consider using AWS Snowball. And if you need to shift more than 50 terabytes of data, a Snowball device is usually the best option unless very fast networking is described.
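The "longer than a week, or more than 50 terabytes" rule of thumb can be sketched as a transfer-time estimate. The utilization factor is my assumption (real links are rarely saturated), and the recommendation strings are illustrative.

```python
# Rough sketch of the Snowball-vs-online rule of thumb. The 80%
# utilization factor is an assumption, not an AWS figure.

def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days to move data_tb terabytes over a link_mbps connection."""
    bits = data_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400

def recommend(data_tb: float, link_mbps: float) -> str:
    if data_tb > 50 or transfer_days(data_tb, link_mbps) > 7:
        return "AWS Snowball"
    return "online transfer (e.g. AWS DataSync)"

print(recommend(60, 1000))  # over 50 TB -> AWS Snowball
print(recommend(5, 1000))   # ~0.6 days on a 1 Gbps link -> online transfer
```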
The Recovery Time Objective is the speed with which you need your system back up and running. The lower the Recovery Time Objective target, the more failover you need to factor in. So, for example, anything under four hours as a Recovery Time Objective would require warm standby or pilot light designs, and a major factor in the Recovery Time Objective is restoring the data. For example, backup and restore as a strategy from a tape library or Storage Gateway is unlikely to be able to recover large volumes of data in under eight hours, so keep that in mind. The Recovery Point Objective is the point in time you need to go back to in your data.
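The RTO-to-strategy mapping above can be sketched as a lookup. The thresholds here are illustrative cutoffs drawn from the guidance in this lecture, not official AWS numbers.

```python
# Illustrative mapping from an RTO target (hours) to a DR pattern.
# Thresholds are exam heuristics from this lecture, not AWS limits.

def dr_strategy(rto_hours: float) -> str:
    if rto_hours < 1:
        return "multi-site active/active"
    if rto_hours < 4:
        return "warm standby or pilot light"
    if rto_hours < 24:
        return "pilot light or backup and restore"
    return "backup and restore"

print(dr_strategy(2))   # under four hours -> warm standby or pilot light
print(dr_strategy(48))  # generous RTO -> backup and restore
```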
Now, if this is also a low value, then you probably need to consider Multi-AZ databases with automatic failover as a solution. Amazon S3 is an ideal backup solution for on-premises corporate data centers, and there are three classes of Amazon S3 to remember here: Standard, Infrequent Access, and Amazon Glacier. Standard has eleven nines of durability, and it maintains data across multiple devices and multiple Availability Zones within a single region. It has four nines of availability and security features including encryption and access controls, plus data management functionality such as lifecycle policies. The Infrequent Access class is very similar; however, there are two main differences. First, it offers only three nines of availability, as opposed to the four offered by Standard. Second, the cost.
Infrequent Access is cheaper than Standard, making it a very good choice for backup data. Amazon Glacier is the cheapest of the three classes, and it's used as cold storage for data archiving. It uses different security mechanisms, such as vault policies, and it can be used with S3 lifecycle rules or with the SDKs to move data from one of the other classes into Amazon Glacier. Glacier does not offer immediate access to data, and the speed of data retrieval depends on which method you choose: expedited, standard, or bulk. So remember those three different ways you can get data back from Glacier. Generally, if you need something back in under an hour, Glacier is not going to cut it.
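One way to remember the three retrieval options is to pick the cheapest tier that still meets your deadline. The retrieval windows assumed below (expedited roughly 1-5 minutes, standard 3-5 hours, bulk 5-12 hours) are typical figures for Glacier; the function itself is just a study aid, not an AWS API.

```python
# Sketch of picking the cheapest Glacier retrieval tier that meets a
# deadline. Assumed worst-case retrieval times: expedited ~5 minutes,
# standard ~5 hours, bulk ~12 hours. Tiers listed cheapest first.

RETRIEVAL_TIERS = [
    ("bulk", 12.0),
    ("standard", 5.0),
    ("expedited", 5 / 60),
]

def pick_tier(deadline_hours: float) -> str:
    """Return the cheapest Glacier retrieval tier that fits the deadline."""
    for name, worst_case_hours in RETRIEVAL_TIERS:
        if worst_case_hours <= deadline_hours:
            return name
    return "Glacier cannot meet this deadline"

print(pick_tier(24))   # bulk is cheapest and fits
print(pick_tier(0.5))  # under an hour -> only expedited fits
```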
Cross-region replication helps with disaster recovery, reduces the latency of data retrievals, and helps you comply with governance and compliance controls. Multipart uploads improve performance by uploading object parts in parallel for multi-threaded performance. So if you're confronted with a scenario where you need to upload large files over a stable network, use multipart upload to maximize the use of your available bandwidth. The benefits of multipart uploads are speed and throughput, interruption recovery, and easier management of data. The security options we have with storage are IAM policies, bucket policies, access control lists, lifecycle policies, multi-factor authentication delete (MFA Delete), and versioning.
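To see why part sizing matters for multipart uploads, here's a small arithmetic sketch. It assumes the real S3 limits of a 5 MiB minimum part size (except the last part) and at most 10,000 parts per upload; the helper function itself is mine, not part of any SDK.

```python
# Sketch of multipart upload part sizing, assuming S3's limits of a
# 5 MiB minimum part size and 10,000 parts maximum per upload.
import math

MIN_PART = 5 * 1024**2   # 5 MiB minimum part size
MAX_PARTS = 10_000       # maximum parts per multipart upload

def part_size(object_bytes: int) -> int:
    """Smallest valid part size (bytes) keeping the upload within 10,000 parts."""
    return max(MIN_PART, math.ceil(object_bytes / MAX_PARTS))

size = 100 * 1024**3                 # a 100 GiB object
print(part_size(size) // 1024**2)    # -> 10 (MiB); keeps us under 10,000 parts
```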
Remember that AWS Snowball is a service used to securely transfer large amounts of data in and out of AWS, from your data center to Amazon S3 or from S3 back to your data center, using a physical appliance known as a Snowball. The appliance comes in either a 50-terabyte or 80-terabyte size, and it is built for high-speed transfer using RJ45, SFP+ Copper, and SFP+ Optical network interfaces. All data copied to a Snowball device is encrypted by default via KMS keys, and AWS Snowball is HIPAA compliant. Now, the Storage Gateway allows you to provide a gateway between your own data center's storage systems and Amazon S3 and Glacier.
The Storage Gateway itself is a software appliance that can be installed within your own data center. The appliance can be downloaded as a virtual machine and run on one of your own local hosts. The different configuration options available for Storage Gateway are File Gateways, Volume Gateways, and Tape Gateways. Looking at File Gateways: these allow you to securely store your files as objects within Amazon S3, presented as an NFS share that clients can mount or map a drive to. So NFS: great for File Gateway. Any data is sent over HTTPS connections, and all objects are automatically encrypted using SSE-S3. A local cache is provisioned during the creation of a File Gateway, which uses on-premises storage to hold the most recently accessed files and optimize latency. So we've got the File Gateway, which we've just talked about, and we also have the Volume Gateway, of which there are two types, and there's also a Tape Gateway. But let's talk about the Volume Gateway now.
Volume Gateway comes as Stored Volume Gateways and Cached Volume Gateways. Stored Volume Gateways are often used as a way to back up your local storage volumes to Amazon S3, and your entire data library is also kept on premises for minimal latency. During creation, volumes are created, backed up by Amazon S3, and mapped directly to on-premises storage. Stored volumes are presented as iSCSI devices, allowing communication from your application servers. And as data is written to these volumes, it's first stored on the on-premises storage before Storage Gateway then copies the same data asynchronously to S3. So if you need an asynchronous solution, think Stored Volume Gateways.
Snapshots of the volume can be taken, which are then stored as EBS snapshots on Amazon S3. Volume sizes can be between one gigabyte and 16 terabytes, and a gateway can hold up to 32 volumes, giving a total storage of 512 terabytes. Data is stored in a buffer using the on-premises storage before being written to S3 using an SSL connection, and in a disaster the EBS snapshots can be used to create new EBS volumes, which can then be attached to EC2 instances. Now, with the Cached Volume Gateway, the primary data storage is actually Amazon S3, rather than your own on-premises storage solution as is the case with Stored Volume Gateways.
So a cache is held locally using on-premises storage, for buffering and for serving recently accessed data, which minimizes latency. Volumes are presented as iSCSI devices, allowing connectivity from your application servers, and all data sent to S3 uses an SSL connection and is encrypted using SSE-S3. Volumes can be up to 32 terabytes in size, with a total of 32 volumes giving a total storage of 1,024 terabytes. Again, snapshots of these volumes can be taken, which are stored on S3 as EBS snapshots, and again, in a disaster, the EBS snapshots can be used to create new EBS volumes, which can be attached to EC2 instances.
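The stored-versus-cached capacity figures quoted above are easy to mix up in the exam, so here they are as a small lookup. The numbers come straight from this lecture; the dictionary and function are just a study aid.

```python
# Comparison of Volume Gateway modes using the limits quoted above:
# stored = 16 TB/volume, cached = 32 TB/volume, 32 volumes either way.

GATEWAY_LIMITS = {
    "stored": {"max_volume_tb": 16, "max_volumes": 32},  # primary data on premises
    "cached": {"max_volume_tb": 32, "max_volumes": 32},  # primary data in S3
}

def max_capacity_tb(mode: str) -> int:
    """Total addressable capacity for a Volume Gateway mode, in terabytes."""
    limits = GATEWAY_LIMITS[mode]
    return limits["max_volume_tb"] * limits["max_volumes"]

print(max_capacity_tb("stored"))  # -> 512
print(max_capacity_tb("cached"))  # -> 1024
```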
Tape Gateways are known as virtual tape libraries. They allow you to back up data to S3 from your own corporate data center, and they also leverage Amazon Glacier for data archiving. Virtual tape libraries are essentially a cloud-based tape backup solution. Applications and backup software can mount the tape drives, along with a media changer, as iSCSI devices to make the connection. When virtual tapes are archived, the data is simply moved from S3 to Glacier.
Okay, so that's a good start on high availability. We're getting well prepared for this exam. Don't be afraid to get into the library and try the Lab Challenges; they are the best way to prepare yourself for the certification. They provide you with scenarios that are very much in line with the exam questions, and they're all practical. So give that a go. It's a great way to prepare. See you in the next one.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.