This course covers the core learning objective to meet the requirements of the 'Designing for disaster recovery & high availability in AWS - Level 2' skill.
Learning Objectives:
- Analyze the amount of resources required to implement a fault-tolerant architecture across multiple AWS Availability Zones
- Evaluate an effective AWS disaster recovery strategy to meet specific business requirements
- Understand the SLAs for AWS services to ensure the high availability of a given AWS solution
- Analyze which AWS services can be leveraged to implement a decoupled solution
The idea of decoupling is usually introduced in the context of application development and messaging services, so let's get started with that. What does decoupling an application mean?
A decoupled application allows each component to perform its tasks independently. Components remain autonomous and unaware of each other, so a change in one component shouldn't require a change anywhere else. More importantly, a failure in one layer of the application should not propagate to other layers but remain isolated where the component failure took place. Consider a tightly coupled application, kept simple for the sake of illustration. We can define a receive layer that invokes a transcode layer, which then invokes the publish and notify layer. This gives us a simple three-layer image processing application. In this type of implementation, a failure in one of the layers may negatively affect the following layer and impair the functioning of the entire application.
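To make the failure propagation concrete, here is a minimal Python sketch of the tightly coupled version. The function names (receive, transcode, publish_and_notify, handle_request) and the transcode_healthy flag are hypothetical stand-ins for the three layers; nothing here is an AWS API.

```python
def receive(image_name):
    """Receive layer: accept an upload and describe the job."""
    return {"image": image_name}


def transcode(job, healthy=True):
    """Transcode layer: fails outright when the layer is unhealthy."""
    if not healthy:
        raise RuntimeError("transcode layer unavailable")
    return {**job, "format": "webp"}


def publish_and_notify(job):
    """Publish and notify layer: report the finished job."""
    return f"published {job['image']} as {job['format']}"


def handle_request(image_name, transcode_healthy=True):
    # Each layer invokes the next one directly, so an exception raised in
    # any layer travels straight back to the caller of the first layer.
    job = receive(image_name)
    job = transcode(job, healthy=transcode_healthy)
    return publish_and_notify(job)
```

With a healthy transcode layer the request completes, but calling handle_request("cat.png", transcode_healthy=False) raises the RuntimeError all the way up, and the upload itself is lost because nothing retained it.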
Decoupling becomes possible once we introduce a messaging mechanism through which the different layers send and receive messages. Ideally, the messaging model is one-to-one: a message is generated by one layer, put into a queue, and then picked up by the next processing layer. The Amazon Simple Queue Service, or SQS, lends itself to this type of implementation. It behaves like an email system for the different application layers, and it is able to maintain a copy of each message received even if there are no consumers listening to pick up the request for processing. We can then represent the application using a diagram as shown. It looks a little different, because in this case we are placing a queue between each of the layers. In this particular implementation, we are going to use Amazon SQS, and we will implement each of the application layers as a fleet of EC2 instances inside Auto Scaling groups.
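The queue-between-layers idea can be sketched in plain Python. This is an in-memory illustration only: the deque objects stand in for SQS queues, and the layer function names are hypothetical.

```python
from collections import deque

# One queue sits between each pair of layers (stand-ins for SQS queues).
receive_to_transcode = deque()
transcode_to_publish = deque()
published = []


def receive_layer(image_name):
    """Accept an upload and hand it off by enqueueing a message."""
    receive_to_transcode.append({"image": image_name})


def transcode_layer():
    """Drain the first queue; return how many messages were handled."""
    handled = 0
    while receive_to_transcode:
        job = receive_to_transcode.popleft()
        transcode_to_publish.append({**job, "format": "webp"})
        handled += 1
    return handled


def publish_layer():
    """Drain the second queue; return how many messages were handled."""
    handled = 0
    while transcode_to_publish:
        job = transcode_to_publish.popleft()
        published.append(f"{job['image']} as {job['format']}")
        handled += 1
    return handled
```

If receive_layer is called twice while the transcode layer is not running, both messages simply wait in receive_to_transcode. A later call to transcode_layer() followed by publish_layer() works through the backlog, which is the isolation the tightly coupled version lacks.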
The architecture diagram then begins to look similar to what is shown on screen. By placing SQS queues between the processing layers, we have achieved a loose coupling of the systems, which now exchange messages in order to transfer requests between layers. This asynchronous connectivity allows you to increase or decrease the number of EC2 instances that receive and process the messages in parallel. You can also configure Auto Scaling to grow and shrink the size of each application layer fleet based on usage and demand. If an EC2 instance fails to process a message, the message is retained in the corresponding queue and will be picked up again once that EC2 instance is restored, or by another EC2 instance in the same Auto Scaling group for that layer. In this sense, Amazon SQS behaves as the equivalent of an email system for the different application layers. In SQS, the equivalent of a mailbox is called a queue. Applications that put messages into a queue are called producers, and applications that pick up messages are called consumers.
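As a rough illustration of several consumers working through one queue in parallel, the sketch below uses Python threads and the standard library queue module in place of EC2 instances and SQS. The function name process_in_parallel and the upper-casing "work" are assumptions made for the example.

```python
import queue
import threading


def process_in_parallel(messages, worker_count):
    """Have `worker_count` consumers drain one shared queue of messages."""
    q = queue.Queue()
    for body in messages:          # the producer side: put messages in the queue
        q.put(body)

    results = []
    lock = threading.Lock()

    def consumer(worker_id):
        while True:
            try:
                body = q.get_nowait()   # each message goes to exactly one worker
            except queue.Empty:
                return                  # nothing left to pick up
            with lock:
                results.append((worker_id, body.upper()))

    workers = [threading.Thread(target=consumer, args=(i,))
               for i in range(worker_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Changing worker_count changes how many consumers share the load without touching the producer, which is the property that lets each layer's fleet grow or shrink independently.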
This is the vocabulary and terminology used by the AWS documentation for this service. The general flow of a message in a queue is as follows. First, a producer application creates a message and sends it to a queue. Second, a consumer application, which is usually listening to or polling the queue for new messages, picks the message up in order to process it. When a message is picked up by a consumer, the message is locked and a visibility timeout is set, so that the message becomes invisible to all other polling consumers. This ensures that each message is retained until the consumer has finished processing it and issues a DeleteMessage call to the queue in order to delete it. Each message is processed at least once. If for any reason the message is not processed successfully and the DeleteMessage call is not issued, the visibility timeout for the message expires. The message then becomes visible and available once again, to be picked up by another consumer or by the restored consumer that failed to complete the initial processing. The topology diagram shown displays the effect of the visibility timeout on ReceiveMessage requests. Notice that while the visibility timeout is active, the message is not returned when a request is made.
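The send, receive, and delete flow described above can be modeled in a few lines of Python. This is a toy in-memory model, not the SQS API: the ToyQueue class is hypothetical, time is passed in explicitly as `now` (in seconds) so the behavior is easy to follow, and a plain message id is used where SQS would use a receipt handle.

```python
import itertools


class ToyQueue:
    """In-memory model of a queue with a visibility timeout."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body, now=0):
        """Producer side: store the message, visible immediately."""
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now]
        return msg_id

    def receive(self, now):
        """Return (id, body) of the first visible message, or None."""
        for msg_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                # Hide the message from other consumers for the timeout.
                entry[1] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Consumer finished processing: remove the message for good."""
        self._messages.pop(msg_id, None)
```

A message received at time 0 is not returned to a second receive at time 10. If it is never deleted, it is returned again from time 30 onward, which is why processing is at least once rather than exactly once.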
There is a type of queue known as a delay queue, in which delivery of a message is postponed for a number of seconds after the queue receives it. In this type of queue, visibility is managed as shown in the diagram. The predefined delay behaves just like a visibility timeout: the message is not returned when a ReceiveMessage request is made. Once the predefined delay has elapsed, the message and the queue behave as described earlier, and once the message is picked up, the visibility timeout becomes active. The minimum and default delay for a message is zero seconds, and the maximum is 15 minutes. Some details about Amazon SQS: the visibility timeout for a message in a queue is 30 seconds by default, with a minimum of zero seconds and a maximum of 12 hours. If processing a message will take more than 30 seconds, you will want to increase the visibility timeout to match your application's processing time.
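The limits quoted above, and the way a delay and a visibility timeout each hide a message, can be captured in a small sketch. The helper names queue_attributes and is_receivable are hypothetical. DelaySeconds and VisibilityTimeout are the SQS queue attribute names, and the string values mirror how queue attributes are usually passed, but treat the dictionary as an illustration rather than a complete queue configuration.

```python
MAX_DELAY_SECONDS = 15 * 60                     # 15 minutes
MAX_VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60   # 12 hours


def queue_attributes(delay_seconds=0, visibility_timeout=30):
    """Validate the two settings against the documented ranges."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay must be between 0 and 900 seconds")
    if not 0 <= visibility_timeout <= MAX_VISIBILITY_TIMEOUT_SECONDS:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    return {
        "DelaySeconds": str(delay_seconds),
        "VisibilityTimeout": str(visibility_timeout),
    }


def is_receivable(now, sent_at, delay_seconds, hidden_until=None):
    """True if a receive request at time `now` could return the message."""
    if now < sent_at + delay_seconds:
        return False   # still inside the initial delay
    if hidden_until is not None and now < hidden_until:
        return False   # picked up earlier; visibility timeout still active
    return True
```

For a message sent at time 0 to a queue with a 60 second delay, is_receivable is False at 30 seconds and True at 60 seconds. Once a consumer picks the message up, hidden_until plays the role of the visibility timeout.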
This gives your consumers enough time to process each message and helps prevent a message from being picked up again while it is still being processed. The visibility timeout can be set for the entire queue or for an individual message if needed. For an individual message, you can use the ChangeMessageVisibility call with a VisibilityTimeout parameter in seconds. The ChangeMessageVisibility call has no impact on other ReceiveMessage calls issued later. Please note that if your consumer needs longer than 12 hours to process a message, you may want to consider using AWS Step Functions instead of SQS.
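A per-message visibility change can be pictured as updating one entry in a table of in-flight messages. The sketch below is an in-memory model under stated assumptions: change_visibility and the in_flight dictionary (handle mapped to the time at which the message becomes visible again) are hypothetical, and the model assumes the new timeout is counted from the moment of the call.

```python
MAX_VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60   # 12 hours


def change_visibility(in_flight, handle, now, visibility_timeout):
    """Reset the hidden-until time of one in-flight message.

    Only the entry for `handle` changes; every other message keeps the
    timeout it was given when it was received.
    """
    if not 0 <= visibility_timeout <= MAX_VISIBILITY_TIMEOUT_SECONDS:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    if handle not in in_flight:
        raise KeyError(handle)
    in_flight[handle] = now + visibility_timeout
    return in_flight[handle]
```

A consumer that realizes at second 20 that a job needs five more minutes could call change_visibility(in_flight, "msg-a", now=20, visibility_timeout=300), leaving any other in-flight message untouched.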
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.