Building High Availability into your environment
Understanding SLAs in AWS
Which services should I use to build a decoupled architecture?
Managing RTO and RPO for AWS Disaster Recovery
This course covers the core learning objective to meet the requirements of the 'Designing for disaster recovery & high availability in AWS - Level 2' skill
- Analyze the amount of resources required to implement a fault-tolerant architecture across multiple AWS Availability Zones
- Evaluate an effective AWS disaster recovery strategy to meet specific business requirements
- Understand the SLAs for AWS services to ensure the high availability of a given AWS solution
- Analyze which AWS services can be leveraged to implement a decoupled solution
In general, there are two ways in which a consumer can listen for messages in a queue: short polling and long polling. By default, queues use short polling. With short polling, a consumer issues a ReceiveMessage request to find available messages, and SQS sends a response even if the request found no messages. Short polling takes place when the wait time of a ReceiveMessage request is 0, and this can happen in two ways. In the first, the ReceiveMessage call sets the WaitTimeSeconds parameter to 0. In the second, the ReceiveMessage call doesn't set the WaitTimeSeconds parameter, but the queue attribute called ReceiveMessageWaitTimeSeconds is set to 0. To use long polling, the consumer issues the ReceiveMessage request with a WaitTimeSeconds parameter greater than 0 and less than or equal to 20 seconds. Long polling also happens if the call doesn't set WaitTimeSeconds and the ReceiveMessageWaitTimeSeconds queue attribute is set to a number greater than 0. In this case, SQS sends a response after it collects at least one available message, and up to the maximum number of messages specified in the request by the MaxNumberOfMessages parameter.
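The rules above for telling short polling from long polling can be sketched in a few lines of Python. This is only an illustration: `polling_mode` and `receive_batch` are hypothetical helper names, and the `sqs` argument is assumed to be a boto3-style SQS client (anything exposing a `receive_message` call with these keyword arguments will do).

```python
from typing import Any, Optional


def polling_mode(request_wait: Optional[int], queue_wait: int = 0) -> str:
    """Classify a ReceiveMessage call as 'short' or 'long' polling.

    request_wait: WaitTimeSeconds set on the call, or None if not set.
    queue_wait:   ReceiveMessageWaitTimeSeconds attribute of the queue.
    """
    # The value on the call wins; the queue attribute is only the fallback.
    effective = queue_wait if request_wait is None else request_wait
    if not 0 <= effective <= 20:
        raise ValueError("wait time must be between 0 and 20 seconds")
    return "long" if effective > 0 else "short"


def receive_batch(sqs: Any, queue_url: str, wait_seconds: int = 20,
                  max_messages: int = 10) -> list:
    """Issue one ReceiveMessage request and return the messages, if any."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,      # > 0 means long polling
        MaxNumberOfMessages=max_messages,  # upper bound per response
    )
    # An empty response carries no "Messages" key.
    return response.get("Messages", [])
```

With the default `wait_seconds=20`, `receive_batch` long-polls for the maximum wait time, so an empty list should only come back once that wait has expired.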
An empty response with long polling only happens when the specified polling wait time expires.
In general, the minimum message size is 1 byte, or 1 character, and the maximum message size by default is 256 kilobytes. The Amazon SQS Extended Client Library for Java enables the processing of large messages of up to 2GB by leveraging Amazon S3 along with SQS messaging. If you need to deal with messages larger than 256 kilobytes, the library lets you define whether messages are always stored in Amazon S3 or only when a message is bigger than the 256 kilobyte limit. You can also send a message that references an object stored in an S3 bucket, get the message object from Amazon S3, and delete the message object from the S3 bucket if needed. Once again, the maximum message size when using the SQS Extended Client Library for Java is 2GB.
Standard SQS queues make an effort to maintain the order of messages but do not guarantee it, and they guarantee only what is called at-least-once delivery. There can be a maximum of 120,000 in-flight messages in a standard queue. In-flight messages are those that have been received from a queue by a consumer but not yet deleted from the queue.
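The size rule for large messages described above can be illustrated with a short Python sketch. It is not the Extended Client Library itself, which is a Java library. It only mimics the decision of whether a payload goes to Amazon S3. The function name and the `always_through_s3` flag are assumptions made for this example, and the check looks at the message body only.

```python
SQS_LIMIT_BYTES = 256 * 1024  # default maximum message size cited in this lecture


def should_offload_to_s3(body: str, always_through_s3: bool = False,
                         threshold: int = SQS_LIMIT_BYTES) -> bool:
    """Return True if the payload should be stored in S3 instead of in SQS."""
    size = len(body.encode("utf-8"))  # size is counted in bytes, not characters
    if size < 1:
        raise ValueError("a message needs at least 1 byte")
    # Either every message goes through S3, or only the oversized ones.
    return always_through_s3 or size > threshold
```

A body of exactly 256 kilobytes still fits in the queue under this rule, and one byte more tips it over to S3.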
If you use short polling, this quota causes your consumer to get an OverLimit error if it tries to receive a message while that many messages are in flight. If you use long polling, SQS returns no error messages. You should always delete messages from the queue after they are processed in order to avoid breaching this quota.
In order to guarantee message order and exactly-once processing, you need to use what is called a FIFO queue, which is a different type of queue from the standard queue. FIFO queues perform a little slower than standard queues, which makes sense given the mechanisms needed to maintain First-In-First-Out message order and exactly-once processing. FIFO queues can have a maximum of 20,000 in-flight messages, meaning messages that have been received from a queue by a consumer but not yet deleted from the queue.
Please keep the performance comparison between standard and FIFO queues in mind when designing your applications. It is also a common data point tested during exams. The difference between short polling and long polling is also an important detail to remember for your applications and for exams and certification.
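The advice to delete every message once it has been processed can be sketched as follows. This is a minimal illustration rather than a production consumer: `process_and_delete` and `handler` are hypothetical names, and `sqs` is assumed to be a boto3-style SQS client exposing `delete_message`.

```python
from typing import Any, Callable


def process_and_delete(sqs: Any, queue_url: str, messages: list,
                       handler: Callable[[str], None]) -> int:
    """Run handler on each received message and delete the ones that succeed.

    Returns the number of messages deleted.
    """
    deleted = 0
    for message in messages:
        try:
            handler(message["Body"])
        except Exception:
            # No delete call: the message stays in flight and becomes
            # visible again once its visibility timeout expires.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

Deleting promptly after successful processing is what keeps the number of in-flight messages below the quota.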
Next up, we already discussed what happens when a message is not processed successfully. Basically, the application fails during processing and the consumer application never issues the DeleteMessage call, so the visibility timeout for the message expires, making the message available to consumers once again. This situation assumes that the failure to process the message was rooted in some form of compute malfunction, which can generally be restored using CloudWatch alarms and automated remediation. However, there is a second possibility, and that is when the message itself is malformed or otherwise corrupted. This can potentially cause a never-ending cycle: the message is consumed by a ReceiveMessage request, it cannot be processed because it's malformed, and the DeleteMessage call is never issued by the consumer application. The message then becomes visible again and the cycle repeats itself. To guard against message corruption and infinite attempts to process it, you can define what is called a dead letter queue, which captures messages that cannot be processed once they have been delivered for processing a maximum number of times, as defined by the maxReceiveCount for a queue.
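The cycle described above, and the way a maximum receive count ends it, can be simulated in memory without any AWS calls. The following sketch is a toy model under simplifying assumptions: a failed message simply goes back to the end of the queue instead of waiting for a visibility timeout, and `drain` and `handler` are hypothetical names.

```python
from collections import deque


def drain(bodies, handler, max_receive_count=3):
    """Simulate receive and retry cycles for an in-memory queue.

    Returns (processed, dead_letters).
    """
    queue = deque((body, 0) for body in bodies)  # (body, receive count)
    processed, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        if receives >= max_receive_count:
            # Limit reached: take the message out of normal circulation.
            dead_letters.append(body)
            continue
        receives += 1  # every receive increments this message's count
        try:
            handler(body)
            processed.append(body)  # success: the equivalent of DeleteMessage
        except ValueError:
            # Never deleted, so it becomes visible again for another attempt.
            queue.append((body, receives))
    return processed, dead_letters
```

For example, with `int` as the handler and the bodies `["1", "x", "2"]`, the malformed body `"x"` is attempted three times and then lands in the dead letter list, while the other two are processed normally.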
Every time a message is picked up by a ReceiveMessage request, the receive count for that message is incremented by one. Reaching the predefined limit removes the message from normal circulation and places it into the dead letter queue, where you can examine why it cannot be processed. Once the issue has been repaired, you can move the message back to the queue that delivered it using the dead letter queue redrive capability. Please note that dead letter queues can potentially break the order of messages in FIFO queues. The redrive policy of a source queue specifies its dead letter queue and the maxReceiveCount condition for moving messages to it, and the redrive allow policy of a dead letter queue defines which source queues can use it. As such, it is important that dead letter queues be monitored carefully and that arriving messages get examined as soon as possible, either by automated functionality, such as a Lambda function, or by a human. This requires a mechanism for notifications in the form of a messaging service. The Simple Notification Service, or SNS, is commonly used in combination with SQS for dispatching notifications, triggering automatic remediation and human intervention via push notifications at the same time.
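As a rough sketch of how the two policies mentioned above are expressed, the helpers below build the JSON-encoded queue attributes in Python. The helper names are hypothetical and the ARN in the usage note is a placeholder. The attribute names `RedrivePolicy` and `RedriveAllowPolicy` and their fields are the ones SQS uses, but treat the rest as an assumption-laden example rather than a complete setup.

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int) -> dict:
    """Attributes for the SOURCE queue: where failed messages go, and when."""
    return {"RedrivePolicy": json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })}


def redrive_allow_policy(source_queue_arns: list) -> dict:
    """Attributes for the DEAD LETTER queue: which source queues may use it."""
    return {"RedriveAllowPolicy": json.dumps({
        "redrivePermission": "byQueue",
        "sourceQueueArns": source_queue_arns,
    })}
```

With a boto3 SQS client, these dictionaries could be passed as the `Attributes` argument of `set_queue_attributes`, for example `sqs.set_queue_attributes(QueueUrl=source_url, Attributes=redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 5))`, where `source_url` and the ARN stand in for your own queues.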
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.