Understanding SLAs in AWS
This course covers the core learning objectives to meet the requirements of the 'Designing for disaster recovery & high availability in AWS - Level 2' skill:
- Analyze the amount of resources required to implement a fault-tolerant architecture across multiple AWS Availability Zones
- Evaluate an effective AWS disaster recovery strategy to meet specific business requirements
- Understand the SLAs for AWS services to ensure the high availability of a given AWS solution
- Analyze which AWS services can be leveraged to implement a decoupled solution
A service level agreement (SLA) is basically a handshake. You agree to use a company’s services and they agree to a certain level of performance for those services. If they don’t meet the level of performance they publicly stated they would meet, you get compensated.
So, let's break down both your agreement and the service's commitment.
The first thing we’ll talk about is the service’s commitment to a certain level of performance. With AWS SLAs, the performance we’re talking about here refers to availability - not durability or even reliability. Availability measures the amount of time that a service is available for you to use. And the way AWS indicates availability for SLAs is through uptime.
For every paid, generally available AWS service, AWS publishes an uptime commitment, usually a percentage with a bunch of 9s in it. For example, let’s take the AWS Key Management Service. AWS publishes a commitment that they would do their best to achieve at least 99.999% uptime - that’s five 9s.
This percentage dictates how much uptime, and therefore, how much downtime a company might experience in any given billing cycle if they were to use this service. Let’s take a look at a table that has a few examples of percentages of uptime and the degree of downtime it corresponds with over a year, a month, a week, and a day.
The first row is the Key Management Service SLA - five 9s, which corresponds to 5.256 minutes of downtime per year, and 0.438 minutes (about 26 seconds) per month. This is very high uptime and generally speaking, this is the goal we’d all like to achieve, as this would certainly make any boss proud.
Now let’s go a little lower to 99.99%, which is four 9s. This corresponds to about 52 minutes of downtime per year, and 4.38 minutes of downtime per month. Still relatively high uptime, and the boss is still probably happy.
Then we have the lowest uptime in the chart, which is 99% or two 9s. This corresponds to 87.6 hours of downtime per year, and 7.3 hours of downtime per month. This is significantly more downtime and I think the boss probably wouldn’t be so proud of this metric.
So for every AWS service SLA, you’ll see an uptime that looks like the number in the first column, which informs how much downtime you’ll have. And as a result, this additionally informs how happy or unhappy your boss will be.
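The downtime figures quoted above are simple arithmetic: the downtime fraction is 100% minus the uptime percentage, multiplied by the length of the period. Here is a minimal Python sketch of that calculation. It assumes a 365-day year and treats a month as one twelfth of that year, which reproduces the numbers in this lesson; the function and constant names are illustrative only.

```python
# Allowed downtime implied by an uptime percentage.
# Assumption: a 365-day year, and a "month" taken as one twelfth of that year.
MINUTES_PER_PERIOD = {
    "year": 365 * 24 * 60,
    "month": 365 * 24 * 60 / 12,
    "week": 7 * 24 * 60,
    "day": 24 * 60,
}


def downtime_minutes(uptime_percent: float) -> dict:
    """Return the maximum downtime, in minutes, for each period."""
    downtime_fraction = (100 - uptime_percent) / 100
    return {
        period: minutes * downtime_fraction
        for period, minutes in MINUTES_PER_PERIOD.items()
    }


if __name__ == "__main__":
    # Print one row per uptime level mentioned in the lesson.
    for uptime in (99.999, 99.99, 99.0):
        row = downtime_minutes(uptime)
        cells = ", ".join(f"{period}: {mins:.3f} min" for period, mins in row.items())
        print(f"{uptime}% -> {cells}")
```

Running it for 99% gives 5,256 minutes per year, which is the 87.6 hours mentioned above.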
So what happens when the service or AWS doesn’t meet this level of expectation and you have more downtime than you expected? Well, you are entitled to compensation in the form of service credits. Let’s see how this works using AWS Lambda as an example.
Lambda commits to a monthly uptime percentage of 99.95% for each region for any billing cycle. Let’s say they miss that number - how much service credit do you get? Well, it’s dependent on how much they miss this number and how much you spend on the service for the billing cycle.
For example, let’s say you spent $60 on AWS Lambda in the us-east-1 Region.
The service then goes down for more than 5% of that month. In that case, the SLA dictates you would get 100% of your money, in this case $60, in service credits that you can apply to future charges for AWS Lambda in the us-east-1 Region.
If Lambda is down between 1% and 5% of the month, you get 25% of your money back, in this case, $15 in service credits.
And if the service is down between 0.05% and 1% of the month, you get 10% of your $60, which is $6 in service credits.
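To make the tiers concrete, here is a small Python sketch that maps a monthly uptime percentage to a service credit. The thresholds simply mirror the Lambda figures in this lesson, the way exact boundary values are handled is an assumption, and the function names are hypothetical. Check the current SLA text before relying on any of these numbers.

```python
# Service credit tiers as described in this lesson for AWS Lambda.
# Assumption: thresholds mirror the lesson's figures; boundary handling is a guess.

def credit_percent(monthly_uptime_percent: float) -> int:
    """Return the share of the monthly bill refunded as service credits."""
    if monthly_uptime_percent >= 99.95:
        return 0    # commitment met, no credit
    if monthly_uptime_percent >= 99.0:
        return 10   # down between 0.05% and 1% of the month
    if monthly_uptime_percent >= 95.0:
        return 25   # down between 1% and 5% of the month
    return 100      # down more than 5% of the month


def service_credit(monthly_spend: float, monthly_uptime_percent: float) -> float:
    """Return the credit amount in the same currency as monthly_spend."""
    return monthly_spend * credit_percent(monthly_uptime_percent) / 100


if __name__ == "__main__":
    # The $60 Lambda bill from the example, at three different uptime levels.
    for uptime in (99.5, 97.0, 90.0):
        print(f"uptime {uptime}% -> ${service_credit(60, uptime):.2f} in credits")
```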
Now, is a service credit the same thing as cash in your pocket? Nope. It’s kind of like store credit: you get your money back, but it applies to future bills that you owe for the same service. You cannot transfer credits or apply them to any other AWS account. So if your friend has a huge Lambda bill in their AWS account, you cannot use your $6 in service credits to help them out.
So how do you get these service credits? Surely AWS automatically provides them when a service does not meet their guaranteed percentage of uptime, right? Well, not exactly. Instead, they wait for you to notice and submit a case to AWS Support. And it’s not enough to say “hey, your service was down, please provide me service credits.” Instead, you have to provide proof. For example, for AWS Lambda, your support case must include the specific dates, times, and availability for each 5-minute interval with less than 100% availability in that AWS region for that billing cycle. You also have to submit logs that detail the errors for your claimed outage.
Not exactly a trivial amount of information to provide AWS.
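As a rough illustration of what gathering that evidence might involve, the Python sketch below groups request records into 5-minute intervals and lists the intervals with less than 100% availability. The input format (a timestamp plus a success flag per request) is hypothetical, and treating availability as the share of successful requests is an assumption; the SLA itself defines how availability is measured.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def interval_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % 5, seconds=ts.second, microseconds=ts.microsecond
    )


def degraded_intervals(requests):
    """List 5-minute intervals whose availability is below 100%.

    requests: iterable of (timestamp, succeeded) pairs (a hypothetical log format).
    Returns (interval_start, availability_percent) tuples sorted by time.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for ts, succeeded in requests:
        start = interval_start(ts)
        totals[start] += 1
        if not succeeded:
            failures[start] += 1
    # Only intervals with at least one failure appear in `failures`.
    return [
        (start, 100 * (totals[start] - failures[start]) / totals[start])
        for start in sorted(failures)
    ]


if __name__ == "__main__":
    sample = [
        (datetime(2024, 3, 1, 12, 1, 30), True),
        (datetime(2024, 3, 1, 12, 3, 0), False),
        (datetime(2024, 3, 1, 12, 7, 0), True),
    ]
    for start, availability in degraded_intervals(sample):
        print(f"{start.isoformat()}  availability {availability:.1f}%")
```

The output of something like this, together with the underlying error logs, is the kind of detail a support case would need to include.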
The last thing we’ll talk about here is your part of the handshake. You, as a customer, enter into a customer agreement with AWS when you use their services. This customer agreement is boring, but certainly worth a read, as it does mention some information about SLAs. For example, it states that AWS can change, add, or remove SLAs. If the SLA changes for the worse, AWS will provide 90 days’ notice, but it is still your responsibility to check the AWS site regularly for modifications to the SLAs. If you continue to use the service after the new SLA is effective, AWS takes that as your agreement to the terms.
In summary, the AWS SLAs are pretty generous, as the services are expected to meet a high level of availability. However, if they do go down for a long period of time, it can be time-consuming to gather proof that it was, in fact, AWS that was down and not your own software or internet connection. When you provide that proof, and if AWS finds it sufficient, you’ll get service credits that apply to your bill for that service during the next billing cycle. That’s it for this one - see you next time!
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.