In this course, I’ll start by explaining the purpose of Azure Service Level Agreements (or SLAs) and some of the details they contain. Then I’ll cover some of the actions that impact SLAs. Finally, I’ll go over the lifecycle of Azure services, from public preview to retirement.
- Understand the purpose of Azure Service Level Agreements (or SLAs) and what they contain
- Understand actions that can impact SLAs in both positive and negative ways
- Understand Azure service lifecycle stages, including preview, general availability, and retirement
- Anyone who is responsible for tracking or maintaining service levels on Azure implementations
- Basic knowledge of Azure (or take our Overview of Azure Services course)
When your organization uses Azure services to run important applications, you need some kind of guarantee that those services will operate properly. Otherwise, unreliable cloud services could cost your organization dearly in terms of lost revenue, profit, or reputation. Microsoft provides guarantees in the form of service level agreements.
An Azure SLA tells you what level of reliability or performance you can expect from a given service and how you’ll be compensated if Microsoft doesn’t meet that level. Typically, an SLA promises a certain level of availability or uptime. For example, Azure App Service has an SLA that guarantees apps will be available 99.95% of the time.
So, what does that actually mean? How long can the service be down without triggering the SLA? Well, first, you need to know that uptime is calculated on a monthly basis. So even if the service is available less than 99.95% of the time during a single week, that won’t necessarily trigger the SLA, as long as the total for the month stays at or above 99.95%.
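Here’s how it’s calculated. The service level agreement for App Service defines the monthly uptime percentage as the maximum available minutes minus the downtime, divided by the maximum available minutes, times 100. The maximum number of available minutes depends on which month you’re talking about, because months don’t all have the same number of days.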
Let’s take January as an example. It has 31 days. So the number of minutes in January is 31 days times 24 hours per day times 60 minutes per hour, which comes to 44,640 minutes. Now, let’s say App Service is down for a total of 20 minutes during the month of January. This could be one 20-minute outage or four 5-minute outages or any other combination of outages that adds up to 20 minutes. If we put that in the formula, the answer is 99.955%, which is higher than 99.95%. So 20 minutes of downtime in a month with 31 days won’t trigger the SLA. The service would have to be down for more than 22.32 minutes in a 31-day month to trigger the SLA.
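To make the arithmetic concrete, here’s a minimal Python sketch of the monthly uptime calculation described above. The function names are hypothetical, chosen for illustration, and the outage durations are just the example values from this lesson, not real data.

```python
def minutes_in_month(days: int) -> int:
    """Maximum available minutes: days x 24 hours x 60 minutes."""
    return days * 24 * 60


def monthly_uptime_percentage(days: int, downtime_minutes: float) -> float:
    """(Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100."""
    max_minutes = minutes_in_month(days)
    return (max_minutes - downtime_minutes) / max_minutes * 100


def max_downtime_minutes(days: int, target_percent: float) -> float:
    """Largest total downtime that still meets the uptime target for that month."""
    return minutes_in_month(days) * (100 - target_percent) / 100


# January: 31 days, four 5-minute outages (20 minutes in total).
january_downtime = sum([5, 5, 5, 5])
print(minutes_in_month(31))                                        # 44640
print(round(monthly_uptime_percentage(31, january_downtime), 3))   # 99.955
print(round(max_downtime_minutes(31, 99.95), 2))                   # 22.32
```

Because the outages are simply summed, one 20-minute outage and four 5-minute outages produce exactly the same uptime percentage.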
When I say “trigger the SLA”, I mean that customers could receive compensation from Microsoft. Notice that I said could receive compensation. It isn’t automatic. In fact, to receive compensation, you have to submit a claim. If Microsoft approves your claim, then you’ll receive a credit on your monthly bill.
The amount of the credit depends on the uptime percentage. For App Service, there are three levels. If the uptime is below 99.95% but at least 99%, then you’ll get back 10% of the monthly fees for that service. If it’s below 99% but at least 95%, then you’ll get back 25% of the fees. If it’s below 95%, then you’ll get back 100% of the monthly fees. That’s extremely unlikely to happen, though, because the service would need to be down for more than 2,232 minutes, or about 37.2 hours, in a 31-day month.
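The credit tiers can be expressed as a small lookup function. This is a minimal sketch that assumes the three App Service tiers described above. The function name is hypothetical, and treating each tier’s lower bound as inclusive (so exactly 99% lands in the 10% tier) is an assumption, so check the current SLA text before relying on it.

```python
def app_service_credit_percent(uptime_percent: float) -> int:
    """Service credit, as a percentage of the monthly fee, for a given uptime.

    Assumption: each tier's lower bound is inclusive.
    """
    if uptime_percent >= 99.95:
        return 0      # SLA met, so no credit
    if uptime_percent >= 99:
        return 10
    if uptime_percent >= 95:
        return 25
    return 100


# How much downtime it takes to fall below 95% in a 31-day month.
minutes_in_31_days = 31 * 24 * 60              # 44,640 minutes
downtime_at_95 = minutes_in_31_days * 5 / 100  # 5% of the month, in minutes
hours_below_95 = downtime_at_95 / 60
print(round(hours_below_95, 1))                # 37.2
```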
Of course, your App Service app could be down for reasons that have nothing to do with Azure, such as a bug in your code, so how will you know if there’s been downtime that was definitely caused by Azure? One of the best ways is to use Azure Service Health. Not only does it keep track of outages, but you can even configure it to send you alerts about service incidents.
I should mention that the App Service SLA doesn’t apply to App Service’s free tier. That’s understandable, right? If you’re not paying for the service, then there are no guarantees. That’s the case for all of Azure’s free services.
That’s not the only factor that can affect a service’s SLA. For example, the uptime guarantee for Azure virtual machines depends on the types of virtual machines and the availability configurations you use. If you deploy a single VM that uses Standard HDD disks, then the uptime guarantee is only 95%, which is shockingly low. But that’s because Standard HDD disks are not very reliable, and a single VM can go down for a variety of reasons.
You can get a higher uptime guarantee in a couple of ways. The first way is to use SSDs instead of HDDs, which makes a huge difference. With a Standard SSD, the uptime guarantee is 99.5%, and with a Premium SSD, it’s 99.9%.
The second way is to use redundancy. If you deploy at least two VMs in an availability set, then the uptime guarantee is 99.95%. If you deploy at least two VMs across availability zones in the same region, then it goes up to 99.99%. Availability zones give you the highest uptime because the VMs are deployed in separate datacenters, so even if an entire datacenter goes down, you’ll still have a VM running in a datacenter in another availability zone. Generally speaking, the SLA for an Azure service will offer higher uptime guarantees for more redundant configurations.
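To see how much these configuration choices matter, it helps to convert each uptime guarantee into the downtime it allows. This sketch uses the VM percentages quoted in this lesson; the configuration labels and the helper function are hypothetical names for illustration, and the published figures may change over time.

```python
# Uptime guarantees quoted in this lesson (labels are illustrative only).
VM_UPTIME_GUARANTEES = {
    "single VM, Standard HDD": 95.0,
    "single VM, Standard SSD": 99.5,
    "single VM, Premium SSD": 99.9,
    "2+ VMs in an availability set": 99.95,
    "2+ VMs across availability zones": 99.99,
}

MINUTES_IN_31_DAYS = 31 * 24 * 60  # 44,640


def allowed_downtime_minutes(uptime_percent: float,
                             month_minutes: int = MINUTES_IN_31_DAYS) -> float:
    """Total downtime a month can contain while still meeting the guarantee."""
    return month_minutes * (100 - uptime_percent) / 100


for config, uptime in VM_UPTIME_GUARANTEES.items():
    print(f"{config}: {uptime}% -> {allowed_downtime_minutes(uptime):.1f} minutes")
```

In a 31-day month, a single VM on Standard HDD can be down for 2,232 minutes without breaching its guarantee, while VMs spread across availability zones are allowed only about 4.5 minutes.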
Although uptime guarantees are very important, they’re not the only guarantees. Some Azure services offer others as well. For example, Cosmos DB, Microsoft’s globally distributed NoSQL database, offers guarantees not only for uptime but also for throughput, consistency, and latency. I won’t go into the details, but bear in mind that some Azure SLAs can be quite complex.
You should also be aware that Azure SLAs do not apply when a guarantee isn’t met because of something the customer did. I already mentioned that a bug in your application could cause an App Service app to go down, but there are lots of other possible reasons, too. A simple example is shutting down a virtual machine yourself. At that point, it’s not available, but that’s obviously not Microsoft’s fault, so it doesn’t count in the uptime calculation. Another example is violating Microsoft’s acceptable use policies.
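Here’s a rough sketch of how customer-caused downtime drops out of the calculation. The `Outage` record and its `customer_caused` flag are hypothetical; in practice, which downtime counts is determined by the exclusions in the SLA itself and by Microsoft’s review of your claim, not by a label you assign.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    customer_caused: bool  # hypothetical flag, e.g. you stopped the VM yourself


def sla_downtime_minutes(outages: list[Outage]) -> float:
    """Sum only the outages that count toward the SLA calculation."""
    return sum(o.minutes for o in outages if not o.customer_caused)


def monthly_uptime_percentage(days: int, downtime_minutes: float) -> float:
    """(Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100."""
    max_minutes = days * 24 * 60
    return (max_minutes - downtime_minutes) / max_minutes * 100


january = [
    Outage(minutes=15, customer_caused=False),  # platform incident
    Outage(minutes=60, customer_caused=True),   # VM shut down by the customer
    Outage(minutes=5, customer_caused=False),   # platform incident
]
counted = sla_downtime_minutes(january)
print(counted)                                          # 20
print(monthly_uptime_percentage(31, counted) >= 99.95)  # True
```

Even though the VM was unavailable for 80 minutes in total, only the 20 minutes of platform incidents count, so the month still meets the 99.95% target.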
And that’s it for service level agreements.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).