How to design highly available and fault-tolerant architectures
Designing solutions for elasticity and scalability
Designing Resilient Architectures. In this module, we explore the concepts of business continuity and disaster recovery, the AWS Well-Architected Framework, and the AWS services that, when used together, help us design resilient, fault-tolerant architectures.
We will first introduce the concepts of high availability and fault tolerance and explain how we go about designing highly available, fault-tolerant solutions on AWS. We will learn about the AWS Well-Architected Framework, and how that framework can help us make design decisions that deliver the best outcome for end users. Next, we will introduce and explain the concept of business continuity and how AWS services can be used to plan and implement a disaster recovery plan.
We will then learn to recognize and explain the core AWS services that, when used together, can reduce single points of failure and improve scalability in a multi-tier solution. Auto Scaling is a proven way to enable resilience by allowing an application to scale up and down to meet demand. In a hands-on lab we create and work with Auto Scaling groups to add elasticity and durability. Amazon Simple Queue Service (SQS) increases resilience by acting as a messaging service between other services and applications, thereby decoupling layers and reducing dependency on state. Amazon CloudWatch is a core component of maintaining a resilient architecture - essentially it is the eyes and ears of your environment - so we next learn to apply the Amazon CloudWatch service in a hands-on environment.
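The scale-up-and-down behavior described above can be sketched in a few lines. This is a minimal illustrative model of a threshold-based scaling decision, not the AWS Auto Scaling API; the function name, thresholds, and bounds are all hypothetical values chosen for the example.

```python
# Illustrative sketch of the decision an Auto Scaling policy makes for you.
# All names and thresholds here are hypothetical, not AWS API calls.

def desired_capacity(current: int, cpu_percent: float,
                     minimum: int = 2, maximum: int = 10,
                     scale_out_at: float = 70.0, scale_in_at: float = 30.0) -> int:
    """Return the new instance count for a simple threshold-based policy."""
    if cpu_percent > scale_out_at:
        return min(current + 1, maximum)   # under load: add an instance, up to the cap
    if cpu_percent < scale_in_at:
        return max(current - 1, minimum)   # idle: remove one, never below the floor
    return current                         # within the target band: no change

print(desired_capacity(2, 85.0))  # → 3 (scale out)
print(desired_capacity(3, 10.0))  # → 2 (scale in, floor of 2 respected)
```

In the real service, the metric driving this decision would come from Amazon CloudWatch, which is exactly why the two services are covered together in this module.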
We then learn to apply the Amazon CloudFront CDN service to add resilience to a static website served out of Amazon S3. Amazon CloudFront is tightly integrated with other AWS services such as Amazon S3, AWS WAF, and Amazon GuardDuty, making it an important component in increasing the resilience of your solution.
Hello and welcome to this lecture covering the AWS Global Infrastructure.
Amazon Web Services is a global public cloud provider, and as such, it has to have a global network of infrastructure to run and manage its growing cloud services that support customers around the world. This global network is comprised of a number of key components: availability zones, regions, edge locations, and regional edge caches.
If you're deploying services on AWS, you want to have a clear understanding of each of these components, how they are linked, and how you can use them within your solution to your maximum benefit. Let's take a closer look, starting with availability zones.
Availability zones and regions are closely related. Availability zones, commonly referred to as AZs, are essentially the physical data centers of AWS. This is where the actual compute, storage, network, and database resources are hosted that we as consumers provision within our virtual private clouds, our VPCs.
A common misconception is that a single availability zone is equal to a single data center. This is not the case. In fact, it's likely that multiple data centers located close together form a single availability zone. Each availability zone will always have at least one other availability zone that is geographically located within the same area, usually a city, linked by highly resilient and very low latency private fiber-optic connections. However, each AZ is isolated from the others, using separate power and network connectivity, which minimizes the impact to other AZs should a single AZ fail. These low latency links between AZs are used by many AWS services to replicate data for high availability and resiliency purposes.
For example, when RDS, the Relational Database Service, is configured for multi-AZ deployments, AWS uses synchronous replication between its primary and secondary databases and asynchronous replication for any read replicas that have been created.
Often, there are three, four, or even five AZs linked together via these low latency connections. This localized geographical grouping of multiple AZs, which could include multiple data centers, is defined as an AWS region. Multiple AZs within a region allow you to create highly available and resilient applications and services, and architecting your solutions to utilize resources across more than one AZ ensures that minimal or no impact will occur to your infrastructure should an AZ experience a failure.
Anyone can deploy resources in the cloud, but architecting them in a way that ensures your infrastructure remains stable, available, and resilient when faced with a disaster is a different matter. Making use of at least two AZs in a region helps you maintain high availability of your infrastructure, and is always a recommended best practice.
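The "at least two AZs" practice comes down to never letting one AZ hold all of your instances. As a purely illustrative sketch (the instance names and placement helper below are hypothetical, not an AWS API), distributing a fleet round-robin across AZs might look like this:

```python
# Hypothetical placement sketch: spread instances across AZs so that
# the loss of any single AZ leaves part of the fleet running.
from itertools import cycle

def spread_across_azs(instance_ids, azs):
    """Round-robin instances over AZs; no single AZ holds the whole fleet."""
    placement = {az: [] for az in azs}
    for instance, az in zip(instance_ids, cycle(azs)):
        placement[az].append(instance)
    return placement

servers = ["web-1", "web-2", "web-3", "web-4"]
print(spread_across_azs(servers, ["eu-west-1a", "eu-west-1b"]))
# {'eu-west-1a': ['web-1', 'web-3'], 'eu-west-1b': ['web-2', 'web-4']}
```

With two instances in each AZ, an outage in either AZ still leaves half the fleet serving traffic; in practice, AWS Auto Scaling groups do this distribution for you when you attach subnets from multiple AZs.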
Regions. As we now know, a region is a collection of availability zones that are geographically located close to one another, generally within the same city. AWS has deployed regions across the globe to allow its worldwide customer base to take advantage of low latency connectivity. Every region acts independently of the others, and each contains at least two availability zones. For example, if an organization based in London were serving customers throughout Europe, there would be no logical sense in deploying services in the Sydney region, simply due to the latency response times for its customers. Instead, the company would select the region most appropriate for it and its customer base, which may be the London, Frankfurt, or Ireland region.
Having global regions also allows for compliance with regulations, laws, and governance relating to data storage at rest and in transit. For example, you may be required to keep all data within a specified location, such as Europe. Having multiple regions within this location allows an organization to meet this requirement. Similarly to how utilizing multiple AZs within a region creates a level of high availability, the same can be applied to utilizing multiple regions. Depending on the level of business continuity you require, you may choose to architect your AWS environment to support your applications and services across multiple regions, should an entire region become unavailable, perhaps due to a natural disaster.
You may want to use multiple regions if you are a global organization serving customers in different countries that have specific laws and governance about the use of data. In this case, you could even connect different VPCs together in different regions. The number of regions is increasing year after year, as AWS works to keep up with the demand for cloud computing services. Interestingly, not all AWS services are available in every region. This is a consideration that must be taken into account when architecting your infrastructure. Some services are classed as global services, such as AWS Identity and Access Management or Amazon CloudFront, which means that these services are not tied to a specific region. However, most services are region specific, and it's down to you to understand which services are available within which region. The link on the screen provides a definitive list of all services and the regions where they operate. This list is constantly being updated as more and more services become available in different regions.
AWS has a specific naming convention for both regions and availability zones. Depending on where you are viewing and using it, the same region can be represented by two different names. Regions have both a friendly name, indicating a location, that is viewed within the management console, and a code name that is used when referencing regions programmatically, for example, when using the AWS CLI. As you can see in this example, the name in the first column is easier to associate with a location than the code name.
Availability zones are always referenced by their code name, which is the code name of the region the AZ belongs to, followed by a letter. For example, the AZs within the eu-west-1 region, which is EU (Ireland), are eu-west-1a, eu-west-1b, and eu-west-1c.
Edge locations are AWS sites deployed in major cities and highly populated areas across the globe, and they far outnumber availability zones. While edge locations are not used to deploy your main infrastructure, such as EC2 instances, EBS storage, VPCs, or RDS resources, as within AZs, they are used by AWS services such as Amazon CloudFront to cache data and reduce latency for end-user access by using the edge locations as a global content delivery network, a CDN. As a result, edge locations are primarily used by end users who are accessing and using your services.
For example, you may have your website hosted on EC2 instances with S3 as your origin within the Ohio region, associated with a CloudFront distribution. When a user accesses your website from Europe, they would be redirected to their closest edge location within Europe, where cached data for your website could be read, significantly reducing latency. To understand more about how Amazon CloudFront achieves this, you can take a look at our existing courses and labs on this service: Working with Amazon CloudFront, how to Serve your files using the CloudFront CDN, and how to Configure a Static Website with S3 And CloudFront.
In November 2016, AWS announced a new type of edge location called a regional edge cache. These sit between your CloudFront origin servers and the edge locations. A regional edge cache has a larger cache than an individual edge location, and because data expires from the cache at the edge locations sooner, the data is retained at the regional edge cache. Therefore, when data that is no longer available is requested at an edge location, the edge location can retrieve the cached data from the regional edge cache instead of the origin servers, which would incur higher latency. Understanding what each of these components allows you to do will help you architect a resilient, highly available, secure, and low latency solution for you and your customers.
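The tiered lookup described above - edge location first, then regional edge cache, then origin - can be modeled with plain dictionaries. This is a hypothetical sketch of the caching behavior, not CloudFront's implementation; the keys, values, and `fetch` helper are all invented for illustration.

```python
# Hypothetical tiered lookup: edge location -> regional edge cache -> origin.
edge_cache = {}                                  # small; entries expire quickly
regional_cache = {"logo.png": b"<bytes>"}        # larger; retains expired edge entries
origin = {"logo.png": b"<bytes>", "page.html": b"<html>"}

def fetch(key):
    """Return (value, tier) for the nearest tier that holds the object."""
    if key in edge_cache:
        return edge_cache[key], "edge"
    if key in regional_cache:                    # avoids the high-latency origin trip
        edge_cache[key] = regional_cache[key]
        return edge_cache[key], "regional"
    value = origin[key]                          # last resort: fetch from the origin
    regional_cache[key] = value
    edge_cache[key] = value
    return value, "origin"

print(fetch("logo.png")[1])   # → regional (expired at the edge, still held regionally)
print(fetch("logo.png")[1])   # → edge     (now re-cached at the edge location)
print(fetch("page.html")[1])  # → origin   (not cached anywhere yet)
```

The second request for the same object is served from the nearest tier, which is exactly the latency benefit the regional edge cache provides when edge entries expire.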
That has now brought me to the end of this lecture. Coming up next is the topic of AWS disaster recovery strategies.
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few startups.