
The AWS Well Architected Framework


How to design high availability and fault tolerant architectures
2h 21m

Designing Resilient Architectures. In this module, we explore the concepts of business continuity and disaster recovery, the well-architected framework and the AWS services that help us design resilient, fault-tolerant architectures when used together.

We will first introduce the concepts of high availability and fault tolerance and show how we go about designing highly available, fault-tolerant solutions on AWS. We will learn about the AWS Well Architected Framework, and how that framework can help us make design decisions that deliver the best outcome for end users. Next, we will introduce and explain the concept of business continuity and how AWS services can be used to plan and implement a disaster recovery plan.

We will then learn to recognize and explain the core AWS services that, when used together, can reduce single points of failure and improve scalability in a multi-tier solution. Auto Scaling is a proven way to enable resilience by allowing an application to scale up and down to meet demand. In a hands-on lab we create and work with Auto Scaling groups to add elasticity and durability. Amazon Simple Queue Service (SQS) increases resilience by acting as a messaging service between other services and applications, thereby decoupling layers and reducing dependency on state. Amazon CloudWatch is a core component of maintaining a resilient architecture - essentially it is the eyes and ears of your environment, so we next learn to apply the Amazon CloudWatch service in a hands-on environment.

We then learn to apply the Amazon CloudFront CDN service to add resilience to a static website that is served out of Amazon S3. Amazon CloudFront is tightly integrated with other AWS services such as Amazon S3, AWS WAF and Amazon GuardDuty, making Amazon CloudFront an important component in increasing the resilience of your solution.


Hello and welcome to this lecture, where I shall be providing an overview of the AWS Well Architected Framework, which has been designed by AWS to help you implement your solutions conforming to a large set of best practices across a wide range of topic areas.

The Well Architected Framework is exactly that, a framework that you can use to your benefit when you're looking to design architectural AWS solutions and deploy applications. It offers a set of guidelines and questions that will allow you to consistently follow best practices from a design, reliability, security, cost effectiveness, and efficiency perspective. These have been developed and refined over a number of years by highly experienced AWS solution architects. By aligning your implementations to the Well Architected Framework, your solution is well placed to meet your expectations and perform effectively and efficiently whilst remaining stable.

To understand the elements of the Well Architected Framework, you must be aware of the five pillars that the framework is based and built upon. These are operational excellence, security, reliability, performance efficiency, and cost optimization. Let me explain each of these pillars in a bit more detail, so you're aware of what they cover. I will mention their best practices, which ultimately define the design principles of each pillar.

Starting with operational excellence, the prime focus of the Operational Excellence Pillar is running and monitoring systems to help optimize and deliver value to the business, and supporting, improving, and maintaining the processes and procedures behind your AWS infrastructure. The Operational Excellence Pillar is based upon three best practices: prepare, operate, and evolve. These three practices are interwoven in the following six design principles that make up this pillar.

Perform operations as code. This explains how to deploy, respond to events, and perform automated operational procedures using code to help prevent human error.

Annotate documentation. This defines how it's possible to automatically create and annotate documentation when provisioning AWS resources.

Make frequent, small, reversible changes. The focus in this principle is to implement new changes frequently and at small scale, so that if there are issues you can easily roll back the change without affecting a wide customer base.

Refine operations procedures frequently. This focuses on the importance of consistently refining your operational procedures, evolving them as your business evolves.

Anticipate failure. The focus here is to understand and define your potential points of failure and how these can be mitigated.

Learn from all operational failures. This principle explains how knowledge sharing is key and how to learn from issues and failures that have occurred.
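
As a rough illustration of the "perform operations as code" and "make frequent, small, reversible changes" principles, the following is a minimal sketch in plain Python. It is not an AWS API: the configuration keys, the `apply_with_rollback` helper, and the `pool_is_healthy` rule are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict

Config = Dict[str, int]


def apply_with_rollback(
    current: Config,
    change: Config,
    is_healthy: Callable[[Config], bool],
) -> Config:
    """Apply one small change and keep it only if the health check passes."""
    candidate = {**current, **change}  # build a new config; 'current' is untouched
    if is_healthy(candidate):
        return candidate
    # Anticipated failure path: fall back to the last known-good configuration.
    return current


def pool_is_healthy(config: Config) -> bool:
    # Hypothetical rule: the pool needs at least two workers to stay healthy.
    return config["workers"] >= 2


live = {"workers": 4, "timeout_s": 30}
live = apply_with_rollback(live, {"timeout_s": 20}, pool_is_healthy)  # change is kept
live = apply_with_rollback(live, {"workers": 1}, pool_is_healthy)     # change is rolled back
print(live)
```

Because each change is small and the previous configuration is never modified in place, rolling back is simply a matter of continuing to use the old value.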

The Security Pillar. The Security Pillar defines how to manage and secure your infrastructure by protecting your data, focusing on confidentiality, data integrity, access management, and other security controls, whilst ensuring risk assessment and mitigation are built into your solutions. This pillar is based around five best practices, which can be defined as identity and access management, detective controls, infrastructure protection, data protection, and incident response. And these are built into the following six design principles.

Implement a strong identity foundation. This looks at how to implement the best practice of least privilege, which essentially focuses on granting only the level of access an identity requires to perform its role. This also looks at how to prevent and eliminate identities having long-term credentials.

Enable traceability. Having the ability to audit, monitor, and log your environment is key, and this explains how to integrate this into your solutions, providing automated responses to events.

Apply security at all layers. Security is key to your solution, and this principle focuses on how you should apply security at every layer of your deployment.

Automate security best practices. You should aim to automate security responses and mechanisms to ensure your environment remains protected at all times.

Protect data in transit and at rest. The primary focus of this principle is encryption mechanisms and how they can be used to protect your data.

Prepare for security events. Finally, this principle explains how to prepare yourself for an incident and how to respond to it effectively and efficiently, using simulations and tool sets.
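
To make the least-privilege idea from the "strong identity foundation" principle more concrete, here is a small sketch in Python. The policy dictionary follows the standard IAM JSON policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`), but the bucket name is a placeholder, and `find_wildcard_grants` is a hypothetical helper written for this example, not an AWS tool. It assumes `Statement` is a list.

```python
from typing import Any, Dict, List

# A narrowly scoped policy: read objects from one (placeholder) bucket only.
read_only_policy: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

# An overly broad policy of the kind least privilege is meant to avoid.
admin_like_policy: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}


def _as_list(value: Any) -> List[str]:
    # IAM allows a single string or a list of strings for Action and Resource.
    return [value] if isinstance(value, str) else list(value)


def find_wildcard_grants(policy: Dict[str, Any]) -> List[str]:
    """Return a finding for each Allow statement that uses a broad wildcard."""
    findings: List[str] = []
    for index, statement in enumerate(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
    return findings


print(find_wildcard_grants(read_only_policy))
print(find_wildcard_grants(admin_like_policy))
```

A check like this could run as part of an automated review, which also ties in with the "automate security best practices" principle.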

The Reliability Pillar. This pillar looks at how to maintain stability of your environment and recover from outages and failures, in addition to automatically and dynamically meeting resourcing demands put upon your infrastructure. The reliability best practices are foundations, change management, and failure management. And again these best practices form the following five design principles.

Test recovery procedures. This principle looks at the importance of testing your solution's ability to recover from a failure by utilizing cloud infrastructure and optimizing these procedures based on different failure scenarios.

Automatically recover from failure. Here, this principle focuses on monitoring and metrics and using automation to dynamically respond to thresholds to maintain a stable environment.

Scale horizontally to increase aggregate system availability. This explains how to implement horizontal scaling so that, instead of relying on a single large instance that is a single point of failure, you use multiple smaller instances.

Stop guessing capacity. This looks at the use of auto-scaling to prevent the need to predict and guess your capacity and demand requirements, which aids in a better end user experience.

Manage change in automation. This explains how automation should be used where possible to make the changes to your infrastructure.
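
The reasoning behind "scale horizontally to increase aggregate system availability" can be shown with a short calculation. This is a simplified sketch under two stated assumptions that are not from the lecture: instances fail independently of one another, and the system counts as available while at least one instance is up (capacity is ignored). The function name is illustrative.

```python
def aggregate_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of several independent instances is up.

    Assumes independent failures and that one healthy instance is enough
    to keep the system available.
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The system is down only if every instance is down at the same time.
    probability_all_down = (1.0 - instance_availability) ** instances
    return 1.0 - probability_all_down


for count in (1, 2, 3):
    print(count, aggregate_availability(0.99, count))
```

Under these assumptions, each extra instance multiplies the chance of a total outage by the failure probability of a single instance, which is why several smaller instances can be more available in aggregate than one large one.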

The Performance Efficiency Pillar. This pillar is dedicated to ensuring you have correctly specified resources to efficiently meet the demands of your customers by monitoring performance and adapting your infrastructure as requirements change based on load. The best practices involved with the Performance Efficiency Pillar are selection, review, monitoring, and tradeoffs. These best practices are fed into the following five design principles for this pillar.

Democratize advanced technologies. This simply explains that, where possible, you should make the most of AWS managed services to perform a lot of the heavy lifting and management for you, which allows your business to focus on your application rather than having to learn complex and difficult technologies.

Go global in minutes. This principle looks at the very best way to make use of multiple regions to reach a global audience while maintaining low latency access to your application.

Use serverless architectures. This looks at how serverless technology can remove an administrative burden and help to reduce your cost across your solutions.

Experiment more often. With the flexibility of the cloud and its resources, this explains how you have the potential to test and experiment with ease compared to an on-premises environment.

Mechanical sympathy. This principle talks about how to define and select the most appropriate service or features for the task that you are trying to achieve within the cloud.

The Cost Optimization Pillar. Quite simply, this pillar is used to help you reduce your cost by understanding where it's possible to optimize your spend through a variety of means. There are four best practices defined, cost-effective resources, matching supply and demand, expenditure awareness, and optimizing over time. The design principles for this pillar are:

Adopt a consumption model. This looks at the different pricing models available, for example, on demand, reserved, and spot compute resources, and how to select the most appropriate for your solution.

Measure overall efficiency. This focuses on how much it costs to provide output from your solution and how to optimize this by increasing the output and reducing the cost to you.

Stop spending money on data center operations. This principle defines how cloud computing optimizes your costs by reducing the traditional data center capital expenditure costs.

Analyze and attribute expenditure. It's important to identify where your costs are coming from, to measure your return on investment, which allows for additional optimization.

Use managed services to reduce cost of ownership. This principle explains how it can be more cost effective to utilize managed AWS services, as they remove a lot of the administrative functions that need to be undertaken by the customer.
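
The "adopt a consumption model" and "measure overall efficiency" principles both come down to simple arithmetic, sketched below in Python. All prices here are made-up numbers for illustration, not real AWS rates, and the reserved option is simplified to a flat monthly commitment. The function names are hypothetical.

```python
ON_DEMAND_HOURLY = 0.10   # hypothetical on-demand price per instance-hour
RESERVED_MONTHLY = 45.00  # hypothetical flat monthly cost of a reservation


def on_demand_cost(hours_used: float) -> float:
    """Pay only for the hours actually consumed."""
    return ON_DEMAND_HOURLY * hours_used


def break_even_hours() -> float:
    """Monthly usage above which the flat reservation becomes cheaper."""
    return RESERVED_MONTHLY / ON_DEMAND_HOURLY


def cheaper_option(hours_used: float) -> str:
    """Pick the lower-cost model for an expected number of hours per month."""
    return "on-demand" if on_demand_cost(hours_used) <= RESERVED_MONTHLY else "reserved"


def cost_per_unit(total_cost: float, units_of_output: float) -> float:
    """Overall efficiency: what it costs to deliver one unit of output."""
    if units_of_output <= 0:
        raise ValueError("units_of_output must be positive")
    return total_cost / units_of_output


print(cheaper_option(200))  # light, bursty usage
print(cheaper_option(700))  # near-constant usage
print(cost_per_unit(RESERVED_MONTHLY, 9000))  # e.g. cost per request served
```

With these illustrative figures the break-even point is about 450 hours a month, so a workload that runs almost continuously would favor the reservation while an occasional one would not. Tracking cost per unit of output over time then shows whether optimizations are actually paying off.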

That has now covered the five pillars of the Well Architected Framework. As you can see, there are many different best practices that cover the complete scope of deployment and operations of how to effectively deploy new applications and solutions within AWS.

That has now brought me to the end of this lecture, but if you'd like to understand each of these pillars in far greater detail, then you can find the whitepapers that cover each of these using the link on-screen.

About the Author
Andrew Larkin
Head of Content
Learning Paths

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.