
Making it Fail


Welcome to Domain Seven - Scalability and Elasticity - in the Solution Architect Professional for AWS learning path. In this group of lectures, we will walk through building a flexible, available, and highly resilient application in the Amazon Web Services environment.


Hi, welcome back. In this lesson we're going to test our architecture by simulating failure events. Each test will target a specific area of our design. Without any intervention on our part, the site should recover and remain available to our users. We're going to run three separate tests on our architecture. Each test will demonstrate our design goals: fault tolerance, no single point of failure, graceful degradation of services, and self-healing. The first test involves shutting down one of the EC2 instances that belongs to our Auto Scaling group. Our second test turns off Auto Scaling and shuts down all EC2 instances. The third and final test will force a failover of our RDS instance. So let's begin.

Our first test is to shut down a random EC2 instance that is governed by our Auto Scaling group. We start by heading to our EC2 Dashboard to view running instances. Next, let's select an instance. From the Actions menu, click on Terminate and confirm that we actually want to perform this action. While the instance is terminating, we can open up a new tab and hit our website. The site loads without any issues. Jumping back to the EC2 Dashboard shows that the instance is finally terminated. Refreshing our site shows that it's still operational. The Instances tab for this Auto Scaling group shows the instance's Health Status as Unhealthy. After some time, the Auto Scaling group will fire up a new instance in the Availability Zone that lost the instance. Refreshing the EC2 Dashboard shows that a new instance has been launched. Our site is still operational. Eventually this instance will pass its health checks, and the Elastic Load Balancer will resume sending traffic to it, demonstrating each one of our design goals.

The second test will demonstrate a complete failover of our primary site to a secondary S3 site. To accomplish this, we'll need to set the desired, minimum, and maximum options of our Auto Scaling group to 0.
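For anyone who would rather script this step than click through the console, here is a minimal sketch of the same change in Python. It assumes `autoscaling` is a boto3 Auto Scaling client; the function name `set_group_size` and the group name `web-asg` are hypothetical and do not come from the lesson.

```python
def set_group_size(autoscaling, group_name, size):
    """Set the minimum, maximum, and desired capacity of a group to one value.

    `autoscaling` is assumed to be a boto3 client created with
    boto3.client("autoscaling"); any object exposing the same method works.
    """
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=size,
        MaxSize=size,
        DesiredCapacity=size,
    )
    return size


# Usage (needs the boto3 package and AWS credentials; "web-asg" is a
# hypothetical group name):
#   import boto3
#   set_group_size(boto3.client("autoscaling"), "web-asg", 0)
```

Calling the same function later with the original size would bring the group back, which is why the size is a parameter and not hard-coded to 0.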
When saved, the group will begin terminating all running EC2 instances. Once they are terminated, we open a new tab and hit our site. We are greeted with our S3 bucket version automatically. Route 53, being unable to reach the primary site, begins serving up the secondary site. Next, we want to fall back to our primary site when it becomes available again. So back in the Auto Scaling group, we revert our desired, minimum, and maximum options to their original setting of 3 and save the changes. The Auto Scaling group will fire up new instances, which we can see from our EC2 Dashboard. The load balancer still shows that 0 of 3 instances are in service. Now if you recall, the ELB will take about five minutes to declare an instance healthy enough to send traffic to it. So fast forwarding, each instance moves from an OutOfService state into an InService state. Back in our other tab, we refresh the page to demonstrate that our primary site is back up and running.

For our third and final test we will reboot the primary RDS instance. In the RDS Dashboard we can see that the primary instance is currently running in us-east-1a. To reboot the instance, we select it and head to the Instance Actions button. The drop-down will display a few options. We want to select the Reboot option. We need to confirm that we actually want to reboot. Before we do that, we need to select the Reboot With Failover box. When checked, RDS fails over to the standby in another Availability Zone as part of the reboot. Unchecked, the instance simply reboots in place and no failover takes place. We confirm the reboot to continue. It will take a short amount of time for the failover to take effect. To save time, we'll fast forward to after the failover is completed. We can see that the primary RDS instance is now running in us-east-1b. A quick peek at the instance log shows it took 34 seconds for the failover to complete. Pretty good. During this time, users of our site will experience a slight disruption, which we will address later.
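The reboot-with-failover step can also be triggered from code. The sketch below assumes `rds` is a boto3 RDS client and that the instance is a Multi-AZ deployment; the helper names `primary_az` and `force_failover` and the identifier `my-db` are hypothetical.

```python
def primary_az(rds, db_instance_id):
    """Return the Availability Zone the DB instance is currently running in."""
    response = rds.describe_db_instances(DBInstanceIdentifier=db_instance_id)
    return response["DBInstances"][0]["AvailabilityZone"]


def force_failover(rds, db_instance_id):
    """Reboot the DB instance with failover and return the AZ it ran in before.

    ForceFailover=True is the API counterpart of ticking the
    Reboot With Failover box; it only applies to Multi-AZ instances.
    """
    az_before = primary_az(rds, db_instance_id)
    rds.reboot_db_instance(
        DBInstanceIdentifier=db_instance_id,
        ForceFailover=True,
    )
    return az_before


# Usage (needs the boto3 package and AWS credentials; "my-db" is a
# hypothetical identifier):
#   import boto3
#   rds = boto3.client("rds")
#   old_az = force_failover(rds, "my-db")
#   # ...once the instance is available again:
#   new_az = primary_az(rds, "my-db")
```

Comparing the Availability Zone before and after is the scripted equivalent of checking the RDS Dashboard for the move from us-east-1a to us-east-1b.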
Our focus with RDS is on the importance of the self-healing aspect.

Before we move on to the next session, it's important to understand that there are a variety of tools available to help us test our architectures. An open source tool called Chaos Monkey was built specifically for AWS by Netflix as part of its Simian Army tools. When launched, Chaos Monkey will randomly shut down EC2 instances that belong to an Auto Scaling group. It can be run on a schedule or left running to strike at random times. The tool is aimed at ensuring an architecture is capable of running under adverse conditions that might disrupt unprepared services and applications.

So in our next session we will add one or more layers to our architecture, using CloudFront with dynamic content, to overcome the small disruption in the user experience that occurred during the RDS interruption.
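Both the first test and the Chaos Monkey behaviour described above come down to terminating one in-service instance of an Auto Scaling group at random. The sketch below illustrates that idea only; it is not Chaos Monkey's code. It assumes `autoscaling` and `ec2` are boto3 clients, and the function name and the group name `web-asg` are hypothetical.

```python
import random


def terminate_random_instance(autoscaling, ec2, group_name, rng=random):
    """Terminate one randomly chosen InService instance of the group.

    Returns the terminated instance ID, or None if the group does not exist
    or has no instance in service.
    """
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        return None

    # Only consider instances that are currently serving traffic.
    in_service = [
        instance["InstanceId"]
        for instance in groups[0]["Instances"]
        if instance["LifecycleState"] == "InService"
    ]
    if not in_service:
        return None

    victim = rng.choice(in_service)
    ec2.terminate_instances(InstanceIds=[victim])
    return victim


# Usage (needs the boto3 package and AWS credentials; "web-asg" is a
# hypothetical group name):
#   import boto3
#   terminate_random_instance(
#       boto3.client("autoscaling"), boto3.client("ec2"), "web-asg"
#   )
```

Run against a test environment, the Auto Scaling group should replace the lost instance on its own, just as it did in the first test.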

About the Author
Andrew Larkin
Head of Content
Learning Paths

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.