Optimizing for High Availability


In this course, we apply the design principles, components, and services we learned in the previous courses in this learning path to design and build a highly available, scalable application. We then apply our optimization principles to identify ways to increase durability and cost efficiency to ensure the best possible solution for the end customer.

Course Objective

  • Identify the appropriate techniques and methods using AWS services to implement a highly available, cost-efficient, fault-tolerant cloud solution.

Intended Audience

This course is for anyone preparing for the Solutions Architect–Associate for AWS certification exam. We assume you have some existing knowledge and familiarity with AWS, and are specifically looking to get ready to take the certification exam.


Prerequisites

Basic knowledge of core AWS functionality. If you haven't already completed it, we recommend our Fundamentals of AWS Learning Path. We also assume you have completed all the other courses, labs, and quizzes that precede this course in the Solutions Architect–Associate on AWS learning path.

This Course Includes

  • 4 Video Lectures
  • Real-world application of concepts covered in the exam

What You'll Learn

Lecture Group: What you'll learn
  • Solution Design: How to apply what you've learned about designing solutions to a real-world scenario
  • Solution Architecture: Architecting a solution in the real world
  • Implementation: Implementing a solution you've designed and architected for the real world
  • Optimizing for High Availability: Optimizing your real-world solution for high availability

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


So now we have deployed our high-availability infrastructure and it's working. However, we need to ask ourselves: is this service elastic and scalable enough? The way it is right now, I expect it won't be able to handle peaks in demand, and it isn't scalable enough to handle burst activity or significant growth. If the client has another popular widget, this solution may not meet burst demand any better than the one the customer had hosted in their local data center. See, the problem is that while we've used AWS to deploy this application, we've deployed it in the same way we might have done in a traditional data center. So now we have to do what's required to make this into a genuine cloud application. We need to remember our goal is to add elasticity and scalability to this infrastructure.
So let's go back to our high availability Top 10. I'm sure we could do more with our monitoring service, CloudWatch, and I feel we could do a lot more with our number one service, Auto Scaling, before this solution is going to be durable, scalable, and highly available.
Let's start working on scalability by creating an Auto Scaling group for our web server instances. The first step is to create the launch configuration. This tells Auto Scaling how we want our instances to be configured when they're launched. We can use the same configuration we used to launch the three instances we have running, but this time we'll use an m3.xlarge instance type, which will give us better performance. We can use the same IAM role and user data that we used to launch our other instances. We'll select the existing security group, and now we're ready to create our launch configuration.
So now let's configure the Auto Scaling group itself. We'll give it a name and select all the available subnets in our default VPC.
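As a rough sketch of the console steps above, the same settings can be written down as the keyword arguments that boto3's `create_launch_configuration` and `create_auto_scaling_group` calls accept. The sketch only builds the parameter dictionaries and does not call AWS. Every identifier below (names, AMI ID, security group, instance profile, subnets, user data) is a hypothetical placeholder, and the minimum and maximum sizes are assumptions rather than values taken from the demo.

```python
def launch_configuration_params(name, image_id, security_group_id,
                                instance_profile, user_data):
    """Keyword arguments shaped for boto3's create_launch_configuration."""
    return {
        "LaunchConfigurationName": name,
        "ImageId": image_id,
        "InstanceType": "m3.xlarge",  # instance type chosen in the walkthrough
        "IamInstanceProfile": instance_profile,
        "SecurityGroups": [security_group_id],
        "UserData": user_data,
    }


def auto_scaling_group_params(name, launch_configuration_name, subnet_ids,
                              min_size=3, max_size=6):
    """Keyword arguments shaped for boto3's create_auto_scaling_group.

    min_size and max_size are assumed values for illustration only.
    """
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration_name,
        # The API expects the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MinSize": min_size,
        "MaxSize": max_size,
    }


# Hypothetical placeholder values, not taken from the course environment.
launch_config = launch_configuration_params(
    name="widget-web-lc",
    image_id="ami-00000000",
    security_group_id="sg-00000000",
    instance_profile="widget-web-role",
    user_data="#!/bin/bash\n# bootstrap script goes here\n",
)
group = auto_scaling_group_params(
    name="widget-web-asg",
    launch_configuration_name=launch_config["LaunchConfigurationName"],
    subnet_ids=["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
)
```

With real identifiers in place, each dictionary could be unpacked into the matching boto3 Auto Scaling client call, for example `client.create_launch_configuration(**launch_config)`.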
We will configure this Auto Scaling group to receive traffic from our load balancer and to evaluate the health of our instances based on the Elastic Load Balancer health checks. Here is where we can define how our Auto Scaling group will increase and decrease in size. We will create some simple CloudWatch alarms to do this. We'll click Next. We don't need any notifications or tags at this stage. We'll come back and do that later. Done. Now that we have our Auto Scaling group properly configured, we can set the desired number of instances that we want to have in the group. Here we can see that Auto Scaling has already started to launch some new instances. So Auto Scaling is providing us with additional availability in our design, combined with our Multi-AZ Relational Database Service deployment.
But optimization is not just about performance. Remember, our goal is to create the very best outcome for our end customer. So we've got ourselves a highly available environment, but what about reducing costs? Let's quickly drop into the Amazon Simple Monthly Calculator and estimate what our monthly database costs might be. Now we know MySQL is quite a cost-efficient database to run. If we're running at 80% utilization with an r3, let's say r3.large, and we're using just standard SSD storage, not provisioned IOPS, it's going to cost us 140 bucks a month. And if we use an r3.xlarge, it'll be 278 a month. That's a single availability zone. We're using a multi availability zone solution, so let's factor that cost in and see if that makes a difference. So let's change it to Multi-AZ. Whoa, $557 a month for our database tier. That's at 80% utilized. But what if we went down to an r3.large? That's $282.44 a month. Okay, so that's reasonable.
But what if we go for Amazon Aurora, which is a fantastic database solution, using the r3.large? It's $170 a month. What about if we try an r3.xlarge?
$340.38 a month, so that's a saving of around $200 a month on operating costs. We should go with Amazon Aurora.
So why would we look at Amazon Aurora over MySQL? Well, Amazon Aurora employs a highly durable, SSD-backed virtualized storage layer that is purpose-built for database workloads. We're able to create read replicas very easily, up to 15 of them, and Amazon Aurora can promote a read replica to primary automatically. Amazon Aurora storage is fault-tolerant, transparently handling the loss of up to two copies of data without affecting the database's write availability, and up to three copies without affecting read availability. So that's a very highly available environment, perfect for our type of use case. And Amazon Aurora storage is essentially self-healing: data blocks and disks are continuously scanned for errors and replaced automatically. A highly available solution at the best price, and a perfect fit for our Acme Widget Support page. Let's just compare that again. 557 to 340, yeah. That's a big difference.
So let's migrate our database to Aurora. The process is very simple. We just have to take a snapshot of our MySQL instance and set up a new DB instance from the snapshot. One advantage of using Amazon Aurora is that we don't need to provision storage, as AWS will do that automatically for us. Now that the snapshot is ready, we just have to select it and then select Migrate Snapshot. We select the instance class and check the other settings. Okay, the database instance has deployed.
Now we need to update the application files with this new database endpoint. All the other information in the application file can remain the same, so let's sync our local file with the S3 bucket we are using. Now that that's synced, we can go ahead and remove all the web instances that we originally created and then wait for the Auto Scaling group to launch new ones, which will download the new application configuration as part of the bootstrap routine.
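To make the pricing comparison from the calculator step easier to check, here is a small worked sketch. The dollar figures are the ones quoted in the narration (some of them rounded as spoken) and should not be read as current AWS pricing. The table layout and the `monthly_saving` helper are illustrative assumptions.

```python
# Monthly estimates quoted in the walkthrough, at 80% utilization (USD).
MONTHLY_COST_USD = {
    ("MySQL Single-AZ", "r3.large"): 140.00,
    ("MySQL Single-AZ", "r3.xlarge"): 278.00,
    ("MySQL Multi-AZ", "r3.large"): 282.44,
    ("MySQL Multi-AZ", "r3.xlarge"): 557.00,
    ("Aurora", "r3.large"): 170.00,
    ("Aurora", "r3.xlarge"): 340.38,
}


def monthly_saving(instance_class, current="MySQL Multi-AZ", candidate="Aurora"):
    """Difference in monthly cost when moving from `current` to `candidate`."""
    saving = (MONTHLY_COST_USD[(current, instance_class)]
              - MONTHLY_COST_USD[(candidate, instance_class)])
    return round(saving, 2)


for instance_class in ("r3.large", "r3.xlarge"):
    saving = monthly_saving(instance_class)
    print(f"{instance_class}: save ${saving:.2f} per month, "
          f"${saving * 12:.2f} per year")
```

On these figures the r3.xlarge comparison works out to a little over $216 a month, which is the "around $200" saving mentioned above.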
Now all our instances are registered in the Elastic Load Balancer. We'll now go to our widget support site to check and see how it looks. Everything is working as it was before, so our migration has been a success.
Now these are the metrics from the monitoring scripts that we deployed on the web instances as part of the bootstrapping routine. Elastic Load Balancers provide metrics in one-minute intervals by default, which is great. Basic CloudWatch monitoring for EC2 provides five basic metrics, recorded at five-minute intervals. By using the scripts that we installed at bootstrap time, we can monitor things like memory and disk usage, which are not available by default in the CloudWatch basic metrics for EC2. Custom monitoring provides metrics in one-minute intervals, and with the scripts we can monitor disk and memory usage at that interval. If you want to set up these monitoring scripts, they are available to download. They are provided by AWS, and you can get them from this page. Just follow the instructions to get them working in your own environment.
Anyway, back to our deployment. In our bootstrap routine, which is now part of our Auto Scaling launch configuration, we deployed the agent that's required to capture the instance logs and send them to CloudWatch. Here we can see the logs for each one of our instances. This can really help us diagnose and troubleshoot applications running in our Auto Scaling groups. Once an instance is terminated, it's no longer possible to access the EBS volumes holding the application log data. So storing the log information somewhere else and viewing it in CloudWatch, as we are, can be very helpful in troubleshooting and diagnosis. You can follow this guide to deploy the agent that sends the logs from your instances to CloudWatch.
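To illustrate what a custom memory or disk metric involves, here is a minimal sketch that is not the AWS-provided monitoring script. It assumes a Linux instance, computes memory utilization as (MemTotal - MemAvailable) / MemTotal from `/proc/meminfo`-style text, and shapes each reading like an entry in the `MetricData` list of CloudWatch's `put_metric_data` call. The instance ID and sample values are hypothetical, and nothing is sent to AWS.

```python
import shutil


def memory_utilization_percent(meminfo_text):
    """Percent of memory in use, parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def disk_utilization_percent(path="/"):
    """Percent of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def metric_datum(name, value, instance_id):
    """One datum in the shape put_metric_data expects in MetricData."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": value,
        "Unit": "Percent",
    }


# Hypothetical sample reading and instance ID, for illustration only.
sample_meminfo = "MemTotal: 8000000 kB\nMemAvailable: 2000000 kB\n"
memory_datum = metric_datum(
    "MemoryUtilization",
    memory_utilization_percent(sample_meminfo),
    "i-00000000",
)
```

On a real instance the text would come from reading `/proc/meminfo`, and the resulting data could be published on a one-minute schedule to match the custom monitoring interval described above.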
Let's review our design diagram, which illustrates everything that we've built with high availability and fault tolerance in mind. Our Aurora database is sitting on our backend on an RDS instance. Aurora gives us additional availability and automatic failover, and Auto Scaling is maintaining three or more instances across three separate availability zones in our web tier.
Traffic requests to our instances are managed by our Elastic Load Balancer. Our Elastic Load Balancer is checking the health of our instances and will only send traffic to those instances that return a healthy response. Keep in mind that Elastic Load Balancer just checks the health and then sends traffic to healthy instances. Auto Scaling does the hard work of adding or removing instances. Elastic Load Balancer also provides another layer of security in that it terminates inbound connections from the public internet. So if we're using HTTPS/SSL, then Elastic Load Balancer can terminate or forward those SSL connections. Elastic Load Balancer is a managed service, so we don't need to provision or size it. AWS scales Elastic Load Balancers to meet the incoming load.
As a content delivery network, CloudFront is providing an additional layer of durability, as well as ensuring the best performance for our customers and users.
So we've used nine of our top 10 high availability services. We have three EC2 instances running in three availability zones. They are behind an Elastic Load Balancer. We have security groups set up in our VPC, and we have Auto Scaling groups adding and removing instances based on the metrics set in our Auto Scaling policy. Our Auto Scaling launch configuration bootstraps our instances, and we have Route 53 directing traffic to our Elastic Load Balancer. Route 53 also enables us to add a failover static site as an additional layer of protection, should we want it.
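The division of labor described above, where the load balancer only routes to healthy instances and Auto Scaling changes the instance count, can be modeled in a few lines. This is a simplified simulation, not AWS code. The CPU thresholds, the one-instance step, and the maximum of six are assumed values, and the minimum of three follows the "three or more instances" mentioned in the review.

```python
def healthy_targets(instances):
    """The load balancer's job: route only to instances passing health checks."""
    return [i["id"] for i in instances if i["healthy"]]


def desired_capacity(current, average_cpu, scale_out_above=70.0,
                     scale_in_below=30.0, minimum=3, maximum=6):
    """Auto Scaling's job: adjust the instance count from an alarm metric.

    Models a simple policy that adds or removes one instance at a time
    and never leaves the [minimum, maximum] range.
    """
    if average_cpu > scale_out_above:
        target = current + 1
    elif average_cpu < scale_in_below:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


# Hypothetical fleet state for illustration.
fleet = [
    {"id": "web-a", "healthy": True},
    {"id": "web-b", "healthy": False},
    {"id": "web-c", "healthy": True},
]
routable = healthy_targets(fleet)
```

Keeping the two functions separate mirrors the design: a failed health check removes an instance from rotation immediately, while capacity changes are driven by the alarm thresholds.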
We have CloudWatch capturing detailed metrics from our instances, which are sent by an agent that we installed as part of our bootstrapping routine. So CloudWatch adds another dimension to our availability and security, as it provides us with detailed monitoring, which allows us to act and react to any changes in our environment.
So if we look at our list, the only missing component we have is SQS. I've discounted Elastic IP addresses because our Elastic Load Balancer is acting as our termination point. So by adding Simple Queue Service, we could look for another layer of availability. SQS could be used to queue failed requests from the web server to the DB, for instance, if we do experience extreme load and neither of those two layers can support the level of activity that's requested. We can then queue requests and have the responses sent back to users in the most appropriate and speedy way we can manage.
We certainly would set up notifications to allow us to keep monitoring our Auto Scaling group activities, and it really is a question of adding a notification topic and then sending the information we want to it. So I think we have a fantastic design here.
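The queueing idea can be sketched without any AWS calls. In the example below an in-memory `deque` stands in for an SQS queue, and `RequestBuffer` and its methods are hypothetical names invented for illustration. Requests that fail against an overloaded database are held and replayed in order once the database accepts writes again.

```python
from collections import deque


class RequestBuffer:
    """Holds requests the database could not accept and replays them later.

    The in-memory deque is a stand-in for a durable queue such as SQS.
    """

    def __init__(self, write_to_db):
        self._write = write_to_db
        self._pending = deque()

    def submit(self, request):
        """Try the database first; queue the request if the write fails."""
        try:
            self._write(request)
            return True
        except ConnectionError:
            self._pending.append(request)
            return False

    def drain(self):
        """Replay queued requests in order, stopping at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._write(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    def pending_count(self):
        return len(self._pending)


# Simulated database that rejects writes while overloaded.
stored = []
database = {"accepting_writes": False}


def write_to_db(request):
    if not database["accepting_writes"]:
        raise ConnectionError("database overloaded")
    stored.append(request)


buffer = RequestBuffer(write_to_db)
buffer.submit("ticket-1")            # queued: database is overloaded
database["accepting_writes"] = True
buffer.submit("ticket-2")            # written directly
buffer.drain()                       # replays ticket-1
```

A real queue would also persist messages across instance failures, which this in-memory stand-in does not.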

About the Author
Andrew Larkin
Head of Content
Learning Paths

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.