Designing an Architecture for Operational Excellence
In this module, we will apply the skills we have learned thus far to select and combine AWS services together to create a highly available web site. First, we will create a strawman design based on the requirements we are given. Then we will implement the solution design in the AWS console setting up the services we have selected. Then we will review our solution architecture looking for possible improvements and optimizations, aiming to create the most operationally excellent architecture possible.
- [Instructor] Hi, and welcome back. So now we have deployed our high availability infrastructure and it's working. However, we need to ask ourselves: is this service elastic and scalable enough? The way it is right now, I expect it won't be able to handle peaks in demand, and it isn't scalable enough to handle burst activity or significant growth. If the client has another popular widget, it's likely this solution won't meet burst demand any better than the solution the customer had hosted in their local data center. See, the problem is that while we've used AWS to deploy this application, we've deployed it in the same way we might have done in a traditional data center. So now we have to do what's required to make this into a genuine cloud application. We need to remember, our goal is to add elasticity and scalability to this infrastructure.
So let's go back to our high availability top 10. I'm sure we could do more with our monitoring service, CloudWatch, and I feel we could do a lot more with our number one service, Auto Scaling, before this solution is going to be durable, scalable and highly available.
Let's start working on scalability by creating an Auto Scaling group for our web server instances. The first step is to create the launch configuration. This basically tells Auto Scaling how we want our instances to be configured when they're launched. We can use the same configuration we used to launch the three instances we have running, but this time we'll use an m3.xlarge instance type, which will give us better performance. We can use the same IAM role and user data that we used to launch our other instances. We'll select the existing security group, and now we're ready to create our launch configuration. So, now let's configure the Auto Scaling group itself. We'll give it a name and select all available subnets that we have in our default VPC.
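The console steps above can also be sketched as request parameters. This is a minimal sketch, not the instructor's method: it only builds dictionaries in the shape that boto3's `create_launch_configuration` and `create_auto_scaling_group` calls accept, and does not contact AWS. Every name and ID (the AMI, security group, instance profile, subnets, group names) and the minimum and maximum sizes are hypothetical placeholders, not values from the walkthrough.

```python
# Sketch only: build request parameters for a launch configuration and an
# Auto Scaling group. All names, IDs and sizes are hypothetical placeholders.

def launch_configuration_params(name, image_id, security_group_id,
                                instance_profile, user_data):
    """Parameters in the shape boto3's create_launch_configuration accepts."""
    return {
        "LaunchConfigurationName": name,
        "ImageId": image_id,
        "InstanceType": "m3.xlarge",  # instance type chosen in the walkthrough
        "IamInstanceProfile": instance_profile,
        "SecurityGroups": [security_group_id],
        "UserData": user_data,  # same bootstrap script as the original instances
    }


def auto_scaling_group_params(name, launch_configuration_name, subnet_ids,
                              min_size=3, max_size=6):
    """Parameters in the shape boto3's create_auto_scaling_group accepts."""
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration_name,
        "MinSize": min_size,  # assumption: keep at least three instances
        "MaxSize": max_size,  # assumption: arbitrary upper bound
        "DesiredCapacity": min_size,
        # The API takes the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


lc = launch_configuration_params(
    name="web-launch-config",
    image_id="ami-00000000",
    security_group_id="sg-00000000",
    instance_profile="web-instance-role",
    user_data="#!/bin/bash\n# bootstrap commands go here\n",
)
asg = auto_scaling_group_params(
    name="web-asg",
    launch_configuration_name=lc["LaunchConfigurationName"],
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
)
print(lc["InstanceType"], asg["VPCZoneIdentifier"])

# With credentials configured, the dictionaries could be sent with:
#   import boto3
#   client = boto3.client("autoscaling")
#   client.create_launch_configuration(**lc)
#   client.create_auto_scaling_group(**asg)
```

Keeping the parameters as plain dictionaries makes them easy to review or store in version control before anything is created in the account.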
We will configure this Auto Scaling group to receive traffic from our load balancer and also to evaluate the health of our instances based on the Elastic Load Balancer health checks. Here is where we can define how our Auto Scaling group will increase and decrease in size. We will create some simple CloudWatch alarms to do this. We'll click next. We don't need any notifications or tags at this stage. We'll come back and do that later. Done. Now that we have our Auto Scaling group properly configured, we can set the desired number of instances that we want to have in the group. We can see that Auto Scaling has already started to launch some new instances.
So Auto Scaling is providing us with additional availability in our design, combined with our Multi-AZ Relational Database Service deployment. But optimization's not just about performance. Remember, our goal is to create the very best outcome for our end customer. So we've got ourselves a highly available environment, but what about reducing costs? Let's quickly drop into the Amazon Simple Monthly Calculator and estimate what our monthly database costs might be. Now we know MySQL is quite a cost-efficient database to run. If we're running at 80% utilization with, let's say, an r3.large and we're using just standard SSD storage, not provisioned IOPS, it's going to cost us about 140 bucks a month. And if we use an r3.xlarge, it'll be 278 a month. That's a single availability zone. We're using a Multi-AZ solution, so let's factor that cost in and see if that makes a difference. So let's change it to Multi-AZ. Whoa, $557 a month for our database here. That's at 80% utilized. And what if we went down to an r3.large? That's $282.44 a month. Okay, so that's reasonable. But what if we go for Amazon Aurora, which is a fantastic database solution? Using the r3.large it's $170 a month. What about if we try an r3.xlarge? $340.38 a month, so compared with Multi-AZ MySQL on the same instance class that's a saving of around $200 a month on operating costs.
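The arithmetic behind that comparison can be checked with a few lines. This sketch uses only the monthly estimates quoted above, which are the calculator's figures at the time of recording and not current pricing. Treating the $557 figure as exactly $557.00 is an assumption, since the transcript gives it rounded.

```python
from decimal import Decimal

# Monthly estimates quoted in the walkthrough (USD, about 80% utilization).
# Assumption: the rounded "$557" figure is taken as exactly 557.00.
MYSQL_MULTI_AZ = {"r3.large": Decimal("282.44"), "r3.xlarge": Decimal("557.00")}
AURORA = {"r3.large": Decimal("170.00"), "r3.xlarge": Decimal("340.38")}


def monthly_saving(instance_class):
    """Dollars saved per month by choosing Aurora over Multi-AZ MySQL."""
    return MYSQL_MULTI_AZ[instance_class] - AURORA[instance_class]


def saving_percent(instance_class):
    """The same saving as a percentage of the MySQL cost, to one decimal place."""
    ratio = monthly_saving(instance_class) / MYSQL_MULTI_AZ[instance_class]
    return (ratio * 100).quantize(Decimal("0.1"))


for instance_class in ("r3.large", "r3.xlarge"):
    print(instance_class, monthly_saving(instance_class),
          saving_percent(instance_class))
# r3.large 112.44 39.8
# r3.xlarge 216.62 38.9
```

On these figures the r3.xlarge saving is $216.62 a month, which is the "around $200" mentioned above.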
We should go with Amazon Aurora. So why would we look at Amazon Aurora over MySQL? Well, Amazon Aurora employs a highly durable, SSD-backed virtualized storage layer that is purpose-built for database workloads. We're able to create read replicas very easily, up to 15 of them, and Amazon Aurora can promote a read replica to primary automatically. Amazon Aurora storage is fault tolerant, transparently handling the loss of up to two copies of data without affecting database write availability, and up to three copies without affecting read availability. So that's a very highly available environment, perfect for our type of use case. And Amazon Aurora storage is essentially self-healing: data blocks and disks are continuously scanned for errors and replaced automatically. A highly available solution at the best price, and a perfect fit for our Acme widgets support page. Let's just compare that again. 557 to 340, yeah, that's a big difference.
So let's migrate our database to Aurora. The process is very simple: we just have to take a snapshot of the MySQL instance and set up a new DB instance from the snapshot. One advantage of using Amazon Aurora is that we don't need to provision storage, as AWS will do that automatically for us. Now that the snapshot is ready, we just have to select it and then select Migrate Snapshot. We select the instance class and check the other settings. Okay, the database instance has deployed. Now we need to update the application files with the new database endpoint. All the other information in the application file can remain the same. So let's sync our local file with the S3 bucket we are using. Now that that's synced, we can go ahead and remove all the web instances that we originally created, and then wait for the Auto Scaling group to launch new ones that will download the new application configuration as part of the bootstrap routine. Now, all our instances are registered in the Elastic Load Balancer.
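The fault-tolerance figures quoted for Aurora storage follow from a quorum rule, which can be sketched in a few lines. The transcript does not state the numbers behind it. This sketch assumes the model AWS describes for Aurora: six copies of the data, with four needed to write and three needed to read.

```python
# Assumption (from AWS's published description of Aurora storage, not from
# the transcript): 6 copies of the data, writes need 4, reads need 3.
TOTAL_COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3


def availability(copies_lost):
    """Report whether writes and reads are still possible after a loss."""
    if not 0 <= copies_lost <= TOTAL_COPIES:
        raise ValueError("copies_lost must be between 0 and 6")
    remaining = TOTAL_COPIES - copies_lost
    return {
        "write": remaining >= WRITE_QUORUM,
        "read": remaining >= READ_QUORUM,
    }


for lost in range(5):
    print(lost, availability(lost))
```

Losing two copies leaves four, so writes continue. Losing three leaves three, so reads continue but writes pause until the storage layer repairs itself.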
We'll now go to our widget support site to check and see how it looks. Everything is working as it was before, so our migration has been a success. Now, these are the metrics from the monitoring scripts that we deployed on the web instances as part of the bootstrapping routine. Elastic Load Balancers provide metrics at one-minute intervals by default, which is great. CloudWatch for EC2 provides five basic metrics at five-minute intervals. Now, by using these scripts that we install with the agent at bootstrap time on our instances, we can monitor things like memory and disk usage, which are not available by default in the CloudWatch basic metrics for EC2. So to recap, basic CloudWatch monitoring records its EC2 metrics at five-minute intervals, while custom monitoring provides metrics at one-minute intervals, and with the scripts we can monitor disk and memory usage at that interval. If you want to set up these monitoring scripts, they are available to download. They are provided by AWS and you can get them from this page; just follow the instructions to get them working in your own environment.
Anyway, back to our deployment. We deploy the agent that's required to capture the instance logs and send them to CloudWatch in our bootstrap routine, which is now part of our Auto Scaling launch configuration. Here we can see the logs for each one of our instances. This can really help us diagnose and troubleshoot applications running in our Auto Scaling groups. Once an instance is terminated, it's no longer possible to access the EBS volumes holding the application log data. So storing the log information somewhere else and viewing it in CloudWatch, as we are, can be very helpful in troubleshooting and diagnosis. You can follow this guide to deploy the agent that sends the logs from the Linux instances to CloudWatch.
Let's review our design diagram, which illustrates everything that we've built with high availability and fault tolerance in mind.
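To make the custom memory metric mentioned above concrete, here is a minimal sketch of what such a script has to do: read memory figures, compute a utilization percentage, and shape it as a CloudWatch metric datum. It is not the AWS-provided monitoring script. The sample figures, the namespace and the instance ID are hypothetical, and on a real Linux instance the text would come from `/proc/meminfo`.

```python
# Hypothetical sample in the format of /proc/meminfo on Linux.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
Buffers:          200000 kB
Cached:           800000 kB
"""


def parse_meminfo(text):
    """Turn 'Key:   value kB' lines into a {key: value_in_kb} dictionary."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values


def memory_utilization_percent(meminfo):
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    used = meminfo["MemTotal"] - meminfo["MemAvailable"]
    return round(100.0 * used / meminfo["MemTotal"], 1)


def metric_datum(instance_id, percent):
    """One entry in the shape of the MetricData list for put_metric_data."""
    return {
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": percent,
        "Unit": "Percent",
    }


percent = memory_utilization_percent(parse_meminfo(SAMPLE_MEMINFO))
datum = metric_datum("i-00000000", percent)  # hypothetical instance ID
print(datum)

# With credentials configured, the datum could be published with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="Custom/WebTier",  # hypothetical namespace
#       MetricData=[datum],
#   )
```

Run from cron once a minute, a script like this would give the one-minute memory data points that basic monitoring does not provide.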
Our Aurora database is sitting on our backend on an RDS instance. Aurora gives us additional availability and automatic failover, and Auto Scaling is maintaining three or more instances in three separate availability zones in our web tier. Traffic requests to our instances are managed by our Elastic Load Balancer. Our Elastic Load Balancer is checking the health of our instances and will only send traffic to those instances that return a healthy response. Keep in mind that Elastic Load Balancer just checks the health and then sends traffic to healthy instances. Auto Scaling does the hard work of adding or removing instances. Elastic Load Balancer also provides another layer of security in that it terminates inbound connections from the public domain. So, if we're using HTTPS, or SSL, then Elastic Load Balancer can terminate all of those SSL connections. Elastic Load Balancer is a managed service, so we don't need to configure or size it. AWS provisions Elastic Load Balancers to meet the incoming load. As a content delivery network, CloudFront is providing an additional layer of durability as well as ensuring the best performance for our customers and users.
So, we've used nine of our top 10 high-availability services. We have three EC2 instances running in three availability zones. They are behind an Elastic Load Balancer. We have security groups set up in our VPC. We have Auto Scaling groups adding and removing instances based on the metrics set in our Auto Scaling policy. Our Auto Scaling launch configuration bootstraps our instances, and we have Route 53 directing traffic to our Elastic Load Balancer. Route 53 enables us to add a failover static site as an additional layer of protection should we want it. We have CloudWatch capturing detailed metrics from our instances, which are sent by an agent that we install as part of the bootstrapping routine.
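The division of labor between the load balancer and Auto Scaling described above can be illustrated with a small simulation. This is a deliberately simplified sketch, not how either service is implemented: routing is plain round-robin, health is a boolean flag, and the instance names are hypothetical.

```python
from itertools import cycle


def healthy_targets(instances):
    """Names of the instances whose last health check passed."""
    return [name for name, healthy in instances.items() if healthy]


def route(instances, request_count):
    """Load balancer side: spread requests over healthy instances only."""
    targets = healthy_targets(instances)
    if not targets:
        raise RuntimeError("no healthy instances to receive traffic")
    rotation = cycle(targets)
    return [next(rotation) for _ in range(request_count)]


def replace_unhealthy(instances, desired):
    """Auto Scaling side: drop unhealthy instances and launch replacements
    until the group is back at its desired size."""
    group = {name: True for name in healthy_targets(instances)}
    launched = 0
    while len(group) < desired:
        launched += 1
        group[f"replacement-{launched}"] = True
    return group


fleet = {"web-a": True, "web-b": False, "web-c": True}
print(route(fleet, 4))              # traffic skips web-b
print(replace_unhealthy(fleet, 3))  # web-b is replaced, not repaired
```

The load balancer never changes the size of the fleet. It only decides where traffic goes, and restoring capacity is left to Auto Scaling.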
So, CloudWatch provides another dimension to our availability and security, as it provides us with detailed monitoring which allows us to act and react to any changes in our environment. So, if we look at our list, the only missing component we have is SQS. I've discounted Elastic IP addresses because our Elastic Load Balancer is acting as our termination point. So if we added Simple Queue Service, we could look for another layer of availability. SQS could be used to queue failed requests from the web server to the database, for instance. If we do experience extreme load and neither of those two layers can support the type of activity that's requested, we can then queue requests and have the responses sent back to users in the most appropriate and speedy way we can possibly manage. We certainly would set up notifications to allow us to keep monitoring our Auto Scaling group activities, and it really is just a question of adding a topic and then choosing the information we want sent there. So, I think we have a fantastic design here.
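The idea of queueing failed database requests can be sketched without any AWS calls. In the sketch below an in-memory queue stands in for SQS and a fake database stands in for Aurora. Both classes and all names are hypothetical, and a real SQS queue behaves differently in ways that matter, such as visibility timeouts and at-least-once delivery.

```python
from collections import deque


class RequestBuffer:
    """In-memory stand-in for an SQS queue (illustration only)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None

    def put_back(self, message):
        self._messages.appendleft(message)

    def __len__(self):
        return len(self._messages)


class FakeDatabase:
    """Stand-in database that rejects writes while marked as overloaded."""

    def __init__(self):
        self.overloaded = False
        self.rows = []

    def write(self, request):
        if self.overloaded:
            raise ConnectionError("database overloaded")
        self.rows.append(request)


def write_or_queue(request, database, buffer):
    """Try the write; if the database cannot take it, queue it for later."""
    try:
        database.write(request)
        return "written"
    except ConnectionError:
        buffer.send(request)
        return "queued"


def drain(buffer, database):
    """Replay queued requests in order, stopping at the first failure."""
    written = 0
    while len(buffer):
        request = buffer.receive()
        try:
            database.write(request)
        except ConnectionError:
            buffer.put_back(request)
            break
        written += 1
    return written


db, queue = FakeDatabase(), RequestBuffer()
db.overloaded = True
print(write_or_queue("ticket-1", db, queue))  # queued
db.overloaded = False
print(drain(queue, db), db.rows)              # 1 ['ticket-1']
```

The point of the pattern is that a burst the database cannot absorb turns into a delay for the user, not a lost request.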
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.