
Exam Prep - High Availability

Duration: 1h 34m


These study aids will help refresh your knowledge of the core concepts covered in the Solutions Architect Associate learning path.
Watch the 30-minute primer video before you sit your exam.
Review the exam prep memory cards.

About the Author

Learning paths: 38

Andrew is an AWS certified professional who is passionate about helping others learn how to use and benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession", as everything AWS starts with the customer. His passions outside work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start-ups.

- [Instructor] Let's review and recall what we've been over. In domain one, we learned how elasticity and scalability help us design cloud services, and how AWS provides the ability to scale up and down to meet demand, rather than having to provision systems on estimated usage. That ability increases our agility and reduces our cost, as we only pay for what we use. We saw how the four pillars of the AWS Well-Architected Framework can be a guide for designing with best practices. In security, we design to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies. In reliability, we aim to deliver systems that can recover from infrastructure or service failures and that can dynamically acquire computing resources to meet demand. In performance efficiency, AWS enables us to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and evolves. So we need to always be looking for better ways to use services together, and for ways to break monolithic stacks down into smaller, less dependent services. And then cost optimization. Our goal is to create the best possible outcome for our end customer, so we need to avoid or eliminate unneeded cost and suboptimal resources. Now that may mean using smaller, more loosely coupled services rather than going straight for the biggest and best available. We need to always be looking for ways to reduce single points of failure, and to reduce costs. AWS has a global footprint, but we may not need to use the biggest instances in multiple regions. It may be that by using multiple Availability Zones within one region, and by using a blend of on-demand and reserved instances, we can create a highly available, cost-efficient solution. Let's go through the differences between rebooting, stopping, starting, and terminating.

So in terms of the host, the instance stays on the same host when we reboot, but the instance may run on a new host computer when we stop and start it. Underline may. When we terminate, there's no impact. In terms of public and private IP addresses, when we reboot, the addresses stay the same. When we stop and start, it depends on the platform. With EC2-Classic, the instance gets a new private and a new public IP address. With EC2-VPC, the instance keeps its private IP address and gets a new public IP address unless it has an Elastic IP address, an EIP, which doesn't change during a stop or start. With Elastic IP addresses, the EIP remains associated with the instance when you reboot it. For instance store volumes, when we reboot, the data is preserved. When we stop or start, the data is erased. And when we terminate, the data is erased. So remember that with instance store volumes, the data is gone when you stop or terminate the instance. The root device volume is preserved during a reboot, and the volume is preserved during a stop or start event, but the volume is deleted by default during termination. And with billing, during a reboot, the instance hour doesn't change. Each time an instance transitions from stopped to running, AWS starts a new instance billing hour. When you terminate an instance, you stop incurring charges for the instance as soon as its state changes to shutting down. Okay, a couple of points to keep in mind for the exam. When an instance is rebooted, the host computer stays the same.

When an instance is stopped and started, the instance may run on a new host. With Elastic IP addresses, the EIP remains associated with the instance during a reboot. When we terminate an instance, the EIP is disassociated from the instance. Keep in mind, too, that we can only stop and start EBS-backed instances.

We used that design approach for cost optimization in our deployment for Acme Widgets, where running instances behind an Elastic Load Balancer in three AZs, connected to an Aurora database that replicates our data across three Availability Zones with automated failover, could scale and meet burst activity requirements while remaining a highly available, cost-efficient solution. So in exam questions, look for clues to help you determine the business requirements and constraints in any of the scenarios you get. Look for the Recovery Time Objective and the Recovery Point Objective. The Recovery Time Objective is the maximum amount of time the customer can be without the system in the event of a disaster. The Recovery Point Objective is the last possible point in time that the business data must be recoverable to. Now remember that the Recovery Point Objective is generally a time value as well.

There are four design patterns we can deploy in AWS to meet RPO and RTO objectives. The first is backup and restore, which is like using AWS as a virtual tape library. It's generally gonna have a relatively high Recovery Time Objective, since we're going to have to bring back archives to restore first, which could take four to eight hours or longer. We're gonna have a generally high Recovery Point Objective as well, simply because our point in time will be our last backup, and if, for example, we're using daily backups only, then it could be 24 hours. Backup and restore is very low cost and easy to implement.
The second option is pilot light, and that's where we have a minimal version of our environment running on AWS, which can be lit up and expanded to production size from the pilot light. Our Recovery Time Objective is likely to be lower than with backup and restore, as we have some services installed already. And our Recovery Point Objective will be since our last data snapshot. The third option is warm standby, where we have a scaled-down version of a fully functional environment always running in AWS. Now that's gonna give us a lower Recovery Time Objective than perhaps pilot light, as some services are always running. And it's likely that our Recovery Point Objective will be lower as well, since it will be since our last data write if we're using asynchronous databases with master-slave multi-AZ database servers.

The cost of running warm standby is generally higher than pilot light or backup and restore. The benefit of warm standby is that we can use the environment for dev and test or for skunkworks projects to offset the costs. And the fourth option is multi-site, where we have a fully operational version of our environment running in AWS or in another region. That's likely to give us our lowest RTO, simply because it could be a matter of seconds if we're using active-active failover through Route 53. Our Recovery Point Objective likewise will be significantly lower than with the other options. If we're using synchronous databases, then yes, it will be a matter of seconds. If we're using asynchronous databases, then we're going to have an RPO of the last data write. So the cost and maintenance overhead of running a multi-site environment needs to be factored in and considered. The benefit is that you have a regular environment for testing DR processes.

Another component is AWS Storage Gateway. AWS Storage Gateway connects your on-premises storage with your Amazon S3 storage. There are three options available: a gateway-cached volume, a gateway-stored volume, and a gateway VTL, which presents itself as a virtual tape library. The benefit of all three is that, to end users, each of the Storage Gateway connections looks like an iSCSI connection.

Okay, so choice of replication is another consideration when we're talking about design requirements. Synchronous replication is where we have an atomic update to both databases. It's bandwidth and latency dependent, so you need very good bandwidth and very fast networking to ensure synchronous replication of databases. It generally comes at a higher cost. Asynchronous replication is a non-atomic update that happens to the secondary as network and bandwidth permit. A benefit of using asynchronous replication is that you can use your secondary database as a read replica.
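The trade-off between the four disaster recovery patterns can be sketched as a simple lookup: pick the cheapest pattern whose typical RTO and RPO fit the stated requirements. This is only a study aid under stated assumptions. The backup-and-restore figures (about 8 hours RTO, 24 hours RPO with daily backups) come from the transcript; the figures for the other three patterns, the `DrPattern` class, and the `cheapest_pattern` function are hypothetical and chosen purely to preserve the relative ordering described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrPattern:
    name: str
    relative_cost: int        # 1 = cheapest, 4 = most expensive (ordering only)
    typical_rto_hours: float  # illustrative assumption, not an AWS figure
    typical_rpo_hours: float  # illustrative assumption, not an AWS figure


# Only the backup-and-restore numbers are taken from the transcript.
# The remaining values are made up to keep the ordering: each step up
# costs more and lowers both RTO and RPO.
PATTERNS = [
    DrPattern("backup and restore", 1, 8.0, 24.0),
    DrPattern("pilot light", 2, 2.0, 4.0),
    DrPattern("warm standby", 3, 0.5, 0.25),
    DrPattern("multi-site", 4, 0.01, 0.01),
]


def cheapest_pattern(rto_hours: float, rpo_hours: float) -> Optional[DrPattern]:
    """Return the lowest-cost pattern that meets both objectives, or None."""
    candidates = [
        p for p in PATTERNS
        if p.typical_rto_hours <= rto_hours and p.typical_rpo_hours <= rpo_hours
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.relative_cost)


if __name__ == "__main__":
    for rto, rpo in [(12, 24), (3, 5), (1, 1)]:
        choice = cheapest_pattern(rto, rpo)
        print(rto, rpo, choice.name if choice else "no pattern fits")
```

In an exam scenario the same reasoning applies by hand: read off the RTO and RPO, then choose the least expensive pattern that still satisfies both.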
A key part of the Solutions Architect Associate brief is to be able to recognize how you might use AWS services together to create highly available, fault-tolerant, scalable, cost-efficient solutions. So we ran through the 10 AWS components that can help us design cost-efficient, highly available, fault-tolerant systems when used together. Those were, briefly, if you remember, regions and AZs, which are designed for fault isolation. Having multiple Availability Zones within one region can often provide a high level of durability and high availability without the need to use more than one region. If we do wanna extend our customer's footprint to another region, it's also very possible to migrate AMIs, data services, et cetera from one region to another. Virtual Private Cloud, which is that secure section of the AWS Cloud. It gives us a CIDR block between /16 and /28. The default VPC comes with subnets for your Availability Zones, an internet gateway, a default route table, a network access control list, and a security group. A subnet is a public subnet if it has an internet gateway and a route in the route table to that internet gateway. Then we looked at the Elastic Load Balancer. It's a managed service, which detects the health of instances and routes traffic to the healthy ones. Now Elastic Load Balancer adds another layer of availability and security. As a managed service, ELB can terminate or pass through SSL connections. And then we had Simple Queue Service, which enables us to increase fault tolerance by decoupling layers, reducing dependence on service state, and helping us manage communications between services. And of course Elastic Compute Cloud, EC2. That on-demand computing.
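To make the VPC sizing mentioned above concrete, here is a minimal sketch using Python's standard `ipaddress` module. It checks that a block falls within the /16 to /28 range and reports how many addresses it contains. The function name `vpc_address_count` and the example ranges are assumptions for illustration; this does not call any AWS API.

```python
import ipaddress


def vpc_address_count(cidr: str) -> int:
    """Return the number of IPv4 addresses in a block sized for a VPC.

    Raises ValueError if the prefix is outside the /16 to /28 range
    quoted in the transcript, or if the string is not a valid network.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 blocks")
    if not 16 <= network.prefixlen <= 28:
        raise ValueError(f"/{network.prefixlen} is outside the /16 to /28 range")
    return network.num_addresses


if __name__ == "__main__":
    # Hypothetical private ranges chosen for the example.
    print(vpc_address_count("10.0.0.0/16"))  # largest allowed block
    print(vpc_address_count("10.0.0.0/28"))  # smallest allowed block
```

A /16 holds 65,536 addresses and a /28 holds 16, so the range spans everything from a very large network down to a handful of hosts.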

Those instance types are available in various flavors. On-demand, where you pay hourly. Reserved instances, where you commit to a one or three year term, for example with a partial upfront payment, to reduce the cost of predictable usage patterns. Then we have scheduled instances, which can be bought for a specific time of the day, week, or month. They're ideal where you have patterns of usage that are quite regular, or reports that need to be done on a certain date every month or every year. Spot pricing is marketplace pricing, based on supply and demand basically, where you're bidding and paying for unused excess AWS capacity. Often it's a blend of those that can give you the best price. Now remember that placement groups must be in the same Availability Zone, and placement groups do not support micro or medium-sized instances. Elastic IP addresses allow us to maintain service levels by swapping resources behind an Elastic IP address. And we can have up to five Elastic IP addresses per region. With our Elastic IP addresses, if you stop an instance, the Elastic IP address remains associated with the instance.

And then Route 53. That powerful DNS service. We can manage our top-level domains. It can provide graceful failover to a static site in the event of an outage, which could be hosted in S3. It can do active-active and active-passive failovers, based on Elastic Load Balancer health checks or EC2 health checks, and it can support weighted or geo-targeted traffic distribution.

Okay, so CloudWatch is the eyes and ears of our environment. We have great monitoring tools: CloudWatch, CloudTrail, and AWS Config. For CloudWatch, you get basic EC2 monitoring enabled by default. Basic monitoring provides seven metrics at five-minute intervals and three metrics at one-minute intervals. Elastic Load Balancing reports at one-minute intervals by default. Detailed monitoring enables one-minute intervals on the same metrics, but it comes with a charge. So you have to pay extra to use detailed monitoring.
CloudWatch also has things like an agent, which we installed on our EC2 instances for the Acme Widgets deployment, which can send log files to CloudWatch and so provide us with more instance debugging and reporting information. Now CloudWatch notifies us of a change in state. And the three reporting states are OK, Alarm, or Insufficient Data. If an instance or ELB has just started, it would most likely return an Insufficient Data state. Alright. Auto Scaling has three core components.

The Launch Configuration, the Auto Scaling group, and the Scaling Plan. So the launch configuration is your template for what you want your machines to do when Auto Scaling starts them. You can basically configure that machine to do exactly what you want with your launch configuration. The Auto Scaling group is literally the group of instances that are run inside that group. And then the Scaling Plan defines how instances are added to or removed from that Auto Scaling group.

So, scaling in. We wanna make our Auto Scaling group smaller to reduce costs. The whole point of scaling down or in is to reduce your costs, so you're only paying for what you use. These are the steps that Auto Scaling goes through to determine which machine to terminate first. First off, it checks whether there are instances in more than one Availability Zone. If there are, Auto Scaling applies its policy to the Availability Zone that has the most instances in it. So if you have two AZs, one's got three instances running and one's got two, Auto Scaling will apply its rule to the AZ with the three instances in it first. Alright, that's the first piece of logic. The next logic point is, select the instance with the oldest launch configuration. If there are multiple instances using the oldest launch configuration, then select the instance closest to the next billing hour. If there are multiple instances close to the next billing hour, then select an instance at random.

Three key steps. First of all, choose the Availability Zone that has the most instances and apply the rule to that. Second, if there are multiple instances, terminate the one with the oldest launch configuration. And if there are multiple instances on that same launch configuration, choose the one closest to the next billing hour. And if you still can't find a difference between them, choose one at random. Now remember that the Availability Zone rule applies even if you have a custom Auto Scaling policy.
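The scale-in steps above can be sketched as a small selection function. This is a simplified model of the logic as the transcript describes it, not the Auto Scaling service's actual implementation. The `Instance` fields, the function name, and the rule for breaking a tie between equally busy Availability Zones (alphabetical order, chosen only to keep the example deterministic) are all assumptions.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    instance_id: str
    availability_zone: str
    launch_config_age_rank: int        # hypothetical: lower number = older launch configuration
    minutes_to_next_billing_hour: int  # hypothetical field for the billing-hour step


def choose_instance_to_terminate(instances: List[Instance], rng=random) -> Instance:
    """Apply the scale-in steps from the transcript and return one instance."""
    if not instances:
        raise ValueError("no instances to choose from")

    # Step 1: focus on the Availability Zone with the most instances.
    az_counts = Counter(i.availability_zone for i in instances)
    largest = max(az_counts.values())
    # Assumption: ties between AZs are broken alphabetically.
    busiest_az = min(az for az, n in az_counts.items() if n == largest)
    candidates = [i for i in instances if i.availability_zone == busiest_az]

    # Step 2: keep only instances on the oldest launch configuration.
    oldest = min(i.launch_config_age_rank for i in candidates)
    candidates = [i for i in candidates if i.launch_config_age_rank == oldest]

    # Step 3: keep only instances closest to the next billing hour.
    closest = min(i.minutes_to_next_billing_hour for i in candidates)
    candidates = [i for i in candidates if i.minutes_to_next_billing_hour == closest]

    # Step 4: if several remain, pick one at random.
    return rng.choice(candidates)


if __name__ == "__main__":
    fleet = [
        Instance("i-a", "az-1", 1, 30),
        Instance("i-b", "az-1", 1, 5),
        Instance("i-c", "az-1", 2, 1),
        Instance("i-d", "az-2", 0, 1),
        Instance("i-e", "az-2", 0, 2),
    ]
    print(choose_instance_to_terminate(fleet).instance_id)
```

In the sample fleet, az-1 holds three of the five instances, so only its instances are considered, even though az-2 contains instances on an older launch configuration.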
AWS has a shared security responsibility model. AWS manages the global infrastructure: the regions, the Availability Zones, and the edge locations. And some of the foundation services, such as compute, storage, database, and networking. Everything else on top of that is managed by us, the customers. So AWS manages security of the cloud, and AWS customers manage security in the cloud. Now we looked at the four pillars of security in the cloud. Data protection, which is protecting data in transit and at rest. Privilege management, which is ensuring our users have least privilege to resources. Infrastructure protection, where keeping the facilities and network secure is the job of AWS. And detective controls, which is the regular monitoring and testing to avoid compromise.

So, some of the tools that AWS makes available to us via IAM. We have multi-factor authentication, which is an additional layer that should be applied to your root account and any privileged users. We can interface with identity providers using IAM Roles. We have our passwords, and Roles provide a very, very efficient way for us to connect to applications and third parties without us having to share our security credentials. Now where we're integrating with other corporate networks, we can use single sign-on or directory services. And the AWS Security Token Service, or STS, and a Role enable us to connect to AWS via an identity broker. Temporary credentials expire after a given period. We can also use identity providers such as Facebook, Amazon, Google, et cetera, to enable end users to sign into an application using STS and IAM Roles. Amazon Cognito provides a service that does a lot of this for you. And Amazon Cognito comes included in the iOS, Fire, Android, Unity, and JavaScript SDKs. Now platform compliance.
AWS covers PCI DSS, SOC 1, 2, and 3, ISO 9001, and HIPAA compliance. Along with compliance, there are a number of frameworks and alignments that make it easier for third parties to check or comply with compliance reporting. And the AWS security center and the Well-Architected Framework can provide some really good guidelines for how third parties can respond to RFPs, run things like penetration testing, or do compliance audits using roles and third-party connectors.

So, securing data in transit. All AWS endpoints support SSL. And one of the key benefits of Elastic Load Balancer is that it can terminate or pass through SSL connections. For securing data at rest, there are two key services: AWS KMS, the Key Management Service, and AWS CloudHSM. Looking at the options we have for using these, there are three. The first is where you control the encryption method and the entire key management infrastructure. So you can take your whole KMI out of AWS and manage that yourself.

The second option is where AWS manages key storage for you. You manage the encryption method, you choose whichever way you want to encrypt your content, and you manage your own keys. They're stored in CloudHSM. And the third option is where AWS manages the encryption, the key management, and the key management infrastructure for you. So they do everything on your behalf, basically.

Okay, when we're looking at threat mitigation, remember it's about protecting in layers. We want to reduce our surface area. And it's our responsibility to put in place additional controls to limit access. What additional filtering or blocking can we add on top of security groups and network ACLs to provide additional levels of threat protection?

Let's start with our core services. Amazon Simple Storage Service, or S3, provides 11 nines durability and four nines availability. You can put pretty much any object you want into Amazon S3. It's object storage. It scales automatically. The maximum file size you can upload to Amazon S3 is five terabytes. Objects are stored in buckets. There are three storage classes. The Standard storage class, which offers the highest availability and lowest latency. The Standard - Infrequent Access class. And the third storage class is what's called Amazon S3 Reduced Redundancy, which provides the same 99.99% availability but less durability. Points to remember: each bucket name has to be unique. Five terabyte maximum file size. You can't change the region or the S3 part of the endpoint name. A bucket can't be renamed. You can delete a bucket and reuse the name after a period of time. By default, you can create up to 100 buckets per account. Bucket ownership is not transferable. A Lifecycle Configuration on MFA-enabled buckets is not supported. With Elastic Block Store, EBS volumes are replicated within an Availability Zone, not throughout a region as S3 is.
EBS snapshots are stored in Amazon S3, so point-in-time snapshots increase durability by protecting against hardware failure or loss of services in one Availability Zone. And EBS is persistent storage rather than ephemeral storage. Amazon Glacier is low-cost object storage, with an annual average durability of 11 nines for an archive. It redundantly stores data in multiple facilities and on multiple devices within each facility. Glacier stores objects in vaults. There's no maximum or minimum limit to the total amount of data that can be stored in Amazon Glacier. And your individual archives can be up to 40 terabytes. A common use case is to define Amazon S3 lifecycle rules to automatically archive sets of Amazon S3 objects to Amazon Glacier to reduce storage costs. DynamoDB is a NoSQL key-value data store. ElastiCache is a managed, in-memory cache, which gives you fast, reliable data access. The underlying engines behind ElastiCache are Memcached and Redis. Redshift is a fully managed, petabyte-scale data warehouse. Elastic MapReduce is a managed Hadoop framework. Amazon Kinesis is a fully managed service for processing real-time data streams.
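The "nines" quoted for S3 and Glacier are easier to remember once converted into concrete numbers. The sketch below does that arithmetic, assuming a 365-day year. The function names and the ten-million-object example are illustrative assumptions, and the results are back-of-the-envelope figures rather than service guarantees.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into minutes of downtime per year."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(object_count: int, durability_percent: float) -> float:
    """Convert an annual durability percentage into an expected yearly loss."""
    return object_count * (1 - durability_percent / 100)


if __name__ == "__main__":
    # Four nines of availability: a little under an hour per year.
    print(round(allowed_downtime_minutes_per_year(99.99), 2))
    # Eleven nines of durability across a hypothetical ten million objects.
    print(expected_objects_lost_per_year(10_000_000, 99.999999999))
```

Four nines of availability works out to roughly 52.56 minutes of downtime per year. Eleven nines of durability on ten million objects gives an expected loss of about 0.0001 objects per year, or roughly one object every 10,000 years.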

Kinesis can output to Amazon S3, Amazon Redshift, Amazon EMR, and also to Lambda.

So let's just quickly remind ourselves of the three deployment services that we have, and what their use cases are. AWS OpsWorks is a configuration management service that enables you to configure and operate applications of all shapes and sizes using Chef. It's perfect for DevOps engineers who are looking at automating as much of their environment as possible, and as it's Chef-based, it's very easy to integrate with other Chef recipes. AWS CloudFormation is a building block service that enables you to provision and manage almost any AWS resource. It uses a JSON-based, domain-specific language. The third option is AWS Elastic Beanstalk, perfect for developers and people who perhaps don't have a lot of experience with infrastructure, or who perhaps don't have the right resource access to build infrastructure. AWS Elastic Beanstalk can provision and maintain versions for them. Beanstalk does also integrate with CloudFormation, so you can use the two together, but it's just a really easy way of deploying an application.

So, the CloudFormation JSON template. A couple of things to keep in mind. It's a really easy way to collect resources together and provision them in an orderly and predictable way. Now CloudFormation by default rolls back the entire stack if there's any issue. It supports Elastic Beanstalk application environments, so you can use those two together.
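As a rough illustration of the JSON template described above, the sketch below builds a minimal template as a Python dictionary and checks it for the one section a template must contain, `Resources`. The logical ID `ExampleBucket`, the output name, and both helper functions are hypothetical. The check is a local sanity test only and is not a substitute for CloudFormation's own validation.

```python
import json


def build_minimal_template(bucket_logical_id: str = "ExampleBucket") -> dict:
    """Build a small template with one S3 bucket resource and one output."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal illustrative template (hypothetical example)",
        # Resources is the only required section of a template.
        "Resources": {
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
        # Outputs (like Parameters) are optional.
        "Outputs": {
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }


def missing_required_sections(template: dict) -> list:
    """Return the names of required sections that are absent or empty."""
    return [name for name in ("Resources",) if not template.get(name)]


if __name__ == "__main__":
    template = build_minimal_template()
    print(json.dumps(template, indent=2))
    print(missing_required_sections(template))
```

A template that declares only parameters or outputs, with no resources, would fail this check, which mirrors the exam point that resources are the part you cannot leave out.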

The other thing that you must define is the resources. So parameters and outputs are optional, but resources you must define in the template.

Here's a quick memory refresh on some of the other services. API Gateway helps developers deliver mobile and web application back ends. Amazon Simple Queue Service is a fast, reliable, and scalable message queue service. You can have an unlimited number of messages. The order of delivery is not guaranteed. You can set a message visibility timeout of up to 12 hours, and you can store messages for between one minute and two weeks. The Simple Notification Service is a fully managed push service. Topics are used for subscribing and publishing. Simple Workflow Service assists developers in keeping state separate from the actual units of work. The workflow components are task, marker, timer, and signal. Simple Email Service allows you to send email through the Amazon email servers. Elastic Transcoder can transcode media files into the various formats required for HLS and HDS delivery to devices. AWS Lambda is our service for running processes without the need for provisioning and managing EC2 instances. It supports Java, Python, and Node.js. AppStream delivers Windows applications from the cloud to end users without any code modifications. WorkSpaces is a desktop computing service that runs Microsoft Windows. It incorporates PC-over-IP, or PCoIP, which is a technology from Teradici. And it delivers that Windows desktop to Mac computers, Chromebooks, iPads, Kindle Fire tablets, and Android tablets. You generally get a Windows 7 desktop experience as of today, and it can integrate with existing Active Directory environments. Another thing to keep in mind is that the user volume, which is mounted as drive D, is backed up every 12 hours. Data Pipeline is a service for reliably processing and moving data between compute and storage services. Amazon EC2 Container Service, ECS, lets you fully utilize EC2 instances.
You can run different layers of the same application or different applications altogether. It comes at no additional cost, and you only pay for the EC2 instances that you're using in the ECS cluster.

Okay, now do manage your time in the exam. If you come across a really hard question, mark it and move on. You're probably better off going through and getting as many of the simple answers as you can, and then coming back and, with the time left, trying to solve the ones that you don't quite get. Read as many of the FAQs as you can. And look, good luck, okay? You can do it! You can nail this exam. So go knock it over. Good luck.