These study aids will help refresh your knowledge of the core concepts covered in the Solutions Architect Associate learning path.
Run the 30min primer video before you go in to sit your exam.
The revision cards are included in the learning path items.
09/01/2020 - Updated Exam Primer lecture
Let's review and recall what we've been over in domain one. We learned how elasticity and scalability help us design cloud services, how AWS provides the ability to scale up and down to meet demand rather than having to provision systems on estimated usage, and how that ability increases our agility and reduces our cost, as we only pay for what we use. We saw how the four pillars of the AWS Well-Architected Framework can be a guide for designing with best practices. In security, we design to protect information systems and assets while delivering business value through risk assessments and mitigation strategies.
In reliability, we aim to deliver systems that can recover from infrastructure or service failures and that can dynamically acquire computing resources to meet demand. In performance efficiency, AWS enables us to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and evolves, so we need to be always looking for better ways to use services together and for ways to break monolithic stacks down into smaller, less dependent services.
And then cost optimization. Our goal is to create the best possible outcome for our end customer. We need to avoid or eliminate unneeded cost or suboptimal resources. Now, that may mean using smaller, more loosely coupled services rather than going straight for the biggest and best available. We need to always be looking for ways to reduce single points of failure and to reduce costs.
AWS has a global footprint but we may not need to use the biggest instances in multiple regions and it may be that by using multiple availability zones within one region and by using a blend of on-demand and reserved instances, we can create a highly available cost-efficient solution.
EC2 billing can be quite complex so let's just step through this. When you terminate an instance, the state changes to shutting down or terminated and you are no longer charged for that instance. When you stop an instance, it enters the stopping state and then the stopped state, and you are not charged hourly usage or data transfer fees for your instance after you stop it, but AWS does charge for the storage of any Amazon EBS volumes. Now, each time you transition an instance from stopped to running, AWS charges a full instance hour, even if these transitions happen multiple times within a single hour. When you reboot an instance, it doesn't start a new instance billing hour.
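The hourly rule just described can be sketched in a few lines of Python. This is only an illustrative model of the rule as stated in this lecture, not a statement of current AWS pricing; the function name `billed_instance_hours` and the idea of describing each run as a `(start_minute, stop_minute)` pair are assumptions made for the example.

```python
import math


def billed_instance_hours(sessions):
    """Count billed instance hours under the hourly model described above.

    `sessions` is a list of (start_minute, stop_minute) pairs, one per
    stopped -> running transition. Each session is billed in whole hours,
    rounded up, with a minimum of one hour. A reboot does not create a new
    session, so reboots never appear in this list.
    """
    total = 0
    for start, stop in sessions:
        if stop < start:
            raise ValueError("a session cannot stop before it starts")
        total += max(1, math.ceil((stop - start) / 60))
    return total


# Three short runs inside the same clock hour: each start bills a full hour.
three_restarts = [(0, 10), (15, 25), (30, 40)]

# One continuous 40-minute run, reboots included, bills a single hour.
one_run = [(0, 40)]
```

Under this model the three short runs cost three instance hours while the single continuous run costs one, which is the point the lecture makes about stopping and starting repeatedly.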
Let's go through the differences between rebooting, stopping, starting, and terminating. So, in terms of the host, the instance stays on the same host when we reboot, but the instance may run on a new host computer when we stop or start (underline "may"). When we terminate, there's no impact. In terms of public and private IP addresses, when we reboot, the addresses stay the same.
With EC2-Classic, when we stop or start, the instance gets a new private and a new public IP address. With EC2-VPC, the instance keeps its private IP address and gets a new public IP address, unless it has an elastic IP address (an EIP), which doesn't change during a stop or start. With elastic IP addresses, the EIP remains associated with the instance when you reboot it. For instance store volumes, when we reboot, the data is preserved; when we stop or start, the data is erased; and when we terminate, the data is erased.
So remember that with instance store volumes, the data is gone when you stop or terminate the instance. The root device volume is preserved during a reboot and during a stop or start event, but by default the volume is deleted during termination. And with billing, during a reboot, the instance hour doesn't change. Each time an instance transitions from stopped to running, AWS starts a new instance billing hour. When you terminate an instance, you stop incurring charges for that instance as soon as its state changes to shutting down.
Okay, a couple of points to keep in mind for the exam. When an instance is rebooted, the host computer stays the same. When an instance is stopped or restarted, the instance may run on a new host. With elastic IP addresses, the EIP remains associated with the instance during a reboot. When we terminate an instance, the EIP is disassociated from the instance. Keep in mind too that we can only stop and start EBS-backed instances.
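The reboot, stop/start, and terminate comparison above can be kept as a small lookup table for revision. This is a sketch that only restates the lecture's summary for an EBS-backed instance in a VPC with default settings; the dictionary layout, the key names, and the helper `storage_preserved` are assumptions made for illustration.

```python
# Effects of each lifecycle action, as summarised in the lecture above.
LIFECYCLE_EFFECTS = {
    "reboot": {
        "host": "same host",
        "elastic_ip": "stays associated",
        "instance_store": "preserved",
        "root_ebs_volume": "preserved",
        "billing": "no new instance hour",
    },
    "stop_start": {
        "host": "may move to a new host",
        "elastic_ip": "stays associated",
        "instance_store": "erased",
        "root_ebs_volume": "preserved",
        "billing": "new instance hour on each start",
    },
    "terminate": {
        "host": "not applicable",
        "elastic_ip": "disassociated",
        "instance_store": "erased",
        "root_ebs_volume": "deleted by default",
        "billing": "charges stop at shutting down",
    },
}


def storage_preserved(action):
    """Return the storage types whose data survives the given action."""
    effects = LIFECYCLE_EFFECTS[action]
    return sorted(
        kind
        for kind in ("instance_store", "root_ebs_volume")
        if effects[kind] == "preserved"
    )
```

For example, `storage_preserved("stop_start")` returns only the root EBS volume, which matches the reminder that instance store data is gone after a stop.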
We used that design and cost optimization in our deployment for Acme widgets. We're running instances behind an elastic load balancer in three AZs, connected to an Aurora database that replicates our data across three availability zones with automated failover, so it can scale and meet burst activity requirements while remaining a highly available, cost-efficient solution.
So in exam questions look for clues to help you determine the business requirements and constraints in any of the scenarios that you get. Look for the recovery time objective and the recovery point objective. The recovery time objective is the maximum amount of time the customer can be without this system in the event of a disaster. The recovery point objective is the last possible point in time that the business data must be recoverable to.
Now, remember that the recovery point objective is generally a time value as well. There are four design patterns we can deploy in AWS to meet RPO and RTO objectives. The first is backup and restore, which is like using AWS as a virtual tape library. It's generally going to have a relatively high recovery time objective since we're going to have to bring back archives to restore first which could take four to eight hours or longer. We're gonna have a generally high recovery point objective as well simply because our point in time will be at a last backup and if for example we're using daily backups only then it could be 24 hours.
Cost-wise, backup and restore is very low and easy to implement. The second option is pilot light, and that's where we have a minimal version of our environment running on AWS which can be lit up and expanded to production size from the pilot light. Our recovery time objective is likely to be lower than backup and restore as we have some services installed already, and our recovery point objective will be since our last data snapshot. And the third option is warm standby, where we have a scaled-down version of a fully functional environment always running in AWS. Now, that's going to give us a lower recovery time objective than perhaps pilot light, as some services are always running, and it's likely that our recovery point objective will be lower as well, since it will be since our last data write if we're using asynchronous databases with a master/slave multi-AZ database service.
The cost of running warm standby is likely to be higher than pilot light or backup and restore. The benefit of warm standby is that we can use the environment for dev/test or for skunkworks to offset the cost. And the fourth option is multi-site, where we have a fully operational version of our environment running in AWS or in another region, and that's likely to give us our lowest RTO simply because it could be a matter of seconds if we're using active-active failover through Route 53.
Our recovery point objective likewise will be significantly lower than other options. If we're using synchronous databases then, yes, it'll be a matter of seconds. If we're still using asynchronous databases, then the RPO will be since the last data write. So the cost and maintenance overhead of running a multi-site environment needs to be factored in and considered. The benefit is that you have a regular environment for testing DR processes.
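One way to practise the four patterns is to treat them as a list ordered from lowest to highest running cost and pick the first one that meets the required RTO and RPO. The sketch below does that. The `DrPattern` class, the function name, and above all the numbers in `ESTIMATES` are hypothetical placeholders for one imaginary workload, not AWS figures; in an exam scenario the clues in the question supply the real targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrPattern:
    name: str
    rto_minutes: float  # estimated time to restore service
    rpo_minutes: float  # estimated window of data loss


def cheapest_pattern(patterns, rto_target, rpo_target):
    """Return the name of the first pattern meeting both targets, or None.

    `patterns` must be ordered from lowest to highest running cost, which is
    the order the lecture presents them in.
    """
    for pattern in patterns:
        if pattern.rto_minutes <= rto_target and pattern.rpo_minutes <= rpo_target:
            return pattern.name
    return None


# Hypothetical estimates for illustration only (assumed, not from AWS).
ESTIMATES = [
    DrPattern("backup and restore", rto_minutes=8 * 60, rpo_minutes=24 * 60),
    DrPattern("pilot light", rto_minutes=60, rpo_minutes=60),
    DrPattern("warm standby", rto_minutes=15, rpo_minutes=5),
    DrPattern("multi-site", rto_minutes=1, rpo_minutes=1),
]
```

With these assumed estimates, a business that can tolerate a day of downtime and a day of data loss lands on backup and restore, while a 30-minute RTO with a 10-minute RPO pushes the choice up to warm standby.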
And another component is AWS Storage Gateway so AWS Storage Gateway connects your on-premise storage with your AWS S3 storage and there's three options that are available. You have a gateway cached volume, gateway stored volume, and then we have a gateway VTL which presents itself like a virtual tape library. The benefit of all three of those is that to end-users each of the storage gateway connections look like iSCSI connections.
Okay, so choice of replication is another consideration when we're talking about design requirements. Synchronous replication is where we have an atomic update to both databases, and it's bandwidth and latency dependent. So we need very good bandwidth and very, very fast networking to ensure synchronous replication of databases. It generally comes at a higher cost. Asynchronous replication is a non-atomic update that happens to the secondary as network and bandwidth permit.
A benefit of using asynchronous replication is you can use your secondary database as a read replica. A key part of the Solutions Architect Associate brief is to be able to recognize how you might use AWS services together to create highly available, fault-tolerant, scalable, cost-efficient solutions. So we ran through the ten AWS components that can help us design cost-efficient, highly available, fault-tolerant systems when used together, and those were, briefly, if you remember: regions and AZs, which are designed for fault isolation. So having multiple availability zones within one region can often provide a high level of durability and high availability without the need to use more than one region.
If we do want to extend our customers' footprint to another region, it's also very possible to migrate AMIs, data services, etc. from one region to another. Then Virtual Private Cloud, which is that secure section of the AWS cloud. It gives us a CIDR block between /16 and /28. The default VPC comes with subnets for your availability zones, an Internet gateway, a default route table, a network access control list, and a security group.
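To get a feel for what a /16 and a /28 mean in practice, Python's standard `ipaddress` module can count the addresses in a block. This is a small sketch; the `10.0.0.0` base address and the helper names are assumptions chosen for the example.

```python
import ipaddress


def addresses_in_block(cidr):
    """Total number of IPv4 addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def valid_vpc_prefix(cidr):
    """True if the prefix length is within the /16 to /28 range given above."""
    return 16 <= ipaddress.ip_network(cidr).prefixlen <= 28


largest = addresses_in_block("10.0.0.0/16")   # the biggest block allowed
smallest = addresses_in_block("10.0.0.0/28")  # the smallest block allowed
```

A /16 holds 65,536 addresses and a /28 holds 16, so the range quoted above spans very large to very small networks.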
A subnet is a public subnet if it has an Internet gateway and a route in the route table to that internet gateway. Then we looked at the elastic load balancer. It's a managed service which detects the health of instances and routes traffic to the healthy ones.
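The public subnet rule mentioned above, a route in the route table that points at an Internet gateway, can be expressed as a simple check. This is a sketch only: the route table is modelled as a list of dictionaries with assumed keys `destination` and `target`, and the gateway ID shown is a made-up example that relies on Internet gateway IDs starting with `igw-`.

```python
def is_public_subnet(routes):
    """True if any route in the subnet's route table targets an internet gateway.

    `routes` is a list of dicts with the assumed keys "destination" and
    "target". A target beginning with "igw-" is treated as an internet gateway.
    """
    return any(route["target"].startswith("igw-") for route in routes)


# Hypothetical route tables for illustration.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0example"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
]
```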
Now, Elastic Load Balancer adds another layer of availability and security. As a managed service, ELB can terminate or pass through SSL connections. Then we had Simple Queue Service, which enables us to increase fault tolerance by decoupling layers, reducing dependence on server state, and helping us manage communications between services. And, of course, Elastic Compute Cloud, EC2, that on-demand computing.
There's instance types available in various flavors: on-demand, where you pay hourly; reserved instances, where you commit to a one- or three-year term, for example with a partial upfront payment, to reduce the cost of predictable usage patterns; then we have scheduled instances, which can be booked for a specific time of the day, week, or month, and that's ideal where you have patterns of usage that are quite regular or reports that need to be done on a certain date every month or every year.
Spot pricing is marketplace pricing based on supply and demand basically where you're bidding and paying for unused excess AWS capacity. Often it's a blend of those that can give you the best price.
Now, remember that placement groups must be in the same availability zone and do not support micro or medium-sized instances. Elastic IP addresses allow us to maintain service levels by swapping resources behind an elastic IP address, and by default we can have up to five elastic IP addresses per region.
With our elastic IP addresses, if you stop an instance, the elastic IP address remains associated with the instance. And then Route 53, that powerful DNS service: we can manage our top-level domains, it can provide graceful failover to a static site in the event of an outage, which could be hosted in S3, it can do active-active and active-passive failovers based on elastic load balancer health checks or EC2 health checks, and it can support weighted or geo-targeted traffic distribution.
Ok, so CloudWatch is the eyes and ears of our environment. Great monitoring tools: CloudWatch, CloudTrail, and AWS Config. For CloudWatch, you get basic EC2 monitoring enabled by default. Basic monitoring provides seven metrics at five-minute intervals and three metrics at one-minute intervals. Elastic load balancing reports at one-minute intervals by default. Detailed monitoring enables one-minute intervals on the same metrics but it comes with a charge, so you have to pay extra to use detailed monitoring. CloudWatch also has things like an agent, which we installed on our EC2 instances for the Acme widgets deployment, which can send log files to CloudWatch and so provide us more instance debugging and reporting information.
Now, CloudWatch notifies of a change in state and the three reporting states are: okay, alarm, or insufficient data. If an instance or ELB has just started it will most likely return an insufficient data state.
Auto scaling has three core components: the launch configuration, the auto scale group, and the scaling plan. So the launch configuration is your template for what you want your machines to do when auto scale starts them and you can basically configure that machine to do exactly what you want with your launch configuration. The auto scale group is literally the group of services that are run inside that group and the scaling plan defines how services are added or removed from their auto scale group. So scaling in. So we want to make our auto scale groups smaller to reduce costs. The whole point of scaling down or in is to reduce your costs so you're only paying for what you use.
So these are the steps that auto scaling goes through to determine which machine to terminate first. First off, are there instances in more than one availability zone? Okay, now if there are, auto scaling applies its policy to the availability zone that has the most instances in it. So, if you have two AZs, one's got three instances running and one's got two, auto scaling will apply its rule to the AZ with the three instances in it first. Alright, that's the first piece of logic.
The next logic point is select the instance with the oldest launch configuration. If there are multiple instances using the oldest launch configuration then select the instance closest to the next billing hour. If there are multiple instances closest to the next billing hour then select an instance at random.
Three key steps. First of all, choose the availability zone that has the most instances and apply the rule to that. Second, if there are multiple instances, terminate the one with the oldest launch configuration. And if there are multiple instances on that same launch configuration, choose the one closest to the next billing hour, and if you still can't find a difference between them, choose one at random.
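The selection steps above can be written out as a short function, which makes the order of the tie-breakers easier to remember. This is a sketch of the logic as described in this lecture, not AWS's implementation. The `Instance` class, its field names, and the function name are assumptions, and so is the handling of a tie between availability zones, which the lecture does not cover: here instances from all tied AZs stay in play.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    launch_config_age: int        # larger value = older launch configuration
    minutes_to_billing_hour: int  # minutes until the next billing hour


def choose_instance_to_terminate(instances, rng=random):
    """Pick the instance to terminate using the steps described above."""
    if not instances:
        raise ValueError("no instances to choose from")

    # Step 1: narrow to the availability zone with the most instances.
    # Assumption: if several AZs tie, instances from all tied AZs stay in play.
    counts = Counter(i.az for i in instances)
    most = max(counts.values())
    candidates = [i for i in instances if counts[i.az] == most]

    # Step 2: keep the instances on the oldest launch configuration.
    oldest = max(i.launch_config_age for i in candidates)
    candidates = [i for i in candidates if i.launch_config_age == oldest]

    # Step 3: keep the instances closest to the next billing hour.
    closest = min(i.minutes_to_billing_hour for i in candidates)
    candidates = [i for i in candidates if i.minutes_to_billing_hour == closest]

    # Step 4: pick at random among whatever is still tied.
    return rng.choice(candidates)


# Hypothetical fleet: three instances in az-a, two in az-b.
fleet = [
    Instance("a1", "az-a", launch_config_age=2, minutes_to_billing_hour=10),
    Instance("a2", "az-a", launch_config_age=2, minutes_to_billing_hour=5),
    Instance("a3", "az-a", launch_config_age=1, minutes_to_billing_hour=1),
    Instance("b1", "az-b", launch_config_age=5, minutes_to_billing_hour=1),
    Instance("b2", "az-b", launch_config_age=5, minutes_to_billing_hour=2),
]
```

In this example the function selects `a2`: az-a has the most instances, `a1` and `a2` share the oldest launch configuration within that zone, and `a2` is closer to its next billing hour. Note that `b1` is never considered even though its launch configuration is older, because the availability zone rule is applied first.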
Now, remember that that availability zone rule applies even if you have a custom auto scaling policy.
AWS has a shared security responsibility model. AWS manages the global infrastructure, the regions, the availability zones, and the edge locations, and some of the foundation services such as compute, storage, database, and networking, and then everything else on top of that is managed by us the customers. So AWS manages security of the cloud and AWS customers manage security in the cloud.
Now, we looked at the four pillars of security in the cloud: data protection, which is protecting data in transit and at rest, privilege management, which is ensuring our users have least privilege to resources, infrastructure protection, keeping the facilities and network secure is the job of AWS, and those detective controls, the regular monitoring and testing to avoid compromise.
So some of the tools that AWS makes available to us via IAM: we have multi-factor authentication, which is an additional layer that should be applied to your root account and any privileged users. We can interface with identity providers using AWS roles. We have our passwords, and roles provide a very, very efficient way for us to connect to applications and third parties without us having to share our security credentials. Now, when we are integrating with other corporate networks, we can use single sign-on or directory services, and the AWS Security Token Service, or STS, and a role enable us to connect to AWS via an identity broker.
Along with compliance, there's a number of frameworks and alignments that make it easier for third parties to check or comply with compliance reporting and the AWS security center and the well-architected frameworks can provide some really good guidelines for how third parties can respond to RFPs or run things like penetration testing or to do compliance audits using roles and third-party connectors.
So securing data in transit, all AWS endpoints support SSL and one of the key benefits of the elastic load balancer is that it can terminate or pass through SSL connections. So if we're securing data at rest, two key services: the AWS KMS, or the Key Management Service, and Amazon CloudHSM.
So, looking at the options we have for using this, there's three, so the first is where you control the encryption method and the entire key management infrastructure. So, you can take your whole KMI out of AWS and manage that yourself. The second option is where AWS manages the key storage for you. You will manage the encryption method. You choose whichever way you want to encrypt your content and you manage your own keys. They're stored in CloudHSM. And the third option is where AWS manages encryption and the key management and the KMI infrastructure for you so they do everything on your behalf basically.
Okay, when we're looking at threat mitigation, remember it's about protecting the layers, so we want to reduce our surface area, and it's our responsibility to put in place additional controls to limit access. What additional filtering or blocking can we add on top of security groups and Network ACLs to provide additional levels of threat protection? Let's start with our core services.
So Amazon Simple Storage Service or S3 provides eleven nines durability and four nines availability. You can put pretty much any object you want into Amazon S3. It's object storage, it scales automatically, and the maximum file size you can upload to Amazon S3 is five terabytes. Objects are stored in buckets. There's three storage types: the standard storage class, which offers the highest availability and lowest latency; the standard infrequent access class; and the third level of storage class is what's called Amazon S3 reduced redundancy class, and that provides the same 99.99% availability but less durability. Points to remember: each bucket name has to be unique, five terabyte maximum file size, you can't change the region or the S3 part of the access point name, buckets can't be renamed, you can delete a bucket and then reuse the name after a period of time, by default you can create up to a hundred buckets per account, bucket ownership is not transferable, and a lifecycle configuration on MFA-enabled buckets is not supported.
Elastic Block Store. EBS volumes are replicated within an availability zone, not throughout a region, as S3 is. EBS snapshots are stored in Amazon S3, so point-in-time snapshots increase durability by protecting against hardware failure or loss of service in one availability zone. And EBS is persistent storage rather than ephemeral storage.
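It can help to turn "eleven nines" and "four nines" into concrete numbers. The sketch below does the arithmetic with exact fractions. It assumes durability can be read as an average annual loss rate per object and that a year has 365 days, both simplifications made for the example; the function and constant names are also assumptions.

```python
from fractions import Fraction

DURABILITY_LOSS = Fraction(1, 10**11)  # 1 - 99.999999999% (eleven nines)
UNAVAILABILITY = Fraction(1, 10**4)    # 1 - 99.99% (four nines)
MINUTES_PER_YEAR = 365 * 24 * 60       # ignoring leap years


def expected_objects_lost_per_year(object_count):
    """Average number of objects lost per year at eleven nines durability."""
    return object_count * DURABILITY_LOSS


def expected_downtime_minutes_per_year():
    """Average minutes per year a four nines service may be unavailable."""
    return float(MINUTES_PER_YEAR * UNAVAILABILITY)
```

For ten million stored objects, `expected_objects_lost_per_year(10_000_000)` is 1/10,000 of an object per year, so on average one object every 10,000 years. Four nines of availability works out to roughly 52.6 minutes of unavailability per year.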
The Amazon Glacier. It's low-cost object storage, annual average durability of eleven nines for an archive, redundantly stores data in multiple facilities and on multiple devices within each facility. Glacier stores objects in vaults, there's no maximum or minimum limit to the total amount of data that can be stored in Amazon Glacier, and your individual archives can be up to 40 terabytes. And a common use case is to define Amazon S3 lifecycle rules to automatically archive sets of Amazon S3 objects to Amazon Glacier to reduce storage costs.
DynamoDB is a NoSQL key-value data store.
ElastiCache is a managed in-memory cache which allows you to give fast reliable data access and the underlying engines behind ElastiCache are Memcached and Redis. For Redshift, it's a fully managed petabyte-scale data warehouse.
Elastic MapReduce is a managed Hadoop framework.
Amazon Kinesis is a fully managed service for processing real-time data streams. You can output from Kinesis to Amazon S3, Amazon Redshift, Amazon EMR, and also to Lambda.
So let's just quickly remind ourselves of the three deployment services that we have and what the use cases are. So if we look at OpsWorks. AWS OpsWorks is a configuration management service that enables you to configure and operate applications of all shapes and sizes using Chef, so it's perfect for DevOps engineers who are looking at automating as much of their environment as possible. As it's Chef-based, it makes it very easy to integrate with other Chef recipes.
AWS CloudFormation is a building block service that enables you to provision and manage almost any AWS resource, and it uses a JSON-based, domain-specific language. The third option is AWS Elastic Beanstalk, perfect for developers and people who perhaps don't have a lot of experience with infrastructure or who perhaps don't have the right resource access to build infrastructure. AWS Elastic Beanstalk can provision and maintain versions for them.
Beanstalk does also integrate with CloudFormation so you can use the two together but it's just a really easy way of deploying an application. So the CloudFormation template, JSON template. A couple of things to keep in mind, it's a real easy way to collect resources together and provision them in an orderly and predictable way. Now, CloudFormation, by default, rolls back the entire stack if there's any issue. It supports Elastic Beanstalk application environments, so you can use those two together. The other thing that you must define are the resources, so parameters and outputs are optional but resources you must define those in the template.
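The point that resources are the one section a template must define can be seen in a minimal template. The sketch below builds one as a Python dictionary and serialises it to JSON. The logical ID `ExampleBucket` and the helper names are assumptions made for the example, and the check only looks for a non-empty `Resources` section; it is not a full template validator.

```python
import json


def minimal_template(logical_id="ExampleBucket"):
    """Build a minimal template: only the Resources section is required."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # A single S3 bucket with default properties.
            logical_id: {"Type": "AWS::S3::Bucket"},
        },
    }


def has_required_sections(template):
    """True if the template defines at least one resource."""
    resources = template.get("Resources")
    return isinstance(resources, dict) and len(resources) > 0


template_json = json.dumps(minimal_template(), indent=2)
```

Parameters and outputs can be added alongside `Resources` when needed, but a template holding only those optional sections would fail the check above.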
API Gateway helps developers deliver mobile and web application backends.
Amazon Simple Queue Service, a fast, reliable, and scalable messaging queue service. You can have an unlimited number of messages. The order of delivery is not guaranteed. You can set a message visibility timeout of up to 12 hours and you can store messages between one minute and two weeks.
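The two SQS limits quoted above are easy to mix up, so here is a small sketch that checks a pair of settings against them. It is plain Python with no AWS calls; the function name and the choice to express both values in seconds are assumptions made for the example.

```python
MAX_VISIBILITY_SECONDS = 12 * 60 * 60      # visibility timeout: up to 12 hours
MIN_RETENTION_SECONDS = 60                 # retention: at least one minute
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # retention: at most two weeks


def validate_queue_settings(visibility_seconds, retention_seconds):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not 0 <= visibility_seconds <= MAX_VISIBILITY_SECONDS:
        problems.append("visibility timeout must be between 0 and 12 hours")
    if not MIN_RETENTION_SECONDS <= retention_seconds <= MAX_RETENTION_SECONDS:
        problems.append("retention must be between one minute and two weeks")
    return problems
```

For instance, a 30-second visibility timeout with four days of retention passes, while a 13-hour visibility timeout is flagged.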
Simple Notification Service, it's a fully managed push service. Topics are used for subscribing and publishing. Simple Workflow Service, it assists developers in keeping state separate from actual units of work. The workflow components are Task, Marker, Timer, and Signal.
Simple Email Service allows you to send email through the Amazon email servers. Elastic Transcoder can transcode media files into various formats required for HLS and HDS delivery to devices.
AWS Lambda is our service for running processes without the need for provisioning and managing EC2 instances. It supports Java, Python, and Node.js. AppStream delivers Windows applications from the cloud to end-users without any code modifications.
Workspaces is a desktop computing service that runs Microsoft Windows. It incorporates PC-over-IP, or PCoIP, which is a technology from Teradici. It runs Windows on Mac computers, Chromebooks, iPads, Kindle Fire tablets, and Android tablets. You generally get a Windows 7 desktop experience as of today and it can integrate with existing Active Directory environments. Another thing to keep in mind is that the user volume, which is mapped as drive D, is backed up every 12 hours.
So Data Pipeline, a service for reliably processing and moving data between compute and storage services.
Amazon EC2 Container Service, or ECS, lets you fully utilize EC2 instances. You can run different layers of the same application or different applications altogether. It comes at no additional cost and you only pay for the EC2 instances that you're using in the ECS cluster.
Okay, now do manage your time in the exam. If you come across a really hard question, mark it and move on; you're probably better off going through and getting as many of the simple answers as you can get and then coming back and, with the time left, trying to problem solve those ones that you don't quite get. Read as many of the FAQs as you can, and look, good luck, okay? You can do it, you can nail this exam, so go knock it over! Good luck!
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.