
EXAM PREP - Storage


AWS Storage
Introduction to Amazon EFS
Amazon EC2
Amazon Elastic Block Store (EBS)
Optimizing Storage
AWS Backup
SAA-C02 - Exam Prep
2h 46m

This section of the Solution Architect Associate learning path introduces you to the core storage concepts and services relevant to the SAA-C02 exam. We start with an introduction to AWS storage services, review the options available, and learn how to select and apply AWS storage services to meet specific requirements.


Learning Objectives

  • Obtain an in-depth understanding of Amazon S3 - Simple Storage Service
  • Get both a theoretical and practical understanding of EFS
  • Learn how to create an EFS file system, manage EFS security, and import data in EFS
  • Learn about EC2 storage and Elastic Block Store
  • Learn about the services available in AWS to optimize your storage

- The "Storage" section is now complete, so a big well done for making it this far. So, what did we cover? We looked at Amazon S3, the Elastic File System, also our old favorites from the compute course, EBS and EC2 instance storage. And we also touched on Amazon FSx, Storage Gateway, and AWS Backup. Now in this exam prep, I want to use this time to ramp up and help you prepare for and pass any questions on storage. So, we'll be touching on some of the most common elements that you might see in the exam. So, let's take a look. Let me start with Amazon S3. Now, you need to know this service pretty well as you'll definitely be getting a few questions on it. So firstly, some key points: it's highly available, highly durable, very cost-effective, and widely accessible. It's great for use cases such as data lakes, data backups, building websites, and much more. However, there are some key elements that you do need to know. I would say without hesitation you'll experience some sort of question that references the storage classes that exist, and these usually relate to cost optimization or the speed of data retrieval. Now, remember that the Glacier storage classes are designed for long-term data storage, providing the most cost-optimized solution. But the drawback is that they do not offer instant data retrieval, whereas the S3 storage classes do offer instant data retrieval but are more expensive as a result. Now, as I discussed in the previous course, there are a number of different storage classes available for S3, but you need to have an insight into when to use one over the other from an optimization and use case standpoint. For example, if you had a workload with unpredictable access patterns and you were looking to provide a cost-effective storage solution on S3, you might use S3 Intelligent-Tiering over Standard. 
Or, if you wanted instant access to objects at the lowest cost point, where your data could be easily reproduced if lost, then you would use S3 One Zone-Infrequent Access. So, let's look at a question where knowledge of storage classes comes into play. So, a company needs to maintain access logs for a minimum of five years due to regulatory requirements. The data is rarely accessed once stored, but it must be accessible within one day's notice if it is needed. What is the most cost-effective data storage solution that meets these requirements? So, here we're looking for the cheapest storage class that meets all the requirements. So, we know straight away that the cheapest storage classes are those offered by Glacier. But, let's make sure that the question doesn't require instant data retrieval. So, let's take a read over it again. A company needs to maintain access logs for a minimum of five years due to regulatory requirements. Now, straight away, that would be a great use for Glacier cold storage. The data is rarely accessed once stored, but must be accessible within one day's notice if needed. So, that's also good as well. So, we're not looking for instant data retrieval. So, I would be steering towards the Glacier storage classes here because they're cost-effective, you don't need to provide instant data retrieval, and you're asking for that data to be kept for five years. So, let's take a look at the options that we have. 

So, A, store the data in Amazon S3 Glacier Deep Archive storage and delete the objects after five years using a lifecycle rule. So, this is a really good option because Deep Archive is the cheapest solution of all. We are keeping the objects for five years and then they are being deleted automatically using lifecycle rules. So, I would say that fits the solution, but let's keep reading. B, store the data in Amazon S3 Standard storage and transition it to Amazon S3 Glacier after 30 days using a lifecycle rule. Now, nowhere in the question does it say that we need to have instant access to the data. So, I don't see a use case here for the S3 Standard storage class at all. So, I would be reluctant to select that. C, store the data in logs using Amazon CloudWatch Logs and set the retention period to five years. Now, CloudWatch Logs isn't actually a storage solution. So, this is just a distractor to make you think about whether you're using the right service or not. So, as long as you understand that S3 and Glacier are used for object storage, which is what we're talking about here, then you can rule this one out. And then D, store the data in the Amazon S3 Infrequent Access storage class and then delete the objects after five years using a lifecycle rule. So, although that would provide a cheaper option than the S3 Standard storage class, it won't actually provide the most cost-effective data storage solution, and that's what the question is asking. If you look at that last sentence in the question again, it says, what is the most cost-effective data storage solution to meet these requirements? So, I would say the answer here is A, because using Deep Archive is the cheapest solution and you can keep the data for five years without that instant data access. So, the answer is A. Now, storage classes aren't the only element of S3 that you need to understand to be prepared for questions covering S3. 
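As a rough sketch of how answer A's behavior is expressed in practice, the following builds an S3 lifecycle configuration that transitions objects straight to Glacier Deep Archive and expires them after roughly five years. The bucket name, rule ID, and prefix are hypothetical examples, not part of the question.

```python
# Sketch of a lifecycle configuration matching answer A: objects transition
# to Glacier Deep Archive immediately and are deleted after ~5 years.
# The rule ID and prefix below are hypothetical examples.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "access-logs-retention",         # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "access-logs/"},  # hypothetical key prefix
            "Transitions": [
                {"Days": 0, "StorageClass": "DEEP_ARCHIVE"}
            ],
            "Expiration": {"Days": 1825},          # ~5 years, then delete
        }
    ]
}

# With boto3 you would apply it like so (requires AWS credentials;
# the bucket name is a hypothetical example):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket",
#     LifecycleConfiguration=lifecycle_configuration)
```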
You should certainly understand S3 Versioning, Lifecycle Rules, Transfer Acceleration, and Basic Security Controls. 

You'll be expected to be able to determine when it's best to use versioning and lifecycle rules to manage your data on S3. So, you might be given a scenario where you need to keep data highly accessible for 90 days, after which it won't need to be accessed anymore, but it will need to be kept for legal reasons. So, what would you implement to enforce this behavior? Would you add versioning? Well, no, because this is used to allow you to recover previous versions of your objects if changes are made or if they are deleted. Would you use lifecycle rules? Yes, certainly, this provides an automatic method of moving your data between storage classes based on time periods. So, you can move your data from S3 Standard to S3 Glacier. Now, one final point on S3 before we look at a question is to ensure that you familiarize yourself with the options to control access to your S3 buckets. Now, you can use identity-based policies through IAM, resource-based policies using bucket policies, S3 access control lists, the built-in public access protection settings on the bucket, or Cross-Origin Resource Sharing. Okay, so let's take a look at a simple question which looks at some S3 features. So, the question reads, when storing data in Amazon S3, you can define something to automatically archive sets of Amazon S3 objects to less expensive Amazon S3 storage classes. So, we're looking for something here that can automatically move objects from one storage class to another. So, let's take a look at our options. A, object versioning. Now, object versioning doesn't actually move data between storage classes, it just allows us to revert to a previous version of an object. So, it's not A. B, lifecycle rules. So, lifecycle rules are used to help you manage your data on S3, and they can be used to transition your objects from one storage class to another after a set period of time, or even delete them after a set period of time. 
So, that's certainly something that can be used to help answer this question. C, object encryption. Again, this is simply used to encrypt the data rather than to move it between different storage classes to help you save costs. So, it's not C. And then D, AWS Glacier. That is simply mentioning a storage class, it's not actually a management feature for moving your data around. So, the answer here is B, lifecycle rules. Let's look at another question. So, a user has hosted objects on AWS in the bucket cloudacademy. Can JavaScript on the website www.myvideos.com directly access the objects of the bucket? So, let's take a look at the first answer. A, yes, with S3 CORS support. Well, the CORS specification gives the user the ability to build web applications that make requests to domains other than the one which supplied the primary content. So yes, this certainly would be an option, but let's carry on and read the rest. So, B, yes, with S3 IAM support. IAM isn't really going to help here because IAM relates to identities and the permissions of those identities. So, it's not really used to control the permissions allowing another website to access an S3 bucket in this situation relating to JavaScript. So, it's not IAM. C, no, it's a security violation. But, we've already established that we can do this using S3 CORS. So, it's not actually a security violation, and you might actually need to do this in a production environment. And then lastly, D, no, one domain cannot interact with the objects of another domain. That is a false statement, as we've already established with CORS. So, here the answer is A, yes, with S3 CORS support, because the user can use CORS support to build web applications that use JavaScript and HTML to interact directly with the resources in an S3 bucket. 
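For a sense of what answer A looks like in practice, here is a minimal sketch of a CORS configuration that would allow the JavaScript served from www.myvideos.com to read objects in the bucket. The methods, headers, and cache value chosen here are illustrative assumptions, not details from the question.

```python
# Sketch of an S3 CORS configuration letting JavaScript from
# www.myvideos.com issue read requests against objects in the bucket.
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.myvideos.com"],
            "AllowedMethods": ["GET"],  # read-only access to the objects
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,      # how long browsers may cache the preflight
        }
    ]
}

# Applied with boto3 (requires AWS credentials; bucket name from the question):
# import boto3
# boto3.client("s3").put_bucket_cors(
#     Bucket="cloudacademy", CORSConfiguration=cors_configuration)
```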
Next, we looked at the Elastic File System, and this is a scalable network file storage service for use with Amazon EC2 instances that can easily scale to petabytes in size with low-latency access, providing support to thousands of EC2 instances at once. So, this is very different from S3. Where S3 is used for object storage, EFS is used as file system storage. Again, however, it does have storage classes and varied performance options for you to optimize your file system with. So, make sure you know the difference between the Standard and Infrequent Access storage classes, in addition to the performance modes, including General Purpose and Max I/O, but also the throughput modes of Bursting Throughput and Provisioned Throughput as well. So, just have a recap of those and understand when you might use each of them individually. Now, you might receive questions asking you to select the most appropriate performance and throughput mode based on a particular workload. Knowing these differences will help you quickly and easily eliminate any wrong answers and help you find the correct answer. I would also recommend you understand some of the underlying architecture from a connectivity perspective. So, familiarize yourself with mount targets and the part that they play in how your EC2 instances connect to EFS. Also, if you receive any questions relating to encryption with EFS, then remember it offers encryption at rest backed by KMS, like most AWS services, but importantly, it also supports in-transit encryption too, which can be configured during the mounting process. So, let's take a look at a couple of questions which focus on EFS. So, the first question, an IT department manages a content management system running on an Amazon EC2 instance mounted to an Elastic File System. 

The CMS throughput demands are high compared to the amount of data stored on the file system. What is the most appropriate EFS configuration in this case? So, here we need knowledge of the performance and throughput modes to help us answer this question. So, let's take a look. So, we're looking for the most appropriate configuration to meet the demands of high throughput. So, A, choose Provisioned Throughput mode for the file system. Now, Provisioned Throughput mode allows you to provision throughput above the allowance that would otherwise be allocated based upon your file system size. So, that could be a really good answer to this question. B, choose Bursting Throughput mode for the file system. Now, Bursting Throughput mode is the default mode, and with this mode, the throughput scales as your file system grows. But, if we look at the question, it's saying that our throughput demands are high compared to the amount of data stored on the file system. So, Bursting Throughput isn't really going to help us here, whereas Provisioned Throughput from answer A actually allows us to go above the allowance that is based upon the file system size. So, between the two of these, I would certainly choose A over B. C, start with the General Purpose performance mode and update the file system to Max I/O if it reaches its I/O limit. Now, General Purpose is the default performance mode, but it has a limitation of only allowing 7,000 file system operations per second, and as we're looking for high performance here, I wouldn't go with the General Purpose performance mode because we know the throughput demands are high. D, start with Bursting Throughput mode and update the file system to Max I/O if it reaches its I/O limit. Now, we've already established that Bursting Throughput mode isn't the most appropriate and Provisioned Throughput would be the better choice here. So, I would also rule out D. So, the answer to this question is A. 
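As a rough sketch of what answer A's configuration involves, the following builds the kind of parameters you would pass to EFS when creating a file system in Provisioned Throughput mode. The throughput figure and creation token are hypothetical examples.

```python
# Sketch of parameters for creating an EFS file system in Provisioned
# Throughput mode, as in answer A. The throughput value is a hypothetical
# example; you would pass these to efs.create_file_system() with boto3.
create_params = {
    "PerformanceMode": "generalPurpose",    # default performance mode
    "ThroughputMode": "provisioned",        # decouple throughput from stored data
    "ProvisionedThroughputInMibps": 256.0,  # hypothetical throughput figure
    "Encrypted": True,                      # encryption at rest backed by KMS
}

# Requires AWS credentials; the creation token is a hypothetical example:
# import boto3
# efs = boto3.client("efs")
# efs.create_file_system(CreationToken="photo-cms-fs", **create_params)
```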
Choose Provisioned Throughput mode for the file system. Okay, let's take a look at another EFS question as well. So, a photo-sharing application uses a fleet of EC2 instances to assemble photos into a package that professional photographers can share with clients. To improve performance, the team uses multiple Amazon EC2 instances in parallel to resize and prepare each image with a watermark. 

Now, during the build process, the EC2 instances require shared access to common storage to retrieve and save each photograph in the package. After the package is complete, it should be stored in the cloud so that only authorized clients can review the contents. Which of the following storage solutions will allow shared access between EC2 instances and enable client access to the final package described in this scenario? So, this is quite a big question. There's a lot of information here. So, the way I would approach this is to read that last sentence again to understand exactly what I'm trying to find out. So, which of the following storage solutions will allow shared access? So, we're looking for a shared access storage system between EC2 instances. So, straight away, when we're talking about shared access to storage between EC2 instances, my mind thinks of EFS, the Elastic File System. And, we're also looking to enable clients to access the final package described in this scenario. So, we're also looking for a storage solution where others can gain access to it as well. Okay, so let's read through the question again now we have this information. So, a photo-sharing application uses a fleet of EC2 instances to assemble photos into a package that professional photographers can share with clients. Okay, that's absolutely fine. It's not really giving us a huge amount of information there that we need for our answer. To improve performance, the team uses multiple Amazon EC2 instances in parallel to resize and prepare each image with a watermark. Now, during the build process, the EC2 instances require shared access to common storage to retrieve and save each photograph in the package. So, this is where EFS really comes into its own. Now, after the package is complete, it should be stored in the cloud so only authorized clients can review the contents. Okay, so we're looking for shared storage and then a storage solution where others can access it securely. 
So, let's look at our answers. A, use the Elastic Block Store, EBS, for persistent storage shared across EC2 instances, save packages to a private S3 bucket, and then create an IAM user for the client and grant this user read access to the bucket. Well, this isn't really going to work because EBS storage can't be shared across multiple EC2 instances. So, I would rule that one out straight away. B, use the Elastic File System, EFS, for persistent storage shared across EC2 instances. Okay, we've already established that EFS would be a good fit for this. That sounds good so far. Then save the client packages to a private Amazon S3 bucket and provide client access using a CloudFront signed URL. So again, this also provides a storage solution with secure access through CloudFront signed URLs. So, that's a really good option. C, use the Elastic Block Store. So again, I pretty much know this is the wrong answer because we can't really use EBS for a shared file system across multiple EC2 instances. So, I'm going to rule it out straight away. D, use the Elastic File System for persistent storage shared across EC2 instances. Okay, that sounds good. That's what we want to use. And then save packages to a private S3 bucket, create an IAM user for the client, and grant this user read access to the bucket. Now, this is not as secure as creating a CloudFront signed URL, because with the signed URL we can set an expiration date and time, and it gives us more control over access to our content. And also, we don't really want to be giving out IAM credentials to third-party clients with access to our buckets. So here, the answer is B. Okay, so moving on from EFS, we also looked at the Elastic Block Store. We covered this in the compute section as well, but in this section we covered it at a greater depth, looking into the service as a whole, not just from an EC2 standpoint. 
So, right off the bat, the key points for EBS to remember for the exam are that it provides persistent storage, meaning the data will not be lost if you terminate the instance that the EBS volume is attached to, and that it's a really flexible storage option for your EC2 instances. Your knowledge of this flexibility will be assessed in the exam. One element to really focus on for the certification is EBS snapshots; how they work, where they are stored, and also how they work when encryption is applied. This is all covered in the course, so ensure you understand these key points. Now, you might be presented with a scenario where you have an unencrypted EBS volume that now needs to be encrypted within a different region. How would you go about doing this? Well, one option would be to take a snapshot of the volume, copy this snapshot to the required region, and then create a new volume from this copied snapshot, selecting encryption during the volume creation. 
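The snapshot-copy approach just described can be sketched with boto3's `copy_snapshot` call, which is issued from the destination region and can enable encryption on the copy. The region names, snapshot ID, and KMS key alias below are all hypothetical examples.

```python
# Sketch of copying an unencrypted snapshot into another region with
# encryption enabled, as described above. All identifiers are hypothetical.
copy_params = {
    "SourceRegion": "us-east-1",                   # where the snapshot lives
    "SourceSnapshotId": "snap-0123456789abcdef0",  # hypothetical snapshot ID
    "Encrypted": True,                             # encrypt the copied snapshot
    "KmsKeyId": "alias/my-ebs-key",                # hypothetical KMS key alias
    "Description": "Encrypted copy for the destination region",
}

# The copy is requested from a client in the *destination* region
# (requires AWS credentials):
# import boto3
# ec2 = boto3.client("ec2", region_name="eu-west-1")
# response = ec2.copy_snapshot(**copy_params)
# A volume created from the copied snapshot will then be encrypted.
```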

So, you need to understand what you can and can't do with a snapshot. For example, you can't create an unencrypted volume from an encrypted snapshot. Similarly, you can't create an unencrypted snapshot from an encrypted volume. So, let's take a break away from this and take a look at some example questions relating to EBS. So, the first question. A research institution has a growing number of source code repositories hosted on multiple servers and managed by different IT departments. The institution plans to consolidate these repositories and deploy GitHub Enterprise Server on a single Amazon EC2 instance. The team needs to select the most appropriate Amazon storage service for the data volume of the instance running GitHub Enterprise Server. Which of the following storage solutions would the team choose for this deployment? So again, we're looking for the most appropriate storage service for the data volume. So, let's just try and rule out any unnecessary information from the question. So, a research institution has a growing number of source code repos hosted on multiple servers and managed by different IT departments, okay? That's absolutely fine. It doesn't really help us choose a storage service here. The institution plans to consolidate these repos and deploy GitHub Enterprise Server on a single Amazon EC2 instance. Okay, so we're just looking for a storage solution for a single EC2 instance. Now, that's key. So, let's look at our options. Now, we know that it's just a single EC2 instance and we're looking for the most effective storage service. So, looking at A, we have the Elastic File System, EFS. Well, this is used as a shared file system across multiple EC2 instances. 

So, that's not really effective in this use case as we're just looking at a single instance. B, S3. Now, S3 isn't a storage service that can be attached to an EC2 instance. It's used for object storage that's widely accessible. So, that's not the answer. C, EC2 instance storage. Now, this can be attached to a single EC2 instance, but remember, instance storage is ephemeral, so if this EC2 instance were to be stopped or terminated, then we'd lose the entire GitHub Enterprise Server data. So, that's not really an option either. Then we have D, the Elastic Block Store. Again, EBS can be attached to a single EC2 instance, and it is persistent. So, this is the most appropriate storage solution in this case. Let's look at one more question relating to EBS. So, Amazon Elastic Block Store provides block-level storage volumes for use with EC2 instances. EBS volumes are highly available and reliable storage volumes that can be attached to any running instance that is in the same availability zone. Which of the following would be considered a feature of EBS? So, all this is asking here is which points are true relating to EBS volumes. So, let's read through them. A, EBS volumes behave like raw, unformatted block devices. You can create a file system on top of these volumes, or use them in any other way you would use a block device like a hard drive. That is definitely true. You can simply attach one to an EC2 instance, install a file system on top, and then use it like a hard drive. B, EBS volumes are created in a specific region and can then be attached to any instance in that same region. That is incorrect. EBS volumes are only available within a single availability zone. So, you can't detach the volume from one EC2 instance in one AZ and then reattach it to another instance in a different AZ. C, you can use encrypted EBS volumes to meet a wide range of data-at-rest encryption requirements for regulated slash audited data and applications. 
Yes, you can encrypt EBS volumes with ease and they can be backed by KMS as well. So, A and C are true so far. D, your account does not have a limit on the number of EBS volumes that you can use, and the total storage available to you is also unlimited. So again, that is incorrect. There are limitations on storage, especially the size of storage that you can have. So, that's incorrect. E, you can create point-in-time snapshots of EBS volumes, which are persisted to Amazon S3, and the snapshots protect data for long-term durability, and they can be used as a starting point for new EBS volumes. That is also correct. You can take backups of your volumes and then create new volumes from those backups, which are stored on S3. So, the answer here is A, C, and E. Okay, moving on. We also looked at FSx at a high level. It's not mentioned in the exam at any significant level, but you should be aware of it, what it is, and when you might use it. So, remember that it's another file system storage service much like EFS, but also note that FSx comes in two flavors. FSx for Windows provides a fully managed native Windows file system on AWS, and it uses the Server Message Block protocol, SMB. The other is FSx for Lustre, which is a fully managed Linux-based file system designed for compute-intensive workloads and high-performance computing. So, as long as you remember those components for the exam, you shouldn't need much more than that when it comes to FSx. Now, the final service I want to talk about is AWS Storage Gateway. Again, there are some key points to focus on. Firstly, its use case: it provides a gateway between your own data center storage systems and Amazon S3 and Glacier, giving you a hybrid storage solution with unlimited space. Now, secondly, there are three types of gateway that you need to know: the File, Volume, and Tape Gateway configurations. 
Now, in the exam, you'll be given a scenario and asked which solution would be best based on specific criteria. So, ensure you have a good understanding of the differences between the three. Let's take a quick look at an example question where we'll need to understand the differences between each of them. So, the question reads: a company maintains an on-premises data center and performs daily data backups to disk and tape storage to comply with regulatory requirements. The IT department is looking for an AWS cloud solution to back up its data. The IT department responsible for this project plans to continue maintaining the primary data onsite and is looking for an AWS cloud solution for data backup that will work well with their current archiving process. Which of the following AWS storage services should the team choose to manage its data backup requirements? So, we're looking at data backup here, and there are some key words that jumped out at me as I was reading through. The first was that the primary data will remain on site. And secondly, they're looking at archiving, and when I think of archiving and cold storage, I think of the services that are backed by AWS Glacier, so that long-term storage. And again, we're looking at backup here. So, let's take a look at each of our options. So, we have A, the AWS Volume Gateway. Now, with Volume Gateways, you get Stored Volume and also Cached Volume Gateways. The Stored Volume Gateway provides a solution to back up your local storage to S3, whereas with the Cached Volume Gateway, the primary data storage is on S3 rather than your local storage solution. So, it's definitely not the Cached Volume Gateway, and as there's no stipulation here as to which type of volume gateway it is, it's not really a viable answer because, like I say, with Volume Gateways you get Stored Volume Gateways, where your primary data is on site, or Cached Volume Gateways, where your primary data is on S3. 
But as we know from the question here, the primary data must remain on site. So, it doesn't give enough information to say yes to the Volume Gateway. B, the AWS File Gateway. With File Gateways, again, the primary storage is mainly on S3. So, that doesn't really work with our solution here either. C, the AWS Tape Gateway. Now, the Tape Gateway is essentially a virtual tape library and allows you to back up your data to S3, and it leverages Amazon Glacier for data archiving. So, this is a really strong contender. And then lastly, D, AWS Glacier. Although we do want to store our data in Glacier, it's not actually a storage solution that will help us move the data from on-premises into Glacier itself. So, that isn't an option either. So, the answer here, and the most appropriate answer, is the AWS Tape Gateway. So, just a quick wrap-up before we move on to the next section. If you get any questions about unlimited object storage, think S3, and about storage classes and how Glacier is used for long-term data storage, how you can use lifecycle policies and versioning to help with data management, and Transfer Acceleration for getting data into S3 faster. If you get any questions about persistence of data with EC2 instances, think EBS volumes, block storage, EBS snapshots as backups, and that encryption is also possible. If questions appear related to network file systems, think EFS running the NFS protocol, mount targets for connecting your EC2 instances, multiple availability zones, encryption in transit and at rest, and thousands of concurrent connections. If any questions relate to Windows file systems using the Server Message Block protocol, think Amazon FSx for Windows, or if anything relates to file systems for high-performance computing using Linux instances, think Amazon FSx for Lustre. Now lastly, if any scenarios appear talking about backing up data between your own corporate data center and AWS using S3 and Glacier, think AWS Storage Gateway, using either File, Volume, or Tape Gateways. 
Okay, that's it for me, now you're ready to tackle the next section.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to cloud computing, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.