This section of the AWS Certified Solutions Architect - Professional learning path introduces you to the core storage concepts and services relevant to the SAP-C02 exam. We start with an introduction to AWS storage services, explore the options available, and learn how to select and apply them to meet specific requirements.
Learning Objectives
- Obtain an in-depth understanding of Amazon S3 (Simple Storage Service)
- Learn how to improve your security posture in S3
- Get both a theoretical and practical understanding of EFS
- Learn how to create an EFS file system, manage EFS security, and import data into EFS
- Learn about EC2 storage and Elastic Block Store
- Learn about the different performance factors associated with AWS storage services
Consider the following scenario: you have a data lake workload in a versioned S3 bucket that grows quickly and steadily. Overwrites occur daily, and large objects are uploaded via multipart upload.
And because bad things often happen to good people, from a data management perspective you may end up with incomplete multipart uploads every so often, and tons of out-of-date noncurrent versions of your data. You can probably imagine how costly it is to run this data swamp - I mean, lake.
So it’s important that you consider every cost tool at your disposal. One of these tools is managing the storage lifecycle of your data: moving it to lower-cost storage classes, or deleting data you no longer need. You can, of course, move and delete your data manually, but that is difficult to manage at scale. There are two ways to automate this process:
The first way is to use the S3 Intelligent-Tiering storage class. You pay a monthly per-object monitoring and automation charge, and in return S3 Intelligent-Tiering monitors your object access patterns and automatically moves your objects between three access tiers: Frequent Access, Infrequent Access, and Archive Instant Access. S3 Intelligent-Tiering is recommended when access patterns are unknown or unpredictable, and it offers a more “hands-off” approach to managing your data lifecycle.
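As a minimal sketch of how you might opt into Intelligent-Tiering at upload time with boto3 (the bucket name and key below are placeholders, not values from this course):

```python
import boto3

s3 = boto3.client("s3")

# Upload an object straight into the S3 Intelligent-Tiering storage class,
# so access monitoring and automatic tiering begin immediately.
# Bucket name and key are hypothetical.
s3.put_object(
    Bucket="example-data-lake-bucket",
    Key="raw/events/2024-01-01.json",
    Body=b'{"example": true}',
    StorageClass="INTELLIGENT_TIERING",
)
```

You could also use a lifecycle configuration (covered next) to transition existing objects into Intelligent-Tiering rather than re-uploading them.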
The second approach is to use Lifecycle configurations. You can use lifecycle configurations to transition data to a lower-cost S3 storage class, or to delete data. Lifecycle configurations additionally provide options to clean up incomplete multipart uploads and manage noncurrent versions of your data, which ultimately helps reduce storage spend.
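To make those cleanup options concrete, here is a hedged boto3 sketch of a lifecycle configuration for a versioned bucket; the bucket name, rule IDs, and day counts are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical versioned bucket; rule IDs and day counts are illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cleanup-incomplete-mpu",
                "Status": "Enabled",
                "Filter": {},  # empty filter applies to the whole bucket
                # Abort multipart uploads that never completed within 7 days.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {},
                # Permanently delete noncurrent versions 30 days after they
                # are superseded, while keeping the 3 most recent ones.
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": 30,
                    "NewerNoncurrentVersions": 3,
                },
            },
        ]
    },
)
```

Note that put_bucket_lifecycle_configuration replaces the bucket's entire lifecycle configuration, so all of your rules need to be submitted together in a single call.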
Using lifecycle configurations is the most cost-effective strategy when your objects and workloads have a defined lifecycle and follow predictable patterns of usage.
For example, a defined access pattern may be that you use S3 for logging and only access your logs frequently for at most a month. After that month, you may not need real-time access, but due to company data retention policies, you cannot delete them for a year.
With this information, you can create a solid lifecycle configuration. You could create an S3 Lifecycle configuration that transitions objects from the S3 Standard storage class to the S3 Glacier Flexible Retrieval storage class after 30 days. By simply changing the storage class of your objects, you will begin to see significant savings in your overall storage spend. And after 365 days, you can delete the objects and save even more.
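Here is what that logging policy might look like as a lifecycle configuration in boto3; the bucket name and "logs/" prefix are assumptions for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Sketch of the logging scenario above; bucket and prefix are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "logs-to-glacier-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # After 30 days, move logs to S3 Glacier Flexible Retrieval.
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"}
                ],
                # After 365 days, delete them (retention window satisfied).
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```

"GLACIER" is the API value for the S3 Glacier Flexible Retrieval storage class.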
You may find that a lot of your data follows a similar access pattern: you slowly stop needing real-time access to your data and can eventually delete it after a certain period of time passes. Or you may have data that you need to keep to meet some compliance or governance regulation, which can be moved from S3 Standard to archival storage and left alone for long periods of time. Or perhaps you have a ton of objects in S3 Standard storage and you want to transition all of those objects into the S3 Intelligent-Tiering storage class.
If these patterns sound similar to your use case, then using lifecycle configurations makes sense for your workload.
In summary, lifecycle configurations are an important cost tool: they enable you to delete or transition old, unused versions of your objects, clean up incomplete multipart uploads, transition objects to lower-cost storage classes, and delete objects that are no longer needed.
Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.