
Object Storage

Contents

  • Course Introduction
  • AWS Storage
  • Introduction to Amazon EFS
  • Amazon EC2
  • Amazon Elastic Block Store (EBS)
  • Optimizing Storage
  • AWS Backup
  • AWS Storage Gateway
  • Performance Factors Across AWS Storage Services

Overview

Difficulty: Intermediate
Duration: 4h 13m
Students: 61
Ratings: 5/5
Description

This section of the AWS Certified Solutions Architect - Professional learning path introduces you to the core storage concepts and services relevant to the SAP-C02 exam. We start with an introduction to AWS storage services, explore the options available, and learn how to select and apply those services to meet specific requirements.


Learning Objectives

  • Obtain an in-depth understanding of Amazon S3 - Simple Storage Service
  • Learn how to improve your security posture in S3
  • Get both a theoretical and practical understanding of EFS
  • Learn how to create an EFS file system, manage EFS security, and import data into EFS
  • Learn about EC2 storage and Elastic Block Store
  • Learn about the different performance factors associated with AWS storage services
Transcript

Hello, and welcome to this lecture, where I will discuss performance factors with AWS object storage. And of course, we’ll be using Amazon S3 for our object storage. So S3, and object storage in general, is great when you're dealing with large amounts of unstructured data. And it’s important to note, even if we’re storing what we would consider to be files in the traditional sense in S3, we’re still going to refer to them as objects that reside in object storage.

So while block storage will store data in chunks based on the size of the data, object storage stores your data as a single object. That means that unlike block storage, where updating data might only require a single block out of many to be rewritten, with object storage the entire object has to be rewritten whenever any part of it changes. Another difference between block and object storage is that while block storage can be mounted as a drive within your operating system, you’ll need to use HTTP, HTTPS, or the AWS APIs via the CLI or an SDK to access your objects in S3.
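
For example, here’s a minimal sketch of object access through the AWS SDK for Python (boto3); the bucket and key names are hypothetical:

    import boto3

    # Objects are read and written over HTTPS via the S3 API -- there is
    # no operating-system mount involved. Names here are hypothetical.
    s3 = boto3.client("s3")

    # Upload (PUT) a local file as a single object.
    s3.upload_file("report.pdf", "my-example-bucket", "reports/report.pdf")

    # Download (GET) the object -- S3 returns the whole object, not blocks.
    obj = s3.get_object(Bucket="my-example-bucket", Key="reports/report.pdf")
    data = obj["Body"].read()
    print(len(data), "bytes retrieved")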

Now one major advantage of object storage with S3 is that it’s inherently highly available and extremely durable. It’s also very low cost and infinitely scalable, far beyond the limits of a single EBS volume. And from a disaster recovery standpoint, it’s easy to configure cross-region replication to ensure your data is stored across multiple AWS regions, which makes it great for storing things like backups.
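
As a rough sketch of what that configuration looks like, cross-region replication is set at the bucket level once versioning is enabled on both buckets; the bucket names and IAM role ARN below are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Replication requires versioning on both source and destination
    # buckets. All names and ARNs below are hypothetical placeholders.
    for bucket in ("source-bucket", "backup-bucket"):
        s3.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            # IAM role S3 assumes to copy objects on your behalf.
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [{
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::backup-bucket"},
            }],
        },
    )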

Now with object storage, the objects you’re storing will consist of more than just the data or files themselves. Each object will also include some sort of globally unique identifier for the object, along with some additional metadata. And this metadata can be everything from system-defined metadata such as creation date, or image format and resolution, to user-defined metadata, which could be any kind of key-value pair you can imagine. So in S3, objects are stored in buckets and are assigned a name, which is considered the object’s key. And it’s this key, along with the version ID for a particular object, that serves to uniquely identify that object within S3. So you can essentially think of S3 as a huge key-value store, where the object’s name is the key and its value is the data itself. And each object’s associated metadata is then maintained in a separate key-value store also associated with that object.
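
To make that concrete, here’s a small sketch (hypothetical bucket, key, file, and metadata) that stores user-defined metadata at upload time and reads it back without downloading the object itself:

    import boto3

    s3 = boto3.client("s3")

    # Store an object with user-defined metadata (arbitrary key-value
    # pairs). Bucket, key, file, and metadata values are hypothetical.
    s3.put_object(
        Bucket="my-example-bucket",
        Key="photos/sunset.jpg",
        Body=open("sunset.jpg", "rb"),
        Metadata={"camera": "nikon-d850", "location": "maui"},
    )

    # HEAD the object to read metadata without fetching the data.
    head = s3.head_object(Bucket="my-example-bucket", Key="photos/sunset.jpg")
    print(head["Metadata"])        # user-defined key-value pairs
    print(head["LastModified"])    # system-defined metadata
    print(head.get("VersionId"))   # key + version ID uniquely identify the object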

So that was a quick overview of object storage with S3. And if you’re interested in learning more about S3, I encourage you to check out this learning path. But I want to wrap up this lecture by discussing performance factors in S3 along with some design patterns to help you achieve the best possible performance within your applications that use S3.

So when we talk about improving performance with S3, we’re generally interested in the speed and reliability with which we can put objects into S3 and retrieve objects from S3. Now it should go without saying that if you have an application with EC2 instances that communicate with S3, your EC2 instances should reside in the same region as your S3 buckets. And this is not only to give you the best performance but to keep your data transfer costs down as well.

Now if your requirements dictate that you must upload data to an S3 bucket in a region that is not geographically close to you, that added distance can increase latency and have a significant impact on performance. So the best way to mitigate this is to leverage S3 Transfer Acceleration, which uses CloudFront edge locations to give you the fastest upload path between your location and your S3 bucket using AWS network infrastructure. To learn more about S3 Transfer Acceleration, check out the AWS documentation here.
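
As an illustration, Transfer Acceleration is a one-time bucket setting, and the SDK can then be pointed at the accelerate endpoint; the bucket and file names are placeholders:

    import boto3
    from botocore.config import Config

    # Enable Transfer Acceleration on the bucket (a one-time setting).
    # The bucket name is a hypothetical placeholder.
    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket="my-example-bucket",
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # Create a client that routes requests through the accelerate endpoint
    # (bucketname.s3-accelerate.amazonaws.com) via the nearest edge location.
    s3_accel = boto3.client(
        "s3", config=Config(s3={"use_accelerate_endpoint": True})
    )
    s3_accel.upload_file("big-dataset.tar.gz", "my-example-bucket",
                         "big-dataset.tar.gz")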

And speaking of CloudFront, if you’re hosting a static website within S3, you should definitely stand up a CloudFront distribution in front of that website to enable caching, reduce latency, and improve your site’s overall performance. To learn more about using CloudFront with S3, check out this hands-on lab.
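
For instance, a distribution in front of an S3 static website endpoint might be stood up like this; the bucket website endpoint, region, and cache policy choice are assumptions for illustration:

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")

    # The S3 static website endpoint is treated as a custom (HTTP) origin.
    # Bucket name and region below are hypothetical.
    origin_domain = "my-site-bucket.s3-website-us-east-1.amazonaws.com"

    response = cloudfront.create_distribution(
        DistributionConfig={
            "CallerReference": str(time.time()),  # any unique string
            "Comment": "CDN in front of an S3 static website",
            "Enabled": True,
            "Origins": {
                "Quantity": 1,
                "Items": [{
                    "Id": "s3-website-origin",
                    "DomainName": origin_domain,
                    "CustomOriginConfig": {
                        "HTTPPort": 80,
                        "HTTPSPort": 443,
                        "OriginProtocolPolicy": "http-only",
                    },
                }],
            },
            "DefaultCacheBehavior": {
                "TargetOriginId": "s3-website-origin",
                "ViewerProtocolPolicy": "redirect-to-https",
                # ID of the AWS managed "CachingOptimized" cache policy.
                "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
            },
        },
    )
    print(response["Distribution"]["DomainName"])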

Another thing you can do to improve upload performance is to leverage multipart uploads in S3. Multipart uploads are required for any objects that are over 5 gigabytes in size, but are also a good idea to use when uploading any objects over 100 megabytes in size. And that’s because you can issue multiple PUT requests in parallel to improve throughput and overall performance. To learn more about Multipart Upload, check out the AWS documentation here.
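
For example, boto3’s managed transfers will switch to parallel multipart uploads past a configurable threshold; the file and bucket names here are hypothetical:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Use multipart uploads for anything over 100 MB, with up to 10 parts
    # uploaded in parallel. Names below are hypothetical.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,  # 100 MB
        multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
        max_concurrency=10,                     # parallel PUT requests
    )

    s3.upload_file("video-archive.mp4", "my-example-bucket",
                   "archives/video-archive.mp4", Config=config)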

It’s important to think of S3 as a large distributed system rather than a single endpoint that might be subject to bottlenecks like a disk drive or file share. Because of this, just as we can issue multiple concurrent PUT requests for our multipart uploads, we can also improve our download performance by issuing multiple concurrent GET requests. To do this, we add an HTTP header called Range to our GET requests to perform what’s called a byte-range fetch, downloading multiple smaller chunks of an object at the same time. And since there’s no limit on the number of concurrent connections you can make to your S3 bucket, this can massively increase your download throughput. To learn more about downloading objects from S3, check out the AWS documentation here.
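
Here’s a minimal sketch of that pattern (hypothetical bucket and key), fetching 16 MB byte ranges in parallel and reassembling them in order:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client("s3")
    BUCKET = "my-example-bucket"                # hypothetical names
    KEY = "archives/video-archive.mp4"
    CHUNK = 16 * 1024 * 1024                    # 16 MB per ranged GET

    # Find the object's total size, then fetch byte ranges in parallel.
    size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

    def fetch(offset):
        end = min(offset + CHUNK, size) - 1     # Range is inclusive
        resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                             Range=f"bytes={offset}-{end}")
        return resp["Body"].read()

    with ThreadPoolExecutor(max_workers=10) as pool:
        chunks = pool.map(fetch, range(0, size, CHUNK))

    data = b"".join(chunks)                     # reassemble in order

And that will wrap up our discussion of different performance factors with AWS object storage.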

About the Author
Students: 27,799
Courses: 23
Learning Paths: 11

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.