Performance Factors Across AWS Storage Services

Object Storage

Difficulty: Intermediate
Duration: 22m
Description

In this course, I will discuss different performance factors across AWS storage services.

Learning Objectives

  • Identify the differences between block, object, and file storage, along with the benefits they offer and the proper use cases for each
  • Identify the services AWS offers for each of these storage types
  • Configure these storage types, and the options within each service, to optimize overall performance

Intended Audience

  • Anyone who needs to determine the right storage services and configurations to use within their AWS Cloud-based architectures to maximize performance and meet other architectural requirements

Prerequisites

  • Have a basic understanding of AWS storage services, as well as experience designing and implementing solutions in the AWS Cloud


Transcript

Hello, and welcome to this lecture, where I will discuss performance factors with AWS object storage. And of course, we’ll be using Amazon S3 for our object storage. So S3, and object storage in general, is great when you're dealing with large amounts of unstructured data. And it’s important to note, even if we’re storing what we would consider to be files in the traditional sense in S3, we’re still going to refer to them as objects that reside in object storage.

So while block storage splits your data into fixed-size chunks, or blocks, object storage stores your data as a single object. That means that unlike block storage, where updating data might only require a single block out of many to be rewritten, with object storage the entire object has to be rewritten for every update. Another difference between block and object storage is that while your block storage can be mounted as a drive within your operating system, you’ll need to use HTTP, HTTPS, or the AWS APIs via the CLI or an SDK to access your objects in S3.
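Since objects are reached over HTTP rather than through a mounted drive, every access is a web request. As a rough sketch, here is how the virtual-hosted-style HTTPS URL for an S3 object is built; the bucket name, key, and region below are hypothetical examples:

```python
# Objects in S3 are addressed by bucket, key, and region over HTTPS,
# not by a device path in the operating system.
from urllib.parse import quote

def s3_object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Return the virtual-hosted-style HTTPS URL for an S3 object."""
    # quote() percent-encodes characters like spaces, but leaves "/" intact
    # so key prefixes still read like folder paths.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"

print(s3_object_url("my-example-bucket", "backups/db-2024.tar.gz"))
# With the AWS SDK (boto3), the same object would instead be fetched with
# s3.get_object(Bucket="my-example-bucket", Key="backups/db-2024.tar.gz")
```

The same object can also be reached through a path-style URL, but AWS has standardized on the virtual-hosted style shown here.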

Now one major advantage of object storage with S3 is that it’s inherently highly available and extremely durable. It’s also very low cost and infinitely scalable, far beyond the limits of a single EBS volume. And from a disaster recovery standpoint, it’s easy to configure cross-region replication to ensure your data is stored across multiple AWS regions, which makes it great for storing things like backups.

Now with object storage, the objects you’re storing will consist of more than just the data or files themselves. Each object will also include some sort of globally unique identifier for the object, along with some additional metadata. And this metadata can be everything from system-defined metadata such as creation date, or image format and resolution, to user-defined metadata, which could be any kind of key-value pair you can imagine. So in S3, objects are stored in buckets and are assigned a name, which is considered the object’s key. And it’s this key, along with the version ID for a particular object, that serves to uniquely identify that object within S3. So you can essentially think of S3 as a huge key-value store, where the object’s name is the key and its value is the data itself. And each object’s associated metadata is then maintained in a separate key-value store also associated with that object.
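The key-value analogy above can be sketched as a toy in-memory model. This is only an illustration of the idea — two parallel stores keyed by (object key, version ID) — not how S3 is actually implemented, and all names and values are made up:

```python
# Toy model of the transcript's analogy: one key-value store for object
# data, and a parallel one for each object's metadata.
objects = {}    # (key, version_id) -> object data
metadata = {}   # (key, version_id) -> metadata key-value pairs

def put_object(key, data, version_id, user_metadata=None):
    """Store an object and record system- plus user-defined metadata."""
    objects[(key, version_id)] = data
    metadata[(key, version_id)] = {
        "content-length": len(data),       # system-defined metadata
        **(user_metadata or {}),           # user-defined key-value pairs
    }

put_object("photos/cat.jpg", b"\xff\xd8...", "v1", {"camera": "X100V"})
print(metadata[("photos/cat.jpg", "v1")])
```

The point of the model is that the key plus the version ID together uniquely identify an object, and metadata travels with it but lives in its own store.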

So that was a quick overview of object storage with S3. And if you’re interested in learning more about S3, I encourage you to check out this learning path. But I want to wrap up this lecture by discussing performance factors in S3 along with some design patterns to help you achieve the best possible performance within your applications that use S3.

So when we talk about improving performance with S3, we’re generally interested in the speed and reliability with which we can put objects into S3 and retrieve objects from S3. Now it should go without saying that if you have an application with EC2 instances that communicate with S3, your EC2 instances should reside in the same region as your S3 buckets. And this is not only to give you the best performance but to keep your data transfer costs down as well.

Now if your requirements dictate that you must upload data to an S3 bucket in a region that is not geographically close to you, that added distance can increase latency and have a significant impact on performance. So the best way to mitigate this is to leverage S3 Transfer Acceleration, which uses CloudFront edge locations to give you the fastest upload path between your location and your S3 bucket using AWS network infrastructure. To learn more about S3 Transfer Acceleration, check out the AWS documentation here.
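Transfer Acceleration is enabled per bucket and then used by swapping the bucket's regular endpoint for the accelerate endpoint. A minimal sketch of the two endpoint forms, with a hypothetical bucket name:

```python
# S3 Transfer Acceleration routes uploads through the nearest CloudFront
# edge location; clients opt in by using the accelerate endpoint.
def s3_endpoint(bucket: str, accelerated: bool = False) -> str:
    """Return the bucket endpoint, accelerated or standard."""
    host = "s3-accelerate.amazonaws.com" if accelerated else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"

print(s3_endpoint("my-media-uploads", accelerated=True))
# With boto3, the same switch is typically made via client config, e.g.
# boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```

Note that acceleration must first be turned on for the bucket itself; the endpoint switch alone is not enough.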

And speaking of CloudFront, if you’re hosting a static website within S3, you should definitely stand up a CloudFront distribution in front of that website to enable caching, reduce latency, and improve your site’s overall performance. To learn more about using CloudFront with S3, check out this hands-on lab.

Another thing you can do to improve upload performance is to leverage multipart uploads in S3. Multipart uploads are required for any objects that are over 5 gigabytes in size, but are also a good idea to use when uploading any objects over 100 megabytes in size. And that’s because you can issue multiple PUT requests in parallel to improve throughput and overall performance. To learn more about Multipart Upload, check out the AWS documentation here.
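The parallelism behind multipart uploads comes from splitting the object into independent parts, each uploaded with its own PUT. A sketch of computing those part boundaries, assuming a 100 MB part size to match the threshold mentioned above (in practice the part size is configurable, and each part maps to an UploadPart call):

```python
# Compute the byte ranges that would become the parts of a multipart
# upload. Sizes are in bytes; 100 MB is an assumed part size here.
PART_SIZE = 100 * 1024 * 1024

def part_ranges(object_size: int, part_size: int = PART_SIZE):
    """Yield (part_number, start, end_exclusive) for each upload part."""
    for part_number, start in enumerate(range(0, object_size, part_size), start=1):
        yield part_number, start, min(start + part_size, object_size)

# A 250 MB object becomes three parts: two full parts and a 50 MB remainder,
# all of which can be uploaded in parallel.
for part in part_ranges(250 * 1024 * 1024):
    print(part)
```

Each part can then be sent on its own connection, which is where the throughput gain comes from.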

It’s important to think of S3 as a large distributed system rather than a single endpoint that might be subject to bottlenecks like a disk drive or file share. And because of this, just like we can issue multiple concurrent PUT requests for our multipart uploads, we can also improve our download performance by issuing multiple concurrent GET requests. To do this, we can add an HTTP header called Range to our GET requests to use what’s called a byte-range fetch, which allows us to download multiple smaller chunks of an object at the same time. And since there is no limit to the number of concurrent connections you can make to your S3 bucket, this can massively increase your download throughput. To learn more about downloading objects from S3, check out the AWS documentation here. And that will wrap up our discussion of different performance factors with AWS object storage.
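The byte-range fetch described above boils down to issuing one GET per slice of the object. A sketch of building the Range header values for those concurrent requests, assuming a hypothetical 8 MB chunk size:

```python
# Build one HTTP Range header value per chunk of the object, so each
# chunk can be fetched with its own concurrent GET request.
def range_headers(object_size: int, chunk_size: int = 8 * 1024 * 1024):
    """Return Range header values like 'bytes=0-8388607' (ends inclusive)."""
    return [f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
            for start in range(0, object_size, chunk_size)]

print(range_headers(20 * 1024 * 1024))  # three ranges for a 20 MB object
# Each header accompanies its own GET (e.g. with boto3,
# s3.get_object(Bucket=..., Key=..., Range="bytes=0-8388607")), and the
# responses are concatenated in order to reassemble the object.
```

Note that HTTP Range values are inclusive on both ends, which is why each range ends one byte before the next begins.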

About the Author

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.