In this Tablet Talk, Will discusses all the storage options that are available within AWS. This talk is perfect for you if you're looking for a 10,000-foot view of how each storage option functions within AWS and what it's good at.
- Learn about the different storage options available within AWS
- Determine which option works best for a few basic workloads
This is a high-level course that is intended for those who are new to storage and wish to get a general overview before delving into more detailed courses.
This course is open to anyone who wants to learn about storage on AWS!
Hello, my name is Will Meadows, and today we'll be doing a tablet talk on the different types of storage that are available within AWS. And this is important to understand because there are many different services that can be used for storage, and they each sort of offer a different modality and a reason why you'd want to use them. So how about we just get started.
So when we're trying to discuss storage within AWS, there's a number of different services here that are important to think about. The first one that most people usually start off with is S3, which is the simple storage service. Another common storage mechanism within AWS is EBS, the Elastic Block Store. This one would be commonly attached to an EC2 Instance, and that's normally where you would find it.
We have another type of storage which is EFS, the Elastic File System. Another bit of storage that you might be familiar with is FSx, which is very similar to EFS. And then the final little thing we'll probably talk about is just general database storage, what that all means, and its relationship to everything else here.
So opening this up, I think it's probably good to first sort of discuss this one right here, Amazon S3, what's its deal? So S3 is a type of storage called object storage. And what does that mean? Well, say we're trying to keep data within S3. Here's some data, it's filled with a bunch of information.
Anytime you go to save some data that's stored as object storage, you save the entire object at one time. So there's no just taking this little bit and sort of reuploading it, okay. You can't just upload a little bit of it, the whole thing has to go back up all at once. The whole thing has to be saved all at once when you go save it to the cloud or when you save it back into S3.
So what that means is, when you're doing something with object storage in general, and especially with S3, you want to Write, look at this huge W, Write once, read many times. Because it's so annoying to write to S3, in that you have to save the whole object. And that means for every little change, you have to reupload the full object, and so that uses up a bunch of your network throughput if you were gonna write multiple times.
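To make the whole-object rule concrete, here is a minimal Python sketch. `ToyObjectStore` is a hypothetical in-memory stand-in, not the S3 API, and the key name is made up for illustration; all it does is count how many bytes each save sends.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: objects are replaced whole."""

    def __init__(self):
        self._objects = {}        # key -> bytes
        self.bytes_uploaded = 0   # running total of bytes sent to the store

    def put(self, key: str, data: bytes) -> None:
        # There is no partial write: every put sends the complete object.
        self._objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ToyObjectStore()
article = b"x" * 1_000_000               # a 1 MB object
store.put("news/article.html", article)

# Change a single byte: download, edit locally, then upload the whole thing.
edited = bytearray(store.get("news/article.html"))
edited[0:1] = b"y"
store.put("news/article.html", bytes(edited))

print(store.bytes_uploaded)  # 2000000: a one-byte change cost a full re-upload
```

Reads, by contrast, cost nothing extra in this model, which is the reason the write once, read many pattern fits so well.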
So that's why this Write once, read many is kind of the philosophy with S3. So if we have that philosophy with S3, what workloads does that work with? Well, something that is, again, Write once, read many would be like a webpage. So if you have your computer here and you have a cool webpage, let's say it's the news, and it's got an article about this guy who invented the Venus trap as a robot, then that would be a good place to use S3. 'Cause maybe this image right here, you could store that up in S3, that could be your data. 'Cause everyone's gonna need to see that, and you're not gonna rewrite that image very often. And all of that text, this could be part of your website, like maybe this was the HTML, right? Make my Hs a little bit more readable here.
This could be something that would be, you write it once and then it gets read hundreds of thousands of times by people, and so this could be your main index page. So this would be a really good example of what to use S3 for, things that people are gonna access a lot that you're only gonna occasionally change, and it's usually images, it's static web content, it could be videos that you post for streaming of some sort.
S3 is a really good repository for things that you need to hold onto for a long time and don't expect to change very often, though it's okay if they change every now and then, and that you expect other people to need to read fairly frequently. All right, but what if we have the situation where we do need to modify little bits of these files very often? Like maybe this type of information gets changed a lot, or we have many different types of files that need to have little subsets saved and changed, and we don't wanna have to keep uploading the entire object all at once. What's a better solution? Well, how about we take a look at EBS.
Here in EBS land, boop, we're here now, we're gonna try to resolve the problem that S3 has. So EBS is something called block storage. And this type of storage is something you're already very familiar with, 'cause this is what your computer uses. It doesn't use EBS, but it uses a type of block storage. There we go, vaguely readable. And this is where your systems can open up a file and access maybe just this small little corner of the file, right?
It brings it in, it changes that three into a 42, and then it pops it right back into the file, now we've modified it. And that's the only thing that changed about it. And this system, if you're using EBS, doesn't require you to upload the whole object, no, it can just change that one little bit and save the file and be totally fine. One of the caveats with EBS though, is that it must be attached to an EC2 Instance.
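The in-place edit described above can be sketched with an ordinary file. This is a minimal example under stated assumptions: a local temporary file stands in for a file on an attached block volume, and the sizes and offset are arbitrary. Underneath, the operating system still writes in whole blocks, but the program only has to touch the part it changes.

```python
import os
import tempfile

# A local temporary file stands in for a file on an attached block volume.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:
    f.write(b"0" * 1_000_000)      # a 1 MB file

# Open for in-place update, jump to one offset, and overwrite two bytes.
with open(path, "r+b") as f:
    f.seek(500_000)
    bytes_written = f.write(b"42")

# Read the changed corner back to confirm the edit landed.
with open(path, "rb") as f:
    f.seek(500_000)
    changed = f.read(2)

size_after = os.path.getsize(path)
os.remove(path)

print(bytes_written, changed, size_after)  # 2 b'42' 1000000
```

Only two bytes were written, and the rest of the file never moved, which is the contrast with saving a whole object.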
So if something's attached to an EC2 Instance, it's not necessarily available for other people around the world to just get access to, unless you created some sort of pass through device or website in order to access that over the web. So that means EBS works particularly well as sort of a backend for a web server. It's connected here, this could be like your, again, your EC2 Instance.
And let's say that you're running some kind of game, let's say it's an online multiplayer pong game, why not? This is the ball, and it's bouncing back and forth. And EBS sits real nicely back here as some sort of storage for whatever application you're running on, things like locally keeping track of the state of the game or keeping track of how many users have connected to the server, something of that sort of ilk.
It's basically your hard drive for the EC2 Instance. Anything you would normally use the hard drive on your computer for, this EC2 Instance can use EBS for. It comes in sort of SSD flavors, it comes in HDD flavors, and it has all sorts of different IOPS that you can plan for, like how much you need to actually run the drive. And again, solid state is particularly good for sort of more random access and hard disks are good for throughput, you'll have to believe me that it says throughput.
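When planning around IOPS and throughput, it can help to remember that the two are tied together by the size of each operation. The sketch below shows only that arithmetic; the workload numbers are hypothetical and are not limits of any particular EBS volume type.

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput implied by an operation rate and a per-operation size."""
    return iops * io_size_kib / 1024


# Hypothetical workloads, chosen only to show the arithmetic.
small_random = throughput_mib_per_s(iops=3000, io_size_kib=16)        # many small reads
large_sequential = throughput_mib_per_s(iops=250, io_size_kib=1024)   # few large reads

print(small_random)      # 46.875
print(large_sequential)  # 250.0
```

A workload made of many small random operations is mostly limited by IOPS, while one made of a few large sequential operations is mostly limited by throughput.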
But we have a question now, since the EBS volume is sort of a one-to-one pairing per instance, how do I connect multiple instances if they all start to need to access the same files? Do I sort of just like hot swap this around with another EC2 Instance and just plug it back and forth, like erase that connection, just go pop, pop, pop, pop, pop really fast? No, no, there's a much better solution, and it's EFS.
All right, now we're over here in EFS land. So what is EFS? It is the Elastic File System, and its job is to allow you to connect a bunch of EC2 Instances to one shared network drive. It is a shared file system, and it allows each of these to access whatever data they want from it. And its whole goal is to be the easiest possible way to share files across multiple instances. You don't have to provision any amount of storage, you just put data in there and get charged for what you use.
It is a managed service, which means you don't have to worry about setting up any of the underlying infrastructure, it runs NFS and it's POSIX compliant. One thing that's important to note though, is that this is for Linux workloads only, and it's pretty dead simple. It doesn't do much besides what's on the tin. If you're looking for some more features, like maybe wanting to run some Windows workloads, or you wanted possibly some extra bells and whistles for your users, we can see that in the next service.
All right, so here we are in the next service, FSx. As far as I can tell it doesn't stand for anything. Let me double check. No, no, I double checked, I can't find out what FSx specifically stands for. Maybe it does and I'm going crazy and can't find it. That aside, FSx comes in two flavors, which is interesting, it has FSx for Windows File Server, and FSx for Lustre, neat. Now it's neat.
So when we talk about the service, there's this dividing line down the center, these two different flavors of it. It's also a network file system just like EFS, but it comes with a lot more bells and whistles.
So let's start off with FSx for Windows File Server over here. So as the name implies, this allows you to have a network file server that many different things can talk to. Boop, boop, boop, boop, boop. It's like a menorah. And the selling point is that it gives you some of the administrative features that you would get from Windows File Server, which is, you can have your users access this data through an Active Directory service. And having an Active Directory service allows you to authenticate your users to make sure you know who they are, it's me, and to make sure that they can access only the data that you've allowed them to.
So you have authentication and authorization. And this one is also a fully managed service, I'm making up this acronym right now, FMS, which means, again, you don't have to deal with any of the underlying infrastructure that's built into the service. It runs over the SMB protocol, and allows you to have cool things like user quotas and user file restore and other neat integrations that one might expect from a Windows File Server. And on this side, we have FSx for Lustre, the other half of the service. FSx for Lustre is for high performance compute workloads.
In general, it is a very high-performance network file share. It can have hundreds of thousands of concurrent compute users all connected into the system at once, they're all in here. And that's very important for some workloads like high performance compute, some ML stuff, some database analytics stuff, and it's all built on top of the open-source Lustre file system. It is, again, also for Linux only, so that's something to think about.
It offers much better performance than the Windows File Server version, but again, it doesn't have all those cool administrative bells and whistles that you might be looking for. It also has a super neat feature of not only being a network file share, but it also allows you to hook up an S3 bucket to it, which is amazing.
So what that means is, it will be able to see everything within S3 as if it was part of the network file share itself. And the way it does that is by creating sort of a false copy of it, or an icon or a reference, inside the file share, and any time you go to actually access that bit of data, it drags it in from S3 and puts it in the file share for you, so you can manipulate it locally. And then anytime you actually make a save to it, it'll reupload the whole object back into S3. And what makes this super unique is that you can use this to help save a lot of money.
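The reference-then-fetch behavior described above can be modeled in a few lines. This is a toy sketch under stated assumptions: `LazyFileShare` is a hypothetical class, a plain dictionary stands in for the S3 bucket, and the save step simply follows the description in the talk. How the real service writes changes back to S3 depends on how the file system is configured, so check the FSx documentation for that part.

```python
class LazyFileShare:
    """Toy file share that lists S3 objects up front but loads data on demand."""

    def __init__(self, bucket: dict):
        self.bucket = bucket                         # stand-in for an S3 bucket
        self.local = {key: None for key in bucket}   # placeholders, no data yet
        self.fetches = 0                             # how many objects were pulled in

    def listdir(self):
        # Every object is visible even though nothing has been copied.
        return sorted(self.local)

    def read(self, key: str) -> bytes:
        if self.local[key] is None:                  # first access: pull from the bucket
            self.local[key] = self.bucket[key]
            self.fetches += 1
        return self.local[key]

    def save(self, key: str, data: bytes) -> None:
        self.local[key] = data
        self.bucket[key] = data                      # the whole object goes back up


bucket = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
share = LazyFileShare(bucket)

print(share.listdir())   # ['a.csv', 'b.csv']
share.read("a.csv")
share.read("a.csv")
print(share.fetches)     # 1: the second read was served locally
share.save("a.csv", b"1,2,42")
print(bucket["a.csv"])   # b'1,2,42'
```

Note that `b.csv` is listed but never fetched, so in this model you only pay the transfer cost for data you actually touch.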
Let's say that you have all of your data in S3 already, 'cause you've been stockpiling all of your information there, and you want to run some sort of compute on the entire dataset. And so you spin up one of these high performance compute setups that allows you to blast through all of the information by bringing it locally, close to the compute cluster. And once the cluster is done performing whatever it is that you wanted, it can go ahead and save all the data back into S3 and then shut down this network file share, 'cause you don't need it anymore, you needed it only for that brief moment in time. That's pretty powerful. And this is also another fully managed service, so you don't have to worry about any of the underlying infrastructure.
All right, so that leaves us with the last place to go check out, which is just sort of generic database storage. And it's important to know when you actually need to use a database, when any of these things aren't really that effective. So a database isn't gonna help you store files per se, though it could, it's mostly recommended for user information, transactional information and things of that nature.
AWS offers a bunch of different types of databases that come in the NoSQL variety and the SQL variety. These include things like DynamoDB and things like RDS. There's like a total of 11 of these, so they say, different types of databases that do different things, so I'm not gonna go over each of them. But in general, your database is gonna be there to store information in a completely separate way than S3 would, than EBS, than EFS, and different than FSx.
It's gonna store all of your information generally in a table where you can perform queries on the data to try to find relationships between it. This could be like your name, this could be a number, this could be like a Boolean and whatever, this would be like customer one, customer two, customer three. There's also graph databases, where it's all about the relationship between various nodes and however well that works. And then there's sort of key-value databases. And then there's the cool quantum ledger database that helps create permanent records of transactions.
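The table idea can be tried out with Python's built-in `sqlite3` module. This is only a local sketch: the `customers` table, its columns, and the sample rows are made up for illustration, and SQLite stands in for a managed SQL database such as one you would run on RDS.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, orders INTEGER, active INTEGER)"
)

# One row per customer: a name, a number, and a Boolean stored as 0 or 1.
conn.executemany(
    "INSERT INTO customers (name, orders, active) VALUES (?, ?, ?)",
    [("Ada", 12, 1), ("Grace", 3, 0), ("Linus", 7, 1)],
)

# Query the table instead of reading files: active customers, busiest first.
rows = conn.execute(
    "SELECT name, orders FROM customers WHERE active = 1 ORDER BY orders DESC"
).fetchall()
conn.close()

print(rows)  # [('Ada', 12), ('Linus', 7)]
```

The point is the query: the database finds the matching rows for you, which is something none of the file or object storage options do on their own.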
So, I think that's a pretty good overview of all the different types of storage that are out there, and each one of these has very in-depth courses that can tell you more about the service than this brief little tablet talk has. Well, I hope that answered at least some of your questions about what's going on with each of these different types of storage mechanisms. If you have any other questions, feel free to send us an email at firstname.lastname@example.org or you can contact me directly at email@example.com, and we'll try our best to help you out. But until then, please keep learning and have a good one. Bye.
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.