AWS Storage and Database





As we mentioned in the last lesson, AWS EC2 instances are full-fledged computers, but because they are elastic you can't rely on them to keep your data safe. As such, if you have any persistent data you need to store, you'll need to connect some sort of persistent storage volume to your EC2 instances. Think of this as connecting a hard drive to your on-premises servers. AWS makes connecting data storage to EC2 instances easy, and offers a number of options, depending on the types of data you're storing. However, selecting between all these options can be confusing, so in this lesson we'll be using plain language to describe not only how each option works but also what types of use cases it's best for.

First, AWS data-storage services are grouped into two main service groups: AWS storage and AWS database. Generally, the services grouped under storage are more open-ended, while database services focus specifically on managing database software and storage. Confusingly, some of the storage services can also be used with database software, but in this case you'll have to manage the database yourself instead of allowing AWS to do it.

We'll start off talking about AWS storage, then move on to AWS database. As with other lessons, we'll hit only the most used and discussed services. If you'd like to learn about the details of all the possible services, you'll need to continue to more advanced Cloud Academy courses. There are four main AWS storage services you should know about: EBS, S3, EFS, and Glacier. Ugh, that's a mouthful. I feel that sometimes Amazon's love of acronyms can make things really hard for beginners. So let's break each of these down. EBS is Elastic Block Store, and it's just like your hard drive. S3 stands for Simple Storage Service, and it's specifically designed for storing files. EFS is Elastic File System, and it's an Amazon-built, Cloud-based file system. And Glacier is just Glacier, and it's a backup and cold-storage solution.

So, let's start with EBS, Elastic Block Store. Block storage simply refers to storage that acts like a hard drive. You get a block of storage that is a certain size and doesn't have any particular format, and you can do what you'd like with it, such as install a database or a file system. EBS is the most versatile of Amazon's storage offerings. You can connect an EBS volume to an EC2 instance, and your instance will treat it just like its onboard storage, only the storage won't disappear when the instance does. EBS is considered elastic because it's easy to add new volumes or even increase the size of your current volumes. But again, the data is persistent.
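To make "a block of storage with no particular format" a little more concrete, here is a minimal sketch in plain Python. It is a toy model, not the EBS API: the `BlockVolume` class, its method names, and the 512-byte block size are all assumptions made up for illustration. It only shows the idea of raw, numbered blocks that can be grown without losing what is already stored.

```python
class BlockVolume:
    """Toy stand-in for a block volume: raw, fixed-size blocks addressed by number."""

    def __init__(self, block_count: int, block_size: int = 512) -> None:
        self.block_size = block_size  # hypothetical size, chosen for illustration
        self._data = bytearray(block_count * block_size)

    @property
    def block_count(self) -> int:
        return len(self._data) // self.block_size

    def write_block(self, index: int, payload: bytes) -> None:
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        start = index * self.block_size
        # Pad with zero bytes so every block stays exactly block_size long.
        padded = payload.ljust(self.block_size, b"\x00")
        self._data[start:start + self.block_size] = padded

    def read_block(self, index: int) -> bytes:
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")
        start = index * self.block_size
        return bytes(self._data[start:start + self.block_size])

    def grow(self, extra_blocks: int) -> None:
        """Enlarge the volume in place; existing blocks keep their contents."""
        self._data.extend(bytes(extra_blocks * self.block_size))


volume = BlockVolume(block_count=4)
volume.write_block(0, b"hello")
volume.grow(4)  # the "elastic" part: more room, same data
```

The volume itself knows nothing about files or tables. Anything with structure, such as a file system or a database, is something you would layer on top of the raw blocks.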

S3 stands for Simple Storage Service. Simple Storage Service is an object-based storage system. The hard drive in your computer comes with a file system installed, which creates a hierarchy of folders inside of which you store your files, and each file has a unique name and location within the file system. In object-based storage, you do away with the folders, and instead only have files stored as objects. These objects contain both the data you want to store and metadata about the object, information that is very easy and fast to look up and read. Object storage works really well for storing media files, such as images, videos, or sound files, but isn't as good for storing data or files that will need to be updated frequently, because changing an object means replacing the whole object. As such, if you have a Web app, you might want to store your users' data on EBS, while storing the images for your app in S3.
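The object model described above can be sketched in a few lines of plain Python. This is a toy, in-memory illustration rather than the S3 API: `ObjectStore`, `StoredObject`, and the `put`, `get`, and `head` method names are hypothetical. It shows a flat set of keys, each holding data plus metadata, with no real folders.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy flat object store: one key per object, no folders."""

    def __init__(self) -> None:
        self._objects = {}  # key -> StoredObject

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def head(self, key: str) -> dict:
        """Return only the metadata, without reading the data."""
        return dict(self._objects[key].metadata)


store = ObjectStore()
# The slash is just a character in the key, not a directory.
store.put("images/logo.png", b"\x89PNG...", content_type="image/png")
```

Note that `put` on an existing key swaps out the entire object, which is why this style of storage suits files that are written once and read many times.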

Next comes EFS, or Elastic File System. Elastic File System is a managed network file system that Amazon built for use with EC2. Unlike an EBS volume, which is normally attached to one instance at a time, a single EFS file system can easily be mounted by multiple EC2 instances at once. As of the publication of this course, EFS is still fairly new and not available to all AWS users, but it is an exciting new technology from Amazon. If you're using EBS, it's worth looking into whether EFS is the right solution for your needs.

Now, glaciers are large, slow-moving blocks of ice. Oh, wait, that's the Alaska glacier, not the AWS one. The final storage service we'll talk about is Glacier. The services so far, EBS, EFS, and S3, are all designed for data that is in active use. Your Web app should be able to serve the data from these services directly to your users if that's what you want. Glacier, on the other hand, is primarily a data-backup service. It is really, really cheap to store your data on Glacier, but it's harder, slower, and more expensive to get it back out. This makes it ideal for data that users don't necessarily need access to but which you don't want to throw away yet. Data on Glacier is stored redundantly in multiple places by Amazon, so the risk of losing it is very low. Amazon also encrypts it at rest.

Those are the storage services you most need to know about. If you'd like to learn more, there's a course on Cloud Academy called Storage Fundamentals for AWS that covers them all in depth, as well as labs for S3 and EBS that will walk you through creating and setting up your storage on AWS. I actually recorded the videos for the labs for S3 and EBS, so you'll get to hear my voice again if you go through those.

So let's go on to AWS database. I don't want to get too deep into the differences between storage and databases, or we'll be here all day. Suffice it to say that a database gives you both a great way to store structured data and an engine that lets you write code to query that data, read it and write it, and do analytics on it. Databases are a great way to store plain-text data, but generally not the best way to store files, documents, media, and the like. This is an oversimplification, but will work for now. If you'd like to learn more, I'd recommend doing some research on databases on your own. There are four important database services for our purposes: RDS, Aurora, DynamoDB, and ElastiCache. Just one acronym this time, so we'll cover that service first.

Amazon RDS stands for Relational Database Service. As the name suggests, this gives you an easy and Amazon-managed way to run common relational databases, like MySQL or PostgreSQL. In relational databases, data is stored in tables, and there is a query language, that's the SQL part, for writing and reading data as well as comparing chunks of data to each other. Amazon RDS offers an easy way to manage your databases. Instead of running the database software on an EC2 instance and hosting the database itself on an associated EBS volume, you can simply create a database of the size and type that you want through RDS, and Amazon will take care of managing all the underlying details.
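If tables and a query language are new to you, a small sketch may help. The example below uses SQLite from Python's standard library purely as a local stand-in for a MySQL or PostgreSQL database of the kind RDS would manage. The `users` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL or PostgreSQL server.
conn = sqlite3.connect(":memory:")

# Data lives in tables with named columns.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO users (name, plan) VALUES (?, ?)",
    [("Ada", "pro"), ("Grace", "free"), ("Linus", "pro")],
)

# SQL lets you ask a question of the table instead of reading rows one by one.
pro_users = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE plan = ? ORDER BY name", ("pro",)
    )
]
conn.close()
```

With a managed service, the SQL you write looks much the same. What changes is that you are no longer the one installing, patching, and backing up the database server.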

Amazon Aurora is a MySQL-compatible database that Amazon has designed from the ground up to work on their Cloud servers. It's much faster and more reliable than running standard MySQL through EC2 or RDS, but also more expensive. You can only spin up large databases, meaning that Aurora is most appropriate for large production-ready applications. It's not a good fit for applications that are small, that are currently in testing, or that need to be run locally, because Aurora is Cloud-only software.

DynamoDB is, like Aurora, an Amazon-created, Cloud-based database. However, unlike Aurora or RDS, DynamoDB is a NoSQL database. This means that you can create document-based databases, which don't require the rigid table structure and datatypes that SQL requires. The details of how NoSQL works and the difficulties of using it in the Cloud would warrant a course unto itself, so let's just say that DynamoDB is an excellent Cloud-based NoSQL option for those looking for it. And if you're looking for it, you probably know.
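As a rough illustration of what "document-based" means, here is a toy table in plain Python. It is not the DynamoDB API or its SDK: the `DocumentTable` class and the sample attributes are hypothetical. The point is only that every item must carry the key attribute, while everything else can differ from item to item.

```python
class DocumentTable:
    """Toy document table: items share a key, not a fixed set of columns."""

    def __init__(self, key_name: str) -> None:
        self.key_name = key_name
        self._items = {}  # key value -> item dict

    def put_item(self, item: dict) -> None:
        if self.key_name not in item:
            raise ValueError(f"item is missing key attribute {self.key_name!r}")
        # Store a copy so later changes to the caller's dict do not leak in.
        self._items[item[self.key_name]] = dict(item)

    def get_item(self, key_value):
        item = self._items.get(key_value)
        return dict(item) if item is not None else None


users = DocumentTable(key_name="user_id")
# Two items in the same table with different attributes.
users.put_item({"user_id": "u1", "name": "Ada", "badges": ["early-adopter"]})
users.put_item({"user_id": "u2", "name": "Grace", "theme": "dark"})
```

In a relational table, adding `badges` or `theme` would mean changing the table definition for every row. Here each item simply carries the attributes it needs.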

Finally comes Amazon ElastiCache. While the other database options store data in long-term storage, such as hard drives or solid-state disks, ElastiCache works with two databases that store information in memory. Memory is very fast to access but is not persistent, making it good for some applications but not for long-term data storage. The most frequent use of these kinds of databases is to cache information from your storage that you know you'll need access to. This will speed up the access time by a lot. Another common use is for caching HTML generated by your app. This way, your app doesn't have to regenerate the page each time a new person loads it, if nothing has changed, and users will experience a much faster website. ElastiCache works with either Redis or Memcached, two different in-memory database systems.
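The HTML-caching idea can be sketched as follows. This is a minimal, assumption-heavy example: `PageCache` is a toy in-memory dictionary with an expiry time, standing in for Redis or Memcached, and `render_page` is a hypothetical placeholder for whatever slow work your app does to build a page. None of these names come from ElastiCache itself.

```python
import time


class PageCache:
    """Toy in-memory cache with a time-to-live, standing in for Redis or Memcached."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # stale entry: treat it as a miss
            return None
        return value

    def set(self, key, value) -> None:
        self._entries[key] = (time.monotonic() + self.ttl_seconds, value)


render_log = []  # records each real render so cache hits are visible


def render_page(path: str) -> str:
    """Hypothetical slow page render."""
    render_log.append(path)
    return f"<html><body>Page for {path}</body></html>"


def get_page(cache: PageCache, path: str) -> str:
    html = cache.get(path)
    if html is None:  # cache miss: do the slow work once, then remember it
        html = render_page(path)
        cache.set(path, html)
    return html


cache = PageCache(ttl_seconds=60)
first = get_page(cache, "/home")   # rendered and cached
second = get_page(cache, "/home")  # served from the cache
```

The second request never reaches `render_page`. Because the cache lives in memory, everything in it is lost on restart, which is fine here since the page can always be rendered again.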

So those are the most commonly used AWS database services. We weren't able to cover them in depth, so if you'd like to learn more, there's a Cloud Academy course called Database Fundamentals for AWS that's perfect for you, as well as some associated labs, which, again, I recorded, and which will walk you through actually setting up different AWS databases. Now that you've learned about the most important individual pieces of your AWS app, we'll walk you through how to network those pieces together in the next lecture, and how to manage your app and users in the lecture after that. Let's continue.

About the Author

Adrian M Ryan is an educator and product manager. He was an early employee at General Assembly, has co-founded an education startup and a consultancy, and he loves teaching. He grew up in rural Alaska, and while he now lives in New York City he makes sure to find time to get out in the woods hiking whenever possible.