
Sending backups to AWS


In this group of lectures we run a hands-on deployment of the next iteration of the Pizza Time solution. The Pizza Time business has been a success. It needs to support more customers and wants to expand to meet a global market.

We define our new solution, then walk through a hands-on deployment that extends our scalability, availability and fault tolerance.


Hi and welcome to this lecture.

In this lecture we talk about sending backups to AWS. We talk about Multipart uploads, Import/Export Snowball, and Storage Gateway.

Maybe you are not using AWS to run your apps or workloads. Maybe you only use AWS to hold your backup data, or maybe you just don't want to buy a lot of storage appliances and hardware to keep your data and your backups inside your company, so you decide to send your data to AWS. There are many possible reasons, and we won't focus on them here.

You have a lot of options. The most relevant for this use case are S3 and Glacier. You can also set up data replication between your servers: you can have a server running on premises and a copy of that server that continuously receives data through a data replication service. You can also send your data to the Storage Gateway service, which keeps snapshots of your data inside S3. Still, the first thing you will probably think of when talking about sending backups to AWS is sending your data to S3.

You can also send data to Glacier. But one thing you need to know is that you can't upload archives to Glacier using the AWS Management Console. That is one of those things you can only do using the AWS CLI or an SDK.

You can also combine the two. Maybe we are talking about operational backups here: you send your data to S3 because you think you might need it during the next 30 days, maybe for a quick restore. If you don't need that data after 30 days, you move it to Glacier. We saw how to configure a lifecycle rule a few lectures ago, and you can use one here: send your data to S3 and create a lifecycle rule to archive it to Glacier. But S3 and Glacier have some limits on how much data you can send in a single operation, and you might need to use Multipart uploads.
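As a rough illustration of the 30-day idea, here is a minimal sketch of what such a lifecycle rule could look like as a Python dictionary in the shape the S3 lifecycle API accepts. The `backups/` prefix, the rule ID, the helper name `glacier_after_days` and the bucket name in the comment are all assumptions made up for this example; they do not come from the course.

```python
import json


def glacier_after_days(prefix, days=30, rule_id="archive-backups"):
    """Build a lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class `days` days after they are created.

    The prefix and rule ID are illustrative values, not course settings.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_after_days("backups/")
print(json.dumps(config, indent=2))

# Applying the rule needs boto3 and AWS credentials, so it is not run here:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-backup-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

Building the configuration separately from the API call makes it easy to inspect or review the rule before it touches a real bucket.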

For example, with a single PUT operation in S3 you can upload objects up to five gigabytes in size. If you need to upload objects bigger than five gigabytes, you need to use Multipart upload. With the Multipart upload API, you can upload objects up to five terabytes, and you might recall that five terabytes is the maximum size a single object can have inside S3. So Multipart upload is the way to send the biggest file possible to S3. And you can't start a Multipart upload in the AWS Management Console; you need to use one of the SDKs that support it.
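To make those S3 numbers concrete, here is a small sketch that plans how a large object could be split into parts. The five-gigabyte and five-terabyte figures come from the lecture. The 5 MiB minimum part size and the 10,000-part cap are S3 limits that the lecture does not mention. The 64 MiB default, the binary units and the function name `plan_multipart` are assumptions for illustration only.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TiB   # largest object S3 stores
MIN_PART_SIZE = 5 * MiB     # applies to every part except the last one
MAX_PART_SIZE = 5 * GiB
MAX_PARTS = 10_000


def plan_multipart(object_size, preferred_part_size=64 * MiB):
    """Return (part_size, part_count) for uploading `object_size` bytes.

    The preferred part size is kept when possible and only raised when
    the object would otherwise need more than 10,000 parts.
    """
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that still fits the object into MAX_PARTS parts
    # (integer ceiling division avoids floating-point rounding).
    needed = -(-object_size // MAX_PARTS)
    part_size = max(preferred_part_size, needed, MIN_PART_SIZE)
    part_size = min(part_size, MAX_PART_SIZE)
    part_count = -(-object_size // part_size)
    return part_size, part_count


size, count = plan_multipart(6 * GiB)
print(f"6 GiB object: {count} parts of {size // MiB} MiB")
```

For a 6 GiB object this keeps the 64 MiB preference and yields 96 parts. For an object close to the five-terabyte ceiling, the part size grows so that the part count stays within 10,000.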

As for Glacier, in a single operation you can upload archives up to four gigabytes in size. That's for Glacier, not for S3. Using the Multipart upload API, you can upload archives up to about 40,000 gigabytes. So again, the Multipart API is the way to send the biggest archive possible to Glacier.
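The 40,000-gigabyte figure follows from two Glacier limits that the lecture does not spell out: a multipart upload can have at most 10,000 parts, and each part can be at most 4 GB, with part sizes restricted to 1 MB multiplied by a power of two. The sketch below works through that arithmetic and picks the smallest allowed part size for a given archive. The helper name `glacier_part_size` and the use of binary units are assumptions.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

MAX_PARTS = 10_000
MIN_PART = 1 * MiB
MAX_PART = 4 * GiB


def glacier_part_size(archive_size):
    """Smallest allowed part size (1 MiB times a power of two) that fits
    an archive of `archive_size` bytes into at most 10,000 parts."""
    if not 0 < archive_size <= MAX_PARTS * MAX_PART:
        raise ValueError("archive is empty or larger than 10,000 x 4 GiB")
    part = MIN_PART
    while part * MAX_PARTS < archive_size:
        part *= 2  # part sizes must stay a power of two times 1 MiB
    return part


max_archive_gib = MAX_PARTS * MAX_PART // GiB
print(max_archive_gib)                          # 40000
print(glacier_part_size(100 * GiB) // MiB)      # 16
```

So a 100 GiB archive needs parts of at least 16 MiB, and only an archive at the very top of the range needs the full 4 GiB parts.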

AWS also encourages you to use Multipart upload for uploads larger than 100 megabytes, because Multipart uploads perform better when uploading large files. So you might want to use it for any file larger than 100 megabytes.
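Putting the S3 thresholds together, the decision could be sketched like this. The 100-megabyte recommendation and the five-gigabyte single PUT limit are the figures from the lecture. Treating them as binary units, and the function name `upload_method`, are assumptions.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

MULTIPART_RECOMMENDED_ABOVE = 100 * MiB  # AWS recommendation from the lecture
SINGLE_PUT_LIMIT = 5 * GiB               # hard limit for one PUT to S3


def upload_method(size_bytes):
    """Classify an S3 upload of `size_bytes` by the thresholds above."""
    if size_bytes > SINGLE_PUT_LIMIT:
        return "multipart (required)"
    if size_bytes > MULTIPART_RECOMMENDED_ABOVE:
        return "multipart (recommended)"
    return "single PUT"


for size in (20 * MiB, 800 * MiB, 12 * GiB):
    print(size, upload_method(size))
```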

Another way to send data to AWS is the Import/Export service. These days, when I look at the AWS console, the service is simply called Snowball, but if you ever hear the name Import/Export, you will know it refers to this service. When you work with Snowball, you create a job, and within the job you choose whether to import data to AWS or export data from it.
In this case we are going to import data to AWS. We start a job, AWS sends us a Snowball, we copy the data to the Snowball, and then we ship it back to AWS. When you export, the data goes in the opposite direction: we create an export job, AWS sends us the Snowball appliance with our data on it, and we connect the appliance and copy the data to our servers. We can repeat that operation if we want, and once we have received all the data we want, we can finish the job.

So, in short, when you import data to AWS, you copy the data to the Snowball and send the Snowball to AWS. And there is no better way to move a lot of files. Let's say you want to send five terabytes of files somewhere: there is no better way to do that than putting those files on a hard drive, putting the hard drive in a truck, and shipping it to its final destination. When you send data to S3, Glacier, or any other AWS service over the network, you depend on your internet connection, and we know that networking can be a bottleneck if you want to move a lot of files fast. So there is no faster way to move a lot of files than shipping them on a hard drive to AWS. AWS receives the files and copies them to a destination bucket.

When you export, the data goes in the other direction: AWS copies the data to the Snowball appliance and ships it to you, and you are responsible for copying those files to your servers. AWS will be very careful with your data. They encrypt it all the way, and some might say that Werner Vogels himself is copying the data for you. So they will take very good care of your data.

This is what the Snowball hardware looks like. You can see that it's heavily reinforced, so it is very hard to break; this thing is really, really resistant. AWS also has a few suggestions on when to use it, depending on your available internet connection.

So you can see here that if you have, for example, a dedicated 100 Mbps internet connection, AWS recommends that you use the Import/Export Snowball service when you're sending five terabytes of data or more.
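To see why five terabytes is a reasonable cut-off, here is a back-of-the-envelope estimate of how long that transfer would take over the network. It assumes the connection figure means 100 megabits per second, uses decimal units, and assumes that only about 80% of the link is usable in practice. The 80% value and the function name `transfer_days` are illustrative assumptions, not AWS figures from the lecture.

```python
TB = 1000 ** 4      # decimal terabyte, in bytes
MBPS = 1_000_000    # one megabit per second, in bits per second


def transfer_days(data_bytes, link_bits_per_second, utilization=0.8):
    """Estimate the days needed to push `data_bytes` over a link,
    assuming a constant fraction `utilization` of it is usable."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 86_400  # seconds per day


days = transfer_days(5 * TB, 100 * MBPS)
print(f"{days:.1f} days")  # 5.8 days
```

Under these assumptions the upload takes almost six days of continuous transfer, which is roughly the point where shipping an appliance starts to look attractive.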

Now let's talk about the Storage Gateway service. With Storage Gateway, you can create volume gateways or virtual tape libraries. I'll talk about volume gateways first. With volume gateways, you can create cloud-backed storage volumes that you can mount as iSCSI devices.

And there are two types of volume gateway. First, you can have gateway-cached volumes. With gateway-cached volumes, the data is stored on AWS, and only the most frequently accessed data is kept available on your own on-premises network. So the gateway keeps a kind of cache inside your network and sends all the data to the Storage Gateway service, and the Storage Gateway service saves snapshots of that data inside S3.

Second, you can have gateway-stored volumes. With gateway-stored volumes, you have all the data inside your company and also on AWS, so AWS manages the sync of your data between your appliance and the cloud.

You can also have a gateway virtual tape library, or VTL. This might be useful if you want to replace physical tape libraries. That hardware looks monstrous, and even today a lot of companies still store their backups on tape. If you want to replace those tape libraries, you can use Storage Gateway. It creates a virtual appliance, so you won't need to buy lots of hardware, and it uses the same protocol that the big hardware uses. So you can keep the same software and the same workflows you've been using and just point the backups at the virtual tape library.

So basically this is how Storage Gateway works. You need to install the Storage Gateway VM appliance as a virtual machine inside your network, and you need to choose between gateway-cached and gateway-stored volumes. That choice makes a bit of a difference in this diagram.

And you will be creating iSCSI devices. For people who don't know what iSCSI is, it is network-attached block storage; it's more or less like an on-premises EBS. You see the data as block storage, as a hard drive, but you access it over the network.

You create iSCSI devices, your servers can read and write data on them, and you can have your desktop users read and write data directly on these iSCSI devices as well. The virtual machine appliance sends the data to the Storage Gateway service, and the Storage Gateway service stores the data as snapshots inside S3.

About the Author
Eric Magalhães
DevOps Consultant

Eric Magalhães has a strong background as a Systems Engineer for both Windows and Linux systems and currently works as a DevOps Consultant for Embratel. Lazy by nature, he is passionate about automation and anything that can make his job painless, so his interest in topics like coding, configuration management, containers, CI/CD and cloud computing went from a hobby to an obsession. He holds multiple AWS certifications and, as a DevOps Consultant, helps clients understand and implement the DevOps culture in their environments. He also plays a key role in the company, developing pieces of automation using tools such as Ansible, Chef, Packer, Jenkins and Docker.