Data management is a key part of the infrastructure of most organizations, especially those dealing with large data stores. For example, imagine a team involved in scientific data analysis: they probably need a system to store the raw data, another to analyze chunks of that data quickly and cost-efficiently, and long-term archival to keep both the raw data and the results of their computation. In cases like that, it's important to deploy an automated system that can move data efficiently, with integrated automatic backups.
In this course, the experienced System Administrator and Cloud Expert David Clinton will talk about implementing such a data management and backup system using EBS, S3, and Glacier, taking advantage of the S3 lifecycle feature and of Data Pipeline to automate data transfers among the various pieces of the infrastructure. This system can be set up easily and cheaply, as shown in the last lecture of the course.
Who should take this course
As a beginner-to-intermediate course, some basic knowledge of AWS is expected. A basic knowledge of programming is also needed to follow along with the Glacier lecture. In any case, even total newcomers to these topics should be able to grasp at least the key concepts.
If you want to learn more about the AWS solutions discussed in this course, you might want to check our other AWS courses. Also, if you want to test your knowledge of the basic topics covered in this course, we strongly suggest taking our AWS quiz questions. You will learn more about every service mentioned in this course.
If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.
Hi, and welcome to cloudacademy.com's video series on data management using the Amazon cloud. In this video we're going to discuss managing Glacier archives through Amazon's high-level API, specifically using Java. Glacier archives are very cheap, about one cent per gigabyte per month. The trade-off, as we mentioned before, is that there's very high latency: it can be hours after an upload before you can access the data, and it can be hours before a download is initiated. Also, archives on Glacier can only be accessed and managed via a programming API. Given those restrictions, however, Glacier can be a terrific platform for storing less important data, or data that's important but doesn't need immediate access. Let's take a look at some sample code that will upload an archive from your instance, or from your computer anywhere, to a Glacier vault. You will already have created a Glacier vault; we'll assume that you had no trouble doing that and that you gave it a unique name. It's important to remember that the name you give your vault must in fact be unique. The first, highlighted, section of the code is the imports, that is, those libraries that are installed with the AWS Toolkit for Eclipse, if you happen to be using Eclipse. These libraries are what actually run your code under the surface. The next string you will have to populate is the vault name, just so that your code knows where to look.
The vault name in this case is "myuniquename876"; this is just a vault I created in the previous video for demonstration purposes. You'll next populate the string archiveToUpload, so the program will know which archive is actually meant to be moved up to Glacier. It has to be the archive name in its absolute location, in this case under "/home/awscontrol", which would be the home directory of a user account on your system, whether it's on an Amazon instance or on your local computer. I just made up the name "awscontrol"; it doesn't have to be that name, obviously. And it might be an archive compressed using tar and gzip called "mydata". Your code will have to provide credentials; this is done through the ProfileCredentialsProvider variable, which is populated from the credentials file. This file is usually created and stored in the home directory of the user who created it, in a hidden directory called .aws (the leading dot is what makes it a hidden directory), in a file called credentials.
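For reference, a minimal ~/.aws/credentials file has roughly this shape (the two values shown here are placeholders, not real keys):

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```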
There you will replace the access key ID and secret access key placeholders with the values that you can get from your Amazon dashboard. Finally, the code actually does the task it was meant to achieve: first it establishes a connection to the Amazon vault you've created using your credentials, and then it uploads the archive you designated to the vault you designated. That's how we upload.
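To make that flow concrete, here is a minimal sketch along the lines of what the lecture describes, using the high-level Glacier API (ArchiveTransferManager) from the AWS SDK for Java 1.x. The class name, archive description, and file name are our own examples; the vault name and the awscontrol home directory are the demonstration values from the lecture, and the exact course code may differ:

```java
import java.io.File;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;
import com.amazonaws.services.glacier.transfer.UploadResult;

public class GlacierArchiveUpload {

    // The vault created in the previous video.
    public static String vaultName = "myuniquename876";
    // Absolute path to the tar/gzip archive to upload (example file name).
    public static String archiveToUpload = "/home/awscontrol/mydata.tar.gz";

    public static void main(String[] args) throws Exception {
        // Credentials are read from ~/.aws/credentials.
        ProfileCredentialsProvider credentials = new ProfileCredentialsProvider();

        // Connect to Glacier in the region hosting the vault (us-east-1 by default).
        AmazonGlacierClient client = new AmazonGlacierClient(credentials);
        client.setEndpoint("https://glacier.us-east-1.amazonaws.com/");

        // The high-level API handles chunking, checksums, and retries for us.
        ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
        UploadResult result = atm.upload(vaultName, "my archive", new File(archiveToUpload));

        // Keep this ID: it is required later to download or delete the archive.
        System.out.println("Archive ID: " + result.getArchiveId());
    }
}
```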
Downloading archives from a Glacier vault is a little different. You'll notice that there are different libraries imported before this code can be run. You'll also notice that you'll need the archive ID, that is, the ID that Amazon gave your archive when you uploaded it to your Glacier vault. We'll need to insert that to populate the string archiveId. You'll also need to provide the absolute download file path; in our case it's "/home/awscontrol/Downloads". One additional option, which is applicable both to uploads and downloads, is the endpoint, that is, which Amazon region is hosting your vault.
By default the region is set to "us-east-1". If, however, your vault is hosted in a different region, you can populate the Glacier client's setEndpoint value, in this case glacier.us-west-2, if that happens to be the location of your vault.
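Again as a rough sketch rather than the exact course code, a download along those lines with the same high-level API could look like the following. The extra SQS and SNS clients stand in for the "different libraries" mentioned above, since the transfer manager uses them to wait for Amazon's retrieval job to complete; the archive ID and the downloaded file name are placeholders of our own:

```java
import java.io.File;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;
import com.amazonaws.services.sns.AmazonSNSClient;
import com.amazonaws.services.sqs.AmazonSQSClient;

public class GlacierArchiveDownload {

    public static String vaultName = "myuniquename876";
    // The ID Amazon returned when the archive was uploaded (placeholder here).
    public static String archiveId = "*** your archive ID ***";
    // Absolute path where the retrieved archive will be written (example file name).
    public static String downloadFilePath = "/home/awscontrol/Downloads/mydata.tar.gz";

    public static void main(String[] args) {
        ProfileCredentialsProvider credentials = new ProfileCredentialsProvider();

        // Point all three clients at the region hosting the vault, here us-west-2.
        AmazonGlacierClient glacierClient = new AmazonGlacierClient(credentials);
        AmazonSQSClient sqsClient = new AmazonSQSClient(credentials);
        AmazonSNSClient snsClient = new AmazonSNSClient(credentials);
        glacierClient.setEndpoint("glacier.us-west-2.amazonaws.com");
        sqsClient.setEndpoint("sqs.us-west-2.amazonaws.com");
        snsClient.setEndpoint("sns.us-west-2.amazonaws.com");

        // The transfer manager starts the retrieval job, polls until it finishes
        // (which can take hours), and then saves the archive to the given file.
        ArchiveTransferManager atm =
                new ArchiveTransferManager(glacierClient, sqsClient, snsClient);
        atm.download(vaultName, archiveId, new File(downloadFilePath));

        System.out.println("Downloaded archive to " + downloadFilePath);
    }
}
```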
David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.
Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.
Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.
His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.