Data management is a key part of the infrastructure of most organizations, especially those dealing with large data stores. For example, imagine a team involved in the scientific analysis of data: they probably need a system to store the raw data, another to analyze chunks of that data quickly and cost-efficiently, and a long-term archive to keep both the raw data and the results of their computations. In cases like this, it's important to deploy an automated system that can move data efficiently and includes automatic backups.
In this course, the experienced System Administrator and Cloud Expert David Clinton will walk through implementing such a data management and backup system using EBS, S3, and Glacier, taking advantage of the S3 Lifecycle feature and of Data Pipeline to automate data transfers among the various pieces of the infrastructure. This system can be enabled easily and cheaply, as shown in the last lecture of the course.
Who should take this course
This is a beginner-to-intermediate course, so some basic knowledge of AWS is expected. A basic knowledge of programming is also needed to follow along with the Glacier lecture. In any case, even complete newcomers to these topics should be able to grasp at least the key concepts.
If you want to learn more about the AWS solutions discussed in this course, you might want to check out our other AWS courses. Also, if you want to test your knowledge of the basic topics covered in this course, we strongly suggest trying our AWS quiz questions. You will learn more about every single service cited in this course.
If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.
Hi, and welcome to CloudAcademy.com's video series on data management on Amazon Web Services instances.
In this video, we're going to discuss creating and accessing data on S3, Amazon's cloud storage service. Let's create an S3 bucket.
Click on Actions, then Create Bucket. We'll choose a unique name, select the region closest to where we're going to be doing most of our work, and then click Create. Let's click on our new bucket and create a folder. We'll call it my data. Select my data and open it.
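For reference, the same bucket and folder structure could also be created from the AWS CLI, which we'll install and configure in a moment. This is a minimal sketch, assuming an illustrative bucket name of ouruniquename678 and a folder key of mydata/ (the trailing slash is what makes the console display it as a folder):

    aws s3 mb s3://ouruniquename678 --region ap-northeast-1
    aws s3api put-object --bucket ouruniquename678 --key mydata/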
Select Actions again, and let's upload a file from our own computer to this folder. Upload the file called information, click Start Upload, and we now have this data file, information, in the folder my data, in the bucket called our unique name 678.

One more thing we should really do before we leave the dashboard: click on our account name, click on Security Credentials, ignore the important security message for now, and click on Access Keys. Here you can create a new access key, which consists of an access key ID and a secret access key. Either way, you need your access key ID and your secret access key in order to gain access to your data from the command line interface.

Now let's explore accessing the data that's in our S3 bucket. The first method we'll use can be run from any Linux command line anywhere attached to the internet. We use the command wget, followed by http://s3.amazonaws.com, which is the endpoint everybody would use no matter which bucket they're trying to access, and then /bucketname/foldername/filename. In our example, to retrieve the file that we just uploaded to the S3 bucket, we would type wget and s3.amazonaws.com, then our unique name 678, which is the name we gave to our bucket, /my data, which is the folder we created in this bucket, and /information, which is the file we uploaded to the bucket. The file would then be downloaded to whichever directory on whichever computer this command was run from.

We could also make use of a wide range of tools on an Amazon instance command line through the awscli package.
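To make that concrete: assuming our bucket was actually named ouruniquename678, the folder key is mydata (no space), and the object has been made publicly readable, the wget command would look something like this:

    wget http://s3.amazonaws.com/ouruniquename678/mydata/information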
That's the Amazon Web Services CLI, or command line interface. We have to install it first, using sudo apt-get install awscli. Then we have to configure the AWS CLI package to recognize our credentials, and here's where the access key ID and secret access key that we saved from our dashboard will come in handy. Type aws configure on the command line, and you'll be asked a series of questions. The first two require you to fill in or paste your AWS access key ID and your AWS secret access key. The other questions you can answer or simply leave blank; our example will work either way. Once we've saved this configuration, we can turn to accessing or manipulating the data in the bucket itself from the EC2 instance command line.
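Put together, the installation and configuration steps look roughly like this on an Ubuntu instance (the key values here are placeholders; the region and output format prompts can be left blank):

    sudo apt-get install awscli
    aws configure
    AWS Access Key ID [None]: <your access key ID>
    AWS Secret Access Key [None]: <your secret access key>
    Default region name [None]:
    Default output format [None]: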
The syntax for copying a file from the S3 bucket to the current EC2 instance goes as follows: aws s3 cp, where cp stands for copy and s3 tells the command line that we're looking for a file somewhere in the S3 system. Then s3://mybucket/myfolder/myfile, and then mycopiedfile.ext, or whatever name you'd like the file to have on your system. So in our case, let's say we're going to copy the file information that we uploaded previously into the local directory: aws s3 cp s3://, then our unique name 678, which is the name of the bucket, /my data, which is the folder, /information, the name of the file, then a space and a dot.
The dot tells the aws command to copy that file to the current directory, whichever directory that happens to be. Then, and this is very important, you have to tell AWS which region the S3 bucket lives in: --region ap-northeast-1, in our case.
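Assuming the same illustrative names as before (ouruniquename678 for the bucket and mydata for the folder), the full download command would be something like:

    aws s3 cp s3://ouruniquename678/mydata/information . --region ap-northeast-1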
Let's now copy a file from our instance, our EC2 instance, to the S3 bucket. In this case, the syntax is just the reverse.
aws s3 cp, then the name and location of the file you wish to copy, then s3://mybucket/myfolder/myfile. In our case, that would be aws s3 cp /home/ubuntu/information, let's say that's where the file information happens to live on our local system, and we're copying that to s3://our unique name 678/mydata/ to tell the program that this file should be deposited in the my data folder. And again, --region ap-northeast-1.
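Again with the illustrative names, the upload command would look roughly like:

    aws s3 cp /home/ubuntu/information s3://ouruniquename678/mydata/ --region ap-northeast-1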
David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.
Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.
Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.
His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.