Getting the tools ready
Data management automation
Data management is a key part of the infrastructure of most organizations, especially those dealing with large data stores. For example, imagine a team involved in scientific data analysis: they probably require one system to store the raw data, another to analyze chunks of data quickly and cost-efficiently, and long-term archival to keep both the raw data and the results of their computations. In cases like that, it's important to deploy an automated system that can move data efficiently and has automatic backups built in.
In this course, the experienced System Administrator and Cloud Expert David Clinton will talk about implementing such a data management and backup system using EBS, S3 and Glacier, taking advantage of the S3 Lifecycle feature and of Data Pipeline to automate data transfers among the various pieces of the infrastructure. This system can be enabled easily and cheaply, as is shown in the last lecture of the course.
Who should take this course
This is a beginner-to-intermediate course, so some basic knowledge of AWS is expected. A basic knowledge of programming is also needed to follow along with the Glacier lecture. Even so, complete newcomers to these topics should be able to grasp at least the key concepts.
If you want to learn more about the AWS solutions discussed in this course, you might want to check our other AWS courses. Also, if you want to test your knowledge of the basic topics covered in this course, we strongly suggest taking our AWS questions. You will learn more about every service cited in this course.
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Hi, and welcome to CloudAcademy.com's video series on Data management. In this video we're going to discuss how product pricing can impact your choice of Amazon services. Storage isn't necessarily the biggest expense you'll face with your Amazon activities. Data management is about data, of course, and data has to be stored. But there's also the transfer of data from an instance to Glacier, or from an instance to S3, or from an EBS volume back to an instance or back to your own computer at home or in the office. These transfers can actually make up the lion's share of Amazon charges at the end of the month, and sometimes you can be surprised.
It's not that Amazon is being dishonest. Everything is up front and clearly available on their website. But the billing system is very complex, and the things that you're going to be doing with your AWS services are often very complex too, so it's not always immediately apparent what exactly it's going to cost. So we're going to discuss a few scenarios that you should be aware of.
Storing your data, even a great deal of data, on your EC2 instance itself can be cheap: 1.3 cents per hour that the instance is actually running. When you terminate or even just shut down your instance, it's not costing you anything at all. That 1.3 cents is the least expensive rate, however. If you choose a storage-optimized instance type, that can cost you as much as $6.82 an hour. It may be worth it, or it may be that you should be looking for other data storage solutions.
EBS, that is, the Elastic Block Store volumes that we discussed in the previous video, costs you money whether you happen to have attached the volume to an instance or whether it's just sitting unused. It's storage, and Amazon will charge you for that storage: from 12.5 cents per gigabyte per month for storage alone, plus charges running from 6.5 cents per IOPS per month if you are configuring your own IOPS levels with provisioned IOPS. So it's not just the storage of the Elastic Block Store volume, but also the type of IOPS you're enjoying.
S3 usage can cost a remarkably low three cents per gigabyte per month for the first terabyte of data that you store. And that genuinely is a deal, considering that you have pretty much instant access for writes and reads to and from this storage. However, transfers aren't free: PUT, COPY, POST, or LIST requests cost half a penny per 1000 requests. That doesn't sound bad, but a busy website could actually go through 1000 requests in half an hour.
Multiply that by all the requests through the course of a month and it's no longer a trivial amount of money. Glacier archive and restore requests will, at this point anyway, cost five cents per 1000 requests.
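The request arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the per-1000 rates are the ones quoted in the lecture (AWS prices change), the traffic is assumed to be perfectly steady, and the function and constant names are made up for illustration.

```python
# Rates quoted in the lecture, in USD per 1,000 requests.
# AWS pricing changes over time, so treat these as assumptions.
S3_WRITE_LIST_PRICE_PER_1000 = 0.005   # PUT, COPY, POST, or LIST
GLACIER_REQUEST_PRICE_PER_1000 = 0.05  # archive and restore requests


def monthly_request_cost(requests_per_half_hour, price_per_1000, days=30):
    """Estimate a month's request charges for a steady request rate."""
    requests_per_month = requests_per_half_hour * 2 * 24 * days
    return requests_per_month / 1000 * price_per_1000


if __name__ == "__main__":
    # The lecture's "busy website": 1000 requests every half hour.
    s3_cost = monthly_request_cost(1000, S3_WRITE_LIST_PRICE_PER_1000)
    print(f"S3 requests for the month: ${s3_cost:.2f}")
```

The same function can be reused with the Glacier rate to see how a tenfold higher price per request changes the picture.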
But again, if you're making multiple requests per week, or per day, or per hour, that can add up. Transferring S3 data out to a region other than the one hosting this S3 service could cost you two cents per gigabyte for each transfer. Data transferred out to the internet, beyond the AWS service, will cost you 12 cents per gigabyte for up to 10 terabytes a month.
Data Pipeline activities and preconditions, that is, each activity or precondition that you program into Data Pipeline, will cost between 60 cents and $2.50 per month each. So if you don't have that many activities, just a couple of backups or some database processing, that really is a decent deal.
I should add that all these charges are rates that the Amazon website advertises right now. They could go up this week, or they could go down. In fact, recently there was a drop in many of the rates that Amazon was advertising. The bottom line is, you should be aware of all the possible charges that a particular profile of Amazon services could incur before you start.
So let's apply these ideas to a hypothetical real-world scenario. You're managing data analysis which requires pretty strong computing power and a fair amount of data storage. You've got no choice but to use EC2 for the computing, but the data you can move around for greater efficiencies. So let's say your EC2 instance is going to be a compute-optimized c3.4xlarge, and it's running 24 hours a day, 7 days a week. In a 30-day month, that's 720 hours. At 84 cents an hour that'll be $604.80, which is by far your greatest expense. You import the data using EBS volumes. Your average volume space for this hypothetical project is 25GB. You provision for 1000 IOPS so you get a more reliable and quick read/write response.
The provisioned IOPS would come to about $65, and your total for EBS is $68.12. Your S3 bucket, which you use for manipulating the data, gives an S3 total of $19.575. A Glacier archive of 150GB at one penny a gigabyte is $1.50.
Add your Data Pipeline usage, including three activities through the course of the month, and the total cost for your month of AWS services for this process is $695.195.
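As a rough check of this scenario, the line items can be added up in a short Python sketch. The rates are the ones quoted in the lecture. The S3 figure is taken as stated, because the lecture doesn't give the inputs behind it, and the Data Pipeline amount isn't stated at all, so the sketch only reports what the quoted total leaves over for it. All function and variable names are illustrative.

```python
from decimal import Decimal

# Figures quoted in the lecture (USD). Prices change; treat as assumptions.
EC2_HOURLY = Decimal("0.84")            # c3.4xlarge, per hour
EBS_PER_GB_MONTH = Decimal("0.125")     # provisioned IOPS volume storage
EBS_PER_IOPS_MONTH = Decimal("0.065")   # per provisioned IOPS
GLACIER_PER_GB_MONTH = Decimal("0.01")
S3_TOTAL_AS_STATED = Decimal("19.575")  # inputs not given in the lecture
STATED_GRAND_TOTAL = Decimal("695.195")


def known_line_items(hours=720, ebs_gb=25, iops=1000, glacier_gb=150):
    """Return the line items that can be reproduced from the quoted rates."""
    return {
        "EC2": EC2_HOURLY * hours,
        "EBS": EBS_PER_GB_MONTH * ebs_gb + EBS_PER_IOPS_MONTH * iops,
        "S3": S3_TOTAL_AS_STATED,
        "Glacier": GLACIER_PER_GB_MONTH * glacier_gb,
    }


def known_subtotal():
    """Sum of every line item except Data Pipeline."""
    return sum(known_line_items().values(), Decimal("0"))


if __name__ == "__main__":
    for name, amount in known_line_items().items():
        print(f"{name:8s} ${amount}")
    print(f"Subtotal ${known_subtotal()}")
    # Whatever the quoted total leaves for Data Pipeline (implied, not stated).
    print(f"Left for Data Pipeline ${STATED_GRAND_TOTAL - known_subtotal()}")
```

Using `Decimal` rather than floats keeps the cents exact. Note that the unrounded EBS figure works out to $68.125, which the lecture quotes as $68.12.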
David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.
Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.
Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.
His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.