AWS Data Pipeline
In course one of the AWS Big Data Specialty Data Collection learning path, we explain the various data collection methods and techniques for determining the operational characteristics of a collection system. We explore how to define a collection system able to handle the frequency of data change and the type of data being ingested. We identify how to enforce data properties such as order, data structure, and metadata, and how to ensure the durability and availability of our collection approach.
- Recognize and explain the operational characteristics of a collection system.
- Recognize and explain how a collection system can be designed to handle the frequency of data change and the type of data being ingested.
- Recognize and identify properties that may need to be enforced by a collection system.
This course is intended for students looking to increase their knowledge of data collection methods and techniques with big data solutions.
While there are no formal prerequisites, students will benefit from having a basic understanding of the analytics services available in AWS. Please take a look at our Analytics Fundamentals for AWS course.
This Course Includes
- 45 minutes of high-definition videos
- Live hands-on demos
What You'll Learn
- Introduction to Collecting Data: In this lesson, we'll prepare you for what we'll be covering in the course: the Big Data collection services of AWS Data Pipeline, Amazon Kinesis, and AWS Snowball.
- Introduction to Data Pipeline: In this lesson, we'll discuss the basics of Data Pipeline.
- AWS Data Pipeline Architecture: In this lesson, we'll go into more detail about the architecture that underpins the AWS Data Pipeline Big Data Service.
- AWS Data Pipeline Core Concepts: In this lesson, we'll discuss how we define data nodes, access, activities, schedules, and resources.
- AWS Data Pipeline Reference Architecture: In this lesson, we'll look at a real-life scenario of how Data Pipeline can be used.
- Introduction to AWS Kinesis: In this lesson, we'll take a top-level view of Kinesis and its uses.
- Kinesis Streams Architecture: In this lesson, we'll look at the architecture that underpins Kinesis.
- Kinesis Streams Core Concepts: In this lesson, we'll dig deeper into the data records.
- Kinesis Streams Firehose Architecture: In this lesson, we'll look at the Firehose architecture and the differences between it and Amazon Kinesis Streams.
- Firehose Core Concepts: Let's take a deeper look at some details about the Firehose service.
- Kinesis Wrap-Up: In this summary, we'll look at the differences between Kinesis and Firehose.
- Introduction to Snowball: Overview of the Snowball Service.
- Snowball Architecture: Let's have a look at the architecture that underpins the AWS Snowball big data service.
- Snowball Core Concepts: In this lesson, we'll look at the details of how Snowball is engineered to support data transfer.
- Snowball Wrap-Up: A brief summary of Snowball and our course.
Okay, let's have a look at the architecture that underpins the AWS Snowball big data service. AWS Snowball is a service that accelerates transferring large amounts of data into and out of AWS using physical storage appliances, bypassing the internet. Each AWS Snowball appliance type can transport data at faster than internet speeds. This transport is done by shipping the data in the appliance through a regional carrier. The appliances are rugged shipping containers complete with e-ink shipping labels.
AWS Snowball uses Snowball appliances and provides powerful interfaces that you can use to create jobs, transfer data, and track the status of your jobs through to completion. There are two types of Snowball appliances: the Snowball and the Snowball Edge. There are many options for transferring your data into AWS, and Snowball is intended for transferring large amounts of data. If you want to transfer less than 10 terabytes of data between your on-premises data centers and Amazon S3, Snowball might not be your most economical choice.
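To get a feel for where a threshold like this comes from, a rough back-of-the-envelope calculation can help. The Python sketch below estimates how long a dataset would take to upload over a network link. The function name, the decimal definition of a terabyte, and the 80% link utilization default are all assumptions made for illustration; they are not figures from the course or from AWS.

```python
SECONDS_PER_DAY = 86_400


def days_to_transfer(terabytes: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move `terabytes` over a `link_mbps` link.

    Assumptions (illustrative only): 1 TB = 10**12 bytes, and only
    `utilization` of the nominal link speed is actually available.
    """
    if terabytes < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("expected non-negative size, positive speed, utilization in (0, 1]")
    total_bits = terabytes * 10**12 * 8
    effective_bits_per_second = link_mbps * 10**6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for size_tb in (1, 10, 80):
        days = days_to_transfer(size_tb, link_mbps=100)
        print(f"{size_tb} TB over a 100 Mbps link: about {days:.1f} days")
```

Under these assumptions, transfer time grows linearly with data size, so the point at which shipping an appliance beats the network depends heavily on the bandwidth you actually have available.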
Snowball uses Snowball appliances shipped through your regional carrier. Each Snowball appliance is protected by AWS Key Management Service (AWS KMS) and made physically rugged to secure and protect your data while the Snowball is in transit. In the US regions, Snowballs come in two sizes, 50 terabytes and 80 terabytes. All other regions have 80 terabyte Snowballs only. Snowballs include a 10GBase-T network connection, with both RJ45 and SFP+ interfaces (the latter with either fiber or copper), to minimize the transfer time. Before we go into each of the options in detail, let's have a quick look at how AWS makes things easier for you.
One of the great things about AWS is that they always try to make things easy for you. The first thing we need to define when creating a new Snowball job is whether we're importing data from our on-premises data center into Amazon S3, exporting data from Amazon S3 to our on-premises data center, or using Snowball for local storage. Next, you need to define the address that the AWS Snowball appliance will be shipped to. Depending on the country you live in, you might also be presented with a number of shipping options, for example, the two in the bottom right-hand side of the screen.
Next, you set which S3 bucket you wish your data to be moved into when the AWS Snowball appliance arrives back at Amazon. Then you set the security settings you require. Last, you can configure whether AWS Snowball sends you notifications when the job status changes, and choose which statuses you would like to be notified about.
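The same choices made in the console wizard can also be collected programmatically. The sketch below is a minimal illustration, not the course's own method: it only builds a request dictionary and does not call AWS. The field names are modeled on the Snowball `CreateJob` API as we understand it, while the helper name `build_snowball_job_request` and every ARN and ID in the usage example are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional

# Job types corresponding to the three choices in the wizard:
# import into S3, export from S3, or local storage use.
VALID_JOB_TYPES = {"IMPORT", "EXPORT", "LOCAL_USE"}


def build_snowball_job_request(
    job_type: str,
    address_id: str,
    bucket_arn: str,
    kms_key_arn: str,
    role_arn: str,
    shipping_option: str = "SECOND_DAY",
    sns_topic_arn: Optional[str] = None,
    states_to_notify: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the wizard's answers into one request dictionary."""
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"job_type must be one of {sorted(VALID_JOB_TYPES)}")

    request: Dict[str, Any] = {
        "JobType": job_type,                                       # step 1: import, export, or local use
        "AddressId": address_id,                                   # step 2: shipping address
        "ShippingOption": shipping_option,                         # step 2: shipping speed
        "Resources": {"S3Resources": [{"BucketArn": bucket_arn}]}, # step 3: S3 bucket
        "KmsKeyARN": kms_key_arn,                                  # step 4: security settings
        "RoleARN": role_arn,
    }

    # Step 5: optional notifications on job status changes.
    if sns_topic_arn is not None:
        states = list(states_to_notify or [])
        request["Notification"] = {
            "SnsTopicARN": sns_topic_arn,
            "JobStatesToNotify": states,
            "NotifyAll": not states,  # no explicit list means notify on every status
        }
    return request


if __name__ == "__main__":
    # All identifiers below are made-up placeholders.
    example = build_snowball_job_request(
        job_type="IMPORT",
        address_id="ADID-example",
        bucket_arn="arn:aws:s3:::example-bucket",
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example",
        role_arn="arn:aws:iam::111122223333:role/example-snowball-role",
        sns_topic_arn="arn:aws:sns:us-east-1:111122223333:example-topic",
        states_to_notify=["InTransitToCustomer", "Complete"],
    )
    print(example)
```

If you have boto3 installed and credentials configured, a dictionary like this could in principle be passed as keyword arguments to the Snowball client's `create_job` call, but check the current API reference first, since required fields and allowed values may differ from this sketch.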
About the Author
Shane has been immersed in the world of data, analytics and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased costs. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you to create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and Kiwifruit. However, you're more likely to see him partake of a good long black or an even better craft beer.