In course one of the AWS Big Data Specialty Data Collection learning path, we explain the various data collection methods and techniques for determining the operational characteristics of a collection system. We explore how to define a collection system able to handle the frequency of data change and the type of data being ingested. We identify how to enforce data properties such as order, data structure, and metadata, and how to ensure the durability and availability of our collection approach.
- Recognize and explain the operational characteristics of a collection system.
- Recognize and explain how a collection system can be designed to handle the frequency of data change and the type of data being ingested.
- Recognize and identify properties that may need to be enforced by a collection system.
This course is intended for students looking to increase their knowledge of data collection methods and techniques with big data solutions.
While there are no formal prerequisites, students will benefit from having a basic understanding of the analytics services available in AWS. Please take a look at our Analytics Fundamentals for AWS course.
This Course Includes
- 45 minutes of high-definition videos
- Live hands-on demos
What You'll Learn
- Introduction to Collecting Data: In this lesson, we'll prepare you for what we'll be covering in the course: the Big Data collection services AWS Data Pipeline, Amazon Kinesis, and AWS Snowball.
- Introduction to Data Pipeline: In this lesson, we'll discuss the basics of Data Pipeline.
- AWS Data Pipeline Architecture: In this lesson, we'll go into more detail about the architecture that underpins the AWS Data Pipeline Big Data Service.
- AWS Data Pipeline Core Concepts: In this lesson, we'll discuss how we define data nodes, access, activities, schedules, and resources.
- AWS Data Pipeline Reference Architecture: In this lesson, we'll look at a real-life scenario of how AWS Data Pipeline can be used.
- Introduction to AWS Kinesis: In this lesson, we'll take a top-level view of Kinesis and its uses.
- Kinesis Streams Architecture: In this lesson, we'll look at the architecture that underpins Kinesis.
- Kinesis Streams Core Concepts: In this lesson, we'll dig deeper into the data records.
- Kinesis Streams Firehose Architecture: In this lesson, we'll look at the Firehose architecture and how it differs from Amazon Kinesis Streams.
- Firehose Core Concepts: Let's take a deeper look at some details about the Firehose service.
- Kinesis Wrap-Up: In this summary, we'll look at the differences between Kinesis Streams and Kinesis Firehose.
- Introduction to Snowball: An overview of the Snowball service.
- Snowball Architecture: Let's have a look at the architecture that underpins the AWS Snowball big data service.
- Snowball Core Concepts: In this lesson, we'll look at the details of how Snowball is engineered to support data transfer.
- Snowball Wrap-Up: A brief summary of Snowball and our course.
Welcome to Big Data on AWS. We are looking at collecting data with AWS Snowball. At the end of this module, you should be able to describe in detail how AWS Snowball can be used to collect data within a Big Data solution. You have already seen how you can use AWS Data Pipeline and Amazon Kinesis to collect and move data between a number of Big Data services.
Let's now have a look at how you can use AWS Snowball. Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS Cloud. With Snowball, you don't need to write any code or purchase any hardware to transfer your data.
Simply create a job in the AWS Management Console, and a Snowball appliance is automatically shipped to you. You then encrypt and transfer your files to the appliance at high speed, return the appliance to AWS, and AWS moves your data into Amazon S3 for you. When choosing a big data processing solution from the available AWS service offerings, it is important to determine whether you need the processing latency to be seconds, minutes, or hours.
This will typically drive the decision on which AWS service is best for that processing pattern or use case. AWS Snowball is a one-time data collection process, so none of the four processing patterns we typically use to classify AWS Big Data solutions apply. AWS Snowball can be used to move semi-structured and unstructured data. This is driven by S3 being the final storage location for the data you are transferring to AWS.
Structured data in the form of database backups can also be transferred, and if you have a technology solution that allows you to extract records from your databases as files, those files can also be included. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, and secure, and can cost as little as one-fifth as much as transferring the same data over high-speed Internet.
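To put the "long transfer times" point in perspective, here is a minimal Python sketch that estimates how many days it would take to push a given volume of data over a network link. The function name `network_transfer_days`, the decimal units, and the example inputs (100 TB over a 1 Gbps link at 80% sustained utilization) are illustrative assumptions, not figures from this course. Substitute your own volume and bandwidth.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_terabytes: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data over a network link.

    Hypothetical helper. Assumes decimal units (1 TB = 10**12 bytes,
    1 Gbps = 10**9 bits/s) and that the link sustains `utilization`
    of its nominal bandwidth for the entire transfer.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_terabytes * 10**12 * 8             # bytes -> bits
    effective_bps = link_gbps * 10**9 * utilization  # usable bits per second
    return bits / effective_bps / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative inputs only: 100 TB over a 1 Gbps link at 80% utilization.
    days = network_transfer_days(100, 1.0, 0.8)
    print(f"{days:.1f} days")
```

With these assumed inputs, the sketch reports roughly 11.6 days of uninterrupted transfer, before allowing for contention, retries, or bandwidth shared with other workloads. An estimate like this is worth comparing against the end-to-end time of an appliance-based transfer when deciding whether Snowball fits your data volume.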
Shane has been immersed in the world of data, analytics, and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased costs. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you to create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and kiwifruit. However, you're more likely to see him partake of a good long black or an even better craft beer.