In course one of the AWS Big Data Specialty Data Collection learning path, we explain the various data collection methods and techniques for determining the operational characteristics of a collection system. We explore how to define a collection system able to handle the frequency of data change and the type of data being ingested. We identify how to enforce data properties such as order, data structure, and metadata, and how to ensure the durability and availability of our collection approach.
Intended audience: This course is intended for students looking to increase their knowledge of data collection methods and techniques with Big Data solutions.
While there are no formal prerequisites, students will benefit from having a basic understanding of the analytics services available in AWS.
Recommended courses:
- Analytics Fundamentals: https://cloudacademy.com/amazon-web-services/analytics-fundamentals-for-aws-course/
Learning Objectives:
- Recognize and explain the operational characteristics of a collection system.
- Recognize and explain how a collection system can be designed to handle the frequency of data change and type of data being ingested.
- Recognize and identify properties that may need to be enforced by a collection system.
This course includes:
- 45 minutes of high-definition videos
- Live hands-on demos
What You'll Learn:
- Introduction to Collecting Data: In this lesson we'll prepare you for what we'll be covering in the course: the Big Data collection services AWS Data Pipeline, Amazon Kinesis, and AWS Snowball.
- Introduction to Data Pipeline: In this lesson we'll discuss the basics of Data Pipeline.
- AWS Data Pipeline Architecture: In this lesson we'll go into more detail about the architecture that underpins the AWS Data Pipeline Big Data Service.
- AWS Data Pipeline Core Concepts: In this lesson we'll discuss how we define data nodes, access, activities, schedules and resources.
- AWS Data Pipeline Reference Architecture: In this lesson we'll look at a real-life scenario of how AWS Data Pipeline can be used.
- Introduction to AWS Kinesis: In this lesson we'll take a top-level view of Amazon Kinesis and its uses.
- Kinesis Streams Architecture: In this lesson we'll look at the architecture that underpins Kinesis.
- Kinesis Streams Core Concepts: In this lesson we'll dig deeper into the data records.
- Kinesis Streams Firehose Architecture: In this lesson we'll look at the Firehose architecture and how it differs from Amazon Kinesis Streams.
- Firehose Core Concepts: Let's take a deeper look at some details of the Firehose service.
- Kinesis Wrap-Up: In this summary we'll look at the differences between Kinesis Streams and Firehose.
- Introduction to Snowball: Overview of the Snowball Service.
- Snowball Architecture: Let's have a look at the architecture that underpins the AWS Snowball big data service.
- Snowball Core Concepts: In this lesson we'll look at the details of how Snowball is engineered to support data transfer.
- Snowball Wrap-Up: A brief summary of Snowball and our course.
Welcome to Big Data on AWS. We're looking at collecting data with Amazon Kinesis. At the end of this module, you will be able to describe in detail how Amazon Kinesis can be used to collect data within a Big Data solution. We have already seen how you can use AWS Data Pipeline to collect and move data between a number of Big Data services. Let's now have a look at how you can use Amazon Kinesis.
Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data so you can get timely insights and react quickly to new information. With Amazon Kinesis, you can ingest real-time data such as application logs, website clickstreams, IoT telemetry data, and more into your databases, data lakes, and data warehouses, or build your own real-time applications using this data. Amazon Kinesis enables you to process and analyze data as it arrives and respond in real time, instead of having to wait until all your data is collected before processing can begin. When choosing a big data processing solution from the available AWS service offerings, it is important to determine whether you need the process to respond within seconds, minutes, or hours. This latency requirement will typically drive the decision on which AWS service is best for that processing pattern or use case. Amazon Kinesis is primarily designed to deliver processing orientated around real-time streaming.
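To make the contrast between streaming and batch processing concrete, here is a minimal, service-agnostic Python sketch. It does not call Kinesis at all; the clickstream record shape and both helper names are hypothetical, chosen only to show a result that is updated as each record arrives rather than computed once after the whole data set has been collected.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_page_views(records: list[dict]) -> Counter:
    """Batch style: wait until every record is collected, then aggregate once."""
    return Counter(r["page"] for r in records)


def running_page_views(records: Iterable[dict]) -> Iterator[Counter]:
    """Streaming style: update the aggregate per record and emit it immediately."""
    counts: Counter = Counter()
    for record in records:
        counts[record["page"]] += 1
        # Yield a snapshot so a downstream consumer can react to this record now.
        yield counts.copy()


# Hypothetical clickstream events.
clicks = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]

for snapshot in running_page_views(clicks):
    print(dict(snapshot))

print(dict(batch_page_views(clicks)))
```

Both approaches end with the same totals; the difference is that the streaming version has a usable answer after every record, which is the property that makes seconds-level latency possible.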
One interesting thing we saw when looking at storage patterns is that, unlike many of the other AWS big data services, Amazon Kinesis does not itself store data persistently. Amazon Kinesis needs to be deployed as part of a larger solution in which you define a target big data service that will store the results of the stream processing. Note that each Amazon Kinesis Firehose delivery stream stores data records for up to 24 hours in case the delivery destination is unavailable, and a Kinesis stream stores records for 24 hours by default, which can be extended to retain the data for up to seven days.
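The stream retention rule above can be captured in a small validation helper. This is a hedged sketch: the 24-hour default and seven-day ceiling are the figures quoted in this module (check the current Kinesis documentation for today's limits), and `build_retention_request` is a hypothetical helper. The dictionary it returns matches the keyword arguments of boto3's Kinesis `increase_stream_retention_period` call (`StreamName`, `RetentionPeriodHours`).

```python
DEFAULT_RETENTION_HOURS = 24   # default retention quoted in this module
MAX_RETENTION_HOURS = 7 * 24   # seven-day ceiling quoted in this module


def build_retention_request(stream_name: str, hours: int) -> dict:
    """Validate a retention period and return kwargs for increase_stream_retention_period.

    Assumes the stream is still at the 24-hour default, so only values above
    24 and no greater than 168 hours count as a valid increase.
    """
    if hours <= DEFAULT_RETENTION_HOURS:
        raise ValueError(f"{hours}h is not above the {DEFAULT_RETENTION_HOURS}h default")
    if hours > MAX_RETENTION_HOURS:
        raise ValueError(f"{hours}h exceeds the {MAX_RETENTION_HOURS}h maximum")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


# Hypothetical usage (requires boto3, AWS credentials, and an existing stream):
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.increase_stream_retention_period(**build_retention_request("clickstream", 168))
```

Keeping the validation separate from the API call lets the limits be checked without AWS access, and makes it easy to update the constants if the service limits change.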
Amazon Kinesis can continuously capture and store terabytes of data per hour from hundreds or thousands of sources, such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events. Amazon Kinesis provides three different solution capabilities. Amazon Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs.
Amazon Kinesis Firehose enables you to load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. Amazon Kinesis Analytics enables you to write standard SQL queries on streaming data. We will cover Kinesis Streams and Firehose in this module, and Kinesis Analytics later in course four.
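The difference between the two capabilities covered in this module shows up on the producer side. The sketch below assumes boto3: for Kinesis Streams it builds arguments for `put_record` (`StreamName`, `Data`, `PartitionKey`), and for Firehose it batches records for `put_record_batch`, which accepts up to 500 records per call and reports partial failures in `FailedPutCount`. The stream names, event fields, and all helper functions are hypothetical illustrations, not part of the course material.

```python
import json
from typing import Iterator

MAX_FIREHOSE_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


# --- Kinesis Streams: a custom producer ---------------------------------------

def build_put_record_args(stream_name: str, event: dict, partition_key: str) -> dict:
    """Serialize an event to compact JSON bytes and package it as put_record kwargs."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": partition_key,
    }


def send_event(kinesis_client, stream_name: str, event: dict) -> dict:
    """Write one event, keyed by user so that a user's clicks land on the same shard."""
    args = build_put_record_args(stream_name, event, partition_key=event["user_id"])
    return kinesis_client.put_record(**args)


# --- Kinesis Firehose: batched delivery ----------------------------------------

def chunk_records(lines: list[str], size: int = MAX_FIREHOSE_BATCH) -> Iterator[list[dict]]:
    """Yield lists of Firehose records, each list holding at most `size` entries.

    A newline is appended to each line so records stay line-delimited after
    Firehose concatenates them into objects at the destination (e.g. Amazon S3).
    """
    for start in range(0, len(lines), size):
        yield [{"Data": (line + "\n").encode("utf-8")} for line in lines[start:start + size]]


def deliver(firehose_client, delivery_stream: str, lines: list[str]) -> int:
    """Send every line to a delivery stream and return how many records failed."""
    failed = 0
    for batch in chunk_records(lines):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=delivery_stream, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed


# Hypothetical usage (requires boto3, AWS credentials, and existing streams):
#   import boto3
#   send_event(boto3.client("kinesis"), "clickstream", {"user_id": "u-42", "page": "/home"})
#   failed = deliver(boto3.client("firehose"), "app-logs", ["GET /home 200"])
```

With Streams you own the consuming application and choose how records are keyed; with Firehose you only hand records to the delivery stream and the service delivers them to the configured destination. A production producer would also retry the failed entries, which the `RequestResponses` list in the batch response identifies.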
About the Author
Shane has been immersed in the world of data, analytics, and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased costs. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you to create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and Kiwifruit. However, you're more likely to see him partaking of a good long black or an even better craft beer.