
A Streaming Framework

Contents

Amazon Kinesis Streaming Fundamentals
1. Introduction (Preview, 2m 48s)
4. Summary (2m 23s)

Overview

Difficulty: Intermediate
Duration: 25m
Students: 1112
Ratings: 4.8/5
Description

In this course, we take a look at streaming data, why it's important, and how Amazon Kinesis is used to stream data into the AWS cloud.

You'll learn what data streaming is, the problems it solves, and how Amazon Kinesis addresses them.

We'll also cover, at a very high level, what services exist inside Amazon Kinesis.  These are Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.

Learning Objectives

  • Understand the fundamentals of stream processing
  • Learn about the features of Amazon Kinesis
  • Learn about the services that make up Amazon Kinesis

Intended Audience

This course is intended for people who want to learn about streaming data, why it's important, and how Amazon Kinesis is used to send data into the AWS cloud.

Prerequisites

  • This course assumes no prior knowledge of Amazon Kinesis or streaming data.
  • A general understanding of the AWS cloud is recommended.
Transcript

I want to take a few moments to talk about Amazon Kinesis as a streaming framework. That is, Amazon Kinesis, and its features, are really a collection of parts that work together to process data in real time or near real time.

First, a reminder of why streaming data exists. There are a number of common use cases for streaming data.  They include industrial automation, smart cities, smart homes, data lakes, log analytics, and IoT analytics.

Two of the most popular use cases are log analytics feeding into data lakes and IoT analytics. IoT is a broad category of devices.  Think of an IoT device as simply a connected device, like a phone, tablet, or smart speaker.  These are connected devices that are almost always sending data.

Events can be things such as search results, financial transactions, user activity, telemetry data from IoT devices, log files, and application metrics.

While in the stream, data is processed dynamically while it is in motion.  This processing can be real-time analytics with machine learning, alerts, or the triggering of one or more actions.

A point that I think must be made here is that, while in a stream, data can be processed but it cannot be changed.  Data records are immutable.  If information in a stream needs to be updated, another record is added.
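That append-only behavior can be sketched in a few lines of plain Python. This is a conceptual illustration only, not any part of the Kinesis API, and all names here are made up:

```python
# Minimal sketch of an append-only stream: records are immutable,
# so an "update" is expressed as a new record appended later.

class StreamLog:
    def __init__(self):
        self._records = []  # append-only list of (sequence, record) pairs

    def append(self, record):
        """Add an immutable record and return its sequence number."""
        seq = len(self._records)
        self._records.append((seq, dict(record)))
        return seq

    def latest(self, key, key_field="id"):
        """Find the most recent record for a key by scanning newest-first."""
        for _, record in reversed(self._records):
            if record.get(key_field) == key:
                return record
        return None

log = StreamLog()
log.append({"id": "sensor-1", "temp": 20})
log.append({"id": "sensor-1", "temp": 23})  # a correction is a new record, not an edit
```

Reading the newest record for `sensor-1` now yields the corrected temperature, while the original record remains in the log untouched.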

Consumers are connected to the stream and can aggregate the incoming data, send alerts, and create new data streams that can be processed by other consumers.

A stream-based architecture that matches the flow of data has several advantages over batch-based processing.

One of these advantages is that it has low latency.  Streaming systems can process events and react to them in real-time.  

Another advantage of stream processing is that streams can be architected to reflect how people use applications.  This means streams match real-world processes. 

Put differently, stream processing matches how people interact with the data that surrounds them.  Applications that have a never-ending flow of events are ideal for stream processing.

Recall that, with batch systems, data has to accumulate before processing can start.  

When using stream processing, computation occurs as soon as the data arrives.

Data streaming can enable you to ingest, process, and analyze high volumes of high-velocity data from a variety of sources in real time.

In general, there are five layers of real-time data streaming: the source layer, the stream ingestion layer, the stream storage layer, the stream processing layer, and the destination layer.

The source layer is where the data originates. This could be something like data coming from IoT sensors, click-stream data from mobile devices and websites, or application logs.   

The stream ingestion layer is a Producer application tier that collects the source data, formats it appropriately, and publishes Data Records to the stream storage layer.
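A Producer's publish step might look like the following sketch, which wraps the boto3 Kinesis client's `put_record` call. The client is passed in (e.g. `boto3.client("kinesis")`), and the stream name and event shape are illustrative assumptions:

```python
import json

def publish_record(kinesis_client, stream_name, event, partition_key):
    """Format a source event and publish it as a Data Record.

    kinesis_client is expected to be a boto3 Kinesis client,
    e.g. boto3.client("kinesis"). The stream name is illustrative.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # record payloads are raw bytes
        PartitionKey=partition_key,  # determines which shard receives the record
    )
```

In a real Producer you would call `publish_record(...)` once per event (or batch events with `put_records`), choosing a partition key that spreads traffic evenly across shards.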

The stream storage layer acts as a high-speed buffer for data.  The stream processing layer accesses the stream storage layer using one or more applications called Consumers.

Consumers read and process the streaming data in near-real time.  This processing could include ETL--Extract, Transform, Load--operations, data aggregation, anomaly detection, or analysis.  
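A basic Consumer read can be sketched with the boto3 Kinesis client's `get_shard_iterator` and `get_records` calls. Again the client is passed in, and the stream and shard names are illustrative; production Consumers typically use the Kinesis Client Library or Lambda instead of polling by hand:

```python
def read_batch(kinesis_client, stream_name, shard_id):
    """Read one batch of Data Records from a shard, oldest-first.

    kinesis_client is expected to be a boto3 Kinesis client;
    the stream and shard names are illustrative.
    """
    iterator = kinesis_client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]
    response = kinesis_client.get_records(ShardIterator=iterator, Limit=100)
    return response["Records"]  # each record carries Data and a SequenceNumber
```

A real Consumer would loop, feeding each response's `NextShardIterator` back into `get_records` to keep pace with the stream.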

The Consumers deliver Data Records to the fifth layer, the destination.  This could be storage such as a Data Lake or Data Warehouse, durable storage such as Amazon S3, or some type of database.
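Delivery to a durable destination such as Amazon S3 can be as simple as writing each processed batch as one object. This sketch assumes a boto3 S3 client; the bucket, key, and newline-delimited JSON layout are illustrative choices:

```python
import json

def deliver_batch(s3_client, bucket, key, records):
    """Write a processed batch to durable storage such as Amazon S3.

    s3_client is expected to be boto3.client("s3"); the bucket and
    key are illustrative. One object per batch, newline-delimited JSON.
    """
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return s3_client.put_object(Bucket=bucket, Key=key, Body=body)
```

In practice, Kinesis Data Firehose does exactly this kind of batching and delivery for you, which is one reason it exists as a separate service.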

Clickstream analytics can act as a recommendation engine providing actionable insights used to create personalized coupons & discounts, customize search results, and guide targeted advertisements — all of which help retailers enhance the online shopping experience, increase sales, and improve conversion rates.

As a quick aside, if you're new to working with salespeople, the term conversion might be unfamiliar; it was new to me when I started working in the cloud.  Instead of referring to data formats, it means converting prospective customers into paying customers.  You might hear the term related to eyeballs.  That is, companies want to convert eyeballs--people looking at products--into customers that return to do more business.

Moving back to use cases, streaming efforts related to preventive maintenance allow equipment manufacturers and service providers to monitor quality of service, detect problems early, notify support teams, and prevent outages.

Streaming data can be used to alert banks and service providers to suspected fraud in time to stop bogus transactions and to quickly notify the holders of affected accounts.

Streaming data with sentiment analytics can detect unhappy users and help customer service respond early, preventing escalations before that unhappiness turns into anger.

Using streaming data with a dynamic pricing engine can automatically adjust the price of a product based on factors such as current customer demand, product availability, and competitive prices in the area.
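To make the idea concrete, here is a toy pricing rule combining those three factors. The formula, weights, and thresholds are invented for illustration; a real pricing engine would be far more sophisticated:

```python
def dynamic_price(base_price, demand_ratio, stock_ratio, competitor_price):
    """Illustrative-only pricing rule, not a production algorithm.

    demand_ratio: current demand vs. normal (1.0 = normal demand)
    stock_ratio:  units available vs. normal stock level
    """
    # Demand pressure: raise or lower the price with demand.
    price = base_price * (1.0 + 0.2 * (demand_ratio - 1.0))
    # Scarcity: low availability pushes the price up.
    if stock_ratio < 0.25:
        price *= 1.10
    # Stay competitive: never exceed the local competitor by more than 5%.
    return round(min(price, competitor_price * 1.05), 2)
```

A streaming pipeline would feed this kind of function continuously, recomputing prices as demand, inventory, and competitor events arrive in the stream.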

Because of its complexity, the creation of data streaming workflows has a number of challenges. Historically, streaming applications have been "high-touch" systems that require a large amount of human interaction, which makes them inconsistent and difficult to automate.

Data streaming applications can be difficult to set up.  Streaming applications have a number of "moving parts" that tend to be brittle.  

The source layer has to be able to communicate with the ingestion layer.  The ingestion layer must be able to put data into the stream storage layer.  

Consumer applications process the data in the stream-storage layer and either put it into a new stream or send it on to its final destination.

It's expensive, in terms of both human and compute costs, to create, maintain, and scale streaming solutions built in on-premises data centers.

Issues around creating streaming applications continue with scaling operations.  IoT sensor data might be seasonal, like monitoring wind speed during hurricane season.  It's important to be able to increase and decrease the number of resources required to store and consume the collected data.

To address the challenges of creating custom streaming frameworks and applications to stream data into the AWS cloud, AWS introduced Amazon Kinesis.

When developing Amazon Kinesis, AWS engineers recognized that high availability and durability were necessary parts of the service, so it was built to minimize the chance of data loss.

As a managed service, AWS provisions the compute, storage, and memory resources automatically upon request.  Streaming applications use APIs to publish and consume data to and from Amazon Kinesis.

Kinesis is fully scalable and elastic.  That is, it can grow to meet a workload's needs and it can shrink to prevent wasting resources that, in turn, waste money.
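For Kinesis Data Streams specifically, that elasticity is exposed through the `UpdateShardCount` API. This sketch wraps the boto3 call; the client is passed in and the stream name and target are illustrative:

```python
def scale_stream(kinesis_client, stream_name, target_shards):
    """Grow or shrink a stream's capacity to match the workload.

    Wraps the boto3 Kinesis client's update_shard_count call;
    the stream name and shard target are illustrative.
    """
    return kinesis_client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shards,
        ScalingType="UNIFORM_SCALING",  # redistribute data evenly across shards
    )
```

You might call `scale_stream(client, "sensor-stream", 4)` ahead of an expected seasonal peak and scale back down afterward, paying only for the shards you keep.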

Amazon Kinesis integrates with a variety of AWS services.  A benefit of this is that it is possible to create workflows with little or no code that do stream processing at scale.

This brings me to the end of this lecture. Thank you for watching and letting me be part of your cloud journey.

What I hope you got out of this lecture is that streaming is not a thing by itself.  It is a collection of systems that work together to process data in real time or near real time.  

Having a fully managed framework from AWS means that most of the work required to create a streaming data system has been done in advance.  

Instead of worrying about streaming infrastructure, you can focus on the insights and analysis needed to improve your business or organization.

If you have any feedback, positive or negative, please contact us at support@cloudacademy.com; your feedback is greatly appreciated.

I'm Stephen Cole for Cloud Academy.  Thank you for watching!

About the Author
Stephen Cole
AWS Certification Specialist
Students: 13471
Courses: 14
Learning Paths: 4

Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.

Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.

Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.

In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.