
A Streaming Framework

Contents

  • Course Introduction
    • Introduction (2m 26s)
  • Utilizing Managed Services and Serverless Architectures to Minimize Cost
  • Decoupled Architecture
  • Amazon API Gateway
    • Advanced API Gateway (11m 29s)
  • Amazon Elastic MapReduce
    • Introduction to EMR (1m 46s)
  • Amazon EventBridge
    • EventBridge (7m 58s)
  • Design considerations

Difficulty: Intermediate
Duration: 4h 43m
Students: 80
Ratings: 3/5
Description

This section of the AWS Certified Solutions Architect - Professional learning path introduces common AWS solution architectures relevant to the AWS Certified Solutions Architect - Professional exam and the services that support them. These services form a core component of running resilient and performant architectures. 

Want more? Try a Lab Playground or do a Lab Challenge!

Learning Objectives

  • Learn how to utilize managed services and serverless architectures to minimize cost
  • Understand how to use AWS services to process streaming data
  • Discover AWS services that support mobile app development
  • Understand when to utilize serverless services within your AWS solutions
  • Learn which AWS services to use when building a decoupled architecture
Transcript

I want to take a few moments to talk about Amazon Kinesis as a streaming framework. That is, Amazon Kinesis and its features are really a collection of parts that work together to process data in real time or near real time.

First, a reminder of why streaming data exists. There are a number of common use cases for streaming data.  They include industrial automation, smart cities, smart homes, data lakes, log analytics, and IoT analytics.

Two of the most popular use cases are log analytics feeding into data lakes and IoT analytics. IoT is a broad category of devices. Think of IoT devices simply as connected devices, such as phones, tablets, or smart speakers. These are devices that are almost always sending data.

Events can be things such as search results, financial transactions, user activity, telemetry data from IoT devices, log files, and application metrics.

While in the stream, data is processed dynamically while it is in motion.  This processing can be real-time analytics with machine learning, alerts, or the triggering of one or more actions.

A point that I think must be made here is that, while in a stream, data can be processed but it cannot be changed. Data records are immutable. If information in a stream needs to be updated, another record is added.
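To make the append-only idea concrete, here is a minimal sketch (illustration only, not the Kinesis API): an "update" is just a newer record, and the current state is derived by replaying the stream in order.

```python
# Sketch of an append-only stream: records are never changed in place.
# An "update" is a new record with the same key; the current state is
# derived by replaying the stream. (Illustration only, not the Kinesis API.)

def append(stream, key, value):
    """Add an immutable record to the end of the stream."""
    stream.append({"key": key, "value": value})

def current_state(stream):
    """Replay records in order; later records win for the same key."""
    state = {}
    for record in stream:
        state[record["key"]] = record["value"]
    return state

stream = []
append(stream, "sensor-1", {"temp": 20})
append(stream, "sensor-1", {"temp": 22})   # an "update" is just another record
```

After both appends, the original record is still in the stream, but the derived state reflects only the latest value for `sensor-1`.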

Consumers are connected to the stream and can aggregate the incoming data, send alerts, and create new data streams that can be processed by other consumers.

A stream-based architecture that matches the flow of data has several advantages over batch-based processing.

One of these advantages is that it has low latency.  Streaming systems can process events and react to them in real-time.  

Another advantage of stream processing is that streams can be architected to reflect how people use applications.  This means streams match real-world processes. 

Put differently, stream processing matches how people interact with the data that surrounds them.  Applications that have a never-ending flow of events are ideal for stream processing.

Recall that, with batch systems, data has to accumulate before processing can start.  

When using stream processing, computation occurs as soon as the data arrives.

Data streaming can enable you to ingest, process, and analyze high volumes of high-velocity data from a variety of sources in real time.

In general, there are five layers of real-time data streaming: the source layer, the stream ingestion layer, the stream storage layer, the stream processing layer, and the destination layer.

The source layer is where the data originates. This could be something like data coming from IoT sensors, click-stream data from mobile devices and websites, or application logs.   

The stream ingestion layer is a Producer application tier that collects the source data, formats it appropriately, and publishes Data Records to the stream storage layer.
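As a sketch of what a Producer tier does, the following uses the boto3 Kinesis client's put_record call; the stream name and partition key shown are hypothetical, and the caller is assumed to create the client with valid AWS credentials.

```python
import json

def build_record(partition_key, payload):
    """Format a source event as a Kinesis data record. Records with the
    same partition key are routed to the same shard, preserving their order."""
    return {
        "Data": json.dumps(payload).encode("utf-8"),  # Kinesis payloads are bytes
        "PartitionKey": partition_key,
    }

def publish(kinesis, stream_name, partition_key, payload):
    """Publish one record to the stream storage layer.
    `kinesis` is a boto3 client, e.g. boto3.client("kinesis");
    the stream name passed in is assumed to already exist."""
    record = build_record(partition_key, payload)
    return kinesis.put_record(StreamName=stream_name, **record)

# Example usage (requires AWS credentials and an existing stream):
# import boto3
# publish(boto3.client("kinesis"), "clickstream-demo", "sensor-42",
#         {"event": "click", "page": "/checkout"})
```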

The stream storage layer acts as a high-speed buffer for data. The stream processing layer accesses the stream storage layer using one or more applications called Consumers.

Consumers read and process the streaming data in near-real time.  This processing could include ETL--Extract, Transform, Load--operations, data aggregation, anomaly detection, or analysis.  

The Consumers deliver Data Records to the fifth layer, the destination.  This could be storage such as a Data Lake or Data Warehouse, durable storage such as Amazon S3, or some type of database.
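A minimal Consumer sketch, again using the boto3 Kinesis client (get_shard_iterator and get_records); the stream and shard names are hypothetical, and the decoding assumes the Producer published JSON payloads.

```python
import json

def decode_records(response):
    """Turn a get_records response into plain Python objects,
    assuming Producers published JSON-encoded payloads."""
    return [json.loads(r["Data"].decode("utf-8")) for r in response["Records"]]

def read_batch(kinesis, stream_name, shard_id):
    """Read one batch of records from a shard. `kinesis` is a boto3
    client, e.g. boto3.client("kinesis"). TRIM_HORIZON starts at the
    oldest record still inside the stream's retention window."""
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    return decode_records(kinesis.get_records(ShardIterator=iterator, Limit=100))
```

A real Consumer would loop using the `NextShardIterator` returned by each get_records call, then deliver the decoded records to the destination layer.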

Clickstream analytics can act as a recommendation engine, providing actionable insights used to create personalized coupons and discounts, customize search results, and guide targeted advertisements, all of which help retailers enhance the online shopping experience, increase sales, and improve conversion rates.
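As a sketch of the kind of rollup a clickstream Consumer might feed to a recommendation engine (the event shape and field names here are hypothetical):

```python
from collections import Counter

def clicks_per_page(events):
    """Aggregate decoded clickstream events into per-page click counts,
    the sort of rollup a recommendation engine might consume."""
    return Counter(e["page"] for e in events)

events = [
    {"user": "u1", "page": "/shoes"},
    {"user": "u2", "page": "/shoes"},
    {"user": "u1", "page": "/hats"},
]
```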

As a quick aside, if you're new to working with salespeople, the term conversion might be unfamiliar; it was new to me when I started working in the cloud. Rather than referring to data formats, it means converting prospective customers into paying customers. You might also hear the term related to eyeballs. That is, companies want to convert eyeballs--people looking at products--into customers who return to do more business.

Moving back to use cases, streaming related to preventive maintenance allows equipment manufacturers and service providers to monitor quality of service, detect problems early, notify support teams, and prevent outages.

Streaming data can alert banks and service providers to suspected fraud in time to stop bogus transactions and quickly flag the affected accounts.

Streaming data combined with sentiment analytics can detect unhappy users and help customer service shape a response, preventing escalations before that unhappiness turns into anger.

Using streaming data with a dynamic pricing engine can automatically adjust the price of a product based on factors such as current customer demand, product availability, and competitive prices in the area.

Because of its complexity, creating data streaming workflows involves a number of challenges. Historically, streaming applications have been "high-touch" systems that require a large amount of human interaction, which makes them inconsistent and difficult to automate.

Data streaming applications can be difficult to set up.  Streaming applications have a number of "moving parts" that tend to be brittle.  

The source layer has to be able to communicate with the ingestion layer.  The ingestion layer must be able to put data into the stream storage layer.  

Consumer applications process the data in the stream-storage layer and either put it into a new stream or send it on to its final destination.

It's expensive, in terms of both human and compute costs, to create, maintain, and scale streaming solutions built in on-premises data centers.

Issues around creating streaming applications continue with scaling operations. IoT sensor data might be seasonal, like monitoring wind speeds during hurricane season. It's important to be able to increase and decrease the number of resources required to store and consume the collected data.
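With Kinesis Data Streams, that resizing is done by changing the shard count. The UpdateShardCount API (boto3's update_shard_count), per its documented limits, can at most double or halve the shard count in a single call, so reaching a distant target takes multiple steps. Here is a sketch that plans those intermediate targets; the stream name in the comment is hypothetical.

```python
def shard_steps(current, target):
    """Plan intermediate shard counts for Kinesis UpdateShardCount,
    which (per its documented limits) can at most double or halve
    the shard count in a single call."""
    steps = []
    while current != target:
        if target > current:
            current = min(target, current * 2)       # scale up: at most 2x per call
        else:
            current = max(target, -(-current // 2))  # scale down: at most half (ceil)
        steps.append(current)
    return steps

# Each planned step could then be applied with, e.g.:
# kinesis.update_shard_count(StreamName="demo-stream",
#                            TargetShardCount=step,
#                            ScalingType="UNIFORM_SCALING")
```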

To address the challenges of creating custom streaming frameworks and applications to stream data into the AWS cloud, AWS introduced Amazon Kinesis.

When developing Amazon Kinesis, AWS engineers recognized that high availability and durability were a necessary part of the service, so it was built to minimize the chance of data loss.

As a managed service, AWS provisions the compute, storage, and memory resources automatically upon request.  Streaming applications use APIs to publish and consume data to and from Amazon Kinesis.

Kinesis is fully scalable and elastic.  That is, it can grow to meet a workload's needs and it can shrink to prevent wasting resources that, in turn, waste money.

Amazon Kinesis integrates with a variety of AWS services. A benefit of this is that it is possible to create workflows, with little or no code, that perform stream processing at scale.

This brings me to the end of this lecture. Thank you for watching and letting me be part of your cloud journey.

What I hope you got out of this lecture is that streaming is not a thing by itself.  It is a collection of systems that work together to process data in real time or near real time.  

Having a fully managed framework from AWS means that most of the work required to create a streaming data system has been done in advance.  

Instead of worrying about streaming infrastructure, you can focus on the insights and analysis needed to improve your business or organization.

If you have any feedback, positive or negative, please contact us at support@cloudacademy.com. Your feedback is greatly appreciated.

I'm Stephen Cole for Cloud Academy.  Thank you for watching!

About the Author
Students: 37985
Courses: 26
Learning Paths: 20

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.