A Streaming Framework

Overview
Difficulty: Beginner
Duration: 1h 53m
Students: 1042
Rating: 4.8/5
Description

Domain One of the AWS Solutions Architect Associate exam guide (SAA-C02) requires us to be able to design a multi-tier architecture solution, so that is our topic for this section.
We cover the need-to-know aspects of designing multi-tier solutions using AWS services.


Learning Objectives

  • Learn some of the essential services for creating multi-tier architectures on AWS, including the Simple Queue Service (SQS) and the Simple Notification Service (SNS)
  • Understand data streaming and how Amazon Kinesis can be used to stream data
  • Learn how to design a multi-tier solution on AWS, and the important aspects to take into consideration when doing so
 
Transcript

I want to take a few moments to talk about Amazon Kinesis as a streaming framework. That is, Amazon Kinesis, and its features, are really a collection of parts that work together to process data in real time or near real time.

First, a reminder of why streaming data exists. There are a number of common use cases for streaming data.  They include industrial automation, smart cities, smart homes, data lakes, log analytics, and IoT analytics.

Two of the most popular use cases are log analytics feeding into data lakes and IoT analytics. IoT is a broad category of devices.  Think of an IoT device as simply a connected device, like a phone, tablet, or smart speaker.  These devices are almost always sending data.

Events can be things such as search results, financial transactions, user activity, telemetry data from IoT devices, log files, and application metrics.

While in the stream, data is processed dynamically while it is in motion.  This processing can be real-time analytics with machine learning, alerts, or the triggering of one or more actions.

A point that I think must be made here is that, while in a stream, data can be processed but it cannot be changed.  Data records are immutable.  If information in a stream needs to be updated, another record is added.
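To make the idea of immutable records concrete, here is a minimal sketch in plain Python. It is not Kinesis code: the `Record` and `Stream` classes are hypothetical names invented for illustration, and the "stream" is just an in-memory list. It shows that a record cannot be changed in place and that an update is expressed by appending a new record.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Record:
    """One immutable data record (hypothetical illustration type)."""
    sequence: int
    key: str
    value: float


class Stream:
    """Append-only collection of records; it offers no update or delete."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        record = Record(sequence=len(self._records), key=key, value=value)
        self._records.append(record)
        return record

    def latest(self, key):
        """Return the most recent record for a key, or None if absent."""
        for record in reversed(self._records):
            if record.key == key:
                return record
        return None

    def __len__(self):
        return len(self._records)


stream = Stream()
first = stream.append("sensor-1", 20.5)
# A correction is published as a new record; the first one stays untouched.
stream.append("sensor-1", 21.0)

try:
    first.value = 99.0  # attempting to change a record in place fails
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

A reader that wants the current value for `sensor-1` asks for the latest record, while the earlier record remains available as history.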

Consumers are connected to the stream and can aggregate the incoming data, send alerts, and create new data streams that can be processed by other consumers.
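As a rough illustration of what a consumer does, the sketch below aggregates incoming readings, raises alerts, and emits a new derived stream for other consumers. It is plain Python rather than a Kinesis consumer; the `consume` function, the device names, and the threshold are all assumptions made up for the example.

```python
from collections import defaultdict


def consume(events, alert_threshold):
    """Aggregate readings per device, collect alerts, and emit a derived stream."""
    totals = defaultdict(float)   # running aggregate per device
    alerts = []                   # alerts raised while reading
    derived_stream = []           # new stream for downstream consumers

    for device_id, reading in events:
        totals[device_id] += reading
        if reading > alert_threshold:
            alerts.append(f"{device_id} reported {reading}")
        # Publish an enriched record rather than modifying the incoming one.
        derived_stream.append(
            {"device": device_id, "running_total": totals[device_id]}
        )

    return dict(totals), alerts, derived_stream


# Hypothetical (device, reading) events standing in for a real stream.
events = [("pump-a", 10.0), ("pump-b", 55.0), ("pump-a", 12.5)]
totals, alerts, derived = consume(events, alert_threshold=50.0)
```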

A stream-based architecture that matches the flow of data has several advantages over batch-based processing.

One of these advantages is that it has low latency.  Streaming systems can process events and react to them in real time.

Another advantage of stream processing is that streams can be architected to reflect how people use applications.  This means streams match real-world processes. 

Put differently, stream processing matches how people interact with the data that surrounds them.  Applications that have a never-ending flow of events are ideal for stream processing.

Recall that, with batch systems, data has to accumulate before processing can start.  

When using stream processing, computation occurs as soon as the data arrives.
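The difference between the two models can be sketched with a simple average. This is an illustrative example only, with made-up function names and readings: the batch version produces nothing until all the data has accumulated, while the streaming version produces an updated result each time a value arrives.

```python
def batch_average(readings):
    """Batch style: wait for the full data set, then compute once."""
    collected = list(readings)  # nothing is produced until all data is in
    return sum(collected) / len(collected)


def streaming_average(readings):
    """Stream style: yield an updated average as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


readings = [10.0, 20.0, 30.0, 40.0]
batch_result = batch_average(readings)               # 25.0, available only at the end
stream_results = list(streaming_average(readings))   # [10.0, 15.0, 20.0, 25.0]
```

Both approaches end at the same answer, but the streaming version has a usable result after the very first reading.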

Data streaming can enable you to ingest, process, and analyze high volumes of high-velocity data from a variety of sources in real time.

In general, there are five layers of real-time data streaming.  They are the source layer, the stream ingestion layer, the stream storage layer, the stream processing layer, and the destination layer.

The source layer is where the data originates. This could be something like data coming from IoT sensors, click-stream data from mobile devices and websites, or application logs.   

The stream ingestion layer is a Producer application tier that collects the source data, formats it appropriately, and publishes Data Records to the stream storage layer.

The stream storage layer acts as a high-speed buffer for data.  The stream processing layer accesses the stream storage layer using one or more applications called Consumers.

Consumers read and process the streaming data in near-real time.  This processing could include ETL--Extract, Transform, Load--operations, data aggregation, anomaly detection, or analysis.  

The Consumers deliver Data Records to the fifth layer, the destination.  This could be storage such as a Data Lake or Data Warehouse, durable storage such as Amazon S3, or some type of database.
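The five layers can be sketched end to end in a few lines of plain Python. This is a toy model under stated assumptions, not how Kinesis is implemented: the function names, the sensor data, and the temperature threshold are hypothetical, and the storage layer is an in-memory `deque` that drops a record once it is read, whereas a real stream keeps records for a retention period.

```python
import json
from collections import deque


def source():
    """Source layer: where data originates (here, hard-coded sensor readings)."""
    yield {"sensor": "s1", "temp_c": 21.0}
    yield {"sensor": "s2", "temp_c": 48.5}
    yield {"sensor": "s1", "temp_c": 22.0}


def ingest(raw_events, storage):
    """Ingestion layer: a producer formats events and publishes data records."""
    for event in raw_events:
        storage.append(json.dumps(event))


def process(storage, max_temp_c):
    """Processing layer: a consumer reads records and flags anomalies."""
    while storage:
        record = json.loads(storage.popleft())
        record["anomaly"] = record["temp_c"] > max_temp_c
        yield record


def deliver(records, destination):
    """Destination layer: store the processed records."""
    destination.extend(records)


stream_storage = deque()  # storage layer: an in-memory buffer
destination = []          # stand-in for a data lake or database

ingest(source(), stream_storage)
deliver(process(stream_storage, max_temp_c=40.0), destination)
```

After the run, `destination` holds all three records, each with an added `anomaly` flag, and only the 48.5 degree reading is marked as anomalous.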

Clickstream analytics can act as a recommendation engine providing actionable insights used to create personalized coupons and discounts, customize search results, and guide targeted advertisements, all of which help retailers enhance the online shopping experience, increase sales, and improve conversion rates.

As a quick aside, in case you're new to working with salespeople: the term conversion was new to me when I started working in the cloud.  Rather than referring to data formats, it means converting prospective customers into paying customers.  You might hear the term used in relation to eyeballs.  That is, companies want to convert eyeballs--people looking at products--into customers that return to do more business.

Moving back to use cases, streaming efforts related to preventive maintenance allow equipment manufacturers and service providers to monitor quality of service, detect problems early, notify support teams, and prevent outages.

Streaming data can be used to alert banks and service providers to suspected fraud in time to stop bogus transactions, and to quickly notify them of affected accounts.

Streaming data with sentiment analytics can detect unhappy users and help customer service augment a response and prevent escalations before that unhappiness turns into anger.

Using streaming data with a dynamic pricing engine can automatically adjust the price of a product based on factors such as current customer demand, product availability, and competitive prices in the area.

Because of its complexity, the creation of data streaming workflows has a number of challenges. Historically, streaming applications have been "high-touch" systems with a large amount of human interaction, which makes them inconsistent and difficult to automate.

Data streaming applications can be difficult to set up.  Streaming applications have a number of "moving parts" that tend to be brittle.  

The source layer has to be able to communicate with the ingestion layer.  The ingestion layer must be able to put data into the stream storage layer.  

Consumer applications process the data in the stream-storage layer and either put it into a new stream or send it on to its final destination.

It's expensive to create, maintain, and scale streaming solutions built in on-premises data centers in terms of both human and compute costs.

Issues around creating streaming applications continue with scaling operations.  IoT sensor data might be seasonal like monitoring airspeed during hurricane season.  It's important to be able to increase and decrease the number of resources required to store and consume the collected data.  

To address the challenges of creating custom streaming frameworks and applications to stream data into the AWS cloud, AWS introduced Amazon Kinesis.

When developing Amazon Kinesis, AWS engineers recognized that high availability and durability were a necessary part of the service and it was built to minimize the chance of data loss.

As a managed service, AWS provisions the compute, storage, and memory resources automatically upon request.  Streaming applications use APIs to publish and consume data to and from Amazon Kinesis.
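As a hedged sketch of what publishing and consuming through the API can look like, the functions below use the Kinesis Data Streams operations `put_record`, `get_shard_iterator`, and `get_records` as exposed by the boto3 client. The helper names `publish` and `read_batch`, the JSON encoding, and the choice to read from the start of a single shard are assumptions made for illustration; a production consumer would also handle multiple shards, iterator expiry, and retries.

```python
import json


def publish(client, stream_name, event, partition_key):
    """Publish one event as a data record using the PutRecord operation."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )


def read_batch(client, stream_name, shard_id, limit=100):
    """Read one batch of records from the oldest available point in a shard."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    # Record data comes back as bytes; here it is assumed to be JSON.
    return [json.loads(record["Data"]) for record in response["Records"]]
```

With boto3 installed and AWS credentials configured, the `client` argument would typically be created with `boto3.client("kinesis")`, and the stream name and shard ID would be those of a stream you have already created. Passing the client in as a parameter keeps the functions easy to exercise with a stand-in object during testing.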

Kinesis is fully scalable and elastic.  That is, it can grow to meet a workload's needs and it can shrink to prevent wasting resources that, in turn, waste money.

Amazon Kinesis integrates with a variety of AWS services.  A benefit of this is that it is possible to create workflows with little or no code that do stream processing at scale.

This brings me to the end of this lecture. Thank you for watching and letting me be part of your cloud journey.

What I hope you got out of this lecture is that streaming is not a thing by itself.  It is a collection of systems that work together to process data in real time or near real time.  

Having a fully managed framework from AWS means that most of the work required to create a streaming data system has been done in advance.  

Instead of worrying about streaming infrastructure, you can focus on the insights and analysis needed to improve your business or organization.

If you have any feedback, positive or negative, please contact us at support@cloudacademy.com.  Your feedback is greatly appreciated.

I'm Stephen Cole for Cloud Academy.  Thank you for watching!

About the Author
Andrew Larkin
Head of Content
Students
122622
Courses
98
Learning Paths
102

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.