Introduction to Amazon Kinesis
Difficulty: Beginner
Duration: 38m
Students: 6993
Ratings: 4.6/5
Description

This course explores and introduces you to the concepts of both decoupled and event-driven architectures within AWS. It also provides an introduction to the Amazon Simple Queue Service, Amazon Simple Notification Service, Amazon Kinesis and AWS Lambda. 

For any feedback, comments, or questions related to this course, feel free to reach out to us at support@cloudacademy.com.

Course Objectives

The objectives of this course are:

  • To establish an understanding of what decoupled architecture is
  • To establish an understanding of what event-driven architecture is
  • To learn the foundations of Amazon SQS and how it is used in a decoupled environment
  • To gain an awareness of the Amazon Simple Notification Service, Amazon Kinesis and AWS Lambda to understand when and why you might implement them in an event-driven solution

Intended Audience

This course has been designed for architects who are looking to design and implement best practice solutions by utilizing services in a decoupled and/or event-driven environment.

Prerequisites

To get the most from this course, it would be beneficial to have a basic awareness of what AWS is, in addition to understanding general infrastructure and application architectures, although this is not essential.

 

Transcript

Hello and welcome to this lecture introducing Amazon Kinesis.

Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data so you can get timely insights and react quickly to new information. With Amazon Kinesis, you can ingest real-time data such as application logs, website clickstreams, IoT telemetry data, and more, into your databases, your data lakes and data warehouses.

It enables you to process and analyze data as it arrives and respond to it in real time, instead of having to wait until all your data is collected before the processing can begin.

Amazon Kinesis can continuously capture terabytes of data per hour from hundreds or thousands of sources, such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.

From a storage perspective, Amazon Kinesis does not store persistent data itself, unlike many of the other Amazon big data services. As a result, Amazon Kinesis needs to be deployed as part of a larger event-driven solution.

Amazon Kinesis provides three different solution capabilities:

  • Amazon Kinesis Streams: This enables you to build custom applications that process or analyze streaming data for specialized needs. It comes in two variations, Kinesis Data Streams and Kinesis Video Streams. Data Streams offers a real-time data streaming service capable of elastically scaling to support hundreds of thousands of data feeds to help you build real-time solutions, such as live dashboards or identifying security anomalies. Video Streams is designed to securely and elastically scale and ingest video streams on a massive scale, connecting to millions of video streaming devices, where it can then store and encrypt the data ready for processing by your data analytics solutions.
  • Amazon Kinesis Data Firehose: This enables you to load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk.
  • Amazon Kinesis Analytics: This enables you to write standard SQL queries on streaming data.

Amazon Kinesis Streams is based on a platform-as-a-service style architecture, where you determine the throughput capacity you require and the architecture and components are automatically provisioned, stored, and configured for you. You have no need or ability to change the way these architectural components are deployed.

An Amazon Kinesis stream is an ordered sequence of data records. A record is the unit of data in an Amazon Kinesis stream. Each record in the stream is composed of a sequence number, a partition key, and a data blob. The data blob is the data of interest that your data producer adds to a stream.
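To make the shape of a record concrete, here is a minimal in-memory sketch of a stream as an ordered sequence of records. The names `Record` and `ToyStream` are hypothetical and exist only for illustration. The simple integer counter is also an assumption: the real service assigns its own sequence numbers, which are not small consecutive integers.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Record:
    sequence_number: int  # assigned by the stream, increases with each record
    partition_key: str    # supplied by the producer
    data: bytes           # the data blob, opaque to the stream


class ToyStream:
    """Hypothetical in-memory stand-in for a stream (not the Kinesis API)."""

    def __init__(self):
        self._records = []
        self._next_sequence = count(1)

    def put(self, partition_key, data):
        """Append a producer's data blob and return the stored record."""
        record = Record(next(self._next_sequence), partition_key, data)
        self._records.append(record)
        return record

    def read_from(self, sequence_number):
        """Return records at or after the given sequence number, in order."""
        return [r for r in self._records if r.sequence_number >= sequence_number]


stream = ToyStream()
first = stream.put("web-1", b"GET /index.html")
second = stream.put("web-2", b"GET /cart")
```

Reading with `stream.read_from(second.sequence_number)` returns only the second record, which is the sense in which the stream is ordered.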

So what is a producer? A producer is an entity that continuously pushes data to Kinesis Streams. For example, a web service sending log data to a stream is a producer.

And then we have consumers. A consumer receives records from Amazon Kinesis Streams and processes them in real time. Consumers can store their results using an AWS service, such as Amazon DynamoDB, Amazon Redshift, or Amazon S3. These consumers are known as Amazon Kinesis Streams applications and typically run on a fleet of EC2 instances. You need to build your applications using either the Amazon Kinesis API or the Amazon Kinesis Client Library.
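As a rough sketch of the producer and consumer roles described above, the helpers below build a PutRecord request and decode a GetRecords response. They assume a boto3-style Kinesis client is passed in. The stream name `web-logs`, the `host` field used as the partition key, and the helper names are all hypothetical choices for illustration.

```python
import json


def build_put_record_request(stream_name, event, partition_key):
    """Build the keyword arguments for a PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # the data blob
        "PartitionKey": partition_key,
    }


def send_log_event(kinesis_client, stream_name, event):
    """Producer side: push one log event and return its sequence number."""
    request = build_put_record_request(stream_name, event, event["host"])
    response = kinesis_client.put_record(**request)
    return response["SequenceNumber"]


def decode_records(get_records_response):
    """Consumer side: decode the JSON data blobs in a GetRecords response."""
    return [json.loads(r["Data"]) for r in get_records_response["Records"]]


# With boto3 installed and AWS credentials configured, usage might look like:
#   import boto3
#   client = boto3.client("kinesis")
#   send_log_event(client, "web-logs", {"host": "web-1", "msg": "GET /"})
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub. A production consumer would more likely use the Kinesis Client Library than decode responses by hand.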

Okay, let's have a look at the architecture that underpins Amazon Kinesis Firehose. While still under the Kinesis moniker, the Amazon Kinesis Firehose architecture is different from that of Amazon Kinesis Streams.

Amazon Kinesis Firehose is a fully-managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service and Splunk.

With Kinesis Firehose, you do not need to write applications as your consumers. Instead, you configure your data producers to send data to Kinesis Firehose, where the service then automatically delivers the data to the destination that you specify. You can also configure Amazon Kinesis Firehose to transform your data before data delivery.

A delivery stream is the underlying entity of Kinesis Firehose. You use Kinesis Firehose by creating a Firehose delivery stream and then sending data to it, which means each delivery stream is effectively defined by the target system that receives the streamed data. Firehose can also invoke an AWS Lambda function to transform incoming data before delivering it to the selected destination. You can configure a new Lambda function using one of the Lambda blueprints AWS provides, or you can choose one of your existing Lambda functions.
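A minimal sketch of such a transformation function is shown below. It follows the Firehose transformation contract, in which each incoming record carries a `recordId` and base64-encoded `data`, and each returned record reports a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The payload being JSON and the `level` field are assumptions made for this example.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each buffered record and hand it back to Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: normalize a 'level' field to upper case.
            payload["level"] = str(payload.get("level", "info")).upper()
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Payload was not valid base64 or a JSON object: report the failure
            # and return the original data unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every `recordId` received must appear in the response, which is why failed records are reported rather than silently skipped.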

Let's have a quick look at the difference between Amazon Kinesis Streams and Firehose. Amazon Kinesis Streams is a service for workloads that require custom processing, per incoming record, with sub-one-second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is a service for workloads that require zero administration, with data latency of 60 seconds or higher. You use Firehose by creating a delivery stream to a specified destination and sending data to it; you do not have to create a stream or create a custom application as the destination. However, Firehose is limited to S3, Redshift, Elasticsearch, and Splunk as data destinations.

Amazon Kinesis Analytics is a fully managed service that enables you to quickly author SQL code that continuously reads, processes, and stores data. With Amazon Kinesis Analytics, you can ingest billions of small data points in real time. Each individual data point can then be aggregated to provide intelligent business insights, which in turn can be used to continually optimize and improve business processes.

Working with Kinesis Analytics requires you to perform three steps:

  • Create an input stream. Input streams typically come from streaming data sources such as Kinesis streams.
  • Create SQL processing logic, a series of SQL statements that process input and produce output. The SQL code will typically perform aggregations and generate insights.
  • Create an output stream. Output streams can be configured to hold intermediate results that are used to feed into other queries or be used to stream out the final results. Output streams can be configured to write out to destinations such as S3, Redshift, Elasticsearch and/or other Kinesis streams.
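The kind of aggregation the SQL step typically performs can be sketched in plain Python. The example below is not Kinesis Analytics code. It only illustrates, under assumed inputs, what grouping a stream into fixed 60-second windows and counting per key looks like. The function name and the sample click data are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key), like a GROUP BY over fixed windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Input: four page views. Output: one count per window and page.
clicks = [(0, "/home"), (12, "/home"), (59, "/cart"), (61, "/home")]
result = tumbling_window_counts(clicks)
# result == {(0, "/home"): 2, (0, "/cart"): 1, (60, "/home"): 1}
```

In the managed service, the same idea would be expressed as SQL over an in-application stream rather than as a Python loop.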

What is the benefit of using Kinesis Analytics? Well, the ability to maintain peak business performance often depends on the ability to make timely decisions. The earlier we can make informed and actionable decisions, the quicker we can adjust and maintain optimal performance, which highlights the importance of being able to process data in near real time.

The type of decision we can make depends on the age of the data itself. Considering this, we can see that data processed in real time allows us to make preventative and/or predictive decisions.

The SQL statements that you author represent the most important part of your Kinesis Analytics application, as they generate the actual analytics that you wish to derive. Your analytics are implemented using one or several SQL statements that process and manipulate input and produce output.

This process can involve intermediary steps, whereby the output of one query feeds into a second in-application stream. This can be repeated multiple times until the final desired result is achieved and persisted to an output stream.
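The chaining of intermediate results can be pictured with Python generators, where each stage consumes the output of the previous one. This is only an analogy for in-application streams, not Kinesis Analytics code. The stage names and the `status,latency_ms` input format are assumptions made for the example.

```python
def parse(lines):
    """Stage 1: turn raw 'status,latency_ms' lines into dictionaries."""
    for line in lines:
        status, latency = line.split(",")
        yield {"status": int(status), "latency_ms": float(latency)}


def only_errors(rows):
    """Stage 2: an intermediate stream holding only server errors."""
    return (row for row in rows if row["status"] >= 500)


def average_latency(rows):
    """Stage 3: the final aggregate that would be written to the output."""
    latencies = [row["latency_ms"] for row in rows]
    return sum(latencies) / len(latencies) if latencies else None


raw = ["200,12.0", "500,340.0", "503,260.0", "404,8.0"]
final = average_latency(only_errors(parse(raw)))
# final == 300.0
```

Each stage stays small and can be tested on its own, which is the main reason for splitting a query into intermediate steps.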

About the Author
Students: 237198
Labs: 1
Courses: 232
Learning Paths: 187

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.