Amazon Kinesis Data Streams
This course is part 2 of 2 on how to stream data using Amazon Kinesis Data Streams.
The course covers shard capacity, Kinesis Data Streams as a streaming storage layer, and the basics of securing data in a Kinesis Data Stream.
- Build upon the topics covered in Amazon Kinesis Data Streams Part 1
- Learn about shard capacity, scaling, and limits
- Obtain an in-depth understanding of how a data streaming service is layered and how it operates
- Understand how to secure a Kinesis Data Stream
This course is intended for people who want to learn how to stream data into the AWS cloud using Amazon Kinesis Data Streams.
To get the most out of this course, you should have a basic knowledge of the AWS platform.
This concludes my overview of Kinesis Data Streams. Of course, there is more to learn. There always is. However, I think this is enough to get you started with the service.
As a review, this is what I've covered in this series of lectures about Kinesis Data Streams.
Some data moves faster than others. Not all data is created or used equally. Sometimes data is only valuable for a short period of time, occasionally measured in seconds. That's one of the reasons streaming data exists today: streaming makes it possible to capture that data and use it while it still has value.
Cloud computing comes with the promises of agility, scalability, and elasticity.
Agility is about changing to meet needs. Being agile is more than adapting to change; it's also about being able to fit into places that were, at one time, too big or too small.
Scalability means that growth can happen when needed.
Elasticity, at least in the cloud, is the counterpart to scalability. Environments can expand to meet demand, and one of the cloud's key characteristics is that they can also shrink back to their original size when demand drops.
Kinesis Data Streams is one of the features of Amazon Kinesis. It is a massively scalable and durable real-time data streaming service.
It is used to collect data from sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, location tracking events, and IoT devices.
Data collected is available within milliseconds inside a stream and enables applications to do things like analytics, dashboards, anomaly detection, and dynamic pricing in real time.
I want to summarize the key points from both parts of this course.
A Kinesis Data Stream is made up of Shards.
Producer applications put data into streams as Data Records.
Data Records have three main parts: a Partition Key, a Sequence Number, and a Data Blob.
The Data Blob is an opaque sequence of bytes; it is Base64 encoded when it travels through the Kinesis API.
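To make the encoding point concrete, here is a minimal sketch using only the Python standard library. It assumes a JSON payload (any bytes would do) and shows the round trip between the raw bytes a producer writes and the Base64 text that appears in a raw API request body. SDKs such as boto3 perform this encoding for you, so application code normally passes and receives plain bytes.

```python
import base64
import json

# A producer's payload; the data blob can hold any bytes, JSON is just an example.
payload = json.dumps({"user": "u-123", "page": "/cart"}).encode("utf-8")

# How the blob is represented in the JSON body of a raw Kinesis API call.
wire_form = base64.b64encode(payload).decode("ascii")

# Decoding the wire form gives back the exact bytes the producer sent.
decoded = json.loads(base64.b64decode(wire_form))
print(decoded["page"])  # /cart
```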
Data Records are put into shards based on their Partition Key and ordered by their Sequence Number.
A Data Record can go into one and only one shard.
A shard is both a Unit of Scale and a Unit of Parallelism.
A single shard can support writes of up to 1 megabyte per second or 1,000 records per second, whichever limit is reached first.
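Because both write limits apply at once, sizing a stream means checking each one. The sketch below estimates the minimum shard count for a write workload. It assumes partition keys spread records evenly across shards and leaves no headroom for spikes; `shards_for_writes` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard write limits for a provisioned stream.
MAX_WRITE_MB_PER_SEC = 1.0
MAX_WRITE_RECORDS_PER_SEC = 1_000


def shards_for_writes(mb_per_sec: float, records_per_sec: int) -> int:
    """Smallest shard count that keeps both write limits satisfied."""
    by_bytes = math.ceil(mb_per_sec / MAX_WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / MAX_WRITE_RECORDS_PER_SEC)
    # Whichever limit binds first decides; every stream needs at least one shard.
    return max(1, by_bytes, by_records)


# 3,500 small records/s totalling 0.7 MB/s is limited by record count.
print(shards_for_writes(0.7, 3_500))  # 4
# 600 large records/s totalling 5.2 MB/s is limited by bytes.
print(shards_for_writes(5.2, 600))  # 6
```

In practice you would round up further to absorb uneven partition keys and traffic bursts.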
AWS replicates shards automatically across 3 Availability Zones for durability and high availability.
Streams can be created using the AWS Console, programmatically using an SDK, or from the AWS CLI.
Charges start to accumulate as soon as the stream has been provisioned.
When creating a stream, it has to have a unique name and at least one shard.
A stream's throughput is determined by the number of shards it has.
The Shard ID is unique to the shard, the Hash Key ranges do not overlap between shards, and the Sequence Number is used to keep Data Records in order.
When a Producer writes a Data Record to a stream, an MD5 hash of the Partition Key is computed. The resulting 128-bit hash key value determines which shard's hash key range, and therefore which shard, the Data Record is written to.
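The routing step can be sketched in a few lines. This is an illustration under stated assumptions, not the service's internal code: it assumes the partition key is UTF-8 encoded before hashing, and `even_ranges` is a hypothetical helper that splits the hash key space evenly, roughly as a newly created stream does. For a real stream you would read each shard's `HashKeyRange` from the ListShards API instead.

```python
import hashlib

HASH_KEY_SPACE = 2**128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Interpret the MD5 digest of the partition key as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_ranges(shard_count: int):
    """Hypothetical helper: contiguous, non-overlapping ranges covering the key space."""
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((f"shardId-{i:012d}", start, end))
    return ranges


def shard_for(partition_key: str, ranges) -> str:
    """Return the ID of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for shard_id, start, end in ranges:
        if start <= key <= end:
            return shard_id
    raise ValueError("hash key outside every range")


print(shard_for("customer-42", even_ranges(4)))
```

The same partition key always hashes to the same value, which is why records sharing a key land in the same shard and stay ordered.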
There are two types of Consumers available to read a Kinesis Data Stream: a Standard Consumer and an Enhanced Fan-Out Consumer.
The Standard Consumer uses a polling method to get data from a stream. For reads, each shard supports up to 5 GetRecords calls per second and up to 2 megabytes per second of throughput, shared by all Standard Consumers of that shard. A single GetRecords call can return up to 10,000 records.
Enhanced Fan-Out uses a push mechanism to send data to Consumers. Up to 20 Consumer applications can be registered to a stream, and each one gets its own 2 megabytes per second of throughput per shard.
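The practical difference between the two consumer types shows up as the number of consumers grows. This sketch compares per-consumer read throughput for one shard under an idealized assumption that standard consumers split the shared 2 MB/s and the 5 calls per second evenly; the function names are hypothetical.

```python
SHARD_READ_MB_PER_SEC = 2.0      # per shard
SHARD_READ_CALLS_PER_SEC = 5     # GetRecords calls per shard, shared by standard consumers
MAX_EFO_CONSUMERS = 20           # enhanced fan-out consumers registered per stream


def standard_mb_per_consumer(consumers: int) -> float:
    """Standard (polling) consumers share one shard's read throughput."""
    return SHARD_READ_MB_PER_SEC / consumers


def min_poll_interval_seconds(consumers: int) -> float:
    """How often each standard consumer can poll if the 5 calls/s are split evenly."""
    return consumers / SHARD_READ_CALLS_PER_SEC


def efo_mb_per_consumer(consumers: int) -> float:
    """Each enhanced fan-out consumer gets a dedicated 2 MB/s per shard."""
    if consumers > MAX_EFO_CONSUMERS:
        raise ValueError("more consumers than can be registered for enhanced fan-out")
    return SHARD_READ_MB_PER_SEC


for n in (1, 2, 5):
    print(n, standard_mb_per_consumer(n), efo_mb_per_consumer(n), min_poll_interval_seconds(n))
```

With five standard consumers, each gets roughly 0.4 MB/s and can poll only about once per second, while enhanced fan-out consumers keep the full 2 MB/s each.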
That was what was covered in part one of this course.
In part two, we learned that hot shards have a large number of reads and writes per second, while cold shards are the opposite: they have a low number of reads and writes per second.
A hot shard can cause throttling.
Cold shards waste money.
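Spotting hot and cold shards comes down to comparing each shard's traffic with its limits. The sketch below assumes you already have per-shard write rates, for example averaged from the shard-level CloudWatch metrics IncomingBytes and IncomingRecords. The 80% and 10% thresholds are assumptions to tune for your workload, not AWS-defined values.

```python
WRITE_MB_LIMIT = 1.0
WRITE_RECORD_LIMIT = 1_000
HOT_THRESHOLD = 0.8   # assumption: fraction of the binding limit that counts as "hot"
COLD_THRESHOLD = 0.1  # assumption: fraction below which a shard counts as "cold"


def utilization(mb_per_sec: float, records_per_sec: float) -> float:
    """Fraction of the tighter of the two per-shard write limits in use."""
    return max(mb_per_sec / WRITE_MB_LIMIT, records_per_sec / WRITE_RECORD_LIMIT)


def classify(metrics: dict) -> dict:
    """metrics maps shard ID -> (MB/s, records/s) written to that shard."""
    result = {}
    for shard_id, (mb, records) in metrics.items():
        u = utilization(mb, records)
        if u >= HOT_THRESHOLD:
            result[shard_id] = "hot"    # candidate for splitting
        elif u <= COLD_THRESHOLD:
            result[shard_id] = "cold"   # candidate for merging with a neighbor
        else:
            result[shard_id] = "ok"
    return result


sample = {
    "shardId-000000000000": (0.95, 400),  # near the 1 MB/s byte limit
    "shardId-000000000001": (0.05, 30),   # barely used
    "shardId-000000000002": (0.40, 500),
}
print(classify(sample))
```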
Kinesis Data Streams can scale up and down but not in real time. Scaling is done programmatically and is not done automatically by AWS. You will have to monitor your streams to ensure you have the correct amount of throughput.
The scaling process is called resharding.
Adding a shard is done using a process called Shard Splitting.
Removing a shard is done using Shard Merging.
When a shard is split or merged, there are Parent Shards and Child Shards. The Parent Shards are closed for writing but remain available for reading until their data expires.
When a Parent Shard is closed, new writes for its hash key range go to the Child Shard or Shards that now cover that range.
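Resharding is mostly hash key arithmetic. The SplitShard API takes a `NewStartingHashKey` that becomes the first key of the second child, and MergeShards only accepts two shards whose ranges are adjacent. The helpers below are a hypothetical sketch of that arithmetic; splitting at the midpoint is one common choice, not a requirement. With boto3 you would then pass `str(new_start)` as `NewStartingHashKey` to `split_shard`.

```python
def split_ranges(start: int, end: int):
    """Split a parent's [start, end] hash key range at its midpoint."""
    if end <= start:
        raise ValueError("a single-key range cannot be split")
    new_start = (start + end) // 2 + 1  # first hash key of the second child
    return new_start, (start, new_start - 1), (new_start, end)


def can_merge(a: tuple, b: tuple) -> bool:
    """Two shards can be merged only if their hash key ranges are adjacent."""
    (_, low_end), (high_start, _) = sorted([a, b])
    return low_end + 1 == high_start


def merged_range(a: tuple, b: tuple) -> tuple:
    """Range covered by the child shard produced by merging two adjacent parents."""
    if not can_merge(a, b):
        raise ValueError("ranges are not adjacent")
    (low_start, _), (_, high_end) = sorted([a, b])
    return (low_start, high_end)


new_start, child_1, child_2 = split_ranges(0, 2**128 - 1)
print(new_start == 2**127, child_1, child_2)
```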
Generally speaking, there are five layers of a data streaming service.
They are the source layer, the stream ingestion layer, the stream storage layer, the stream processing layer, and the destination.
While Amazon Kinesis Data Streams describes the entire streaming service from AWS, a Kinesis Data Stream--singular--is a storage layer for streaming data.
Creating a stream from the AWS CLI uses the create-stream command, which calls the CreateStream API.
To see the status of a stream from the AWS CLI, use the describe-stream or describe-stream-summary commands.
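The same create-then-check workflow can be scripted with an SDK. This sketch assumes `client` is a boto3 Kinesis client (for example `boto3.client("kinesis")`) and uses its `create_stream` and `describe_stream_summary` methods; the polling interval and timeout are arbitrary choices. boto3 also offers a built-in `stream_exists` waiter that does similar polling.

```python
import time


def create_stream_and_wait(client, name: str, shard_count: int = 1,
                           poll_seconds: float = 5.0, timeout: float = 300.0) -> str:
    """Create a provisioned stream, then poll its summary until it is ACTIVE."""
    client.create_stream(StreamName=name, ShardCount=shard_count)
    deadline = time.monotonic() + timeout
    while True:
        summary = client.describe_stream_summary(StreamName=name)["StreamDescriptionSummary"]
        status = summary["StreamStatus"]  # CREATING, ACTIVE, UPDATING, or DELETING
        if status == "ACTIVE":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"stream {name} is still {status}")
        time.sleep(poll_seconds)
```

Remember that charges begin once the stream is provisioned, so delete test streams when you are done.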
To put Data Records into a stream, the API calls are putRecord() and putRecords().
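In boto3 these calls are `put_record` and `put_records`. PutRecords accepts at most 500 records per request and is not all-or-nothing: some entries can fail while others succeed. The sketch below assumes a boto3-style client, batches the records, and collects the failed entries so the caller can retry them; `put_all` is a hypothetical helper, and it does not enforce the per-request size limit.

```python
from typing import Iterable, List, Tuple

MAX_RECORDS_PER_CALL = 500  # PutRecords limit on records per request


def put_all(client, stream: str, items: Iterable[Tuple[str, bytes]]) -> List[dict]:
    """Send (partition_key, data) pairs in PutRecords batches; return failed entries."""
    entries = [{"PartitionKey": key, "Data": data} for key, data in items]
    failed = []
    for i in range(0, len(entries), MAX_RECORDS_PER_CALL):
        batch = entries[i:i + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream, Records=batch)
        if response["FailedRecordCount"]:
            # Results are returned in request order; failed ones carry an ErrorCode.
            for entry, result in zip(batch, response["Records"]):
                if "ErrorCode" in result:
                    failed.append(entry)
    return failed
```

A production producer would retry the returned entries with backoff, which is one of the tasks the KPL handles for you.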
The API call getShardIterator() is used to determine where to start reading data in a shard. Valid options are: a specific sequence number, the Data Record that comes after a specific sequence number, a timestamp, the oldest Data Record, or the newest Data Record.
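Those five starting positions correspond to the `ShardIteratorType` values AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, AT_TIMESTAMP, TRIM_HORIZON (oldest), and LATEST (newest). This sketch builds the request for boto3's `get_shard_iterator` and performs a single `get_records` read. The short position labels such as `"oldest"` are hypothetical names chosen for readability.

```python
from datetime import datetime
from typing import Optional

# Hypothetical friendly labels mapped to the real ShardIteratorType values.
ITERATOR_TYPES = {
    "at_sequence": "AT_SEQUENCE_NUMBER",
    "after_sequence": "AFTER_SEQUENCE_NUMBER",
    "at_timestamp": "AT_TIMESTAMP",
    "oldest": "TRIM_HORIZON",
    "newest": "LATEST",
}


def iterator_request(stream: str, shard_id: str, position: str,
                     sequence_number: Optional[str] = None,
                     timestamp: Optional[datetime] = None) -> dict:
    """Keyword arguments for get_shard_iterator; some positions need an extra value."""
    request = {"StreamName": stream, "ShardId": shard_id,
               "ShardIteratorType": ITERATOR_TYPES[position]}
    if position in ("at_sequence", "after_sequence"):
        if sequence_number is None:
            raise ValueError("this position requires a sequence number")
        request["StartingSequenceNumber"] = sequence_number
    elif position == "at_timestamp":
        if timestamp is None:
            raise ValueError("this position requires a timestamp")
        request["Timestamp"] = timestamp
    return request


def read_once(client, request: dict, limit: int = 100):
    """Get an iterator, read one batch, and return (records, next_iterator)."""
    iterator = client.get_shard_iterator(**request)["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return response["Records"], response.get("NextShardIterator")
```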
The Kinesis Producer Library--KPL--and the Kinesis Client Library--KCL--were released by AWS to make building Producers and Consumers fast and efficient.
The KPL aggregates records and the KCL de-aggregates records. The KCL can also automatically take care of common tasks related to creating Consumer applications.
Remember to use the Principle of Least Privilege when provisioning resources inside AWS. This might be less important in a development environment; however, in a production environment, one of the goals is to avoid creating an RBE, a Resume Building Event.
The Best Practices for Kinesis Data Streams include having separate security policies for Administrators, for stream resharding, for Producers to do writes, and for Consumers to do reads.
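Separate policies per role might look like the sketch below, which generates one IAM policy document per role, scoped to a single stream ARN. The action lists are a starting point under assumptions, not a complete set: applications built on the KPL or KCL typically need more permissions (the KCL, for example, also uses DynamoDB and CloudWatch), and the role names, region, account ID, and stream name are placeholders.

```python
import json

# Hypothetical role names mapped to Kinesis actions each role needs on one stream.
ROLE_ACTIONS = {
    "administrator": ["kinesis:CreateStream", "kinesis:DeleteStream",
                      "kinesis:DescribeStreamSummary",
                      "kinesis:StartStreamEncryption", "kinesis:StopStreamEncryption"],
    "resharder": ["kinesis:SplitShard", "kinesis:MergeShards", "kinesis:UpdateShardCount",
                  "kinesis:DescribeStreamSummary", "kinesis:ListShards"],
    "producer": ["kinesis:PutRecord", "kinesis:PutRecords", "kinesis:DescribeStreamSummary"],
    "consumer": ["kinesis:GetRecords", "kinesis:GetShardIterator",
                 "kinesis:DescribeStreamSummary", "kinesis:ListShards"],
}


def stream_policy(role: str, region: str, account_id: str, stream: str) -> dict:
    """IAM policy document allowing only this role's actions on a single stream."""
    arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream}"
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ROLE_ACTIONS[role], "Resource": arn}],
    }


print(json.dumps(stream_policy("producer", "us-east-1", "123456789012", "clickstream"), indent=2))
```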
Inside a Kinesis Data Stream, Data Records can be encrypted and decrypted using KMS.
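Server-side encryption can be turned on for an existing stream. This sketch assumes a boto3 Kinesis client and uses `start_stream_encryption` with the AWS managed key alias `alias/aws/kinesis`; pass your own key ID or alias to use a customer managed key. Encryption applies to records written after it is enabled, and with a customer managed key the producers and consumers also need the matching KMS permissions.

```python
DEFAULT_KINESIS_KEY = "alias/aws/kinesis"  # AWS managed KMS key for Kinesis


def enable_encryption(client, stream: str, key_id: str = DEFAULT_KINESIS_KEY) -> str:
    """Start KMS encryption on a stream and report the encryption type it now shows."""
    client.start_stream_encryption(StreamName=stream, EncryptionType="KMS", KeyId=key_id)
    # The stream moves to UPDATING while the change is applied; this just reads the summary.
    summary = client.describe_stream_summary(StreamName=stream)["StreamDescriptionSummary"]
    return summary.get("EncryptionType", "NONE")
```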
When working with Kinesis Data Streams and VPCs, use a VPC Endpoint to keep network traffic inside the VPC and away from the Public Internet.
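An interface VPC endpoint for Kinesis Data Streams uses the service name `com.amazonaws.<region>.kinesis-streams`. This sketch assumes `ec2` is a boto3 EC2 client and that the VPC, subnet, and security group IDs already exist; with private DNS enabled, SDK calls to the regional Kinesis endpoint resolve to the endpoint's private addresses inside the VPC.

```python
def create_kinesis_endpoint(ec2, region: str, vpc_id: str,
                            subnet_ids: list, security_group_ids: list) -> str:
    """Create an interface endpoint for Kinesis Data Streams and return its ID."""
    response = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.kinesis-streams",
        SubnetIds=subnet_ids,
        SecurityGroupIds=security_group_ids,
        PrivateDnsEnabled=True,  # keep the usual endpoint name, resolved privately
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```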
I've covered a lot in this set of lectures. There's more to learn, of course, but this is enough to get you started.
Another reminder for you is that AWS continues to expand and augment its offerings. The rate of change in the cloud is staggering. New services and features often surprise me. So, we here at Cloud Academy will be working at adding and updating content over time.
Thankfully, the fundamentals don't change as fast. Even if a feature changes, how it works should stay relatively constant.
I hope you found this material useful and that you can use it to take your next steps on your cloud journey.
This brings me to the end of this course. But, it really isn’t the end. (It never is.) There’s more to learn and experience about data streaming and cloud computing.
Whether I want to or not, it seems that I learn something new every day.
Part of this is the rate of change in the cloud and part of it is having the freedom to explore and experiment.
Your comments, questions, and suggestions on our courses are of great value to us at Cloud Academy. We use this information to improve our courses in terms of both content and delivery.
Please feel free to reach out to us by emailing email@example.com. I'd love to hear your thoughts about this course as well as what you'd like to learn about in the future.
I'm Stephen Cole with Cloud Academy and thank you for watching!
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification, but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.