Amazon Kinesis Data Streams
This course is part one of two on how to stream data using Amazon Kinesis Data Streams.
In this course, you will learn about the Kinesis service, how to get data in and out of an Amazon Kinesis Data Stream, and how streaming data is modeled.
We'll also cover Kinesis Producer Applications and Kinesis Consumer Applications and how they communicate with a Kinesis Data Stream.
You'll also learn about the limits and costs of streaming data with Amazon Kinesis, as well as how data can be secured and protected inside a stream.
- Obtain a foundational understanding of Kinesis Data Streams
- Learn how to get data in and out of a stream and how to use streams to get data from its source to a destination
- Learn about the limits of Kinesis Data Streams and how to avoid them
- Understand the cost structure of the service
- Learn how to protect and secure data within a stream
This course is intended for people who want to learn how to stream data into the AWS cloud using Amazon Kinesis Data Streams.
To get the most out of this course, you should have a basic knowledge of the AWS platform.
As a review, this is what I've covered in this set of lectures about Amazon Kinesis Data Streams.
Some data moves faster than others. Not all data is created or used equally. There are times that data is only valuable for a short period of time, sometimes measured in seconds. That's one of the reasons why streaming data exists today. It can capture that data and use it while it is still valuable.
The cloud comes with promises of agility, scalability, and elasticity. Agility is about changing to meet needs. Being agile is more than adapting to change; it’s also about being able to fit into places that, at one time, were too big or too small. Scalability means that growth can happen when needed. Elasticity, at least in the cloud, is the counterpart of scalability. Even though environments can expand, one of the cloud's key characteristics is that they can return to their original size. This means that environments in the cloud can grow and shrink as needed.
So, if you are no longer using a resource, you can turn it off -- and stop paying for it. When I was a kid, my dad would go through the house grumbling about me forgetting to turn the lights out when I left a room. He'd say things like, "Why am I paying to light up a room nobody is using?" To me, that's the fundamental principle of elasticity. It's my dad reminding me that it's a bad idea to pay for something if it isn't being used.
Kinesis Data Streams is one of the features of Amazon Kinesis. It is a massively scalable, elastic, and durable real-time data streaming service. It is used to collect data from sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, location-tracking events, and IoT devices.
Data collected is available within milliseconds inside a stream to enable applications to do things like analytics, dashboards, anomaly detection, and dynamic pricing in real time. A Kinesis Data Stream is made up of Shards. Producer applications put data into streams as Data Records.
Data Records have three main parts: a Partition Key, a Sequence Number, and a Data Blob. The data blob travels through the API as Base64-encoded text. Data Records are put into shards based on their Partition Key and ordered by their Sequence Number. A Data Record goes into one and only one shard.
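The record structure described above can be sketched in a few lines of Python. The payload and partition key here are illustrative values, not part of the course material:

```python
import base64

# A minimal sketch of the three parts of a Kinesis Data Record.
# Producers supply the Partition Key and the Data Blob; the service
# assigns the Sequence Number on write. Over the HTTP API, the blob
# travels as Base64-encoded text.
payload = b'{"event": "click", "page": "/home"}'   # raw data blob
record = {
    "PartitionKey": "user-42",       # chosen by the producer
    "SequenceNumber": None,          # assigned by Kinesis on write
    "Data": base64.b64encode(payload).decode("ascii"),
}

# Decoding the blob recovers the original bytes unchanged.
assert base64.b64decode(record["Data"]) == payload
```

Note that the SDKs handle this encoding for you; you only see the Base64 form when working with the raw API.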
A shard is both a Unit of Scale and a Unit of Parallelism. A single shard can support a data write rate of 1 megabyte per second or 1,000 records per second, whichever limit is reached first.
Streams can be created using the AWS Console, programmatically using an SDK, or from the AWS CLI. Charges start to accumulate as soon as the stream has been provisioned; there is no free tier. When creating a stream, it has to have a name that is unique within its account and Region, and at least one shard.
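As a sketch of the programmatic route, here is what stream creation looks like with the boto3 SDK. The stream name and shard count are illustrative, and the helper function is my own wrapper, not part of the course:

```python
# A minimal sketch of creating a stream with boto3 (assumed installed
# and configured with credentials). Billing starts once the stream is
# provisioned, so delete streams you are not using.
def create_stream(client, name: str, shard_count: int = 1) -> None:
    """Create a Kinesis Data Stream with the given shard count."""
    if shard_count < 1:
        raise ValueError("A stream needs at least one shard")
    client.create_stream(StreamName=name, ShardCount=shard_count)
```

With real credentials you would call it as `create_stream(boto3.client("kinesis"), "example-stream", 2)`; the CLI equivalent is `aws kinesis create-stream --stream-name example-stream --shard-count 2`.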
A stream's throughput is based on the number of shards it has. The Shard ID is unique to each shard, the Hash Key ranges do not overlap between shards, and the Sequence Number is used to keep Data Records in order.
When a Producer writes a Data Record to a stream, an MD5 hash is created from the Partition Key. This hash key value determines the shard to which the Data Record is written. Producers have a pair of limits when they put records into a stream: data can be written at a rate of 1 megabyte per second per shard or 1,000 records per second per shard.
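The MD5 routing step can be reproduced locally with the standard library. This is a sketch of the mechanism, not the service's implementation; the shard ranges below are illustrative (a real stream's ranges come back from DescribeStream):

```python
import hashlib

# How Kinesis routes a record: the MD5 hash of the Partition Key,
# read as a 128-bit integer, is matched against each shard's
# non-overlapping hash key range.
def hash_key(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")  # 0 .. 2**128 - 1

def shard_for(partition_key: str, shard_ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the hash key."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(shard_ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key ranges do not cover the key space")

# Two shards splitting the 128-bit key space evenly, the way the
# service does when a two-shard stream is created.
MAX_HASH = 2**128 - 1
ranges = [(0, MAX_HASH // 2), (MAX_HASH // 2 + 1, MAX_HASH)]
```

Because the hash is deterministic, every record with the same Partition Key always lands in the same shard, which is what preserves per-key ordering.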
There are two types of Consumers available to read a Kinesis Data Stream: a Standard Consumer and an Enhanced Fan-Out Consumer. The Standard Consumer uses a polling method to get data from a stream. For reads, each shard can support up to 5 API calls per second and return up to 2 megabytes per second, with each call retrieving at most 10,000 records.
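A sketch of that polling pattern with the boto3 client API (GetShardIterator, then repeated GetRecords) might look like the following. The stream name, shard ID, and batch counts are illustrative; a production consumer would also handle resharding and checkpointing:

```python
import time

# A minimal Standard Consumer polling loop. The injected `client` is
# assumed to be a boto3 Kinesis client.
def read_shard(client, stream_name: str, shard_id: str, max_batches: int = 3):
    """Poll one shard from its oldest record and yield record batches."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",   # start at the oldest record
    )["ShardIterator"]
    for _ in range(max_batches):
        response = client.get_records(ShardIterator=iterator, Limit=100)
        yield response["Records"]
        iterator = response.get("NextShardIterator")
        if iterator is None:                # shard has been closed
            break
        time.sleep(0.2)                     # stay under 5 calls/sec/shard
```

The sleep between calls is one simple way to respect the 5-calls-per-second-per-shard read limit; real applications usually back off adaptively instead.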
Enhanced Fan-Out uses a push mechanism to send data to Consumers. Up to 20 Consumer Applications can be registered to a stream, and each registered Consumer gets a dedicated 2 megabytes per second per shard of throughput.
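Registering such a consumer is a single API call; the push delivery itself then happens over a SubscribeToShard connection. This sketch covers only the registration step, with an illustrative stream ARN and consumer name:

```python
# A sketch of registering an Enhanced Fan-Out consumer with boto3.
# A stream can have at most 20 registered consumers, each of which
# receives a dedicated 2 MB/s per shard pushed to it.
def register_consumer(client, stream_arn: str, consumer_name: str) -> str:
    """Register an Enhanced Fan-Out consumer and return its ARN."""
    response = client.register_stream_consumer(
        StreamARN=stream_arn,
        ConsumerName=consumer_name,
    )
    return response["Consumer"]["ConsumerARN"]
```

In practice most people let the Kinesis Client Library (KCL) 2.x manage registration and the SubscribeToShard event stream rather than calling these APIs directly.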
That's what I've covered, so far. There is more. In part two of this course, I am going to cover Shard capacity, a Kinesis Data Stream as a streaming storage layer, and the basics of securing a Kinesis Data Stream.
AWS continues to expand and augment its offerings. The rate of change in the cloud is staggering. New services and features often surprise me. So, we here at Cloud Academy will be working to add and update content over time. Thankfully, the fundamentals don't change as fast. Even if a feature changes, how it works should stay constant.
I hope you found this material useful and that you can use it to take your next steps on your cloud journey. This brings me to the end of part one of this course. But, it really isn’t the end. There’s more to learn and experience about data streaming and cloud computing. Whether I want to or not, it seems that I learn something new every day. Part of this is the rate of change in the cloud and part of it is how the cloud gives me the freedom to explore and experiment.
Your comments, questions, and suggestions on our courses are of great value to us at Cloud Academy. We use this information to improve our courses in terms of both content and delivery. Please reach out to us at Cloud Academy by emailing email@example.com. I'd love to hear your thoughts about this course as well as what you'd like to learn about in the future.
I'm Stephen Cole with Cloud Academy. Thank you for watching and I hope you join me for part two of this course.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.