Amazon Kinesis Data Streams
This course is part 2 of 2 on how to stream data using Amazon Kinesis Data Streams.
The course covers shard capacity, Kinesis Data Streams as a streaming storage layer, and the basics of securing data in a Kinesis Data Stream.
- Build upon the topics covered in Amazon Kinesis Data Streams Part 1
- Learn about shard capacity, scaling, and limits
- Obtain an in-depth understanding of how a data streaming service is layered and its operations
- Understand how to secure a Kinesis Data Stream
This course is intended for people who want to learn how to stream data into the AWS cloud using Amazon Kinesis Data Streams.
To get the most out of this course, you should have a basic knowledge of the AWS platform.
Hello! I'm Stephen Cole, a trainer here at Cloud Academy, and I'd like to welcome you to this lecture on Amazon Kinesis Data Streams security.
As a quick review, I'm going to start with a discussion of security principles before moving into the topic of how to secure a Kinesis Data Stream.
In a production account, it's important to follow the Principle of Least Privilege.
There are a number of reasons to do this and they generally fall under three categories: organizational, technical, and personal.
In general, we are stewards of the data in our care. From an organizational standpoint, giving people the least amount of access required to do their job ensures that the risk for unauthorized access and use of data is limited.
From a technical standpoint, building a modular set of controls for data stewardship makes compliance within an organization easier to manage, obey, and enforce.
As for personal reasons, human nature being what it is, it is always easier to give people more privileges over time than it is to take access away. Try taking away permissions for anything and see what happens. Put another way, can you remember a time that someone took access privileges away from you? Even if you didn't need them--or having them added unwanted stress to your life--it had an emotional impact.
Save yourself a world of trouble and use the Principle of Least Privilege to protect the data in your care.
I'm going to do a quick review of how Identity and Access Management works inside AWS.
Cloud Academy has courseware on AWS IAM and, if you're going to spend any amount of time architecting solutions in the AWS cloud, it's an important service to understand.
For now, if you're new to how AWS IAM works or just need a little reminder, I'm going to take a moment to talk about it.
Once you've spent enough time working inside the AWS cloud, you will become painfully aware that the default permission for all services is DENY. That is, inside the AWS cloud, in order to be able to do anything, permission has to be explicitly granted.
To do this, a policy document is created and attached to the user, group, or role that needs access to a service or resource. Inside an IAM policy, there are three basic elements: Effect, Action, and Resource.
Effect is either Allow or Deny. An explicit Deny cannot be overridden: if any applicable policy denies an action, it is denied no matter how many other policies allow it.
The Action is like the verb of a sentence. It describes what can be done.
For example, this statement will allow the actions DescribeStreamSummary and ListStreams.
Using an asterisk after the word Get means that any action that begins with the word Get is allowed. This will include statements like GetShardIterator and GetRecords.
The Resource is the Amazon Resource Name of the stream.
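The slide that goes with this part of the lecture is not reproduced in the transcript. As a minimal sketch, assuming the three elements are laid out as a Python dictionary and serialized to JSON, a statement of this kind might look like the following. The `Version` string is the standard IAM policy version; the choice of `"*"` as the Resource is an assumption made here because ListStreams is not tied to one stream, and a stream ARN in its place would narrow the other actions to that stream.

```python
import json

# One statement with the three basic elements: Effect, Action, Resource.
statement = {
    "Effect": "Allow",                      # Allow or Deny
    "Action": [                             # the "verbs" being permitted
        "kinesis:DescribeStreamSummary",
        "kinesis:ListStreams",
        "kinesis:Get*",                     # GetRecords, GetShardIterator, ...
    ],
    # Assumption for this sketch: "*" so that ListStreams is covered too.
    "Resource": "*",
}

# A policy document wraps one or more statements.
policy = {"Version": "2012-10-17", "Statement": [statement]}

print(json.dumps(policy, indent=2))
```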
If you're new to ARNs, the acronym stands for Amazon Resource Name. ARNs uniquely identify resources inside AWS, and their individual fields are separated by colons.
I'll walk through each field starting with arn.
arn stands for--as I've said already--Amazon Resource Name. It's basically self-identification.
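The second field is the partition. It is aws in the standard AWS regions; other partitions, such as aws-cn and aws-us-gov, use a different value.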
The third field is the service, in this case, Kinesis.
Fourth is the region.
The fifth field is the account number.
Last is the stream. It has a prefix of stream and a forward slash that is followed by a specific stream name.
Asterisks are allowed in ARNs and they act as a wildcard character.
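To make the field layout concrete, here is a small sketch that splits a stream ARN into its parts and checks it against a wildcard pattern. The helper names `parse_arn` and `arn_matches`, the region `us-east-1`, and the account number `111122223333` are all placeholders invented for illustration. Shell-style globbing from Python's `fnmatch` module stands in for IAM's wildcard matching here; it is an approximation, not the real IAM evaluation engine.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Arn(NamedTuple):
    prefix: str       # always "arn"
    partition: str    # "aws" in the standard regions
    service: str      # e.g. "kinesis"
    region: str       # e.g. "us-east-1"
    account_id: str   # the account number
    resource: str     # e.g. "stream/stream1"


def parse_arn(arn: str) -> Arn:
    # Split into at most six fields so colons inside the resource survive.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts)


def arn_matches(pattern: str, arn: str) -> bool:
    # Approximation: "*" in the pattern matches any run of characters.
    return fnmatchcase(arn, pattern)


# Placeholder ARN (assumed region, account number, and stream name).
example = parse_arn("arn:aws:kinesis:us-east-1:111122223333:stream/stream1")
print(example.service, example.region, example.account_id, example.resource)

# Any stream, in any region, in the placeholder account.
any_stream = "arn:aws:kinesis:*:111122223333:stream/*"
print(arn_matches(any_stream, "arn:aws:kinesis:us-west-2:111122223333:stream/orders"))
```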
Here is an IAM Policy that allows users to add data to any stream in an account.
The asterisk after PutRecord in this IAM policy means that the API calls PutRecord() and PutRecords() will work.
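The policy on the slide is not included in the transcript, so what follows is only a sketch of what such a policy could look like. The account number `111122223333` is a placeholder, and the use of `fnmatchcase` to show which action names the wildcard covers is an approximation of IAM's matching, not the service's own logic.

```python
import json
from fnmatch import fnmatchcase

ACCOUNT_ID = "111122223333"  # placeholder account number (assumption)

put_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # The trailing asterisk covers PutRecord and PutRecords.
            "Action": "kinesis:PutRecord*",
            # Any stream, in any region, in this one account.
            "Resource": f"arn:aws:kinesis:*:{ACCOUNT_ID}:stream/*",
        }
    ],
}

# Show which action names the wildcard would cover (approximate matching).
pattern = put_policy["Statement"][0]["Action"]
for action in ("kinesis:PutRecord", "kinesis:PutRecords", "kinesis:GetRecords"):
    print(action, fnmatchcase(action, pattern))

print(json.dumps(put_policy, indent=2))
```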
Here's a longer policy document. It allows a user or group to perform the DescribeStreamSummary, GetShardIterator, and GetRecords operations on a specific stream--in this case, stream1--and ListStreams on any stream.
Notice that, in this policy, the Resource is an asterisk. This means any stream in the account where this policy is used.
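A sketch of a two-statement policy like the one described, together with a toy evaluator, can show how default deny plays out. The ARN (region, account number) is a placeholder, and `is_allowed` is a hypothetical helper written for this example: it only mimics the basic rule that an explicit Deny wins, an Allow comes next, and everything else is implicitly denied. Real IAM evaluation considers far more than this.

```python
from fnmatch import fnmatchcase

# Placeholder ARNs: region and account number are assumptions.
STREAM1 = "arn:aws:kinesis:us-east-1:111122223333:stream/stream1"
STREAM2 = "arn:aws:kinesis:us-east-1:111122223333:stream/stream2"

read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStreamSummary",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
            ],
            "Resource": [STREAM1],
        },
        {"Effect": "Allow", "Action": ["kinesis:ListStreams"], "Resource": ["*"]},
    ],
}


def _as_list(value):
    # Policy elements may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy, action, resource):
    """Toy evaluation: explicit Deny wins, then Allow, otherwise implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        action_hit = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_hit = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_hit and resource_hit:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


print(is_allowed(read_policy, "kinesis:GetRecords", STREAM1))   # named stream
print(is_allowed(read_policy, "kinesis:GetRecords", STREAM2))   # different stream
print(is_allowed(read_policy, "kinesis:ListStreams", STREAM2))  # Resource "*"
print(is_allowed(read_policy, "kinesis:PutRecord", STREAM1))    # never granted
```

Nothing in the policy mentions PutRecord, so the last check falls through to the implicit deny.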
On its own, a policy attached to an identity in my account cannot grant access to resources in an account other than my own. That would be absurd.
To reiterate, it's important to use the principle of least privilege in production accounts. We do not own data, we're stewards of the data in our care. It is important to protect information that has been entrusted to us.
That said, for my demos and experiments, it is much easier to use an account that has either full administrative privileges or full privileges for Amazon Kinesis Data Streams.
I'm not lazy, I'm efficient. In spite of how it looks.
This policy gives full access to all streams in a chosen account.
I could go a step further and use this policy.
This will allow all actions on all streams in an account.
This policy is dangerous and should not be anywhere near a production account.
Using it in production could lead to an RBE, a Resume Building Event. As in, "Welcome to your last day of work."
You have been warned.
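Neither of the two wide-open policies shown on the slides appears in the transcript. The sketch below is a guess at their general shape, not a reproduction: the first scopes every Kinesis action to the streams of one placeholder account (`111122223333`), and the second drops the Resource restriction entirely. Both variable names are made up for this example.

```python
import json

ACCOUNT_ID = "111122223333"  # placeholder account number (assumption)

# Guess 1: every Kinesis Data Streams action on every stream in one account.
kinesis_full_access = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kinesis:*",
            "Resource": f"arn:aws:kinesis:*:{ACCOUNT_ID}:stream/*",
        }
    ],
}

# Guess 2: a step further, with no restriction on the Resource at all.
# Demo accounts only; keep this far away from production.
kinesis_everything = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "kinesis:*", "Resource": "*"}
    ],
}

for policy in (kinesis_full_access, kinesis_everything):
    print(json.dumps(policy, indent=2))
```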
On the topic of security, the best practices for creating IAM policies working with Kinesis Data Streams include creating 4 different policies based on roles.
There should be a policy for Administrators, for stream resharding, for Producers to do writes, and for Consumers to do reads.
To illustrate--and this is not an exhaustive list--consider this example.
Administrators need to have IAM Actions such as CreateStream, DeleteStream, AddTagsToStream, and RemoveTagsFromStream.
For those that need to reshard a stream the IAM actions include MergeShards and SplitShard.
Producers need the IAM actions DescribeStream, PutRecord, and PutRecords to be able to write Data Records.
Consumers, similarly, need the IAM actions GetRecords and GetShardIterator to be able to read from a Kinesis Data Stream.
There will be other actions required for each of these roles.
For example, as I think about it, the policy for Consumers could probably also use DescribeStream.
There's no single best policy for any job description. It depends on what other services and actions are needed in your organization. Be sure to consult the documentation to see what is possible and match the appropriate actions as needed.
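As a starting point only, the four roles and the actions named above can be turned into policy documents with a small helper. The role names, the `build_policy` function, and the placeholder stream ARN are assumptions for illustration; the action lists are the non-exhaustive ones from the lecture, with DescribeStream added for Consumers as suggested.

```python
import json

# Actions per role, taken from the lecture's (non-exhaustive) examples.
ROLE_ACTIONS = {
    "administrator": ["CreateStream", "DeleteStream", "AddTagsToStream", "RemoveTagsFromStream"],
    "resharder": ["MergeShards", "SplitShard"],
    "producer": ["DescribeStream", "PutRecord", "PutRecords"],
    "consumer": ["DescribeStream", "GetShardIterator", "GetRecords"],
}


def build_policy(role: str, stream_arn: str) -> dict:
    """Build a one-statement Allow policy for the given role and stream."""
    actions = [f"kinesis:{name}" for name in ROLE_ACTIONS[role]]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": stream_arn}
        ],
    }


# Placeholder ARN: region, account number, and stream name are assumptions.
STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/stream1"

policies = {role: build_policy(role, STREAM_ARN) for role in ROLE_ACTIONS}
print(json.dumps(policies["producer"], indent=2))
```

Keeping the role-to-action mapping in one place makes it easy to review who can do what, and to extend a role when the documentation turns up another action it needs.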
It's possible that you will discover a feature of streaming data that you weren't aware of or learn a better way to manage an existing process.
Also, I should mention that it is a general AWS best practice to use temporary security credentials in the form of IAM roles whenever appropriate. Assume a role when needed and let it go when done.
In addition to using IAM policies to control access, Kinesis Data Streams can encrypt data in flight by using HTTPS endpoints.
Using the AWS Key Management Service, KMS, encryption is available for data at rest.
Using KMS, Data Records are encrypted before they are written to the Kinesis stream storage layer and decrypted after they are retrieved. As a result, Data Records are encrypted at rest within the Kinesis Data Streams service to meet regulatory requirements and enhance data security.
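Server-side encryption is switched on per stream. The sketch below assumes the boto3 Kinesis client, whose `start_stream_encryption` call takes `StreamName`, `EncryptionType`, and `KeyId`. The wrapper function `enable_kms_encryption` and the `RecordingClient` stand-in are hypothetical and exist only so the example runs without AWS credentials; the stream name is a placeholder, and `alias/aws/kinesis` refers to the AWS managed key for Kinesis.

```python
def enable_kms_encryption(kinesis_client, stream_name, key_id="alias/aws/kinesis"):
    """Ask Kinesis to encrypt a stream's records at rest with a KMS key."""
    kinesis_client.start_stream_encryption(
        StreamName=stream_name,
        EncryptionType="KMS",
        KeyId=key_id,
    )


class RecordingClient:
    """Stand-in for boto3.client("kinesis") so this sketch runs offline."""

    def __init__(self):
        self.calls = []

    def start_stream_encryption(self, **kwargs):
        # Record the request instead of sending it to AWS.
        self.calls.append(kwargs)


client = RecordingClient()
enable_kms_encryption(client, "stream1")  # "stream1" is a placeholder name
print(client.calls)
```

With real credentials, passing `boto3.client("kinesis")` in place of `RecordingClient()` would send the same request to the service.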
It is possible to encrypt data on the client side. The data then has to be decrypted client side as well. This is a manual process and much more challenging than using KMS because Kinesis Data Streams does not do any of it for you: key management, encryption, and decryption all become the application's responsibility.
If you require FIPS 140-2 encryption, FIPS endpoints are available. If you don't know what FIPS is, you probably don't need to use a FIPS endpoint.
However, for reference, FIPS stands for Federal Information Processing Standard. It is a set of standards that describes document processing, encryption algorithms, and other information technology standards for use within non-military government agencies and by government contractors and vendors who work with those agencies.
If Kinesis Applications are in a VPC, use VPC Endpoints to access Kinesis Data Streams. All network traffic between Kinesis and the application will stay within the VPC.
Essentially, VPC Endpoints allow you to connect to AWS Services from within a VPC without accessing the public Internet. They remove the need for an Internet Gateway or a NAT Gateway in order to reach those services from a private subnet.
Endpoints are managed by AWS, scale horizontally automatically, and are highly available.
What happens is that an ENI--an Elastic Network Interface--is provisioned using a private IP address inside a subnet in the VPC. A security group is used to restrict access to this ENI and it is used to send traffic to the desired service such as Kinesis Data Streams.
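The pieces just described (a subnet for the ENI, a security group to restrict access, and the target service) map onto the parameters of the EC2 `create_vpc_endpoint` request. The sketch below only builds that parameter dictionary; the helper name `kinesis_endpoint_request` and every ID in the usage lines are placeholders. The service name pattern `com.amazonaws.<region>.kinesis-streams` is the one used for Kinesis Data Streams interface endpoints as far as I know, so verify it against the current documentation for your region.

```python
def kinesis_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build the parameters for an interface VPC endpoint to Kinesis Data Streams."""
    return {
        "VpcEndpointType": "Interface",            # ENI-based endpoint
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.kinesis-streams",
        "SubnetIds": list(subnet_ids),              # where the ENIs are placed
        "SecurityGroupIds": list(security_group_ids),  # who may reach the ENIs
        "PrivateDnsEnabled": True,                  # keep using the usual hostname
    }


# Placeholder identifiers, not real resources.
request = kinesis_endpoint_request(
    region="us-east-1",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
print(request)
```

With boto3 available, these parameters could be passed as `boto3.client("ec2").create_vpc_endpoint(**request)`.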
That's it for this lecture.
As a review, I covered issues relating to security using Kinesis Data Streams.
Remember to use the Principle of Least Privilege when provisioning resources inside AWS.
This might be less important in a development environment. However, in a production environment, one of your personal and professional goals should always be to avoid creating an RBE, a Resume Building Event.
The Best Practices for Kinesis Data Streams include having separate security policies for Administrators, for stream resharding, for Producers to do writes, and for Consumers to do reads.
For streaming data that is in-flight, HTTPS endpoints are available.
Inside a Kinesis Data Stream, Data Records can be encrypted and decrypted using KMS.
When working with Kinesis Data Streams and VPCs, use a VPC Endpoint to keep network traffic inside the VPC and away from the Public Internet.
That's it for now. I hope you found this informative and engaging. I'm Stephen Cole with Cloud Academy, thanks for watching!
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.