Big Data is becoming commonplace within many organizations, which use Big Data solutions to perform large-scale queried data analysis with business intelligence toolsets and gain a deeper understanding of the data gathered.
Within AWS, this data can be stored, distributed, and consumed by a range of services, many of which provide features ideal for Big Data analysis. These huge data sets often include sensitive information, such as customer details or financial information.
With this in mind, security surrounding this data is of utmost importance, and where sensitive information exists, encryption should be applied against the data.
This course first explains data encryption and the differences between symmetric and asymmetric cryptography. This is a useful introduction before looking at how AWS implements different encryption mechanisms for many of the services that can be used for Big Data. These services include:
- Amazon S3
- Amazon Athena
- Amazon Elastic MapReduce (EMR)
- Amazon Relational Database Service (RDS)
- Amazon Kinesis Firehose
- Amazon Kinesis Streams
- Amazon Redshift
The course covers encryption options for data both at rest and in transit, and contains the following lectures:
- Introduction: This lecture introduces the course objectives, topics covered and the instructor
- Overview of Encryption: This lecture explains data encryption and when and why you may need to implement data encryption
- Amazon S3 and Amazon Athena Encryption: This lecture dives into the different encryption mechanisms of S3, from both a server-side and client-side perspective. It also looks at how Amazon Athena can analyze data sets stored on S3 with encryption
- Elastic MapReduce (EMR) Encryption: This lecture focuses on the different methods of encryption when utilizing EMR in conjunction with other services such as EBS and S3. It also looks at application-specific options with Hadoop, Presto, Tez, and Spark
- Relational Database Service (RDS) Encryption: This lecture looks at the encryption within RDS, focusing on its built-in encryption plus Oracle and SQL Server Transparent Data Encryption (TDE)
- Amazon Kinesis Encryption: This lecture looks at both Kinesis Firehose and Kinesis Streams and analyses the encryption of both services
- Amazon Redshift Encryption: This lecture explains the four-tier encryption structure when working with Redshift and KMS. It also explains how to encrypt when using CloudHSM with Redshift
- Summary: This lecture highlights the key points from the previous lectures
Resources mentioned throughout this course
Cloud Academy Courses:
- Amazon Web Services: Key Management Services (KMS)
- Working with Amazon Kinesis
- Getting started with AWS CloudHSM
- Configuring HDFS Transparent Encryption in Amazon EMR
- Using SSL to encrypt a connection to a Database
- Oracle Native Network Encryption (NNE)
- Encrypt and decrypt Amazon Kinesis Records using AWS KMS
- Configuring Redshift to use CloudHSM
Hello, and welcome to this lecture where I'm going to be looking at how Amazon Kinesis utilizes encryption mechanisms. I will be looking at both Kinesis Firehose and Kinesis Streams.
If you are new to Amazon Kinesis, you may find it useful to take our existing course covering AWS Kinesis found here.
Let me start by providing a high-level overview of the differences between each of these services. Amazon Firehose. This service is used to deliver real-time streaming data to different services and destinations within AWS, many of which can be used for big data, such as S3, Redshift, and Amazon Elasticsearch.
The service is fully managed by AWS, taking a lot of the administration and maintenance out of your hands. Firehose receives data from your data producers and then automatically delivers it to your chosen destination. Amazon Streams. This service essentially collects and processes huge amounts of data in real time and makes it available for consumption.
This data can come from a variety of different sources, for example, log data from infrastructure, social media, web clickstream feeds, market data, etc. Now that we have a high-level overview of each of these, we need to understand how they implement encryption of any data processed and stored, should it be required.
When clients are sending data to Kinesis, the data can be sent in transit over HTTPS, which is HTTP with SSL/TLS encryption. However, once it enters the Kinesis service, it is unencrypted by default. Using both Kinesis Streams and Firehose encryption, you can ensure your streams remain encrypted up until the data is sent to its final destination.
As we know, Amazon Firehose is used to send data to a final destination. If Amazon S3 is used as a destination, Firehose can implement encryption using SSE-KMS on S3. Access to this key and the desired S3 bucket can be given to Firehose via an IAM role to enable this data encryption to take place. Once this role has been created, the relevant permissions must be assigned, which must include the following KMS actions against the CMK used: kms:Decrypt and kms:GenerateDataKey.
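As a rough illustration of the KMS permissions just described, the sketch below builds a permissions policy document for the Firehose role. The key ARN and the function name are hypothetical placeholders, and a real policy would also need the S3 bucket permissions, which are omitted here.

```python
import json

# Hypothetical CMK ARN for illustration only; substitute the ARN of your own key.
CMK_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def firehose_kms_policy(cmk_arn: str) -> dict:
    """Build a policy granting the two KMS actions named in the lecture."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": cmk_arn,
            }
        ],
    }


kms_policy = firehose_kms_policy(CMK_ARN)
print(json.dumps(kms_policy, indent=2))
```

The resulting JSON could be attached to the role as an inline or managed policy. Scoping `Resource` to the single CMK keeps the grant as narrow as the lecture describes.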
You can apply a trust policy on the role itself that names Kinesis Firehose as the trusted entity, ensuring you replace the account ID with your own, which would give Kinesis Firehose the relevant access. If you have configured Kinesis Firehose to use Redshift as a destination, then Firehose still copies the data to S3 first as an intermediary location.
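The trust policy referred to above is not reproduced in this transcript. The sketch below shows what such a policy could look like, assuming the common pattern of trusting the `firehose.amazonaws.com` service principal and restricting role assumption to your own account through an `sts:ExternalId` condition. The account ID and the function name are placeholders.

```python
import json

# Placeholder account ID; replace with your own 12-digit AWS account ID.
ACCOUNT_ID = "111122223333"


def firehose_trust_policy(account_id: str) -> dict:
    """Build a trust policy allowing Kinesis Firehose to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "firehose.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Assumed condition: only allow the role to be assumed on
                # behalf of the given account.
                "Condition": {"StringEquals": {"sts:ExternalId": account_id}},
            }
        ],
    }


trust_policy = firehose_trust_policy(ACCOUNT_ID)
print(json.dumps(trust_policy, indent=2))
```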
For a Redshift destination, the same KMS permissions mentioned previously should be implemented to enforce encryption of the data at rest before it is sent to your Redshift cluster from S3, plus the relevant permissions required for Redshift. Similarly, with Elasticsearch as a destination, S3 can also be used to back up all of the data Firehose sends to Elasticsearch.
And so again, it would need the same KMS permissions plus the relevant permissions for Elasticsearch. Let's now take a look at the encryption for Amazon Kinesis Streams.
Since July 2017, Amazon Streams has been able to implement server-side encryption (SSE) using KMS to encrypt data as it enters the stream directly from the producers.
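For illustration, server-side encryption can be switched on for an existing stream through the Kinesis `StartStreamEncryption` API. The sketch below assumes the boto3 SDK and uses a hypothetical stream name. The helper names are also illustrative, and the request builder is kept separate so it can be inspected without AWS credentials.

```python
def sse_request(stream_name: str, key_id: str) -> dict:
    """Build the keyword arguments for a StartStreamEncryption call."""
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",
        "KeyId": key_id,
    }


def enable_stream_encryption(stream_name: str, key_id: str) -> None:
    """Enable SSE on a stream; requires boto3 and valid AWS credentials."""
    # Imported lazily so sse_request() works without the SDK installed.
    import boto3

    kinesis = boto3.client("kinesis")
    kinesis.start_stream_encryption(**sse_request(stream_name, key_id))


# "example-stream" is a hypothetical stream name; "alias/aws/kinesis" refers
# to the AWS-managed key for Kinesis. A customer-managed CMK could be used instead.
params = sse_request("example-stream", "alias/aws/kinesis")
print(params)
```

Calling `enable_stream_encryption("example-stream", "alias/aws/kinesis")` would send the request. Only the parameter dictionary is built here.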
As a part of this process, it's important to ensure that both producer and consumer applications have permissions to use the KMS key. Otherwise encryption and decryption will not be possible, and you will receive an unauthorized KMS master key permission error.
Put simply, a producer is something that adds data to a Kinesis stream, such as a web service sending log data. Encryption happens at the producer level.
The consumer is usually a Kinesis application that processes data from within the Kinesis stream. Decryption happens at the consumer level.
Your producers must have the following permission against the CMK used: kms:GenerateDataKey, and the following against the Kinesis stream: kinesis:PutRecord and kinesis:PutRecords.
Your consumers, on the other hand, will require the following against the CMK: kms:Decrypt, and the following against the Kinesis stream: kinesis:GetRecords and kinesis:DescribeStream. Utilizing SSE with KMS for Kinesis Streams essentially encrypts the data entering a stream before it is saved to the Kinesis Streams storage layer and then decrypts it after it's accessed from the storage layer, giving full at-rest encryption within the stream.
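The producer and consumer permissions listed above could be expressed as two IAM policy documents along the following lines. This is a minimal sketch: the stream and key ARNs are hypothetical placeholders, the helper names are illustrative, and only the actions named in the lecture are included.

```python
import json

# Hypothetical ARNs for illustration; replace with your own stream and CMK.
STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/example-stream"
CMK_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def build_policy(kms_actions: list, kinesis_actions: list) -> dict:
    """Combine KMS actions on the CMK with Kinesis actions on the stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": kms_actions, "Resource": CMK_ARN},
            {"Effect": "Allow", "Action": kinesis_actions, "Resource": STREAM_ARN},
        ],
    }


# Producers generate data keys and write records to the stream.
producer_policy = build_policy(
    ["kms:GenerateDataKey"],
    ["kinesis:PutRecord", "kinesis:PutRecords"],
)

# Consumers decrypt and read records from the stream.
consumer_policy = build_policy(
    ["kms:Decrypt"],
    ["kinesis:GetRecords", "kinesis:DescribeStream"],
)

print(json.dumps(producer_policy, indent=2))
print(json.dumps(consumer_policy, indent=2))
```

Keeping the two policies separate means a producer role never receives `kms:Decrypt`, and a consumer role never receives write access to the stream.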
Kinesis SSE will typically call upon KMS to generate a new data key every five minutes. So, if you had your stream running for a month or more, thousands of data keys would be generated within this time frame. You may be wondering whether applying this encryption using the producers and then decrypting the data using the consumers adds any latency. And the simple answer is yes. It does add a small overhead, which impacts the performance of PutRecord, PutRecords, and GetRecords by less than a hundred microseconds.
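To put a number on the "thousands of data keys" mentioned above, here is a small back-of-the-envelope calculation. It assumes exactly one new data key every five minutes and a stream that runs continuously. Real rotation timing is approximate, so treat the result as an estimate.

```python
ROTATION_MINUTES = 5  # rotation interval stated in the lecture


def data_keys_generated(days: int, rotation_minutes: int = ROTATION_MINUTES) -> int:
    """Estimate how many data keys are generated over a number of days."""
    minutes_running = days * 24 * 60
    return minutes_running // rotation_minutes


keys_in_30_days = data_keys_generated(30)
print(keys_in_30_days)  # 8640
```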
Before we finish this lecture, I just want to mention that AWS has released a blog post that shows how to implement encryption from client to destination by building a real-time streaming application using Kinesis, in which your records are encrypted while at rest and in transit, which you may want to take a look at here.
That now brings us to the end of this lecture. Coming up next, I shall be looking at encryption when using the Amazon Redshift service.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.