The use of Big Data is becoming commonplace within many organizations that are using Big Data solutions to perform large scale queried data analysis with business intelligence toolsets to gain a deeper understanding of data gathered.
Within AWS, this data can be stored, distributed and consumed by various different services, many of which can provide features ideal for Big Data analysis. Typically, these huge data sets often include sensitive information, such as customer details or financial information.
With this in mind, security surrounding this data is of utmost importance, and where sensitive information exists, encryption should be applied against the data.
This course firstly provides an explanation of data encryption and the differences between symmetric and asymmetric cryptography. This provides a good introduction before understanding how AWS implements different encryption mechanisms for many of the services that can be used for Big Data. These services include:
- Amazon S3
- Amazon Athena
- Amazon Elastic MapReduce (EMR)
- Amazon Relational Database Service (RDS)
- Amazon Kinesis Firehose
- Amazon Kinesis Streams
- Amazon Redshift
The course covers encryptions options for data when it is at both at-rest and in-transit and contains for the following lectures:
- Introduction: This lecture introduces the course objectives, topics covered and the instructor
- Overview of Encryption: This lecture explains data encryption and when and why you may need to implement data encryption
- Amazon S3 and Amazon Athena Encryption: This lecture dives into the different encryption mechanisms of S3, from both a server-side and client-side perspective. It also looks at how Amazon Athena can analyze data sets stored on S3 with encryption
- Elastic MapReduce (EMR) Encryption: This lecture focuses on the different methods of encryption when utilizing EMR in conjunction such as EBS and S3. It also looks at application-specific options with Hadoop, Presto, Tez, and Spark
- Relational Database Service (RDS) Encryption: This lecture looks at the encryption within RDS, focusing on its built-in encryption plus Oracle and SQL Server Transparent Data Encryption (TDE) encryption
- Amazon Kinesis Encryption: This lecture looks at both Kinesis Firehose and Kinesis Streams and analyses the encryption of both services.
- Amazon Redshift Encryption: This lecture explains the 4 tiered encryption structure when working with Redshift and KMS. It also explains how to encrypt when working with CloudHSM with Redshift.
- Summary: This lecture highlights the key points from the previous lectures
Resources mentioned throughout this course
Cloud Academy Courses:
- Amazon Web Services: Key Management Services (KMS)
- Working with Amazon Kinesis
- Getting started with AWS CloudHSM
- Configuring HDFS Transparent Encryption in Amazon EMR
- Using SSL to encrypt a connection a Database
- Oracle Native Network Encryption (NNE)
- Encrypt and decrypt Amazon Kinesis Records using AWS KMS
- Configuring Redshift to use CloudHSM
Hello and welcome to this lecture. I want to talk to you about available encryption options when using Redshift. Redshift is a fully managed service that can scale up to over a petabyte in size, which is used as a data warehouse for big data solutions. Using Redshift clusters, you are able to run analytics against your datasets using fast, SQL-based query tools and business intelligence applications to gather greater understanding of vision for your business.
So how does Redshift handle encryption of huge amounts of data?
Redshift offers encryption at rest using a four-tired hierarchy of encryption keys using either KMS or CloudHSM to manage the top tier of keys. When encryption is enabled for your cluster, it can't be disable and vice versa. When you have an unencrypted cluster, it can't be encrypted.
Encryption for your cluster can only happen during its creation, and once encrypted, the data, metadata, and any snapshots are also encrypted. The tiering level of encryption keys are as follows, tier one is the master key, tier two is the cluster encryption key, the CEK, tier three, the database encryption key, the DEK, and finally tier four, the data encryption keys themselves.
As mentioned previously, the master keys can either be managed by KMS or CloudHSM. Amazon Redshift integrates well with KMS but not CloudHSM, and so integration with a HSM device requires additional configuration or steps to implement such as adding certificates to establish a trusted connection between both resources, your HSM device and your Redshift cluster.
The encryption method differs slightly between the two options of KMS and CloudHSM. So let me break each of these down individually starting with KMS. During the creation of your cluster, you can either select the default KMS key for Redshift or select your own CMK, which gives you more flexibility over the control of the key, specifically from an auditable perspective.
The default KMS key for Redshift is automatically created by Redshift the first time the key option is selected and used, and it is fully managed by AWS. The CMK is known as the master key, tier one, and once selected, Redshift can enforce the encryption process as follows. So Redshift will send a request to KMS for a new KMS key.
This KMS key is then encrypted with the CMK master key, tier one. This encrypted KMS data key is then used as the cluster encryption key, the CEK, tier two. This CEK is then sent by KMS to Redshift where it is stored separately from the cluster. Redshift then sends this encrypted CEK to the cluster over a secure channel where it is stored in memory.
Redshift then requests KMS to decrypt the CEK, tier two. This decrypted CEK is then also stored in memory. Redshift then creates a random database encryption key, the DEK, tier three, and loads that into the memory of the cluster. The decrypted CEK in memory then encrypts the DEK, which is also stored in memory.
This encrypted DEK is then sent over a secure channel and stored in Redshift separately from the cluster. Both the CEK and the DEK are now stored in memory of the cluster both in an encrypted and decrypted form. The decrypted DEK is then used to encrypt data keys, tier four, that are randomly generated by Redshift for each data block in the database.
When performing encryption using CloudHSM, the process is different. If you are new to CloudHSM, then you may want to look at our existing course covering the service found here. When working with CloudHSM to perform your encryption, firstly you must set up a trusted connection between your HSM client and Redshift while using client and server certificates.
This connection is required to provide secure communications, allowing encryption keys to be sent between your HSM client and your Redshift clusters. Using a randomly generated private and public key pair, Redshift creates a public client certificate, which is encrypted and stored by Redshift. This must be downloaded and registered to your HSM client, and assigned to the correct HSM partition.
You must then configure Redshift with the following details of your HSM client: the HSM IP address, the HSM partition name, the HSM partition password, and the public HSM server certificate, which is encrypted by CloudHSM using an internal master key. Once this information has been provided, Redshift will confirm and verify that it can connect and access development partition.
For detailed instructions on how to configure Redshift encryption using CloudHSM, please see the following AWS documentation that will provided step-by-step details.
If your internal security policies or governance controls dictate that you must apply key rotation, then this is possible with Redshift enabling you to rotate encryption keys for encrypted clusters, however, you do need to be aware that during the key rotation process, it will make a cluster unavailable for a very short period of time, and so it's best to only rotate keys as and when you need to, or if you feel they may have been compromised.
During the rotation, Redshift will rotate the CEK for your cluster and for any backups of that cluster. It will rotate a DEK for the cluster but it's not possible to rotate a DEK for the snapshots stored in S3 that have been encrypted using the DEK. It will put the cluster into a state of 'rotating keys' until the process is completed when the status will return to 'available'.
To perform a key rotation of your cluster, it's very simple using the AWS Management Console. Select Amazon Redshift from within the management console, navigate to clusters, select the cluster you wish to rotate keys for, select the database, rotate encryption keys, and select yes, rotate keys, then your cluster will temporarily be unavailable whilst the key rotation process completes.
This now brings us to the end of this lecture on Amazon Redshift encryption. Coming up next, I shall be providing a summary of the key points throughout the previous lectures.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.