Big Data is becoming commonplace. Many organizations now use Big Data solutions to run large-scale analytical queries with business intelligence toolsets and gain a deeper understanding of the data they gather.

Within AWS, this data can be stored, distributed, and consumed by a range of services, many of which provide features well suited to Big Data analysis. These huge data sets often include sensitive information, such as customer details or financial information.

With this in mind, security surrounding this data is of utmost importance, and where sensitive information exists, encryption should be applied against the data.

This course first explains data encryption and the differences between symmetric and asymmetric cryptography. This is a useful introduction before looking at how AWS implements different encryption mechanisms for many of the services that can be used for Big Data. These services include:

  • Amazon S3
  • Amazon Athena
  • Amazon Elastic MapReduce (EMR)
  • Amazon Relational Database Service (RDS)
  • Amazon Kinesis Firehose
  • Amazon Kinesis Streams
  • Amazon Redshift

The course covers encryption options for data both at rest and in transit, and contains the following lectures:

  • Introduction: This lecture introduces the course objectives, topics covered and the instructor
  • Overview of Encryption: This lecture explains data encryption and when and why you may need to implement data encryption
  • Amazon S3 and Amazon Athena Encryption: This lecture dives into the different encryption mechanisms of S3, from both a server-side and client-side perspective. It also looks at how Amazon Athena can analyze data sets stored on S3 with encryption
  • Elastic MapReduce (EMR) Encryption: This lecture focuses on the different methods of encryption when utilizing EMR in conjunction with storage services such as EBS and S3. It also looks at application-specific options with Hadoop, Presto, Tez, and Spark
  • Relational Database Service (RDS) Encryption: This lecture looks at the encryption within RDS, focusing on its built-in encryption plus Oracle and SQL Server Transparent Data Encryption (TDE)
  • Amazon Kinesis Encryption: This lecture looks at both Kinesis Firehose and Kinesis Streams and analyses the encryption of both services
  • Amazon Redshift Encryption: This lecture explains the four-tiered encryption structure when working with Redshift and KMS. It also explains how to encrypt when working with CloudHSM with Redshift
  • Summary: This lecture highlights the key points from the previous lectures

Hello, and welcome to this final lecture, where I shall be highlighting some key points from each lecture that you have covered throughout the course.

I started off by providing an overview of encryption in general, where we learned that unencrypted data is known as cleartext or plaintext. Data encryption is the mechanism by which information is altered, rendering the plaintext data unreadable through the use of mathematical algorithms and keys.

A key is simply a string of characters used with the encryption algorithm, and the longer the key, the more robust the encryption. Key cryptography is either symmetric or asymmetric. Symmetric cryptography uses a single key to perform both the encryption and the decryption. Common symmetric algorithms are AES, DES, Triple-DES, and Blowfish.

Asymmetric cryptography uses different keys, one to perform encryption and one to perform decryption. In this process, one key is public, and one key is private. Common asymmetric algorithms are RSA, Diffie-Hellman, and Digital Signature Algorithm.
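The difference between the two key models can be sketched in a few lines of Python. This is a toy illustration only: the XOR cipher stands in for a real symmetric algorithm such as AES, and the RSA example uses tiny textbook primes. The function names are made up for this sketch, and none of it is suitable for protecting real data.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Symmetric: the SAME key is used to encrypt and to decrypt.
shared_key = secrets.token_bytes(16)
ciphertext = xor_bytes(b"customer record", shared_key)
assert xor_bytes(ciphertext, shared_key) == b"customer record"

# Asymmetric: textbook RSA with tiny primes (illustration only).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent: (e, n) is the public key
d = pow(e, -1, phi)          # private exponent: (d, n) is the private key


def rsa_encrypt(m: int) -> int:
    """Encrypt an integer smaller than n with the public key."""
    return pow(m, e, n)


def rsa_decrypt(c: int) -> int:
    """Decrypt with the private key."""
    return pow(c, d, n)


message = 65
assert rsa_decrypt(rsa_encrypt(message)) == message
```

Anyone holding the public key can encrypt, but only the holder of the private key can decrypt. The modular inverse form of `pow` requires Python 3.8 or later.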

I then started to explain the different encryption mechanisms that are used across a range of services, specifically ones that can be used for big data, starting with S3.

We learned that S3 offers server-side encryption, SSE, and client-side encryption, CSE. SSE offers three different options: SSE with Amazon S3-Managed Keys, which is SSE-S3, SSE with AWS KMS-Managed Keys, which is SSE-KMS, and SSE with Customer-provided Keys, SSE-C. Client-side encryption offers two different options: client-side encryption with KMS, CSE-KMS, and client-side encryption using a custom client-side master key, CSE-C.

SSE-S3 is managed by AWS, uses AES-256 symmetric encryption, and supports bucket policies.

SSE-KMS uses the Key Management Service to help with key management, and it allows you to use your own key, the customer master key or CMK, through KMS, giving more control and flexibility. The CMK encrypts the data key, not the data itself, and this option also supports bucket policies.

SSE-C uses customer-provided keys with AES-256. During an upload of an object, you must also send the key with it. It only works over HTTPS, which secures the data in transit, specifically the key.
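The three server-side options map onto different request parameters when uploading an object. Below is a minimal sketch that only builds the keyword arguments one might pass to boto3's `put_object`; it does not call AWS. The helper name `sse_put_object_args` and the bucket, object, and key values are hypothetical.

```python
def sse_put_object_args(bucket, key, body, mode, kms_key_id=None, customer_key=None):
    """Build put_object keyword arguments for one of the three SSE modes."""
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if mode == "SSE-S3":
        # S3 manages the keys; AES-256 is requested explicitly.
        args["ServerSideEncryption"] = "AES256"
    elif mode == "SSE-KMS":
        # KMS manages the keys; a specific CMK is optional.
        args["ServerSideEncryption"] = "aws:kms"
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id
    elif mode == "SSE-C":
        # The customer supplies a 256-bit key with the request itself.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        args["SSECustomerAlgorithm"] = "AES256"
        args["SSECustomerKey"] = customer_key
    else:
        raise ValueError(f"unknown mode: {mode}")
    return args


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_object(
#       **sse_put_object_args("example-bucket", "data.csv", b"...", "SSE-S3"))
```

With SSE-C the same key has to be supplied again on every later read of the object, because S3 does not store it.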

Client-side encryption using KMS uses an S3 encryption client, such as the Amazon S3 Encryption Client in the AWS SDK for Java, and you must supply the CMK ID from KMS. Encryption happens prior to the upload of an object, and decryption happens after its download. CSE-C uses a custom master key, which is never sent to AWS. It also uses an S3 encryption client, like CSE-KMS, and again, encryption happens prior to upload and decryption after download. We then looked at Athena.

Athena supports the ability to query encrypted data on S3. It can also encrypt query results, even if the data queried was not encrypted. SSE-C and CSE-C are not currently supported by Athena, but Athena does support SSE-KMS, SSE-S3, and CSE-KMS. Athena will only query objects in the same region in which Athena is running.
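Encryption of query results is requested per query through a result configuration. Below is a minimal sketch that builds the `ResultConfiguration` dictionary one might pass to boto3's `start_query_execution`, restricted to the three options listed above. The helper name and the S3 location are hypothetical.

```python
SUPPORTED_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def athena_result_configuration(output_location, option, kms_key=None):
    """Build a ResultConfiguration that encrypts Athena query results."""
    if option not in SUPPORTED_OPTIONS:
        raise ValueError(f"Athena does not support {option}")
    encryption = {"EncryptionOption": option}
    if option != "SSE_S3":
        # Both KMS-based options need to know which key to use.
        if not kms_key:
            raise ValueError(f"{option} requires a KMS key")
        encryption["KmsKey"] = kms_key
    return {
        "OutputLocation": output_location,
        "EncryptionConfiguration": encryption,
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("athena").start_query_execution(
#       QueryString="SELECT 1",
#       ResultConfiguration=athena_result_configuration(
#           "s3://example-results-bucket/athena/", "SSE_S3"))
```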

The kms:Decrypt and kms:GenerateDataKey permissions are required to allow Athena to query encrypted data on S3 using KMS.

Following this lecture, I then looked at Elastic MapReduce encryption. The key points from this lecture were that, by default, EMR does not implement encryption at rest.

From EMR version 4.8.0 onwards, you are able to create a security configuration specifying different settings on how to manage encryption for your data. The security configuration allows you to configure encryption at rest, in transit, or both together. The security configuration exists separately from your EC2 clusters.

From EMR version 5.7.0, you can specify your own custom AMI, allowing you to encrypt the EBS root device volume of your instances.

Using EBS as your persistent storage layer, you can implement Linux Unified Key Setup (LUKS) with KMS and open-source HDFS encryption. Open-source HDFS encryption provides two Hadoop encryption options: Secure Hadoop RPC and data encryption of HDFS block transfer.

When using S3, EMR supports the use of SSE-S3 or SSE-KMS to perform server-side encryption. You could also encrypt your data on the client side before storing it on S3, using CSE-KMS or CSE-C.

For EMR in-transit encryption, you can enable open-source TLS encryption features. When a TLS certificate provider has been configured, the following application-specific encryption features can be enabled. With Hadoop: Hadoop MapReduce Encrypted Shuffle, Secure Hadoop RPC, and data encryption of HDFS block transfer.

With Presto, when using EMR version 5.6.0 and later, any internal communication between Presto nodes will use SSL/TLS. With Tez, the Tez Shuffle Handler uses TLS. And with Spark, the Akka protocol uses TLS, the block transfer service uses SASL and Triple-DES, and the external shuffle service uses SASL.

You can also implement transparent encryption in HDFS.
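An EMR security configuration is a JSON document that switches on at-rest and in-transit encryption. The sketch below builds one that combines SSE-KMS for S3, KMS-backed local disk encryption, and a PEM certificate bundle for TLS. The key names follow the EMR security configuration format as best understood here and should be checked against current AWS documentation; the helper name, key ARN, and S3 URI are hypothetical.

```python
import json


def emr_security_configuration(kms_key_arn: str, tls_certs_s3_uri: str) -> str:
    """Return a JSON security configuration enabling both encryption types."""
    config = {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data that EMR stores on S3.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Local volumes attached to the cluster instances.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # Zipped PEM certificates used by the TLS features.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_certs_s3_uri,
                }
            },
        }
    }
    return json.dumps(config, indent=2)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").create_security_configuration(
#       Name="example-security-config",
#       SecurityConfiguration=emr_security_configuration(
#           "arn:aws:kms:us-east-1:111122223333:key/example",
#           "s3://example-bucket/certs/my-certs.zip"))
```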

Following EMR, I looked at encryption for the RDS service, and here we learned that you can configure encryption at rest when setting up a database by selecting the checkbox for "enable encryption". This encryption can only be implemented during database creation. Read replicas will have the same level of encryption as the master database.

You can also implement application-level encryption using Oracle and SQL Server Transparent Data Encryption, TDE, MySQL cryptographic functions, and Microsoft Transact-SQL cryptographic functions. To use TDE, the database must be a part of an option group with the TDE option added to the group.

TDE can use two different encryption modes, TDE tablespace encryption and TDE column encryption. RDS in-transit encryption can be enabled by using SSL between your application and the RDS database. To use Oracle's native network encryption, NNE, you must add native network encryption to the database option group.
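Both RDS points can be sketched as API parameters. The first helper builds keyword arguments for boto3's `create_db_instance` with storage encryption enabled at creation time; the second builds arguments for `modify_option_group` to add the Oracle `TDE` option. Nothing here calls AWS, and the helper names, identifiers, engine, instance class, and user name are hypothetical placeholders.

```python
def encrypted_db_instance_args(identifier, kms_key_id, master_password):
    """Arguments for create_db_instance with encryption at rest enabled."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.m5.large",
        "AllocatedStorage": 20,
        "MasterUsername": "admin",
        "MasterUserPassword": master_password,
        # Encryption at rest has to be chosen when the database is created.
        "StorageEncrypted": True,
        "KmsKeyId": kms_key_id,
    }


def tde_option_group_args(option_group_name):
    """Arguments for modify_option_group that add the Oracle TDE option."""
    return {
        "OptionGroupName": option_group_name,
        "OptionsToInclude": [{"OptionName": "TDE"}],
        "ApplyImmediately": True,
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   rds.create_db_instance(**encrypted_db_instance_args(
#       "example-db", "alias/example-key", "change-me-please"))
#   rds.modify_option_group(**tde_option_group_args("example-oracle-options"))
```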

Moving on from databases, the next topic was encryption mechanisms when using the Amazon Kinesis platform for both Kinesis Firehose and Kinesis Streams. Within this lecture, we learned that data being sent to Kinesis can be sent using SSL for in-transit encryption. By default, once the data enters Kinesis, it is decrypted.

Kinesis Firehose can use SSE-KMS when sending data to S3. Kinesis Firehose must have the kms:Decrypt and kms:GenerateDataKey permissions for the CMK when using this method. If data is being sent to Amazon Redshift by Kinesis Firehose, S3 will still be used as an intermediary storage location, and so the same encryption can be applied.
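The two KMS permissions can be expressed as an IAM policy statement scoped to a single CMK. Below is a minimal sketch that builds such a policy document as a Python dictionary; the helper name and key ARN are hypothetical, and a real role would usually need further statements, for example for the S3 bucket itself.

```python
import json


def kms_usage_policy(cmk_arn: str) -> dict:
    """IAM policy allowing decryption and data key generation with one CMK."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": cmk_arn,
            }
        ],
    }


policy = kms_usage_policy("arn:aws:kms:us-east-1:111122223333:key/example")
policy_json = json.dumps(policy, indent=2)  # ready to attach to a role
```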

Since July 2017, Kinesis Streams has been able to apply server-side encryption to data coming into the stream using KMS. Producers and consumers need to have the relevant permissions to the KMS CMK. Kinesis Streams will request a new data encryption key approximately every five minutes. A performance hit of approximately 100 microseconds is added for producers and consumers while the encryption and decryption take place.
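Server-side encryption is switched on per stream. As a small sketch, the helper below builds the keyword arguments one might pass to boto3's `start_stream_encryption`. The helper name and stream name are hypothetical, and the default key shown is the AWS managed key alias for Kinesis; a customer-managed CMK ARN or alias could be supplied instead.

```python
def stream_encryption_args(stream_name, key_id="alias/aws/kinesis"):
    """Arguments for start_stream_encryption using a KMS key."""
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",
        "KeyId": key_id,
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("kinesis").start_stream_encryption(
#       **stream_encryption_args("example-stream"))
```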

We then finished up by looking at encryption options within Amazon Redshift. Amazon Redshift uses a four-tiered structure of encryption keys. Tier one is the master key, tier two, the cluster encryption key, CEK, tier three, the database encryption key, DEK, and then tier four, the data encryption keys.
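The four tiers form a chain in which each key protects the one below it, with the tier four keys protecting the data blocks. The toy Python sketch below illustrates that chain only. XOR is used as a stand-in for real encryption, the variable and function names are hypothetical, and this is not how Redshift is implemented internally.

```python
import secrets
from itertools import cycle


def xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR with a repeating key (encrypts and decrypts)."""
    return bytes(a ^ b for a, b in zip(data, cycle(key)))


master_key = secrets.token_bytes(32)   # tier 1: master key
cek = secrets.token_bytes(32)          # tier 2: cluster encryption key
dek = secrets.token_bytes(32)          # tier 3: database encryption key
block_key = secrets.token_bytes(32)    # tier 4: a data encryption key

# Each key is kept only in a form encrypted by the tier above it.
wrapped_cek = xor(cek, master_key)
wrapped_dek = xor(dek, cek)
wrapped_block_key = xor(block_key, dek)
encrypted_block = xor(b"example data block", block_key)


def read_block(master: bytes) -> bytes:
    """Unwrap the chain from the top down, then decrypt the data block."""
    unwrapped_cek = xor(wrapped_cek, master)
    unwrapped_dek = xor(wrapped_dek, unwrapped_cek)
    unwrapped_block_key = xor(wrapped_block_key, unwrapped_dek)
    return xor(encrypted_block, unwrapped_block_key)


assert read_block(master_key) == b"example data block"
```

Without the master key, none of the lower tiers can be unwrapped, which is why the master key is the one held in KMS or an HSM.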

The master key can be generated by either KMS or CloudHSM. Integration exists between KMS and Amazon Redshift, but not between CloudHSM and Redshift. When using CloudHSM, a trust must be established between your HSM and Amazon Redshift to send secure encryption keys between the two resources. For this to take place, you must download a certificate from Redshift to your HSM device, and then configure Redshift with the following details of your HSM: the HSM IP address, HSM partition name, the HSM partition password, and the public HSM service certificate.
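The four HSM details listed above correspond to parameters of the Redshift HSM configuration API. The sketch below builds the keyword arguments one might pass to boto3's `create_hsm_configuration` without calling AWS. The helper name and all values are hypothetical placeholders.

```python
def hsm_configuration_args(identifier, ip_address, partition_name,
                           partition_password, server_public_certificate):
    """Arguments for create_hsm_configuration describing the HSM to Redshift."""
    return {
        "HsmConfigurationIdentifier": identifier,
        "Description": "HSM used for Redshift cluster encryption keys",
        "HsmIpAddress": ip_address,
        "HsmPartitionName": partition_name,
        "HsmPartitionPassword": partition_password,
        "HsmServerPublicCertificate": server_public_certificate,
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("redshift").create_hsm_configuration(
#       **hsm_configuration_args(
#           "example-hsm-config", "10.0.0.10", "example-partition",
#           "example-password", "-----BEGIN CERTIFICATE-----..."))
```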

You can perform a key rotation using the AWS Management Console for both your CEK and DEK.

That has now brought me to the end of this lecture and to the end of this course. I hope it has given you a good understanding of encryption itself, including symmetric and asymmetric cryptography, and introduced you to some of the encryption mechanisms offered by services that are commonly used for big data solutions: Amazon S3, Athena, Elastic MapReduce, the RDS service, Kinesis Firehose, Kinesis Streams, and Redshift.

You should now be able to apply additional security controls, including encryption, to your existing infrastructure to help secure your data.

If you have any feedback on the course, positive or negative, please do leave a comment on the course landing page. We do look at the comments, and your feedback is greatly appreciated.

Thank you for your time, and good luck with your continued learning of cloud computing.

Thank you.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.