The use of Big Data is becoming commonplace within many organizations, which use Big Data solutions to perform large-scale data analysis and querying with business intelligence toolsets to gain a deeper understanding of the data they gather.
Within AWS, this data can be stored, distributed, and consumed by a variety of services, many of which provide features ideal for Big Data analysis. These huge data sets often include sensitive information, such as customer details or financial information.
With this in mind, security surrounding this data is of utmost importance, and where sensitive information exists, encryption should be applied to the data.
This course first provides an explanation of data encryption and the differences between symmetric and asymmetric cryptography. This serves as a good introduction before looking at how AWS implements different encryption mechanisms for many of the services that can be used for Big Data. These services include:
- Amazon S3
- Amazon Athena
- Amazon Elastic MapReduce (EMR)
- Amazon Relational Database Service (RDS)
- Amazon Kinesis Firehose
- Amazon Kinesis Streams
- Amazon Redshift
The course covers encryption options for data both at rest and in transit, and contains the following lectures:
- Introduction: This lecture introduces the course objectives, topics covered and the instructor
- Overview of Encryption: This lecture explains data encryption and when and why you may need to implement data encryption
- Amazon S3 and Amazon Athena Encryption: This lecture dives into the different encryption mechanisms of S3, from both a server-side and client-side perspective. It also looks at how Amazon Athena can analyze data sets stored on S3 with encryption
- Elastic MapReduce (EMR) Encryption: This lecture focuses on the different methods of encryption when utilizing EMR in conjunction with other services such as EBS and S3. It also looks at application-specific options with Hadoop, Presto, Tez, and Spark
- Relational Database Service (RDS) Encryption: This lecture looks at the encryption within RDS, focusing on its built-in encryption plus Oracle and SQL Server Transparent Data Encryption (TDE) encryption
- Amazon Kinesis Encryption: This lecture looks at both Kinesis Firehose and Kinesis Streams and analyzes the encryption options of both services.
- Amazon Redshift Encryption: This lecture explains the four-tiered encryption structure when working with Redshift and KMS. It also explains how encryption works when using CloudHSM with Redshift.
- Summary: This lecture highlights the key points from the previous lectures
Resources mentioned throughout this course
Cloud Academy Courses:
- Amazon Web Services: Key Management Services (KMS)
- Working with Amazon Kinesis
- Getting started with AWS CloudHSM
- Configuring HDFS Transparent Encryption in Amazon EMR
- Using SSL to encrypt a connection to a database
- Oracle Native Network Encryption (NNE)
- Encrypt and decrypt Amazon Kinesis Records using AWS KMS
- Configuring Redshift to use CloudHSM
Hello and welcome to this lecture where I'm going to cover the different encryption mechanisms available for data being stored on S3 and when queried using Amazon Athena. S3 can be used to store large-scale data sets, which can be analyzed with the help of Amazon Athena, an interactive query service that uses standard SQL.
Let me start by discussing the different encryption options available with S3. So, Amazon S3 offers both server-side encryption, SSE, and client-side encryption, CSE. I want to begin by looking at the different options available when using server-side encryption. Server-side encryption known as SSE is used for securing data at rest.
When SSE is applied, the data is encrypted at an object level before it is written to the physical disks that constitute S3. If an object is encrypted with SSE, how you access the object remains the same, as long as you have the relevant permissions to access it. There are three different forms of SSE with S3, each providing a different method of encryption key management.
Firstly, SSE with Amazon S3-Managed Keys, known as SSE-S3; secondly, SSE with AWS KMS-Managed Keys, known as SSE-KMS; and thirdly, SSE with Customer-Provided Keys, known as SSE-C. Let's take a look at each of these in more detail, starting with SSE-S3. With SSE-S3 encryption, Amazon S3 uses a unique key to encrypt each data object, and then this key itself is encrypted with a master key, providing a multifactor encryption mechanism.
The complete encryption and decryption cycle of the object is all managed by AWS and can be set by selecting the Amazon S3 master key encryption option when uploading an object in the management console or by using the AWS CLI. The objects are encrypted using one of the strongest algorithms, AES-256. As we know from the previous lecture, AES is a symmetric encryption algorithm.
In this case, symmetric cryptography works well, as AWS manages the encryption and decryption of the object when it is accessed by a user or service; the key therefore never needs to be sent to anyone else, which would risk key exposure. To help enforce an encryption requirement on S3, SSE can work in conjunction with S3 bucket policies.
So, you could enforce conditions within a bucket policy to deny any object that is not uploaded with server-side encryption within the header during a PutObject request. For example, if you are an administrator of an S3 bucket which was being used as your storage layer, you could add the following bucket policy to ensure that only objects that had the SSE-S3 encryption specified were allowed to be uploaded.
This would deny any object that did not have the SSE-S3 server-side encryption enabled. If you are not using the management console to instigate the SSE encryption and instead you were using the AWS CLI, the user would have to add the server-side encryption AES256 parameter to enforce the encryption.
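As a sketch of the bucket policy just described — the bucket name is a placeholder — the deny condition might look like the following, built here in Python for readability:

```python
import json

# Hypothetical bucket name, used for illustration only.
BUCKET = "my-data-lake-bucket"

# A bucket policy that denies any PutObject request which does not carry
# the x-amz-server-side-encryption header set to AES256 (i.e. SSE-S3).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

From the AWS CLI, an upload that satisfies this policy would include the matching parameter, for example `aws s3 cp data.csv s3://my-data-lake-bucket/ --sse AES256`.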
Let me now move on to SSE-KMS to show you how this method of encryption works. As you may have guessed, SSE-KMS uses the Key Management Service to help with key management during encryption. If you are unfamiliar with KMS, then I recommend that you take our existing KMS course to help you understand how the service works and the different components within it.
If you select to use SSE-KMS for your object encryption, you have the opportunity to either select the default AWS S3 customer master key, CMK, which is managed by AWS or to set one of your existing customer-managed CMKs. This key will be used to encrypt the data keys generated by KMS which are then used to encrypt your object data on S3.
So, to be clear, the KMS CMK is used to encrypt the data keys, not the actual object itself. With SSE-S3, a multifactor encryption process was used by first encrypting the object data with a data key and then encrypting this data key with a master key. In the case of SSE-KMS, the selected KMS CMK takes on the role of that master key.
When you upload an object to S3 using SSE-KMS, a request is made by S3 to KMS which returns two versions of a randomly generated data encryption key. One version of this data key is plain text which S3 stores in memory and uses to perform the encryption of the object at which point it is removed from memory.
The second version is an encrypted version of the data key and is uploaded with the object. When S3 needs to decrypt the data, S3 sends AWS-KMS the encrypted data key associated with the object and KMS uses the CMK associated to decrypt the data key and responds with a plain text version of the key allowing you to decrypt the object.
Again, this key is stored in memory and will be deleted as soon as decryption has happened. If you don't have a CMK configured, but you still want to use SSE-KMS, then S3 will automatically create the default AWS S3 CMK for you the first time you upload an object with this encryption type. It will then use the same CMK for all other uploads unless a customer-managed CMK is specified within that same region.
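The envelope-encryption cycle just described can be sketched as follows. This is a conceptual toy, not a real implementation: a simple XOR function stands in for AES, and a local variable stands in for the CMK, which in reality never leaves KMS.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    # Toy XOR cipher standing in for AES -- illustration only, not secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# The CMK is held inside KMS; here it is just a local stand-in.
cmk = os.urandom(32)

# KMS returns two versions of a randomly generated data key:
data_key_plain = os.urandom(32)                # held in memory, then discarded
data_key_encrypted = xor(data_key_plain, cmk)  # stored alongside the object

# S3 encrypts the object with the plaintext data key...
obj = b"sensitive big data record"
stored_object = xor(obj, data_key_plain)
# ...and keeps only the encrypted data key with the object.

# Decryption: the encrypted data key goes back to "KMS", which uses the
# CMK to recover the plaintext data key...
recovered_key = xor(data_key_encrypted, cmk)
# ...which then decrypts the object.
assert xor(stored_object, recovered_key) == obj
```

The design point this illustrates is that losing the stored data alone reveals nothing: the object and its data key are both encrypted, and only the CMK inside KMS can unwrap the data key.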
However, this AWS-managed CMK does not provide the same level of management that a customer-managed CMK does. Using your own customer-managed CMK gives you far greater flexibility in how your key is managed. For example, you are able to disable, rotate, and apply access controls to the CMK, and audit its usage using AWS CloudTrail.
As with SSE-S3, SSE-KMS also supports bucket policies. However, a different value needs to be configured for the server-side encryption parameter to indicate SSE-KMS encryption: aws:kms rather than AES256. The last option of SSE encryption is SSE-C, so let me now take a look at this option. SSE-C is server-side encryption with customer-provided keys.
Here, the encryption key is provided to S3 without using KMS, and the S3 service itself performs the encryption; all you need to do is supply the key. When you upload your data object, you must send your customer-provided key with the request. On this point, it's worth mentioning that SSE-C only works with requests using HTTPS.
S3 will reject any request sent using HTTP. This helps to secure the data in transit, and specifically the customer-provided key. Once the encryption has taken place, which uses AES-256, AWS deletes the key from memory and instead stores a randomly salted hash-based message authentication code, an HMAC value, which is used for data integrity and to authenticate the key in future requests.
When requesting to access the object from S3, you must again supply the same customer-provided key to decrypt the object. Remember, AES is symmetric, meaning you need the same key to decrypt as you used to encrypt. There are a number of request headers that you must use with SSE-C, including:
- x-amz-server-side-encryption-customer-algorithm
- x-amz-server-side-encryption-customer-key
- x-amz-server-side-encryption-customer-key-MD5
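The salted-HMAC validation idea can be sketched with Python's standard library. This shows only the general technique — storing a keyed hash rather than the key itself — not S3's actual internal scheme.

```python
import hashlib
import hmac
import os

# The SSE-C key the customer supplies with each request.
customer_key = os.urandom(32)

# The service keeps only a random salt and the HMAC of the key,
# then deletes the key itself from memory.
salt = os.urandom(16)
stored_hmac = hmac.new(salt, customer_key, hashlib.sha256).digest()

def validates(candidate_key: bytes) -> bool:
    # Recompute the HMAC for the supplied key and compare in constant time.
    candidate = hmac.new(salt, candidate_key, hashlib.sha256).digest()
    return hmac.compare_digest(candidate, stored_hmac)

assert validates(customer_key)        # the original key validates
assert not validates(os.urandom(32))  # any other key is rejected
```

Because only the salt and HMAC are retained, a later request can be authenticated against the stored value without the key ever being persisted.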
Now that I have covered server-side encryption, let me talk to you about the client-side encryption options within S3. This differs from server-side encryption in that data is encrypted before it is sent to S3 for storage. S3 does not perform any encryption itself when client-side encryption is used as the encryption mechanism.
There are two options that you can use, client-side encryption with KMS, CSE-KMS, and client-side encryption using a custom client-side master key, CSE-C. When using CSE-KMS with a CMK, you only need to supply the CMK-ID to the Amazon S3 encryption client. For example, the Amazon S3 encryption client in the AWS SDK for Java and the encryption is managed for you by AWS.
When you upload an object to S3 using this method, supplying the CMK-ID, a request is made by the client to KMS which returns two versions of a randomly generated data encryption key. One version of this data key is plain text which the client uses to perform the encryption of the object before it's uploaded.
The second version is an encrypted cipher blob of the data key and is uploaded with the object by the client as object metadata. When you retrieve the object from S3, the object is downloaded in an encrypted form along with the cipher blob of the data encryption key. Once downloaded, the client sends the cipher blob to KMS to retrieve the matching plain text version of the data key to enable the decryption of the object.
It's worth mentioning that a different data encryption key is used for each object that is uploaded. When using CSE-C with a client-side master key, your master key is never sent to AWS, unlike with CSE-KMS where the master key is held within KMS, so if you lose your client-side master key, then you will lose access to your data. When performing an upload of an object, again, you need to provide the key to the client, for example, the Amazon S3 encryption client when using the AWS SDK for Java, but this time it will be your client-side master key. As this is the master key, it is only used to encrypt a randomly generated symmetric data encryption key created by the client. This data key is then used to encrypt the object data.
Once the object is encrypted, the client-side master key is used to encrypt the data key. The encrypted data key is then made a part of the metadata of the object before both the encrypted object and the data key are uploaded to S3. When the encrypted object is retrieved, it is downloaded in an encrypted format along with its metadata including the encrypted data key.
The correct client-side master key is then identified to decrypt the data key, which in turn decrypts the object. So, we've now looked at the different encryption options available when storing and sending data to S3. Let me now talk a little about how Amazon Athena handles encryption and encrypted S3 objects when performing queries against the data.
Amazon Athena is a serverless interactive query service which uses standard SQL and automatically executes queries in parallel, making it extremely fast. Amazon Athena supports the ability to query S3 data that is already encrypted and, if configured to do so, Athena can also encrypt the results of the query, which can then be stored in S3.
This encryption of results is independent of the underlying queried S3 data, meaning that even if the S3 data is not encrypted, the queried results can be encrypted. A couple of points to be aware of is that Amazon Athena only supports data that has been encrypted with the following S3 encryption methods, SSE-S3, SSE-KMS, and CSE-KMS.
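As a sketch of how result encryption is configured, the following shows the shape of the result-configuration parameters you would pass to Athena's StartQueryExecution API (for example via an SDK such as boto3). The bucket name and key ARN are placeholders.

```python
# Result configuration for an Athena query, requesting that the query
# output written to S3 be encrypted with SSE-KMS. Placeholder values.
result_configuration = {
    "OutputLocation": "s3://my-athena-results-bucket/queries/",
    "EncryptionConfiguration": {
        # Valid encryption options for results: SSE_S3, SSE_KMS, CSE_KMS.
        "EncryptionOption": "SSE_KMS",
        # KmsKey is only needed for the SSE_KMS and CSE_KMS options.
        "KmsKey": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    },
}

print(result_configuration["EncryptionConfiguration"]["EncryptionOption"])
```

Note that this configuration applies only to the query results, independently of how (or whether) the underlying S3 data set is encrypted.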
SSE-C and CSE-C are not supported. In addition to this, it's important to understand that Amazon Athena will only run queries against encrypted objects that are in the same region as the query itself. If you need to query S3 data that's been encrypted using KMS, then specific permissions are required by the Athena user to enable them to perform the query.
You could simply add the users to the key policy of the CMK to provide the relevant access. However, if you wanted to restrict access to just the specific actions required for Athena to work, then you could grant access to the following actions: kms:Decrypt, which is required for working with encrypted data sets and queries, and kms:GenerateDataKey, which is required for working with encrypted queries only.
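A minimal IAM policy granting an Athena user just those two KMS actions might look like the sketch below; the key ARN is a placeholder.

```python
import json

# A minimal IAM policy sketch for an Athena user querying KMS-encrypted
# S3 data. The key ARN below is a placeholder, not a real key.
athena_kms_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",          # encrypted data sets and queries
                "kms:GenerateDataKey",  # encrypted query results only
            ],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
        }
    ],
}

print(json.dumps(athena_kms_policy, indent=2))
```

Scoping the Resource to the specific CMK, rather than "*", keeps the grant aligned with least-privilege practice.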
That now brings us to the end of this lecture of S3 and Athena encryption. Coming up next, I will be discussing encryption when using Elastic MapReduce, EMR.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.