Domain 3: Specify Secure Applications and Architectures
Hello, and welcome to this final lecture, where I shall be highlighting some key points from each lecture that you have covered throughout the course.
I started off by providing an overview of encryption in general, where we learned that unencrypted data is known as cleartext or plaintext. Data encryption is the mechanism by which information is altered, rendering the plaintext data unreadable through the use of mathematical algorithms and keys.
A key is simply a string of characters used with the encryption algorithm, and the longer the key, the more robust the encryption. Key cryptography is either symmetric or asymmetric. Symmetric cryptography uses a single key to perform both the encryption and the decryption. Common symmetric algorithms are AES, DES, Triple-DES, and Blowfish.
Asymmetric cryptography uses different keys, one to perform encryption and one to perform decryption. In this process, one key is public, and one key is private. Common asymmetric algorithms are RSA, Diffie-Hellman, and Digital Signature Algorithm.
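To make the public and private key split concrete, here is a minimal sketch of textbook RSA using deliberately tiny primes. The numbers and the function names are illustrative assumptions only. Real RSA keys are thousands of bits long and use padding, so this is not something to use in practice.

```python
# Toy RSA with tiny primes: for illustration only, never use keys this small.
p, q = 61, 53                  # two secret primes (hypothetical values)
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent: (e, n) is the public key
d = pow(e, -1, phi)            # private exponent: (d, n) is the private key (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer message with the public key."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt an integer ciphertext with the private key."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
print(ciphertext, recovered)
```

Anyone holding the public key can produce the ciphertext, but only the holder of the private exponent can turn it back into the original message.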
I then explained the different encryption mechanisms that are used across a range of services, specifically ones that can be used for big data, starting with S3.
We learned that S3 offers server-side encryption, SSE, and client-side encryption, CSE. SSE offers three different options: SSE with Amazon S3-Managed Keys, which is SSE-S3, SSE with AWS KMS-Managed Keys, which is SSE-KMS, and SSE with Customer-provided Keys, SSE-C. Client-side encryption offers two different options: client-side encryption with KMS, CSE-KMS, and client-side encryption using a custom client-side master key, CSE-C.
SSE-S3 is managed by AWS, uses AES-256 symmetric encryption, and supports bucket policies.
SSE-KMS uses the key management service to help with key management, and it allows you to use your own keys, the CMK, through KMS, giving more control and flexibility. The CMK encrypts the data key, not the data itself, and this also supports bucket policies.
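The point that the CMK encrypts the data key, and not the data itself, is often called envelope encryption. The following is a rough sketch of that idea using only the Python standard library. The `keystream_xor` function is a hypothetical toy cipher standing in for AES, and `master_key` stands in for a CMK that in reality never leaves KMS. None of this is how KMS is implemented.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR data with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


master_key = secrets.token_bytes(32)   # stand-in for the CMK
data_key = secrets.token_bytes(32)     # per-object data key

plaintext = b"sensor-reading,42"
encrypted_object = keystream_xor(data_key, plaintext)      # the data key encrypts the data
encrypted_data_key = keystream_xor(master_key, data_key)   # the CMK encrypts only the data key

# Reading the object back: unwrap the data key first, then decrypt the data.
unwrapped_key = keystream_xor(master_key, encrypted_data_key)
decrypted = keystream_xor(unwrapped_key, encrypted_object)
print(decrypted)
```

Only the encrypted data key is stored alongside the object, so access to the data ultimately depends on access to the master key.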
SSE-C uses customer-provided keys using AES-256. During an upload of an object, you must also send the key with it. And it only works with HTTPS to secure data in transit, specifically the key.
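As a quick illustration of how the three server-side options differ at upload time, the sketch below builds the extra keyword arguments you would pass to the boto3 S3 client's `put_object` call. The helper name `sse_put_params` and its arguments are my own assumptions for this example. The parameter names in the returned dictionaries are the ones boto3 uses, but it is worth checking them against the current boto3 documentation.

```python
def sse_put_params(mode: str, kms_key_id: str = "", customer_key: bytes = b"") -> dict:
    """Return extra put_object keyword arguments for a server-side encryption mode.

    Hypothetical helper: it builds the arguments only and does not call AWS.
    """
    if mode == "SSE-S3":
        # S3 manages the keys and encrypts with AES-256.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # S3 asks KMS for a data key protected by the given CMK.
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    if mode == "SSE-C":
        # The caller sends its own 256-bit key with every request (HTTPS only).
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown SSE mode: {mode}")


params = sse_put_params("SSE-KMS", kms_key_id="alias/my-example-key")
print(params)
# With boto3 installed and credentials configured, usage would look like:
#   s3.put_object(Bucket="my-example-bucket", Key="data.csv", Body=b"...", **params)
```

The bucket name, object key, and key alias shown above are placeholders.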
Client-side encryption using KMS uses an S3 encryption client, such as the Amazon S3 Encryption Client in the AWS SDK for Java, and you must supply the CMK ID from KMS. The encryption happens prior to upload, and the decryption after download, of an object. CSE-C uses a custom master key, which is never sent to AWS. It also uses an S3 encryption client, like CSE-KMS, and again, encryption happens prior to upload and decryption after download of an object. We then looked at Athena.
And Athena supports the ability to query encrypted data on S3. It can also encrypt queried results, even if the data queried was not encrypted. SSE-C and CSE-C are not currently supported by Athena. But Athena does support SSE-KMS, SSE-S3, and CSE-KMS. Athena will only query objects in the same region as where Athena is running.
And the kms:Decrypt and kms:GenerateDataKey permissions are required to allow Athena to query encrypted data on S3 using KMS. Following this lecture, I then looked at Elastic MapReduce Encryption. And the key points from this lecture were that by default EMR does not implement encryption at rest.
From EMR version 4.8.0 onwards, you are able to create a security configuration specifying different settings on how to manage encryption for your data. The security configuration allows you to configure encryption at rest, in transit, or both together. The security configuration exists separately from your EMR clusters, so it can be reused across them.
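To give a feel for what a security configuration contains, here is a hedged sketch that assembles one as a Python dictionary and prints it as JSON. The field names follow the EMR security configuration format as I understand it and should be verified against the EMR documentation. The key ARN and the S3 path are placeholders, not real resources.

```python
import json

# Placeholder values: replace with your own KMS key ARN and certificate bundle.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"
CERTS_ZIP = "s3://my-example-bucket/certs/my-certs.zip"

security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        "AtRestEncryptionConfiguration": {
            # How EMRFS encrypts data written to S3.
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": KMS_KEY_ARN,
            },
            # How local disks on the cluster instances are encrypted.
            "LocalDiskEncryptionConfiguration": {
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": KMS_KEY_ARN,
            },
        },
        "InTransitEncryptionConfiguration": {
            # TLS certificates supplied as PEM files in a zip on S3.
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": CERTS_ZIP,
            }
        },
    }
}

config_json = json.dumps(security_configuration, indent=2)
print(config_json)
```

Because the configuration is a standalone JSON document, the same one can be referenced when launching any number of clusters.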
And using EMR version 5.7.0, you can specify your own custom AMI, allowing you to encrypt the EBS root device volume of your instances.
Using EBS as your persistent storage layer, you can implement Linux Unified Key Setup with KMS and open-source HDFS encryption. This provides two Hadoop encryption options, Secure Hadoop RPC and data encryption of HDFS block transfer.
When using S3, EMR supports the use of SSE-S3 or SSE-KMS to perform server-side encryption. You could also encrypt your data using your client before storing on S3 using CSE-KMS or CSE-C.
For EMR in-transit encryption, you can enable open-source TLS encryption features. When a TLS certificate provider has been configured, the following application-specific encryption features can be enabled: Hadoop, with Hadoop MapReduce Encrypted Shuffle, Secure Hadoop RPC, and data encryption of HDFS block transfer.
With Presto, when using EMR version 5.6.0 and later, any internal communication between Presto nodes will use SSL/TLS. With Tez, the Tez Shuffle Handler uses TLS. And with Spark, the Akka protocol uses TLS, the block transfer service uses SASL and Triple-DES, and the external shuffle service uses SASL.
Transparent encryption can also be implemented in HDFS.
Following EMR, I looked at encryption for the RDS service, and here we learned that you can configure encryption at rest during its configuration by selecting the checkbox for "enable encryption". This encryption can only be implemented during the database creation. And read replicas will have the same level of encryption as the master database.
You can also implement application-level encryption using Oracle and SQL Server Transparent Data Encryption, TDE, MySQL cryptographic functions, and Microsoft Transact-SQL cryptographic functions. To use TDE encryption, the database must be a part of an option group with the TDE option added to the group.
TDE can use two different encryption modes, TDE tablespace encryption and TDE column encryption. RDS in-transit encryption can be enabled by using SSL between your application and the RDS database. To use Oracle's native network encryption, NNE, you must add native network encryption to the database option group.
Moving on from databases, the next topic was encryption mechanisms when using the Amazon Kinesis platform for both Kinesis Firehose and Kinesis Streams. Within this lecture, we learned that data being sent to Kinesis can be sent using SSL for in-transit encryption. By default, once the data enters Kinesis, it is decrypted.
Kinesis Firehose can use SSE-KMS when sending data to S3. Kinesis Firehose must have the KMS decrypt and the KMS generate data key permissions on the CMK when using this method. If data is being sent to Amazon Redshift by Kinesis Firehose, S3 will still be used as an intermediary storage location, and so the same encryption can be applied.
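The two KMS permissions mentioned for Firehose, which are the same pair Athena needs for KMS-encrypted data, can be expressed as an IAM policy statement. The sketch below builds such a policy document in Python. The helper name `kms_usage_policy` and the key ARN are assumptions for illustration, and a real role would normally need further permissions, for example on the S3 bucket.

```python
import json


def kms_usage_policy(cmk_arn: str) -> dict:
    """Build a minimal IAM policy allowing use of one CMK for SSE-KMS.

    Hypothetical helper: it covers only the two KMS actions discussed here.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": cmk_arn,
            }
        ],
    }


policy = kms_usage_policy("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping the `Resource` to a single key ARN, rather than a wildcard, keeps the role limited to the CMK it actually needs.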
Since July 2017, Kinesis Streams has been able to apply server-side encryption, using KMS, to data coming into the stream. Producers and consumers need to have the relevant permissions to the KMS CMK. Kinesis Streams will request a new data encryption key approximately every five minutes. And a performance hit of approximately 100 microseconds is added for the encryption and decryption to take place by the producers and consumers.
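Turning on server-side encryption for an existing stream is a single API call. As a small sketch, the code below prepares the arguments for the boto3 Kinesis client's `start_stream_encryption` method without actually calling AWS. The helper name, the stream name, and the key alias are placeholders I have assumed for the example.

```python
def stream_encryption_params(stream_name: str, key_id: str) -> dict:
    """Return keyword arguments for kinesis.start_stream_encryption.

    Hypothetical helper: key_id may be a key ID, key ARN, or alias such as
    "alias/aws/kinesis" for the AWS-managed Kinesis key.
    """
    if not stream_name or not key_id:
        raise ValueError("stream_name and key_id are both required")
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",   # KMS-backed server-side encryption
        "KeyId": key_id,
    }


params = stream_encryption_params("my-example-stream", "alias/aws/kinesis")
print(params)
# With boto3 installed and credentials configured, usage would look like:
#   boto3.client("kinesis").start_stream_encryption(**params)
```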
We then finished up by looking at encryption options within Amazon Redshift. Amazon Redshift uses a four-tiered structure of encryption keys. Tier one is the master key, tier two, the cluster encryption key, CEK, tier three, the database encryption key, DEK, and then tier four, the data encryption keys.
The master key can be generated by either KMS or CloudHSM. Integration exists between KMS and Amazon Redshift, but not between CloudHSM and Redshift. When using CloudHSM, a trust must be established between your HSM and Amazon Redshift to send secure encryption keys between the two resources. For this to take place, you must download a certificate from Redshift to your HSM device, and then configure Redshift with the following details of your HSM: the HSM IP address, HSM partition name, the HSM partition password, and the public HSM service certificate.
You can perform a key rotation using the AWS Management Console for both your CEK and DEK.
That has now brought me to the end of this lecture and to the end of this course. I hope it has given you a good understanding of encryption itself, including symmetric and asymmetric cryptography, and introduced you to some of the encryption mechanisms offered by services that are commonly used for big data solutions: Amazon S3, Athena, Elastic MapReduce, the RDS service, Kinesis Firehose, Kinesis Streams, and Redshift.
You should now be able to enforce additional security controls, including encryption, into your existing infrastructure to help secure your data.
If you have any feedback on the course, positive or negative, please do leave a comment on the course landing page. We do look at the comments, and your feedback is greatly appreciated.
Thank you for your time, and good luck with your continued learning of cloud computing.
Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.