Overview of Encryption

The use of Big Data is becoming commonplace, with many organizations using Big Data solutions to run large-scale queries and analysis with business intelligence toolsets, gaining a deeper understanding of the data they gather.

Within AWS, this data can be stored, distributed, and consumed by a variety of services, many of which provide features ideal for Big Data analysis. These huge data sets often include sensitive information, such as customer details or financial information.

With this in mind, security surrounding this data is of utmost importance, and where sensitive information exists, encryption should be applied against the data.

This course first provides an explanation of data encryption and the differences between symmetric and asymmetric cryptography. This gives a good introduction before looking at how AWS implements different encryption mechanisms for many of the services that can be used for Big Data. These services include:

  • Amazon S3
  • Amazon Athena
  • Amazon Elastic MapReduce (EMR)
  • Amazon Relational Database Service (RDS)
  • Amazon Kinesis Firehose
  • Amazon Kinesis Streams
  • Amazon Redshift

The course covers encryption options for data both at rest and in transit, and contains the following lectures:

  • Introduction: This lecture introduces the course objectives, topics covered and the instructor
  • Overview of Encryption: This lecture explains data encryption and when and why you may need to implement data encryption
  • Amazon S3 and Amazon Athena Encryption: This lecture dives into the different encryption mechanisms of S3, from both a server-side and client-side perspective. It also looks at how Amazon Athena can analyze data sets stored on S3 with encryption
  • Elastic MapReduce (EMR) Encryption: This lecture focuses on the different methods of encryption when utilizing EMR in conjunction with other services such as EBS and S3. It also looks at application-specific options with Hadoop, Presto, Tez, and Spark
  • Relational Database Service (RDS) Encryption: This lecture looks at the encryption within RDS, focusing on its built-in encryption plus Oracle and SQL Server Transparent Data Encryption (TDE)
  • Amazon Kinesis Encryption: This lecture looks at both Kinesis Firehose and Kinesis Streams and analyses the encryption of both services
  • Amazon Redshift Encryption: This lecture explains the four-tiered encryption structure when working with Redshift and KMS. It also explains how to encrypt when working with CloudHSM with Redshift
  • Summary: This lecture highlights the key points from the previous lectures

Hello, and welcome to this lecture where I will be explaining what encryption is at a high level, and when and why you may want or need to use it.

Unencrypted data can be read and seen by anyone who has access to it. Whether this data is stored at rest or sent between two locations in transit, it's known as plaintext or cleartext data.

The data is plain to see and can be seen and understood by any recipient. There is no problem with this, as long as the data is not sensitive in any way and doesn't need to be restricted. However, if you do have data that is sensitive, and you need to ensure the contents of this data are only viewable by a particular recipient or recipients, then you need to add a level of encryption to that data.

But what is encryption? Data encryption is the mechanism by which information is altered, rendering the plaintext data unreadable through the use of mathematical algorithms and encryption keys. Once encrypted, the original plaintext data is known as ciphertext, which is unreadable. To decrypt the data, an encryption key is required to revert the ciphertext back into a readable format, or plaintext.

A key is simply a string of characters used in conjunction with the encryption algorithm, and the longer the key, the more robust the encryption. Encryption involving keys can be categorized as either symmetric cryptography or asymmetric cryptography. Let's take a look at both methods. I'll start with symmetric cryptography.

With symmetric encryption, a single key is used to both encrypt and also decrypt the data. So for example, if someone was using the symmetric encryption method, they would encrypt their data with a key, and then when that same person needed to access that data, they would use the same key that was used to encrypt the data to decrypt the data.

However, if the encrypted data was being read by a different person, that person would need to be issued the same key. Remember, the same key that was used to encrypt the data is needed to decrypt it. As a result, this key must be sent securely between the two parties, and this exposes a weakness in the method.

If the key is intercepted by anyone during that transmission, then the third party could easily decrypt any data associated with that key.
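The single-key behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified toy cipher built only on the standard library (a SHA-256-derived keystream XORed with the data) and is an assumption made for illustration. It is not AES, and it should not be used to protect real data. The names `keystream`, `xor_cipher`, and `shared_key` are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# One key is shared by everyone who needs to read the data.
shared_key = secrets.token_bytes(32)  # 256-bit random key
plaintext = b"customer: Jane Doe, balance: 1200"

ciphertext = xor_cipher(shared_key, plaintext)   # encrypt
recovered = xor_cipher(shared_key, ciphertext)   # decrypt with the SAME key

# A different key does not recover the plaintext.
wrong_key = secrets.token_bytes(32)
garbled = xor_cipher(wrong_key, ciphertext)
```

Anyone who obtains `shared_key` can run the same decrypt step, which is exactly why the key has to be exchanged over a secure channel.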

Some common symmetric cryptography algorithms are AES (Advanced Encryption Standard), DES (Data Encryption Standard), Triple-DES, and Blowfish. Now let's compare this to asymmetric encryption, which involves two separate keys.

One is used to encrypt the data, and another separate key is used to decrypt the data. These keys are both created at the same time and are linked through a mathematical algorithm. One key is considered the private key and should be kept by a single party, and should never be shared with anyone else. The other key is considered the public key, and this key can be given and shared with anyone.

Unlike with symmetric encryption, the public key does not have to be sent over a secure transmission. It doesn't matter who has access to this public key, as without the private key, any data encrypted with it cannot be accessed. Only the private key that is paired with that public key can decrypt the data.

So how does it work? If another party wanted to send you an encrypted message or data, they would encrypt the message using your public key, which can be made freely available to them or anyone. It's public for a reason. This message is then sent to you, where you use your own private key, which has that mathematical relationship with your public key, to decrypt the data.
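As a rough illustration of that public/private key relationship, here is a textbook RSA sketch using tiny prime numbers and only Python's built-in `pow` (Python 3.8 or later for the modular inverse). The primes, the exponent, and the function names are assumptions chosen for readability. Real RSA keys are thousands of bits long and use padding, so this is not a usable implementation.

```python
# Toy textbook RSA with tiny numbers, for illustration only.
p, q = 61, 53                 # two small primes (kept secret)
n = p * q                     # modulus, part of both keys
phi = (p - 1) * (q - 1)       # used to derive the private exponent
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: (e * d) % phi == 1

public_key = (e, n)           # safe to share with anyone
private_key = (d, n)          # never shared


def encrypt(message: int, key: tuple[int, int]) -> int:
    """The sender encrypts with the RECIPIENT's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, key: tuple[int, int]) -> int:
    """Only the holder of the matching private key can decrypt."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65                                  # must be smaller than n
ciphertext = encrypt(message, public_key)     # done by the sender
recovered = decrypt(ciphertext, private_key)  # done by the recipient
```

The sender never needs the private key, so nothing secret has to travel between the two parties.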

This approach allows encrypted data to be exchanged without either party having to share a private key, resolving the key-distribution issue highlighted with symmetric encryption. The advantage that symmetric encryption has over asymmetric is the speed of encryption and decryption. Symmetric is a lot faster from a performance perspective.

However, symmetric encryption does carry the additional risk of key exchange, as highlighted earlier. Some common examples of asymmetric cryptography algorithms are RSA, Diffie-Hellman, and the Digital Signature Algorithm. So now that we know what encryption is and are familiar with the differences between symmetric and asymmetric algorithms, when and why might you want to use encryption?

It may seem obvious. You want to protect your data, and that's true. But do you want to protect it at rest, when it's in transit, or both? And don't forget legal requirements, too. Any sensitive data that is stored at rest should be encrypted to protect both you and your customers. Should an untrusted entity gain access to the data, you can be assured that the information held within it cannot easily be accessed, safeguarding your business and your customers' data from the intrusion.

In today's world of virtualization and cloud technology, the physical location of stored data is often unknown. This is compounded by replication, high availability, resiliency, and DR, where your data could be moved and replicated across a series of different AZs and regions automatically, by design of some of the AWS services.

By encrypting your data, you can be safe in the knowledge that any data distributed unexpectedly by AWS would not be readable by anyone unintended. Be mindful that when your sensitive data is being moved and distributed, it should be done via a secure mechanism, providing encryption in transit where possible.

Much of this can be done over HTTPS or SSL/TLS within AWS. If encryption in transit is not possible, then at the very least, the data should be encrypted prior to transmission. You may also have to apply encryption mechanisms to your data to adhere to specific compliance and legal controls that may be required to meet internal, customer, or external governing standards, such as PCI DSS or HIPAA.

Applying encryption to data bound by these standards helps you meet the required controls and compliance requirements. Big Data solutions often hold very sensitive information, so understanding how to apply and adopt encryption methods for the different services that can be used for Big Data is crucial.
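To make the encryption-in-transit point above a little more concrete, the sketch below shows how a Python client might prepare a TLS configuration before opening an HTTPS connection. It uses only the standard library `ssl` module. The choice of TLS 1.2 as a minimum version and the `settings` dictionary are assumptions for illustration, not an AWS requirement stated in this lecture.

```python
import ssl

# Start from the library's recommended defaults for a client connection.
context = ssl.create_default_context()

# Assumption for this sketch: refuse anything older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Summarise the properties that matter for encryption in transit.
settings = {
    "verifies_certificates": context.verify_mode == ssl.CERT_REQUIRED,
    "checks_hostname": context.check_hostname,
    "minimum_version": context.minimum_version.name,
}
print(settings)
```

A context like this can then be passed to an HTTPS client, for example through the `context` argument of `urllib.request.urlopen`, so that data is encrypted while it moves between locations.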

That now brings us to the end of this lecture. Coming up next, I'm going to start looking at encryption methods used for Amazon's Simple Storage Service, known as S3.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.