Security Controls: Data at Rest and In Transit

When implementing different AWS services and architecting them within your environments, whether it be production, test or dev, do you know your security responsibilities for these services?

It is very likely that you are using services from three different classifications, which each have very different boundaries for enforcing security between the customer and AWS.

These classifications are:

  1. Infrastructure services
  2. Container services
  3. Abstract services

The level of responsibility around these services is defined within three different AWS Shared Responsibility Models, and it’s essential when using AWS that you understand your level of responsibility when it comes to applying security.

This course focuses on Container and Abstract services. The primary Container services we look at are: RDS, EMR and Elastic Beanstalk and the primary Abstract services include: S3, DynamoDB, SQS and Glacier.

The lectures within this course will define and guide you through the following areas to help you apply the correct level of security to your Container and Abstract services.

What are AWS Abstract & Container Services?:  This lecture provides you with a clear understanding of what abstract and container services are within AWS. There is a clear divide between the two which must be understood, as responsibility for security is a key difference between them

Security Controls: Data at Rest and In Transit:  Here we will take a look at some of the available options and best practices to help you maintain integrity and protection around your data when at rest, in transit and held within a number of container and abstract services

Security Controls: Network Segmentation:  In this lecture we look at how we can use the network infrastructure and architecture to connect and restrict access to our container and abstract services to increase security through a number of different controls

Identity & Access Management:  IAM is heavily used for both container and abstract services and plays a key part in authorisation and authentication for access and management, this lecture looks at how IAM can be used to help protect access across your services

Built-in Service Security Controls:  This lecture will briefly look at some of the service-specific security controls that may not have been covered in the previous lectures that you can leverage to help secure your data and environment

If you have thoughts or suggestions for this course, please contact Cloud Academy at


Referenced Resources

AWS Links:

Implementing SSE and CSE with EMR

AWS Labs GitHub Repository

Using SSL to encrypt a connection to a Database

Oracle Native Network Encryption

Configuring end-to-end encryption in a load-balanced Elastic Beanstalk environment

Lecture Transcript

Hello and welcome to this lecture where we shall look at the different security controls that can be used to help you architect and implement a secure solution around your container and abstract services.

From this point onwards within this course I will provide some best practice controls for the following services, for both container and abstract classifications.

So for container services we'll be looking at:

  • RDS
  • EMR, and
  • Elastic Beanstalk

And for the abstract services we'll be looking at:

  • S3
  • DynamoDB
  • SQS and
  • Glacier

These services have been selected as they're some of the most common services utilized within each classification. However, if you're using container or abstract services other than the ones listed here, then the principles that I cover throughout this course will enable you to look into those services with the same approach and at least be aware of what to look out and architect for when planning and applying your security policies.

Okay, let's get started by looking at some of these security controls that you can implement to increase the security around these different services. Starting with the protection of your data, but more specifically on how to protect your data at rest.

EMR is a managed service by AWS and is comprised of a cluster of EC2 instances that is highly scalable and used to process and run big data frameworks such as Apache Hadoop and Spark.

A key point to make is that by default EMR instances do not encrypt data at rest. The instances used with EMR are created from pre-configured AMIs, Amazon Machine Images, that have been published and released by AWS. You are not able to use your own custom AMIs in an EMR cluster. Similarly, the same applies to the EBS volumes used for persistent storage within the cluster, which are again supplied by AWS.

EMR can also use DynamoDB or S3 for its persistent data store, which you can either access directly or copy the data from these services to its own persistent data store on HDFS, the Hadoop Distributed File System.

Although EMR does not encrypt data at rest by default there are a number of mechanisms you can use if your data is sensitive enough or perhaps you're required to do so for compliance reasons.

If you decide to use persistent storage rather than S3 or DynamoDB, then there're a number of options available that can work together, if you enable local disk encryption in your EMR security configuration. Once enabled the following features are available.

  • Linux Unified Key Setup, which allows EBS cluster volumes to be encrypted
  • Also Open-Source HDFS Encryption, which provides two Hadoop encryption options: Secure Hadoop RPC and Data encryption of HDFS Block Transfer.

As mentioned previously, we could use S3 as our persistent data store for EMR. If this was the case then you could use S3's very own encryption tools. As we know S3 is an abstract service and as a result you could apply server-side encryption, SSE-S3, to encrypt the data that is stored at rest. To help with this configuration EMR allows you to apply a security configuration that specifies the settings for how you encrypt your data. You can either encrypt data at rest, data in transit, or, if required, both together. The great thing about these security configurations is that they're not actually a part of the cluster itself. They exist within EMR and therefore you can reuse the same security configuration for existing clusters or others you plan to use in the future.

As a part of this security configuration you have the choice of AWS managing your encryption keys for you on S3, using the SSE-KMS, Server-Side Encryption with Key Management Service, or by using the SSE-S3 method. Alternatively, you could choose to manage your own encryption keys with client-side encryption. If we refer back to the shared responsibility model for abstract services, we can see this difference of responsibility clearly here.
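As a rough illustration of the choices just described, the sketch below builds the JSON document for an EMR security configuration that enables at-rest encryption with either SSE-S3 or SSE-KMS. The function name and the ARN used in the usage note are hypothetical, and the sketch only covers the at-rest portion; check the field names against the current EMR documentation before relying on it.

```python
import json


def build_emr_security_configuration(s3_mode="SSE-S3", kms_key_arn=None,
                                     local_disk_kms_key_arn=None):
    """Return an EMR security configuration (at-rest encryption only) as a dict.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    if s3_mode not in ("SSE-S3", "SSE-KMS"):
        raise ValueError("this sketch only covers SSE-S3 and SSE-KMS")
    if s3_mode == "SSE-KMS" and not kms_key_arn:
        raise ValueError("SSE-KMS needs a KMS key ARN")

    # S3 (EMRFS) encryption: AWS-managed keys, or a KMS key you nominate.
    s3_config = {"EncryptionMode": s3_mode}
    if s3_mode == "SSE-KMS":
        s3_config["AwsKmsKey"] = kms_key_arn

    at_rest = {"S3EncryptionConfiguration": s3_config}

    # Local disk encryption is optional and uses its own key setting.
    if local_disk_kms_key_arn:
        at_rest["LocalDiskEncryptionConfiguration"] = {
            "EncryptionKeyProviderType": "AwsKms",
            "AwsKmsKey": local_disk_kms_key_arn,
        }

    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": at_rest,
        }
    }


if __name__ == "__main__":
    # Placeholder ARN, purely for demonstration.
    config = build_emr_security_configuration(
        s3_mode="SSE-KMS",
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example",
    )
    print(json.dumps(config, indent=2))
```

The resulting JSON string is the kind of document that can be supplied when creating a security configuration, for example through the EMR console or the SDK's create_security_configuration call. Because the configuration lives outside any one cluster, the same document can be reused across clusters.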

For those unfamiliar with SSE, it's an encryption method used in Amazon S3 to encrypt any object at rest. It's completely managed by AWS along with the encryption keys, which themselves are also automatically encrypted and rotated regularly by S3. SSE-S3 uses the 256-bit Advanced Encryption Standard, AES-256, algorithm for its encryption. When using SSE-KMS, AWS will configure and create a KMS customer master key, CMK, and associate all relevant policies to work with your EMR cluster. More information on KMS can be found here.
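Outside of EMR, the same two server-side options can be requested per object when writing to S3. The sketch below only assembles the request arguments. The helper name is hypothetical, while the parameter names (ServerSideEncryption, SSEKMSKeyId) are those accepted by the S3 PutObject operation in the AWS SDK for Python.

```python
def sse_put_object_args(bucket, key, body, kms_key_id=None):
    """Build PutObject arguments that request server-side encryption.

    Hypothetical helper: with no KMS key it asks for SSE-S3 (AES-256),
    otherwise it asks for SSE-KMS with the given key.
    """
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id is None:
        args["ServerSideEncryption"] = "AES256"      # SSE-S3
    else:
        args["ServerSideEncryption"] = "aws:kms"     # SSE-KMS
        args["SSEKMSKeyId"] = kms_key_id
    return args


if __name__ == "__main__":
    # Bucket and key names are placeholders.
    print(sse_put_object_args("example-bucket", "data/part-0000", b"payload"))
```

With boto3 installed and credentials configured, these arguments could be passed as `s3_client.put_object(**args)`; that call is left out here so the sketch runs without any AWS access.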

In addition to SSE, S3 offers Client-Side Encryption with KMS or a customer key provider. Alternatively you could encrypt your data using your application before storing it on S3 where it would remain stored in an encrypted form, for example, using a serializer/deserializer with Hive.

More information on implementing SSE and CSE with EMR can be found here.

Another option is to apply encryption at the application level by encrypting the entire file. Alternatively and again at the application level you could encrypt individual fields within your data by using a standard serializer/deserializer such as JSON for Hadoop.

So at a fairly high level there're just a few mechanisms that you can choose to apply encryption at rest on EMR which, remember, is not provided by default. For more information on how to set up these different methods of encryption in detail I recommend you visit the relevant AWS documentation pages on EMR.

Let's now take a look at another container service: RDS.

RDS allows you to set up a relational database using a number of different database engines, such as MySQL, MS SQL Server, and Oracle, et cetera.

During the creation of your RDS database instance you have the opportunity to enable encryption on the Configure Advanced Settings screen under Database Options, Enable Encryption. By enabling encryption here you're enabling encryption at rest for your storage, snapshots, read replicas, and your backups. Keys for all of this encryption can be managed by the AWS Key Management Service.

KMS utilizes the AES-256 encryption algorithm which is the same used in S3 for its server-side encryption. This encryption mechanism is completely transparent to any applications reading and writing data to the database.
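The console option described above has an equivalent when creating the instance programmatically. The following is a minimal sketch that assembles only the encryption-related arguments. The helper name and example values are hypothetical, and other required settings (credentials, storage size and so on) are deliberately omitted.

```python
def encrypted_db_instance_args(identifier, engine, instance_class,
                               kms_key_id=None):
    """Build a partial argument set for creating an encrypted RDS instance.

    Hypothetical helper. StorageEncrypted and KmsKeyId are parameters of
    the RDS CreateDBInstance operation; if no key is given, RDS falls back
    to the default KMS key for the account.
    """
    args = {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "StorageEncrypted": True,   # enables encryption at rest
    }
    if kms_key_id:
        args["KmsKeyId"] = kms_key_id
    return args


if __name__ == "__main__":
    # Placeholder identifier and instance class.
    print(encrypted_db_instance_args("example-db", "mysql", "db.m5.large"))
```

Encryption has to be chosen when the instance is created, which is why it appears here as a creation-time argument rather than a later modification.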

In addition to the encryption offered by RDS itself at the application level, there're additional platform-level encryption mechanisms that could be used for protecting data at rest. These include Oracle and SQL Server Transparent Data Encryption, TDE, which could be used in conjunction with the method already discussed, but it would impact the performance of the database. There are also the MySQL cryptographic functions, and more information on these can be found here. And lastly, there are the Microsoft Transact-SQL cryptographic functions. And again, you can find more information on these using the link on the screen.

In comparison to EMR, encryption at rest for RDS is simplified, thanks to the built-in application level encryption option, which EMR does not have.

We have looked at a couple of examples for how data can be encrypted at rest with container services. Now let's visit the abstract services. We have already covered elements of S3 when we discussed Elastic MapReduce, but there're additional protection mechanisms offered by S3 other than server-side and client-side encryption that helps to protect your data when at rest.

Securing and protecting your data at rest is not always about encryption, it's also about availability and reliability of being able to access your data. By default, S3 replicates your objects across all availability zones within the region where your data was uploaded. On a side note, S3 also supports cross-region replication, but this has to be configured manually. The automatic replication between availability zones ensures that you will still be able to access your data in the event of an availability zone outage, therefore protecting your data when at rest. It doesn't, however, protect the data against accidental or malicious deletion. To help protect against this, S3 allows you to implement versioning on your buckets. By default this is not enabled, but once enabled it cannot be disabled, only suspended.

S3 versioning, as the name implies, allows you to version control objects within your bucket. This allows you to recover from unintended user changes and actions, including deletions, that might occur through misuse or corruption. Enabling versioning on the bucket will keep multiple copies of the object. Each time the object changes, a new version of the object is created and becomes the new current version. One thing to be aware of with versioning is the additional storage cost applied in S3. Storing multiple copies of the same object will use additional space and increase your storage cost.
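To make the recovery behaviour concrete, here is a small in-memory model of how a versioned bucket behaves. It is a toy written for this explanation, not an S3 client: the class and its methods are hypothetical, and the version IDs are simple counters rather than real S3 version IDs.

```python
import itertools


class ToyVersionedBucket:
    """In-memory model of versioning semantics (illustration only)."""

    def __init__(self):
        # key -> list of (version_id, body); body None means delete marker
        self._history = {}
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Store a new version; older versions are kept, not overwritten."""
        version_id = f"v{next(self._ids)}"
        self._history.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete adds a delete marker instead of erasing history."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the current version, or a specific older version by ID."""
        history = self._history.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)
            return history[-1][1]
        for vid, body in history:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))

    def stored_copies(self, key):
        """Number of retained object versions, which is what drives cost."""
        return sum(1 for _, body in self._history.get(key, []) if body is not None)


if __name__ == "__main__":
    bucket = ToyVersionedBucket()
    first = bucket.put("report.csv", b"original")
    bucket.put("report.csv", b"edited")
    bucket.delete("report.csv")
    # The current view is "deleted", yet the earlier version is recoverable.
    print(bucket.get("report.csv", version_id=first))
    print(bucket.stored_copies("report.csv"))
```

The model also shows where the extra storage cost comes from: every retained version counts as a stored copy until it is explicitly removed.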

Now let's take a quick look at Glacier and DynamoDB.

By default, AWS Glacier encrypts data at rest using server-side encryption. For each archive created with Glacier, a new key is generated and the data is encrypted using the AES-256 algorithm. AWS manages these keys and all key rotations, and the keys themselves are encrypted with a master key which is also stored and managed by AWS.

If you want to add another layer of protection for your data on Glacier, then you should simply encrypt the data before sending it to Glacier.

DynamoDB does not currently have support for any server-side encryption. This means any encryption for data at rest falls upon you to implement. It is possible to encrypt your data if you're using Java by using an AWS client-side library for encrypting. This can be found from the AWS Labs GitHub repository.

Another option is to encrypt your data with an application development framework before saving and storing your data with DynamoDB.

As we have looked at securing data at rest for a couple of container and abstract services, let's turn our attention now to see how we can secure data when in transit.

Let's start by looking at RDS, a container service. When communicating with the RDS instance, for example from the application, then you can secure that communication using SSL/TLS, Secure Sockets Layer/Transport Layer Security, which will encrypt that data whilst in transit.

This is recommended if you have to abide by specific compliance and governance controls or when the data being sent to RDS is highly sensitive, perhaps containing customer information.

The method in which this process is carried out varies depending on which database type you have. For more information on the implementation of the encryption please visit the link on the screen.
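Whatever the database engine, the client side of an SSL/TLS connection usually comes down to trusting the right certificate authority and refusing unverified or outdated connections. The sketch below shows that idea with Python's standard ssl module. The function name is hypothetical, and the CA bundle path is an assumption that you would replace with the certificate bundle appropriate to your database.

```python
import ssl


def strict_tls_context(ca_bundle_path=None):
    """Create a client-side TLS context that verifies the server.

    Hypothetical helper. If ca_bundle_path is None, the system's default
    trusted CAs are used; otherwise only the given bundle is loaded.
    """
    context = ssl.create_default_context(cafile=ca_bundle_path)
    # create_default_context already requires a valid certificate and a
    # matching hostname; we additionally refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = strict_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

How the context, or the equivalent CA file setting, is handed to a database driver differs between drivers and engines, so that step is not shown here.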

If you're using Oracle with RDS, then instead of using SSL encryption between the client and the database, you could use Oracle's Native Network Encryption, NNE, which will encrypt all connections to and from the database. It is, however, not possible to use both SSL and NNE together for encryption, one of them must be switched off to then use the other. More information on NNE can be found here.

When using EMR, there're again a number of options. When we looked at securing data at rest on the EBS volumes with EMR, we mentioned the Open-Source HDFS Encryption options, Secure Hadoop RPC and Data Encryption of HDFS Block Transfer, that are available when local disk encryption is enabled in the security configuration.

These encryption features are also used for encrypting data in transit for Hadoop along with Hadoop MapReduce Encrypted Shuffle which uses SSL/TLS.

For this course I won't go into the full range of other in-transit encrypting methods for other big data frameworks running on EMR, but realize others do exist for Spark and Tez. And so for information on what these are and how to configure them, please refer to the relevant AWS documentation.

When EMR communicates with S3 or DynamoDB to transport data then this communication will be sent over an encrypted HTTPS protocol.

When users connect to the EMR cluster for admin purposes, it's recommended that this is done over an encrypted connection using Secure Shell, SSH, a cryptographic network protocol.

Next, if we take a quick look at data protection for Elastic Beanstalk, it can be achieved using HTTP over SSL with signed certificates. This would allow clients to access your website and application with data encrypted in both directions between the client and your Elastic Beanstalk environment. When doing so, by default, the encryption in transit will exist between the client requesting access and your Elastic Load Balancer for Elastic Beanstalk. From the ELB to your backend instances the data will then be unencrypted.

If you need end-to-end encryption for compliance or sensitivity reasons, then this level is possible but it will require additional configuration and implementation. For detailed instructions on how to implement this, please see the following link.

Let's now take a look at how encryption in transit is handled for some of the abstract services starting with S3.

With S3 being a managed abstract service, encryption in transit is managed by AWS, and so all communication with S3, whether it be from the AWS Management Console, over an API, or from the AWS CLI, will be encrypted. S3 uses HTTPS and SSL connections to encrypt any communication into and out of S3 automatically.

When accessing DynamoDB over the internet the only connections that should be allowed and permitted are those that use HTTPS to ensure the data is encrypted.
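One way to enforce an HTTPS-only rule like this is with an IAM policy that denies any request not sent over a secure transport. The sketch below builds such a policy document. The function name, statement ID and default action and resource values are illustrative assumptions, while aws:SecureTransport is a standard global condition key.

```python
import json


def deny_insecure_transport_policy(actions=("dynamodb:*",), resources=("*",)):
    """Build an IAM policy that denies requests not made over HTTPS.

    Hypothetical helper; narrow the actions and resources to your own
    tables before using anything like this.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNonHttpsRequests",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": list(resources),
                # The condition matches when the request was NOT sent over SSL/TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_insecure_transport_policy(), indent=2))
```

Because an explicit Deny overrides any Allow, attaching a statement like this means a permitted user still cannot reach the service over plain HTTP.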

Taking a step back from container and abstract services for a moment, I want to briefly mention how data in transit is secured when using the AWS Management Console. The console uses SSL between your browser and the AWS service endpoints in addition to an X.509 certificate to authenticate the identity of the console service endpoint.

If using an SDK, the AWS CLI, or an AWS API call not from the AWS Management Console, then these are RESTful APIs over HTTPS. When the SSL connection is made between the two endpoints all traffic is then encrypted and protected.

That brings us to the end of this lecture. Although we only covered a handful of services, I hope you can see that between both container-based and abstract services there're security features that can be used to protect your data both when in transit and at rest. There're differences between the amount of configuration and control that you have between the different classifications, and again, this comes down to the different shared responsibility models that they operate under.

Coming up next we're going to look at how the configuration of your network infrastructure can actively be used as a security layer.



About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.