Data Encryption | CSL4 A6.1 |
Hello, and welcome to this lecture where I will be explaining what encryption is at a high level, and when and why you may want or need to use it.
Unencrypted data can read and seen by anyone who has access to it, whether this data is stored at rest or set between two locations in transit, it's known as plaintext or clear text data.
The data is plain to see and can be seen and understood by any recipient. There is no problem with this, as long as the data is not sensitive in any way and doesn't need to be restricted. However, on the other hand, if you do have data that is sensitive, and you need to ensure the contents of this data is only viewable by a particular recipient or recipients, then you need to add a level of encryption to that data.
But what is encryption? Data encryption is the mechanism in which information is altered, rendering the plaintext data unreadable through the use of mathematical algorithms and encryption keys. When encrypted, the original plaintext data is now known as ciphertext, which is unreadable. To decrypt the data, an encryption key is required to revert the ciphertext back into a readable format or plaintext.
A key is simply a string of characters used in conjunction with the encryption algorithm, and the longer the key, the most robust the encryption. This encryption involving keys can be categorized by either being symmetric cryptography or asymmetric cryptography. Let's talk a look at both methods. I'll start with symmetric cryptography first.
With symmetric encryption, a single key is used to both encrypt and also decrypt the data. So for example, if someone was using the symmetric encryption method, they would encrypt their data with a key, and then when that same person needed to access that data, they would use the same key that was used to encrypt the data to decrypt the data.
However, if the encrypted data was being read by a different person, that person would need to be issued the same key. Remember, the same key is needed to decrypt the data that was used to encrypt it. As a result, this key must be sent securely between two parties, and here exposes a weakness in this method.
If the key is intercepted by anyone during that transmission, then the third party could easily decrypt any data associated with that key.
Some common symmetric cryptography algorithms that are used are AES, advanced encryption standard, DES, digital encryption standard, Triple-DES, and Blowfish. Now let's compare this to asymmetric encryption, which involves two separate keys.
One is used to encrypt the data, and another separate key is used to decrypt the data. These keys are both created at the same time and are linked through a mathematical algorithm. One key is considered the private key and should be kept by a single party, and should never be shared with anyone else. The other key is considered the public key, and this key can be given and shared with anyone.
Unlike with symmetric encryption, the public key does not have to be sent over secure transmission. It doesn't matter who has access to this public key, as without the private key, any data encrypted with it cannot be accessed. Both the private and public key is required to decrypt the data when asymmetric encryption has been used.
So how does it work? If another party wanted to send you an encrypted message or data, they would encrypt the message using their own public key, which could be made freely available to them or anyone. It's public for a reason. This message is then sent to you where you will use your own private key which has that mathematical relationship with your public key to decrypt the data.
This allows you to send encrypted data to anyone without the risk of exposing your private key, resolving the issue highlighted with symmetric encryption. The advantage that symmetric has over asymmetric is the speed of encryption and decryption. Symmetric is a lot faster from a performance perspective.
However, it does carry an additional risk, as highlighted. Some common examples of asymmetric cryptography algorithms are RSA, Diffie-Hellman, and Digital Signature Algorithm. So now we know what encryption is and are familiar with the differences between symmetric and asymmetric algorithms, when and why you may want to use encryption.
It may seem obvious. You want to protect your data, and that's true. But do you want to protect it at rest or when it's in transit, or both? And not forgetting other legal requirements, too. Any sensitive data that is stored at rest should be encrypted to protect both you and your customers. Should an untrusted entity gain access to the data, you can be assured that the information held within it cannot easily be accessed, safeguarding your business and your customer's data from the intrusion.
In today's world of virtualization and cloud technology, the physical location of stored data is often unknown. When you then couple this with replication, high availability, resiliency, or DR, where your data could be moved and replicated across a series of different AZs and regions automatically by design of some of the AWS services.
By encrypting your data, you be safe in the knowledge that any unexpected distribution of data by AWS would not be accessible by anyone unexpected. Be mindful that when your sensitive data is being moved and distributed, it should be done so via a secure mechanism, providing encryption in transit where possible.
Much of this can be done over HTTPS or SSL within AWS. If encryption in transit is not possible, then at the very least, the data should be encrypted prior to transmission. You may also have to apply encryption mechanisms against your data to adhere to specific compliance and legal controls that may be required to meet internal customer or external governing standards, such as PCI DSS or HIPAA.
By applying encryption to data bound by these standards and governance, it will help you achieve the required controls and compliance requirements. Big data solutions often hold very sensitive information, and so understanding how to apply and adopt encryption methods for different services that can be used for big data is crucial.
That now brings us to the end of this lecture. Coming up next, I'm going to start looking at encryption methods used for Amazon's Simple Storage Service, known as S3.