Data Protection and Encryption
Start course
1h 8m

Google Cloud Platform (GCP) lets organizations take advantage of the powerful network and technologies that Google uses to deliver its own products. Global companies like Coca-Cola and cutting-edge technology stars like Spotify are already running sophisticated applications on GCP. This course will help you design an enterprise-class Google Cloud infrastructure for your own organization.

When you architect an infrastructure for mission-critical applications, not only do you need to choose the appropriate compute, storage, and networking components, but you also need to design for security, high availability, regulatory compliance, and disaster recovery. This course uses a case study to demonstrate how to apply these design principles to meet real-world requirements.

Learning Objectives

  • Map compute, storage, and networking requirements to Google Cloud Platform services
  • Create designs for high availability and disaster recovery
  • Use appropriate authentication, roles, service accounts, and data protection
  • Create a design to comply with regulatory requirements

Intended Audience

This course is intended for anyone looking to design and build an enterprise-class Google Cloud Platform infrastructure for their own organization.


To get the most out of this course, you should have some basic knowledge of Google Cloud Platform.


Protecting data is critical in any organization. Google Cloud Platform is very strong in this area because of its default encryption policies. Before we get into encryption, though, let’s look at Access Control Lists (or ACLs).

ACLs specify who has access to Cloud Storage buckets and objects in buckets. I’m not going to cover this topic in-depth, but there are a few things to keep in mind when you’re deciding what ACLs to apply to your Cloud Storage.

First, there are actually five different mechanisms for controlling access to Cloud Storage: IAM permissions, ACLs, Signed URLs, Signed Policy Documents, and Firebase Security Rules. With so many different ways to control access, you have to be careful not to create conflicting permissions. Start with the first two: IAM permissions and ACLs.

IAM permissions work at the project level or at the bucket level but not at the level of individual objects within buckets. For example, you can specify that a user has full control of all the objects in all of the buckets in your project, but cannot create, modify, or delete the buckets themselves. So they’re a nice way to grant broad access to buckets and objects, but if you want to set fine-grained access, such as which objects a particular group can read, then you need to use ACLs.

The confusing thing about using these two mechanisms is that you have to look at both of them to get a complete picture of access permissions. For example, you could list the ACLs for a bucket and see that only Bob has been granted write access, but it wouldn’t show that Jill has also been granted write access to all buckets by IAM. For this reason, whenever possible, you should try to use either IAM or ACLs, but not both.

Another potential source of confusion is that bucket and object ACLs are independent of each other. The ACLs on a bucket do not affect the ACLs on objects inside that bucket. For example, you might think that Jane doesn’t have access to the objects in a bucket because she hasn’t been granted access to the bucket itself, but she could have been granted access to any of the objects in the bucket.

So you should keep a couple of principles in mind. First, apply the principle of least privilege. Grant users and groups only as much access as they need. Second, keep your access control as simple as possible. Try to use as few control mechanisms as you can.

If GreatInside decides to replace its internal file server using Cloud Storage, then the best way to secure the files would be to use ACLs. You would create groups to match the teams in the company and create ACLs that give those groups access to the appropriate resources. For example, you could create a bucket for each group. Then for each bucket, you would make the associated group a writer of the bucket. Finally, you would set the object default permissions so that any new objects uploaded to the bucket would get the same permissions and everyone in the group would have full access. If the company’s needs aren’t that simple, then you would set more complex ACLs.

Now let’s move on to encryption. To ensure that your data is encrypted at all times, it needs to be encrypted when it’s in storage (also known as “at rest”) and when it is being sent over a network (also known as “in flight”). Google Cloud Platform takes care of both of these situations.

Encryption in flight is handled very simply. All of the Cloud Platform services are accessible only by API (even when you’re using other methods, such as the Cloud Console or the gcloud command, they’re making API calls under the hood). And all API communication is encrypted using SSL/TLS channels. Furthermore, every request has to include a time-limited authentication token, so the token can’t be used by an attacker after it expires. Of course, for any communications between your Google Cloud infrastructure and outside parties, such as website visitors, you have to use SSL/TLS yourself to encrypt the traffic.

Encryption at rest is just as simple if you’re willing to leave it to Google because Cloud Platform encrypts all customer data at rest by default.

So without you having to do anything, all of your data will be encrypted both at rest and in flight. Then why isn’t this the end of this lesson? Well, because your organization might want to take on some of the encryption responsibilities itself.

There are actually two layers of encryption for data at rest. First, the data is broken into subfile chunks, and each chunk is encrypted with an individual data encryption key (or DEK). These keys are stored near the data to ensure low latency and high availability. The DEKs are then encrypted with a key encryption key (or KEK). The keys are AES-256 symmetric encryption keys.

Google always manages the data encryption keys, but your organization can manage the key encryption keys if that’s your preference. There are two options for doing this: Customer-managed encryption keys or Customer-supplied encryption keys.

With the customer-managed option, you use the Cloud KMS service to create, rotate (or automatically rotate), and destroy your encryption keys. The keys are hosted on Google Cloud. You can have as many keys as you want, even millions of them if you actually need that many. You can set user-level permissions on individual keys using IAM and monitor their use with Cloud Audit Logging.

Cloud KMS is a nice service, but why wouldn’t you just let Google manage your key-encryption keys and not have to deal with it yourself? The biggest reason is compliance with standards or regulatory requirements, such as HIPAA (for health information) or PCI (for credit card information).

If your organization requires that you generate your own keys and/or that they’re managed on-premises, then you have to use Customer-supplied encryption keys. Be aware that this option is only available for Cloud Storage and Compute Engine.

With CSEK, Google doesn’t store your key. You have to provide your key for each operation, and your key is purged from Google Cloud after the operation is complete. Here’s how to do it from the command line with each of the two supported services. To encrypt the disk on a Compute Engine instance, you add the csek-key-file flag and point it to a file that contains the key. To encrypt data you’re uploading to Cloud Storage, you have to do it a bit differently. Rather than adding an encryption flag to the gsutil command, you need to add the encryption key to your .boto file, which is the configuration file for the gsutil command. Then all of your gsutil commands will use that key.

It only stores an SHA256 hash of the key as a way to uniquely identify the key that was used to encrypt the data. When you make a request to read or write the data in the future, your key can be validated against the hash. The hash cannot be used to decrypt your data.

There’s a big risk in using this method, though. If you lose your keys, you won’t ever be able to read your data again, and you’ll end up deleting it so you won’t be paying storage charges for unreadable data.

So far all of the encryption methods we’ve covered, including default encryption, Cloud KMS, and CSEK have been examples of server-side encryption. This is where your data is encrypted after Google Cloud receives your data. The only major difference between the 3 methods is where the key comes from. But there is another way. It’s client-side encryption. This means that you encrypt the data before you send it to Google Cloud. Google won’t even know that it’s already encrypted and it will encrypt it again. When you read your data back, Google Cloud will decrypt it on the server side first and then you’ll decrypt your own layer of encryption on the client side. The same warning applies - if you lose your keys, your data will effectively be gone.

Since our case study includes credit card information, we’ll need to be PCI DSS compliant, so we should use Cloud KMS to manage our keys. I’ll talk more about PCI compliance in the next lesson.

About the Author
Learning Paths

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).