Introduction to Google Cloud Storage Options
1h 34m

Google Cloud Platform has become one of the premier cloud providers on the market. It offers the same rich catalog of services and massive global hardware scale as AWS as well as a number of Google-specific features and integrations. Getting started with GCP can seem daunting given its complexity. This course is designed to demystify the system and help both novices and experienced engineers get started.

This Course covers a range of topics with the goal of helping students pass the Google Associate Cloud Engineer certification exam. This section focuses on identifying relevant GCP services for specific use cases. The three areas of concern are compute, storage, and networking. Students will be introduced to GCP solutions relevant to those three critical components of cloud infrastructure. The Course also includes three short practical demonstrations to help you get hands-on with GCP, both in the web console and using the command line.

By the end of this Course, you should know all of GCP’s main offerings, and you should know how to pick the right product for a given problem.

Learning Objectives

  • Learn how to use Google Cloud compute, storage, and network services and determine which products are suitable for specific use cases

Intended Audience

  • People looking to build applications on Google Cloud Platform
  • People interested in obtaining the Google Associate Cloud Engineer certification


Prerequisites

To get the most out of this course, you should have a general knowledge of IT architectures.


Google Cloud gives you a lot of ways to store your data. There are a variety of managed services that handle infrastructure, scaling, backups, all of that for you, and then there are more configurable approaches for people that want to directly administer everything.

Probably the easiest way to break it down is to make a distinction here between storage and databases. So when we talk about databases, we usually mean data stored with a certain structure. The database has storage, of course, but the key thing is the software for accessing it. Storage is the more generic term. Google Cloud has five distinct offerings for simple storage without any concern for structure or database software.

So as mentioned, the first is VM-attached persistent disk storage. This is analogous to an Amazon EBS volume: a block of HDD or SSD storage that's attached to individual Compute Engine instances. It's a great solution if you're running a monolithic app on a single server and you need all of your data in one place, possibly with a database engine running on that instance. Two really nice features of GCP's persistent disk storage are zero-downtime scalability and automatic encryption. So you get the peace of mind of knowing that your data is encrypted at rest, and it's very easy to add more storage to an instance without interrupting anything.

Now, GCP also has an NFS storage product called Cloud Filestore. This is similar to persistent disk: you can mount NFS instances onto VMs for applications that need a file system interface. This is a type of NAS, or network-attached storage. The critical thing to consider here is latency. Filestore offers two performance tiers, standard and premium. The main difference is that the premium tier offers substantially higher read/write throughput, so be sure to check the specs very carefully. We'll add a little chart you can look at. There are many applications, of course, that can be bottlenecked by NAS systems.

So the three remaining GCP storage products are all very similar: they are all managed object storage systems similar to Amazon S3. The difference between these three products concerns how frequently you access the things that you're storing. The names of these products, in order from the greatest flexibility and highest price to the least, are Cloud Storage, Cloud Storage Nearline, and Cloud Storage Coldline.

So all of these Cloud Storage products work using buckets. That's the standard approach. We create a bucket in the console or using the command line, and in these buckets we can store any arbitrary data we want. Again, like S3, we can set all kinds of security configuration on the buckets. We can integrate them with our applications, we can make them publicly available, and we can grant access so that things can read from and write to Cloud Storage. The only things different about the Nearline and Coldline versions of Cloud Storage are the access frequency SLA and the cost. Nearline is for data that is accessed relatively rarely, say, once a month, and because of that, it's much cheaper than regular Cloud Storage. Coldline is really for archiving. This is, by far, the cheapest Cloud Storage you can get, but with its access SLA, it is really only for data that's accessed maybe once per year, if ever. So Coldline is a really good place to dump log files, compliance records, noisy metrics, debug files, redundant backups, and other data that might only be relevant in a really unusual kind of disaster scenario.
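The access-frequency rule of thumb above can be sketched as a small helper. This is only an illustration: the function name `suggest_storage_class` and the exact cutoffs (about once a month, about once a year) are assumptions drawn from the lesson's wording, not official thresholds or part of any Google API.

```python
def suggest_storage_class(accesses_per_year: float) -> str:
    """Map an expected access frequency to one of the three tiers named above.

    The cutoffs are assumptions taken from the lesson's rule of thumb:
    roughly once a year or less -> Coldline, up to roughly once a month
    -> Nearline, anything more frequent -> regular Cloud Storage.
    """
    if accesses_per_year < 0:
        raise ValueError("accesses_per_year must be non-negative")
    if accesses_per_year <= 1:
        return "Cloud Storage Coldline"
    if accesses_per_year <= 12:
        return "Cloud Storage Nearline"
    return "Cloud Storage"


if __name__ == "__main__":
    # Hypothetical workloads, used only to exercise the helper.
    examples = [
        ("daily web assets", 365),
        ("monthly report", 12),
        ("compliance archive", 0.5),
    ]
    for label, frequency in examples:
        print(f"{label}: {suggest_storage_class(frequency)}")
```

In practice you would also weigh retrieval fees and the access SLA for each tier, so treat this as a starting point rather than a pricing decision.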

So those are the basic raw storage methods for VM instances and for other GCP services. You could use Filestore or persistent disk if you need to attach storage directly to a server, and now you should know a little more about Cloud Storage as a solution for managed storage buckets. For most applications, though, we're going to need some structure to our data. We need more than just a bucket. We want to run SQL-like queries, we want an SQL-like interface or some sort of query language, we want a proper database solution. And for that, GCP offers five distinct products again.

So, let's start with the more popular offering, and that would be Cloud SQL, which is analogous to Amazon RDS. Again, sorry for all the Amazon comparisons, but it's the easiest way to explain it if you're familiar with AWS. With Cloud SQL, you create database instances, similar to VM instances, in your account in specific regions. There are actually three different flavors of Cloud SQL: a MySQL one, a PostgreSQL one, and a SQL Server one. So it just depends on the type of SQL engine you want to use, based on your preferred SQL implementation. All versions, though, include the same basic features: automatic encryption of data at rest, seamless scalability for adding read replicas, and multi-region support. You can add more CPU and more memory for tougher workloads; all of that is supported.

Now, Cloud SQL is not the only relational database offering for GCP. There's also something called Cloud Spanner, and Cloud Spanner is actually a very interesting product. It's a very interesting SQL solution in that it tries to give you all the benefits of a traditional relational database without any of the trade-offs demanded by massive scale. If you've ever studied the CAP theorem, you might know the old saying that there's consistency, availability, and partition tolerance, and you can only get two of those things. You have to pick two. So if you want consistency and availability, you have to sacrifice partition tolerance. If you want partition tolerance and availability, you have to sacrifice consistency, and so on and so forth. Cloud Spanner is a product that tries to break this rule and give you all three, and it does this by throwing hardware at the problem. So the trade-off is price.

So let me be a little more specific. With Cloud Spanner, basically you get high availability, strong consistency, and the ability to scale horizontally. This last point is accomplished by adding nodes. Nodes are compute resources that can be configured as part of your Cloud Spanner setup. They add CPU and RAM resources to your Spanner database, and they allow you to ensure strong performance even across many geographic regions. So with Cloud Spanner, you can have a massive amount of traffic, literally thousands of writes and reads per second, with strong consistency guarantees. You get classic relational ACID guarantees and guaranteed availability across the globe, with guaranteed performance, using an SQL-like interface.

So basically, you have your cake and eat it too; you have everything. Now, the trade-off, as we said, is price. Provisioning the necessary number of nodes and replicas will not be cheap. Check the GCP pricing tool here for more details. Here's a quick example just for reference. As you can see, for a small workload, a single node in a single region, the Spanner deployment is actually an order of magnitude more expensive than Cloud SQL. In general, Cloud SQL is going to be much cheaper and is the more standard approach for a managed SQL solution in GCP. Cloud Spanner is really for more unique, very unusual use cases. So be sure to look into the documentation there.

Now, GCP also offers two distinct NoSQL databases: one is called Bigtable and one is called Firestore. The former, Bigtable, is a wide-column database, and the latter, Firestore, is a document database. So if you have worked with something like Cassandra, MongoDB, or CouchDB, and you're familiar with NoSQL systems, you might know a little bit about the use cases and the problems they're meant to solve. Column databases tend to be great for more write-heavy workloads, such as scan data or time series data. Document databases are often used for JSON or unstructured text. So again, these are for particular use cases.
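To make the wide-column versus document distinction a bit more concrete, here is a rough sketch of how the same time series data might be shaped for each model. It uses plain Python dictionaries only, not the Bigtable or Firestore client libraries. The sample readings, the `sensor#timestamp` row key format, and the `metrics:temp` column name are all assumptions made up for illustration.

```python
import json

# Hypothetical sensor readings, used only for illustration.
READINGS = [
    {"sensor": "s2", "ts": 1700000000, "temp": 19.0},
    {"sensor": "s1", "ts": 1700000060, "temp": 21.7},
    {"sensor": "s1", "ts": 1700000000, "temp": 21.5},
]


def to_wide_column_rows(readings):
    """Wide-column style: one small row per reading.

    Each row is keyed by 'sensor#timestamp', so a write touches only its
    own row and one sensor's readings sort next to each other.
    """
    rows = {}
    for reading in readings:
        row_key = f"{reading['sensor']}#{reading['ts']}"
        rows[row_key] = {"metrics:temp": reading["temp"]}
    # Sort by row key to mimic an ordered key space.
    return dict(sorted(rows.items()))


def to_documents(readings):
    """Document style: one nested JSON-like document per sensor."""
    docs = {}
    for reading in readings:
        doc = docs.setdefault(
            reading["sensor"], {"sensor": reading["sensor"], "readings": []}
        )
        doc["readings"].append({"ts": reading["ts"], "temp": reading["temp"]})
    return docs


if __name__ == "__main__":
    print(json.dumps(to_wide_column_rows(READINGS), indent=2))
    print(json.dumps(to_documents(READINGS), indent=2))
```

The wide-column layout favors many small independent writes and range scans over a key prefix, while the document layout keeps everything about one entity together in a single nested record.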

And then finally, we have Memorystore and Firebase. Memorystore is simply a managed Redis service. This is for storing data in memory on customizable nodes. It's generally used as a caching layer to speed up read performance, kind of like ElastiCache in AWS. However, it can also be used as a datastore in its own right, if you want to have all of your data just in memory. The Firebase database is a real-time syncing database that stores data as JSON documents. It's often used in mobile applications that need to keep client data in sync. Google acquired Firebase, as you may know, several years ago and tightly integrated the Firebase database product with GCP. This is distinct from Firestore, so be sure to look at the documentation there.

If you're familiar with Firebase, this is the GCP version of it and it's definitely a great option if you're already using Firebase.
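Going back to Memorystore for a moment, the caching-layer use case mentioned above usually follows a cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result. Below is a minimal sketch of that pattern. The `InMemoryCache` class is a dict-backed stand-in written for this example, not the Memorystore or Redis API, and `slow_database_lookup` is a hypothetical placeholder for a real query.

```python
import time


class InMemoryCache:
    """Dict-backed stand-in for a Redis-style cache (not a real client)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def slow_database_lookup(user_id, call_log):
    """Hypothetical database call; records each call so misses are visible."""
    call_log.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id, cache, call_log, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = slow_database_lookup(user_id, call_log)
    cache.set(key, user, ttl_seconds)
    return user


if __name__ == "__main__":
    cache, calls = InMemoryCache(), []
    get_user(1, cache, calls)
    get_user(1, cache, calls)  # served from the cache
    print(f"database calls: {calls}")
```

With a real managed Redis instance, the same read path would apply; only the cache object would be swapped for an actual client.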

So, okay, that's it. Well that was a lot and thankfully we're done, we've covered all of the main GCP data storage options. You should have a good understanding of each but maybe not enough to make strong recommendations for a specific use case. So, in our next lesson, we'll cover that exact issue. We will talk very specifically about which storage technology you should choose. We'll talk about them generally, and we'll relate them to specific GCP use cases. So when you're ready, we'll see you there.

About the Author

Jonathan Bethune is a senior technical consultant working with several companies including Toptal, BCG, and Instaclustr. He is an experienced DevOps specialist, data engineer, and software developer. Jonathan has spent years mastering the art of system automation with a variety of different cloud providers and tools. Before he became an engineer, Jonathan was a musician and teacher in New York City. Jonathan is based in Tokyo where he continues to work in technology and write for various publications in his free time.