Choosing the Right Storage Option
1h 34m

Google Cloud Platform has become one of the premier cloud providers on the market. It offers a rich catalog of services and massive global hardware scale comparable to AWS, as well as a number of Google-specific features and integrations. Getting started with GCP can seem daunting given its complexity. This course is designed to demystify the system and help both novices and experienced engineers get started.

This Course covers a range of topics with the goal of helping students pass the Google Associate Cloud Engineer certification exam. This section focuses on identifying relevant GCP services for specific use cases. The three areas of concern are compute, storage, and networking. Students will be introduced to GCP solutions relevant to those three critical components of cloud infrastructure. The Course also includes three short practical demonstrations to help you get hands-on with GCP, both in the web console and using the command line.

By the end of this Course, you should know all of GCP’s main offerings, and you should know how to pick the right product for a given problem.

Learning Objectives

  • Learn how to use Google Cloud compute, storage, and network services and determine which products are suitable for specific use cases

Intended Audience

  • People looking to build applications on Google Cloud Platform
  • People interested in obtaining the Google Associate Cloud Engineer certification


Prerequisites

To get the most out of this course, you should have a general knowledge of IT architectures.


Now, choosing the right data storage technology is absolutely critical for developing a performant, scalable cloud application. Many great app ideas and startups have struggled due to bottlenecks and data management issues caused by using the wrong database.

So our focus here is databases, not pure storage per se. We're not going to worry about GCP Persistent Disk or Filestore. Google Cloud Storage can actually be used as an application data layer or database if you want, but it's generally not as good as a proper database that structures data in a predictable way. So we're going to focus on GCP's dedicated managed database offerings.

So, when choosing a storage technology, there are really four critical issues to keep in mind. One: what is the data model? Two: what are my access patterns? Three: what is the expected amount of data now and into the foreseeable future? And then four: are there any external constraints around cost, compliance, data location, etc.?
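One way to keep these four questions in front of you is to write the answers down in a small record before comparing products. The sketch below is only an illustration: the class name `StorageRequirements`, its field names, and the example values are assumptions made here, not anything defined by GCP or by the course.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class StorageRequirements:
    """Answers to the four questions; None means 'not answered yet'."""
    data_model: Optional[str] = None          # e.g. "relational", "document"
    access_pattern: Optional[str] = None      # e.g. "read-heavy", "balanced"
    expected_size_gb: Optional[float] = None  # planned size, not just today's
    constraints: Optional[List[str]] = None   # cost, compliance, data location

    def unanswered(self) -> List[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical draft: the data model is known and there are no external
# constraints (an empty list counts as an answer), but two questions are open.
draft = StorageRequirements(data_model="relational", constraints=[])
print(draft.unanswered())  # ['access_pattern', 'expected_size_gb']
```

Treating `None` as "not answered" and an empty list as "considered, nothing applies" keeps the fourth question from being skipped by accident.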

So, data model. That's our first consideration because it's the one most likely to cause suffering later on if we get it wrong. One example: if you opt for a technology like Cassandra, a wide-column database, and you need to support a variety of complex, arbitrary read queries on your data, you're gonna have a bad time. It just isn't meant for that. So deciding between relational and NoSQL is a good place to start. If you're dealing with time-series data, scan data, JSON documents, or binary blobs, there are a number of technologies particularly suited to those use cases. You might want to look at Cassandra or MongoDB, and Google Cloud has its own offerings in that space. Bigtable, for example.

Now, consider access patterns, and here we have to weigh several different factors. Do we expect to mostly write a lot of data without doing many reads, the inverse of that, or something more balanced? Do we have a strict SLA that requires very fast data retrieval? Do we need to support access from many different regions? Do we need some sort of caching strategy, either through hardware and database options or an additional technology like Redis?
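The read/write question can be made concrete by comparing measured or estimated request rates. The following is a minimal sketch: the function name `classify_workload` and the 3-to-1 `skew` threshold are assumptions chosen for illustration, not a rule from GCP or from the lecture.

```python
def classify_workload(reads_per_sec: float, writes_per_sec: float,
                      skew: float = 3.0) -> str:
    """Label a workload from its read and write rates.

    `skew` is an assumed threshold: one side must be at least `skew`
    times the other before the workload counts as read- or write-heavy.
    """
    if reads_per_sec < 0 or writes_per_sec < 0:
        raise ValueError("rates must be non-negative")
    if reads_per_sec == 0 and writes_per_sec == 0:
        return "idle"
    if reads_per_sec >= skew * writes_per_sec:
        return "read-heavy"
    if writes_per_sec >= skew * reads_per_sec:
        return "write-heavy"
    return "balanced"


print(classify_workload(900, 100))  # read-heavy
print(classify_workload(100, 900))  # write-heavy
print(classify_workload(500, 400))  # balanced
```

A read-heavy result is one hint that a caching layer may be worth evaluating; latency targets and regional access still have to be considered separately.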

So this question of access patterns is very, very important. The third consideration is data volume and growth. Here, we need to consider the expected data size both at inception and over time. This can be a very hard thing to predict, so it's good practice to overestimate. For example, if you're starting with, say, 10 GB of production data and expect it to grow to 100 GB in a year, then you want to plan for at least one terabyte of data. This ensures that you're ready for unexpected usage spikes or application behavior, such as rapidly growing log files. We need to account for data volume and growth both for budgeting purposes and for operational reasons. We also need to know how difficult it is to increase storage capacity. With some solutions, it's easier to increase storage than with others.
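The arithmetic behind that example can be sketched in a few lines. The 10x headroom factor is taken from the lecture's own numbers (100 GB expected, 1 TB planned); the function names and the assumption of steady compound growth are illustrative only.

```python
def monthly_growth_rate(start_gb: float, end_gb: float, months: int) -> float:
    """Compound monthly growth rate implied by growing start_gb to end_gb."""
    return (end_gb / start_gb) ** (1 / months) - 1


def planned_capacity_gb(projected_gb: float, headroom: float = 10.0) -> float:
    """Capacity to plan for: the projection times a safety factor.

    headroom=10.0 mirrors the lecture's example (100 GB -> 1 TB); pick a
    factor that suits your own risk tolerance and budget.
    """
    return projected_gb * headroom


# Lecture example: 10 GB today, 100 GB expected after 12 months.
rate = monthly_growth_rate(10, 100, 12)
print(f"implied growth: {rate:.1%} per month")           # about 21% per month
print(f"plan for: {planned_capacity_gb(100):.0f} GB")    # 1000 GB, i.e. 1 TB
```

Tenfold growth in a year works out to roughly 21% compound growth per month, which is a useful sanity check against what your monitoring actually shows.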

And then finally, we need to think about the non-technical factors: regulatory issues, legal requirements around data, and, of course, budget. In some countries, sensitive industries that deal with personal data may be required to encrypt data and store it within a specific geographic region. These issues need to be accounted for upfront because, if they're ignored, they can create huge operational headaches and even get a company into legal trouble.
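Constraints like these can be checked against a planned deployment before anything is provisioned. The sketch below is hypothetical throughout: the `deployment` dictionary layout, the function name, and the example policy are invented for illustration, and it does not call any GCP API. The region names are real GCP regions used purely as sample values.

```python
from typing import Dict, List, Set


def compliance_violations(deployment: Dict, allowed_regions: Set[str],
                          require_encryption: bool = True) -> List[str]:
    """Return a list of human-readable policy violations (empty if none)."""
    problems = []
    for region in deployment.get("regions", []):
        if region not in allowed_regions:
            problems.append(f"region not allowed: {region}")
    if require_encryption and not deployment.get("encrypted", False):
        problems.append("encryption at rest is required")
    return problems


# Hypothetical policy: data must stay in two European regions.
allowed = {"europe-west1", "europe-west3"}
plan = {"regions": ["europe-west3", "us-central1"], "encrypted": True}
print(compliance_violations(plan, allowed))
# ['region not allowed: us-central1']
```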

So now, how do we take these four considerations and then apply them to the GCP storage technologies we've introduced?

So let's start with data model. Cloud SQL is a great option if you need a robust, managed, relational database service. You can, of course, just set up your own database instances on a Compute Engine VM, but then you don't get the same strong uptime guarantees, built-in security, backups, and upgradability. It's a trade-off: more fine-grained control in exchange for more operational overhead. And then of course, Cloud Spanner is another great SQL option if you need global scalability and can tolerate a little bit more latency and cost.

Now, if you're in the NoSQL world, then you have a few things to consider in GCP. You have Memorystore, which is a Redis service for low-latency, very fast in-memory storage. Depending on your use case, it may be your main storage engine or it could be a caching layer in your architecture. And then you have Cloud Firestore as your document DB option and Bigtable, which is your wide-column storage offering. So, consider your own app's data model and determine which of these makes sense as a good fit.

And then we have to factor in access patterns. We should be able to narrow down our range of choices by considering data model first and then thinking about whether our workload is going to be read-heavy, write-heavy, balanced, or something else. We need to consider what sort of queries we need to support, what latency we can tolerate, and whether or not we need to support multiple geographic regions. For example, there's the case of Cloud SQL versus Cloud Spanner. The latter is the clear choice if we need global access, and the former is the clear choice if we want the lowest possible latency, particularly at the regional level.
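The narrowing-down described so far, data model first and then regional versus global access, can be written as a small lookup. This is a deliberately simplified sketch of the lecture's rules of thumb: the function name and the data-model labels are assumptions, and a real decision would also weigh latency, cost, and query needs.

```python
def shortlist_service(data_model: str, needs_global: bool = False) -> str:
    """Map a data model to the GCP database the lecture suggests first.

    Simplified rule of thumb only; the labels used as keys are
    illustrative, not official GCP terminology.
    """
    if data_model == "relational":
        # Global scale points to Cloud Spanner; regional low latency to Cloud SQL.
        return "Cloud Spanner" if needs_global else "Cloud SQL"
    nosql = {
        "in-memory": "Memorystore",
        "document": "Cloud Firestore",
        "wide-column": "Bigtable",
    }
    if data_model not in nosql:
        raise ValueError(f"unknown data model: {data_model!r}")
    return nosql[data_model]


print(shortlist_service("relational"))                     # Cloud SQL
print(shortlist_service("relational", needs_global=True))  # Cloud Spanner
print(shortlist_service("wide-column"))                    # Bigtable
```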

And then, the third consideration: data volume. For GCP products, we might think about an in-memory solution like Memorystore. This might be super fast, but it can be expensive and painful to scale if our data grows by several orders of magnitude. The best way to make this decision is to use the GCP pricing calculator and get estimates for different storage technologies at very large amounts of data. This should help future-proof your choice from a cost perspective. You also need to consider the difficulty of increasing storage. Cloud Spanner and Bigtable, in particular, are very easy to grow if necessary.
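Once you have estimates from the pricing calculator, it helps to lay them out side by side at several data sizes. The sketch below shows one way to do that. The per-GB prices in it are made-up placeholders, not real GCP prices, and the option names are hypothetical. Real bills also include compute, nodes, and network charges, which this ignores.

```python
from typing import Dict, List


def project_costs(price_per_gb_month: Dict[str, float],
                  sizes_gb: List[int]) -> Dict[str, Dict[int, float]]:
    """Monthly storage cost for each option at each data size."""
    return {
        option: {size: price * size for size in sizes_gb}
        for option, price in price_per_gb_month.items()
    }


# PLACEHOLDER prices for illustration only; substitute your own estimates.
prices = {"in_memory_option": 2.0, "disk_backed_option": 0.25}
projection = project_costs(prices, [100, 1_000, 10_000])

for option, by_size in projection.items():
    for size, cost in by_size.items():
        print(f"{option:>20} {size:>6} GB  {cost:>10.2f} per month")
```

Looking at the largest size first makes it obvious which options stay affordable if the data grows by orders of magnitude.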

So this last consideration is more nuanced: compliance, regulatory issues, and budgeting. These non-technical issues can cause major disruptions if you don't account for them. The nice thing about GCP storage solutions, in general, is that they're very flexible. For example, all of these solutions support multi-region in different ways. All of them support access control, encryption, detailed logging, and auditing. If you're in a sensitive industry like healthcare or finance, it's still a good idea to do a more thorough deep dive on the documentation for whichever solution seems like the best fit.

So that concludes the second and final lesson in our section on GCP storage. You should now have a solid understanding of the major offerings and should be able to determine which is likely the best for your use case. Congrats on making it this far. In the last part of this section, we'll do a demo where we walk through planning an implementation of a storage solution in the web console. It'll be a blast. See you there.

About the Author

Jonathan Bethune is a senior technical consultant working with several companies including TopTal, BCG, and Instaclustr. He is an experienced DevOps specialist, data engineer, and software developer. Jonathan has spent years mastering the art of system automation with a variety of different cloud providers and tools. Before he became an engineer, Jonathan was a musician and teacher in New York City. Jonathan is based in Tokyo where he continues to work in technology and write for various publications in his free time.