In this course, we will take a virtual tour of the main offerings of Google Cloud Platform Services.
- Artificial Intelligence and Machine Learning
- Security and Operations
- Anyone who wants to learn about the main services available on Google Cloud Platform
- Basic understanding of computers, servers, and data centers
- Basic understanding of cloud principles
In this lesson, I am going to cover the main “Storage” services available on Google Cloud Platform.
Storage services are all about storing data. Different types of data require different types of services. For example, data can be divided into two main types: structured and unstructured. Structured data is composed of clearly defined data types and patterns. Think of stuff like names, dates and credit card numbers. Structured data is easy to search and usually is kept in a relational database. Unstructured data is just the opposite. It is more freeform and has no obvious organization. So think about things like music, photos and video.
When most people think of storage, they probably think of files in a filesystem. This would be an excellent example of unstructured data. If you want to read and write files to the cloud, then you can use Google Cloud Storage. Cloud Storage works well for serving web pages and archiving log files, and you can even use it as a data lake. A data lake is basically a repository of raw data that can be used for later analysis or machine learning. Essentially, it is just a really big collection of unorganized data.
Cloud Storage is fast, secure, durable, and has almost unlimited capacity. Because there are different types of unstructured data, Cloud Storage offers several different storage classes:
- Standard storage is best for short-lived or frequently accessed data. This class costs the most, but there is no minimum for length of storage.
- Nearline storage is best when you need to store things for a longer period of time (at least 30 days) and need to access them less frequently (e.g., once per month or less). Files stored in Nearline will cost less than Standard.
- Coldline storage is even cheaper, but it has a longer storage minimum (90 days) and should only be used for files you need to access four times a year or less.
- Archive storage is the cheapest. This is the best choice when you need to keep files for a long time due to compliance. Files in Archive storage must be kept for at least 365 days, and they should be accessed less than once a year.
All four storage classes give you immediate access to your files. This is different from some other cloud providers, where the lowest cost storage can take hours to access.
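To make those rules of thumb concrete, here is a minimal sketch in Python. It is not an official Google tool: the `suggest_storage_class` function and the `StorageClass` helper are hypothetical names made up for this example, and the thresholds simply restate the minimum durations and access guidelines from the list above. Real pricing depends on more factors, such as retrieval and operation fees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    min_storage_days: int          # minimum storage duration from the lesson
    max_accesses_per_year: float   # rough access guideline from the lesson


# Ordered from most to least expensive, as described in the lesson.
# Assumption: Archive's "less than once a year" is approximated as "at most once".
CLASSES = [
    StorageClass("STANDARD", 0, float("inf")),
    StorageClass("NEARLINE", 30, 12),
    StorageClass("COLDLINE", 90, 4),
    StorageClass("ARCHIVE", 365, 1),
]


def suggest_storage_class(retention_days: int, accesses_per_year: float) -> str:
    """Return the cheapest class whose minimum duration and access guideline both fit."""
    choice = CLASSES[0]
    for cls in CLASSES:
        fits_duration = retention_days >= cls.min_storage_days
        fits_access = accesses_per_year <= cls.max_accesses_per_year
        if fits_duration and fits_access:
            choice = cls  # later entries are cheaper, so keep the last match
    return choice.name


if __name__ == "__main__":
    print(suggest_storage_class(7, 1000))    # short-lived, frequently read data
    print(suggest_storage_class(120, 2))     # quarterly reports
    print(suggest_storage_class(3650, 0.5))  # compliance archive
```

The list is deliberately ordered from most to least expensive, so the last class that satisfies both conditions is also the cheapest one that fits.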
Now while it stores files, Cloud Storage does not work exactly like a traditional file system. It is considered “object storage”. So there are no actual directories, and everything is kept inside “buckets”. You also cannot edit or update part of a file. You can only replace the whole object, or delete and recreate it. If you need file-level access with a true hierarchical structure, you can use Filestore instead. Filestore can be used to create NFS-compatible file shares that can be mounted on your virtual machines and containers. If you just want to store and share files, Cloud Storage is perfect. But if you need lower-level file access, or say you need to constantly append data to the end of a file, Filestore is a much better match.
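If the object storage model feels abstract, the following toy sketch may help. It is an in-memory stand-in written for this lesson, not the Cloud Storage client library: the `ToyBucket` class and its method names are hypothetical. It shows two ideas: "folders" are just prefixes in a flat list of object names, and appending to an object means rewriting the whole thing.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: a flat mapping of object name -> bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        # Writing always replaces the whole object; there is no partial update.
        self._objects[name] = data

    def download(self, name: str) -> bytes:
        return self._objects[name]

    def list_names(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: listing filters by prefix.
        return sorted(n for n in self._objects if n.startswith(prefix))

    def append(self, name: str, extra: bytes) -> None:
        # Appending means reading the full object and uploading a new copy.
        self.upload(name, self.download(name) + extra)


bucket = ToyBucket()
bucket.upload("logs/app.log", b"line 1\n")
bucket.upload("logs/db.log", b"started\n")
bucket.upload("images/cat.png", b"\x89PNG")
bucket.append("logs/app.log", b"line 2\n")

print(bucket.list_names("logs/"))  # only the names that start with "logs/"
```

With a small log file the rewrite is cheap, but for a large file that grows constantly it becomes wasteful, which is why a mounted file share is the better fit for that workload.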
Now these options are fine if you just want to store your data in files. But what about databases? Well, Google has several services for those too. If you want to store structured data inside organized tables, columns, and rows (much like a traditional SQL database), then Cloud SQL could be a good choice. It is fully managed by Google, so it is very easy to get started. And it supports the most common database engines: MySQL, PostgreSQL, and Microsoft SQL Server. So if you are already using one of those, you can easily migrate to Cloud SQL and then let Google manage the details for you. Cloud SQL is perfect for storing data you need to sort, transform, and search.
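Here is a small sketch of what "sort, transform, and search" looks like in practice. To keep it runnable without a cloud account, it uses Python's built-in `sqlite3` module as a local stand-in; it does not connect to Cloud SQL, and the `customers` table, its columns, and the sample rows are all made up for illustration. The same kind of SQL would apply to a MySQL, PostgreSQL, or SQL Server instance, although the exact dialect differs between engines.

```python
import sqlite3

# Local, in-memory database used purely as a stand-in for a managed SQL engine.
conn = sqlite3.connect(":memory:")

# Structured data: every row has the same clearly defined columns.
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " signup_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (name, signup_date) VALUES (?, ?)",
    [
        ("Priya", "2023-03-14"),
        ("Alex", "2022-11-02"),
        ("Chen", "2023-07-30"),
    ],
)

# Search (WHERE) and sort (ORDER BY) in a single query.
rows = conn.execute(
    "SELECT name FROM customers WHERE signup_date >= ? ORDER BY name",
    ("2023-01-01",),
).fetchall()
names = [row[0] for row in rows]
print(names)  # ['Chen', 'Priya']

conn.close()
```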
If for some reason Cloud SQL isn’t powerful enough, then Google offers something even better: Cloud Spanner. Cloud Spanner is unique because it is a relational database that’s massively scalable. You can get much higher performance than with Cloud SQL, but it comes at a higher cost. Cloud Spanner is for companies that need an extremely powerful, multi-regional database that can handle a heavy amount of I/O.
Finally, if you need to store and work with structured “Big Data” (meaning extremely large and complex data sets), then BigQuery might be your best bet. BigQuery is Google’s data warehouse service. Now, a data warehouse stores data from multiple sources, including data lakes and databases, and can be used to analyze huge, multidimensional datasets. Databases are optimized for transactions, but data warehouses are optimized for analytics. BigQuery can process petabytes of data, and it integrates with many other services. For example, Looker (a business intelligence solution) works with BigQuery to help you explore your data and share insights in real time.
Of course, Google also supports non-relational or “NoSQL” databases. NoSQL databases do not rely on the fixed tables, columns, rows, and schemas of a relational database, and they are particularly useful for storing unstructured or loosely structured data. They support flexible data models, scale horizontally, and can serve simple lookups very quickly. However, searching, sorting, and joining data are much more limited.
There are three main options here:
- Firestore, which is ideal for building client-side mobile and web applications
- Firebase Realtime Database, which is best for syncing data between users in real time, such as in collaboration apps
- Bigtable, which is best for running large analytical workloads
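To illustrate the "flexible data model" idea, here is a toy sketch that uses plain Python dictionaries. It does not use Firestore, Firebase, or Bigtable; the `users` collection and the `get_document` and `find_by_field` helpers are hypothetical names invented for this example. The point is only that documents in the same collection do not have to share fields, that lookups by key are direct, and that searching on an arbitrary field takes more work.

```python
# A toy "collection": document ID -> document. Documents need not share fields.
users = {
    "u1": {"name": "Priya", "email": "priya@example.com"},
    "u2": {"name": "Alex", "devices": ["phone", "tablet"]},
}


def get_document(collection: dict, doc_id: str):
    """Key lookup: go straight to one document by its ID."""
    return collection.get(doc_id)


def find_by_field(collection: dict, field: str, value) -> list:
    """In this toy model there is no index, so a field search scans every document."""
    return [
        doc_id
        for doc_id, doc in collection.items()
        if doc.get(field) == value
    ]


print(get_document(users, "u2")["devices"])    # ['phone', 'tablet']
print(find_by_field(users, "name", "Priya"))   # ['u1']
```

Real NoSQL services add indexes and query features on top of this basic shape, so treat the scan in `find_by_field` as a property of the toy model rather than of any specific product.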
Time to summarize the main storage options:
If you need to store files for sharing, then Cloud Storage will probably be your best option.
If you need to edit and update files, then you might want to look at using Filestore.
For SQL databases you have three main choices:
- Cloud SQL is great for when you need a standard MySQL, PostgreSQL, or Microsoft SQL Server database
- Cloud Spanner will handle larger workloads that Cloud SQL can’t
- BigQuery is great for creating a data warehouse for analytics
You also have three choices when it comes to NoSQL databases:
- Firestore is great for client-side applications (such as web and mobile)
- Firebase Realtime Database is better for syncing data between users in real time
- Bigtable is best for running very large analytical workloads
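The summary can also be written down as a simple lookup table. This is only a sketch of the decision logic from this lesson: the `suggest_service` function and the category labels (`"files"`, `"sql"`, `"nosql"`, and the need keywords) are hypothetical names chosen for the example, and the two app-focused NoSQL entries follow the descriptions given when the NoSQL options were first introduced. A real decision would also weigh cost, latency, and region requirements.

```python
# Maps (kind of data, main need) -> service named in the lesson.
# The keys are made-up labels for this sketch, not official terminology.
SERVICE_BY_NEED = {
    ("files", "share"): "Cloud Storage",
    ("files", "edit"): "Filestore",
    ("sql", "standard"): "Cloud SQL",
    ("sql", "scale"): "Cloud Spanner",
    ("sql", "analytics"): "BigQuery",
    ("nosql", "apps"): "Firestore",
    ("nosql", "realtime"): "Firebase Realtime Database",
    ("nosql", "analytics"): "Bigtable",
}


def suggest_service(data_kind: str, need: str) -> str:
    """Look up the service the lesson recommends for a kind of data and a need."""
    try:
        return SERVICE_BY_NEED[(data_kind, need)]
    except KeyError:
        raise ValueError(f"no suggestion for {data_kind!r} with need {need!r}") from None


print(suggest_service("files", "share"))    # Cloud Storage
print(suggest_service("sql", "analytics"))  # BigQuery
```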
That should cover the main storage offerings on GCP.
Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.
Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.
When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.