Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.
This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.
Learning Objectives
- Use the Cloud Operations suite to monitor, log, report on errors, trace, and debug
- Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
- Manage your data using lifecycle management and migration from outside sources
- Troubleshoot SSH errors, instance startup failures, and network traffic dropping
Intended Audience
- System administrators
- People who are preparing to take the Google Professional Cloud Architect certification exam
Prerequisites
- Google Cloud Platform: Fundamentals course or experience with Google Cloud Platform
Resources
- The GitHub repository for this course is at https://github.com/cloudacademy/managing-gcp.
Suppose you’ve been running a Cloud SQL for MySQL instance for a while, and it’s starting to run out of storage space. There are a few ways you could deal with this. If the database contains lots of old data that you can delete to free up space, that might be a good option. If the extra usage is caused by a temporary spike, and the data will be deleted soon anyway, then you might want to leave the storage capacity at its current level. You have to be careful, though, because if the instance runs out of space, it could go offline.
But you also have to be careful about increasing the instance’s storage capacity because once you increase it, the change is permanent, and you can never decrease it. So you could be paying extra money every month for space that you don’t need. In most cases, though, you’ll want to increase the storage space to avoid problems.
The first way to increase the capacity is very simple. You can just edit the instance’s configuration and change the storage capacity setting. However, if you think you’ll need to increase the storage space again in the future, then you might want to consider enabling the “Automatic storage increase” setting. Note that this option is enabled by default now, so you’d only have to turn it on if you disabled it when you created the instance.
You should also consider setting the “Automatic storage increase limit.” Since you can never decrease the storage capacity of an instance, you may want to prevent a runaway increase by setting a limit. By default, it’s set to 0, which means there’s no limit other than the maximum that’s available for the machine type of the instance.
While we’re on the subject of defaults, there’s another setting you may want to consider changing. Every few months, Google applies updates to each Cloud SQL instance. This requires a reboot that typically takes only a few minutes, but you’ll probably want to set a maintenance window for when Google will do the update. You can specify a 1-hour window on any day of the week. If you don’t set a window, then by default, Google can perform the update at any time.
Amazingly, you can change almost all of the configuration settings for a Cloud SQL instance after you’ve created it, even the machine type and the zone! The instance will go offline for a few minutes when you change the machine type or the zone, though.
The only settings you can’t change after you’ve created an instance are the instance ID, the region, and the storage type (that is, either SSD or HDD). Also, once you’ve configured a private IP address for an instance, you can’t remove it.
Another issue you might run into is called replication lag. To achieve high availability for a Cloud SQL instance, you need to create a failover replica. With this configuration, every write operation on the primary database is also made on the replica. Normally, there’s only a slight delay before the update is performed on the replica, but there are times when the replica can fall behind, and there’s a significant lag before updates are made.
Although the replica will not lose any of the updates, a significant lag can cause a failover to take longer if the primary fails. If the replication lag becomes too high, then Google’s SLA will no longer be valid, so it’s important to address this issue.
To make sure you’ll know when replication lag occurs, you can set up an alert. You just need to configure Cloud Monitoring to monitor the replica_lag and network_lag metrics.
When replication lag is caused by a temporary spike in database updates, the replica will eventually catch up, and you don’t need to take any action. If the lag is continually high, then you could try optimizing queries. If that doesn’t solve the problem, then you may need to shard your database into multiple instances to spread out the load.
And that’s it for Cloud SQL configuration.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).