Once you have implemented your application infrastructure on Google Cloud Platform, you will need to maintain it. Although you can set up Google Cloud to automate many operations tasks, you will still need to monitor, test, manage, and troubleshoot it over time to make sure your systems are running properly.
This course will walk you through these maintenance tasks and give you hands-on demonstrations of how to perform them. You can follow along with your own GCP account to try these examples yourself.
Learning Objectives
- Use Stackdriver to monitor, log, report on errors, trace, and debug
- Ensure your infrastructure can handle higher loads, failures, and cyber-attacks by performing load, resilience, and penetration tests
- Manage your data using lifecycle management and migration from outside sources
- Troubleshoot SSH errors, instance startup failures, and dropped network traffic
Intended Audience
- System administrators
- People who are preparing to take the Google Professional Cloud Architect certification exam
Prerequisites
- Google Cloud Platform: Fundamentals course or experience with Google Cloud Platform
After you've implemented your infrastructure in Google Cloud Platform, the first thing you'll want to do is set up a monitoring system that alerts you when there are major problems. The easiest way to do this is to use Stackdriver, Google's powerful monitoring, logging, and debugging tool.
Before you can use Stackdriver, you need to create an account. First, make sure that the project you want to monitor is selected. Then select Monitoring from the console menu. This opens a separate page telling you that you don't have an account yet. Choose the option to create a Stackdriver account. The screens might look a bit different when you do this, but either way, the first two steps are to create a Stackdriver account and associate it with a project, which is what I'm doing here.
One of the great things about Stackdriver is that it works with AWS too, so if you use AWS then you can follow these steps to add an AWS account to monitor.
Here are the instructions for installing the Stackdriver monitoring agent and the logging agent. You don't need to install the agents to use Stackdriver, but if you do, you can get more information about an instance, such as metrics from the third-party software running on it. I'll skip that for now and click Continue.
This is where you can tell Stackdriver to send you performance reports. I'll select weekly reports.
Now click Launch Monitoring. Then just click the Upgrade button. That's it.
Suppose you want Stackdriver to monitor a web server and let you know if it goes down.
Go to Alerting and select Uptime Checks. Click the Add Uptime Check button.
For the title, let's call it Example. Since we want to check if a web server is up, leave the check type as HTTP and the resource type as URL.
For the hostname, I'm going to put in the IP address of an instance I have that's running a web server. Now change Check Every to one minute so it's easier to test.
Now click the Test button. Since the web server at that address is up, the check succeeds right away.
Now I'm going to stop the web server and test it again. This time the connection failed as expected. Click the Save button.
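Conceptually, an HTTP uptime check requests a URL on a schedule and treats a connection failure, timeout, or error status as "down". The Python sketch below illustrates that idea using only the standard library. It is not how Stackdriver implements its checks, and the helper names `is_up` and `run_checks` are hypothetical.

```python
import time
import urllib.request


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx), refused connections,
        # and timeouts are all OSError subclasses, so each counts as "down".
        return False


def run_checks(url: str, interval_seconds: float = 60.0, rounds: int = 3) -> None:
    """Check url a fixed number of times, printing UP or DOWN for each check."""
    for i in range(rounds):
        status = "UP" if is_up(url) else "DOWN"
        print(f"{time.strftime('%H:%M:%S')} {url} {status}")
        if i < rounds - 1:
            time.sleep(interval_seconds)
```

For example, `run_checks("http://203.0.113.10/", interval_seconds=60)` would poll a placeholder address from the 203.0.113.0/24 documentation range once a minute, similar to the one-minute Check Every setting used above. Stopping the web server would then turn the output from UP to DOWN.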
Finally, we'll create an alerting policy. You can be alerted by email, text message, or a variety of other channels, such as Slack. Click Create Alerting Policy. We'll have it send an email when the web server is down, so enter your email address under Notifications.
This is an interesting option. You can write some instructions that will be sent with the notification, so that if someone else is responsible for resolving infrastructure issues then they'll know what to do. I'll call this policy Example. Now click Save Policy.
It takes a while after you create an uptime check before Stackdriver runs it for the first time, so don't worry if you don't see anything on the dashboard right away. I'll skip ahead to when the uptime check has run. Okay, now you can see that it shows the web server is down. After a little while, Stackdriver sends a notification email. Here's what it looks like.
Now we'll start the web server (Apache) again and see whether Stackdriver notices. I'll skip ahead a few minutes. Yes, it now sees that the web server is up.
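The alerting behavior in this demo (an incident opens when the check fails, a notification carrying your instructions goes out, and the incident resolves once the check passes again) can be modeled as a small state machine. This is a hypothetical, simplified Python model, not Stackdriver's API: the `AlertPolicy` class, its fields, and the `failures_to_open` threshold are illustrative assumptions, and real policies can also take into account how long a failure lasts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AlertPolicy:
    """Hypothetical, simplified model of an uptime-check alerting policy."""

    name: str
    documentation: str               # instructions sent with each alert
    notify: Callable[[str], None]    # stand-in for email, SMS, Slack, ...
    failures_to_open: int = 1        # consecutive failed checks before alerting
    incident_open: bool = field(default=False, init=False)
    _failures: int = field(default=0, init=False, repr=False)

    def record(self, check_passed: bool) -> None:
        """Feed in one uptime-check result and notify on state changes."""
        if check_passed:
            self._failures = 0
            if self.incident_open:
                # The endpoint recovered: close the incident and say so.
                self.incident_open = False
                self.notify(f"[{self.name}] RESOLVED: uptime check is passing again.")
            return

        self._failures += 1
        if not self.incident_open and self._failures >= self.failures_to_open:
            # Enough consecutive failures: open an incident and include the
            # policy's documentation so the responder knows what to do.
            self.incident_open = True
            self.notify(f"[{self.name}] OPEN: uptime check failing.\n{self.documentation}")
```

For example, with `failures_to_open=2` and `notify=sent.append` (collecting messages in a list instead of emailing them), the results pass, fail, fail, fail, pass produce exactly two notifications: an OPEN message after the second consecutive failure and a RESOLVED message when the check passes again.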
By the way, if you want to use Stackdriver to monitor Amazon EC2, then you'll also need to allow HTTP connections and add credentials to your instance.
If you want to see data graphically, then click on Dashboards and then Create Dashboard. You could just leave the name as Untitled Dashboard, but I'll call it Example Dashboard.
Now click the Add Chart button. Change the resource type to URL Uptime Check. When you click on metric type it will give you only one option, Endpoint Latency, so select that. Click Save and the graph will be added to your dashboard.
This graph shows the network latency between each region and the web server. Unfortunately, it doesn't show when the web server was down, but that isn't the purpose of this graph. The latency data from six different locations around the world can still be quite helpful, especially if some of your users report slow performance.
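If users in one part of the world report slow performance, comparing latency by location is the natural next step. The sketch below summarizes latency samples per location using the median, slowest first. The location names and millisecond values are made-up illustrations, not data from this demo, and `latency_summary` is a hypothetical helper.

```python
from statistics import median
from typing import Dict, List, Tuple


def latency_summary(samples: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (location, median latency in ms) pairs, slowest location first."""
    medians = [(loc, median(values)) for loc, values in samples.items() if values]
    return sorted(medians, key=lambda pair: pair[1], reverse=True)


# Hypothetical latency samples in milliseconds, keyed by checker location.
EXAMPLE_SAMPLES = {
    "usa-iowa": [22.0, 25.0, 23.5],
    "europe": [110.0, 130.0, 118.0],
    "asia-pacific": [190.0, 205.0, 199.0],
}

print(latency_summary(EXAMPLE_SAMPLES))
# → [('asia-pacific', 199.0), ('europe', 118.0), ('usa-iowa', 23.5)]
```

The median is used rather than the mean so that a single latency spike doesn't make a location look consistently slow.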
Note that you'll need to refresh this page to see the latest data.
Suppose you'd like to get more information about the instance where the web server is running, such as the CPU load. To get this information, you need to install the Stackdriver monitoring agent on that instance.
Use the curl command to download the install script for the Stackdriver monitoring agent. Then run the install script.
While we're here, let's install the logging agent too. That will prepare this instance for the next lesson, when we use Stackdriver Logging. First, download the install script. Then run it.
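Both agent installs follow the same download-then-run pattern. To keep this document's examples in one language, the sketch below wraps those steps in Python. The script URLs are assumed to be the ones Google published for the legacy Stackdriver agents, which have since been superseded by the Ops Agent, so check them against the current documentation before use. `install_commands` and `install` are hypothetical helpers, and `install` only prints the commands unless you pass `dry_run=False`.

```python
import shlex
import subprocess
from typing import List

# Assumed URLs for the legacy Stackdriver agent install scripts.
AGENT_SCRIPTS = {
    "monitoring": "https://dl.google.com/cloudagents/install-monitoring-agent.sh",
    "logging": "https://dl.google.com/cloudagents/install-logging-agent.sh",
}


def install_commands(agent: str) -> List[List[str]]:
    """Return the download and run commands for one agent's install script."""
    url = AGENT_SCRIPTS[agent]
    script = url.rsplit("/", 1)[1]
    return [
        ["curl", "-sSO", url],      # download the script into the current directory
        ["sudo", "bash", script],   # run it with root privileges
    ]


def install(agent: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is True, execute it in order."""
    for cmd in install_commands(agent):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step
```

On the VM, calling `install("monitoring", dry_run=False)` and then `install("logging", dry_run=False)` would run `curl -sSO ...` followed by `sudo bash ...` for each agent, mirroring the manual steps above.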
Now let's go back and create another chart. This time, when we click on metric type, there are a ton of metrics to choose from, including CPU, memory, disk, network, and more. Let's choose CPU Load Average past one minute. Now you can see what the VM's load average was over time.
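The CPU Load Average past one minute metric corresponds to the standard Unix 1-minute load average, the first of the three numbers that `uptime` prints. As a rough illustration of what the agent samples, the Python sketch below reads that value with `os.getloadavg()` at a fixed interval. It works only on Unix-like systems, and `sample_load_average` is a hypothetical helper, not part of the agent.

```python
import os
import time
from typing import List, Tuple


def sample_load_average(samples: int = 3, interval_seconds: float = 1.0) -> List[Tuple[float, float]]:
    """Collect (Unix timestamp, 1-minute load average) pairs."""
    readings = []
    for i in range(samples):
        one_minute, _five_minute, _fifteen_minute = os.getloadavg()
        readings.append((time.time(), one_minute))
        if i < samples - 1:
            time.sleep(interval_seconds)
    return readings
```

When reading the chart, compare the load average with the VM's vCPU count (`os.cpu_count()` on the instance): a value that stays above the number of vCPUs suggests the CPU is saturated.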
If you've been following along using your own Stackdriver account, you should go back and delete the monitoring resources you set up. First, go to Policies Overview in the Alerting menu and delete the alerting policy you created for the uptime check. Then go to Uptime Checks and delete the uptime check you created. Finally, go to Dashboards and delete the dashboard you created.
That's it for this lesson.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).