Deploying Applications and Services on Compute Engine


NOTICE: This course is outdated and has been deprecated


Modern software systems have become increasingly complex. Cloud platforms have helped tame some of the complexity by providing both managed and unmanaged services. So it’s no surprise that companies have shifted workloads to cloud platforms. As cloud platforms continue to grow, knowing when and how to use these services is important for developers. 

This course is intended to help prepare individuals seeking to pass the Google Cloud Professional Cloud Developer Certification Exam. The Cloud Developer Certification requires a working knowledge of building cloud-native systems on GCP. That covers a wide variety of topics, from designing distributed systems to debugging apps with Stackdriver. 

This course focuses on the third section of the exam overview, more specifically the first five points, which cover deploying applications using GCP compute services.

Learning Objectives

  • Implement appropriate deployment strategies based on the target compute environment
  • Deploy applications and services on Compute Engine and Google Kubernetes Engine
  • Deploy an application to App Engine
  • Deploy a Cloud Function

Intended Audience

  • IT professionals who want to become cloud-native developers
  • IT professionals preparing for Google’s Professional Cloud Developer Exam


Prerequisites

  • Software development experience
  • Docker experience
  • Kubernetes experience
  • GCP experience



Hello and welcome! In this lesson, we'll be talking about Compute Engine. Specifically, we'll be covering the following learning objectives from the exam guide: launching a Compute Engine instance using the console and SDK, moving a persistent disk to a different virtual machine, creating autoscaled managed instance groups using templates, generating and uploading custom SSH keys, configuring a virtual machine for Stackdriver monitoring and logging, creating an instance with a startup script that installs software, working with custom metadata tags, and finally, using load balancers for Compute Engine instances. Now, there's a lot here to cover, so let's dive in.

The gcloud compute instances create command will create an instance. Be familiar with it before taking the exam. If you're using Compute Engine in production, you'll probably use it for different sorts of development and test instances, so you'll probably be familiar with it already. For actual production apps, it's likely that you'll want to use a higher-level abstraction to manage instance creation, and that includes things such as managed instance groups, autoscalers, health monitors, etc. The reason is that you want to delegate instance management tasks, such as instance creation and termination, to a managed service as much as makes sense for your use case.

Compute Engine instances are created based on OS images, and these OS images can be public, private, created by someone else and shared with you, or created based off of a snapshot of an existing instance. The selected OS image is used for the boot disk. Instances can also have additional non-boot disks attached, which can be based on existing images or blank. These additional persistent disks are available in different disk types. There are both zonal and regional hard disks and solid-state drives, and there are also local SSD options that are physically attached to the server running the virtual machine instance. Disks can be added and removed as needed, they can be moved between instances, and they can also be shared in read-only mode with multiple instances.

The gcloud compute instances create command allows disks to be created and attached inline using the --create-disk flag. If an instance is already created, then a disk can be created and added with two separate operations: the gcloud compute disks create command, with arguments supplied for size, type, and block size, will create a new persistent disk, and then the gcloud compute instances attach-disk command will attach the disk to an existing instance. If the disk you attach is based on an existing disk, then it's going to be formatted however it was previously formatted. If it's an empty disk, we need to format the disk ourselves, and the way that we do that differs between Windows and Linux and depends on the file system format that we want to use.
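
The two-step "create a disk, then attach it" flow described above can be sketched as a pair of small Python helpers that build the gcloud argument lists. This is only an illustration: the disk name, instance name, zone, size, and disk type are hypothetical placeholders, and the helpers only assemble the commands, they do not run them.

```python
import shlex


def create_disk_cmd(disk, zone, size_gb=10, disk_type="pd-standard"):
    """Build the argv for `gcloud compute disks create`."""
    return [
        "gcloud", "compute", "disks", "create", disk,
        f"--size={size_gb}GB",
        f"--type={disk_type}",
        f"--zone={zone}",
    ]


def attach_disk_cmd(instance, disk, zone):
    """Build the argv for `gcloud compute instances attach-disk`."""
    return [
        "gcloud", "compute", "instances", "attach-disk", instance,
        f"--disk={disk}",
        f"--zone={zone}",
    ]


if __name__ == "__main__":
    # Hypothetical names, used purely for illustration.
    steps = (
        create_disk_cmd("data-disk", "us-central1-a"),
        attach_disk_cmd("demo-vm", "data-disk", "us-central1-a"),
    )
    for cmd in steps:
        # Print the commands; pass a list to subprocess.run(cmd, check=True)
        # if you actually want to execute it.
        print(shlex.join(cmd))
```

Keeping each command as a list makes it easy to hand to subprocess.run later without worrying about shell quoting.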

All right, let's pivot to metadata. Metadata are key-value pairs that are programmatically accessible by code running inside of an instance. There are two types of metadata: project-level and instance-level. Project-level metadata is accessible to all instances in a project, while instance-level metadata is limited to the instance. The metadata server stores default system information, such as the project ID, SSH keys, instance names, startup and shutdown scripts, etc. You can set your own metadata values for a project or instance with the add-metadata subcommand or with the --metadata flag when creating an instance. Metadata is accessible from a URL that is only available from inside the instance, or from anywhere with the gcloud command.

Data such as SSH keys and startup scripts are stored as metadata. Startup and shutdown scripts allow instances to run a script at startup and shutdown, respectively, and they're just metadata with special key names that are reserved by Google. The names of the keys are different between Linux and Windows, where Windows provides more granularity over script execution. Metadata has some size limits, both for single keys and across all keys. So, when dealing with large scripts, or to make versioning easier, it's common to have the actual startup script download and execute a script stored in a remote location.

SSH also uses metadata, which allows for both project-wide and instance-level SSH keys, including the ability to override project-level keys for specific instances. The simple way to connect to a Linux instance is to allow Google to manage the SSH keys for us, and we can do that with the gcloud compute ssh command, which uses OS Login behind the scenes to map an authenticated gcloud user to an auto-generated and uploaded SSH key. For teams using gcloud, this option is low-effort because they don't have to take care of the key management. There are some use cases, though, such as needing to provide access to external users, where you may want to manage those SSH keys yourself. For this, you'll need to obtain the user's public key, put it in the required format, and then upload it as a project-level or instance-level key.
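
Two of the ideas above lend themselves to a short Python sketch: reading a value from the metadata server inside an instance, and putting a user's public key into the layout expected for metadata-based SSH keys. The metadata URL and the Metadata-Flavor header are the standard ones; the key layout shown (USERNAME:KEY_TYPE KEY_VALUE USERNAME) is my understanding of the documented format and is worth checking against the current docs. The function names and sample values are hypothetical.

```python
import urllib.request

# Only resolvable from inside a Compute Engine instance.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path):
    """Build a request for a metadata path, e.g. 'instance/attributes/my-key'."""
    return urllib.request.Request(
        f"{METADATA_ROOT}/{path.lstrip('/')}",
        # The metadata server rejects requests that lack this header.
        headers={"Metadata-Flavor": "Google"},
    )


def read_metadata(path, timeout=2.0):
    """Fetch a metadata value as text (works only from inside an instance)."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def ssh_key_entry(username, public_key):
    """Format an OpenSSH public key as 'USERNAME:KEY_TYPE KEY_VALUE USERNAME'.

    The original key comment (such as 'user@laptop') is replaced by the
    username. This layout is an assumption based on the documented format.
    """
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <key-value> [comment]'")
    key_type, key_value = parts[0], parts[1]
    return f"{username}:{key_type} {key_value} {username}"
```

Inside an instance, read_metadata("project/project-id") would return the project ID, and read_metadata("instance/attributes/my-key") would return a custom value stored under the hypothetical key my-key.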

Windows uses RDP for remote connectivity, which requires a username and password.

One of the common tasks to include in startup scripts is code that will install the Stackdriver monitoring and logging agents. These are two separate agents, and they can be installed independently or not at all. Both agents provide a level of editable configuration for use cases where the default functionality just isn't enough, and when installing on instances without internet access, an HTTP proxy can be used. The logging agent will pick up known logs; there are a lot of logs from different applications that it already knows how to grab. But in order to use Stackdriver logging in our own applications, we need to use the language-specific library, and each library is installed using the established dependency management system for the language and its runtime.

For workloads that require multiple instances to be grouped together logically, there are instance groups, and there are two types: managed and unmanaged. Unmanaged groups are unmanaged in the sense that Google doesn't manage them; we developers do. Unmanaged instance groups require us to manually add the number of instances that we want, and we have to select the specific instances. Because of that, we can't autoscale unmanaged groups: the autoscaler wouldn't know how to add instances that we have to pick manually. Managed instance groups are based on templates, which means that instances can be created and re-created dynamically as needed, and that in turn supports autoscaling and health monitoring. Templates define the parameters used when creating an instance. When autoscaling is disabled, an instance group requires the number of instances that it should run. When autoscaling is enabled, we can specify the minimum and maximum number of instances along with some additional scaling parameters. Managed instance groups can be scaled based on different metrics. They can also auto-heal unresponsive instances by removing and recreating them, with different options for determining what responsive means in our use case. With a managed instance group created, we have a group of identical instances, and oftentimes that requires a load balancer to distribute traffic between them.
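
To make the interplay of a scaling metric with the minimum and maximum instance counts concrete, here is a deliberately simplified Python sketch. It is not Google's autoscaling algorithm; it only assumes that the group should grow or shrink until average utilization is at or below a target, and then clamps the result to the configured bounds. All names and numbers are hypothetical.

```python
def recommended_size(current_size, observed_pct, target_pct, min_size, max_size):
    """Suggest an instance count for a utilization-based scaling policy.

    observed_pct: current average utilization across the group (0-100+)
    target_pct:   utilization the policy aims for (1-100)
    Simplified illustration only; a real autoscaler also smooths the
    signal and applies cool-down periods.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    # Total load is current_size * observed_pct; divide by the target and
    # round up using integer arithmetic to avoid float rounding surprises.
    needed = -(-(current_size * observed_pct) // target_pct)
    # Clamp to the configured bounds.
    return max(min_size, min(max_size, needed))
```

For example, four instances averaging 90% utilization against a 60% target need six instances, unless the maximum is lower than that.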

There are six different types of Google Cloud load balancers: some are global, some are regional; some are layer 4, some are layer 7; some perform TLS termination and some pass it off to the backend instances; some are internal, some are external. The decisions that go into knowing which one of these to use in a given use case are beyond the scope of this lesson. However, Google does have some good documentation on it that I think is worth the read, so I'll share that link.

The console showcases the different load balancing options pretty well since it walks through them, so let's use that as a reference. On the initial creation screen, it asks for the networking layer of our traffic. TCP and UDP being layer 4 means that these are not application load balancers. So if you're using a TCP load balancer for web traffic, you're going to lose the layer 7 data, such as HTTP headers. This is where HTTP load balancing does the trick as a layer 7 load balancer. After selecting an initial option, it further checks our use case. In the case of the HTTP load balancer, we're asked if this is going to be used internally to distribute traffic between our services or if it's going to be publicly accessible from the internet. TCP load balancers can also be internal or external. For external TCP traffic, we can use multi-region; however, internal traffic is regional only. UDP load balancing can also be internal or external.

Load balancers have front ends and back ends. The front end is basically the IP address and port, and that becomes the entry point for our traffic. Back ends define which instances the load balancer needs to distribute traffic to and how to determine when that group of servers is at capacity. Each load balancer has options for how to specify the instances that are used; however, all of them support instance groups. Since managed instance groups support autoscaling and instance auto-healing, load balancing managed instance groups provides access to the most functionality.

Let's do a demo of some of these learning objectives. Let's create a virtual machine. We're going to add a disk after it's created, and then move that disk to another instance. We're going to use a micro instance, and otherwise, we're going to just leave all of the defaults. If we had instance-level metadata, we could set it here. If the instance was part of a cluster, we might use preemptible instances to save cost. We can leave the default host maintenance setting here set to migrate, allowing Compute Engine to move our instance to another host if the host is undergoing maintenance, or we could allow the instance to be terminated. With this created, let's connect in using SSH. If you recall, OS Login manages SSH keys on our behalf, which we can use to connect through the browser or through the command-line interface.

This command lists off the connected disks, and if you notice here, we have just this one. This one here is our boot disk, so what we want to do now is edit this instance and add a persistent disk. This is a standard hard disk, and it's blank, which means it's unformatted and we need to format it. It's going to be in read/write mode, so we know that we cannot attach it to more than one instance at a time, and we have a deletion rule here to determine if the disk should be kept after the instance is deleted. I'm going to use 10 gigs here with the Google-managed encryption key, keep the block size at 4K, and the default device name is fine. Saving this is going to update the instance and attach the disk, allowing us to interact with it from inside of the operating system. Now, listing the devices again shows the new disk as sdb. Blank disks need to be formatted, and which file system you use depends on your use case. With this done, we can mount this device to a location of our choosing, and I've saved a shell script and a text file to the disk.

This disk was set to be kept after the instance is deleted, so after the instance is deleted the disk will remain, and creating a new instance and selecting our existing disk shows that we have our additional disk here to use with another instance. Once again, we need to mount this disk inside the OS so that we can actually use it, and we can do that as needed in the ad hoc style that we just used, or on boot. Once we've mounted the disk, we can see that our text file exists, as does the shell script.
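
The transcript doesn't show the exact commands typed in the demo, so the following Python sketch is only a plausible reconstruction for a Linux instance: list the block devices, format the blank disk as ext4, create a mount point, and mount it. The device path /dev/sdb comes from the demo; the ext4 choice, the mount point, and the function name are assumptions. The helper only builds the commands so they can be reviewed before running.

```python
def prepare_disk_cmds(device="/dev/sdb", mount_point="/mnt/disks/data",
                      format_disk=True):
    """Return the shell commands (as argv lists) to prepare a data disk.

    Set format_disk=False when the disk already holds data, such as when
    it has been moved from another instance; formatting would erase it.
    """
    cmds = [["lsblk"]]  # list attached block devices
    if format_disk:
        # Create an ext4 file system with no blocks reserved for root.
        cmds.append(["sudo", "mkfs.ext4", "-m", "0", device])
    cmds.append(["sudo", "mkdir", "-p", mount_point])
    cmds.append(["sudo", "mount", "-o", "discard,defaults", device, mount_point])
    return cmds


if __name__ == "__main__":
    for cmd in prepare_disk_cmds():
        print(" ".join(cmd))
```

On the second instance in the demo, the same helper with format_disk=False gives the mount-only sequence, which is what keeps the text file and shell script intact.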

All right, let's pivot and create a load-balanced, autoscaled, managed instance group. First up, managed instance groups need a template. This demo is just going to use a default installation of nginx that is installed with the startup script, which is useful for bootstrapping our applications. With the template created, we can create an instance group. The group can run instances in one or more zones. The autoscaler adds and removes instances based on our criteria. The health check allows us to determine if an instance is able to serve traffic, and failed health checks result in unhealthy instances being replaced by a new instance created from our template. Creating this group will start up instances. If the instances have a public IP address, which is something we can specify, then we can see that nginx is installed, and this happened through our startup script.

Now, to distribute traffic from the internet to these instances, we can use an external HTTP load balancer. The backend service needs to specify the backend to use. HTTP load balancing supports multiple backend services, while other types only allow one. We'll use the managed instance group. The backend needs to specify the balancing mode. The available options are the number of connections, CPU utilization, or request rate. Capacity is a modifier for the balancing mode. When it says 100%, the values specified here are considered to be full capacity. If we were to leave these values as they are and change the capacity to 50%, it would tell Compute Engine that we want to run at 50% of the values defined here. Adding a health check allows the load balancer to identify unhealthy instances. There are also some additional features: session affinity, should you want to have traffic directed to the same instance for multiple requests, and connection draining, which we can override and which is the time period during which an instance is allowed to wrap up existing connections before it is removed.

The front end becomes the entry point. It's the IP and port. For HTTP, the port is limited to 80 or 8080, and we can use ephemeral or static IP addresses. HTTP load balancers allow for host- and path-based routing, which is doable because this is a layer 7 balancer. Creating a load balancer takes a little while to get up and running. So, after a few moments, the 404 errors will go away, and the load balancer's IP address will start serving responses from the instance group.
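
The capacity setting described for the backend is just a multiplier on the balancing-mode target, which a few lines of Python can make concrete. The function name and the sample figures are hypothetical; the arithmetic is the point.

```python
def effective_target(balancing_mode_value, capacity_percent):
    """Scale a balancing-mode target by the backend's capacity setting.

    balancing_mode_value: the configured target, such as max requests per
                          second per instance or max CPU utilization
    capacity_percent:     the capacity setting, from 0 to 100
    """
    if not 0 <= capacity_percent <= 100:
        raise ValueError("capacity_percent must be between 0 and 100")
    return balancing_mode_value * capacity_percent / 100


if __name__ == "__main__":
    # A backend configured for 100 requests/second per instance.
    print(effective_target(100, 100))  # full capacity
    print(effective_target(100, 50))   # run at half of the configured value
```

So a backend configured for 100 requests per second per instance is treated as full at 50 requests per second once capacity is set to 50%.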

Now, one of the built-in ways to perform application deployments on Compute Engine is to use the instance group's rolling update. A rolling update requires a new instance template, and because we're using a new template, we have the option of different types of deployment, such as canary deployments. Why do you need to select a new template? That's because templates are immutable. You can create a new template based on a copy of an existing template, but you have to have a new template to replace an old one. Once you have the template you're starting with and the template that you want to move to, you can roll out the change with the specified deployment parameters, which lets you control how instances actually get deployed and minimize downtime.

All right, this has been a dense lesson, I know. There's a lot to cover for services such as Compute Engine when taking these sorts of exams because they're used for so many different things. They're so generalized that you can use them for so many workloads, which makes for a lot of different aspects of the service to cover. Focus specifically on things like managed instance groups, load balancers, and autoscaling, the features with the most support, because that's where Google expects that you'll be using this most of the time. All right, I hope this has been helpful. Thank you so much for watching, and I will see you in another lesson.
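
As a companion to the rolling update discussion in this lesson, here is a small Python sketch that assembles the gcloud command for either a full rollout to a new template or a canary rollout that sends only part of the group to it. The group, zone, and template names are hypothetical, and the helper only builds the argument list rather than running anything.

```python
def rolling_update_cmd(group, zone, new_template, current_template=None,
                       canary_target_size=None, max_surge=None,
                       max_unavailable=None):
    """Build the argv for a managed instance group rolling update.

    With canary_target_size set (for example "10%"), the current template
    stays the primary version and the new template is the canary version.
    """
    cmd = [
        "gcloud", "compute", "instance-groups", "managed",
        "rolling-action", "start-update", group,
        f"--zone={zone}",
    ]
    if canary_target_size is None:
        # Full rollout: every instance moves to the new template.
        cmd.append(f"--version=template={new_template}")
    else:
        if current_template is None:
            raise ValueError("a canary rollout needs the current template")
        cmd.append(f"--version=template={current_template}")
        cmd.append(
            f"--canary-version=template={new_template},"
            f"target-size={canary_target_size}"
        )
    # Optional deployment parameters that control the pace of the rollout.
    if max_surge is not None:
        cmd.append(f"--max-surge={max_surge}")
    if max_unavailable is not None:
        cmd.append(f"--max-unavailable={max_unavailable}")
    return cmd
```

For instance, rolling_update_cmd("web-group", "us-central1-a", "web-v2", current_template="web-v1", canary_target_size="10%") describes a canary that moves roughly a tenth of the group to the new template while the rest stay on the old one.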

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.