Introduction to Google Kubernetes Engine (GKE)




Kubernetes has become one of the most common container orchestration platforms. It has regular releases, a wide range of features, and is highly extensible. Managing a Kubernetes cluster requires a lot of domain knowledge, which is why services such as GKE exist. Certain aspects of a Kubernetes cluster vary based on the underlying implementation.

In this course, we’ll explore some of the ways that GKE implements a Kubernetes cluster. Having a basic understanding of how things are implemented will set the stage for further learning.

Learning Objectives

  • Learn how Google implements a Kubernetes cluster
  • Learn how GKE implements networking
  • Learn how GKE implements logging and monitoring
  • Learn how to scale both nodes and pods

Intended Audience

  • Engineers looking to understand basic GKE functionality


Prerequisites

To get the most out of this course, you should have a general knowledge of GCP, Kubernetes, Docker, and high availability.


Hello and welcome. In this lesson, we're going to talk about the different scalability options for GKE. By the end of this lesson, you'll be able to describe how both cluster-level and pod-level autoscaling work.

A Kubernetes cluster consists of nodes, and nodes have finite resources. So imagine we have a GKE cluster with two nodes that have two CPUs each. We deploy our workloads to the cluster, and they consume all of the available CPU. We're now limited, because we don't have enough resources for additional workloads without adding more nodes.
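To make that capacity limit concrete, here is a minimal sketch in Python. The per-pod CPU requests are hypothetical, CPU is expressed in millicores (1000m is one CPU), and the first-fit placement is a simplification for illustration, not the actual Kubernetes scheduler.

```python
def place_pods(node_cpus_m, pod_requests_m):
    """First-fit placement. Returns (free millicores per node, requests left pending)."""
    free = list(node_cpus_m)
    pending = []
    for request in pod_requests_m:
        for i, available in enumerate(free):
            if request <= available:
                free[i] = available - request
                break
        else:
            # No node had enough free CPU for this pod.
            pending.append(request)
    return free, pending


# Two nodes with two CPUs each, as in the lesson.
nodes = [2000, 2000]

# Hypothetical workload: four pods requesting one CPU each use all of the CPU.
print(place_pods(nodes, [1000] * 4))  # ([0, 0], [])

# A fifth pod stays pending until another node is added.
print(place_pods(nodes, [1000] * 5))  # ([0, 0], [1000])
```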

GKE provides a mechanism for dynamically adding nodes to and removing nodes from a node pool, which it calls the cluster autoscaler. The cluster autoscaler adds and removes nodes as needed, based on the resource requests of our pods.

If we deploy our workload and the pods can't be scheduled due to a lack of resources, the cluster autoscaler can add nodes, and these nodes will remain as long as they're needed. When it's time to remove a node, the autoscaler starts connection draining, which allows a window of ten minutes for the connections to complete. After that ten-minute period, the node is forcibly removed.

The cluster autoscaler requires us to set a minimum and maximum number of nodes per zone. That's all we need to provide; the rest is automatic.

Because these minimum and maximum values represent the number of nodes per zone, you need to consider the cluster's availability when configuring cluster autoscaling. Here's what I mean. If you specify a minimum of three nodes and you're running in three zones, you're going to have a minimum of nine nodes. So make sure you take your zone count into account when you set the autoscaling limits.
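The per-zone arithmetic can be sketched in a few lines of Python. The function name and the maximum of five nodes per zone are assumptions made for illustration; only the minimum of three nodes across three zones comes from the lesson.

```python
def cluster_node_bounds(min_per_zone, max_per_zone, zone_count):
    """Return the (minimum, maximum) total node count across all zones."""
    if min_per_zone > max_per_zone:
        raise ValueError("minimum per zone cannot exceed maximum per zone")
    return min_per_zone * zone_count, max_per_zone * zone_count


# A minimum of 3 nodes per zone in 3 zones, with a hypothetical maximum of 5.
low, high = cluster_node_bounds(min_per_zone=3, max_per_zone=5, zone_count=3)
print(low, high)  # 9 15
```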

The cluster autoscaler can support up to a thousand nodes, assuming a maximum of 30 pods on each node. Also, enabling or disabling autoscaling can cause the cluster master to reboot, which could temporarily leave the master unavailable.

So the cluster autoscaler works by monitoring resource requests. When there aren't enough nodes in the pool to meet the demand, the autoscaler can add new nodes. Once the cluster can manage the workload with fewer nodes, the autoscaler starts to drain connections from the surplus nodes and then terminates them.
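As a rough illustration of that decision, the sketch below estimates a node count from the pods' CPU requests and clamps it to the configured limits. It is a simplified model under stated assumptions (CPU only, identical nodes, no bin-packing or system overhead), not the real cluster autoscaler algorithm, and all names and numbers are hypothetical.

```python
def desired_nodes(pod_requests_m, node_capacity_m, min_nodes, max_nodes):
    """Estimate the node count needed to satisfy the summed CPU requests (millicores)."""
    total = sum(pod_requests_m)
    # Integer ceiling division: nodes needed to hold the total requested CPU.
    needed = (total + node_capacity_m - 1) // node_capacity_m
    # Never go below the minimum or above the maximum.
    return max(min_nodes, min(max_nodes, needed))


# Five one-CPU pods on two-CPU nodes need three nodes: scale up.
print(desired_nodes([1000] * 5, 2000, min_nodes=1, max_nodes=4))   # 3

# Demand drops to two pods: one node is enough, so scale down.
print(desired_nodes([1000] * 2, 2000, min_nodes=1, max_nodes=4))   # 1

# Demand beyond the maximum is capped; the extra pods would stay pending.
print(desired_nodes([1000] * 12, 2000, min_nodes=1, max_nodes=4))  # 4
```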

Conceptually, scaling workloads is similar, except we're scaling pods rather than nodes. There are two types of pod autoscaler: horizontal and vertical.

When we use a Kubernetes controller such as a Deployment, we get to specify the number of pod replicas that we want to deploy. If we want to increase or decrease that number, all we have to do is change it and deploy the change, though that requires human intervention.

Kubernetes provides a built-in resource for horizontal pod autoscaling. The horizontal pod autoscaler can manage an existing set of pods by monitoring resources and adjusting the replica count as needed.
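The Kubernetes documentation describes the core of that adjustment as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that formula and clamps the result to a replica range. The utilization figures and replica limits are hypothetical, and the real autoscaler adds details that are omitted here, such as a tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Apply the documented horizontal pod autoscaler ratio, then clamp to the limits."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% CPU utilization against a 60% target: scale out.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))  # 6

# 4 replicas at 30% against a 60% target: scale in.
print(desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10))  # 2

# The same overload with a maximum of 5 replicas is capped.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=5))   # 5
```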

Kubernetes also provides a built-in resource for vertical pod autoscaling. Vertical scaling allows the autoscaler to either recommend or manage CPU and memory limits for a pod. When creating a vertical pod autoscaler, setting the update mode to Auto allows the autoscaler to adjust the resource requirements of a pod. Since a running pod can't change its own requirements, the autoscaler removes the existing pods and creates new ones. So to avoid an excessive number of pod restarts, we can set a disruption budget.

All right, let's stop there and summarize what we've covered so far. Cluster autoscaling allows us to adjust the size of our node pools based on our resource requirements. We can also scale at the pod level, and there are two types of pod autoscaler: horizontal and vertical. The horizontal pod autoscaler allows us to add and remove pods as needed based on our resource needs, and the vertical pod autoscaler allows us to redefine the resource needs of our pods. It does that either by recommending the CPU and memory limits that we should be using for our pod or by automatically adjusting them for us.
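Returning to the vertical pod autoscaler and the disruption budget described in this lesson, here is a hedged sketch of what the two objects might look like, built as Python dictionaries and printed as JSON (which `kubectl apply -f` accepts alongside YAML). The Deployment name `web`, the `app: web` label, and the `minAvailable` value are hypothetical. Setting the update mode to `"Off"` instead of `"Auto"` would produce recommendations only.

```python
import json


def vpa_manifest(deployment_name, update_mode="Auto"):
    """Build a VerticalPodAutoscaler object that targets a Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment_name}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            # "Auto" lets the autoscaler replace pods with updated resources.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


def pdb_manifest(app_label, min_available):
    """Build a PodDisruptionBudget that limits how many pods are evicted at once."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app_label}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


# Hypothetical names: a Deployment called "web" whose pods carry the label app=web.
for manifest in (vpa_manifest("web"), pdb_manifest("web", min_available=2)):
    print(json.dumps(manifest, indent=2))
```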

All right, that's going to wrap up this lesson. In the next lesson, we're going to head into the console and do some exploration. So if you're ready to keep learning, I'll see you in the next lesson.

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.