Start course
2h 30m

Kubernetes is a production-grade container orchestration system that helps you maximize the benefits of using containers. Kubernetes provides you with a toolbox to automate deploying, scaling, and operating containerized applications in production. This course will teach you all about Kubernetes including what it is and how to use it.

This course is paired with an Introduction to Kubernetes Playground lab that you can use to follow along with the course using your own Kubernetes cluster. The lab creates a Kubernetes cluster for you to use as we perform hands-on demos in the course. All of the commands that are used in the course are included in the lab to make it easy to follow along.

Learning Objectives 

  • Describe Kubernetes and what it is used for
  • Deploy single and multiple container applications on Kubernetes
  • Use Kubernetes services to structure N-tier applications 
  • Manage application deployments with rollouts in Kubernetes
  • Ensure container preconditions are met and keep containers healthy
  • Learn how to manage configuration, sensitive, and persistent data in Kubernetes
  • Discuss popular tools and topics surrounding Kubernetes in the ecosystem

Intended Audience

This course is intended for:

  • Anyone deploying containerized applications
  • Site Reliability Engineers (SREs)
  • DevOps Engineers
  • Operations Engineers
  • Full Stack Developers


You should be familiar with:

  • Working with Docker and be comfortable using it at the command line

Source Code

The source files used in this course are available here:


August 27th, 2019 - Complete update of this course using the latest Kubernetes version and topics

May 7th, 2021 - Complete update of this course using the latest Kubernetes version and topics



We've seen deployments work their magic in the last lesson. We also saw how to scale the deployment replicas but it would be nice to not have to manually scale the deployment. That's where autoscaling comes in. Kubernetes supports CPU-based autoscaling and autoscaling based on a custom metric that you can define. We're gonna be focusing on CPU for this course.

Autoscaling works by specifying a desired target CPU percentage and a minimum and a maximum number of allowed replicas. The CPU percentage is expressed as a percentage of the CPU resource request of that Pod. Recall that Pods can set resource requests for CPU to ensure that they're scheduled on a node with at least that much CPU available. If no CPU request is set, autoscaling won't take any action.

Kubernetes will increase or decrease the number of replicas according to the average CPU usage of all of the replicas The autoscaler will also increase the number of replicas when the actual CPU usage of the current Pods exceeds the target and vice versa for decreasing the number of Pods. It will never create more replicas in the maximum nor will they decrease the number of replicas below your configuring minimum. You can configure some of the parameters of the autoscaler, but the default will work fine for us.

With the defaults, the autoscaler will compare the actual CPU usage to the target CPU usage. And either increase the replicas if the actual CPU is sufficiently higher than the target, or it will decrease the replicas if the actual CPU is sufficiently below the target. Otherwise it will keep the status quo. Autoscaling depends on metrics being collected in the cluster.

Kubernetes integrates with several solutions for collecting metrics. We're going to be using the Metrics Server which is a solution that is maintained by Kubernetes itself. There are several manifest files on the Kubernetes Metrics Server GitHub repo that declare all of the resources. We will need to get Metrics Server up and running before we can use autoscaling.

Once Metrics Server is running, autoscalers will retrieve those metrics and then make calls with the Kubernetes metrics API. The lab instance includes a Metrics Server manifest in the Metrics Server sub-directory. It's outside the scope of this course to discuss all the resources that comprise of the Metrics Server. So all we need to do is create them and we can count on metrics being collected in the cluster.

Here we can use the kubectl apply command and then specify the Metrics Server folder to create all of the resources within the Metrics Server folder. kubectl control will then create all of the manifests it finds in that directory. You can see quite a few of these resources are created. One of them is the deployment, in the Metrics Server, runs actually as a pod in the cluster, and that pod is managed by that deployment. It takes a minute or two for the first metrics to start trickling in.

Let's confirm that the Metrics Server is running by watching the pod. With kubectl top pods namespace deployments. This will list the CPU and memory uses of each pod in the namespace. You can use the top command to benchmark a pod's resource utilization, and then subsequently debug resource utilization issues. Our pods are all using a small fraction of one CPU. The m stands for milli. 1000 milli CPUs equals one CPU.

Now that we have metrics, the other thing the autoscaler depends on is having a CPU request in the deployments odd spec. Let's see how that looks in the app-tier deployment. I've highlighted the change from the previous lesson. Each pod will now request 20 milli CPU. Kubernetes will only scale the pods and each node with at least 0.02 CPU's remaining. I also set the replicas to five to keep five replicas running.

Now, if we try to create the resources, kubectl will tell us that they actually already exist. Create will check if a resource of a given type and name already exists and it will fail if it does. We could delete the deployment and then recreate it but it would be nice to avoid the downtime that is involved. Instead, Kubernetes provides a command that can apply changes to existing resources. That's what kubectl applies. So let's apply that to 6.1 now.

Apply will update our deployment and do include the CPU request. It will warn us about mixing create and apply, but we can go ahead and ignore that. I'd encourage you to take the certified Kubernetes administrator course here on CloudAcademy if you'd like to learn more about the differences between create and apply.

So we've set the request low enough that the five replicas can remain scheduled in the cluster as we can see if we get the deployments output. Five actual pods are ready matching the five pods we desired. This completes like the prerequisites for autoscaling. The autoscaler, which has the full name of HorizontalPodAutoscaler because it scales horizontally or out, it's just another resource in Kubernetes we can use a manifest to declare.

The HorizontalPodAutoscaler kind is part of the autoscaling version one API. It's spec includes a min and max to set and lower the upper bounds on running replicas. The targetCPUUtilizationPercentage field sets the target average CPU percentage across the replicas. With the target set to 70%, Kubernetes will decrease the number of replicas if the average CPU utilization is 63% or below and increase replicas if it is 77% or higher.

Lastly, the spec also includes a scale target reference, that identifies what is actually scaling. In this case, we are targeting the app-tier deployment. We've added the equivalent kubectl autoscale command to achieve the same result, but we'll stick with the manifests for everything. So let's create the autoscaler with kubectl create file 6.2. Now we can watch the deployment until the autoscaler kicks in with the watch command. Well, would you look at that, the kernel is already updated, Kubernetes does not disappoint.

We can also describe the HorizontalPodAutoscaler to see what events took place. Now, it would be painful to type out pod autoscaler many times, but fortunately kubectl accept shorthand notations for resource types. So we'll just run kubectl api dash resources for a full list of those shorthand notations. The output is sorted by the API group that appears in the third column. The lone autoscaling resource is the horizontalpodautoscalers and we can use hpa as the short name.

So let's describe it with kubectl describe deployments hpa. We can see the successful rescale events and the current metrics are all below the target. We can also get the HorizontalPodAutoscaler for a quick summary of the current state with kubectl get namespace deployments hpa. The first number in the target expresses the current average CPU utilization as a percentage of the CPU request. 

We can see that we are well below the target but we are at the minimum replicas so it won't scale any further down. Let's say we wanted to modify the minimum to two replicas. We could modify the manifest, save it, and then use the apply command or we could use the kubectl edit command which combines those three actions into one.

So let's edit the odd autoscaler. The server side version of the manifest is presented in the vai console editor. If you haven't used vai before, don't worry, I'll tell you everything we need to do. In general it's a good idea to stick with modifying our local manifest so the changes can easily be checked into a VCs, but I want you to know that the edit command is available. You'll notice that the server's manifest contains additional fields that we didn't configure. The server includes several fields automatically to help it manage resources. Type dash space one to jump the cursor down to the first occurrence of space one, which is our minReplicas field value.

Now press A to start editing the file. Then press your right arrow key to move the cursor after the one then press backspace two to change the minReplicas to two. Then press escape to stop editing followed by colon write quit or wq, to write to the file and quit to the editor. And Kubernetes will go out and automatically apply those changes to the HorizontalPodAutoscaler. Now you can watch the deployment with the Linux watch command. It'll typically happen within 15 seconds which is the default period for the HorizontalPodAutoscaler to check if it should scale.

This wraps up our tour for autoscaling Kubernetes. To recap, Kubernetes depends on metrics with being collected in a cluster before you can use autoscaling. We accomplish that by adding the Metrics Server to the cluster. You must also declare CPU request in your deployments pod template so that autoscaling can compute each pod's percentage CPU utilization. With those prerequisites taken care of, you can use the HorizontalPodAutoscaler. You configure it with a target CPU percentage and then min and max replicas.

Kubernetes will do all the heavy lifting for us, dynamically scaling our deployment based on the current state of the load. While we were doing this, we were able to also pick up the kubectl apply command, to update a resources rather than deleting and recreating it. And the edit command, which is shorthand for editing a live resource and then having it automatically applied. In the next lesson, we're gonna wrap up our coverage over deployments by discussing how to deployments help you when deploying code or configuration changes. I'll see you there.

About the Author
Learning Paths

Jonathan Lewey is a DevOps Content Creator at Cloud Academy. With experience in the Networking and Operations of the traditional Information Technology industry, he has also lead the creation of applications for corporate integrations, and served as a Cloud Engineer supporting developer teams. Jonathan has a number of specialities including: a Cisco Certified Network Associate (R&S / Sec), an AWS Developer Associate, an AWS Solutions Architect, and is certified in Project Management.