Kubernetes is a production-grade container orchestration system that helps you maximize the benefits of using containers. Kubernetes provides you with a toolbox to automate deploying, scaling, and operating containerized applications in production. This course will teach you all about Kubernetes including what it is and how to use it.
This course is paired with an Introduction to Kubernetes Playground lab that you can use to follow along with the course using your own Kubernetes cluster. The lab creates a Kubernetes cluster for you to use as we perform hands-on demos in the course. All of the commands that are used in the course are included in the lab to make it easy to follow along.
The source files used in this course are available in the course's GitHub repository.
- Describe Kubernetes and what it is used for
- Deploy single and multiple container applications on Kubernetes
- Use Kubernetes services to structure N-tier applications
- Manage application deployments with rollouts in Kubernetes
- Ensure container preconditions are met and keep containers healthy
- Learn how to manage configuration, sensitive, and persistent data in Kubernetes
- Discuss popular tools and topics surrounding Kubernetes in the ecosystem
This course is intended for:
- Anyone deploying containerized applications
- Site Reliability Engineers (SREs)
- DevOps Engineers
- Operations Engineers
- Full Stack Developers
You should be familiar with:
- Docker, and be comfortable using it at the command line
August 27th, 2019 - Complete update of this course using the latest Kubernetes version and topics
Metrics Server GitHub Repository: https://github.com/kubernetes-incubator/metrics-server
We saw deployments work their magic in the last lesson. We also saw how to scale the deployment replicas. But it would be nice not to have to scale the deployment manually. That’s where autoscaling comes in.
Kubernetes supports CPU-based autoscaling as well as autoscaling based on a custom metric that you define. We’ll focus on using CPU for this course.
Autoscaling works by specifying a target CPU percentage along with a minimum and maximum number of allowed replicas. The CPU percentage is expressed as a percentage of the pod’s CPU resource request. Recall that pods can set resource requests for CPU to ensure they are scheduled on a node with at least that much CPU available. If no CPU request is set, autoscaling won’t take any action. Kubernetes increases or decreases the number of replicas according to the average CPU usage across all the replicas: the autoscaler adds replicas when the actual CPU usage of the current pods exceeds the target and removes replicas when usage falls below it. It will never create more replicas than the maximum you set, nor will it drop below your configured minimum. You can configure some of the autoscaler’s parameters, but the defaults will work fine for us. With the defaults, the autoscaler compares the actual CPU usage to the target and increases the replicas if usage is sufficiently higher than the target, decreases them if it is sufficiently lower, and otherwise keeps the status quo.
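The scaling rule described here can be sketched as a small shell function. This is a minimal sketch and not the controller’s actual code: it applies the replica formula given in the Kubernetes documentation, desired = ceil(current × actual / target), clamps the result to the configured bounds, and leaves out the tolerance and other refinements the real autoscaler applies. The function name and argument order are made up for illustration.

```shell
#!/bin/sh

# desired_replicas CURRENT USAGE_PCT TARGET_PCT MIN MAX
# Hypothetical helper (not part of kubectl): prints the replica count the
# scaling rule would ask for, clamped to the range [MIN, MAX].
desired_replicas() (
  current=$1 usage=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * usage / target.
  desired=$(( (current * usage + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)

desired_replicas 5 10 70 1 5    # usage far below target: prints 1
desired_replicas 2 140 70 1 5   # usage at twice the target: prints 4
desired_replicas 4 200 70 1 5   # would want 12, capped at the maximum: prints 5
```

The clamping steps are what guarantee the autoscaler never goes above the maximum or below the minimum, no matter how far usage is from the target.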
Autoscaling depends on metrics being collected in the cluster so that the average pod CPU can be computed. Kubernetes integrates with several solutions for collecting metrics. We will use Metrics Server, which is a solution maintained by Kubernetes. There are several manifest files in the Kubernetes metrics-server GitHub repository that declare all the required resources. We need to get Metrics Server up and running before we can use autoscaling. Once Metrics Server is running, autoscalers can retrieve the metrics using the Kubernetes metrics API.
The lab instance includes the Metrics Server manifests in the metrics-server subdirectory. It’s outside the scope of this course to discuss all of the resources that comprise Metrics Server. All we need to do is create them, and we can count on metrics being collected in the cluster. To do that, we can use our trusty kubectl create command and specify the directory as the file target:
kubectl create -f metrics-server/
kubectl then creates the resources declared in all of the manifests it finds in the directory. You can see quite a few resources are created. One of them is a deployment. Metrics Server runs as a pod in the cluster, and that pod is managed by a deployment. It takes a minute for the first metrics to start trickling in.
You can confirm that Metrics Server is doing its thing by watching the output of the following command:
watch kubectl top pods -n deployments
This lists the CPU and memory usage of each pod in the namespace. You can use the top command to benchmark a pod’s resource utilization and to debug resource utilization issues. Our pods are all using a small fraction of one CPU. The m stands for milli: 1000 milliCPUs equal one CPU.
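The milliCPU unit can be illustrated with a short conversion helper. This is a minimal sketch with a hypothetical function name. It only handles whole-number quantities such as 1 and milliCPU quantities such as 20m, and it does not handle fractional quantities such as 0.5.

```shell
#!/bin/sh

# to_millicpu QUANTITY
# Hypothetical helper: converts a whole-number CPU quantity such as "2"
# or a milliCPU quantity such as "250m" into a number of milliCPUs.
to_millicpu() (
  qty=$1
  case "$qty" in
    (*m) echo "${qty%m}" ;;          # already in milliCPUs: drop the suffix
    (*)  echo $(( qty * 1000 )) ;;   # whole CPUs: 1 CPU = 1000 milliCPUs
  esac
)

to_millicpu 20m   # prints 20
to_millicpu 1     # prints 1000
```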
Now that we have metrics, the other thing the autoscaler depends on is a CPU request in the deployment’s pod spec. Let’s see how that looks in the app tier deployment. I’ve highlighted the change from the previous lesson.
Each pod will now request 20 milliCPU.
Kubernetes will only schedule the pods on nodes with at least 0.02 CPUs remaining. I also set the replicas to 5 to keep the five replicas running.
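The manifest itself is not reproduced in this transcript. As a rough sketch, the relevant part of a deployment with a CPU request might look like the fragment printed below. Only the 20m request and the replica count of 5 come from the lesson. The container name, the image, and the helper function name are placeholders, and the fragment leaves out the rest of the deployment, so it is not a complete manifest.

```shell
#!/bin/sh

# Hypothetical helper: prints a fragment of a Deployment spec showing
# where the CPU request goes. This is not a complete manifest.
print_cpu_request_fragment() {
  cat <<'EOF'
spec:
  replicas: 5
  template:
    spec:
      containers:
      - name: server              # placeholder container name
        image: example/image:tag  # placeholder image
        resources:
          requests:
            cpu: 20m              # 20 milliCPU = 0.02 CPU
EOF
}

print_cpu_request_fragment
```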
Now if we try to create the resources, kubectl will tell us they already exist:
kubectl create -f 6.1.yaml -n deployments
Create checks whether a resource of the given type and name already exists and fails if it does. We could delete the deployment and then create it again, but it would be nice to avoid the downtime that involves. Instead, Kubernetes provides a command that can apply changes to existing resources. kubectl apply is that command:
kubectl apply -f 6.1.yaml -n deployments
Apply will update our deployment to include the CPU request. It will warn us about mixing create and apply, but we can ignore that for this course. I’d encourage you to take the Certified Kubernetes Administrator course here on Cloud Academy to learn more about the differences between create and apply.
We set the request low enough that the five replicas can remain scheduled in the cluster, as we can see from the get deployments output:
kubectl get -n deployments deployments app-tier
Five pods are ready, matching the five pods we desired.
This completes the prerequisites for using autoscaling. The autoscaler, whose full name is the Horizontal Pod Autoscaler because it scales horizontally, or out, is just another resource in Kubernetes, so we can use a manifest to declare it. The HorizontalPodAutoscaler kind is part of the autoscaling/v1 API. Its spec includes the min and max replicas to set lower and upper bounds on running replicas. The target CPU utilization percentage field sets the target average CPU percentage across the replicas. With the target set to 70 percent, Kubernetes will decrease the number of replicas if the average CPU utilization is 63% or below and increase the replicas if it is 77% or higher, using the default tolerance of 10% of the target. The tolerance ensures that Kubernetes isn’t constantly scaling up and down around the target. Lastly, the spec also includes a scale target reference that identifies what it is scaling. We are targeting the app tier deployment.
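The tolerance band can be worked through with a little integer arithmetic. The sketch below uses a hypothetical function name and hard-codes a tolerance of 10% of the target. The inclusive thresholds follow the wording of this lesson (63% or below, 77% or higher), and the real controller’s handling of values exactly on the boundary may differ.

```shell
#!/bin/sh

# hpa_decision AVERAGE_PCT TARGET_PCT
# Hypothetical helper: prints scale-up, scale-down, or hold, using a
# tolerance of 10% of the target on either side.
hpa_decision() (
  average=$1 target=$2
  if [ $(( average * 100 )) -ge $(( target * 110 )) ]; then
    echo scale-up      # at or above target plus 10% of target
  elif [ $(( average * 100 )) -le $(( target * 90 )) ]; then
    echo scale-down    # at or below target minus 10% of target
  else
    echo hold          # inside the tolerance band
  fi
)

hpa_decision 80 70   # prints scale-up
hpa_decision 70 70   # prints hold
hpa_decision 50 70   # prints scale-down
```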
I’ve added the equivalent kubectl autoscale command that achieves the same result, but we’ll stick with manifests for everything.
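The manifest is not shown in this transcript, so here is a sketch of what an autoscaling/v1 manifest matching the description could look like, together with a roughly equivalent kubectl autoscale command. The 70 percent target, the minimum of 1, and the app tier deployment target come from the lesson. The autoscaler name, the maximum of 5, the apps/v1 API version of the deployment, and the helper function name are assumptions made for illustration.

```shell
#!/bin/sh

# Hypothetical helper: prints an autoscaling/v1 HorizontalPodAutoscaler
# manifest along the lines described in the lesson.
print_hpa_manifest() {
  cat <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-tier            # assumed name
  namespace: deployments
spec:
  minReplicas: 1
  maxReplicas: 5            # assumed upper bound
  targetCPUUtilizationPercentage: 70
  scaleTargetRef:
    apiVersion: apps/v1     # assumed API version of the deployment
    kind: Deployment
    name: app-tier
EOF
}

print_hpa_manifest

# With a cluster available, the manifest could be created with:
#   print_hpa_manifest | kubectl create -f -
# A roughly equivalent imperative command, under the same assumptions:
#   kubectl autoscale deployment app-tier -n deployments --min=1 --max=5 --cpu-percent=70
```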
Let’s create the autoscaler:
kubectl create -f 6.2
Now we can watch the deployment until the autoscaler kicks in.
watch -n 1 kubectl get -n deployments deployments app-tier
Well would you look at that, the counts updated. K8s does not disappoint. Pretty slick, huh?
We can also describe the horizontal pod autoscaler to see what events took place. Now it would be painful to type out horizontal pod autoscaler many times. Fortunately kubectl accepts shorthand notations for resource types.
We can run
kubectl api-resources
for a full list of shorthand notations. The output is sorted by the API group that appears in the third column. The lone autoscaling resource is horizontalpodautoscalers, and we can use hpa as the short name.
Let’s describe the hpa:
kubectl describe -n deployments hpa
We can see the successful rescale events and that the current metrics are all below the target. We can also get the hpa for a quick summary of the current state:
kubectl get -n deployments hpa
The first number in the target column expresses the current average CPU utilization as a percentage of the CPU request. We can see that we are well below the target, but we are at the min replicas, so it won’t scale down any further. Let’s say we wanted to change the minimum to two replicas. We could modify the manifest, save it, and use the apply command, or we can use the kubectl edit command, which combines those three actions into one:
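How that first number relates to the raw usage figures can be sketched as follows. The function name is hypothetical, the usage values in the example are invented, and the calculation assumes every pod has the same CPU request and that at least one usage value is passed.

```shell
#!/bin/sh

# average_utilization_pct REQUEST_M USAGE_M...
# Hypothetical helper: prints the average CPU usage of the given pods
# (in milliCPUs) as a whole-number percentage of the shared CPU request.
average_utilization_pct() (
  request=$1
  shift
  total=0
  count=0
  for usage in "$@"; do
    total=$(( total + usage ))
    count=$(( count + 1 ))
  done
  # (total / count) / request, scaled to a percentage, in integer arithmetic.
  echo $(( total * 100 / (count * request) ))
)

average_utilization_pct 20 1 2 3   # pods averaging 2m of a 20m request: prints 10
```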
kubectl edit -n deployments hpa
The server-side version of the manifest is presented in the vi console editor. If you haven’t used vi before, don’t worry. I will tell you everything we need to do. In general, it is a good idea to stick with modifying our local manifest files so the changes can easily be checked into a version control system, but I want you to know that the edit command is available. You will notice that the server’s manifest contains additional fields that we didn’t configure. The server includes several fields automatically to help it manage resources. Type
/ [space] 1
to jump the cursor down to the first occurrence of [space] 1, which is our minReplicas field value. Press
i
to start editing the file, then press your right arrow key to move the cursor after the 1, then press Backspace followed by 2 to change the min replicas to 2. Then press Escape to stop editing, followed by :wq to write the file and quit the editor. Kubernetes applies the change you made.
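For the local-manifest route recommended above, the same change can be scripted instead of typed into an editor. This is a minimal sketch: the function name is hypothetical, and it assumes the manifest has a single minReplicas field on its own line.

```shell
#!/bin/sh

# set_min_replicas NEW_MIN
# Hypothetical helper: reads a manifest on stdin and writes it to stdout
# with the value of its minReplicas field replaced by NEW_MIN.
set_min_replicas() {
  sed -E "s/^([[:space:]]*minReplicas:)[[:space:]]*[0-9]+/\1 $1/"
}

printf '  minReplicas: 1\n  maxReplicas: 5\n' | set_min_replicas 2
# prints:
#   minReplicas: 2
#   maxReplicas: 5
```

The rewritten manifest could then be saved, checked into version control, and passed to kubectl apply, which keeps the local file and the cluster in sync.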
Now you can watch the deployment
watch -n 1 kubectl get -n deployments deployments app-tier
until the autoscaler raises the number of replicas to the new minimum. It should happen within 15 seconds, which is the default period at which the hpa checks whether it should scale.
This wraps up our tour of autoscaling in kubernetes.
To recap our adventure with autoscaling:
Kubernetes depends on metrics being collected in the cluster before you can use autoscaling. We accomplished that by adding metrics server to the cluster.
You must also declare a CPU request in your deployment’s pod template so that autoscaling can compute each pod’s percentage CPU utilization.
With those prerequisites taken care of, you can use the horizontal pod autoscaler, or hpa. You configure it with a target CPU percentage and min and max replicas. Once it is created, Kubernetes does all the heavy lifting to dynamically scale your deployment based on the current load.
While we were doing this, we also picked up the kubectl apply command, which updates resources rather than deleting and re-creating them, and the kubectl edit command, which is a shortcut for editing a manifest and then applying it.
In the next lesson, we will wrap up our coverage of deployments by discussing how deployments help you when deploying code or configuration changes. See you there.
About the Author
Logan has been involved in software development and research since 2007 and has been in the cloud since 2012. He is an AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, Microsoft Certified Azure Solutions Architect Expert, MCSE: Cloud Platform and Infrastructure, Google Cloud Certified Associate Cloud Engineer, Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), Linux Foundation Certified System Administrator (LFCS), and Certified OpenStack Administrator (COA). He earned his Ph.D. studying design automation and enjoys all things tech.