Troubleshooting in Kubernetes

Difficulty: Advanced
Duration: 1h 30m
Lab description

Failures are inevitable when running large clusters. This Lab teaches you how to detect, diagnose, and remedy a variety of Kubernetes failures at the network, node, control-plane, and application levels. You will use tools included in Kubernetes, such as kubectl, as well as a variety of Linux operating system tools like systemctl, journalctl, ss, and openssl to build a comprehensive Kubernetes troubleshooting toolkit. In addition to reacting to failures, the Lab points out some ways that you can proactively reduce the chance of failures when working with Kubernetes.

This Lab is valuable to anyone working with Kubernetes, and its content was prepared with the topics in the Certified Kubernetes Administrator (CKA) Exam Curriculum in mind. Completing the Lab gives you hands-on experience, which is essential for passing the CKA exam.

Warning: This Lab is outdated and has been split into a set of smaller Labs that cover the same topics in an easier-to-consume format.

Lab Objectives

Upon completion of this Lab, you will be able to:

  • Troubleshoot Kubernetes connection failures
  • Troubleshoot Kubernetes node failures
  • Troubleshoot Kubernetes component failures
  • Troubleshoot Kubernetes application failures

Lab Prerequisites

You should be familiar with:

  • Working with Kubernetes to deploy applications
  • Working at the command line in Linux

The following Labs can be used to fulfill the prerequisites: "Deploy a Stateless Application in a Kubernetes Cluster" and "Deploy a Stateful Application in a Kubernetes Cluster".

Updates

October 21st, 2022 - Updated the lab to use the Cloud Academy Web Terminal

October 19th, 2022 - Updated to run Kubernetes 1.24

October 19th, 2021 - Updated the lab instructions to use EC2 Instance Connect to connect to the bastion host

September 7th, 2021 - Updated the certificate instructions and commentary in the Troubleshooting Kubernetes Cluster Access Issues Lab Step

September 19th, 2020 - Updated to the latest Kubernetes version

August 28th, 2020 - Updated the SSH instructions to reflect the new EC2 user interface

May 29th, 2019 - Updated the monitoring to use Metrics Server, and increased the master node's instance size to accommodate the heftier requirements

January 10th, 2019 - Added a validation Lab Step to check the work you perform in the Lab

About the author
Logan Rakai
Lead Content Developer - Labs
Students: 214,407
Labs: 222
Courses: 9
Learning paths: 56

Logan has been involved in software development and research since 2007 and has been in the cloud since 2012. He is an AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, Microsoft Certified Azure Solutions Architect Expert, MCSE: Cloud Platform and Infrastructure, Google Cloud Certified Associate Cloud Engineer, Certified Kubernetes Security Specialist (CKS), Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), and Certified OpenStack Administrator (COA). He earned his Ph.D. studying design automation and enjoys all things tech.

Lab steps
Connecting to the K8s Cluster
Troubleshooting Kubernetes Cluster Access Issues
Troubleshooting Kubernetes Cluster Node Failures
Troubleshooting Kubernetes Cluster Component Failures
Troubleshooting Kubernetes Applications