Volumes

Contents

Course Introduction
  Introduction (Preview, 4m 6s)
Deploying Containerized Applications to Kubernetes
  Pods (11m 34s)
  Services (5m 10s)
  Probes (8m 26s)
  Volumes (11m 42s)
The Kubernetes Ecosystem
Course Conclusion

The course is part of these learning paths

Certified Kubernetes Administrator (CKA) Exam Preparation
Introduction to Kubernetes
Overview
Difficulty: Beginner
Duration: 1h 58m
Students: 3,780
Rating: 4.4/5

Description

Kubernetes is a production-grade container orchestration system that helps you maximize the benefits of using containers. Kubernetes provides you with a toolbox to automate deploying, scaling, and operating containerized applications in production. This course will teach you all about Kubernetes, including what it is and how to use it.

This course is paired with an Introduction to Kubernetes Playground lab that you can use to follow along with the course using your own Kubernetes cluster. The lab creates a Kubernetes cluster for you to use as we perform hands-on demos in the course. All of the commands that are used in the course are included in the lab to make it easy to follow along.

The source files used in this course are available in the course's GitHub repository.

Learning Objectives 

  • Describe Kubernetes and what it is used for
  • Deploy single and multiple container applications on Kubernetes
  • Use Kubernetes services to structure N-tier applications 
  • Manage application deployments with rollouts in Kubernetes
  • Ensure container preconditions are met and keep containers healthy
  • Learn how to manage configuration, sensitive, and persistent data in Kubernetes
  • Discuss popular tools and topics surrounding Kubernetes in the ecosystem

Intended Audience

This course is intended for:

  • Anyone deploying containerized applications
  • Site Reliability Engineers (SREs)
  • DevOps Engineers
  • Operations Engineers
  • Full Stack Developers

Prerequisites

You should be familiar with:

  • Working with Docker and comfortable using it at the command line

Updates

August 27th, 2019 - Complete update of this course using the latest Kubernetes version and topics

 

Transcript

Containers in a pod share the same network stack, but each has its own file system. It can be useful to share data between containers, for example having an init container prepare files that the main container depends on. A container's file system is also limited to the lifetime of the container, which can have undesirable side effects. For example, if the data tier container we are using in our examples crashes or fails a liveness probe, it will be restarted and all of the data it had been storing will be lost forever.

 

This lesson covers the different ways Kubernetes handles non-ephemeral data: separating data from containers, Kubernetes Volumes, and Kubernetes PersistentVolumes. Our goal for this lesson is to deploy the data tier of our sample application using persistent volumes so the data can outlive the data tier pod. Again, this lesson builds on the code from the previous lessons. Let's first discuss the options for storing persistent data and then apply them to our data tier.

 

Kubernetes includes two different data storage types. Both are used by mounting a directory in a container, and both can be shared by containers in the same pod. Pods can also use more than one volume or persistent volume. The main difference is in how their lifetimes are managed: one type exists for the lifetime of a particular pod, while the other is independent of the lifetime of any pod.

 

Volumes are tied to a pod and its lifecycle. They are used to share data between containers in a pod and to tolerate container restarts. Although you can configure volumes to use durable storage types that survive pod deletion, you should generally use volumes for non-durable storage that is deleted along with the pod. The default volume type, emptyDir, creates an initially empty directory on the node running the pod to back the storage used by the volume. Any data written to the directory remains if a container in the pod restarts, but once the pod is deleted, the data in the volume is permanently deleted. It's also worth noting that since the data is stored on a specific node, the data is lost if the pod is rescheduled to a different node. If the data is too valuable to lose when a pod is deleted or rescheduled, you should use a persistent volume instead.
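
To make this concrete, here is a minimal sketch of a pod whose two containers share an emptyDir volume (the pod, container, and volume names are illustrative, not taken from the course files):

apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch    # hypothetical pod name
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "while true; do date >> /scratch/out.log; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
    - name: reader
      image: busybox
      command: ["sh", "-c", "tail -f /scratch/out.log"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}    # node-local directory, deleted along with the pod

Both containers see the same /scratch directory, and the data survives container restarts but not pod deletion.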

 

PersistentVolumes are independent of the lifetime of pods and are managed separately by Kubernetes. They work a little differently from volumes. Pods claim a persistent volume and may use it throughout their lifetime, but PersistentVolumes continue to exist outside of their pods. A persistent volume can even be mounted by multiple pods on different nodes if the underlying storage supports multiple readers or writers. Persistent volumes can be provisioned statically in advance by a cluster admin, or dynamically for more flexible self-serve use cases.

 

Pods must make a request for storage before they can use a persistent volume. The request is made using a PersistentVolumeClaim, or PVC. A PVC declares how much storage the pod needs, the type of persistent volume, and the access mode. The access mode describes how the persistent volume is mounted: whether it is read-only or read-write, and whether it can be mounted by one node or many. There are three supported access modes to choose from: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany. If there isn't a persistent volume available to satisfy the claim and dynamic provisioning isn't enabled, the claim stays in a pending state until such a persistent volume is available.

 

The persistent volume claim is connected to a pod by using a regular volume with the type set to persistentVolumeClaim, as we'll see in the manifest shortly.

 

Both volumes and PersistentVolumes may be backed by a wide variety of volume types. As we learned before, it is usually preferable to use persistent volumes for the more durable types and volumes for more ephemeral storage needs. Durable volume types include the persistent disks of many cloud vendors, such as Google Compute Engine persistent disks, Azure Disks, and Amazon Elastic Block Store. There is also support for more generic volume types such as Network File System (NFS) and iSCSI.
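
For example, an NFS-backed PersistentVolume differs from a cloud-disk-backed one only in its storage mapping; a minimal sketch (the name, server, and path here are hypothetical):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv    # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany    # NFS can support many readers and writers
  nfs:
    server: nfs.example.com    # hypothetical NFS server
    path: /exports/data        # hypothetical export path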

 

That is quite a lot to take in, but everything should solidify with an example. Our objective is to use a PersistentVolume for the sample application's data tier, since we want the data to outlive its pod. In our example, the cluster has an Amazon Elastic Block Store volume statically provisioned and ready for us to use. To see dynamic provisioning in action, I'd encourage you to complete the Cloud Academy lab entitled Deploy a Stateful Application in a Kubernetes Cluster.

 

Before we get into volumes, I want to cement the issue we are trying to solve. We can illustrate how pod containers lose their data when they restart by forcing a restart of the data tier pod. First, let's look at the counter that has been running since our deployments lesson:

kubectl -n deployments logs support-tier-... poller --tail 1

That is the value of our counter at the moment, and it will keep increasing every second. Now, if I force the pod to be restarted, we can observe the impact on the counter. One way to do that is to kill the redis process, which will cause the data tier container to exit; the data tier pod will automatically restart it. The exec command allows us to run a command inside of a container, the same way docker exec does. Let's open a bash shell inside the container:

kubectl exec -n deployments data-tier-... -it /bin/bash

The change of command prompt tells us we are now in the container. We can use the kill command to stop the main process of the container. But what is the ID of that process? The ID of the main process, which is redis in this case, will always be 1, since it is the first process that runs in the container.

kill 1

We can see the command prompt changed back: the container terminated, so our shell was terminated along with it.

Now we can get the pods to show that the data tier has restarted:

kubectl -n deployments get pods

The output tells us that, yes, there has been a restart of the pod. Now we can look at the counter value through the poller logs again:

kubectl -n deployments logs support-tier-... poller --tail 1

And we see that it is much lower than before, because the data tier's data was completely wiped out when the pod restarted. This is what we want to avoid.

 

Let's start by creating a new volumes namespace:

kubectl create -f 9.1.yaml
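
The transcript doesn't show the manifest, but a namespace definition is minimal; 9.1.yaml presumably contains something like this (only the namespace name is taken from the lesson):

apiVersion: v1
kind: Namespace
metadata:
  name: volumes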

Now, on to the data tier. There are three additions to the manifest: a PersistentVolume, a PersistentVolumeClaim, and a volume to connect the claim to the pod.

 

First is the PersistentVolume. It is the raw storage that the pod's container ultimately writes data to. It has a declared storage capacity and other attributes; here, we've allocated 1 gibibyte. The access mode of ReadWriteOnce means this volume may be mounted for reading and writing by a single node at a time. Note that this is a limit on node attachment, not pod attachment. PersistentVolumes may list multiple access modes, and the claim specifies the mode it requires; the persistent volume can only be claimed in a single access mode at any time. Lastly, we have an awsElasticBlockStore mapping, which is specific to the type of storage backing the PV. You would use a different mapping if you were not using an EBS volume for storage. The only required key for awsElasticBlockStore is the volumeID, which uniquely identifies the EBS volume. It will be different in your environment than in mine, so I've added an INSERT_VOLUME_ID placeholder that we will replace before we create the PV.
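
The manifest file itself isn't reproduced in the transcript; a sketch of the PV it describes (the metadata name is an assumption, the rest follows the description above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-tier-volume    # assumed name
spec:
  capacity:
    storage: 1Gi    # 1 gibibyte
  accessModes:
    - ReadWriteOnce    # one node at a time
  awsElasticBlockStore:
    volumeID: INSERT_VOLUME_ID    # placeholder, replaced below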

 

Next, we have the PersistentVolumeClaim. The PVC spec outlines what it is looking for in a PV; for a PV to be bound to a PVC, it must satisfy all of the constraints in the claim. We are looking for a PV that provides the ReadWriteOnce access mode and has at least 128 mebibytes of storage. The claim's request is less than or equal to the persistent volume's capacity, and the requested access mode is among the PV's available access modes. This means the PVC's request is satisfied by our PV, and the claim will be bound to it.
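
Sketched the same way (the claim name follows the transcript below; the exact spelling in the course files may differ):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-tier-volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi    # at least 128 mebibytes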

 

Lastly, the deployment's template now includes a volume that links the PVC to the deployment's pod. This is accomplished by using the persistentVolumeClaim mapping and setting the claimName to the name of the PVC, which is data-tier-volume-claim. You will always use persistentVolumeClaim when working with PVs. If you wanted ephemeral storage instead, you would replace it with an emptyDir mapping or another type that doesn't connect to a PV. Volumes can be used in the pod's containers and init containers, but they must be mounted to be available inside a container. Each container's volumeMounts list includes all of the volume mounts for that container, and the mountPath can differ between containers even when the volume is the same. In our case we have only one, and we mount the volume at /data, which is where redis is configured to store its data. This causes all of the data to be written to the PV.
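
A sketch of the relevant excerpt of the deployment's pod template (the container name and image are assumptions; the mount path and claim name follow the transcript):

spec:
  template:
    spec:
      containers:
        - name: redis        # assumed container name and image
          image: redis
          volumeMounts:
            - name: data-tier-volume    # must match the volume name below
              mountPath: /data          # where redis stores its data
      volumes:
        - name: data-tier-volume
          persistentVolumeClaim:
            claimName: data-tier-volume-claim    # binds the pod to the PVC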

 

Now we are left with replacing the volume ID placeholder with the actual ID of the Amazon EBS volume the lab environment created for us. You could get it from the EC2 console in your browser, but we'll use the AWS CLI for this example. The volume ID can be obtained with the aws ec2 describe-volumes command:

aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text

The full command is available to copy in the Introduction to Kubernetes Playground lab. The filter selects only the volume labeled with a Type=PV tag, and the query outputs only the volume ID property of that volume. The Introduction to the AWS CLI lab on Cloud Academy explains more about the AWS CLI if you are interested; we only need it to get the volume ID in this course. I'll store the ID in a variable named vol_id:

vol_id=$(aws ec2 describe-volumes --region=us-west-2 --filters="Name=tag:Type,Values=PV" --query="Volumes[0].VolumeId" --output=text)

Then we can use the stream editor, sed, to substitute the occurrence of INSERT_VOLUME_ID with the volume ID stored in vol_id:

sed -i "s/INSERT_VOLUME_ID/$vol_id/" 9.2.yaml 

 

And with that, we are ready to create the data tier using a persistent volume. We'll also create the app and support tiers, which have nothing new compared to previous versions:

kubectl create -n volumes -f 9.2.yaml -f 9.3.yaml -f 9.4.yaml 

 

Let's describe the persistent volume claim, which has the short name pvc in kubectl, to confirm the claim's request is satisfied by the PV:

kubectl describe -n volumes pvc

The Bound status confirms that the PVC is bound to the PV.

 

Now, if we describe the data tier pod:

kubectl describe -n volumes pod data-tier-...

We can see the pod initially failed to schedule because the claim had to wait a while before being bound to the PV. Once the claim is bound, the pod is scheduled, and we can see the SuccessfulAttachVolume event.

 

Not only can our new design tolerate a data tier pod container restart, but the data will also persist even if we delete the entire data tier deployment, which deletes the data tier pod and prevents any new pods from being created. If everything goes to plan, we should be able to recover the redis data when we replace the deployment, because the deployment template is configured to use the same PVC, and the PVC is still bound to the PV storing the original redis data. Let's verify all of this.

 

Before we delete the data tier deployment, let's get the last log line from the poller to see where our counter is at:

kubectl logs -n volumes support-tier-... poller  --tail 1

If we delete the deployment and then replace it, we should see a number higher than this if the data persisted. Let's do that.

First, delete the data tier deployment:

kubectl delete -n volumes deployments data-tier 

And confirm there are no data tier pods left running:

kubectl get -n volumes pods 

 

Now recreate the data tier deployment:

kubectl create -f 9.2.yaml -n volumes

kubectl create tells us that everything except the deployment already exists, and that only the deployment was created. It takes a couple of minutes for all of the readiness checks to start passing again and for some old connections to time out. This is mainly a side effect of the example application not handling this situation particularly well, not of delays intrinsic to Kubernetes. The fact that Kubernetes can self-heal the application is a testament to its abilities.

After a minute or two, we can get the poller's last log:

kubectl logs -n volumes support-tier-... poller  --tail 1

And voilà, the counter has kept ticking upward from where we left off before deleting the deployment. Our persistent volume has lived up to its name.

 

This concludes our lesson on volumes. We've covered volumes, PersistentVolumes, and PersistentVolumeClaims. In our example, we showed how to use a persistent volume to avoid data loss by keeping data independent from the lifecycle of the pod or the pod's volume. We also saw how kubectl exec allows us to run commands in existing containers when we demonstrated how container restarts cause data loss when volumes aren't used. You now have a solid foundation for volumes and persistent volumes.

The next lesson covers two other useful Kubernetes features you should keep in your toolbox; they also tie in nicely with volumes. Just a few more lessons to go. Keep it up, and I'll catch you in the next one.

About the Author

Students: 38,476
Labs: 101
Courses: 11
Learning paths: 9

Logan has been involved in software development and research since 2007 and has been in the cloud since 2012. He is an AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, Microsoft Certified Azure Solutions Architect Expert, MCSE: Cloud Platform and Infrastructure, Google Cloud Certified Associate Cloud Engineer, Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), Linux Foundation Certified System Administrator (LFCS), and Certified OpenStack Administrator (COA). He earned his Ph.D. studying design automation and enjoys all things tech.
