image
Volumes
Start course
Difficulty
Beginner
Duration
2h 30m
Students
25289
Ratings
4.4/5
starstarstarstarstar-half
Description

Kubernetes is a production-grade container orchestration system that helps you maximize the benefits of using containers. Kubernetes provides you with a toolbox to automate deploying, scaling, and operating containerized applications in production. This course will teach you all about Kubernetes including what it is and how to use it.

This course is paired with an Introduction to Kubernetes Playground lab that you can use to follow along with the course using your own Kubernetes cluster. The lab creates a Kubernetes cluster for you to use as we perform hands-on demos in the course. All of the commands that are used in the course are included in the lab to make it easy to follow along.

Learning Objectives 

  • Describe Kubernetes and what it is used for
  • Deploy single and multiple container applications on Kubernetes
  • Use Kubernetes services to structure N-tier applications 
  • Manage application deployments with rollouts in Kubernetes
  • Ensure container preconditions are met and keep containers healthy
  • Learn how to manage configuration, sensitive, and persistent data in Kubernetes
  • Discuss popular tools and topics surrounding Kubernetes in the ecosystem

Intended Audience

This course is intended for:

  • Anyone deploying containerized applications
  • Site Reliability Engineers (SREs)
  • DevOps Engineers
  • Operations Engineers
  • Full Stack Developers

Prerequisites

You should be familiar with:

  • Working with Docker and be comfortable using it at the command line

Source Code

The source files used in this course are available here:

Updates

August 27th, 2019 - Complete update of this course using the latest Kubernetes version and topics

May 7th, 2021 - Complete update of this course using the latest Kubernetes version and topics

 

Transcript

Containers in a pod share the same network stack, but each has their own file system. It could be useful to share your data between containers. For example, having an init container prepare some files that the main container depends upon. The file system of containers are also limited to the lifetime of the container, so this could present some undesirable effects. For example, if the data tier container we are using in our examples crashes or fails a likeness probe, it will be restarted, and all of our data will be lost forever.

So this lesson is gonna cover the different ways Kubernetes hands non-ephemeral data that bring data from containers, Kubernetes volumes, and Kubernetes persistent volumes. Our goal for this lesson is to deploy the data tier from our sample application, using a persistent volume so the data can outlive the data tier pod. Again, this lesson builds on the code from the previous lessons, so let's first discuss more about the options for storing persistent data and then apply them to our data tier.

Kubernetes includes two different data storage types. Both are used by mounting a directory in one container, and then that could be shared by containers in the same pod. Pods can also use more than one volume or persistent volume. Their differences are mainly in how their lifetime is managed. One type exists for the lifetime of a particular pod and the other is independent from the lifetime of the pods. Volumes are tied to a pod and their life cycle. Volumes are used to share data between containers in a pod and to tolerate container restarts.

Although you can configure volumes to use durable storage types that survive pod deletion, you should consider using volumes for non-durable storage that is deleted when the pod is deleted. Default type of volume is called emptyDir and it creates an initially empty directory on the node running the pod to back the storage used by the volume. Any data written to the directory remains if a container in the pod is restarted.

Once the pod is deleted, the data in the volume is permanently deleted. It's worth noting that since the data is stored on a specific node, if a pod is rescheduled to a different node, that data will be lost. If the data is too valuable to lose when a pod is deleted or rescheduled, you should consider using persistent volumes. Persistent volumes are independent from the lifetime of pods and are separately managed by Kubernetes. They work a little bit differently.

Pods may claim a persistent volume and use it throughout their lifetime. The persistent volumes will continue to exist outside of their pods. Persistent volumes can even be mounted by multiple pods on different nodes if the underlying storage supports multiple readers or writers. Persistent volumes can be provisioned statically in advanced by a cluster admin or dynamic for a far more flexible self-serve use case.

Pods must make a request for storage before they can use a persistent volume. The request is made using a persistent volume claim, or PVC. A PVC declares how much storage the pod needs, the type of persistent volume, and the access mode. The access mode describes the persistent volume and whether it is mounted in read-only, read-write, or read-write many. There are three supported access modes to choose from, read-write once, read-only many, or read-write many. If there isn't a persistent volume available to satisfy the claim and dynamic provisioning isn't enabled, the claim will stay in a pending state until such persistent volume is ready. The persistent volume claim is connected to the pod by using a regular volume with the type set to persistent volume claim.

Both volumes and persistent volumes may be backed by a wide variety of volume types. It is usually preferable to use persistent volumes for more durable types and volumes for more ephemeral storage needs. Durable volume types include the persistent disks in many cloud vendors, such as Google Cloud Engine persistent Disks, Azure Disks, and Amazon Elastic Block Store. There's also support for more generic volume types, such as network file system or NFS and iSCSI. That is quite a lot to take in, but everything should solidify with an example.

So our objective is to use a persistent volume for the sample application's data tier, since we want the data to outlive the pod. In our example, the cluster has an Amazon Elastic Block Store volume statically provisioned and ready for us to use. To see dynamic provisioning in action, I'd encourage you to complete the lab on CloudAcademy entitled "Deploy a Stateful Application in a Kubernetes Cluster."

Before we get into volumes, I want to cement the issue we are trying to solve. We can illustrate the issue of a pod computer losing data when they restart by forcing a restart of the data tier pod. First off, let's look at the counter that's been running since our deployments lesson. That is the value of our counter at the moment, and every second, it will keep increasing.

Now, if I force the pod to be restarted, we can observe the impact on the counter. One way to do that is to kill the Redis process, which'll cause the data tier container to exist, and the data tier pod will automatically restart. We can use the exec command and run a command inside of the container the same way Docker exec does. So let's open a bash shell inside of that container.

The change of a command prompt tells us we're in the container now. We can now use the kill command to stop the main process of the container. But what is that ID? The ID of the main process, which is Redis in this case, will always be one, since it is the first process that runs inside of the container. We can see the command prompt change back, since the container terminated, so our shell was also terminated.

Now we can look at the counter value through the poller logs again and see that it is much lower than before because the data tier was completely wiped out when the pod restarted. This is what we want to avoid. So let's create our new namespace and get on to the volumes. Now on to the data tier. There are three additions to the manifest, a persistent volume, a persistent volume claim, and a volume to connect the claim to the pod.

First is our persistent volume. This is the raw storage where the data is ultimately written to by the pod's container. It has a declared storage capacity and other attributes. Here we've allocated one gibibyte. The access mode of read-write once means this volume may be mounted for reading and writing by a single node at a time. Note that it is a limit on the node attachment, not pod attachment.

Persistent volumes may list multiple access modes in the claim that specifies the mode it requires. The persistent volume can only be claimed in a single access mode at any time. Lastly, we have an AWS Elastic Block Store mapping, which is specific to the type of storage backed by the PV. You would use a different mapping if you were not using an EBS volume for storage. And the only required key for AWS's Elastic Block Store is the volume ID, which is uniquely identified by the EBS volume. It will be different in your environment than mine, so I've added an insert volume ID placeholder that we will place before we recreate the PV. 

Next, we have the persistent volume claim. The PVC spec outlines what it is looking for in a persistent volume. For a persistent volume to be bound to a PVC, it must satisfy all the constraints in the claim. We are looking for a persistent volume that provides the read-write once access mode and has at least 128 mebibytes of storage. The claim request is less than or equal to the persistent volume's capacity and the access mode overlaps with the available access modes in the persistent volume. This means that the PVC request is satisfied by our persistent volume and will be bound to it.

Lastly, the appointment's template now includes a volume which links the PVC to deployments pod. This is accomplished by using the persistent volume claim mapping and setting the claim name to the name of the persistent volume, which is data tier volume claim. You will always use persistent volume claim when working with PVs. If you wanted to use an ephemeral storage volume, you would replace it with an emptyDir mapping or other types that don't connect to a persistent volume.

Volumes can be used in the pods containers and init containers, but they must be mounted to be available in the containers. The volume mounts list includes all the volume mounts for given container. The mount pass for different containers could be different even if the volume is the same. In our case, we only have one, and we are mounting the volume at slash data, which is where the Redis is configured to store its data. This will cause all of the data to be written to the persistent volume.

Now we are left with replacing the volume ID placeholder with the actual ID of the Amazon EBS volume. You can get it from the EC2 console in your browser, but we'll just use the AWS CLI in this example. The volume can be obtained from the AWS EC2 describe command as follows. The full command is available to copy in the introduction to Kubernetes playground lab. The filter only selects the persistent volume which is labeled with the type equals PV tag, and the query outputs only the volume ID of the property volume.

The introduction to the AWS CLI on CloudAcademy explains this more so in greater detail. We only need to get the volume ID in this case. Then we can use the stream editor or set to substitute the occurrence of insert volume ID with the volume ID stored in vol_id. And with that, we are ready to create the data tier using a persistent volume. We'll also create the app and support tiers, which don't have anything new compared to the previous versions.

Let's get the persistent volume claim, which has a short name of PVC in kubectrl, to confirm the claims request is satisfied. The status of bound confirms that the persistent volume claim is bound to the persistent volume. Now, if we describe the data tier pod, we'll see that the pod initially failed to schedule because the claim needs to wait a while before it is bound to the persistent volume.

Once it is bound, the pod is scheduled and we can see a successful attachment with the volume event. Not only can our new design tolerate data tier pod container restart, but the data will persist even if we delete the entire data tier deployment. If everything goes to plan, we should be able to recover the Redis data if we then replace the deployment. That is because the deployment template is configured to use the PVC, and the PVC is still bound to the persistent volume storing our original Redis data.

So let's verify all of this. Before we delete the data tier deployment, let's get the last log line from the poller to see where our counter is at. If we delete the deployment and then replace it, we should see a number higher than this if the data is persistent. So let's do that. And now we're going to confirm that there are no data tier pods left running, and we'll now recreate the data tier deployment.

Now, it takes a couple of minutes for all the readiness checks to start passing again and for some old connections to time out. This is mainly a side effect of the example application not being particularly good at handling this type of situation. So after a minute or two, we can get the poller's last log, and voila, the counter has kept on ticking upward from where we left off before deleting the deployment.

Our persistent volume has lived up to its name. This concludes our lesson on volumes. We've covered volumes, persistent volumes, and persistent volume claims. In our example, we've shown how to use a persistent volume to avoid data loss by keeping the data independent from the life cycle of the pod or the pod's volume. We also saw how kubectrl exec allows us to run commands in existing containers when we demonstrated how container restarts cause data loss when volumes aren't used. You now have a solid foundation for volumes and persistent volumes.

In our next lesson, we're gonna be covering two other useful Kubernetes features that you should keep in your toolbox. They also have a nice tie-in with volumes. There's a few more lessons to go. Keep it up and I'll catch you in the next one.

About the Author
Students
28335
Courses
8
Learning Paths
2

Jonathan Lewey is a DevOps Content Creator at Cloud Academy. With experience in the Networking and Operations of the traditional Information Technology industry, he has also lead the creation of applications for corporate integrations, and served as a Cloud Engineer supporting developer teams. Jonathan has a number of specialities including: a Cisco Certified Network Associate (R&S / Sec), an AWS Developer Associate, an AWS Solutions Architect, and is certified in Project Management.