Introduction to Persistent Storage Options in Docker
Difficulty
Intermediate
Duration
1h 38m
Students
19936
Ratings
4.8/5
Description

In this lesson, we are going to cover persistent storage options.

We will start by discussing the layers of each Docker image, particularly the last layer, which is the writable one.

You will learn about the three ways that Docker handles persistent storage:
- Bind Mounts: work by mounting a file or directory that resides on the host, inside the container. You will learn about the positives and negatives of using bind mounts.
- Volumes: the preferred way to handle persistent file storage. You will learn about the different drivers that volumes can use, such as Amazon S3, Google Cloud Storage, and more.
- In Memory: often referred to as tmpfs (temporary file systems). These are not persistent, so they are typically used for holding sensitive information like access tokens.

You will use a simple application to test several bind mounts and volumes to better understand the concepts behind them.

Then, we will go through a process to recreate some of the same output as with the bind mounts and volumes, but these will only exist while the container is running.

Finally, we will start and stop several containers to showcase the different persistent storage options. You will be able to make an educated decision the next time you need to choose a storage option.

Transcript

Welcome back! Now, if all of your applications were stateless and never needed data storage, life would be much simpler. However, in the real world we need data storage. We've talked about the different layers that make up a Docker image. The last layer is a writable layer, which means applications that need write access will work just fine.

However, as soon as you stop the container, whatever you've written there is gone, which means the writable layer is not an effective way to handle persistent storage. Luckily, Docker provides three options: bind mounts, volumes, and an in-memory option called tmpfs, as in temporary file system.

Bind mounts have been around for a while. They work by mounting a file or directory that resides on the host, inside the container. This remains an effective mechanism that allows you to access files from the host inside the container. And once the container stops, the data remains because it lives on the host.

The downside here is that bind mounts aren't as decoupled from the host as you might like. You need to know the exact path on the host that you want to mount in the container. The upside is that this can work well for development, because you don't need to rebuild the image to access new source code; you make changes to your source and they're reflected immediately.

Docker still supports bind mounts because they work well. However, the preferred way to handle persistent file storage is with volumes. Volumes are basically just bind mounts, except that Docker manages the storage on the host. So you don't need to know the fully qualified path to a file or directory. This makes it easier when working cross platform, because Docker handles the volume.

Volumes aren't limited to the local host file system either, they allow you to use different drivers. The drivers support the use of external storage mechanisms such as Amazon's S3, Google Cloud Storage, and more. When you stop a container using volumes or bind mounts, the data remains on the host. In contrast, the third storage option, tmpfs, is different.
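As an illustration only, a driver-backed volume is created and used along these lines. The plugin name, bucket option, volume name, and image name below are all placeholders, and the exact options depend entirely on the plugin you install:

    docker plugin install <some-s3-plugin>                        # install a volume plugin (hypothetical name)
    docker volume create --driver <some-s3-plugin> remote-logs    # options such as --opt bucket=... vary by plugin
    docker run --mount type=volume,source=remote-logs,destination=/logs myimage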

Tmpfs, short for temporary file system, is an in-memory file system. From inside the container you still interact with the files the same way you would any other file. The difference is that tmpfs is not persistent. Because tmpfs only provides file system access for the life of the running container, it's commonly used to hold sensitive information, such as access tokens.

Let's try these out. To test out bind mounts and volumes, I've created an app in Go that will loop 50 times and it will write the host name and the loop counter to a file. The file is specified as a command line argument so you can pass in any file you want. So this will allow us to write some data to the volumes from multiple containers.

I've already compiled this, and the binary is in the same directory as the Dockerfile. The Dockerfile for this demo is very basic. It's based on the scratch image, it copies the binary to the root directory, and finally it sets the default command to run the binary and pass in the path where we want the data to be written.
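The file itself isn't shown in the transcript, but based on that description it would look something like this minimal sketch, where the binary name app is my own placeholder:

    FROM scratch
    COPY app /app
    CMD ["/app", "/logs/myapp"]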

So that's our volume: the /logs directory is going to be the mounted directory, and the name myapp is just an arbitrary file name I've made up, so it has no real meaning. Here in the terminal, you can see there are no containers, and here's the list of existing images. In this directory you can see the Dockerfile and binary.

Let's test out bind mounts first, and for that we need a directory to mount. So I've created a directory under /var/demo/logs, and you can see here that it's empty. Okay, let's build the image that we'll use for the demo. Let's call it scratch_volume. Okay, that didn't take long, and there it is in the list.
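If you're following along, the setup amounts to a few standard commands, using the paths and names from the demo:

    sudo mkdir -p /var/demo/logs        # host directory we'll bind mount
    ls /var/demo/logs                   # confirm it's empty
    docker build -t scratch_volume .    # build the image from the Dockerfile above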

So now, I want to paste in a command that I have copied to my clipboard. Most of the parameters used here have been used throughout the course. With the exception of the mount flag. This is the most current way to specify different storage options, since it's the most flexible. Notice that the type is set to bind.

Then you need to specify the directory source and the destination. The source is the directory on the host you want to mount, and the destination is the path inside the container where it will be mounted. In this example the /var/demo/logs directory will be available inside the container at /logs. Recall that this image is going to write some data to a file inside of the mounted /logs directory.
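The pasted command isn't reproduced verbatim in the transcript, but it's roughly this, using the demo's paths and image name:

    docker run -d \
      --mount type=bind,source=/var/demo/logs,destination=/logs \
      scratch_volume

Running that same command a few more times starts additional containers that all write to the same host directory.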

So let's start up a few containers, because I want to show how multiple containers will have access to the same directories and files. Okay, now that this has been running for a moment, there should be some data that has been written from the container to the bind mount. Here you can see the last 30 lines written to the myapp file that the application created.

If I print out the unique host IDs in the file, here we go, you can see there are four entries. If I compare them against the container IDs you can see that each container was able to write to the same shared file. Remember that when using a bind mount, the directories and files are managed by you and not Docker.
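A rough version of those checks, assuming the host ID is the second space-separated field on each line:

    tail -n 30 /var/demo/logs/myapp                      # last 30 lines the containers wrote
    cut -d' ' -f2 /var/demo/logs/myapp | sort | uniq     # unique host IDs in the file
    docker ps --format '{{.ID}}'                         # container IDs, for comparison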

This means that the listing here for volumes will show zero results. However, if you list the files under /var/demo/logs, you're going to see the myapp file is still available to use on the host. Okay, let's prune these containers so we can clean things up a bit, and there we go. Now let's try the volume.

Here's roughly the same command as before except the type is now set to volume. Also with a bind mount the source is a fully qualified path, and now it's not. Since Docker manages it, you can just use a name for the volume. Docker allows you to create the volume with the docker volume command. However, if you use the mount flag with a type set to volume, and the volume doesn't exist, it's going to automatically be created for you.
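Again as a sketch, with the same flags as before and the source now just a name:

    docker run -d \
      --mount type=volume,source=logs,destination=/logs \
      scratch_volume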

Notice how it shows up under the volumes as a local volume and it's named logs. Docker manages the location of the volume, which you can find by using the docker volume inspect command. Notice here that the mount point is under /var/lib/docker/volumes. Let's start up a few more containers, okay great! Now we can use tail to show the results streaming in.
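To see where that volume lives on the host:

    docker volume ls              # shows the local volume named logs
    docker volume inspect logs    # the Mountpoint field is under /var/lib/docker/volumes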

Let's use the tail command and refresh every second. Notice all of the results from the different hosts are being appended. Let's once again display all of the unique IDs to see if all of the containers were able to write their data successfully. Let's cat the file, cut based on a space, grab the second field, pipe through sort, and cap it off by piping through uniq.
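Roughly, assuming the default Linux mountpoint and the same field layout as before:

    sudo watch -n 1 tail /var/lib/docker/volumes/logs/_data/myapp      # refresh the tail every second
    sudo cat /var/lib/docker/volumes/logs/_data/myapp | cut -d' ' -f2 | sort | uniq
    docker ps --format '{{.ID}}' | sort                                # sorted container IDs for comparison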

Okay, and you can see we have 4 IDs. Listing off the existing containers you can see it's the same IDs. With a bit more bash scripting we can sort just the container IDs and there you can see the IDs are in the same order now. Just makes it easier to compare. So what does all of this actually show? Both bind mounts and volumes are roughly the same.

However, volumes are managed by Docker. Both allow multiple containers to access the same mounts. Both keep the data on the host after the containers are stopped or removed. Let's check out the final storage type, which is the temporary file system, tmpfs. Tmpfs is similar to the others in that you can create it with the mount flag.

However, since it's only creating an in-memory construct, you don't need to specify a source directory, because there's nothing on the host that's going to persist that info. For this demo we'll use the ubuntu image so we can create an interactive bash session. The type here is tmpfs and the destination is set to /logs.

So inside the container we'll have a directory under /logs that behaves just like the other storage options, except that it only exists while the container is running. There's nothing here under /logs, so let's add a file. We can just echo some text and redirect it to a file called demo.
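The tmpfs demo boils down to something like this, where the text written to the file is my own placeholder:

    docker run -it --mount type=tmpfs,destination=/logs ubuntu bash
    # inside the container:
    ls /logs                          # empty
    echo "tmpfs demo" > /logs/demo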

Now, I'm going to exit the container and keep it running with CTRL+p followed by CTRL+q. You can see that based on the listed containers that it's still running. If I list off the volumes, you can see that it doesn't show any new volumes. I show this to demonstrate that all three storage types use the same mount flag, however, they're all implemented in a different way.

If we start a new Ubuntu container with a tmpfs mount, we can see that unlike bind mounts and volumes, tmpfs is container specific. Let's list off the files under /logs, and it's empty. Let's reattach to the container with the ID starting with 4d. Alright, if we look at the logs directory and print the demo file to screen, you can see that the demo file text is there and the file still exists.

That's because the container hasn't been stopped, so while it's running that information is available. So now let's stop the container by exiting out. Okay, listing off the containers, you can see that it's stopped, and there it is. Now let's start the container again, okay. And now let's attach to the container. And here we are, back at the bash prompt.
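Sketching out that detach/stop/start cycle with standard commands, where the ID prefix 4d is the one from the demo:

    # detach without stopping: press CTRL+p, then CTRL+q
    docker ps                  # the container is still running
    docker volume ls           # no new volumes were created for the tmpfs mount
    docker attach 4d           # reattach; /logs/demo is still there
    exit                       # exiting the shell stops the container
    docker ps -a               # shows it as exited
    docker start 4d
    docker attach 4d
    ls /logs                   # check the contents after the restart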

And if we list off the contents of the logs directory, you can see that it's now empty. Tmpfs is a great option when you need file system access that's isolated to the container. So it's really great for sensitive information like access tokens, or any other sensitive data your app might need only while the container is running.

Storage is a crucial part of most applications, and that's why Docker provides you options. Each of these is a viable option, depending on the use case. However, if you're not sure which type to use, then volumes are probably the safest bet. Alright, let's wrap up here. In the next lesson, we're going to learn more about tagging.

So, if you're ready to keep learning, then I'll see you in the next lesson!

About the Author
Students
95980
Labs
28
Courses
46
Learning Paths
54

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.