Cloud Academy Office Hours: April 2017

In mid-April, we hosted the first Cloud Academy “office hours” webinar, an open Q&A session on all things cloud computing. It gave us a chance to address some of your most common questions and topics, from cloud computing in general to Docker, Kubernetes, AWS, and more. In all, we received 37 questions! (Thanks to everyone who participated and submitted questions!) In this post, I’m going to feature some of my favorite questions and elaborate on some of the topics discussed in the live event.

If you’d like to see more “office hours” type webinars and posts, please let us know in the comments. Now, let’s dive into the questions!

What’s the best way to learn how to use Docker and its capabilities?

Learning how to use Docker (and containers) is no different from learning any other technology. My preferred approach is to mix educational material, like courses and blog post tutorials, with hands-on experience building your own prototypes. This helps you understand the concepts and apply them to a problem that you’re familiar with. I recommend that you pick one of your own applications and get going. Combine that with my course on Docker fundamentals, then learn to deploy with Kubernetes. You can also go through the webinar back catalog for more great learning material.

What skills are required to learn Docker/Kubernetes? Are coding skills required?

Coding skills are not required to learn Docker or Kubernetes because, technically, you don’t need to write code to use either of them. However, the technology and its benefits will be lost on you if you have no prior experience building applications. Here’s an example: How will you know what to put in a Dockerfile if you’ve never prepared a development environment or compiled some software? The short answer is that you won’t. I suggest that you put containers and container orchestration further down your list until you’ve built and deployed a few applications. If you have a web application or two under your belt, then dive right in.

We are still at the virtualization state – what modifications do you suggest before we involve any container technology?

This depends on how stateful your virtualization-based solution is. The tools currently available for deploying containerized solutions are best suited for stateless, horizontally scalable application containers. They do support stateful applications, but that is a more difficult road to go down. Here is my recommendation: First, make sure that your current infrastructure is horizontally scalable. Next, move from running your processes directly (“bare metal”) inside the virtual machine (VM) to running them in containers inside the VM. This gives you a place to get acquainted with the technology before replacing the underlying infrastructure. Finally, replace your infrastructure and deployment process with a container orchestration solution.

Which container orchestration tool (e.g., DC/OS or Kubernetes) is best suited for serverless microservices?

First, I need to unpack the question a bit. Container orchestration software does not generally apply to serverless solutions, because serverless and container solutions focus on different artifacts. Consider AWS Lambda or Azure Functions: you publish individual functions and the cloud provider runs them for you. Containers, on the other hand, focus on process isolation and dependency management. There are some projects that run something like “functions as a service” on top of container orchestration platforms like Kubernetes. This is possible because an individual function can be executed in a container. These are very early days, though, and I suspect that supporting this type of application is not a core focus of the container orchestration projects right now.

Now, let’s talk about microservices applications deployed as containers. Remember that microservices do not mandate containers, although it is safe to assume that association in the current climate. Personally, I don’t think that one orchestration tool is technically better than another for microservices. The best one comes down to your own preferences and unique technical requirements.

What is the best way to monitor your containers and what is the difference between monitoring containers vs. virtual machines?

I’m excited that this question came up because it’s so common and monitoring is an important part of any production application. Here, I’ll elaborate on the answer given in the session. The best way to monitor containers is to apply the existing technologies and methodologies for monitoring cloud-based systems. Odds are that you have experience with something like Nagios, collectd, Ganglia, or any system that collects data via an agent. In my experience, this is the most common and easiest approach.
Alternatively, you can use a pull-based system like Prometheus to scrape metrics from target systems. Either way, at this point you have data. Now, ship it off to a metrics backend such as Graphite, or to a SaaS like Librato. From there, work with it just like any other data. The point is that the technical approach does not change much; only the scope of your data does.
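To make the “collect, then ship” pipeline concrete, here is a minimal sketch in Python. It assumes the scraped target exposes a small subset of the Prometheus text format (one `name value` or `name{labels} value` per line, no timestamps, no spaces inside label values) and renders the samples in Graphite’s plaintext protocol (`metric.path value timestamp`). The function names and the `web01` prefix are hypothetical, and labels are simply dropped; a real exporter or agent would handle far more of the format.

```python
import time


def parse_exposition(text):
    """Parse a minimal subset of the Prometheus text exposition format.

    Assumptions for this sketch: each sample line is `name value` or
    `name{labels} value`; lines with trailing timestamps are not supported,
    and labels are discarded (samples sharing a name overwrite each other).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        name_part, value = line.rsplit(" ", 1)
        name = name_part.split("{", 1)[0]  # strip the label set, if any
        samples[name] = float(value)
    return samples


def to_graphite(samples, prefix, timestamp=None):
    """Render samples as Graphite plaintext lines: `prefix.name value timestamp`."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "".join(
        f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(samples.items())
    )
```

Shipping the result is then just writing those lines to a TCP socket on the Graphite host (the plaintext listener conventionally uses port 2003), for example with `socket.create_connection`. Swapping Graphite for a SaaS backend changes only this last step.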

Let’s assume for a moment that you have a VM running somewhere. You can collect data about that VM from the hypervisor itself or from an agent running in the VM. Now, assume that there is a container running in that VM. A container is just another running process on your system.

How would you monitor something like Apache? You might watch CPU usage, memory, or I/O metrics. When Apache runs in a container, these metrics are available directly from the Docker daemon, so an agent can connect to the daemon and ship the data off. The difference compared to VMs lies in how dynamic your overall infrastructure is. If a new VM comes up, it must be monitored. The same goes for containers: if a new container comes up, it should be monitored as well.
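As an illustration of turning daemon data into a usable metric, here is a small Python sketch that computes a container’s CPU percentage from one JSON document returned by the Docker Engine API stats endpoint (`/containers/{id}/stats`). It follows the calculation described in the Engine API documentation: the change in the container’s CPU time divided by the change in host CPU time, scaled by the number of online CPUs. The function name is hypothetical, and the sketch assumes both the current and previous (`precpu_stats`) samples are populated.

```python
def cpu_percent(stats):
    """CPU usage (%) from a single Docker stats JSON document (already parsed to a dict).

    Assumes both `cpu_stats` and `precpu_stats` are filled in; the very first
    sample of a container may not have a usable `precpu_stats`.
    """
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    # Nanoseconds of CPU time the container used between the two samples.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    # Nanoseconds of CPU time the whole host used between the two samples.
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if system_delta <= 0:
        return 0.0  # no elapsed host time, so no meaningful percentage
    # Prefer online_cpus; fall back to the per-CPU list length, then to 1.
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return cpu_delta / system_delta * online * 100.0
```

An agent would fetch this document for every running container, so newly started containers are picked up automatically as long as the agent re-lists containers on each cycle.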

It’s easy to automatically collect CPU, memory, and disk metrics. You will need to put more effort into deciding how to monitor the processes themselves. Consider a web server running in a container. You may want to continuously check that the server handles requests. Does this check go into your monitoring tool, or do you leverage the health checks built into Docker? How do new containers automatically pick up the specific types of monitoring they need? These are the types of questions you need to ask, depending on how dynamic your solution is. Your answers will show you how monitoring containers differs from monitoring virtual machines in your environment.
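To show what a process-level check involves, here is a simplified Python model of an HTTP health probe plus the status bookkeeping around it. It borrows the idea behind Docker’s `HEALTHCHECK`: a container starts as `starting`, becomes `healthy` after a passing check, and becomes `unhealthy` after `retries` consecutive failures (Docker’s default is 3). This is a sketch, not Docker’s implementation: it ignores intervals, timeouts per attempt, and the start period, and the class and function names are hypothetical.

```python
import urllib.request
from dataclasses import dataclass


def probe(url, timeout=2.0):
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):  # connection errors, HTTP errors, timeouts, bad URLs
        return False


@dataclass
class HealthTracker:
    retries: int = 3          # consecutive failures before reporting "unhealthy"
    status: str = "starting"  # initial state, before any check has passed
    failing_streak: int = 0

    def record(self, passed: bool) -> str:
        """Update the status with the result of one check and return it."""
        if passed:
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status
```

A monitoring loop would call `tracker.record(probe(url))` on a timer. Whether that loop lives in your monitoring tool or in Docker itself is exactly the design question raised above.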

Among the currently available orchestration tools, is there a clear market leader?

I don’t have any hard numbers to declare the market leader in terms of the number of deployed applications. However, Kubernetes is definitely the market leader by community size.

Networking in Docker: How to persist data in Docker and how do containers in Docker talk to each other?

Let’s unpack this into two questions: 1) How do you persist data with Docker containers? 2) How do Docker containers talk to each other? I’ll start with #1 because it’s shorter and easier to answer. You should use Docker volumes for persistent data. Volumes exist independently of containers and may be reused across different containers.

Question #2 is more complicated. I’ll do my best to cover the high-level points, with a bit of hand-waving. (The Docker networking guide fills in everything behind the hand-wavy magic.) Broadly speaking, Docker networking operates at two different levels: host and non-host networks. Containers on the host network behave just like any other process running on the Docker host. You can reach the ports they listen on via the host’s IP or hostname.

Non-host networks are more powerful and more complicated. Modern Docker versions use SDN (software-defined networking) to accommodate many different use cases. In practice, this boils down to creating a network for all of the containers in an application and giving each container a name. Each container on that network can then resolve the others by hostname (via DNS). Docker takes care of details such as IP assignment and name resolution.
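From the application’s point of view, name-based discovery is just an ordinary DNS lookup. The Python sketch below resolves a peer container by name; it assumes a hypothetical database container named `db` listening on port 5432 and a hypothetical `PEER_HOST` environment variable that lets you override the name (for example, to `localhost` when running outside Docker).

```python
import os
import socket


def resolve_peer(default_host="db", port=5432):
    """Resolve a peer service by name and return its IP addresses.

    On a user-defined Docker network, the container name (here the
    hypothetical "db") is answered by Docker's DNS; PEER_HOST is a
    hypothetical override for running the same code elsewhere.
    """
    host = os.environ.get("PEER_HOST", default_host)
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

On a Docker host you might create such a network with `docker network create app-net` and start the database with `docker run -d --network app-net --name db postgres`; any other container started with `--network app-net` could then reach it simply as `db`. The network name `app-net` is illustrative.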

This topic gets much more complicated when you talk about cross-host networking, such as multiple Docker hosts in a Docker Swarm cluster. I won’t touch on that because it’s outside my area of expertise. If you want to learn more, I recommend checking the networking guides for each orchestration platform.

Are containers a threat to virtual machines?

This post gives me the opportunity to refine my answer from the webinar. I see containers replacing VMs as the unit for deploying applications. However, they are not at all a threat to virtualization as a technology. That isn’t even possible, since the two solve completely different problems.

Here’s an example. Our previous deployment infrastructure at Saltside used golden AMIs. Each commit built an AMI containing the code at that commit’s SHA along with all run-time dependencies. The AMI also included all of the monitoring agents and the various other things we needed. We then put that AMI into an auto-scaling group behind a load balancer. This was simple and worked well for years. However, it’s not the most resource-efficient approach, because each EC2 instance ran a single application process even though it could have run several. Eventually, we moved to containers, which allowed us to pack more processes onto a single machine. This did not replace virtual machines; it just changed what we did with them.
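To illustrate why packing processes onto fewer machines saves resources, here is a Python sketch of first-fit decreasing bin packing, a classic heuristic for this kind of placement problem. The demands and host capacity are hypothetical units (say, vCPUs), and this is not a description of how Saltside or any particular orchestrator schedules containers; real schedulers weigh many more constraints.

```python
def first_fit_decreasing(demands, capacity):
    """Place process resource demands onto hosts using first-fit decreasing.

    `demands` and `capacity` are in the same hypothetical unit (e.g. vCPUs).
    Returns a list of hosts, each a list of the demands placed on it.
    """
    hosts = []
    # Placing the largest demands first tends to leave fewer awkward gaps.
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds host capacity {capacity}")
        for host in hosts:
            if sum(host) + demand <= capacity:
                host.append(demand)
                break
        else:
            hosts.append([demand])  # no existing host has room: add a new one
    return hosts
```

With six processes needing 1, 1, 1, 1, 2, and 2 units and hosts of capacity 4, the one-process-per-instance model uses six instances, while this packing fits everything onto two: `[[2, 2], [1, 1, 1, 1]]`.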

Containers are not intended to run multiple processes; virtual machines are. VMs are not going anywhere. We’ll always need VMs to build and scale infrastructure, regardless of whether that infrastructure runs standard processes, “serverless” functions, or containers.

How do you see Docker or Kubernetes or any containers in the next two years or so?

Cool, another personal opinion question! I predict a world that focuses on orchestration rather than runtimes, and we are in the middle of that transition. During the development phase, the industry focused on the runtime because that is what developers interact with. We’ve now moved beyond that phase because teams are more interested in deploying the containerized applications they’ve built.

This is where orchestration tools come into the picture. Kubernetes (my preferred tool) is actively working to define interfaces, such as the Container Runtime Interface (CRI), for the different components in its internal stack. This makes things like networking and the runtime interchangeable. Hopefully, in the next few years, we will be more focused on how we deploy containerized applications than on how they’re run behind the scenes.
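The value of a defined interface is that the orchestrator only talks to the interface, so the implementation behind it can be swapped. The Python sketch below illustrates that idea with a `typing.Protocol`. Everything in it is hypothetical and heavily simplified: the real CRI is a gRPC API with a much richer surface, and `ContainerRuntime`, `FakeRuntime`, and `Orchestrator` are illustrative names only.

```python
from typing import List, Protocol


class ContainerRuntime(Protocol):
    """Hypothetical, minimal stand-in for a runtime interface such as CRI."""

    def start(self, image: str) -> str: ...

    def stop(self, container_id: str) -> None: ...


class FakeRuntime:
    """A toy runtime that only records what it was asked to run."""

    def __init__(self, name: str):
        self.name = name
        self.running = {}  # container id -> image
        self._counter = 0

    def start(self, image: str) -> str:
        self._counter += 1
        container_id = f"{self.name}-{self._counter}"
        self.running[container_id] = image
        return container_id

    def stop(self, container_id: str) -> None:
        self.running.pop(container_id, None)


class Orchestrator:
    """Depends only on the interface, never on a concrete runtime."""

    def __init__(self, runtime: ContainerRuntime):
        self.runtime = runtime

    def deploy(self, image: str, replicas: int) -> List[str]:
        return [self.runtime.start(image) for _ in range(replicas)]

    def teardown(self, container_ids: List[str]) -> None:
        for container_id in container_ids:
            self.runtime.stop(container_id)
```

Because `Orchestrator` never imports a concrete runtime, replacing `FakeRuntime("a")` with any other object that has `start` and `stop` methods requires no change to the deployment logic, which is the kind of interchangeability described above.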

We’ve seen this before with VMs. These days, we don’t care about the VM technology itself; we just say “give me a VM.” My guess is that, in the next couple of years, Docker (or Moby, or whatever it’s been rebranded as) will become less and less important. I also predict that the community will develop a “functions as a service” solution built on top of container orchestration.

What’s your preferred stack?

These days I work with multi-service applications rather than monoliths. My preferred approach is to keep each application in its own code repo and keep a mono-repo for “releases” of all applications. A “release” is a change in the configuration or code of any of the applications. This repo is packaged as a Helm chart and deployed to Kubernetes. Application-level concerns, such as which language or web framework to use, become irrelevant to the underlying infrastructure. Application developers can use whatever technology they like, and everything is deployed the same way (containers via Kubernetes). Choices like which monitoring system to use are context specific.
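One way to picture the release mono-repo is as a single map from application name to that application’s image tag and configuration, similar in spirit to a Helm values file. The Python sketch below assumes that hypothetical structure and computes which applications a new release actually touches. The app names, tags, and keys are illustrative only, not the layout of any real repo.

```python
def changed_apps(previous, current):
    """Return the names of apps whose image or config differs between two releases.

    Each release is a hypothetical dict: app name -> {"image": ..., other config}.
    Apps added or removed between releases count as changed.
    """
    names = set(previous) | set(current)
    return sorted(name for name in names if previous.get(name) != current.get(name))


previous_release = {
    "web": {"image": "web:abc123", "replicas": 3},
    "api": {"image": "api:def456", "replicas": 2},
}
current_release = {
    "web": {"image": "web:abc123", "replicas": 3},
    "api": {"image": "api:789fed", "replicas": 2},  # new code for api
    "worker": {"image": "worker:111aaa", "replicas": 1},  # newly added app
}
```

Here `changed_apps(previous_release, current_release)` reports `api` and `worker`. Because a code change and a config change both show up as a diff in the same file, every release, whatever it touches, goes through one deployment path.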

That’s a wrap on this post. I hope it clarifies some of my answers in the session. Stay tuned to the Cloud Academy blog and webinars for more helpful content on Docker, Kubernetes, AWS, and all things cloud.

Cloud Academy