Docker has made great strides in advancing development and operational agility, portability, and cost savings by leveraging containers. You can see a lot of benefits even when you use a single Docker host. But when container applications reach a certain level of complexity or scale, you need to make use of several machines. Container orchestration products and tools allow you to manage multiple container hosts in concert. Docker swarm mode is one such tool. In this course, we’ll explain the architecture of Docker swarm mode, and go through lots of demos to perfect your swarm mode skills.
After completing this course, you will be able to:
- Describe what Docker swarm mode can accomplish.
- Explain the architecture of a swarm mode cluster.
- Use the Docker CLI to manage nodes in a swarm mode cluster.
- Use the Docker CLI to manage services in a swarm mode cluster.
- Deploy multi-service applications to a swarm using stacks.
This course is for anyone that is interested in orchestrating distributed systems at any scale. This includes:
- DevOps Engineers
- Site Reliability Engineers
- Cloud Engineers
- Software Engineers
This is an intermediate level course that assumes:
- You have experience working with Docker and Docker Compose
Swarm mode is made to be familiar to single host Docker users. When you deploy a service, it is similar to running a container. You can specify an image, volumes, networks, published ports, After all, service tasks ultimately run containers. But there are container orchestration features of swarm mode that are unique to running services in a swarm.
We'll look at the following orchestration features of swarm mode:
" (Service placement) Which nodes service tasks are placed on
" (Update behavior) how service updates are rolled out, and
" (Rollback behavior) how services can be rolled back to a previous version.
As we've discussed, services can declare a set number of replicas as a replicated service or can be started on every worker node in a cluster as a global service. For replicated services, decisions need to be made by swarm managers for where service tasks will be scheduled, or where the service will be placed. A replicated service's tasks will be spread across nodes by default. That is to promote high availability in case a node fails. But there are three ways that you can influence where a service is placed:
1. CPU and Memory reservations
2. Placement constraints
3. Placement preferences
You can specify each at service creation time. Global services can also be restricted to a subset of nodes with these conditions. Although a node will never have more than one task for a global service. Let's take a closer look at each.
CPU and Memory reservations
Similar to running individual containers, you can declare CPU and memory reservations for services. Each service task can only be scheduled on a node that has enough available CPU and memory to meet the given reservations. Any tasks that remain stay in a pending state until a node with sufficient resources becomes available. Global services will only run on nodes that meet a given resource reservation.
Setting sufficient memory reservations for services is important when there isn't an abundance of CPU and memory available for the applications you are running. If services attempt to use more memory than is available, the container or Docker daemon could get killed by the out of memory or OOM killer.
Placement constraints allow you to restrict the placement of tasks by providing equality and inequality conditions. The conditions compare node attributes to a string value. There are a few built-in attributes for each node
1. node.id matches the ID of a node
2. node.hostname matches a node's hostname
3. node.role matches a node's role, either manager or worker
You can also define your own labels. You can configure labels on a Docker engine or on a node. Engine labels are usually used to indicate things like operating system, system architecture, available drivers. An example is engine.labels.operatingsystem and values could be Ubuntu 14.04 or Windows Server 2016. Node labels are added by Swarm administrators for operational purposes. Node labels can indicate they type of application a node is intended to run, the datacenter location a node is in, the server rack a node is in, et cetera. An example is node.labels.datacenter and values could be north, south, east, or west.
When you provide multiple placement constraints for a service, all constraints must be satisfied by a node in order to be scheduled a service task. If resource reservations are also provided, all constraints and resource reservations must be met. This is true for replicated and global services.
Placement preference is not required as was the case for resource reservations and placement constraints. Instead, placement preferences influence how tasks are distributed across appropriate nodes. Currently the only distribution option is spread which will evenly spread tasks.
Labels are again used as the attribute for spreading tasks. For example, assume every node in a swarm has a datacenter label with either east or west as the value. Using the datacenter label and the spread placement preference, half of the tasks will be scheduled on east datacenter nodes and the other half on west datacenter nodes.
Multiple placement preferences can be specified. In this case a hierarchy of preferences is created. For example, if the first preference is datacenter and the second Is server-rack, tasks will be evenly spread across nodes in each datacenter, and within each datacenter tasks are spread evenly across racks.
Nodes that are missing a placement preference label are included in the spread and receive tasks in proportion equal to all other label values. They are treated as the group having the null value for the label. Placement preferences are ignored by global services.
That's all that there is to influencing service placement in swarm.
You can also configure the way that swarm applies updates to services. Swarm supports rolling updates where a fixed number of replicas are updated at a time until all service replicas have been updated.
You can configure several update parameters:
1. Update parallelism, which sets the number of tasks the scheduler updates at a time
2. Update delay, which sets the amount of time between updating sets of tasks, and
3. Update failure action, which can be set to pause, continue or automatically rollback if an update fails. The default is to pause.
These are the three main settings. There are also settings to configure what qualifies as failure. You can set a ratio for the number of failed task updates to tolerate before failing a service update, and set the frequency for monitoring for a failure.
These parameters give you some flexibility in how aggressively or conservatively you roll out an update to the swarm.
Rolling Back Updates
Docker swarm keeps track of the previous configuration for services. This allows you to rollback manually at any time or automatically when an update fails, as we discussed.
The same options available for configuring update behavior are available separately for configuring rollbacks. For example, rollback parallelism sets how many nodes to roll back at a time.
In this lesson, we saw how you can influence the nodes that swarm schedules services on by using resource reservations, placement constraints, and placement preferences. Resource reservations and placement constraints must be satisfied, while placement preferences won't prevent a task from being scheduled. We also discussed how rolling updates and rollbacks can be configured in Swarm. Updates and rollbacks share the same available configuration options.
In the next lesson, we'll see how swarm mode keeps a consistent view of the swarm. An important topic for any distributed system. When you are ready, continue on to the next lesson to learn about swarm mode consistency.
Logan has been involved in software development and research since 2007 and has been in the cloud since 2012. He is an AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, Microsoft Certified Azure Solutions Architect Expert, MCSE: Cloud Platform and Infrastructure, Google Cloud Certified Associate Cloud Engineer, Certified Kubernetes Security Specialist (CKS), Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), and Certified OpenStack Administrator (COA). He earned his Ph.D. studying design automation and enjoys all things tech.