Container orchestration is a popular topic at the moment because containers can help to solve problems faced by development and operations teams. However, running containers in production at scale is a non-trivial task. Even with the introduction of orchestration tools, container management isn’t without challenges. Container orchestration is a newer concept for most companies, which means the learning curve is going to be steep. And while the learning curve may be steep, the effort should pay off in the form of standardized deployments, application isolation, and more.
This course is designed to make the learning curve a bit less steep. You'll learn how to use Marathon, a popular orchestration tool, to manage containers with DC/OS.
Learning Objectives
- You should be able to deploy Mesos and Docker containers
- You should understand how to use constraints
- You should understand how to use health checks
- You should be familiar with App groups and Pods
- You should be able to perform a rolling upgrade
- You should understand service discovery and load balancing
Intended Audience
- Sysadmins
- Developers
- DevOps Engineers
- Site Reliability Engineers
Prerequisites
To get the most from this course, you should already be familiar with DC/OS and containers and be comfortable with using the command line and with editing JSON.
Topics
Lecture | What you'll learn |
---|---|
Intro | What to expect from this course |
Overview | A review of container orchestration |
Mesos Containers | How to deploy Mesos containers |
Docker Containers | How to deploy Docker containers |
Constraints | How to constrain containers to certain agents |
Health Checks | How to ensure services are healthy |
App Groups | How to form app groups |
Pods | How to share networking and storage |
Rolling Upgrades | How to perform a rolling upgrade |
Persistence | How to use persistent storage |
Service Discovery | How to use service discovery |
Load Balancing | How to distribute traffic |
Scenario | Tie everything together |
Summary | How to keep learning |
If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.
Welcome back! In this lesson, we're going to see how to set attributes on agents and then use constraints to determine where an app can and can't run. One reason you might use constraints is to distribute apps across different racks. This is similar to Azure's fault domains, if you're familiar with that concept.
The general idea is to distribute your apps so that if an entire server rack goes down, the app still has instances running on other racks. Another use might be to ensure that only one instance of an app runs on a given agent, or that certain apps only run on agents with specific hardware.
For example, maybe you want to run on agents with SSDs or GPUs. Now, I have six agents in my cluster: one public and five private. For this demo, let's imagine that each private agent is running on a different server rack. Let's set an attribute on one of the agents that marks it as rack one.
Before connecting to the private agent to set this up, I want to show you the node details page so you can see where the attributes appear in the UI. Here on the Nodes page there are six nodes. Let's drill into the first one and show the details. Okay, there's no attributes header here, just the two other headers.
If a node has attributes, they're listed under an attributes section. Let's check out the next node. Okay, this one is similar to the first. On to the next, and again there are no attributes. All right, how about the next one? Perfect, there it is. This node, with an IP address ending in 111, has a public_ip attribute set to true. As you may have guessed, this is the public node, which means all the other nodes are private.
We know the first few nodes in the list are private, so let's select the first one; that's where we'll add the attribute. Private agents are behind a firewall, and port 22 isn't exposed to the world. In fact, the private agents don't even have a public IP address. Still, we need to connect to this node to add the attribute.
No worries: we can use SSH agent forwarding to solve this, and the easiest way is with the dcos node ssh command. To connect to a private node you need its ID, so let's copy that and head over to the terminal. The command is dcos node ssh followed by the --mesos-id flag, where I'll paste the ID, and then the --master-proxy flag, which connects to the private agent through the master. And we're in, nice.
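If you'd rather script the connection than retype it, here's a minimal Python sketch that builds the same dcos node ssh command. The helper names are hypothetical; it assumes the dcos CLI is installed and attached to your cluster, and that your SSH key is loaded into ssh-agent (for example with ssh-add) so it can be forwarded through the master.

```python
import subprocess
from typing import List


def node_ssh_argv(mesos_id: str) -> List[str]:
    """Build the `dcos node ssh` argv for reaching a private agent.

    --master-proxy tunnels the session through the leading master, since the
    private agent has no public IP, and --mesos-id selects the agent by the ID
    copied from its details page in the UI.
    """
    if not mesos_id:
        raise ValueError("mesos_id must be a non-empty agent ID")
    return ["dcos", "node", "ssh", "--master-proxy", f"--mesos-id={mesos_id}"]


def open_agent_shell(mesos_id: str) -> int:
    """Start the interactive session and return the CLI's exit code.

    Assumes the dcos CLI is on PATH and your key is loaded into ssh-agent.
    """
    return subprocess.run(node_ssh_argv(mesos_id)).returncode
```

Calling open_agent_shell with the ID you copied from the UI drops you into the same shell shown in the demo.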
Poking around, you can see from whoami that this is the core user. Running uname -a gives some clues about which cloud provider this is running on, and shows that this is CoreOS. To add the attribute, we need to change into the /var/lib/dcos directory. The file we need to add the attributes to is called mesos-slave-common, but as you can see, it doesn't exist here.
So let's create it and set the attribute at the same time by echoing the attribute into the file, and there we go. Let's verify it's set with the cat command and, perfect, it's exactly what we'd expect. Before the attribute will be reflected on the node, we need to remove the agent's saved state from its previous registration (the latest directory under /var/lib/mesos/slave/meta/slaves) and then restart the service.
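The line written to mesos-slave-common follows the pattern MESOS_ATTRIBUTES= followed by key:value pairs, with pairs separated by semicolons. Here's a minimal Python sketch that renders that line and writes the file, mirroring the echo and cat steps. The function names and the second "disk" attribute in the usage are hypothetical, and writing the real path on an agent requires root.

```python
from pathlib import Path
from typing import Mapping

# File the lesson creates on the agent; writing it there requires root.
MESOS_SLAVE_COMMON = Path("/var/lib/dcos/mesos-slave-common")


def attributes_line(attrs: Mapping[str, str]) -> str:
    """Render attributes as MESOS_ATTRIBUTES=key:value;key:value."""
    for key, value in attrs.items():
        # ':' joins a key to its value and ';' separates pairs, so neither
        # character may appear inside a key or a value.
        if not key or any(ch in key + value for ch in ":;"):
            raise ValueError(f"invalid attribute {key!r}: {value!r}")
    pairs = ";".join(f"{key}:{value}" for key, value in attrs.items())
    return f"MESOS_ATTRIBUTES={pairs}"


def write_attributes(attrs: Mapping[str, str], path: Path = MESOS_SLAVE_COMMON) -> str:
    """Overwrite `path` with the attribute line (like `echo ... > file`) and
    return what was written, mirroring the `cat` check in the lesson."""
    path.write_text(attributes_line(attrs) + "\n")
    return path.read_text()
```

For the demo agent, attributes_line({"rack": "1"}) gives MESOS_ATTRIBUTES=rack:1; with a hypothetical second attribute, {"rack": "1", "disk": "ssd"} gives MESOS_ATTRIBUTES=rack:1;disk:ssd. Note that write_attributes replaces any existing content, just as echo with > does.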
To remove that directory and restart the service, I'm going to paste in two commands. The first removes the latest directory, and the second restarts the Mesos agent service (dcos-mesos-slave), which also recreates the latest directory we just deleted. Now, if we head back to the UI, we should see the attribute, and there it is.
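The two pasted commands aren't shown on screen, so here's a hedged Python sketch of what they amount to, based on the path and service from this lesson. The helper names are hypothetical; dcos-mesos-slave is the systemd unit on private agents and dcos-mesos-slave-public on public agents. As far as I know, removing latest also means the agent re-registers under a new agent ID, so don't be surprised if the node's ID in the UI changes.

```python
import subprocess
from typing import List

# Entry removed in the lesson so the agent picks up its new attributes.
LATEST = "/var/lib/mesos/slave/meta/slaves/latest"


def refresh_commands(public_agent: bool = False) -> List[List[str]]:
    """Commands to run on the agent, in order, after editing the file."""
    unit = "dcos-mesos-slave-public" if public_agent else "dcos-mesos-slave"
    return [
        ["sudo", "rm", "-f", LATEST],                # drop the previous agent state
        ["sudo", "systemctl", "restart", unit],      # restart the Mesos agent
    ]


def refresh_agent(public_agent: bool = False) -> None:
    """Run the commands on the agent itself; stops at the first failure."""
    for argv in refresh_commands(public_agent):
        subprocess.run(argv, check=True)
```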
So this node is now considered rack one according to its attributes, and the other private nodes don't have any attributes. Let's deploy an app to test this out. Take a look at this JSON; it should look familiar. It's the first app created in the course, and the only difference is that it now has a constraint.
The constraint is an array with three values: the first is the attribute, the second is the operator (here, UNLIKE), and the third is the value. Written out as an English statement, it says something like: deploy hello world to an agent where the rack attribute isn't one.
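Since the course's original JSON isn't reproduced here, this Python sketch builds a comparable app definition with the same [attribute, operator, value] constraint and prints it as JSON. The id, cmd, resources, and instance count are hypothetical placeholders, and the constraint helper is illustrative; it only knows the operators covered in this lesson.

```python
import json
from typing import List, Optional

# Operators covered in this lesson; UNIQUE is used without a value.
OPERATORS = {"UNIQUE", "CLUSTER", "GROUP_BY", "LIKE", "UNLIKE"}


def constraint(attribute: str, operator: str, value: Optional[str] = None) -> List[str]:
    """Build one Marathon constraint: [attribute, operator(, value)]."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator {operator!r}")
    return [attribute, operator] if value is None else [attribute, operator, value]


# Hypothetical app: id, cmd and resources are placeholders, not the
# course's original JSON. The constraint reads "rack isn't 1".
app = {
    "id": "/hello-world",
    "cmd": "echo 'hello world' && sleep 3600",
    "cpus": 0.1,
    "mem": 32,
    "instances": 3,
    "constraints": [constraint("rack", "UNLIKE", "1")],
}

print(json.dumps(app, indent=2))
```

Saving the printed JSON to a file such as hello-world.json lets you deploy it with dcos marathon app add hello-world.json. For the rack-one-only app later in the lesson, swapping UNLIKE for LIKE is the only change to the constraint.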
So this constraint ensures the app won't run on the agent we just tagged with rack equals one. Let's create the app and see how it works. It's the same command you've seen a dozen times: dcos marathon app add. In the UI under Services, there's the service, and if we drill in we can see the 30 instances, along with the nodes running them.
You may not remember, but the IP address of the agent we called rack one ends in 186, and none of the addresses listed here end in 186, which means none of these are rack one. So this is working. Let's jump back to the node detail page for 186 so you can see that it really is rack one. Back on the service, the tasks are listed on 113, 236, and so on, but not 186.
So this is working as expected. Now let's try again, only this time let's make sure the app can only run on rack one. I have another JSON file here. The ID is hello world from rack one, and the constraint says it should run on rack one. Let's create it; it takes a moment to deploy, and there it is.
Drilling into the service, you can see all of these tasks are running on 186. You may be wondering what other operators are available for constraints, and that's a great question. The UNIQUE operator ensures that every instance of the app has a unique value for the attribute. The GROUP_BY operator spreads app instances evenly across the distinct values of the attribute, for example, evenly across racks.
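To make the difference between UNIQUE and GROUP_BY concrete, here's a toy Python placement sketch. It's a simplified greedy model for illustration, not Marathon's actual scheduling logic, and the function name and agent data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Agent = Dict[str, str]  # e.g. {"hostname": "a", "rack": "2"}


def place(agents: List[Agent], instances: int, attribute: str, operator: str) -> List[Agent]:
    """Toy greedy placement for a single UNIQUE or GROUP_BY constraint.

    UNIQUE: no two instances may share a value of `attribute`.
    GROUP_BY: each new instance goes to the value with the fewest instances
    so far, which spreads them evenly across the attribute's values.
    """
    candidates = [a for a in agents if attribute in a]
    values = sorted({a[attribute] for a in candidates})
    counts: Counter = Counter()
    placed: List[Agent] = []
    for _ in range(instances):
        if operator == "UNIQUE":
            free = [a for a in candidates if counts[a[attribute]] == 0]
            if not free:
                raise RuntimeError(f"only {len(values)} distinct {attribute} values")
            choice = free[0]
        elif operator == "GROUP_BY":
            if not values:
                raise RuntimeError(f"no agent has a {attribute} attribute")
            target = min(values, key=lambda v: counts[v])
            # Any agent with the target value will do in this sketch.
            choice = next(a for a in candidates if a[attribute] == target)
        else:
            raise ValueError(f"unsupported operator {operator!r}")
        counts[choice[attribute]] += 1
        placed.append(choice)
    return placed
```

With four agents spread over racks 1, 2, 2 and 3, asking for six instances with GROUP_BY on rack puts two on each rack, while UNIQUE on rack refuses a fourth instance because there are only three distinct rack values.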
The CLUSTER operator ensures that the app runs only on agents whose attribute has the specified value. You might be wondering how that differs from the LIKE operator I used in the demo, and that's a great question. The answer is that LIKE and UNLIKE don't perform a plain string comparison; they match the attribute value against a regular expression.
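Here's a rough Python model of how LIKE and UNLIKE differ from CLUSTER for a single agent. It assumes LIKE and UNLIKE require the regular expression to match the entire attribute value (a full match), and it only models CLUSTER with an explicit value; the function name is hypothetical.

```python
import re
from typing import Dict, Optional


def satisfies(agent: Dict[str, str], attribute: str, operator: str,
              value: Optional[str] = None) -> bool:
    """Does one agent pass a LIKE, UNLIKE or CLUSTER constraint?

    Assumption: LIKE/UNLIKE require the regex to match the whole attribute
    value, so "1" matches rack 1 but not rack 10; "1.*" matches both.
    Agents without the attribute pass UNLIKE, which matches the demo, where
    the untagged private agents ran the first app.
    """
    actual = agent.get(attribute)
    if operator == "LIKE":
        return actual is not None and re.fullmatch(value, actual) is not None
    if operator == "UNLIKE":
        return actual is None or re.fullmatch(value, actual) is None
    if operator == "CLUSTER":
        # Plain string equality: "1|2" is not treated as a pattern here.
        return actual == value
    raise ValueError(f"unsupported operator {operator!r}")
```

So a pattern such as "1|2" with LIKE accepts agents on rack 1 or rack 2, whereas CLUSTER with the same string would only accept an agent whose rack attribute is literally "1|2".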
As you may have guessed, you can also combine constraints. For example, if you wanted each instance to run on a unique host within rack one, you could pair a UNIQUE constraint on hostname with a LIKE constraint on rack. This flexibility lets you ensure that apps run on the agents you intend them to run on. So, let's recap what we've covered so far.
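A combined-constraint app definition might look like the following Python sketch, which prints the JSON. Marathon only places a task on an agent that satisfies every entry in the constraints list; hostname is a built-in field Marathon understands, not an attribute you set in mesos-slave-common. The id, cmd, and resources are hypothetical placeholders.

```python
import json

# Hypothetical app: id, cmd and resources are placeholders. Every entry in
# "constraints" must hold: a host not already running this app, AND a rack
# attribute matching 1.
app = {
    "id": "/hello-world-rack-1-unique",
    "cmd": "echo 'hello world' && sleep 3600",
    "cpus": 0.1,
    "mem": 32,
    "instances": 1,
    "constraints": [
        ["hostname", "UNIQUE"],  # hostname is built in, not an agent attribute
        ["rack", "LIKE", "1"],
    ],
}

print(json.dumps(app, indent=2))
```

In this demo only one agent is tagged as rack one, so this definition can place a single instance; scaling it up would leave the extra instances waiting until another rack-one agent is available.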
Constraints use attributes set on agents to ensure that apps run only where they're supposed to. This gives you a lot of flexibility and helps make sure apps run on the servers best suited to the workload. The steps are as follows. First, you set up the attributes manually, which means SSHing into the agent where you want to add them.
Next, cd into the /var/lib/dcos directory and create or edit a file called mesos-slave-common. Add the attributes to the file using the pattern MESOS_ATTRIBUTES= followed by your key:value pairs, where each key is separated from its value by a colon and multiple pairs are separated by semicolons. After that, remove the latest entry, which as I showed is /var/lib/mesos/slave/meta/slaves/latest, then restart the Mesos agent daemon (dcos-mesos-slave), and then you can deploy a service to test it out.
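Since a typo in mesos-slave-common is easy to make, here's a small Python sketch that parses the attribute line and rejects malformed pairs, so you can sanity-check the file's contents before restarting the agent. The function name is hypothetical, and it checks only the key:value;key:value pattern described in this lesson.

```python
from typing import Dict


def parse_attributes_line(line: str) -> Dict[str, str]:
    """Parse a line such as MESOS_ATTRIBUTES=rack:1;disk:ssd into a dict.

    Each key is joined to its value by ':' and pairs are separated by ';'.
    Raises ValueError on anything else.
    """
    name, sep, body = line.strip().partition("=")
    if name != "MESOS_ATTRIBUTES" or not sep:
        raise ValueError("expected a line starting with MESOS_ATTRIBUTES=")
    attrs: Dict[str, str] = {}
    for pair in filter(None, body.split(";")):
        key, colon, value = pair.partition(":")
        if not key or not colon or not value:
            raise ValueError(f"malformed attribute {pair!r}")
        attrs[key] = value
    return attrs
```

For the demo agent, parse_attributes_line("MESOS_ATTRIBUTES=rack:1") returns {"rack": "1"}, while a slip like MESOS_ATTRIBUTES=rack=1 raises an error instead of silently producing a bad attribute.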
All right, that wraps up this lesson. In the next lesson we're going to cover how to use health checks, so if you're ready to see some health checks in action, I'll see you there.
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.