Do you remember the days of deploying an N-tier application to on-premises servers? Remember the planning that went into determining the right amount of hardware to buy so that you weren't under-provisioned or significantly over-provisioned? Deployments were often problematic, because what ran well on the developer's computer didn't always work outside of their environment. Deployments were also assumed to cause downtime, so they were scheduled during off-peak hours.

In the event of a hardware failure, your app might have been unavailable depending on how much hardware you had access to, and how the application was designed. Failovers may or may not have been automatic, and frankly, it was all a lot of work.

Well, if you thought that was difficult, imagine trying to do all of this at the scale of Google, Facebook, Twitter, Netflix, or similar companies.

All of the companies I just mentioned found that hyperscale computing required a new way to look at things. And regardless of the actual tools that they used, they all had the same solution, which was to treat their entire data center as a single entity.

And that’s what DC/OS does: it’s a central OS for your data center, and it’s the topic of this course.

Learning Objectives

  • You should understand how DC/OS is used
  • You should have a high-level understanding of DC/OS
  • You should be familiar with the UI
  • You should be familiar with the CLI
  • You should be able to install services from the catalog

Intended Audience

  • Sysadmins
  • Developers
  • DevOps Engineers
  • Site Reliability Engineers

Prerequisites

  • Familiarity with containers
  • Comfort with the command line


Lecture                     What you'll learn
Intro                       What to expect from this course
A Brief History             The history of DC/OS
Overview                    An overview of DC/OS
Components                  About the components of DC/OS
Exploring the UI            How to navigate the UI
Installing WordPress (UI)   How to install WordPress from the Catalog
Installing WordPress (CLI)  How to install WordPress from the Catalog
Summary                     How to keep learning


Resources

  • DC/OS Open Source vs. Enterprise comparison
  • DC/OS agent OS requirements

If you have thoughts or suggestions for this course, please contact Cloud Academy at


Welcome back! We’ve covered what Apache Mesos and DC/OS are. Now, in this lesson, we’ll look at the components of DC/OS.

If you recall, I mentioned previously that DC/OS is built on top of Apache Mesos. So let’s start at the bottom of the stack and dive into Mesos a bit.

Apache Mesos consists of a master node, public agents, private agents, and one or more schedulers.

The master node manages the state for the cluster. It’s responsible for knowing how many agents there are, the total available resources, what’s running on the cluster and where.

You can have multiple masters for high availability; however, only one master is active at a time, and it’s referred to as the leader. A component called ZooKeeper is responsible for electing a new leader if the current leader fails.
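In real ZooKeeper deployments, leader election is commonly built on ephemeral sequential znodes: each candidate registers a numbered node, and whoever holds the lowest number leads. Here’s a toy Python sketch of that idea — all names here are illustrative, not ZooKeeper’s actual API:

```python
# Toy sketch of ZooKeeper-style leader election (illustrative only).
# Each candidate master registers under an increasing sequence number;
# the candidate with the lowest number is the leader. When the leader
# fails, its registration disappears and the next-lowest candidate wins.

registrations = {}   # sequence number -> master name
next_seq = 0

def register(master):
    global next_seq
    registrations[next_seq] = master
    next_seq += 1

def leader():
    # The lowest surviving sequence number holds leadership.
    return registrations[min(registrations)] if registrations else None

def fail(master):
    # Remove the failed master's registration (like an ephemeral znode
    # vanishing when its session ends).
    for seq, name in list(registrations.items()):
        if name == master:
            del registrations[seq]

for m in ["master-1", "master-2", "master-3"]:
    register(m)

print(leader())    # master-1 leads
fail("master-1")   # the leader dies...
print(leader())    # ...and master-2 is elected
```

Note how no coordination is needed beyond agreeing on the ordering: every surviving candidate can independently compute who the leader is.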

Private agents are nodes in the cluster that aren’t publicly accessible. Now, that doesn’t mean you can’t open up a port and connect to a node directly; however, the intent is that they’ll remain private.

Public agents, on the other hand, are nodes in the cluster that host public-facing services. For example, a public agent can run your Marathon-LB load balancer to distribute requests to the NGINX web servers on your private agents.

Other than one being public and one private, public and private agents are really the same thing under the hood.

If you recall, I mentioned in a previous lesson that Mesos allows multiple schedulers for handling various workloads and use cases. While there are different schedulers for Mesos, I’ll focus on one bundled with DC/OS called Marathon.

Marathon is used to handle long-running tasks using either Mesos containers, which are based on cgroups, or Docker containers.

Let’s check out how these components interact. First, Mesos needs to know about the available resources in the cluster, so agents connect to the Mesos master and report what resources they have available.

Once Mesos knows about all of the available resources, it’s ready to start scheduling tasks. Now a user can make a request to a scheduler, such as the Marathon scheduler.

So, imagine you want an nginx container started up that uses 1 CPU and 1 GB of RAM. You would make that request to the Marathon scheduler via the UI or CLI. It’s then Marathon’s responsibility to check with the Mesos master to see if there are available resources; if so, Marathon will send over the details of the task to the Mesos master.
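A minimal Marathon app definition for this example might look like the following — expressed here as a Python dict for illustration. The field values are illustrative; in practice you’d save the JSON to a file and submit it with `dcos marathon app add nginx.json` or paste it into the UI:

```python
import json

# A minimal Marathon app definition matching the example above: one nginx
# instance with 1 CPU and 1 GB of RAM, running the official nginx image.
app = {
    "id": "/nginx",
    "cpus": 1.0,
    "mem": 1024,        # Marathon takes memory in MiB, so 1024 is ~1 GB
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "nginx:latest"},
    },
}

print(json.dumps(app, indent=2))
```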

Once the Mesos master has the details about the task to run, it can send that info over to an agent.
After the agent has started the container, it lets the master know; in turn, the master lets Marathon know, and then Marathon informs any end users.

Now that the task is running on the agent, the agent has fewer resources available, so it informs the Mesos master, keeping the master’s view of the cluster’s resources up to date.

At this point, you may be wondering what happens in the event of a crash. Let’s imagine that the nginx container crashes on the agent.
In this case, the agent informs the Mesos master, which informs the scheduler, and then the scheduler tells the master how to handle the crash. For example, it can ask that the container be restarted.
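The whole cycle described above — agents report resources, the master matches them to requests, the scheduler launches a task, and a crashed task is relaunched — can be condensed into a toy Python simulation. The class and method names are purely illustrative; real Mesos uses an offer-based protocol far richer than this:

```python
# Toy sketch of the Mesos scheduling cycle (illustrative names, not the
# real Mesos API).

class Agent:
    def __init__(self, name, cpus, mem):
        self.name, self.cpus, self.mem = name, cpus, mem
        self.tasks = {}

    def launch(self, task_id, cpus, mem):
        self.cpus -= cpus          # fewer resources are now available
        self.mem -= mem
        self.tasks[task_id] = (cpus, mem)

    def fail(self, task_id):
        cpus, mem = self.tasks.pop(task_id)   # the crash frees resources
        self.cpus += cpus
        self.mem += mem

class Master:
    """Tracks cluster state: which agents exist and what's free on them."""
    def __init__(self, agents):
        self.agents = agents       # agents register their resources here

    def offer(self, cpus, mem):
        # Find an agent with enough free resources for the scheduler.
        for agent in self.agents:
            if agent.cpus >= cpus and agent.mem >= mem:
                return agent
        return None

class Scheduler:
    """Stands in for Marathon: asks the master for resources and
    relaunches tasks that fail."""
    def __init__(self, master):
        self.master = master

    def run(self, task_id, cpus, mem):
        agent = self.master.offer(cpus, mem)
        if agent is not None:
            agent.launch(task_id, cpus, mem)
        return agent

master = Master([Agent("private-1", cpus=4, mem=4096)])
marathon = Scheduler(master)

agent = marathon.run("nginx", cpus=1, mem=1024)   # user request: 1 CPU, 1 GB
print(agent.cpus)            # 3 CPUs remain on the agent

agent.fail("nginx")          # the container crashes; the agent tells the master...
marathon.run("nginx", cpus=1, mem=1024)           # ...and Marathon relaunches it
```

The key detail the sketch preserves is the separation of concerns: the master only tracks and matches resources, while the decision to relaunch belongs to the scheduler.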

So, here’s the key takeaway: Mesos is NOT a container orchestration tool. Rather, it’s a cluster management tool that allows you to treat your cluster like one giant computer. The reason Mesos is so popular and powerful is that it allows you to use different types of schedulers. Marathon, in the example you just saw, is a popular open source scheduler used for container orchestration on Mesos; however, it’s not a component of Mesos.

Take a look at this diagram: it shows how Mesos is the base layer of DC/OS, and then there’s the “DC/OS” layer, which provides everything Mesos doesn’t, including a lot of functionality that makes using Mesos much easier.

That includes container orchestration, with Marathon bundled as the default container orchestration scheduler. There’s a package manager and an application ecosystem that allow you to download and install third-party software. There’s also service discovery, virtual networking, jobs and services, and more.

The enterprise version also adds secrets management, which is an important part of modern applications, along with end-to-end encryption, identity and access management, and technical support.

Take a look at this DC/OS architecture diagram. It shows all of the components of DC/OS, with the open source components in a purplish color and the enterprise components in a pinkish color.
This diagram not only shows the components of DC/OS; it also speaks to the complexity involved in managing a cluster. That’s one of the things I like about DC/OS: it removes many of the barriers to entry for tasks such as cluster management and container orchestration, allowing engineers to manage these tasks with relative ease.

I’m not going to dive into detail on all of these components in this course. However, I do want to review the types of services and where they run.

First, look at the bottom of the diagram, where it shows the components running on all nodes. All of the nodes in the cluster run services for logging and metrics, including functionality for logging, log rotation, diagnostics, and metrics.

In the networking stack, there are services such as Minuteman, which provides service discovery and load balancing for containers.
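Conceptually, Minuteman maps a virtual address (a VIP) to a set of backend tasks and spreads connections across them. This round-robin dict is only a toy sketch of that idea — the real implementation balances layer-4 traffic in the kernel, and the VIP hostname shown follows the DC/OS naming convention but is otherwise made up:

```python
import itertools

# Toy illustration of VIP-style load balancing: one virtual address maps
# to several backend task addresses, and each connection is routed to the
# next backend in round-robin order. All addresses here are hypothetical.
backends = {
    "web.marathon.l4lb.thisdcos.directory:80": itertools.cycle(
        ["10.0.1.10:80", "10.0.1.11:80", "10.0.1.12:80"]
    ),
}

def route(vip):
    # Pick the next backend for this virtual address.
    return next(backends[vip])

vip = "web.marathon.l4lb.thisdcos.directory:80"
print([route(vip) for _ in range(4)])
# after all three backends are used, the rotation wraps around
```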

Spartan is a DNS forwarder that also plays a part in service discovery.

Navstar is an orchestrator for overlay networks.

Then the other two components are related to DNS and port mapping.

The final component found on all nodes is a package management API, used to manage the versions of the DC/OS components on the node.

Moving up to the services running on the master nodes, the first is the Admin Router. This is the single gateway into the DC/OS cluster; it handles authentication and serves as a proxy to the services in the cluster.

Next, there’s cluster management, which includes the Mesos master as well as Exhibitor and ZooKeeper. ZooKeeper is used by DC/OS for a few things, including Mesos leader election and persisting Marathon’s state. Exhibitor is used to make sure ZooKeeper stays up and running.

The next section is the Cosmos package manager. Cosmos is a service package manager, allowing you to install services in much the same way you’d install a Node module with npm or a Python library with pip.

Below this are the container orchestration components. Marathon, as you know, is used for managing long-running tasks with Docker containers or Mesos containers. Metronome is used for short-running scheduled tasks.

The next section has some networking components. And then there’s this section here that only applies to the enterprise edition, which has features for IAM and security.

Heading over to the agent nodes, they have an Admin Router agent that interacts with the Admin Router on the master node. They also have a blank section for user tasks, which are the things you’ll actually be running.

There’s a Mesos agent, which, as you know, is responsible for interacting with the Mesos master to schedule tasks.

The storage component is called REX-Ray, and it’s responsible for providing interfaces to persistent storage. REX-Ray has several different adapters, allowing you to use storage on AWS, Google Cloud, Rackspace, and more.

Finally, there are the container runtimes that exist on the agents. These allow you to run different types of containerized workloads.

As you can see, DC/OS is the result of several distributed systems all working together. I do recommend that, after this learning path, as you’re starting to feel more comfortable with DC/OS, you spend some time understanding each of the components individually.

Alright, that’s going to wrap up this lesson. In the next lesson we’ll look at how to navigate the UI. So I’ll see you in the next lesson!

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.