Do you remember the days of deploying an N-tier application to on-premises servers? Remember the planning that went into choosing the right amount of hardware, so that you weren't under-provisioned or significantly over-provisioned? Deployments were often problematic because what ran well on a developer's computer didn't always work outside of that environment. Deployments were also assumed to cause downtime, so they were scheduled during off-peak hours.
In the event of a hardware failure, your app might have been unavailable, depending on how much hardware you had access to and how the application was designed. Failovers may or may not have been automatic, and frankly, it was all a lot of work.
Well, if you thought that was difficult, imagine trying to do all of this at the scale of Google, Facebook, Twitter, Netflix, or similar companies.
All of the companies I just mentioned found that hyperscale computing required a new way of looking at things. And regardless of the specific tools they used, they all arrived at the same solution: treat the entire data center as a single entity.
And that’s what DC/OS does: it’s a central OS for your data center, and it’s the topic of this course.
Learning objectives:
- You should understand how DC/OS is used
- You should have a high-level understanding of DC/OS
- You should be familiar with the UI
- You should be familiar with the CLI
- You should be able to install services from the catalog

Intended audience:
- DevOps Engineers
- Site Reliability Engineers

Prerequisites:
- Familiarity with containers
- Comfort with the command line
| Lecture | What you'll learn |
|---|---|
| Intro | What to expect from this course |
| A Brief History | The history of DC/OS |
| Overview | An overview of DC/OS |
| Components | About the components of DC/OS |
| Exploring the UI | How to navigate the UI |
| Installing WordPress (UI) | How to install WordPress from the Catalog using the UI |
| Installing WordPress (CLI) | How to install WordPress from the Catalog using the CLI |
| Summary | How to keep learning |
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Welcome back! Before we dive into the world of DC/OS, pick your favorite time machine and head back in time with me.
Let's head back to the year 2000: the average price for a gallon of gas is $1.26, Bill Gates steps down as CEO of Microsoft, Harry Potter and the Goblet of Fire is published, and Google is working on its own resource orchestration software called Borg.
Borg was a system that allowed Google to think of its data center as a single entity, and it gave the company a competitive advantage. Borg allowed Google to run processes in isolation across the cluster. This isolation was accomplished with a Linux kernel feature, created by Google engineers, called control groups, or cgroups for short. If you're not familiar with cgroups, think of them as containers for Linux processes: a group of processes whose CPU, memory, and I/O usage can be limited and isolated from the rest of the system.
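To make the cgroup idea a little more concrete, here's a minimal Python sketch of how a process can be placed into a cgroup with resource limits. It uses the modern cgroup v2 filesystem interface (the `memory.max`, `cpu.max`, and `cgroup.procs` files), not the original v1 interface and not anything Borg or Mesos specifically does; the `create_cgroup` function is a hypothetical helper for illustration. The demo writes to a temporary directory so it runs without root privileges.

```python
import os
import tempfile


def create_cgroup(root: str, name: str, memory_max_bytes: int,
                  cpu_quota_us: int, cpu_period_us: int, pid: int) -> str:
    """Create a cgroup v2 group, set limits, and move a process into it.

    On a real host, `root` would be the cgroup v2 mount point (commonly
    /sys/fs/cgroup), this would need root privileges, and the memory and cpu
    controllers would need to be enabled in the parent's cgroup.subtree_control.
    """
    path = os.path.join(root, name)
    # In the cgroup filesystem, creating a directory creates a new cgroup.
    os.makedirs(path, exist_ok=True)

    def write(filename: str, value: str) -> None:
        with open(os.path.join(path, filename), "w") as f:
            f.write(value + "\n")

    # Hard memory limit, in bytes, for all processes in the group.
    write("memory.max", str(memory_max_bytes))
    # CPU bandwidth: at most cpu_quota_us microseconds of CPU time per period.
    write("cpu.max", f"{cpu_quota_us} {cpu_period_us}")
    # Writing a PID to cgroup.procs moves that process into the group.
    write("cgroup.procs", str(pid))
    return path


if __name__ == "__main__":
    # Dry run against a temporary directory instead of the real /sys/fs/cgroup.
    with tempfile.TemporaryDirectory() as tmp:
        group = create_cgroup(tmp, "demo", 256 * 1024 * 1024, 50_000, 100_000, os.getpid())
        with open(os.path.join(group, "cpu.max")) as f:
            print(f.read().strip())  # prints "50000 100000"
```

Here, a quota of 50,000 microseconds per 100,000-microsecond period caps the group at roughly half of one CPU.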
Having a scheduler that can run processes anywhere in the cluster allowed Google engineers to focus more on the process itself, rather than on where and how it should be deployed. For years, systems such as Borg were strictly internal tools, and if you wanted something similar, you had to build it yourself.
Between 2000 and 2010, there weren't many companies that had systems like Borg. However, around 2010, some grad students at UC Berkeley were working on a distributed OS named Mesos, which let different schedulers handle different types of tasks, such as batch jobs, services, and streaming.
Mesos became an Apache project and was released to the world as open source. So finally, after several years, the idea of treating the data center as a single pool of resources was no longer limited to large companies and academia.
You might be wondering what Mesos is, exactly. Apache Mesos is a distributed kernel with APIs for resource management and scheduling across a cluster. In practice, this means Mesos can be used to manage the tasks that you want to run on your cluster.
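Mesos splits scheduling into two levels: the Mesos master offers the free resources of each agent (machine) to frameworks, and each framework's own scheduler decides which of its tasks to launch with those resources. The toy Python model below is only a sketch of that resource-offer idea; the class and function names are hypothetical, this is not the Mesos API, and the "master" here simply offers each agent to frameworks in list order instead of using a real allocation policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    """A machine in the cluster and the resources it has not yet handed out."""
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A scheduler (for example, one for services or one for batch jobs)."""
    name: str
    pending: List[Task] = field(default_factory=list)

    def respond_to_offer(self, agent: Agent) -> List[Task]:
        """Pick which pending tasks to launch using the offered resources."""
        cpus, mem = agent.cpus, agent.mem_mb
        launched = []
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                cpus -= task.cpus
                mem -= task.mem_mb
                launched.append(task)
                self.pending.remove(task)
        # Resources not used here stay on the agent for the next offer.
        return launched


def offer_round(agents: List[Agent], frameworks: List[Framework]) -> Dict[str, List[str]]:
    """Toy 'master': offer each agent's free resources to each framework in turn."""
    placements: Dict[str, List[str]] = {a.name: [] for a in agents}
    for agent in agents:
        for fw in frameworks:
            for task in fw.respond_to_offer(agent):
                # Subtract what the framework used before making the next offer.
                agent.cpus -= task.cpus
                agent.mem_mb -= task.mem_mb
                placements[agent.name].append(f"{fw.name}/{task.name}")
    return placements


if __name__ == "__main__":
    agents = [Agent("node-1", 4.0, 8192), Agent("node-2", 4.0, 4096)]
    services = Framework("services", [Task("web-1", 1.0, 1024), Task("web-2", 1.0, 1024)])
    batch = Framework("batch", [Task("report", 3.0, 2048)])
    print(offer_round(agents, [services, batch]))
    # prints {'node-1': ['services/web-1', 'services/web-2'], 'node-2': ['batch/report']}
```

The batch framework declines node-1 because only 2 CPUs are left after the services launch, and then accepts node-2. The point of the design is that Mesos itself doesn't need to understand every workload type; each framework applies its own placement logic to the offers it receives.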
During this same period, around 2010, Twitter was a few years old, and it was experiencing some of the growing pains that are common with mixed workloads and high-traffic apps.
They were struggling with managing failures and maintenance windows, as well as with resource utilization. These problems aren't too painful when you only have a few servers; however, when you have hundreds or thousands, any amount of pain is amplified.
Now, I don’t want to spoil the ending for you, but you’ve probably already figured it out. Twitter solved their problems by using Mesos.
Mesos allowed Twitter to treat all of the servers in its cluster as a single server with different scheduling mechanisms.
Services such as the web application could be scheduled to run as monitored, long-running tasks. Batch jobs could be scheduled to use any spare CPU the cluster might have. And it was the job of Mesos to manage the available resources of the cluster and distribute the running tasks across it. This meant that if a service task failed because a node in the cluster went down, it would be started somewhere else.
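The failover behavior described here can be sketched with a toy Python model: long-running services and batch jobs are placed wherever there's spare CPU, and when a node goes down, its service tasks are restarted on a healthy node. The names `Node`, `place`, and `handle_node_failure` are hypothetical, and picking the node with the most free CPU is an arbitrary choice for illustration; this is not how Mesos or its schedulers actually implement health checks or rescheduling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    cpus: float
    up: bool = True
    tasks: Dict[str, float] = field(default_factory=dict)  # task name -> CPUs

    def free_cpus(self) -> float:
        return self.cpus - sum(self.tasks.values())


def place(nodes: List[Node], task: str, cpus: float) -> Optional[str]:
    """Put a task on the healthy node with the most free CPU, if any fits."""
    candidates = [n for n in nodes if n.up and n.free_cpus() >= cpus]
    if not candidates:
        return None
    target = max(candidates, key=lambda n: n.free_cpus())
    target.tasks[task] = cpus
    return target.name


def handle_node_failure(nodes: List[Node], failed_name: str,
                        services: Set[str]) -> Dict[str, Optional[str]]:
    """Mark a node as down and restart its long-running services elsewhere.

    In this sketch, batch tasks on the failed node are simply dropped; a
    real batch scheduler might retry them instead.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.up = False
    lost, failed.tasks = failed.tasks, {}
    restarted: Dict[str, Optional[str]] = {}
    for task, cpus in lost.items():
        if task in services:
            # None means no healthy node currently has room for the service.
            restarted[task] = place(nodes, task, cpus)
    return restarted


if __name__ == "__main__":
    nodes = [Node("node-1", 4.0), Node("node-2", 4.0), Node("node-3", 4.0)]
    place(nodes, "web", 2.0)      # long-running service
    place(nodes, "api", 2.0)      # long-running service
    place(nodes, "batch-1", 3.0)  # batch job using spare CPU
    print(handle_node_failure(nodes, "node-1", services={"web", "api"}))
    # prints {'web': 'node-2'}
```

When node-1 fails, the `web` service is restarted on node-2, the only healthy node that still has 2 CPUs free.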
As an open source project, Mesos kept growing and evolving, eventually being used at companies such as Airbnb, Netflix, and PayPal. Apple uses Mesos to run Siri. Microsoft uses Mesos for Azure. The fact that such a diverse group of companies uses Mesos for mission-critical projects speaks to its value.
In 2013, one of the former grad students behind Mesos, then working at Twitter, formed the company Mesosphere with a few other people.
Mesosphere went on to create DC/OS, which builds on top of Apache Mesos to provide an open source data center operating system that is easy to install and operate.
There are two versions of DC/OS: the open source version and an enterprise version. While almost all of the functionality exists in both versions, the enterprise version also includes some great out-of-the-box security mechanisms, as well as support from Mesosphere. I'll include a link in the course description with a comparison of the two.
Throughout this course, we'll primarily focus on the open source version so that you can try all of this yourself without needing an enterprise license. However, if I do show anything that's an enterprise-only feature, I'll try to call it out.
Alright, let’s wrap up here. In the next lesson we’ll look at DC/OS from 20,000 feet. See you in the next lesson!
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.