hands-on lab

Building a Data Pipeline in DC/OS

This lab is currently under maintenance and unavailable. We are actively working to resolve this issue and we apologize for any inconvenience.

DC/OS was declared end of life on October 31, 2021, and the content is no longer maintained.

Lab description

Notice: DC/OS has been declared end-of-life. The lab instructions have been brought up to date with the end-of-life release. Due to limitations in DC/OS, the final lab step now simulates the real-time analysis of tweets.

It is relatively simple to create powerful data pipelines in DC/OS. In this Lab, you will learn how to perform streaming data analytics by building a data pipeline in DC/OS that combines multiple services and a Twitter-like application. You will review many of the fundamental concepts in using DC/OS along the way, including installing packages, using Marathon-LB to load balance traffic, and working with virtual IPs.
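As a rough illustration of the virtual IP concept mentioned above, the sketch below builds the hostname that DC/OS named VIPs generally follow, `<vip-name>.<framework>.l4lb.thisdcos.directory:<port>`. The helper name `named_vip_address` is hypothetical, and the VIP names and ports shown are assumptions based on common package defaults; confirm the actual endpoints of each service in your own cluster.

```python
def named_vip_address(vip_name: str, framework: str, port: int) -> str:
    """Build a DC/OS named VIP address (hypothetical helper for illustration)."""
    # Named VIPs resolve under the l4lb.thisdcos.directory domain.
    return f"{vip_name}.{framework}.l4lb.thisdcos.directory:{port}"


# Assumed values: typical defaults for the Kafka and Cassandra packages.
kafka_vip = named_vip_address("broker", "kafka", 9092)
cassandra_vip = named_vip_address("node", "cassandra", 9042)

print(kafka_vip)
print(cassandra_vip)
```

An application inside the cluster can use an address like this instead of tracking the individual agents that a service's tasks land on.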

Lab Objectives

Upon completion of this Lab you will be able to:

  • Install DC/OS packages with custom options using the DC/OS CLI
  • Deploy a data pipeline using Kafka, Cassandra, and a social networking app
  • Use the Zeppelin package and DC/OS Spark to perform basic streaming analytics on the data pipeline
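To give a sense of the first objective, here is a minimal sketch that assembles a `dcos package install` command with a custom options file. It only builds and prints the command; it does not run it. The helper `build_install_command`, the file name `kafka-options.json`, and the option keys are illustrative assumptions, not the lab's actual values.

```python
import json
import shlex


def build_install_command(package, options_path=None):
    """Return the argument list for a non-interactive `dcos package install`."""
    argv = ["dcos", "package", "install", package, "--yes"]
    if options_path is not None:
        # --options points the CLI at a JSON file of custom settings.
        argv.append(f"--options={options_path}")
    return argv


# Hypothetical custom options; the valid keys depend on each package's schema.
example_options = {"service": {"name": "kafka"}}
options_json = json.dumps(example_options, indent=2)

command = build_install_command("kafka", "kafka-options.json")
print(options_json)
print(shlex.join(command))
```

In practice the JSON would be written to the options file first, and the printed command would then be run from a shell where the DC/OS CLI is installed and attached to the cluster.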

Lab Prerequisites

You should be familiar with:

  • Basic and intermediate DC/OS concepts including Virtual IPs and Marathon-LB
  • Working at the command-line in Linux
  • AWS services (optional, but helpful for understanding the architecture of the pre-created DC/OS cluster)

Lab Environment

Before completing the Lab instructions, the environment will look as follows:

After completing the Lab instructions, the environment should look similar to:



January 19th, 2022 - Updated lab instructions to reflect the latest (end of life) DC/OS experience

August 1st, 2021 - Resolved an issue preventing the DC/OS cluster from provisioning

October 2nd, 2020 - Replaced CoreOS virtual machines (no longer available in AWS) with CentOS

January 10th, 2019 - Added a validation Lab Step to check the work you perform in the Lab

About the author
Logan Rakai
Lead Content Developer - Labs

Logan has been involved in software development and research since 2007 and has been in the cloud since 2012. He is an AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, Microsoft Certified Azure Solutions Architect Expert, MCSE: Cloud Platform and Infrastructure, Google Cloud Certified Associate Cloud Engineer, Certified Kubernetes Security Specialist (CKS), Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), and Certified OpenStack Administrator (COA). He earned his Ph.D. studying design automation and enjoys all things tech.


Lab steps
Logging In to the Amazon Web Services Console
Understanding the DC/OS Cluster Architecture
Connecting to the DC/OS Cluster NAT Instance using SSH
Installing the DC/OS CLI on Linux
Installing the Required Packages in the DC/OS Cluster
Running the Tweeter Application
Simulating Analyzing Tweets in Real-Time with Zeppelin
Validate AWS Lab